


Visual Analytics Workshop at BlackHat 2016


BlackHat 2016 - Las Vegas

Big Data is Getting Bigger - Visualization is Getting Easier - Learn How!
Dates: July 30-31 & August 1-2
Location: Las Vegas, USA

SIGN UP today!


Big data and security intelligence are two of the hottest topics in security. We are collecting more and more information, both from the infrastructure and, increasingly, directly from our applications. This vast amount of data is getting increasingly hard to understand. Terms like MapReduce, Hadoop, Spark, Elasticsearch, and data science are part of many discussions. But what are those technologies and techniques? And what do they have to do with security analytics and intelligence? We will see that none of these technologies is sufficient in our quest to defend our networks and information. Data visualization is the only approach that scales to the ever-changing threat landscape and infrastructure configurations. Using big data visualization techniques, you uncover hidden patterns in data, identify emerging vulnerabilities and attacks, and respond decisively with countermeasures that are far more likely to succeed than conventional methods, a practice that is increasingly referred to as hunting.

Attendees will learn about log analysis, big data, information visualization, and data sources for IT security, and will learn how to generate visual representations of IT data. The training is filled with hands-on exercises using the DAVIX 2014 live CD.

What's New?

The workshop is being heavily updated over the coming months. Check back here to see the list of new topics covered:

  • Elastic Stack - Kibana 5, Elasticsearch 5 - a completely new version!
  • Time-series databases - need to collect metrics? Don't just stuff them into Hadoop!
  • The cloud - What's happening in the cloud with logging? CloudTrail, CloudWatch, etc.
  • Big Data - How do you navigate the ever growing landscape of Hadoop and big data technologies? Tajo, Apache Arrow, Apache Drill, Druid, PrestoDB from Facebook, Kudu, etc. We'll sort you out.


The syllabus is not 100% fixed yet. Stay tuned for some updates.

Day 1:

Log Analysis

  • Data Sources Discussion such as PCAP, Firewall, IDS, Threat Feeds, etc.
  • Data Analysis and Visualization Linux (DAVIX)
  • Log Data Processing (CSVKit, ...)

Log Management, SIEM, and Big Data

  • Log Management and SIEM Overview
  • LogStash (Elastic Stack) and Moloch
  • Big Data - Hadoop, Spark, Elasticsearch, Hive, Impala

Day 2:


  • Information Visualization History
  • Visualization Theory
  • Data Visualization Tools and Libraries (e.g., Mondrian, Gephi, AfterGlow, Graphiti)
  • Visualization Resources

Security Visualization Use-Cases

  • Perimeter Threat
  • Network Flow Analysis
  • Firewall Visualization
  • IDS/IPS Signature Analysis
  • Vulnerability Scans
  • Proxy Data
  • User Activity
  • Host-based Data Analysis

Sample of Tools and Techniques

Tools to gather data:

  • argus, nfdump, nfsen, and silk to process traffic flows
  • snort, bro, suricata as intrusion detection systems
  • p0f, npad for passive network analysis
  • iptables, pf, pix as examples of firewalls
  • OSSEC, collectd, graphite for host data

We are also using a number of visualization tools to analyze example data in the labs:

  • graphviz, tulip, cytoscape, and gephi
  • afterglow
  • treemap
  • mondrian, ggobi

Under the log management section, we are going to discuss:

  • rsyslog, syslog-ng, nxlog
  • logstash as part of the elastic stack, moloch
  • commercial log management and SIEM solutions

The section on big data is covering the following:

  • hadoop (HDFS, MapReduce, HBase, Hive, Impala, ZooKeeper)
  • search engines like Elasticsearch and Solr
  • NoSQL data stores like MongoDB, Cassandra, etc.
  • OLAP and OLTP
  • The Spark ecosystem


Raffael Marty is one of the world's most recognized authorities on security data analytics and visualization. Raffy is the founder and CEO of pixlcloud, a next generation visual analytics platform. With a track record at companies including IBM Research and ArcSight, he is thoroughly familiar with established practices and emerging trends in big data analytics. He has served as Chief Security Strategist with Splunk and was a co-founder of Loggly, a cloud-based log management solution. Author of Applied Security Visualization and frequent speaker at academic and industry events, Raffy is a leading thinker and advocate of visualization for unlocking data insights. For more than 14 years, Raffy has worked in the security and log management space to help Fortune 500 companies defend themselves against sophisticated adversaries and has trained organizations around the world in the art of data visualization for security. Zen meditation has become an important part of Raffy's life, sometimes leading to insights not in data but in life.

Introduction Video

screenshot of ip traffic in deep node


Visualizing Live Streams in 3D/VR

We've created a free tool for visualizing live streams of network traffic, using JMonkeyEngine (Java 3D gaming engine).

Please take a look at - we would very much appreciate feedback from this community.

Rather than focusing on mining of static datasets, this tool focuses on seeing activity over time, and controlling the timeline so that a human can connect the dots. Here's a link to information on the concept behind the visualization style.

As for the screenshot, this video explains what you're looking at.


YouTube video using AfterGlow, twopi, and Nginx logs

I attended the Visual Analytics Workshop at BlackHat last year and have gotten endless use out of afterglow, neato, etc. to make interesting visualizations.

Here is a short YouTube video I put together, with attack data taken from Nginx logs:

(Music is by a local San Francisco band: Vetiver)
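
For anyone who wants to try something similar, here is a minimal sketch of the first step: turning Nginx access-log lines into the comma-separated node pairs that AfterGlow takes as input. It assumes the default "combined" log format. The function names and the choice of columns (client IP and requested path) are illustrative assumptions, not the pipeline used for the video.

```python
import csv
import io
import re

# Matches the start of an Nginx "combined" access-log line (assumed format).
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)

def nginx_to_csv_rows(lines):
    """Yield (client_ip, path) pairs for lines that parse; skip the rest."""
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            yield match.group("ip"), match.group("path")

def to_csv(rows):
    """Render rows as CSV text, one source,target pair per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()

# Made-up sample input: one valid line and one line that does not parse.
sample = [
    '203.0.113.7 - - [10/Oct/2015:13:55:36 -0700] '
    '"GET /wp-login.php HTTP/1.1" 404 162 "-" "curl/7.43"',
    'not a log line',
]
print(to_csv(nginx_to_csv_rows(sample)), end="")
```

The resulting CSV can then be fed to AfterGlow, whose DOT output is laid out with a Graphviz engine such as neato or twopi.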

DNS Mapping


Over at I created a DNS recon tool that generates a DNS map on the fly using 80+GB of DNS data from the project. This map is the domain.

MyDoom botnet


This graph visualization shows the propagation of malware through a deliberately infected computer network. Twelve machines in the network were infected to see how the traffic spread to other machines. Over 7800 machines were included in the dataset.
The entire network is shown in a single chart. Yellow links indicate benign traffic; red links indicate traffic with at least one infected packet. Nodes are sized by volume of traffic.
Data taken from the MyDoom-A.tar.gz, available here
Image generated with KeyLines.
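
The encoding described here (link color by infection status, node size by traffic volume) can be approximated with free tools as well. The sketch below emits Graphviz DOT instead of using KeyLines. The input schema of `(src, dst, n_bytes, infected)` tuples and the function name `to_dot` are assumptions for illustration, not the format of the MyDoom dataset.

```python
from collections import defaultdict

def to_dot(flows):
    """Build a DOT graph from (src, dst, n_bytes, infected) tuples (assumed schema)."""
    volume = defaultdict(int)  # total bytes seen per node
    edges = {}                 # (src, dst) -> True if any packet was infected
    for src, dst, n_bytes, infected in flows:
        volume[src] += n_bytes
        volume[dst] += n_bytes
        # An edge stays red once any of its packets is infected.
        edges[(src, dst)] = edges.get((src, dst), False) or infected

    top = max(volume.values(), default=1) or 1
    lines = ["digraph traffic {", '  node [shape=circle, label=""];']
    for node in sorted(volume):
        # Scale node width between 0.1 and 1.0 inches by traffic volume.
        width = 0.1 + 0.9 * volume[node] / top
        lines.append(f'  "{node}" [width={width:.2f}, fixedsize=true];')
    for src, dst in sorted(edges):
        color = "red" if edges[(src, dst)] else "yellow"
        lines.append(f'  "{src}" -> "{dst}" [color={color}];')
    lines.append("}")
    return "\n".join(lines)

# Made-up example flows.
print(to_dot([("a", "b", 100, False), ("a", "b", 50, True), ("b", "c", 150, False)]))
```

Saved to a file, the output can be rendered with a Graphviz layout engine, for example `neato -Tpng traffic.dot -o traffic.png`.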

Botnet activity


Visualization showing botnet activity geographically. The time bar at the bottom shows temporal trends and filters traffic shown on the map.
Data from
Image generated using KeyLines.
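
The time bar does two things that are easy to prototype: it bins events into a histogram, and it restricts the map to a selected window. A minimal sketch, assuming events are `(epoch_seconds, location)` pairs; the function names and the one-hour bucket are hypothetical choices, not part of KeyLines.

```python
from collections import Counter

def time_histogram(events, bucket_seconds=3600):
    """Count events per time bucket, keyed by the bucket's start time."""
    counts = Counter()
    for ts, _location in events:
        counts[ts - ts % bucket_seconds] += 1
    return dict(sorted(counts.items()))

def filter_window(events, start, end):
    """Keep only events with start <= timestamp < end."""
    return [event for event in events if start <= event[0] < end]

# Made-up example events: (epoch_seconds, country code).
events = [(0, "US"), (10, "DE"), (3600, "US"), (7300, "BR")]
print(time_histogram(events))
print(filter_window(events, 3600, 7200))
```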

Botnet traffic


Using a dataset from, this graph shows botnet traffic between 5000 computers at the University of San Diego. Different colors were used to indicate different protocols. Nodes represent computers and were sized by degree. Edges represent packets, weighted by packet size. Image generated using KeyLines.
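
The two metrics behind this picture, node degree and edge weight, can be computed from raw packet records in a few lines. This is a sketch under assumptions: the `(src, dst, protocol, size_bytes)` schema and the function name are hypothetical, and packets between the same pair are summed into one weighted edge per protocol, which is one possible reading of "weighted by packet size".

```python
from collections import defaultdict

def graph_metrics(packets):
    """Return (degree per node, total bytes per (src, dst, protocol) edge)."""
    neighbors = defaultdict(set)
    edge_weight = defaultdict(int)
    for src, dst, proto, size in packets:
        # Degree counts distinct peers, regardless of direction.
        neighbors[src].add(dst)
        neighbors[dst].add(src)
        edge_weight[(src, dst, proto)] += size
    degree = {node: len(peers) for node, peers in neighbors.items()}
    return degree, dict(edge_weight)

# Made-up example packets.
degree, weights = graph_metrics([
    ("a", "b", "tcp", 60),
    ("a", "b", "tcp", 40),
    ("a", "c", "udp", 10),
])
print(degree)
print(weights)
```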

Visual Analytics Needs a Strong Data Backend

Visual analytics, especially the exploration of data, requires a scalable and flexible data backend. It is not uncommon that gigabytes, maybe even terabytes, of data need to be queried for a specific analytics task. Furthermore, the more context is available around log data, the more expressive the data gets and the deeper the insights that can be discovered in it. How can we gather all that context and combine it with both network-based and host-based data? What are the data access requirements? How can we run data mining algorithms, such as clustering, across all of the data? What kind of data store do we need for that? Do we need a search engine as a backend? Or a columnar data store?

I recently wrote a paper on the security data lake, a concept for a data backend that enables a variety of processing and access use-cases. A short introduction to the topic is available as well.

Maybe at a later point in time, I will try to address the topic of data science and techniques, as well as workflows to make all that big data actionable. How do you take a terabyte of data and find actual insights? Just dropping that data into a network graph visualization is not going to help. You need a bit more to make that happen. But again, more on that later.

If you want to learn more about how to visualize and analyze terabytes of data, attend the Visual Analytics Workshop at BlackHat 2015 in Las Vegas.

Again, here is where you download the paper.

Linked Graphs Showing DNS Traffic on the Network


This is a screenshot from a tool called Mondrian, in which we show network traffic, DNS traffic in particular. The bar charts show the breakdown of sources, destinations, and ports. The parallel coordinates plot shows all three variables at the same time. The red parts highlight an interesting visual pattern. What is it?
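
The linking idea can be illustrated without Mondrian: select a subset of records once, then recompute for every bar chart how much of each bar falls inside the selection. The sketch below is only an approximation of that behavior; the record fields (`src`, `dst`, `port`) and the function name `linked_bars` are assumptions.

```python
from collections import Counter

def linked_bars(records, field, selected):
    """Return {value: (total, highlighted)} for the bar chart of one field."""
    total = Counter(r[field] for r in records)
    highlighted = Counter(r[field] for r in records if selected(r))
    return {value: (n, highlighted[value]) for value, n in total.most_common()}

# Made-up DNS-like records.
records = [
    {"src": "10.0.0.1", "dst": "8.8.8.8", "port": 53},
    {"src": "10.0.0.2", "dst": "8.8.8.8", "port": 53},
    {"src": "10.0.0.1", "dst": "10.0.0.9", "port": 5353},
]

# Selecting one source highlights its share in every other chart.
def from_first_host(record):
    return record["src"] == "10.0.0.1"

for field in ("dst", "port"):
    print(field, linked_bars(records, field, from_first_host))
```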