This is where you can start discussions around security visualization topics.

NOTE: If you want to submit an image, post it in the graph exchange library!

You might also want to consider posting your question or comment on the SecViz Mailinglist!

Discussion Entries

Visual Analytics Needs a Strong Data Backend

Visual analytics, and especially the exploration of data, requires a scalable and flexible data backend. It is not uncommon for gigabytes, maybe even terabytes, of data to be queried for a specific analytics task. Furthermore, the more context is available around log data, the more expressive the data becomes and the deeper the insights that can be discovered in it. How can we gather all that context and combine it with both network-based and host-based data? What are the data access requirements? How can we run data mining algorithms, such as clustering, across all of the data? What kind of data store do we need for that? Do we need a search engine as a backend? Or a columnar data store?

I recently wrote a paper about the security data lake, a concept for a data backend that enables a variety of processing and access use-cases. A short introduction to the topic is available as well.

At a later point, I may address data science techniques, as well as workflows that make all that big data actionable. How do you take a terabyte of data and find actual insights? Just dropping that data into a network graph visualization is not going to help. You need a bit more to make that happen. But again, more on that later.

If you want to learn more about how to visualize and analyze terabytes of data, attend the Visual Analytics Workshop at BlackHat 2015 in Las Vegas.

Again, here is where you can download the paper.

Visual Analytics Workshop at BlackHat 2015


BlackHat 2015 - Las Vegas

Big Data is Getting Bigger - Visualization is Getting Easier - Learn How!
Dates: August 1-2 and August 3-4
Location: Las Vegas, USA



Big data and security intelligence are two very hot topics in security. We are collecting more and more information, not only from our infrastructure but increasingly also directly from our applications. This vast amount of data is getting harder and harder to understand. Terms like MapReduce, Hadoop, Spark, Elasticsearch, and data science are part of many discussions. But what are those technologies and techniques? And what do they have to do with security analytics and intelligence? We will see that none of these technologies is sufficient in our quest to defend our networks and information. Data visualization is the only approach that scales to the ever-changing threat landscape and infrastructure configurations. Using big data visualization techniques, you uncover hidden patterns in data, identify emerging vulnerabilities and attacks, and respond decisively with countermeasures that are far more likely to succeed than conventional methods, an approach that is increasingly referred to as hunting. Attendees will learn about log analysis, big data, information visualization, and data sources for IT security, and how to generate visual representations of IT data. The training is filled with hands-on exercises utilizing the DAVIX 2014 live CD.

What's New?

The workshop has undergone quite a few updates. Here are some highlights:

  • A general overhaul of the data source section with a focus on calling out pitfalls and lesser-known features of different sources. It also finally covers threat feeds as an important data source.
  • The log processing section now covers CSVKit to help process data on the command line. SQL on CSV files, anyone?
  • Log management has been tightened up to cover the basics of log management and SIEM. The ELK Stack and Moloch are two tools we look at in depth.
  • The discussion of LogStash has been expanded with more information on how to run it, example configurations, and an in-depth exercise where we even use the LogStash APIs to query data via REST.
  • The Big Data section has been revamped and brought up to date with everything that happened in the last year. This section also discusses the new concept of the big data lake for security.
  • If there is time and the class is interested, we will embark on a quick journey into data science with R, where we also run through an exercise.
  • The visualization part has undergone some reorganization. Previously, visualization tools and visualization were separate sections. Now they are much more mixed and the visualization section itself has many more security examples to drive the concepts home.
  • One of the visualization tools, Gephi, is discussed in depth, and we show how to go through a network traffic analysis, which also reveals the shortcomings that many visualization tools have.
  • When talking about dashboards, we will also look at the topic of the SOC dashboard.
  • And finally, the security visualization use-cases across the training have been increased in volume and detail. Have you tried using an inverse count treemap for IDS signature tuning?
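CSVKit's `csvsql` tool is what brings SQL to CSV files in the workshop. As a rough, standard-library-only illustration of the same idea (this is not CSVKit), the sketch below loads CSV text into an in-memory SQLite table and runs a query against it. The table name `log`, the column names, and the sample rows are hypothetical.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, sql):
    """Load CSV text into an in-memory SQLite table named 'log' and run one query."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row holds the column names
    conn = sqlite3.connect(":memory:")
    try:
        # Columns are declared without types; values are stored as the strings csv produced.
        columns = ", ".join(f'"{name}"' for name in header)
        conn.execute(f"CREATE TABLE log ({columns})")
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(f"INSERT INTO log VALUES ({placeholders})", reader)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Hypothetical firewall export; addresses come from documentation ranges.
sample = """src_ip,dst_port,action
192.0.2.10,22,block
192.0.2.10,23,block
198.51.100.7,80,pass
"""

blocked = query_csv(
    sample,
    "SELECT src_ip, COUNT(*) FROM log WHERE action = 'block' GROUP BY src_ip",
)
print(blocked)  # [('192.0.2.10', 2)]
```

Keeping everything in memory is fine for a quick look at a small export; for large files, a persistent database or a dedicated tool is the better fit.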


Day 1:

Log Analysis

  • Data Sources Discussion such as PCAP, Firewall, IDS, Threat Feeds, etc.
  • Data Analysis and Visualization Linux (DAVIX)
  • Log Data Processing (CSVKit, ...)

Log Management and SIEM

  • Log Management and SIEM Overview
  • LogStash (ELK Stack) and Moloch
  • Big Data - Hadoop, Spark, Elasticsearch, Hive, Impala

Day 2:

Visualization

  • Information Visualization History
  • Visualization Theory
  • Data Visualization Tools and Libraries (e.g., Mondrian, Gephi, AfterGlow, Graphiti)
  • Visualization Resources

Security Visualization Use-Cases

  • Perimeter Threat
  • Network Flow Analysis
  • Firewall Visualization
  • IDS/IPS Signature Analysis
  • Vulnerability Scans
  • Proxy Data
  • User Activity
  • Host-based Data Analysis
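The inverse count treemap for IDS signature tuning mentioned in the list of updates is not spelled out on this page, so the following is only a sketch of one plausible reading (an assumption): each signature gets a treemap area proportional to the inverse of how often it fired, so that rarely triggered signatures stand out instead of being drowned out by noisy ones. The signature names are made up.

```python
from collections import Counter


def inverse_count_shares(alerts):
    """Return each signature's share of the treemap area, proportional to 1/count."""
    counts = Counter(alerts)
    inverse = {sig: 1.0 / n for sig, n in counts.items()}
    total = sum(inverse.values())
    # Normalize so the shares add up to 1.0 (the whole treemap).
    return {sig: weight / total for sig, weight in inverse.items()}


# Hypothetical alert stream: one noisy signature, one rare one.
alerts = ["SCAN nmap"] * 4 + ["WEB-IIS cmd.exe access"]
shares = inverse_count_shares(alerts)
for sig, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{sig}: {share:.2f}")
```

The resulting shares would then serve as the size attribute in whatever treemap tool is at hand.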

Sample of Tools and Techniques

Tools to gather data:

  • argus, nfdump, nfsen, and silk to process traffic flows
  • snort, bro, suricata as intrusion detection systems
  • p0f, npad for passive network analysis
  • iptables, pf, pix as examples of firewalls
  • OSSEC, collectd, graphite for host data
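Flow tools such as nfdump and SiLK exist to do aggregations like "who talks to whom, and how much" at scale. To show just the basic idea, here is a sketch over flow records that have already been parsed into tuples. The tuple layout and the sample records are assumptions and do not match the native output format of any of the tools listed.

```python
from collections import defaultdict


def top_talkers(flows, n=3):
    """Sum bytes per (source, destination) pair and return the n largest pairs."""
    totals = defaultdict(int)
    for src, dst, _dport, nbytes in flows:
        totals[(src, dst)] += nbytes
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical flow records: (source, destination, destination port, bytes).
flows = [
    ("192.0.2.10", "198.51.100.5", 443, 1200),
    ("192.0.2.10", "198.51.100.5", 443, 800),
    ("192.0.2.11", "203.0.113.9", 53, 90),
]

for (src, dst), total in top_talkers(flows):
    print(f"{src} -> {dst}: {total} bytes")
```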

We are also using a number of visualization tools to analyze example data in the labs:

  • graphviz, tulip, cytoscape, and gephi
  • afterglow
  • treemap
  • mondrian, ggobi
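A core part of what AfterGlow does is turn CSV records into a graph description that Graphviz can lay out. The sketch below mimics that idea in a few lines. It is not AfterGlow and does none of its coloring or filtering, and the input format (one `source,target` pair per line) is an assumption.

```python
import csv
import io


def _quote(name):
    """Quote a node name for DOT, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def csv_to_dot(csv_text):
    """Convert two-column CSV (source,target) into a DOT digraph, dropping duplicate edges."""
    seen = set()
    lines = ["digraph events {"]
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        edge = (row[0].strip(), row[1].strip())
        if edge in seen:
            continue
        seen.add(edge)
        lines.append(f"  {_quote(edge[0])} -> {_quote(edge[1])};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical source/destination pairs from a firewall log.
pairs = """192.0.2.10,198.51.100.5
192.0.2.10,198.51.100.5
192.0.2.11,203.0.113.9
"""
print(csv_to_dot(pairs))
```

Saved to a file such as `graph.dot`, the output can be rendered with Graphviz, for example `dot -Tpng graph.dot -o graph.png`.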

Under the log management section, we are going to discuss:

  • rsyslog, syslog-ng, nxlog
  • logstash as part of the ELK stack, moloch
  • commercial log management and SIEM solutions
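rsyslog, syslog-ng, and nxlog all handle parsing themselves. To make the traditional BSD-style syslog format (RFC 3164) concrete, here is a minimal sketch that splits such a line into fields. It assumes the common `timestamp host program[pid]: message` layout; real-world lines often deviate from it, and the sample lines are made up.

```python
import re

# Traditional BSD-style line: "Mmm dd hh:mm:ss host program[pid]: message".
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def parse_syslog(line):
    """Return the fields of a BSD-style syslog line as a dict, or None if it does not match."""
    match = SYSLOG_RE.match(line)
    return match.groupdict() if match else None


# Hypothetical log lines.
print(parse_syslog("Oct 21 06:06:58 mail sendmail[1234]: message accepted"))
print(parse_syslog("Oct  1 06:07:11 mail named: zone loaded"))
print(parse_syslog("not a syslog line"))
```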

The section on big data is covering the following:

  • hadoop (HDFS, map-reduce, HBase, Hive, Impala, ZooKeeper)
  • search engines like elasticsearch and Solr
  • NoSQL data stores like MongoDB, Cassandra, etc.
  • OLAP and OLTP
  • The Spark ecosystem
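To make the map-reduce model from the hadoop bullet concrete, here is a single-process sketch of its three steps (map, shuffle, reduce) counting log lines per source IP. This is only the programming model in plain Python, not Hadoop's API; Hadoop distributes the same steps across a cluster. The log format and sample lines are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit (source_ip, 1); assumes the source IP is the first whitespace-separated field."""
    fields = line.split()
    if fields:
        yield fields[0], 1


def shuffle(pairs):
    """Group values by key, which the framework does between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key; here a simple sum."""
    return key, sum(values)


def map_reduce(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


# Hypothetical web log lines (the empty line is skipped by map_phase).
logs = [
    "192.0.2.10 GET /index.html",
    "198.51.100.5 GET /login",
    "192.0.2.10 POST /login",
    "",
]
print(map_reduce(logs))  # {'192.0.2.10': 2, '198.51.100.5': 1}
```

Because map and reduce only see one line or one key at a time, each step can run in parallel on different machines; that independence is what lets the model scale.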



Raffael Marty is one of the world's most recognized authorities on security data analytics and visualization. Raffy is the founder and CEO of pixlcloud, a next-generation visual analytics platform. With a track record at companies including IBM Research and ArcSight, he is thoroughly familiar with established practices and emerging trends in big data analytics. He has served as Chief Security Strategist with Splunk and was a co-founder of Loggly, a cloud-based log management solution. Author of Applied Security Visualization and a frequent speaker at academic and industry events, Raffy is a leading thinker and advocate of visualization for unlocking data insights. For more than 14 years, Raffy has worked in the security and log management space to help Fortune 500 companies defend themselves against sophisticated adversaries and has trained organizations around the world in the art of data visualization for security. Zen meditation has become an important part of Raffy's life, sometimes leading to insights not in data but in life.

How To Use Your Screens in the Security Operations Center

I just wrote a short blog post about how to get value out of the large screens in your SOC (security operations center). I have seen too many SOCs that have CNN running on the screens, and whenever customers or executives walk in, they quickly switch over to some kind of meaningless world map that looks kind of sexy but has no security-relevant purpose at all. From a security analyst's perspective, it is really not very useful to know where across the globe most of the packets hitting our network come from. All those sexy-looking attack maps really don't have much value. Well, they can be sexy and provoke conversations. But there are ways to get more out of your expensive screens. Read how:

Dashboards in the Security Operations Center (SOC)

DNS Behavior - Puzzle

I need your help!

I am looking through an old log file of a server with IP address that I operated in 2002. The machine was running SuSE Linux 6.0 (i386). It ran BIND (9.1.0) and sendmail (8.11.2), and was mainly used as an SMTP server to send mail for a number of users. I found these logs from the pf firewall that was in front of the box:

Oct 21 06:06:58.096785 rule 57/0(match): pass in on xl1: > 2520 [1au][|domain] (DF)
Oct 21 06:06:58.401472 rule 57/0(match): pass in on xl1: > 16979 [1au][|domain] (DF)
Oct 21 06:07:00.407500 rule 57/0(match): pass in on xl1: > 47817 [1au][|domain] (DF)
Oct 21 06:07:02.417637 rule 57/0(match): pass in on xl1: > 34849[|domain] (DF)
Oct 21 06:07:11.298946 rule 57/0(match): pass in on xl1: > 20792 [1au] MX? www.com.ar. (39) (DF)
Oct 21 06:07:11.477536 rule 57/0(match): pass in on xl1: > 21611 [1au] MX? www.com.ar. (39) (DF)
Oct 21 06:07:11.804894 rule 57/0(match): pass in on xl1: > 21263 [1au] MX? www.com.ar. (39) (DF)
Oct 21 06:15:19.667120 rule 57/0(match): pass in on xl1: > 60127 [1au] MX? sticksandstones.co.uk. (50) (DF)
Oct 21 06:15:19.691967 rule 57/0(match): pass in on xl1: > 58792 [1au] MX? sticksandstones.co.uk. (50) (DF)
Oct 21 06:20:00.844472 rule 57/0(match): pass in on xl1: > 29396 MX? about.com. (27) (DF)
Oct 21 06:20:00.859900 rule 57/0(match): pass in on xl1: > 14698[|domain] (DF)
Oct 21 06:20:01.021076 rule 57/0(match): pass in on xl1: > 13317 [1au] MX? about.com. (38) (DF)
Oct 21 06:20:01.070317 rule 57/0(match): pass in on xl1: > 14337 [1au] MX? mx13.crazed.com. (48) (DF)
Oct 21 06:21:02.121813 rule 57/0(match): pass in on xl1: > 34672 MX? poetic.com. (28) (DF)
Oct 21 06:21:02.297033 rule 57/0(match): pass in on xl1: > 25081 [1au] MX? poetic.com. (39) (DF)

As you can see, there are a number of DNS lookups. They span a total of about two weeks, and ALL of them use source port 1030. Why 1030? Why is it fixed all the time? Shouldn't the source port change?

There are other logs intermixed, where DNS lookups happen from other source ports:

Oct 13 20:46:03.915405 rule 184/0(match): pass in on xl1: > 60676+[|domain]

Those are normal, and I completely understand them. Any ideas why all the others have 1030 as their source port?
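To check how consistently a single source port shows up across two weeks of logs, a quick tally helps. This sketch assumes the full tcpdump-style `address.port > address.port` text is still present in the log lines (the addresses were removed from the excerpts above), so the sample lines use made-up documentation addresses and a made-up port 32771. It only counts ports; it makes no assumption about why they repeat.

```python
import re
from collections import Counter

# tcpdump-style endpoints: "a.b.c.d.port > e.f.g.h.port"
ENDPOINTS_RE = re.compile(
    r"(\d{1,3}(?:\.\d{1,3}){3})\.(\d+) > (\d{1,3}(?:\.\d{1,3}){3})\.(\d+)"
)


def source_port_counts(lines, dst_port=None):
    """Count source ports, optionally only for packets sent to dst_port."""
    counts = Counter()
    for line in lines:
        match = ENDPOINTS_RE.search(line)
        if match and (dst_port is None or match.group(4) == dst_port):
            counts[match.group(2)] += 1
    return counts


# Made-up lines with documentation addresses.
lines = [
    "pass in on xl1: 192.0.2.1.1030 > 192.0.2.53.53: 2520 [1au][|domain] (DF)",
    "pass in on xl1: 192.0.2.1.1030 > 192.0.2.53.53: 20792 [1au] MX? www.com.ar. (39) (DF)",
    "pass in on xl1: 198.51.100.9.32771 > 192.0.2.53.53: 60676+[|domain]",
]
print(source_port_counts(lines).most_common())  # [('1030', 2), ('32771', 1)]
```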

DAVIX 2014 - Released

Visual Analytics Workshop - Link Collection Part VII - Visualization Tools

NEWS UPDATE! Next Visual Analytics Workshop to be held at BlackHat US in August. Join!

The section probably most anticipated during the Visual Analytics Workshop is the one where we get hands-on exposure to a number of visualization tools. We look at both actual tools and programming libraries. Here we go:

These are the tools and libraries we discuss during the workshop. Obviously, there are many more libraries and tools that I like to use in my daily work. But that will be a separate post at some point in the future.

Looking for the previous list of links for the workshop?

- Introductory Links
- Data Sources
- Data Processing
- Log Management and SIEM
- Big Data
- Visualization

Wanna know more about the visualization workshop? Email me or visit http://pixlcloud.com/training

Visual Analytics Workshop - Link Collection Part VI - Visualization

Next up: Visualization, the sixth module of the Visual Analytics Workshop. Note that this section is mostly based on content from books rather than on Web-based resources that could be linked here. Hence, this is a short collection.

Looking for the previous list of links for the workshop?

- Introductory Links
- Data Sources
- Data Processing
- Log Management and SIEM
- Big Data

Next up: Visualization Tools

Workshop: Big Data Visualization for Security

I had the pleasure of attending the Underground Economy Conference this year in Bucharest, Romania, where I ran a 90-minute workshop on big data and visualization. The workshop covered a number of tools, such as:

Firewall Log in Gephi

Here are the slides from the workshop (well, almost all of them; if you attended the workshop, you saw a few more). In addition, you can download the DAVIX image that you need for the exercise.

Visual Analytics Workshop - Link Collection Part V - Big Data

This next module of the Visual Analytics Workshop is about Big Data, and here are the links that show up during this section. Keep in mind that this module in particular is constantly evolving and has done so over the last few months. New sections and links are added to the training class frequently.

Looking for the previous list of links for the workshop?

- Introductory Links
- Data Sources
- Data Processing
- Log Management and SIEM
