Using a dataset from http://www.uvic.ca/engineering/ece/isot/datasets/index.php, this graph shows botnet traffic between 5,000 computers at the University of Victoria. Different colors indicate different protocols. Nodes represent computers and are sized by degree. Edges represent packets, weighted by packet size. Image generated using KeyLines.
Visual analytics, and especially the exploration of data, requires a scalable and flexible data backend. It is not uncommon that gigabytes, maybe even terabytes, of data need to be queried for a specific analytics task. Furthermore, the more context that is available around log data, the more expressive the data becomes and the deeper the insights that can be discovered in it. How can we gather all that context and combine it with both network-based and host-based data? What are the data access requirements? How can we run data mining algorithms, such as clustering, across all of the data? What kind of data store do we need for that? Do we need a search engine as a backend? Or a columnar data store?
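To illustrate the kind of processing such a backend needs to support, here is a minimal clustering sketch in plain Python. All the feature values (connections and bytes per source IP) are made up for illustration; a real deployment would extract features from the data store and run a distributed implementation rather than this toy k-means:

```python
import random

# Hypothetical log-derived features per source IP: (connections, total bytes).
# Two of the sources are deliberately "heavy talkers."
records = {
    "10.0.0.1": (5, 1_200),
    "10.0.0.2": (7, 1_500),
    "10.0.0.3": (6, 1_100),
    "10.0.0.4": (480, 95_000),
    "10.0.0.5": (510, 98_000),
}

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on 2-D points; fine for a sketch, not for terabytes."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            groups[i].append(p)
        # Recompute centers as the mean of each group.
        centers = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
            if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, groups

centers, groups = kmeans(list(records.values()), k=2)
print(sorted(len(g) for g in groups))  # → [2, 3]
```

The heavy talkers separate from the quiet sources, which is exactly the kind of grouping that surfaces interesting hosts in a large dataset.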
I recently wrote a paper on the topic of a security data lake: a concept for a data backend that enables a variety of processing and access use-cases. A short introduction to the topic is available as well.
Maybe at a later point in time, I will address the topic of data science: the techniques and workflows needed to make all that big data actionable. How do you take a terabyte of data and find actual insights? Just dropping the data into a network graph visualization is not going to help; you need a bit more than that to make it happen. But again, more on this later.
If you want to learn more about how to visualize and analyze terabytes of data, attend the Visual Analytics Workshop at BlackHat 2015 in Las Vegas.
Again, here is where you download the paper.
This is a screenshot from a tool called Mondrian showing network traffic, DNS traffic in particular. The bar charts show the breakdown of sources, destinations, and ports. The parallel coordinates display all three variables at the same time. The red parts highlight an interesting visual pattern. What is it?
I just wrote a short blog post about how to get real value out of your large SOC (security operations center) screens. I have seen too many SOCs that have CNN running on their screens, and whenever customers or executives walk in, they quickly switch over to some kind of meaningless world map that looks sexy but has no security-relevant purpose at all. From a security analyst's perspective, it is really not very useful to know from where across the globe most of the network packets are hitting our network. All those sexy-looking attack maps really don't have much value. Well, they can be sexy and provoke conversations, but there are ways to get more out of your expensive screens. Read how:
I need your help!
I am looking through an old log file of a server with IP address 184.108.40.206 that I operated in 2002. The machine was running SuSE Linux 6.0 (i386). It ran BIND (9.1.0) and sendmail (8.11.2), and was mainly used as an SMTP server to send mail for a number of users. I found these logs from my pf firewall that sat in front of the box:
Oct 21 06:06:58.096785 rule 57/0(match): pass in on xl1: 220.127.116.11.1030 > 18.104.22.168.53: 2520 [1au][|domain] (DF)
Oct 21 06:06:58.401472 rule 57/0(match): pass in on xl1: 22.214.171.124.1030 > 126.96.36.199.53: 16979 [1au][|domain] (DF)
Oct 21 06:07:00.407500 rule 57/0(match): pass in on xl1: 188.8.131.52.1030 > 184.108.40.206.53: 47817 [1au][|domain] (DF)
Oct 21 06:07:02.417637 rule 57/0(match): pass in on xl1: 220.127.116.11.1030 > 18.104.22.168.53: 34849[|domain] (DF)
Oct 21 06:07:11.298946 rule 57/0(match): pass in on xl1: 22.214.171.124.1030 > 126.96.36.199.53: 20792 [1au] MX? www.com.ar. (39) (DF)
Oct 21 06:07:11.477536 rule 57/0(match): pass in on xl1: 188.8.131.52.1030 > 184.108.40.206.53: 21611 [1au] MX? www.com.ar. (39) (DF)
Oct 21 06:07:11.804894 rule 57/0(match): pass in on xl1: 220.127.116.11.1030 > 18.104.22.168.53: 21263 [1au] MX? www.com.ar. (39) (DF)
Oct 21 06:15:19.667120 rule 57/0(match): pass in on xl1: 22.214.171.124.1030 > 126.96.36.199.53: 60127 [1au] MX? sticksandstones.co.uk. (50) (DF)
Oct 21 06:15:19.691967 rule 57/0(match): pass in on xl1: 188.8.131.52.1030 > 184.108.40.206.53: 58792 [1au] MX? sticksandstones.co.uk. (50) (DF)
Oct 21 06:20:00.844472 rule 57/0(match): pass in on xl1: 220.127.116.11.1030 > 18.104.22.168.53: 29396 MX? about.com. (27) (DF)
Oct 21 06:20:00.859900 rule 57/0(match): pass in on xl1: 22.214.171.124.1030 > 126.96.36.199.53: 14698[|domain] (DF)
Oct 21 06:20:01.021076 rule 57/0(match): pass in on xl1: 188.8.131.52.1030 > 184.108.40.206.53: 13317 [1au] MX? about.com. (38) (DF)
Oct 21 06:20:01.070317 rule 57/0(match): pass in on xl1: 220.127.116.11.1030 > 18.104.22.168.53: 14337 [1au] MX? mx13.crazed.com. (48) (DF)
Oct 21 06:21:02.121813 rule 57/0(match): pass in on xl1: 22.214.171.124.1030 > 126.96.36.199.53: 34672 MX? poetic.com. (28) (DF)
Oct 21 06:21:02.297033 rule 57/0(match): pass in on xl1: 188.8.131.52.1030 > 184.108.40.206.53: 25081 [1au] MX? poetic.com. (39) (DF)
As you can see, there are a number of DNS lookups. They span a total of about two weeks, and ALL of them use a source port of 1030. Why 1030? Why is it fixed the whole time? Shouldn't the source port change?
There are other log entries intermixed, where DNS lookups happen from other source ports:
Oct 13 20:46:03.915405 rule 184/0(match): pass in on xl1: 220.127.116.11.63994 > 18.104.22.168.53: 60676+[|domain]
Those are normal, and I completely understand them. Any ideas why all the others have 1030 as a source port?
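To check how strongly the fixed port dominates, one way is to tally the source ports across the whole log. A minimal sketch, with placeholder IPs and a regex that assumes the tcpdump-style `src.port > dst.53:` format shown in the entries above:

```python
import re
from collections import Counter

# Hypothetical sample lines in the pf/tcpdump log format shown above
log_lines = [
    "Oct 21 06:07:11.298946 rule 57/0(match): pass in on xl1: "
    "10.0.0.2.1030 > 10.0.0.1.53: 20792 [1au] MX? www.com.ar. (39) (DF)",
    "Oct 21 06:15:19.667120 rule 57/0(match): pass in on xl1: "
    "10.0.0.2.1030 > 10.0.0.1.53: 60127 [1au] MX? sticksandstones.co.uk. (50) (DF)",
    "Oct 13 20:46:03.915405 rule 184/0(match): pass in on xl1: "
    "10.0.0.3.63994 > 10.0.0.1.53: 60676+[|domain]",
]

# The source port is the last dot-separated field of the source address;
# we only look at traffic destined for port 53 (DNS).
pattern = re.compile(r"(\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.53:")

src_ports = Counter()
for line in log_lines:
    m = pattern.search(line)
    if m:
        src_ports[int(m.group(2))] += 1

print(src_ports.most_common())  # → [(1030, 2), (63994, 1)]
```

Running something like this over the full two weeks of logs would make the port distribution, and any outliers, immediately visible.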
After six years, we finally have a new version of DAVIX available. It was about time to bring this distro up to modern standards: Ubuntu Server 13.10 as the base OS, GitHub scripts to update your own systems, a virtual image you can download, new tools, etc.
DAVIX, a live CD for data analysis and visualization, brings the most important free tools for data processing and visualization to your desk. Avoid the hassle of installing an operating system or struggling to build and compile the necessary tools to get started with visualization. You can completely dedicate your time to data analysis.
Now go download it!
This is the last post in the series of links for the Visual Analytics Workshop. This section lists a few resources on security and visualization.
Looking for the rest of links for the workshop?
The section most anticipated during the Visual Analytics Workshop is probably the one where we get hands-on exposure to a number of visualization tools. We look at both actual tools and programming libraries. Here we go:
These are the tools and libraries we discuss during the workshop. Obviously, there are many more libraries and tools that I like to use in my daily work, but those will be a separate post at some point in the future.
Looking for the previous list of links for the workshop?