This is my third try at graphing SSHd logs from honeynet.org's Challenge 5. I'm in the process of switching from Perl to Python, so I used Python this time along with Chart Director. However, this is a blatant knock off of Nathan Yau's much better chart: http://flowingdata.com/2011/06/13/largest-data-breaches-of-all-time/. I was just curious to see if I could recreate it with Python and Chart Director using different data.
If you're not familiar with SSHd logs:
"F" stands for "Failed" meaning the wrong password was tried.
"I" for "Invalid" meaning the wrong username was tried.
"A" for "Accepted" meaning the login attempt succeeded.
The numbers show how many "F", "I" or "A" were caused by the IP.
This is my second try at graphing SSHd logs from honeynet.org's Challenge 5. Perl and Chart Director were used to make this chart. The chart has a lot less "chart junk," and is much easier to understand the percentages than my first attempt. Red means there was at least one successful login, while blue means all login attempts failed.
This weekend I spend my time at Data Insight SF. It was a competition where teams were given a data set to visualize. The outcomes were pretty impressive (I might post pictures of the results later). While the teams were working on their projects, various people taught workshops. I was one of them and I talked about the Visualization Lifecycle.
Join us for data in sight: making the transparent visual, a hands-on data visualization competition held June 24-26 at Adobe Systems, Inc.’s office in San Francisco. Coders, programmers, developers, designers, scientists – anyone who believes that data is divine and has ideas for bringing it to life – are invited to join in the fun.
The program begins Friday evening with a session introducing the data sets and tools and a chance to form teams. Saturday kicks off with inspirational talks by data visualization experts from the Netherlands and Switzerland — Dutch graphic designers from Catalogtree and LUST and Switzerland-based interaction designers from Interactive Things. Then it’s down to business, as you roll up your sleeves and get hacking on a data visualization of your own.
Awards will be presented at the end of the weekend for winning projects in the following categories: best dynamic presentation, best fusion of multiple data sets, most actionable, most aesthetically pleasing, most creative, and the ever popular People's Choice award! (Bonus points for the best use of Swiss or Dutch data.)
More details online at www.datainsightsf.com
Plotted exit nodes on maps. Full details on page of Tor Exit Nodes Visualized. Links to images on page are dynamic and updated daily. Uses Google Dynamic Map chart visualization tool.
Mid March I taught a Log Analysis and Visualization class in Taipei, Taiwan. I had a total of about 35 students spread over two classes, each of them lasting for two days.
The first part of the workshop focused on the application and use of log analysis with a number of tools. We looked at Splunk with topics like advanced searches, lookups, and alerting. We then looked at Loggly and learned how to use the logging service to analyze logs and build mashups against it.
The remainder of the workshop explored the world of data analysis and visualization. Using today's state-of-the-art data analysis and visualization techniques, we looked at how we can gain a far deeper understanding of what's happening in our networks. How can visualization techniques be applied to understand packet captures or network flows instead of just producing pretty pictures? We explored how to uncover hidden patterns of data, identify emerging vulnerabilities and attacks, and respond decisively with countermeasures that are far more likely to succeed than conventional methods. As part of the workshop we looked at the insider threat problem and had a brief look at how host-centric (as opposed to network centric) analysis can help completing the picture of an incident.
The entire workshop is based on open source tools, such as AfterGlow or Treemap. The attendees got an overview of log aggregation, log management, visualization, data sources for IT security, and learned how to generate visual representations of log data. The workshop was accompanied by hands-on exercises utilizing Splunk, Loggly, and the DAVIX live CD.
The following is the agenda of the entire two days:
Data analysis relies on data. This section discusses a variety of data sources relevant to computer security. I show what type of data the various devices generate, show how to parse the data, and then discuss some of the problems associated with each of the data sources.
DAVIX is a Linux distribution that is used to analyze log data. This class is using the latest version that also has Splunk installed to provide an environment for the students to work on the exercises.
This section is giving an introduction to log management concepts, such as aggregation, parsing, connectors and agents, log archiving, and correlation. The logging landscape has drastically changed in the last years. We will see where things are at, how the cloud has changed log management, and what tools are being used nowadays. This will cover not only some of the commercial tools, such as Loggly, but also show a number of open source log management tools, such as Snare, syslog-ng, and rsyslog.
In order to make log data actionable, the data has to be manipulated and transformed into a form that can be processed by analysis tools. I will be showing a variety of methods (e.g., regular expressions, UNIX commands) to process logs.
This section on Splunk is going to give an introduction to the Splunk log analysis capabilities with an overview of different data processing methods, such as input configurations, field extractions, the use of event types, and application of tagging for event data.
Once Splunk is setup to receive data and it processes the data correctly, we can start to analyze the data. This section is going into the topics of running statistics on the data, summary indexing, trend reporting, using regular expressions for searching, etc.
This section introduces some basic visualization concepts and graph design principles that help generate visually effective graphs. It also gives an overview of graphs like treemaps, link graphs, or parallel coordinates, and how they can be used to visualize data.
After a short introduction to different data formats used by visualization tools, this section then discusses visualization tools and libraries. The Data Visualization and Analysis UNIX (DAVIX) distribution will be used to show most of the visualization tools. I will show how simple it is to generate visual representations of log data and how tools can be leveraged to visualize information. The theory is then backed by a number of exercises that allow the students to deepen the understanding of the tools and methods.
This section is a collection of use-cases. It starts out with a discussion of use-cases involving traffic-flow analysis. Everything from detecting worms to isolating denial-of-service attacks and monitoring traffic-based policies is covered. The use-cases are then extended to firewall logs, where a large firewall ruleset is analyzed first. In a second part, firewall logs are used to assess the ruleset to find potential misconfigurations or security holes. Intrusion detection signature tuning is the next two use-case. The remainder of the section looks at application layer data. Email server logs are analyzed to find open relays and identify email-based attacks. The section closes with a discussion of visualizing vulnerability scan data.
A topic often forgotten in security data analysis is the treatment of host-based logs. There is a great amount of information that can be collected directly on end systems. This information can be invaluable in forensic investigations. This section explores what these data sources are and how they can be used. In addition, this section will show how this data can be cross-correlated with network-based data.
Packet captures are a very common and rich data source to identify attacks and understand the details of attacks. This section is going to explore how Splunk can be used to analyze packet captures effectively.
This is some software called Gibson that I wrote in python using the Panda3D game engine. It currently takes input from intrusion detection systems and displays their interactions with nodes in your network as it receives them. In addition to 3 axes, it uses direction, color, time, etc. to visually organize the data. I'm working on improving the interface and expanding the types of data it will map. Very much in alpha phase of development, but I'd love feedback! Watch the video, it shows it better than a static picture.
The traceroute program has been a established tool for networking and network trouble shooting. It helps determine the path that packets take as they travel over the Internet, but is limited that it can only give you a linear picture, to a single IP address. With the polytraceroute.sh bash script and the afterglow visualization program, you can get a better idea of how your local computers are connected to each other. It uses a combination of traceroute or tcptraceroute, and arping to gather the information needed to create the plot. It uses arping to find all computers in your local subnet, and then picks random IP addresses to find the possible paths available out onto the Internet. In the picture, my IP is 192.168.0.100.
Through our support of the Honeynet Project, we recently attempted a new approach to visualizing attacks on their VOIP honeypots.
With the increase in popularity of VOIP telephony, attacks are becoming more prevalent. The compromise of a VOIP system can cost the victim over $100,000 in real cash. For example, an Australian based company suffered $120,000 in toll fraud as a result of a VOIP compromise.
The video is intended to be a high level (if not stylized) visualization of the early stages of a cyber criminal compromising a VOIP system.
I used Perl and Chart Director to make this trellis chart. The data are from SSHd logs obtained from Challenge 5 at honeynet.org that has been over for a while now.
The charts are sorted by the total number of bad logins. If there was at least one accepted login and the total number of bad logins were greater than the accepted logins, the attack succeeded and the bars are red. Blue means the attack failed. As you may be able to tell, the Invalid, Failed, and Accepted percentages were calculated by the amount_per_IP / amount_per_all_IPs * 100.