Discussion Entries

A Study on Data Visualization and the Security Professional

“A picture is worth a thousand words.” We have all heard that proverb before, and as security professionals, we see examples of this through the charts, graphs and maps we use on a daily basis. RSA, the Security Division of EMC, is currently conducting a study to gain perspective on how security professionals interact with data visualizations. We invite you to participate in our survey.

A student's final-year project

Hi

I am a final-year undergraduate student studying computing and information security. I am about to embark on my final-year project, for which I have decided to try to create a network monitoring tool that visualizes logs of network traffic. I would like to point out that this is still a very young project, but I would like to ask for any pointers or advice.

I am currently thinking that I will build a piece of software, in something like Perl, that calls and runs tcpdump on a server on the network. This will then log all entries to a database, or I can read from pcap files into a script, which will help display the data on a web interface hosted on the network.

I have a lot to learn, but some pointers in the correct direction would be great... for example, is Perl best to use? Should I dump everything into a database, or will it fall over if I chuck loads of data at it?
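For illustration, here is a minimal Python sketch of the pipeline I have in mind: tcpdump's line output piped into a script that stores rows in SQLite. The table layout and parsing are illustrative only:

    #!/usr/bin/env python
    # Minimal sketch: pipe tcpdump output into SQLite for later display.
    # Usage (illustrative): tcpdump -l -n | python ingest.py
    import sqlite3
    import sys

    conn = sqlite3.connect("traffic.db")
    conn.execute("CREATE TABLE IF NOT EXISTS packets (ts TEXT, line TEXT)")

    for line in sys.stdin:
        line = line.rstrip("\n")
        if not line:
            continue
        ts = line.split(" ", 1)[0]  # tcpdump prints the timestamp first
        conn.execute("INSERT INTO packets VALUES (?, ?)", (ts, line))

    conn.commit()

A web interface could then query the packets table; whether a database "falls over" depends mostly on indexing and on how much of the raw traffic is kept.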

Your help is much appreciated.

Log Templater (Artificial Ignorance Utility)

During the last security incident that I worked on, I needed to grind through 20 GB of log files looking for any odd log lines that would indicate the point where the bad guys got in. If I had done it manually, I would still be looking at log data. Instead, I built a tool that converts logs into pattern templates and looks for templates that have never been seen before. This allowed me to zero in on just a few hundred log lines out of all the data.

Templater is a small and fast log processor that provides simple artificial-ignorance capabilities. You use the tool to process past log data and store templates that represent normal log line structures. You then run the tool against current or target logs, and all normal patterns are automatically ignored. The parser is fast and capable of processing millions of lines per minute; for web logs and firewall logs, I average 9M lines per minute on a 2 GHz x86 machine running *NIX. The template strategy was originally proposed by a friend of mine in 2003, who later built a tool called Never Before Seen (NBS), which also provides artificial ignorance for arbitrary text data as well as text structures.
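Templater's exact algorithm isn't shown here, but the core idea of template-based artificial ignorance can be sketched in a few lines of Python: mask tokens that look variable, remember the resulting structures from a baseline, and report only lines whose structure is new. A minimal sketch; the masking rules and file names are illustrative, not Templater's:

    import re
    import sys

    def template(line):
        """Reduce a log line to a structural template by masking
        tokens that look variable (IP addresses, numbers, hex)."""
        out = []
        for tok in line.split():
            if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}(:\d+)?", tok):
                out.append("%IP%")
            elif re.fullmatch(r"[0-9a-fA-Fx]+", tok) and any(c.isdigit() for c in tok):
                out.append("%NUM%")
            else:
                out.append(tok)
        return " ".join(out)

    # Build the set of known-normal templates from historical logs ...
    with open("baseline.log") as f:
        known = {template(line) for line in f}

    # ... then print only lines whose structure was never seen before.
    for line in sys.stdin:
        if template(line.rstrip("\n")) not in known:
            print(line, end="")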

New Graph Widget

I am optimizing and testing a new QtWidget I have created for force-directed graph layouts. Currently it only utilizes the CPU, but I am working on a CUDA implementation as well. When released, the widget will allow developers to place force-directed graph layouts in their Qt applications. Additionally, this will be included in a new version of INAV, hopefully out soon. More images on my Flickr page:
http://www.flickr.com/photos/scap1784/sets/72157627651310826/with/6168147922/
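The widget's source isn't included here, but the core of a force-directed layout is compact enough to sketch. Below is a minimal Python sketch in the spirit of the Fruchterman-Reingold algorithm; the constants and cooling schedule are illustrative, not the widget's actual code:

    import math
    import random

    def force_layout(nodes, edges, iterations=200, width=800.0, height=600.0):
        """Toy Fruchterman-Reingold layout. nodes: list of ids,
        edges: list of (a, b) pairs. Returns {node: (x, y)}."""
        pos = {n: (random.uniform(0, width), random.uniform(0, height)) for n in nodes}
        k = math.sqrt(width * height / len(nodes))  # ideal edge length
        temp = width / 10.0  # maximum displacement, cooled each iteration
        for _ in range(iterations):
            disp = {n: [0.0, 0.0] for n in nodes}
            # Every pair of nodes repels each other.
            for i, a in enumerate(nodes):
                for b in nodes[i + 1:]:
                    dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                    d = math.hypot(dx, dy) or 0.01
                    f = k * k / d
                    disp[a][0] += dx / d * f; disp[a][1] += dy / d * f
                    disp[b][0] -= dx / d * f; disp[b][1] -= dy / d * f
            # Connected nodes attract each other.
            for a, b in edges:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 0.01
                f = d * d / k
                disp[a][0] -= dx / d * f; disp[a][1] -= dy / d * f
                disp[b][0] += dx / d * f; disp[b][1] += dy / d * f
            # Move each node, capped by the current temperature.
            for n in nodes:
                dx, dy = disp[n]
                d = math.hypot(dx, dy) or 0.01
                x = pos[n][0] + dx / d * min(d, temp)
                y = pos[n][1] + dy / d * min(d, temp)
                pos[n] = (min(width, max(0.0, x)), min(height, max(0.0, y)))
            temp *= 0.95  # cool down so the layout settles
        return pos

    # Example: lay out a small star graph.
    print(force_layout(["a", "b", "c", "d"], [("a", "b"), ("a", "c"), ("a", "d")]))

A CUDA version parallelizes the all-pairs repulsion step, which dominates the O(n^2) cost.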

Web based 3D malware visualization

I created two visualizations under the scope of the GSoC 2011 Honeynet Project. The main aim was to create a 3D mesh with heat-map tiles on a world map. The first visualization is a quick proof of concept; the second one was created using the Processing visualization framework.

Please check the documentation site for detailed information and leave me your feedback. The site also includes links to demo sites.

Quickparser v0.5 released

After re-implementing some parser code in difftree with bug fixes and improved bounds checking, I have ported those changes back to quickparser. So, I now present the new and improved quickparser v0.5, with fewer bugs and more bounds checking. Re-energized after touching the code, I am working on the template-based version, which will not be limited to syslog-style, key=value formatted logs.

The Visualization Lifecycle

This weekend I spent my time at Data In Sight in San Francisco. It was a competition where teams were given a data set to visualize. The outcomes were pretty impressive (I might post pictures of the results later). While the teams were working on their projects, various people taught workshops. I was one of them, and I talked about the Visualization Lifecycle.


(Embedded slides: The Visualization Lifecycle, by Raffael Marty.)

A Hands-On Data Visualization Competition - June 24-26 in San Francisco!

Join us for data in sight: making the transparent visual, a hands-on data visualization competition held June 24-26 at Adobe Systems, Inc.’s office in San Francisco. Coders, programmers, developers, designers, scientists – anyone who believes that data is divine and has ideas for bringing it to life – are invited to join in the fun.

The program begins Friday evening with a session introducing the data sets and tools and a chance to form teams. Saturday kicks off with inspirational talks by data visualization experts from the Netherlands and Switzerland — Dutch graphic designers from Catalogtree and LUST and Switzerland-based interaction designers from Interactive Things. Then it’s down to business, as you roll up your sleeves and get hacking on a data visualization of your own.

Awards will be presented at the end of the weekend for winning projects in the following categories: best dynamic presentation, best fusion of multiple data sets, most actionable, most aesthetically pleasing, most creative, and the ever popular People's Choice award! (Bonus points for the best use of Swiss or Dutch data.)


REGISTER NOW

More details online at www.datainsightsf.com

Log Analysis and Visualization Training - Taipei

In mid-March I taught a Log Analysis and Visualization class in Taipei, Taiwan. I had a total of about 35 students spread over two classes, each lasting two days.

The first part of the workshop focused on the application and use of log analysis with a number of tools. We looked at Splunk, with topics like advanced searches, lookups, and alerting. We then looked at Loggly and learned how to use the logging service to analyze logs and build mashups on top of it.

The remainder of the workshop explored the world of data analysis and visualization. Using today's state-of-the-art data analysis and visualization techniques, we looked at how we can gain a far deeper understanding of what's happening in our networks. How can visualization techniques be applied to understand packet captures or network flows, instead of just producing pretty pictures? We explored how to uncover hidden patterns in data, identify emerging vulnerabilities and attacks, and respond decisively with countermeasures that are far more likely to succeed than conventional methods. As part of the workshop we looked at the insider threat problem and had a brief look at how host-centric (as opposed to network-centric) analysis can help complete the picture of an incident.

The entire workshop is based on open source tools, such as AfterGlow or Treemap. The attendees got an overview of log aggregation, log management, visualization, data sources for IT security, and learned how to generate visual representations of log data. The workshop was accompanied by hands-on exercises utilizing Splunk, Loggly, and the DAVIX live CD.

The following is the agenda of the entire two days:

Day 1 - Log Analysis

  • Section 1: Data Sources
  • Data analysis relies on data. This section discusses a variety of data sources relevant to computer security. I show what type of data the various devices generate, how to parse the data, and then discuss some of the problems associated with each of the data sources.

  • Section 2: Data Analysis and Visualization Linux (DAVIX)
  • DAVIX is a Linux distribution that is used to analyze log data. This class uses the latest version, which also has Splunk installed, to provide an environment for the students to work on the exercises.

  • Section 3: Log Management and Analysis
  • This section gives an introduction to log management concepts, such as aggregation, parsing, connectors and agents, log archiving, and correlation. The logging landscape has changed drastically in recent years. We will see where things stand, how the cloud has changed log management, and what tools are being used nowadays. This covers not only some of the commercial offerings, such as Loggly, but also a number of open source log management tools, such as Snare, syslog-ng, and rsyslog.

  • Section 4: Log Data Processing
  • In order to make log data actionable, the data has to be manipulated and transformed into a form that can be processed by analysis tools. I will show a variety of methods (e.g., regular expressions, UNIX commands) to process logs; a minimal parsing sketch follows at the end of this post.

  • Section 5: Splunk Introduction
  • This section gives an introduction to Splunk's log analysis capabilities, with an overview of different data processing methods, such as input configurations, field extractions, the use of event types, and the application of tagging to event data.

  • Section 6: Splunk Data Analysis
  • Once Splunk is set up to receive data and processes it correctly, we can start to analyze the data. This section goes into running statistics on the data, summary indexing, trend reporting, using regular expressions for searching, etc.

Day 2 - Visualization

  • Section 1: Visualization
  • This section introduces some basic visualization concepts and graph design principles that help generate visually effective graphs. It also gives an overview of graphs like treemaps, link graphs, or parallel coordinates, and how they can be used to visualize data.

  • Section 2: Data Visualization Tools
  • After a short introduction to different data formats used by visualization tools, this section discusses visualization tools and libraries. The Data Analysis and Visualization Linux (DAVIX) distribution will be used to show most of the visualization tools. I will show how simple it is to generate visual representations of log data and how tools can be leveraged to visualize information. The theory is then backed by a number of exercises that allow the students to deepen their understanding of the tools and methods.

  • Section 3: Perimeter Threat
  • This section is a collection of use-cases. It starts out with a discussion of use-cases involving traffic-flow analysis. Everything from detecting worms to isolating denial-of-service attacks and monitoring traffic-based policies is covered. The use-cases are then extended to firewall logs, where a large firewall ruleset is analyzed first; in a second part, firewall logs are used to assess the ruleset to find potential misconfigurations or security holes. Intrusion detection signature tuning is the next use-case. The remainder of the section looks at application-layer data. Email server logs are analyzed to find open relays and identify email-based attacks. The section closes with a discussion of visualizing vulnerability scan data.

  • Section 4: Host-based Security Analysis with Splunk
  • A topic often forgotten in security data analysis is the treatment of host-based logs. There is a great amount of information that can be collected directly on end systems. This information can be invaluable in forensic investigations. This section explores what these data sources are and how they can be used. In addition, this section will show how this data can be cross-correlated with network-based data.

  • Section 5: Packet Capture Analysis with Splunk
  • Packet captures are a very common and rich data source for identifying attacks and understanding their details. This section explores how Splunk can be used to analyze packet captures effectively.

Currently there are no future workshops planned, but contact me if you are interested.
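As promised in Day 1, Section 4, here is a minimal Python sketch of the kind of log processing the class covers: extracting fields from an Apache-style access log line with a regular expression. The regex and field names are illustrative, not the actual course material:

    import re

    # Roughly matches Apache "combined" access-log lines; illustrative only.
    ACCESS_RE = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
        r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
    )

    line = '10.0.0.5 - - [15/Mar/2011:10:12:01 +0800] "GET /index.html HTTP/1.1" 200 1234'
    m = ACCESS_RE.match(line)
    if m:
        f = m.groupdict()
        # Emit CSV that downstream tools (e.g., AfterGlow) can consume.
        print(",".join([f["ip"], f["path"], f["status"]]))

The same extraction can often be done with UNIX commands alone, e.g. awk '{print $1}' access.log | sort | uniq -c to count requests per client IP.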

3D Modeling of Real Time Security Events

This is some software called Gibson that I wrote in Python using the Panda3D game engine. It currently takes input from intrusion detection systems and displays their interactions with nodes in your network as it receives them. In addition to the three axes, it uses direction, color, time, etc. to visually organize the data. I'm working on improving the interface and expanding the types of data it will map. It's very much in the alpha phase of development, but I'd love feedback! Watch the video; it shows the tool better than a static picture.
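Gibson's code isn't shown here, but the mapping idea, translating event attributes into spatial and visual dimensions, can be sketched independently of Panda3D. A minimal Python sketch; the attribute-to-axis assignments and color scale are illustrative, not Gibson's actual scheme:

    import hashlib

    def place_event(event):
        """Map an IDS event to a 3D position and an RGB color: hashed
        endpoint identities give stable x/y coordinates, the destination
        port gives z, and severity drives the color."""
        def axis(value, scale=100.0):
            h = int(hashlib.md5(value.encode()).hexdigest(), 16)
            return (h % 10000) / 10000.0 * scale

        x = axis(event["src_ip"])                # source identity -> x
        y = axis(event["dst_ip"])                # destination identity -> y
        z = event["dst_port"] / 65535.0 * 100.0  # port number -> z
        sev = (event["severity"] - 1) / 2.0      # severity 1..3 -> 0..1
        color = (1.0, sev, 0.0)                  # high severity = red
        return (x, y, z), color

    print(place_event({"src_ip": "10.0.0.5", "dst_ip": "192.168.1.9",
                       "dst_port": 445, "severity": 1}))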

need help in 3d treemap

Hi all,

I'm a student and I'm doing a project on visualization. Could you tell me whether it's possible to do a 3D treemap using DAVIX? Hope to hear from you soon. Thank you.

Regards,
Adeline

VizSec 2011 Call for Papers Released

This year's VizSec Symposium will be held at Carnegie Mellon University, Pittsburgh, PA, USA on 20 July 2011. VizSec brings together researchers and practitioners from academia, government, and industry to share insights and present solutions to modern cyber security challenges using visualization techniques. Technical papers, speakers, and presentations will be featured in this year's program. If you are conducting research into security visualization, please consider submitting a research paper (due by 1 April 2011) or a panel proposal (due by 15 April 2011).

Security Visualization - State of 2010 and 2011 Predictions

At the recent SANS Incident Response and Log Management Summit, I was part of a panel on security visualization. As an introduction, I presented the attached slides on security visualization trends and where we are today.
I looked at four areas for security visualization: Data, Cloud, Tools, and Security. I started by looking at the log maturity scale that I developed a while ago. Barely any of the companies present could place themselves to the right of the correlation point. It's sad, but probably everyone expected it. We have a long way to go with log analysis!

Data

It's very simple. If you don't have the data, you cannot visualize it. A lot of companies are still struggling to collect the necessary data. In some cases, the data is not even available because applications do not generate it. This is where data analysis or security people have to start voicing their needs to the application owners and developers in order to generate the data that they need. In addition, developers and security people have to communicate more to learn from each other. Ideally, it is not even the security folks that visualize and analyze the application logs, but it is the application people. Just a thought!
What we will see next year is that the Big Data movement is going to enable us to crunch more and bigger data sets. Hopefully 2011 will also give us an interoperability standard that is going to ease log analysis.

Cloud

What does the cloud have to do with security visualization? Well, it has to do with processing power and with application development. Applications generate logs and logs are used for security visualization. Cloud services are new pieces of software that are being developed. We have a chance here to build visibility into those applications, meaning we have an opportunity to educate these developers to apply logging in the right way.
Next year we will see a lot of companies rolling their own log analysis systems based on big data technology, such as Hadoop. We have seen a number of companies doing this already in 2010: Facebook, LinkedIn, Netflix, Zynga, etc. Traditional log management solutions just don't scale to these companies' needs. This will continue next year.

Tools

By tools I mean security visualization tools. We are absolutely nowhere with this. There are a couple of simple tools out there, but there is no tool that really does what we need: brushing, linked views, support for large data sets, ease of use, contextualization, etc.
Next year won't really change anything in this area. What we will see is that more and more tools are built on the Web. The cloud movement is partly responsible for this push, but so is the broad adoption of HTML5 with all of its goodness (e.g., WebSockets, Canvas). We will see advances in the social space with regard to visualization tools, and security will continue utilizing those tools to analyze security data. It's not ideal, because these tools are not meant for this, but hey, better than nothing! Maybe this will help create awareness and surface some interesting use-cases for security visualization.

Security

What will we see in security visualization? Well, as we saw earlier, we don't have the data. What that means is that we haven't really had a chance to learn how to visualize that data. And because we didn't have that chance, we don't really understand our data. Read that again. I think this is an important point!
Next year will give us more bad security visualization examples, and I am lumping product displays into this. Have you looked at your tool lately? During the SANS summit, I had a chance to look at some of the vendors' dashboards. They are horrible: 3D charts, no legends, bad choices of colors, non-actionable dashboards, etc. Note to log management vendors: I offer a security visualization class; you might want to consider taking it! But back on topic. Visualization, just like security, will stay an afterthought. It gets added when everything else is already in place. We know how that generally turns out.

I know, I am painting a gloomy picture. Hopefully 2011 will have some surprises for us!

Equilibrium Networks free/open-source software release

Equilibrium Networks' free/open-source visual network traffic monitoring software is now available for download at http://www.eqnets.com. A video of our enterprise system in action and technical documents detailing our approaches to traffic analysis, real-time interactive visualization and alerting are also available at our website.

Besides a zero-cost download option, we are also offering Linux-oriented installation media and an enterprise version of our system with premium features such as configurable automatic alerting, nonlinear replay, and a 3D traffic display.

Discounts—including installation media for a nominal shipping and handling fee—are available to institutional researchers or in exchange for extensions to our platform.

The software can run in its entirety on a dedicated x86 workstation with four or more cores and a network tap, though our system supports distributed hardware configurations. An average graphics card is sufficient to operate the visualization engine.

Log Visualization in the Cloud - Webinar

On August 19th, at 10am PST I will be giving a Webinar on the topic of visualization. You can register and watch the Webinar right here:

(Embedded BrightTALK player.)

Cloud-based Log Analysis and Visualization

I gave a talk at RMLL 2010, a French free software conference. The title, Cloud-based Log Analysis and Visualization, already gives the content away, but just in case, here is the abstract for the talk:

Cloud computing has changed the way businesses operate, the way businesses make money, and the way businesses have to protect their assets and information. More and more software applications are moving into the cloud. People are running their proxies in the cloud, and soon you will be collecting your logs in the cloud. You shouldn't have to deal with log collection and log management; you should be able to focus your time on getting value out of the logs, that is, on log analysis and visualization.

In this presentation we will explore how we can leverage the cloud to build security visualization tools. We will discuss some common visualization libraries and have a look at how they can be deployed to solve security problems. We will see how easy it is to quickly stand up such an application. To close the presentation, we will look at a number of security visualization examples that show how security data benefits from visual representations. For example, how can network traffic, firewall data, or IDS data be visualized effectively?

Monitoring / Visualisation Stations, & relevance of layer 4 traffic

Opinions are sought from those working in the relevant areas. I handed this document in as part of a degree project in security visualisation and monitoring, and the feedback was that the network and monitoring station(s) are not realistic, and that I should have focused on port 80 and layer-7 traffic only, as layer 4 is supposedly no longer relevant. The link provided below is only part of the document; I presume it's the part they had issues with. I wasn't actually intending to focus on web traffic, which was made clear in the document anyway (though I did indicate to them that, with the likes of RUMINT's packet-contents visualiser, it is certainly viable to match packet contents against malware signature databases; that aspect, however, wasn't the focus of the project).
I don't expect it says anything that people working in those areas will be unaware of. The general intention was to address what would be required for a monitoring station or network, including visualisation software, that would support real-time as well as offline analysis and traffic capture.
The grouping into 'objectives' is just part of how the work has to be presented to comply with guidelines. Cheers for input; I know you're probably busy.

http://docs.google.com/fileview?id=0B2FJ1rXW3lv4Y2UyMGFlZmYtMWE3OC00MmNlLTk4ZDktMmEyNjdhODYxM2Iy&hl=en

NB: the last part is probably wrong about ad-hoc IPs; I can't remember exactly how they are handed out right now. They probably aren't always dynamic, especially now that it's more common to get fixed-IP SIMs.

EDV - Event Data Visualization

AfterGlow has been on my list of 'neat tools' for quite some time. Thankfully, last month I finally had a bit of spare time to really play with it.

The result was EDV: http://www.pintumbler.org/code/edv

See the page for more info. Keep in mind, this is BETA!

It currently supports Snort (Sguil DB format). However, even the untrained eye can easily modify it for straight Snort or anything else you can query via MySQL. Once you have your sources defined, it will take care of the rest.
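As an illustration of the "anything you can query via MySQL" point, here is a minimal Python sketch that pulls recent events from a Sguil-style database and emits the three-column CSV that AfterGlow reads on stdin. The schema, table, and credentials are assumptions, not EDV's actual code:

    import pymysql  # assumption: PyMySQL is the available MySQL driver

    # Connection details and schema are illustrative; adjust for your Sguil DB.
    conn = pymysql.connect(host="localhost", user="sguil",
                           password="secret", database="sguildb")
    with conn.cursor() as cur:
        cur.execute(
            "SELECT INET_NTOA(src_ip), INET_NTOA(dst_ip), dst_port "
            "FROM event WHERE timestamp > NOW() - INTERVAL 1 DAY"
        )
        # AfterGlow consumes simple "source,event,target" CSV lines.
        for src, dst, port in cur.fetchall():
            print("%s,%s,%s" % (src, dst, port))
    conn.close()

Piped through afterglow.pl and Graphviz, that output becomes a link graph of the last day's alerts.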

The tool is static (controlled by configs and cron) for now, but I do plan on adding a query tab to the web page so that you can do on-the-fly queries; that is low priority for now. I have been focusing on two parsers that log directly to MySQL: one parses syslog output from a Barracuda spam firewall, and the other parses URL info captured by URLSnarf. These will be my next additions.

Comments and suggestions welcome.

Thanks.

Interesting patterns World of Warcraft

It's been a pretty quiet day today, but I noticed an interesting pattern emerge. I hadn't seen it before, which is really strange considering I work at a college. I'm using the Sphere of Influence 3.0 summary window and timeline, fed from a Cisco ASA.
Pattern "C" shows the normal allowed network traffic (the horizontal "bars" of traffic are a P2P program not associated with WoW). This shows traffic both into and out of the college. I noticed the pattern and highlighted it, which revealed the organization behind it. If you know anything about World of Warcraft, you'll recognize it: the organization was Blizzard communications. I filtered all traffic to and from organizations with the word "blizzard" in them. As you can see from pattern "A", it clearly shows a World of Warcraft traffic pattern; the game updating itself is the easiest pattern to spot. I also filtered the traffic in the pattern "B" denied window. The traffic being denied was on port 3724: voice. The timeline (pattern "C") assured me that traffic was indeed seen on 3724 (the WoW port). Although I was tempted to put in a QoS statement to slowly grind that machine to a crawl, I opted for the easier solution. The traffic came from a library computer, so it was just a simple matter of visiting the library, removing the software from a machine that had somehow been unfrozen, re-freezing the machine, and updating a few rule sets.

Patterns always interest me, just thought I'd share this one with you all.

SOI URLs added

We added a URLs visual to the PIX/ASA, so now we collect the URLs. This helps when monitoring a system: you not only see the connection, as in the old way, but now you also see the URLs. As usual, you can filter it to look for particular organizations or countries, but using keywords you can also hunt for anything in the URL. This is useful when hunting for C2 traffic from infections.