(This article was first published on Data Driven Security, and kindly contributed to R-bloggers)
This was (initially) going to be a blog post announcing the new mhn R package (more on what that is in a bit) but somewhere along the way we ended up taking a left turn at Albuquerque (as we often do here at ddsec hq) and had an adventure in a twisty maze of Modern Honey Network passages that we thought we’d relate to everyone.
Episode 0 : The Quest!
We find our intrepid heroes data scientists finally getting around to playing with the Modern Honey Network (MHN) software that they promised Jason Trost they’d do ages ago. MHN makes it easy to [freely] centrally setup, control, monitor and collect data from one or more honeypots. Once you have this data you can generate threat indicator feeds from it and also do analysis on it (which is what we’re interested in eventually doing and what ThreatStream does do with their global network of MHN contributors).
Jason has a Vagrant quickstart version of MHN which lets you kick the tyres locally, safely and securely before venturing out into the enterprise (or internet). You stand up the server (mostly Python-y things), then tell it what type of honeypot you want to deploy. You get a handy cut-and-paste-able string which you paste-and-execute on a system that will become an actual honeypot (which can be a “real” box, a VM or even a RaspberryPi!). When the honeypot is finished installing the necessary components it registers with your MHN server and you’re ready to start catching cyber bad guys.
(cyber bad guy)
Episode 1 : Live! R! Package!
We decided to deploy a test MHN server and series of honeypots on Digital Ocean since they work OK on the smallest droplet size (not recommended for a production MHN setup).
While it’s great to peruse the incoming attacks:
we wanted programmatic access to the data, so we took a look at all the routes in their API and threw together an R package to let us work with it.
NOTE: that’s not the real destination_ip so don’t go poking since it’s probably someone else’s real system (if it’s even up).
You can also get details about the attackers (this is just one example):
The package makes it really easy (OK, we’re probably a bit biased) to grab giant chunks of time series and associated metadata for further analysis.
While cranking out the API package we noticed that there were no endpoints for the MHN HoneyMap. Yes, they do the “attacks on a map” thing but don’t think too badly of them since most of you seem to want them.
After poking around the MHN source a bit more (and navigating the view-source of the map page) we discovered that they use a Go-based websocket server to push the honeypot hits out to the map. (You can probably see where this is going, but it takes that turn first).
Episode 2 : Hacking the Anti-Hackers
The other thing we noticed is that—unlike the MHN-server proper—the websocket component does not require authentication. Now, to be fair, it’s also not really spitting out seekrit data, just (pretty useless) geocoded attack source/dest and type of honeypot involved.
Still, this got us wondering if we could find other MHN servers out there in the cold, dark internet. So, we fired up RStudio again and took a look using the shodan package:
Yikes! 141 servers just on the default port (3000) alone! While these systems may be shown as existing in Shodan, we really needed to confirm that they were, indeed, live MHN HoneyMap [websocket] servers.
Episode 3 : Picture [Im]Perfect
Rather than just test for existence of the websocket/data feed we decided to take a screen shot of every server, which is pretty easy to do with a crude-but-effective mashup of R and phantomjs. For this, we made a script which is just a call—for each of the websocket URLs—to the “built-in” phantomjs rasterize.js script that we’ve slightly modified to wait 30 seconds from page open to snapshot creation. We did that in the hopes that we’d see live attacks in the captures.
That makes capture.sh look something like:
Yes, there are far more elegant ways to do this, but the number of URLs was small and we had no time constraints. We could have used a
pure phantomjs solution (list of URLs in phantomjs JavaScript) or used
GNU parallel to speed up the image captures as well.
Sifting through ~140 images manually to see if any had “hits” would not have been too bad, bit a glance at the directory listing showed that many had the exact same size, meaning those were probably showing a default/blank map. We uniq‘d them by MD5 hash and made an image gallery of them:
It was interesting to see Mexico CERT and OpenDNS in the mix.
Most of the 141 were active/live MHN HoneyMap sites. We can only imagine what a full Shodan search for HoneyMaps on other ports would come back with (mostly since we only have the basic API access and don’t want to burn the credits).
Episode 3 : With “Meh” Data Comes Great Irresponsibility
For those who may not have been with DDSec for it’s entirety, you may not be aware that we have our own attack map (github).
We thought it would be interesting to see if we could mashup MHN HoneyMap data with our creation. We first had to see what the websocket returned. Here’s a bit of Python to do that (the R websockets package was abandoned by it’s creator, but keep an eye out for another @hrbrmstr resurrection):
That particular server is very active, hence why we chose to use it.
The output should look something like:
Those are near-perfect JSON records for our map, so we figured out a way to tell iPew/PewPew (whatever folks are calling it these days) to take any accessible MHN HoneyMap as a live data source. For example, to plug this highly active HoneyMap into iPew all you need to do is this:
http://ocularwarfare.com/ipew/?mhnsource=http://128.199.121.95:3000/data/
Once we make the websockets component of the iPew map a bit more resilient we’ll post it to GitHub (you can just view the source to try it on your own now).
Fin
As we stated up front, the main goal of this post is to introduce the mhn package. But, our diversion has us curious. Are the open instances of HoneyMap deliberate or accidental? If any of them are “real” honeypot research or actual production environments, does such an open presence of the MHN controller reduce the utility of the honeypot nodes? Is Greenland paying ThreatStream to use that map projection instead of a better one?
If you use the new package, found this post helpful (or, at least, amusing) or know the answers to any of those questions, drop a note in the comments.
To leave a comment for the author, please follow the link and comment on his blog: Data Driven Security.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...