For those of you who read this blog regularly, you will know what I'm talking about; for those of you who just landed here, let me explain.

My project over the last several months has been looking at Remote Access Trojans / Tools. My aim was not to discover how they operate; RATs, for the most part, operate in the same way.

What I was interested in was how to detect these RATs and, more importantly, how I could share the information I was gathering with anyone else who could make use of it. The data set will largely contain samples from ‘Script Kiddies’ who mass-target individuals just to get slaves, but it will also capture samples that have been used by crime packs and potentially even APT actors.

Most methods of capturing this data use sandboxes to run the sample and observe its behaviour. I wanted a more static approach that would be easier to implement and scale nicely.

The Config

The malware that is delivered to the victim contains a lot of information hardcoded inside the malware itself. Typically these configuration settings are obfuscated or encrypted so they are not immediately accessible, but they still have to be included inside the malware so it knows what to do. Examples of the kind of information embedded are:

  • The Phone Home Or Beacon Information
  • The Install Information
  • Campaign and Password Information
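To make this concrete, here is a minimal sketch of what a static config extractor can look like. The family, marker, XOR key, and field layout below are all invented for illustration; real families use stronger obfuscation (RC4, AES, custom encodings), but the overall shape of a decoder is similar: find the embedded blob, decode it, split it into fields.

```python
# Hypothetical example: a static config extractor for an imaginary RAT
# family that stores its settings XOR-encoded behind a magic marker.
MARKER = b"CFG!"   # assumed marker preceding the config blob
XOR_KEY = 0x5A     # assumed single-byte XOR key

def extract_config(sample: bytes) -> dict:
    """Locate the embedded config and decode it into key/value pairs."""
    offset = sample.find(MARKER)
    if offset == -1:
        return {}                                  # no config found
    blob = sample[offset + len(MARKER):]
    blob = blob.split(b"\x00")[0]                  # config is NUL-terminated
    decoded = bytes(b ^ XOR_KEY for b in blob).decode("ascii")
    # Config fields are pipe-delimited, e.g. "Domain=evil.example|Port=4444"
    return dict(field.split("=", 1) for field in decoded.split("|"))

# Build a fake "sample" and decode it
settings = b"Domain=evil.example|Port=4444|Campaign=test"
sample = b"\x4d\x5a padding " + MARKER + bytes(b ^ XOR_KEY for b in settings) + b"\x00 more bytes"
print(extract_config(sample))
# {'Domain': 'evil.example', 'Port': '4444', 'Campaign': 'test'}
```

Because nothing is executed, an extractor like this can be run over millions of files quickly, which is what makes the static approach scale.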

The Malware

I’m constantly generating decoders for more malware families, starting with RATs. You can view the current supported list at

The Output

The logical start point is to have the entire config output so you can see all the indicators, but the security community also has a rather expansive suite of tools that analysts can use that will make use of data like this. So I wanted the ability to export the information in a structured way as well as just exposing the configuration settings. The initial export types I decided on were:

  • Snort Rules - These provide network detections for the specific implant. Where available I will also include Snort rules that will match on any sample.
  • Yara Rules - Yara provides a host-based method for detecting the file.
  • IOC - Indicators of Compromise provide a wide range of use across multiple analyst platforms and tools. Initially designed around the OpenIOC format, I will, in the future, look at other IOC formats.
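As a sketch of the Yara side, the snippet below turns extracted config values into a trivial rule that matches them as plaintext strings. This only works for families that leave the config unencrypted in the compiled binary; for encrypted configs the rule would have to match on code or the decryption routine instead. The function name and rule layout are illustrative, not the site's actual generator.

```python
# Hypothetical example: generate a simple Yara rule from an extracted
# config. Matches only if the values appear as plaintext in the file.
def config_to_yara(family: str, config: dict) -> str:
    strings = "\n".join(
        f'        $s{i} = "{value}"' for i, value in enumerate(config.values())
    )
    return (
        f"rule {family}_config\n"
        "{\n"
        "    strings:\n"
        f"{strings}\n"
        "    condition:\n"
        "        all of them\n"
        "}\n"
    )

print(config_to_yara("ExampleRAT", {"Domain": "evil.example", "Port": "4444"}))
```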

The Sharing

Sharing the data was a big part of what I wanted to achieve by launching. As well as sharing specific information on the samples in its database, I also wanted to provide a comprehensive list of domain names and IP addresses that have been extracted from all the samples. This could be used by those who implement firewall blocking, blacklisting etc. At the time of submission any domain name is queried for its current IP, which is stored alongside the configuration.
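The resolution step itself is simple; a sketch using only the Python standard library:

```python
# Sketch of the resolution step described above: at submission time each
# extracted domain is looked up and the current IP is stored alongside
# the config entry.
import socket

def resolve_domain(domain):
    """Return the domain's current A record, or None if it does not resolve."""
    try:
        return socket.gethostbyname(domain)
    except socket.gaierror:
        return None  # expired, sinkholed, or never registered

print(resolve_domain("localhost"))
```

Resolving at submission time matters because C2 infrastructure moves: the IP recorded alongside the config is a snapshot of where the domain pointed when the sample was processed.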

I also wanted to offer a deeper view of the database for those who could make use of it. More specifically I wanted to offer full keyword search functionality across all fields in the config sections and the ability to query this information from within your own tool set or scripts.

At this point I feel it is important to note that once the config has been extracted, or if no config is found, the raw sample itself is NOT retained. The sample is stored in the queue until it has been processed and then disappears into the ether, never to be seen again.

There is currently no method to re-scan the dataset or to re-scan individual files after they have been processed.

To achieve all these sharing goals I have or very soon will implement the following:

  • Daily generated CSV of all Domains and IPs (Complete Snapshot)
  • Daily generated CSV of all Domains and IPs submitted in the last 24 hours (Daily Snapshot)
  • Public limited API that will accept a Hash / Keyword / Malware Type as a search term and return all matching results.
  • Private API - this API is only available on request and will push the data OUT to you every time a sample is processed.
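Once the public API exists, querying it from a script should be close to a one-liner. The endpoint URL and parameter names below are placeholders of my own invention, not the final API:

```python
# Hypothetical sketch of querying the public search API from a script.
# Endpoint and parameters are assumptions; the real API may differ.
import json
import urllib.parse
import urllib.request

API_BASE = "https://example.invalid/api/search"   # placeholder endpoint

def build_query(term, search_type="hash"):
    """Build the search URL for a hash / keyword / malware-type lookup."""
    params = urllib.parse.urlencode({"type": search_type, "q": term})
    return f"{API_BASE}?{params}"

def search(term, search_type="hash"):
    """Fetch and decode the JSON result list for a search term."""
    with urllib.request.urlopen(build_query(term, search_type)) as resp:
        return json.loads(resp.read())

print(build_query("d41d8cd98f00b204e9800998ecf8427e"))
```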

The DataSet

For this to be of any use I need to seed the initial dataset and then continue to grow the database, keeping it as current as I can. The initial seed database will be generated from the 16 million or so samples currently available from

It’s important to note that only a small percentage of those samples will match a RAT and be decoded, ending up in the database. From there I will deploy automated collectors that will look in all the usual places that RATs appear. There is also the ability for the public to submit samples directly via the web interface.

The Timeline

I am setting myself a hard deadline of Sunday 1st June 2014 to start pushing the samples from VirusShare through the list of decoders that are available at the time.

From now until then I will be working on the following items, in no particular order:

  • Improve CSS and View of the WebSite
  • Add More Decoders
  • Add more IOC Formats
  • Virus Total Integration
  • Geo IP For stats (With HeatMap)
  • Domain / IP Daily export
  • API for search / integration
  • Rescan with New Decoders

The current site can be found at . Please be aware that the site is still very much in alpha and still has a few bugs I know of, and I'm sure a handful I don't yet. Please feel free to push samples up and report any issues you find. Every week I will try to push the latest dev version, including new decoders, live.

With this in mind, please be aware that when I push the dataset through in August I will first empty the database. Any samples stored will join the queue and be reprocessed.


I will shortly be releasing the template for the decoders so that anyone who wishes to contribute decoders can. Once I have had a chance to properly test the decoders, they will all be released individually, including the ones I have already written. The code that runs will not be released.

If you wish to receive the feed from the Private API, please let me know so I can work out the best way of delivering it to you. The expected output at the moment is JSON.
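For anyone planning to consume the feed, a pushed record will presumably look something like the following. This layout is purely illustrative, using placeholder values; the final field names may well differ:

```json
{
  "malware_type": "ExampleRAT",
  "md5": "d41d8cd98f00b204e9800998ecf8427e",
  "config": {
    "Domain": "evil.example",
    "Port": "4444",
    "Campaign": "test"
  },
  "resolved_ip": "203.0.113.7",
  "processed_at": "2014-06-01T12:00:00Z"
}
```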

For everything else

Questions, queries, comments below or direct to