Setting up a malware Zoo with VXCage

If you're going to research malware, you're going to need some samples. Any researcher worth their salt will have a well-maintained zoo. This post explores the world of malware collection and a few nice ways to store your samples.

*DISCLAIMER* There are many ways to collect and share samples, and many ways to store them for your lab. The methods I discuss might not be the best for your setup, but hopefully they can point you in the right direction.

Before we dive into collecting our samples, let's figure out how and where we are going to store them.


As with most things in life, there are a variety of methods that can be used for storing malware samples, each with its own set of pros and cons. Let's start with the where.

Your malware needs to be accessible from wherever your lab is. If you have multiple labs that share the same data sets, you need to make sure they can all access them, otherwise we get into the dark world of data duplication. We also need to make sure that our data has restricted access, that is to say only those who need to access it can. The last thing I want is for my wife to stumble across my malware samples and start running them from her own PC.

We also want to make sure that we can’t accidentally infect our own machines. The simplest way to do this is to save samples on a different OS to the one they are designed for. If I’m storing Windows executables, I store them on a Linux machine; if I’m storing Word exploits, I make sure Office is not installed. These all seem like common-sense suggestions, but they can easily be overlooked.

The final thing we want to do is make sure that, whatever our storage solution is, there is no AV on it that is going to start wiping out our data sets. I recently set up a folder on my Dropbox where friends and family, or anyone who reads this, could drop samples they wanted me to look at. I deployed the widget on the site and everything was looking great: my test file was uploaded and synced with my lab. But by the time I had switched over to my lab, the sample was nowhere to be seen. It had disappeared. After several frustrating minutes I logged on to one of my other machines and saw an AV alert. My main rig upstairs was also syncing my Dropbox; it had synced my new folder and the AV on it was instantly removing the sample.

I choose to store all my samples on a Linux OS, and more than that, it's a virtual Linux OS. Having my samples stored on a virtual machine has its own set of pros and cons.

  • Backups are easy with snapshots, cloning and just plain copying the VM Images.
  • It's hardware independent and relatively portable. I can carry it around on my external drive (which is encrypted, of course) and then access it all anywhere I have a virtual player.
  • On the downside, having virtual hardware means I lose the performance of dedicated hardware. But my sample set is relatively small and my ESX server is more than capable of all the searching and storing I need it to do.

Now that we have somewhere to store all our data, let's take a look at how we are going to store it.

The simplest approach is to store them all in folders. Initially this is very easy, but as your zoo expands it becomes extremely difficult to manage. Some of the issues you will face include search and retrieval difficulties, data duplication, and categorization problems, i.e. putting items into the wrong folders or renaming them later on. You can see how this can become tiresome.
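If you do stay with plain folders for a while, one common way to take the sting out of duplication and renaming is to name each file after a hash of its contents. The following is only a minimal sketch of that idea, not how VXCage itself lays out its repository; the function names, the two-character sharding and the choice of SHA-256 are all my own assumptions.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Tuple


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large samples never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_sample(sample: Path, zoo_root: Path) -> Tuple[Path, bool]:
    """Copy a sample into zoo_root under its hash; return (path, was_new)."""
    name = sha256_of(sample)
    # Shard by the first two hex characters to keep each directory small
    # (an arbitrary layout chosen for this sketch).
    target = zoo_root / name[:2] / name
    if target.exists():
        return target, False  # identical content is already in the zoo
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(sample, target)
    return target, True
```

Because the stored name depends only on the content, the same sample submitted twice under different file names lands in one place. It does nothing for tagging or searching, which is where a proper application comes in.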

The way forward is clear: we need some sort of interactive application that will allow us to store, query and retrieve our samples. If you have the prerequisite knowledge you could build one yourself. This has the added benefit of being exactly what you want it to be, but it can be time-consuming and difficult. I know, because I’m building my own zoo to deal explicitly with email samples.

If you're after something free that does the job then, as with most things, someone has already built one for us.

Introducing VXCage, created by ‘nex’. It provides a web-based REST API capable of storing, tagging, searching and retrieving our samples. The install guide provided is basic but works; I'll walk you through the steps I took to create my virtual machine.

Installing VX Cage

Instead of creating a Linux install from scratch, I prefer to grab a TurnKey Core Linux build. These are lightweight, quick and easily deployable. You can grab a VM image from the TurnKey Linux site.

Power on your VM and run through the initial configuration options.

Clone the latest version from the git repo

cd /opt/
git clone

Grab our Linux dependencies

# apt-get install apache2 libapache2-mod-wsgi mysql-server python-mysqldb

Create our virtual host: edit /etc/apache2/sites-available/default and replace its contents with the following

<VirtualHost *:80>
    ServerName localhost

    WSGIDaemonProcess yourapp user=www-data group=www-data processes=1 threads=5
    WSGIScriptAlias / /opt/vxcage/app.wsgi

    <Directory /opt/vxcage>
        WSGIProcessGroup yourapp
        WSGIApplicationGroup %{GLOBAL}
        Order deny,allow
        Allow from all
    </Directory>

    <Location />
        AuthType Basic
        AuthName "Authentication Required"
        AuthUserFile "/opt/vxcage/users"
        Require valid-user
    </Location>

    ErrorLog /opt/vxcage/error.log
    LogLevel warn
    CustomLog /opt/vxcage/access.log combined
    ServerSignature Off
</VirtualHost>

Create an authenticated user

# htpasswd -c /opt/vxcage/users username

Install the Python dependencies

pip install bottle sqlalchemy

Edit /opt/vxcage/api.conf and set your MySQL username and password.

Create the Database and set permissions on our folders.

# mysql -u root -p
> create database malware;
Query OK, 0 rows affected (0.04 sec)

mkdir /opt/vxcage/malware
chmod 777 /opt/vxcage/malware

We are now ready to submit our first sample and make sure everything is working OK.

# curl -u username -F file=@sample.exe -F tags="tag1 tag2" http://localhost/malware/add

If everything works as intended you should now see {"message": "added"}
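If you would rather script submissions than type curl commands, the same multipart POST can be put together in Python. This is a rough sketch using only the standard library; it assumes the /malware/add endpoint and the file and tags form fields from the curl example, plus the basic authentication set up in the virtual host. The function name and its parameters are mine, not part of VXCage.

```python
import base64
import uuid
from pathlib import Path
from urllib.request import Request


def build_add_request(base_url: str, sample: Path, tags: str,
                      username: str, password: str) -> Request:
    """Build (but do not send) a multipart POST for the /malware/add endpoint."""
    boundary = uuid.uuid4().hex
    tags_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="tags"\r\n\r\n'
        f"{tags}\r\n"
    ).encode()
    # The file name is sent as-is; names containing quotes would need escaping.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{sample.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = tags_part + file_header + sample.read_bytes() + closing

    # HTTP basic auth, matching the htpasswd user created earlier.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return Request(
        base_url.rstrip("/") + "/malware/add",
        data=body,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Authorization": f"Basic {token}",
        },
    )
```

Passing the returned object to urllib.request.urlopen sends it, and the response body should be the same JSON message curl shows. Building the request separately from sending it makes it easy to inspect before anything touches the server.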

The included README has examples for interacting with the API, including searching and pulling samples down. I’ll cover these in more detail in a future blog post, but there’s enough there to get you started.

Now that we have somewhere to store our samples, it's time to go out and get some.


Sample collection can be a difficult task. By their very nature, malware samples can be destructive in the wrong hands. For this reason you will find that a lot of the research communities require you to validate your need for access, or pay for the privilege. This section is not going to hold your hand through requesting access to these repositories. All I will do is list the sites I use or am a member of; it is up to you to get an account.

  • MalShare
  • VirusShare
  • ContagioDump
  • malwr
  • urlquery
  • wepawet
  • HoneyPots
  • Self Created

Google. This seems like quite an obvious one, but it can also be a rabbit hole. The purpose here is to get our hands on the actual samples. Google is great at surfacing the samples described in research blogs, but those posts rarely offer the samples themselves. They typically do publish MD5s, though, and these will be useful if we have access to other sites.
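Copying hashes out of blog posts by hand gets old quickly. As a small illustration, the sketch below pulls anything that looks like an MD5 (32 hex characters standing alone) out of a block of text. The function name is made up for this example, and the pattern only checks the shape of the string, so expect the odd false positive.

```python
import re
from typing import List

# 32 hex characters with word boundaries on both sides, so longer
# hashes such as SHA-1 or SHA-256 are not matched in part.
MD5_PATTERN = re.compile(r"\b[a-fA-F0-9]{32}\b")


def extract_md5s(text: str) -> List[str]:
    """Return the unique MD5-shaped strings in text, lower-cased, in order."""
    seen = set()
    ordered = []
    for match in MD5_PATTERN.findall(text):
        digest = match.lower()
        if digest not in seen:
            seen.add(digest)
            ordered.append(digest)
    return ordered
```

The resulting list can then be used for lookups on whichever repositories you have access to.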

With the exception of urlquery and wepawet, all the sites I listed are invite or subscription based. As I said, I’ll leave it up to you to get yourself access.

urlquery and wepawet are not, strictly speaking, malware repositories. What they do is scan URLs and tell you whether they contain exploits. To extract the malware you need to CAREFULLY click the link. When I say carefully, I mean from inside your malware lab, from a machine that is designed to download the malware. As with most things, I’ll try to cover this in another post at some point in the future.

Honeypots are another great way of collecting samples, although they can have limited results. Deploy yourself a honeypot, give it an internet feed, and sit back and wait for samples.

The final method I want to talk about is self-generated samples. If you can get hold of the source or copies of the toolkits used to create malware, you don’t need to rely on getting samples from the wild. You can create your own, configure them to work on your own infrastructure and observe them acting with known variables. In some cases this is more useful in your research than working with other samples.

A good practice is to create your own samples and then compare your expected results with ‘Real’ Malware.

Moving My Lab:

Since I started writing this I have moved the location of my zoo. One of the things holding me back from deploying my VXCage on to the internet was getting a host that would allow me to install all the Python libs I needed. You can get the basic install on pretty much any shared hosting, but most will not allow you to install anything that requires compiling, such as ssdeep. This leaves virtual private servers. These have the benefit of giving you full control, but they have limited resources and can be expensive.

Fortunately, I managed to find a great deal on a VPS that would be more than capable of holding my whole zoo: $15 a year. The deals are listed below.

BLUE2 OpenVZ
  • 512MB RAM
  • 512MB vSwap
  • 2 CPU Cores
  • 25GB Diskspace
  • 1000GB Bandwidth
  • 1Gbps Port Speed
  • 1 IPv4 Address
  • OpenVZ/Feathur
  • Code: 512for15
  • $15/Year
BLUE3 OpenVZ
  • 1GB RAM
  • 1GB vSwap
  • 3 CPU Cores
  • 50GB Diskspace
  • 2000GB Bandwidth
  • 1Gbps Port Speed
  • 1 IPv4 Address
  • OpenVZ/Feathur
  • Code: 1024for25
  • $25/Year
BLUE4 OpenVZ
  • 2GB RAM
  • 2GB vSwap
  • 4 CPU Cores
  • 100GB Diskspace
  • 3000GB Bandwidth
  • 1Gbps Port Speed
  • 1 IPv4 Address
  • OpenVZ/Feathur
  • Code: 2048for45
  • $45/Year

The downside to hosting like this is the limited storage space. I already had a 100GB Dropbox, and linking my zoo to it would give me some redundancy. If I lost my host I would still retain my samples, and could even access them from outside of VXCage if I really needed to.
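With only 25GB on the smallest plan, it is worth knowing how much room the samples are taking up. Here is a quick sketch for totalling the size of a sample directory against a quota. The function names are hypothetical, and the path and quota you pass in are whatever applies to your own setup.

```python
from pathlib import Path


def repository_usage(repo_root: Path) -> int:
    """Total size in bytes of every regular file under repo_root."""
    return sum(p.stat().st_size for p in repo_root.rglob("*") if p.is_file())


def usage_report(repo_root: Path, quota_bytes: int) -> str:
    """Summarise how much of the given quota the repository is using."""
    used = repository_usage(repo_root)
    percent = 100 * used / quota_bytes
    return f"{used} of {quota_bytes} bytes used ({percent:.1f}%)"
```

Run from cron, something like this gives an early warning before the disk fills up.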

Dropbox has a Linux application that allows you to run Dropbox on a headless server. So I purchased a Blue2 server, linked my Dropbox, excluded all but my storage dir from the auto sync, and then set VXCage to use the synced Dropbox folder.

I will have to watch the bandwidth; I'm not sure how quickly it will get chewed up, but for now at least it's all working smoothly. If you're running VXCage on external hosting, make sure you enable authentication via .htaccess.

As always, questions, queries and comments below.