From a security analytics and threat intelligence perspective, Pastebin is a treasure trove of information. Any content uploaded to Pastebin that is not explicitly set to private (which requires an account) is publicly listed and can be viewed by anyone.
Using Yara Rules to find and save interesting data from pastebin https://github.com/kevthehermit/PasteHunter
Hackers and script kiddies are quick to push their warez onto the site for the world to see, and developers and network engineers are prone to accidentally leaking internal configurations and credentials.
So how can we, the lowly security analysts, sift through all this data and use it to our advantage?
We can scrape Pastebin and check everything that gets uploaded to see if any of it is of interest to us. Some tools already do this, like dumpmon, which monitors paste sites with a set of regular expressions and tweets out the matches.
This is nice, but I wanted a bit more control. Pastebin doesn’t really object to its content being scraped, but there are limits to how often you can do it before your IP gets either a temporary or a permanent ban.
Fortunately, Pastebin offers an API that is specifically designed for this kind of task. At the moment they have an offer that gives you lifetime access for a one-off payment of $19.95 (for the next few days).
With a Pro account we can hit the API as often as once a second from a whitelisted IP. In practice you don’t need to query it anywhere near that often.
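To give a feel for what polling the API involves, here is a minimal sketch in Python. It is not PasteHunter's own code. The endpoint URL and the "key" field on each listing entry are assumptions based on how the scraping API is commonly described, so check them against the API documentation for your account. The function names are made up for this example.

```python
import json
import time
import urllib.request

# Assumed endpoint for Pastebin's scraping API; verify it for your account.
SCRAPE_URL = "https://scrape.pastebin.com/api_scraping.php?limit={limit}"


def fetch_listing(limit=100):
    """Download the most recent public pastes as a list of dicts.

    Only works from an IP that is whitelisted on a Pro account.
    """
    with urllib.request.urlopen(SCRAPE_URL.format(limit=limit), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def new_pastes(listing, seen_keys):
    """Return pastes whose 'key' (assumed field name) is unseen, and remember them."""
    fresh = []
    for paste in listing:
        key = paste["key"]
        if key not in seen_keys:
            seen_keys.add(key)
            fresh.append(paste)
    return fresh


def poll_forever(interval_seconds=60, limit=100):
    """Yield unseen pastes, polling far below the once-per-second ceiling."""
    seen_keys = set()
    while True:
        for paste in new_pastes(fetch_listing(limit), seen_keys):
            yield paste
        time.sleep(interval_seconds)
```

Tracking the keys already seen means consecutive polls can overlap without the same paste being processed twice, which is why a relaxed interval is good enough.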
Great, we have access to all the data. Now what do we do with it? That’s where PasteHunter comes into play.
It’s a simple script and a set of Yara rules that fetches pastes from the Pastebin API and stores any matching pastes in Elasticsearch, with a nice Kibana front end.
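The storage step can be pictured roughly like this. This is a sketch under assumptions, not the way PasteHunter actually does it. The URL, the index name and the document fields are all hypothetical, and it talks to Elasticsearch through the plain REST document endpoint (`POST /<index>/_doc`), which older Elasticsearch versions may expect in a slightly different form.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical values; adjust to your own Elasticsearch setup.
ES_URL = "http://localhost:9200"
ES_INDEX = "paste-hunter-example"


def build_document(paste, matched_rules, raw_text):
    """Combine paste metadata, matching rule names and content into one record."""
    return {
        "paste_key": paste.get("key"),
        "title": paste.get("title", ""),
        "rules": sorted(matched_rules),
        "content": raw_text,
        "stored_at": datetime.now(timezone.utc).isoformat(),
    }


def store_document(doc):
    """POST the document to Elasticsearch and return the parsed JSON reply."""
    req = urllib.request.Request(
        "{}/{}/_doc".format(ES_URL, ES_INDEX),
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Storing the names of the rules that matched alongside the content is what makes it easy to filter by rule in Kibana later on.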
In case you’re not familiar with Yara, it is a pattern-matching engine that’s mainly used for scanning files and categorising malware families. It makes it simple to build complex rules without getting lost deep in regular expressions.
Installation is relatively simple. Install Elasticsearch and Kibana if you want the web UI for searching content.
From there, install python3, Yara and the Yara Python bindings.
Once you have everything installed, clone the repo and set up a cron job to run the script at regular intervals. An example and slightly more detailed instructions are supplied in the readme on GitHub.
It comes with a collection of rules that scan for the most common things: password dumps, leaked credentials and hacked sites. You can also easily add your own keywords by creating a custom_keywords.yar file.
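As a rough idea of what a keyword rule file could contain, the Python snippet below generates one from a list of keywords. Everything here is an illustrative assumption: the rule name, the `$kw0`-style string identifiers, the `wide ascii nocase` modifiers and the helper functions are not taken from the PasteHunter repo. The keywords in the usage comment are placeholders.

```python
def yara_string_literal(text):
    """Escape backslashes and double quotes for use inside a Yara text string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_keyword_rule(keywords, rule_name="custom_keywords"):
    """Render one Yara rule that fires if any of the keywords appears."""
    if not keywords:
        # A Yara rule with an empty strings section would not compile.
        raise ValueError("at least one keyword is required")
    lines = ["rule {}".format(rule_name), "{", "    strings:"]
    for index, keyword in enumerate(keywords):
        lines.append(
            '        $kw{} = "{}" wide ascii nocase'.format(
                index, yara_string_literal(keyword)
            )
        )
    lines += ["    condition:", "        any of them", "}"]
    return "\n".join(lines) + "\n"


def write_rule_file(path, keywords):
    """Write the generated rule to disk, e.g. as custom_keywords.yar."""
    with open(path, "w") as handle:
        handle.write(build_keyword_rule(keywords))


# Example with placeholder keywords:
# write_rule_file("custom_keywords.yar", ["example.com", "Project Falcon"])
```

The generated file is ordinary Yara source, so if you have the Python bindings installed you can check that it compiles with `yara.compile(filepath="custom_keywords.yar")` before dropping it in with the other rules.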
Looking for your domains, email addresses, document names and things of this nature can give you a heads-up if any of your data is compromised, either deliberately or accidentally.
For more details on creating Yara rules, make sure you check out the official documentation.
And that’s about all there is to it. With the script up and running, you should start to see data coming in.
Here are some examples of the data that gets captured.
Fair warning: the rules can be prone to false positives, and you shouldn’t trust the value of the data any more than you trust the person who uploaded it to Pastebin in the first place.
As usual, leave any questions, queries or comments below.