One of my resolutions for the New Year is to spend more time conducting behavioral and static analysis of malicious PE files. I recently spent time watching some of the Cybrary Malware Reverse Engineering material and wanted to document my efforts here and share my notes and additional thoughts with you. There are plenty of malware sandbox analysis services available, and some degree of automation is required, but as I wrote about here, I think it is important to attack knowledge dependencies. Over-reliance on a tool that is limited in ways we do not understand (a black box), or whose output cannot be trusted, leads to incorrect findings and shoddy intelligence. It is increasingly common that IAT or OEP patching is necessary to resurrect recovered files. Some degree of manual analysis and research will always be required to validate the effectiveness of our tools. By doing so we gain valuable insight into both the systems we protect and the methodology and tactics employed by malicious software. There are some very tangible results in doing this:
- Identify Anti-Virtualization Techniques
- Identify Anti-Debugging Techniques
- Identify Exploit Kit Families' Anti-Research Efforts (net-block banning of security vendors or IPs observed scanning compromised botnet infrastructure)
- Fully Map Malware Capability (32- and 64-bit rootkit methodology, keylogging functionality, C2 communication, user-mode process hooking, financial theft capability, application whitelisting bypass techniques, method of process injection, memory protection evasion, 0-day exploits, packing techniques, lateral spread, persistence mechanisms, etc.)
- Identify Byte-code, Strings, Mutexes and Other Indicators of Compromise that may go undetected by automated analysis solutions or behavioral analysis
- Restore Functionality to Recovered PE Files by Patching IAT and OEP Entries
Other companies may have developed some form of incident response capability, in which they analyze logs, network traffic, and strange processes, and determine whether the compromised machine or account has affected additional systems. While this is certainly far more advanced, it still fails to answer the question of malware capability, which in turn affects the potential risk to the corporation. A mature practice will understand the value and necessity of obtaining this information.
It's clear that it is impractical to expect the vast majority of companies to have expert malware analysts on staff. Consequently, part of that malware analysis need is satisfied by utilizing automated sandbox solutions. It is largely up to vendors and research groups to do the heavy lifting with respect to full examination and reverse engineering. But we don't always want to be dependent on a third party, and sometimes we may want to keep analysis in house, especially when dealing with potential privacy violations (uploading samples to Malwr or VirusTotal might not be ideal under those circumstances).
I often see (in documentation and in practice) that when an IR team recovers a sample they will look to do a hand-off to a malware analyst. In theory that is fantastic, but most organizations don't have the resources, budget, or staff to make that happen. So I wanted to write this post to help illustrate how incident responders might be able to take on portions of this responsibility and extract some benefit. As service providers in the incident response area I think we can benefit by having some of the knowledge. However, the intent of this post is to help demystify some of the setup and methodology associated with practical malware analysis to help internal information security staff embrace this challenge (especially for the small to medium sized business).
Ukraine currently seems to be experiencing a wave of orchestrated information security attacks across a number of industries, so for this post I thought I would take a look at Win32/Potao. Like BlackEnergy, Potao is a piece of malware used in targeted espionage, primarily in post-Soviet countries such as Ukraine, Georgia, and Belarus. Disclaimer: I'm not a full-time malware analyst. As I said, this post is intended to help incident responders embrace the prospect of malware analysis rather than shun it as a job duty.
Malware Lab Setup
In order to safely examine a sample we need to construct a small virtual network. For my purposes I will use freely distributed and open-source tools. To set up our infrastructure, we need to download a number of items:
- VirtualBox (https://www.virtualbox.org/)
- A copy of IE9 on Windows 7
- A copy of IE8 on Windows XP
- Both of these can be obtained from Microsoft free of charge (https://dev.windows.com/en-us/microsoft-edge/tools/vms/windows/)
- An archived Java version such as JRE 6 or 7 (http://www.oracle.com/technetwork/java/archive-139210.html)
- Flash 9 or 10 (https://helpx.adobe.com/flash-player/kb/archived-flash-player-versions.html)
- A trial version of Office Suite (https://products.office.com/en-ca/try)
- Kali VirtualBox image (or VMware if you prefer: https://www.offensive-security.com/kali-linux-vmware-virtualbox-image-download/)
Most modern CPUs include hardware virtualization extensions (Intel VT-x or AMD-V) that VirtualBox uses to accelerate virtual machines. Unfortunately these are not always enabled by default, so if you get any errors relating to VT-x/AMD-V you may need to reboot and enable the setting in your BIOS.
Configuration
Once VirtualBox is installed you'll need to create two virtual machines. The first will run Kali and function as a default gateway that performs network analysis and service simulation. The second will be our victim machine running a vulnerable instance of Windows.
- Create a new virtual guest in VirtualBox for the Kali system. In the settings for this guest you will need to create two network adapters. The first adapter should be configured as NAT. The second should be set to 'Internal Network'. Select a unique name for this virtual network. Boot into Kali.
- Verify that eth0 is assigned a NAT IP and that you can ping google.com
- Assign a static IP to the second interface: ifconfig eth1 192.168.1.1 netmask 255.255.255.0 (a /24 mask, so that the victim at 192.168.1.10 is on the same subnet)
- Restart services by issuing 'service networking stop' and 'service networking start'
- Configure INetSim (http://www.inetsim.org/; alternatively you could use FakeNet)
- vim /usr/share/inetsim/conf/inetsim.conf and uncomment/change the service_bind_address to the IP address specified for eth1 on the virtual network
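- iptables -t nat -A PREROUTING -i eth1 -j REDIRECT redirects new connections arriving on eth1 to the Kali host itself (keeping the original destination port), so INetSim answers them no matter which IP address the malware tried to reach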
- Start INetSim and verify that all necessary listeners are running.
- You can also spin up Wireshark or tcpdump and start capturing all network sessions.
- Create a new virtual guest in VirtualBox for the Windows host. This machine will serve as our victim machine. It should have a single adapter enabled, set to 'Internal Network' with the same network name specified for the Kali guest.
- Configure a static IP (192.168.1.10, netmask 255.255.255.0) for the interface connected to the virtual network.
- Open a command prompt and enter route add 0.0.0.0 mask 0.0.0.0 192.168.1.1 to ensure a default route is available that sends traffic to our Kali host as the next hop.
- Specify a primary DNS of 192.168.1.1
- Remove guest additions
- Verify that the interface is not connected to your host OS via NAT or other means.
- Confirm that you are able to successfully ping the Kali host and browse to a website (INetSim should serve up a generic landing page for any HTTP GET Request)
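Before moving on, it can be handy to confirm from the Kali side that INetSim's listeners are actually reachable on the internal-network address. The sketch below is a minimal TCP reachability check using only the Python standard library. The service-to-port map assumes INetSim is using the standard ports for a few common services, so adjust it to whatever you enabled in inetsim.conf. DNS (UDP 53) is not covered, because a TCP connect test does not apply to it.

```python
import socket
from typing import Dict

# Standard ports for a few INetSim TCP services (assumption: default inetsim.conf;
# adjust to the services you actually enabled). DNS on UDP 53 is not covered here.
DEFAULT_PORTS: Dict[str, int] = {"ftp": 21, "smtp": 25, "http": 80, "pop3": 110, "https": 443}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_listeners(host: str, ports: Dict[str, int]) -> Dict[str, bool]:
    """Map each service name to whether its TCP port accepted a connection."""
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

For example, running `check_listeners("192.168.1.1", DEFAULT_PORTS)` on the Kali host should report every service as open if INetSim is bound to the eth1 address.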
We want to create a large and permissive attack surface on the victim host to ensure that the malware we detonate is able to run properly. At this point you can go ahead and install the older, archived, and vulnerable versions of Java, Flash, Office, IE, and Adobe Reader. In addition, you should install the Visual C++ Redistributable 2005/2010 (https://www.microsoft.com/en-us/download/details.aspx?id=3387) packages to ensure the necessary run-time components are available.
Lastly, be sure to:
- Enable viewing of hidden files, folders, and protected system files
- Turn auto-updates off
- Disable internet privacy and turn security settings to low
- Turn the host-based firewall off
- Turn off hide extensions for known file types
- Disable pop-up blocking
Now that our lab is configured we can take the first of three snapshots:
- Pre-Tools - a clean slate prior to installing any applications or malware
- Tools-Installed - static and behavioral analysis tools have been loaded on to the platform
- Malware-Installed - a saved state after malware has been executed so that we can analyze a particular sample again at a later time
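Snapshots can be taken from the VirtualBox GUI, but scripting them makes the restore-and-rerun loop quicker. Below is a small Python wrapper around the `snapshot <vm> take` and `snapshot <vm> restore` subcommands of VirtualBox's VBoxManage command-line tool. The VM name is a hypothetical placeholder. This is a sketch, not a complete lab controller.

```python
import subprocess
from typing import List

# Hypothetical VM name: use the name shown in the VirtualBox Manager for your victim guest.
VICTIM_VM = "win7-victim"


def snapshot_cmd(vm: str, action: str, snapshot: str) -> List[str]:
    """Build a VBoxManage command that takes or restores a named snapshot."""
    if action not in ("take", "restore"):
        raise ValueError("action must be 'take' or 'restore'")
    return ["VBoxManage", "snapshot", vm, action, snapshot]


def run(cmd: List[str]) -> None:
    """Run the command on the host, raising CalledProcessError if VBoxManage fails."""
    subprocess.run(cmd, check=True)
```

For example, `run(snapshot_cmd(VICTIM_VM, "take", "Pre-Tools"))` records the clean-slate state. VirtualBox will not restore a snapshot onto a running VM, so power it off first (for example with `VBoxManage controlvm <vm> poweroff`).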
Instrumentation
Before actual malware analysis can begin we need to arm ourselves with a suite of tools that will help us to debug, disassemble, monitor the registry, analyze process memory, identify file creation events, and watch network traffic. There are plenty of tools that you can use to perform these functions but some of the popular utilities and ones that I have found to be reliable are listed below:
- Malcode Analysis Pack - a great tool that installs a number of useful analysis applications.
- OllyDbg - a debugger that will help with run-time analysis
- IDA Pro - a popular disassembler that can be used for static analysis of malicious binaries
- CaptureBat - behavioral analysis tool that can monitor file, process, and registry events
- PEiD - useful lightweight program that can help identify packed executables
- Import Reconstructor - can be used when we need to patch semi-functional recovered malware
- RegShot - for diff'ing registry changes
- 010 Editor - fantastic hex editor for visually analyzing PE files
- 7-Zip - 'nuff said
- oledump - go-to tool for analyzing malicious OLE files (doc, xls, etc.)
- Sysinternals - suite of applications that can help at various stages of the investigation
Load all of these applications onto the victim Windows guest OS. At this point you can take the second snapshot. This restore point can be used when you want to quickly roll back to a clean slate to analyze a new malware sample or re-run the same sample (but perform different actions and attempt to trigger different functionality).
Disclaimer: Although running malicious code in a virtual instance is relatively safe (assuming that networking and other services are properly configured) there are several well documented vulnerabilities that allow malicious code to escape the context of a Guest OS and infect a host system. You should operate under the assumption that the worst case scenario will happen and take the appropriate precautions. The host system running the virtual malware lab environment should be purpose-driven dedicated hardware (do not do this on your personal or corporate computer). I'd also advise that it is connected on a separate LAN/Guest WiFi/rocket stick to create an additional layer of separation.
Analysis
Now that the lab is configured and we have installed the necessary tools to support and enable our investigation we will need to retrieve a suitable sample for analysis. There are a number of options here but the ones I have had experience with are:
- MWCrawler - this Python program crawls various public malware blacklists, initiates connections, and attempts to download malicious software from the listed URLs/domains. This is a great way to pull new and current samples without setting up a lot of infrastructure.
- VXCage/Viper - Viper is the successor to VXCage and is a self-described binary management framework to aid you in storing and analyzing malware samples. If you plan to operate honeypots and crawl for your own samples, Viper is a great addition to MWCrawler.
- Reverse.IT - an online service that will analyze malware samples for you, they also allow you to download samples.
- theZoo - a fairly well maintained collection of malware that you can easily download.
As I mentioned near the beginning of the post, for this article I am going to be taking a closer look at Win32/Potao which can be found here. The password to unzip is 'infected'.
The first thing I like to do is open the file in a hex editor and look at the file format headers to verify exactly what we are dealing with.
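At a glance we can see that the first two bytes of this file, '4D 5A' ('MZ'), are the DOS header signature that every Windows executable begins with. In a PE (Portable Executable) file, the 4-byte value at offset 0x3C (e_lfanew) points to the 'PE\0\0' signature. A full treatment is outside the scope of this post, but if you'd like to learn more about this frequently encountered file format, you should read this link.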
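The same header check can be scripted. The sketch below uses only the Python standard library. It confirms the 'MZ' magic and then follows the e_lfanew field at offset 0x3C to look for the 'PE\0\0' signature. A file that passes both checks is very likely a PE image, although this does not validate the rest of the headers.

```python
import struct


def is_pe(data: bytes) -> bool:
    """Return True if data has an MZ header whose e_lfanew points at a PE signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: 32-bit little-endian offset of the PE signature, stored at 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


def is_pe_file(path: str) -> bool:
    """Convenience wrapper that reads a file from disk and runs is_pe on it."""
    with open(path, "rb") as fh:
        return is_pe(fh.read())
```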
At offset 00170h we can see that the ASCII translation of this region contains the characters 'UPX0' and 'UPX1'. These are the section names the UPX packer assigns: UPX0 is reserved for the decompressed code, and UPX1 holds the compressed data together with the decompression stub. Packers compress (and sometimes encrypt) an executable and add a small stub that restores the original code in memory at run time, which makes them a convenient way to obfuscate malicious files. UPX is one of the more popular freely available packers. If you'd like to learn more about packers and manual unpacking I'd recommend this paper.
Thankfully, UPX comes with the ability to unpack a UPX-packed sample. Before we go ahead and do that, let's gather a hash and look at the strings in the PE file.
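Section names can also be read programmatically by walking the PE headers. The sketch below parses just enough of the COFF file header to locate the section table, then returns each 8-byte section name. The UPX check is only a naive heuristic. Authors can rename sections, and identification tools such as PEiD rely on byte signatures rather than section names.

```python
import struct
from typing import List


def section_names(data: bytes) -> List[str]:
    """Return the section names from a PE file's section table."""
    if len(data) < 0x40:
        raise ValueError("too short to be a PE file")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    coff = e_lfanew + 4  # the 20-byte COFF file header follows the signature
    (num_sections,) = struct.unpack_from("<H", data, coff + 2)
    (opt_size,) = struct.unpack_from("<H", data, coff + 16)  # SizeOfOptionalHeader
    table = coff + 20 + opt_size  # the section table follows the optional header
    names = []
    for i in range(num_sections):
        raw = data[table + 40 * i: table + 40 * i + 8]  # Name: first 8 bytes of each 40-byte entry
        names.append(raw.rstrip(b"\x00").decode("ascii", errors="replace"))
    return names


def looks_upx_packed(data: bytes) -> bool:
    """Heuristic: UPX names its sections UPX0, UPX1, ... unless someone renamed them."""
    return any(name.startswith("UPX") for name in section_names(data))
```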
With MAP installed we can right click the sample and select 'strings':
There's not much useful data here. That's because the only unpacked code is a loader routine, called a 'stub', that is responsible for unpacking this sample, while everything else is compressed. Strings analysis will not reveal much useful information at this point, and the compression also gives the sample extremely high entropy.
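MAP's strings and hashing options are convenient, but the same checks are easy to reproduce in a few lines. This helps when you want to understand exactly what a tool is reporting. The Python sketch below extracts printable ASCII and UTF-16LE strings, using a minimum length of 4 (the GNU strings default). It also computes Shannon entropy in bits per byte. As a rough rule of thumb, compressed or encrypted data sits close to 8.0, while ordinary code and text score noticeably lower. Treat any specific cutoff as a heuristic.

```python
import math
import re
from collections import Counter
from typing import List


def ascii_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of at least `min_len` printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def wide_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return UTF-16LE runs (printable ASCII byte followed by 0x00), common in Windows binaries."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]


def shannon_entropy(data: bytes) -> float:
    """Bits per byte: 0.0 for constant data, up to 8.0 for uniformly distributed bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())
```

Comparing `shannon_entropy` on the packed sample and on the unpacked file would be expected to show the packed version much closer to 8.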
Again, with MAP we can conveniently right click on the sample and select 'MD5 Hash':
You can take this MD5 and search for it on VirusTotal to see if the sample has previously been analyzed. An increasing problem for traditional anti-virus solutions is that it is very easy for malware authors to generate unique hashes for the same malware strain by introducing pseudo-random seed values into a file. It is trivially simple to run a piece of malware through a proprietary crypter/packer and generate a unique sample. For that reason, most malware recovered during an incident will not be identified when searching based on a hash value. More information here. In our case, the sample was recovered several years ago and has been picked up by most AV vendors and cloud-based scanning tools, so you will in fact see lots of results if you search for this hash.
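The hashing step is just as easy to reproduce outside MAP. This Python sketch computes MD5 (for lookups like the one above) and SHA-256, which many services also index. It reads files in chunks, so large samples are not loaded into memory at once.

```python
import hashlib
from typing import Dict


def hashes_of(data: bytes) -> Dict[str, str]:
    """Return the MD5 and SHA-256 hex digests of a byte string."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def file_hashes(path: str, chunk_size: int = 65536) -> Dict[str, str]:
    """Hash a file in chunks so large samples are not read into memory at once."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}
```

The point about unique hashes is easy to demonstrate: `hashes_of(b"abc")` and `hashes_of(b"abc\x00")` produce completely different digests. Appending bytes to the end of an executable typically does not change how it runs.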
Let's unpack the sample so we can take a closer look at things before detonating the file.
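The unpacking step itself is a single `upx -d` invocation. A thin Python wrapper is shown below as a sketch. It assumes the `upx` binary is on your PATH and writes the result to a new file with `-o`, so the original sample is preserved for hashing and later comparison. If an author has tampered with the UPX headers, `upx -d` may refuse the file, and manual unpacking in a debugger becomes necessary.

```python
import shutil
import subprocess
from typing import List


def upx_unpack_cmd(packed: str, unpacked: str) -> List[str]:
    """Build 'upx -d -o <unpacked> <packed>' so the original file is left untouched."""
    return ["upx", "-d", "-o", unpacked, packed]


def upx_unpack(packed: str, unpacked: str) -> None:
    """Run the unpack command, failing early if upx is not installed."""
    if shutil.which("upx") is None:
        raise FileNotFoundError("upx not found on PATH")
    subprocess.run(upx_unpack_cmd(packed, unpacked), check=True)
```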
The file has been successfully unpacked. If we rename the file and add the '.exe' extension, Windows displays its icon:
This is a classic social-engineering trick: the executable carries a document icon so that a user runs it without noticing the file extension. You can read about how to do this here.
A quick peek at the unpacked code in a disassembler (such as IDA Pro) shows a small number of routines that are primarily designed to do nothing: they waste processor cycles and help evade analysis engines. This is evidenced by nonsensical increment and decrement operations on the same register that serve no purpose:
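If you export the disassembly as a text listing (IDA can write one via File > Produce file), a few lines of Python can flag this pattern automatically. The sketch below only looks for an `inc` immediately followed by a `dec` of the same register, or the reverse. It assumes Intel-syntax listing lines and simply searches each line for the mnemonic. It will miss junk code that is spread out or that uses other no-op idioms.

```python
import re
from typing import List, Tuple

# Matches "inc eax", "dec     ecx", etc.; memory operands like "dword ptr [...]" do not match.
_INCDEC = re.compile(r"\b(inc|dec)\s+([a-z]{2,3})\b", re.IGNORECASE)


def junk_incdec_pairs(listing: List[str]) -> List[Tuple[int, str]]:
    """Return (line index, register) for adjacent inc/dec pairs that cancel each other out."""
    hits = []
    for i in range(len(listing) - 1):
        a, b = _INCDEC.search(listing[i]), _INCDEC.search(listing[i + 1])
        if a and b:
            op_a, reg_a = a.group(1).lower(), a.group(2).lower()
            op_b, reg_b = b.group(1).lower(), b.group(2).lower()
            if reg_a == reg_b and op_a != op_b:
                hits.append((i, reg_a))
    return hits
```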
Before we execute the file, start CaptureBat and ensure Wireshark is running on the Kali host. Now, when we run the program, visually the executable disappears and a new file appears with the same name. A new window opens and presents us with a document:
If we analyze the document, we can see that the executable wrote a new, legitimate decoy document to disk and opened it. This file has no MZ header. Again, this is a classic tactic designed to trick the user into thinking that the file opened as expected. If a user double-clicked a Word document and nothing happened, they would be more suspicious.
The CaptureBat log file will reveal more information about what transpired during execution.
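To confirm what a dropped or decoy file really is regardless of its extension, compare its leading "magic" bytes against known signatures. The table below covers only a handful of formats relevant to this sample and is illustrative, not exhaustive. A dedicated tool such as the Unix `file` command recognizes far more.

```python
# Leading "magic" bytes for a few formats common in dropper/decoy workflows.
MAGIC = [
    (b"MZ", "Windows executable (MZ/PE)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy .doc/.xls/.ppt)"),
    (b"PK\x03\x04", "ZIP container (including .docx/.xlsx/.pptx)"),
    (b"{\\rtf", "Rich Text Format"),
    (b"%PDF", "PDF document"),
]


def identify(data: bytes) -> str:
    """Return a label for the first matching signature, or 'unknown'."""
    for magic, label in MAGIC:
        if data.startswith(magic):
            return label
    return "unknown"
```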
You can see some files are written to the Temp directory for my user profile. I'd like to take a closer look at the file 'utxm'.
Initially the file was hidden. After changing folder options and opening the file in 010 we see that it is also a PE file.
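Because the dropped file was hidden, it can help to spot this kind of artifact without relying on Explorer settings. One way is to list recently modified files in the temp directory along with their hidden attribute. The sketch below assumes Python is available where you run it, either on the analysis guest or against a copied directory. `st_file_attributes` is only populated on Windows, so the hidden flag is reported as False elsewhere. CaptureBat's log remains the better record of what was actually written.

```python
import os
import tempfile
import time
from typing import Dict, List, Optional

FILE_ATTRIBUTE_HIDDEN = 0x2  # Windows file attribute bit for hidden files


def recent_files(minutes: float = 10, directory: Optional[str] = None) -> List[Dict]:
    """List regular files in `directory` (default: the temp dir) modified in the last `minutes`."""
    directory = directory or tempfile.gettempdir()
    cutoff = time.time() - minutes * 60
    results = []
    for entry in os.scandir(directory):
        if not entry.is_file(follow_symlinks=False):
            continue
        st = entry.stat(follow_symlinks=False)
        if st.st_mtime < cutoff:
            continue
        attrs = getattr(st, "st_file_attributes", 0)  # only present on Windows
        results.append({
            "path": entry.path,
            "size": st.st_size,
            "hidden": bool(attrs & FILE_ATTRIBUTE_HIDDEN),
        })
    return sorted(results, key=lambda r: r["path"])
```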
If we run strings against this file (this time in IDA) we have a lot more information to work with:
At this point there are many different directions that your investigation can go. Of course, you can try to map out all functionality within the executable by starting from the entry point and walking through each CALL/JMP operation. I prefer to tackle specific problems first, and strings can be used as a launching point. For example, rundll32 may be used to load a DLL and execute one of its exported functions. CryptStringToBinaryA converts encoded strings (such as Base64 or hex) into raw bytes, so it may be part of the payload's decoding or decryption routines. NtEnumerateValueKey enumerates the values of a registry key. Malware may call it while checking or managing persistence entries, and rootkits commonly hook it to hide their own registry values. The string 123456789abcdef may be used as a reference table in a decryption routine. You may want to walk down any one of these paths to start your investigation and see where it leads. If you are working an incident, then time is of the essence, and you should look for function calls to CreateMutexA and identify hard-coded strings that can be used as byte patterns in a Yara rule. Our goal during a breach is to find reliable indicators of compromise with a low false-positive rate so that responders can use this data to scan additional hosts in the environment for infection.
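As a starting point for that hunt, the strings you confirm can be turned into a YARA rule. The Python sketch below simply generates rule text. Every string is matched in both ASCII and UTF-16LE form (`ascii wide`), and the condition requires an MZ header plus a minimum number of matching strings. The rule name and the choice of strings are placeholders. Generic API names are common in legitimate software, so prefer unique mutex names, paths, or encoded blobs found during analysis. Test the rule against known-clean files before scanning hosts.

```python
from typing import List


def yara_escape(s: str) -> str:
    """Escape backslashes first, then double quotes, for a YARA text string."""
    return s.replace("\\", "\\\\").replace('"', '\\"')


def build_yara_rule(name: str, strings: List[str], min_matches: int = 2) -> str:
    """Build a YARA rule that fires on MZ files containing at least `min_matches` strings."""
    if not 1 <= min_matches <= len(strings):
        raise ValueError("min_matches must be between 1 and the number of strings")
    lines = ["rule " + name, "{", "    strings:"]
    for i, s in enumerate(strings):
        # 'ascii wide' matches both the 8-bit and the UTF-16LE form of the string.
        lines.append('        $s%d = "%s" ascii wide' % (i, yara_escape(s)))
    lines += [
        "    condition:",
        # 0x5A4D is 'MZ' read as a little-endian 16-bit integer at offset 0.
        "        uint16(0) == 0x5A4D and %d of them" % min_matches,
        "}",
    ]
    return "\n".join(lines)
```

For example, `print(build_yara_rule("potao_strings_sketch", ["123456789abcdef", "CryptStringToBinaryA"]))` emits a two-string rule. The rule name must be a valid YARA identifier (letters, digits, underscores, not starting with a digit).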
You may also want to map out network communication and anti-analysis techniques using a disassembler:
Atif Mushtaq has a nice write-up here describing how GetTickCount and other behaviors can be used to identify anti-VM and anti-debugging techniques employed by malware. Once you can recognize this functionality, you'll be able to set breakpoints and patch instructions while executing a sample in a debugger such as OllyDbg, effectively bypassing these anti-analysis techniques.
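Before stepping through the code, it can help to triage the extracted strings for APIs and artifact names commonly associated with anti-debugging, timing checks, and VM detection. That tells you where to set breakpoints. The indicator list below is a small illustrative sample chosen for this sketch, not a complete catalogue. Every one of these names also appears in benign software (GetTickCount in particular is ubiquitous), so treat a hit as a pointer for manual review rather than a verdict.

```python
from typing import Dict, Iterable, List

# Illustrative, far-from-exhaustive indicator list (assumption: tune it for your own lab).
INDICATORS: Dict[str, List[str]] = {
    "anti-debug": ["IsDebuggerPresent", "CheckRemoteDebuggerPresent",
                   "NtQueryInformationProcess", "OutputDebugString"],
    "timing": ["GetTickCount", "QueryPerformanceCounter"],
    "anti-vm": ["VBoxService.exe", "VBoxTray.exe", "vmtoolsd.exe", "VMware", "VBOX"],
}


def triage(strings: Iterable[str]) -> Dict[str, List[str]]:
    """Return, per category, the indicators found (case-insensitively) in the given strings."""
    lowered = [s.lower() for s in strings]
    found: Dict[str, List[str]] = {}
    for category, needles in INDICATORS.items():
        hits = [n for n in needles if any(n.lower() in s for s in lowered)]
        if hits:
            found[category] = hits
    return found
```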
While I haven't gone too deep into the analysis side of things, I hope that this post at least helps you to understand the general workflow and components required to manually analyze malicious software!
----
Edit: I wanted to update this post because I got some thoughtful questions on Reddit regarding this article and thought I'd share one of my responses here. To summarize, someone asked me what the real value of malware analysis is, where the jobs are for people with this skillset, and why they had some issues in the hiring process. My response is below.
In terms of demand, I don't think there is anyone looking specifically for malware researchers as an exclusive skillset. The market (despite the growing number of job opportunities overall in security) is growing increasingly complex and competitive at the top (re: research) end of things. There are a lot of people spending a lot of time and money devising complex solutions to very difficult problems in this industry.
Why is this? One of the drivers is that unlike other sciences, security has a unique component in that the rules governing our field of study have undergone significant change. This is exemplified quite well by the back and forth game of cat and mouse between security product vendors and 'hackers'. Human ingenuity is not perfect, and quite often it only takes the determination of another human to circumvent a particular security 'control' or 'feature'.
So what does this increase in complexity result in, for 'malware research' specifically? I think primarily two things: as an analyst you can be hyper-specialized and focus on increasingly complex but narrow subject matter to add value, or you can use this skill to complement your overall knowledge and add value in other ways.
This is probably better illustrated with examples.
Yes - anti-virus researchers still employ a lot of malware analysts (and support analysts also need some of this skillset, along with some IR). There are many approaches they use: some work on unpacking samples, some run executables and try to dump processes from memory. Nearly all of them nowadays work on finding ways to automate analysis and verify successful analysis while identifying outliers (which subsequently require closer manual inspection). The volume of data necessitates this approach. But you can see here that this is a combination of skillsets. Being a good programmer on its own is OK. Being a good malware analyst on its own is OK. But combining industry-specific knowledge (malware analysis) with a particular skill (programming) creates greater overall value and allows you to solve bigger problems. Threat research also helps to drive the product roadmap in an industry that is constantly undergoing change. You have to have your finger on the pulse to understand where things are going, and one of the best ways to do that is through vulnerability and malware research. The better I understand a problem, the better I can solve it.
Another example - incident response is a very particular skill that some would argue has a very different focus than that of a pure-play malware analyst. You mentioned SIEM as well - this is a complex subject with a lot of knowledge requirements (networking stack, hardware provisioning, logging methods, log formats, solving specific SIEM problems like idle devices, learning the ins and outs of the specific SIEM application, compliance reporting requirements, etc.). But I often see SIEM programs improperly implemented due to lagging knowledge and a fundamental misunderstanding of how to effectively monitor one's environment. A malware researcher with SIEM capability can deliver a lot of value to a SIEM program because they have a better understanding of what to look for. An incident responder is served well by having this knowledge too.
So you see it's the combination of skill-sets that is generating value. Yes there is still a market for people who only know SIEM, yes there is still a market for people who only know Malware Analysis. But you are going to add a lot of value if you do both well, and you will be in demand.
Alternatively - I mentioned developing a niche skill set. As complexity increases, there are narrowing areas of focus, especially in the research field, that really require undivided attention. Think hardware reversing, iOS/OS X kernel exploitation, etc. These are important areas of research that help to identify new vulnerable platforms and products that need to be fixed. Malware research by extension also grows increasingly complex and expands into these areas. So we will always need the pure-play, highly focused ninjas to some extent.
In terms of the interviews you mentioned, of course all employers will be looking for a specific skill set, and yours may not be entirely applicable, but your assessment is correct. SIEM is arguably much easier to pick up than assembly. A good employer will recognize a desire to learn and good technical aptitude, and will understand how different skills have a wonderful crossover effect.
You mentioned that you have shown initiative cracking games - if an employer took this negatively try phrasing it differently or signing up for specific courses/certification programs. They can help lend legitimacy.
Research gives us information about data, which in turn allows us to derive analytics and information.
From a career track perspective, you will find people who have malware analysis skills in all sorts of jobs, from front-line analysts to security product developers. Think of it as a skill to complement another area of your development in the security space.