Introduction
Many articles, including my own, make references to code obfuscation. This article takes a look at what this actually is, as well as why it is often used by attackers to hide malicious code.
What is obfuscation?
Obfuscation is the process of converting something to a form where its meaning and purpose are less apparent. When applied to computer source code, this can be done for many reasons: sometimes as a challenge, sometimes even for art! However, it can also be used for more sinister purposes, which is what we’ll investigate today.
Most coders will tell you that writing clean, understandable, well-structured code is of utmost importance. As a coder myself I wholeheartedly agree! There are times, however, when we must set these rules aside. One such time is during minification or uglification.
Minification and uglification
While not technically obfuscation, scripts that are minified are often very difficult for a human reader to understand. This has a valid purpose; by removing all of the aids to human coders, file sizes can be reduced dramatically. This is important on the web in particular as more verbose code is larger and takes longer to download. You will often see JavaScript libraries with a name such as jquery.min.js. This is a common convention and shows that the JavaScript code in this file has been through a minification process and is not meant to be human-readable. While minification makes code difficult to read it does not change the way in which a program works. For example, take the following script: -
/**
 * A function which greets the user with an alert()
 *
 * @param {string} username
 * @returns {void}
 */
function greet_user(username) {
    alert('Hi ' + username + "!");
}
greet_user("Simon");
This code contains comments and friendly function and variable names, so it is easy to understand. A minified version might look like this: -
function gu(e){alert("Hi "+e+"!")}gu("Simon");
This version does exactly the same thing, but its meaning is a lot less clear. This is an extremely simple example, but imagine how a large JS library would look when minified.
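To check the claim that both versions behave identically, here is a minimal sketch that runs them side by side. It assumes we are outside a browser, so alert() is replaced with a hypothetical stand-in that records each message in an array (messages) instead of showing a dialog.

```javascript
// Stand-in for the browser's alert(): record the text instead of showing a dialog.
const messages = [];
const alert = (text) => messages.push(text);

// Original, readable version.
function greet_user(username) {
    alert('Hi ' + username + "!");
}

// Minified version of the same function.
function gu(e){alert("Hi "+e+"!")}

greet_user("Simon");
gu("Simon");

// Both calls produced the same message.
console.log(messages[0] === messages[1]); // true
```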
Minification makes code harder to read only as a side effect; its purpose is to remove elements that are unnecessary, at least from a computer’s point of view. The malicious form of obfuscation that we will be investigating here, however, serves an entirely different purpose, and indeed often makes the resulting scripts larger than the original.
Encryption
Obfuscation is sometimes confused with encryption, but they are fundamentally different things. Encryption hides a payload by converting it to a format that is seemingly random and which can only be decoded using a secret key. Obfuscation, on the other hand, simply hides the payload by making it difficult to read, typically by converting it from one representation to another. One important difference is that an encrypted script would not be executable directly without first decrypting it with a secret key. Obfuscated scripts, on the other hand, should be executable directly and should produce exactly the same results as the original script.
This does not mean that encryption is never used to hide malicious code; it certainly is. If code is encrypted, however, the attacker needs a way to decrypt it before executing it on the target computer, which means that a much higher level of sophistication is required for this type of attack.
Types of obfuscation
There are a number of techniques used to obfuscate scripts, some of which are language-specific and some of which will work in almost any language.
String escape sequences
Say that an attacker wants to redirect the user to a new URL, something like "http://evil.com/malware.php". The attacker will want to include this URL in a malicious script somewhere, but leaving a string like that in plain sight is fairly conspicuous. Now consider the following string: "\x68\x74\x74\x70\x3a\x2f\x2f\x65\x76\x69\x6c\x2e\x63\x6f\x6d\x2f\x6d\x61\x6c\x77\x61\x72\x65\x2e\x70\x68\x70". This, perhaps surprisingly, is exactly the same string.
Strings are simply lists of characters, and internally each character is just a number. Computers use an encoding scheme to translate these numbers to actual characters. This used to be ASCII, but it is now more common to use UTF-8 so that more characters can be encoded if necessary. When writing a script we don’t have to know what these numbers are; we can simply use characters such as “abc” and the script interpreter knows how to convert them. There are some characters that we might want to use for which a printable symbol does not exist, such as control characters. In order to insert such a character into a string, most languages offer escape sequences, allowing the character’s code to be entered directly. So in the previous example, the standard printable characters have been replaced by hexadecimal escape sequences which represent those same characters. To the computer, both examples end up as the same sequence of numbers, but to a human reader the meaning of the string is a lot harder to discern; it has been obfuscated.
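The equivalence is easy to check for yourself. The sketch below is written in JavaScript, which understands the same \xNN escapes as a double-quoted PHP string. The helper toHexEscapes() is a hypothetical name of my own, and it assumes the input contains only single-byte (ASCII) characters.

```javascript
// Hypothetical helper: rewrite every character as a \xNN escape sequence.
// Assumes the input contains only single-byte (ASCII) characters.
function toHexEscapes(text) {
    return Array.from(text)
        .map((ch) => "\\x" + ch.charCodeAt(0).toString(16).padStart(2, "0"))
        .join("");
}

const plain = "http://evil.com/malware.php";
const escaped = "\x68\x74\x74\x70\x3a\x2f\x2f\x65\x76\x69\x6c\x2e\x63\x6f\x6d\x2f\x6d\x61\x6c\x77\x61\x72\x65\x2e\x70\x68\x70";

// The interpreter sees the two literals as the same string.
console.log(plain === escaped); // true

// Generating the obfuscated form of a short string.
console.log(toHexEscapes("http")); // \x68\x74\x74\x70
```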
Base64
Sometimes a binary file needs to be communicated over a channel that can only accept standard text. An example of such a situation is e-mail; attachments can be binary files such as images but the e-mail format can only contain text. Therefore we need a way to encode this binary data using only printable characters. One such encoding scheme is base64.
Many languages, including PHP, contain functions (base64_decode() and base64_encode()) which can be used to decode and encode base64 data. Therefore another common technique for obfuscating malicious code is to encode it as base64 and include it in another script as a string. The container script can then decode this string with base64_decode() and then pass the resulting PHP script into eval(), which will execute it. Perhaps this string could also be an encoded executable file which the script then saves to disk and executes.
Imagine a script which is commonly used to exploit a web server. As it is commonly used, most people know what it looks like, its file size, its checksum, etc. Therefore it is very conspicuous and can be detected easily. One way to avoid this simple detection is to encode this script using base64, and then decode and execute it when needed as discussed.
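The encode, decode and execute steps can be sketched as follows. This is JavaScript rather than PHP, and it assumes a Node.js environment where Buffer is available; in PHP the equivalent calls would be base64_encode(), base64_decode() and eval(). The payload here is a harmless arithmetic expression, and the variable names are purely illustrative.

```javascript
// A harmless stand-in for the code an attacker would want to hide.
const payload = "1 + 2";

// Encoding step, done once by whoever prepares the container script.
const encoded = Buffer.from(payload, "utf8").toString("base64");

// Container script: decode the string, then hand the result to eval().
const decoded = Buffer.from(encoded, "base64").toString("utf8");
const result = eval(decoded);

console.log(encoded); // only base64 characters, the expression is no longer visible
console.log(result);  // 3
```

Note that the container script never contains the payload in readable form, which is exactly what defeats a simple keyword or checksum search.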
Dynamically building code to execute
Scripting languages are very versatile, even allowing code to be generated and executed within a script. We’ve already discussed how base64 can be used to encode a PHP script which can then be executed by the eval() function. But why stop with base64? If we can execute any string as a script, why not use the language itself to construct this script?
There are many different ways of building a script dynamically; the only real limitation is the inventiveness of the coder. For example, we could define a string like this in PHP: -
$s = "s4na_cdob;qce26)*zja(";
We could then execute the following code: -
eval("{$s[8]}{$s[3]}{$s[0]}{$s[12]}{$s[14]}{$s[1]}{$s[4]}{$s[6]}{$s[12]}{$s[5]}{$s[7]}{$s[6]}{$s[12]}{$s[20]}{$s[15]}{$s[9]}");
Which is equivalent to: -
eval("base64_decode();");
So here we have managed to get the base64_decode() function to be executed (as an example) without resorting to string escape sequences or base64-encoded code. If we were wise to the first two techniques then we might be able to spot them, but this third technique is worse as the sequence is decided entirely by the attacker. We might note down the base64-encoded version of a string or the specific series of escape sequences so that we can spot them more easily, but here we can’t blindly follow any such rule.
Of course this is only one example, and many other dynamically generated code tricks exist. What’s important to note, however, is that this is all perfectly valid code; it has to be, otherwise the obfuscated code would not work. So even though this might be difficult for us to read, we can with time work through the code (as a computer would) in order to de-obfuscate it and determine its purpose.
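Working through the PHP example by hand is just a matter of looking up each index in the string. The JavaScript sketch below does that lookup for us without executing the result. The helper indicesFor() is a hypothetical addition of mine, showing how an attacker might generate such an index sequence for any lookup string.

```javascript
// The lookup string and index sequence from the PHP example.
const s = "s4na_cdob;qce26)*zja(";
const indices = [8, 3, 0, 12, 14, 1, 4, 6, 12, 5, 7, 6, 12, 20, 15, 9];

// De-obfuscate: look up each index in turn. Nothing is executed here.
const code = indices.map((i) => s[i]).join("");
console.log(code); // base64_decode();

// Hypothetical helper: find an index sequence that spells out `target`
// using only characters from `lookup`.
function indicesFor(target, lookup) {
    return Array.from(target).map((ch) => {
        const i = lookup.indexOf(ch);
        if (i === -1) {
            throw new Error("character not in lookup string: " + ch);
        }
        return i;
    });
}

console.log(indicesFor("eval", "xlzvyea")); // indices 5, 3, 6, 1
```

Because the lookup string can be anything, there is no fixed signature to search for, which is the point made above.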
Obfuscation as a service
One thing that should be apparent from the above techniques is that obfuscated scripts are not just difficult for a human being to understand; they are also difficult to write, especially with longer scripts! Just as with minification, there are online services which exist to do the dirty work for us.
One such service is the Free Online PHP Obfuscator (FOPO). Why not give it a try with your own scripts and then have a go at reversing the result? If you’re like me then you’ll consider this a fun challenge, but then again perhaps you have better ways of spending your time :)
Why is obfuscation used?
So why go to all the effort of obfuscating malicious code? Surely the most important thing is to get the code onto the target server in a way that it can be executed? Well, this is of course the ultimate goal for an attacker, but most system administrators are wise to such tactics. As mentioned previously, it is a simple matter to search through files for a keyword or for a known malicious piece of code. So obfuscation is used primarily to hide this malicious code, making scanning more difficult.
It should be pointed out, however, that this obfuscation in itself can be conspicuous; it should be clear to anybody with any coding experience that the examples above do not look like ordinary code. For example, why would a normal script bother to escape a string of normal printable characters, especially when it takes more space by doing so? Once you know what to look for, these techniques can make a malicious script stand out like a sore thumb.
For those who are not experienced coders, however, obfuscation can certainly cause confusion. There are circumstances where legitimate scripts will contain base64-encoded data, strings made entirely of escape sequences and other such suspicious code. This does not mean that they have been compromised, and removing this code will stop the script from working as expected. So obfuscation can be effective at hiding malicious code amongst legitimate code.
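As a rough illustration of knowing what to look for, the JavaScript sketch below flags a few of the tell-tale signs discussed in this article. The patterns, the thresholds (eight escapes in a row, forty base64-looking characters) and the function name findSuspiciousPatterns() are all assumptions of mine, not a tested detection rule set. A match only means the file deserves a closer look.

```javascript
// Hypothetical heuristics; the patterns and thresholds are assumptions.
const checks = [
    { name: "run of hex escape sequences", pattern: /(?:\\x[0-9a-fA-F]{2}){8,}/ },
    { name: "eval() call", pattern: /\beval\s*\(/ },
    { name: "base64_decode() call", pattern: /\bbase64_decode\s*\(/ },
    { name: "long base64-looking string", pattern: /["'][A-Za-z0-9+\/]{40,}={0,2}["']/ },
];

// Return the names of all checks whose pattern appears in the source text.
function findSuspiciousPatterns(source) {
    return checks
        .filter((check) => check.pattern.test(source))
        .map((check) => check.name);
}

// A harmless sample that uses the decode-then-eval pattern.
const sample = 'eval(base64_decode("ZWNobyAnaGknOw=="));';

// Flags the eval() and base64_decode() calls.
console.log(findSuspiciousPatterns(sample));
```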
Conclusion
We have explored some of the main techniques of obfuscation in this article. Hopefully this has helped you to identify scripts that could contain hidden malicious code. We have also shown that obfuscation does not change the functionality of a script, so however heavily a script is obfuscated, its purpose can still be ascertained, albeit with some work.
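Some of that work can be done without ever running the suspicious code. The JavaScript sketch below decodes hexadecimal escape sequences and base64 text purely as data. The function names are hypothetical, and decodeBase64() assumes a Node.js environment where Buffer is available.

```javascript
// Replace each \xNN escape with the character it represents.
// The input is treated as plain text; nothing is executed.
function decodeHexEscapes(source) {
    return source.replace(/\\x([0-9a-fA-F]{2})/g, (match, hex) =>
        String.fromCharCode(parseInt(hex, 16))
    );
}

// Decode base64 text back into a readable string (assumes Node.js Buffer).
function decodeBase64(text) {
    return Buffer.from(text, "base64").toString("utf8");
}

// String.raw keeps the backslashes, as if the text had been read from a file.
const obfuscated = String.raw`\x68\x74\x74\x70\x3a\x2f\x2f\x65\x76\x69\x6c\x2e\x63\x6f\x6d`;
console.log(decodeHexEscapes(obfuscated)); // http://evil.com
```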
While writing articles in this blog, I try to de-obfuscate scripts wherever possible so that it is easier to understand what they are doing. This is not a magic trick; any obfuscated script can be decoded in the same way, and I encourage anyone who is interested in coding to try this. Feel free to share any de-obfuscated scripts with me and I will share them here.
Hopefully this article will help somebody; as usual feel free to contact me with any comments, questions or suggestions.