RC4 Crypto Usage in Malware

Intro

During a deep-dive analysis of a recent sample, another use of RC4 came up, and I thought it was a good example of how the cipher appears in malware. The RC4 encryption algorithm is encountered quite often in malware due to its ease of use, its ability to hide strings and other information, and the way it lets authors quickly change the hashes of their samples (sometimes changed dynamically per infection).

The well-known Wikipedia page with the pseudo-code is here:
https://en.wikipedia.org/wiki/RC4

I translated the pseudo-code on the wiki into Python that can be used to test data and keys:
rc4.py

The goal is to be able to spot RC4 very quickly. Because its KSA/PRGA loops are so small, it is typically easy to recognize. The examples below walk through a specific case to show what it looks like in a real-world sample.

The malware sample used in this blog post

The KSA and PRGA functions

The key-scheduling algorithm (KSA) and the pseudo-random generation algorithm (PRGA) may be implemented together in a single function or split into two separate functions. In this sample they are in a single function.

The RC4 encryption can be thought of as two phases:
1) Generating the S-box array by initializing it with a key
2) Using the S-box array as a stream cipher to apply onto any data you want to encrypt or decrypt
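
The two phases above can be sketched in a few lines of Python. This is a minimal illustration written from the Wikipedia pseudo-code, not the rc4.py referenced earlier, and the function name `rc4` is just a label chosen for this example. The key and plaintext are the public test vector from the Wikipedia page.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with RC4 (the same call does both)."""
    if not key:
        raise ValueError("key must not be empty")

    # Phase 1 (KSA): build the S-box and scramble it with the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]

    # Phase 2 (PRGA): generate one keystream byte per data byte and XOR it in.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex())          # bbf316e8d940af0ad3
print(rc4(b"Key", ciphertext))   # b'Plaintext'
```

Because the cipher only XORs a keystream onto the data, running the same function with the same key over the ciphertext returns the plaintext.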

Observing the call into the function

The key pieces of information that the RC4 routine needs are the data, the length of the data, and the key that will be used to initialize the stream cipher. When debugging live, we can see all three at the call into the function: ecx holds a pointer to the MZ byte at the start of the data that will be encrypted, edx holds the length of this data, and the key has been identified in [esp], with its bytes highlighted in the Dump 1 window.

Key-scheduling algorithm (KSA)

KSA identity permutation (initialization)

Pseudo-code

for i from 0 to 255
    S[i] := i
endfor

When looking for the presence of RC4 there are a few indicators to look for:
1) Loops for 256 iterations (look for compares of 100h or sometimes FFh)
2) An array being populated with the value of a counter (0, 1, 2, 3, …)

When debugging live, we can see the allocated buffer filled with the series of bytes from 00 to FF. In this algorithm, the array of bytes is named “S” and is often referred to as the “S-box”.
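
If you dump a suspected buffer from the debugger, a quick check like the sketch below can confirm what you are looking at. The helper names are hypothetical, and the code assumes the S-box is stored as 256 single bytes. Some implementations store each entry in a wider integer, in which case the dump would need to be unpacked first.

```python
def looks_like_fresh_sbox(buf: bytes) -> bool:
    """True if the first 256 bytes are 00, 01, ..., FF in order (before key mixing)."""
    return len(buf) >= 256 and bytes(buf[:256]) == bytes(range(256))


def looks_like_keyed_sbox(buf: bytes) -> bool:
    """True if the first 256 bytes hold every byte value exactly once, in any order."""
    return len(buf) >= 256 and sorted(buf[:256]) == list(range(256))


dump = bytes(range(256))             # stand-in for a memory dump
print(looks_like_fresh_sbox(dump))   # True
print(looks_like_keyed_sbox(dump))   # True
```

The second check is useful later on: the key-mixing loop only swaps entries, so the array stays a permutation of 00 to FF even after it has been scrambled.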

KSA loop to mix in key bytes

Pseudo-code

j := 0
for i from 0 to 255
    j := (j + S[i] + key[i mod keylength]) mod 256
    swap values of S[i] and S[j]
endfor

You now want to look for a second loop across the same array that:
1) Loops for 100h iterations like before
2) References bytes from the key

This second loop scrambles the S-box into the final array used by the PRGA, which generates the bytes that are XORed with the data to encrypt or decrypt it.
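
As a rough Python equivalent of the two KSA loops, the sketch below builds the identity array and then mixes in the key. The function name `ksa` and the sample keys are chosen only for illustration.

```python
def ksa(key: bytes) -> list[int]:
    """Return the 256-entry S-box after key scheduling."""
    if not key:
        raise ValueError("key must not be empty")
    s = list(range(256))           # first loop: S[i] = i
    j = 0
    for i in range(256):           # second loop: mix in the key bytes
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s


sbox = ksa(b"Key")
print(sorted(sbox) == list(range(256)))   # True: still a permutation of 0..255
print(sbox == ksa(b"Wiki"))               # False: a different key gives a different S-box
```

The `i % len(key)` index is the part that shows up in assembly as a reference to the key bytes, usually alongside a division or a counter that resets at the key length.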

Pseudo-random generation algorithm (PRGA)

Pseudo-code

i := 0
j := 0
while GeneratingOutput:
    i := (i + 1) mod 256
    j := (j + S[i]) mod 256
    swap values of S[i] and S[j]
    K := S[(S[i] + S[j]) mod 256]
    output K
endwhile

The PRGA is where the magic happens: it generates the keystream bytes that are XORed with your data bytes.

There are several indicators to look for when trying to identify this in the assembly:
1) This will be a third loop, but it runs for the length of the data rather than 100h iterations
2) The loop will be iterating over each byte in the data
3) At the very end of the loop you should see an XOR operation between the generated PRGA byte and a byte from the data

As this loop iterates you can watch the plaintext bytes in memory slowly encrypt.
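
The same effect can be reproduced outside the debugger. The sketch below is an illustration under assumptions: `prga` and `keyed_sbox` are hypothetical helper names, and the key and data are the Wikipedia test vector rather than anything from the sample. It XORs the buffer in place one byte at a time and prints it after each step.

```python
from typing import Iterator


def keyed_sbox(key: bytes) -> list[int]:
    """Compact KSA, included only so this example runs on its own."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s


def prga(sbox: list[int]) -> Iterator[int]:
    """Yield keystream bytes indefinitely from a keyed S-box (works on a copy)."""
    s = list(sbox)
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]


buf = bytearray(b"Plaintext")
keystream = prga(keyed_sbox(b"Key"))
for pos in range(len(buf)):
    buf[pos] ^= next(keystream)    # the XOR at the end of each loop iteration
    print(buf.hex(" "))            # one more byte is encrypted on every line

# After the loop, buf holds bb f3 16 e8 d9 40 af 0a d3.
```

Each printed line has one more encrypted byte at the front and untouched plaintext behind it, which is the pattern you see in the dump window while stepping through the loop.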

Conclusion

Since RC4 is used so heavily in malware, it is important to be able to identify it quickly during static analysis. Fortunately, the very distinct sequence of 100h loops usually makes it easy to identify and confirm.

This post is licensed under CC BY 4.0 by the author.