
## Decoding a File Encoded Using an Unknown Encoding Algorithm/Methodology

**Author**: 
<br>
Twitter: https://twitter.com/MalwareRE
<br>
GitHub: https://github.com/MalwareRE
<br><br>
**References**:
<br>
https://twitter.com/cyb3rops/status/1270402099025715200
<br>
https://twitter.com/MalwareRE/status/1270484644178386944
<br><br>
**Notebooks Hosted on Binder:**
<br>
https://mybinder.org/v2/gh/malwarere/notebooks/master
<br><br>
**Summary:**
This notebook walks through the thought process and two approaches used to decode an encoded file shared by Florian Roth (@cyb3rops) in the following tweet: https://twitter.com/cyb3rops/status/1270402099025715200. After I decoded the file and shared the XOR-ADD decoding key (https://twitter.com/MalwareRE/status/1270484644178386944), multiple folks asked me to share the approach and thought process that went into decoding it. I tried to fit my approach into a series of tweets but decided to capture everything in a single interactive notebook instead (be sure to use the Binder link above, also available in my GitHub repo, to launch an interactive version of this notebook). I was not able to carve out enough time to perfect the approach, code and contents shared here (I will hopefully find time to revisit this notebook), so treat everything as raw, imperfect and optimizable; feedback, suggestions and comments are always welcome and appreciated. With that being said, let's get started!
<br><br>


In [1]:
# we will use this function later in this notebook
import pandas as pd
import matplotlib

def hex_dump(data):
    #Hex Dump
    line_num = 0
    line_width = 46
    for byte_array in [data[i:i+16] for i in range(0, len(data), 16)]: #or use islice
        s_hex = " ".join([f"{b:02x}" for b in byte_array])    
        s_ascii = "".join([chr(b) if 32 <= b < 127 else "." for b in byte_array])  # 127 (DEL) is not printable
        print(f"{line_num * 16:08x}  {s_hex:<{line_width}}  {s_ascii}")
        line_num += 1

<br>
First things first, let's check out the header of the file shared by Florian.

In [2]:
ciphertext = (b'\xA9\xA4\x6E\xFE\xF3\xFE\xFE\xFE\xF2\xFE\xFE\xFE\xFF\xFF\xFE\xFE'
b'\x46\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xBE\xFE\xFE\xFE\xFE\xFE\xFE\xFE'
b'\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE'
b'\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\x1E\xFE\xFE\xFE'
b'\xE8\xDF\x44\xE8\xFE\x42\xF5\x29\xDD\x46\xFD\xAA\x29\xDD\xA2\x96'
b'\x95\x83\xDE\x8E\x8C\x8F\x97\x8C\x9D\x89\xDE\x93\x9D\x88\x88\x8F'
b'\x82\xDE\x9C\x91\xDE\x8C\x81\x88\xDE\x95\x88\xDE\xB2\xAF\xA3\xDE'
b'\x89\x8F\x92\x91\xC8\xE9\xE9\xF4\xD2\xFE\xFE\xFE\xFE\xFE\xFE\xFE'
b'\x13\xFC\x59\xE2\x57\x93\x03\xB7\x57\x93\x03\xB7\x57\x93\x03\xB7'
b'\x48\xDB\x90\xB7\x40\x93\x03\xB7\x48\xDB\x8E\xB7\x29\x93\x03\xB7'
b'\x48\xDB\x87\xB7\x61\x93\x03\xB7\x7E\x51\x76\xB7\x4A\x93\x03\xB7'
b'\x57\x93\x0C\xB7\x23\x93\x03\xB7\x48\xDB\x85\xB7\x50\x93\x03\xB7'
b'\x48\xDB\x9D\xB7\x50\x93\x03\xB7\x48\xDB\x9C\xB7\x50\x93\x03\xB7'
b'\xAC\x95\x93\x96\x57\x93\x03\xB7\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE'
b'\xAE\xB1\xFE\xFE\xAA\xFD\xF1\xFE\xCF\x61\x16\xA7\xFE\xFE\xFE\xFE'
b'\xFE\xFE\xFE\xFE\x1E\xFE\xFC\xDD\xEB\xFD\xF5\xFE\xFE\x1C\xFE\xFE'
b'\xFE\x90\xFE\xFE\xFE\xFE\xFE\xFE\x2C\xC6\xFE\xFE\xFE\xEE\xFE\xFE'
b'\xFE\xFE\xFD\xFE\xFE\xFE\xFE\xEE\xFE\xEE\xFE\xFE\xFE\xFC\xFE\xFE'
b'\xF1\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xF1\xFE\xFE\xFE\xFE\xFE\xFE\xFE'
b'\xFE\x5E\xFD\xFE\xFE\xF2\xFE\xFE\x9A\x87\xFD\xFE\xFC\xFE\xBE\xFD'
b'\xFE\xFE\xEE\xFE\xFE\xEE\xFE\xFE\xFE\xFE\xEE\xFE\xFE\xEE\xFE\xFE'
b'\xFE\xFE\xFE\xFE\xEE\xFE\xFE\xFE\x8E\xC7\xFD\xFE\xAD\xFE\xFE\xFE'
b'\x52\xC9\xFD\xFE\x86\xFE\xFE\xFE\xFE\x8E\xFD\xFE\x42\xFD\xFE\xFE'
b'\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE'
b'\xFE\x7E\xFD\xFE\x76\xEA\xFE\xFE\x1E\xFD\xFD\xFE\xDA\xFE\xFE\xFE'
b'\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE'
b'\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\x56\xDC\xFD\xFE\xBE\xFE\xFE\xFE'
b'\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFD\xFE\x56\xFD\xFE\xFE'
b'\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE'
b'\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xC8\x82\x91\x86\x82\xFE\xFE\xFE'
b'\x49\x1D\xFE\xFE\xFE\xEE\xFE\xFE\xFE\x1C\xFE\xFE\xFE\xF2\xFE\xFE'
b'\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xDE\xFE\xFE\x9E'
b'\xC8\x8C\x92\x9D\x82\x9D\xFE\xFE\x3D\xC7\xFE\xFE\xFE\xFE\xFD\xFE'
b'\xFE\xC6\xFE\xFE\xFE\x10\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE'
b'\xFE\xFE\xFE\xFE\xBE\xFE\xFE\xBE\xC8\x92\x9D\x82\x9D\xFE\xFE\xFE'
b'\xBA\xCA\xFE\xFE\xFE\xBE\xFD\xFE\xFE\xEC\xFE\xFE\xFE\xD8\xFD\xFE'
b'\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xBE\xFE\xFE\x3E'
b'\xC8\x8C\x83\x8C\x93\xFE\xFE\xFE\x42\xFD\xFE\xFE\xFE\x8E\xFD\xFE'
b'\xFE\xFC\xFE\xFE\xFE\xCE\xFD\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE'
b'\xFE\xFE\xFE\xFE\xBE\xFE\xFE\xBE\xC8\x8C\x91\x8A\x8F\x93\xFE\xFE'
b'\x90\xE6\xFE\xFE\xFE\x7E\xFD\xFE\xFE\xE4\xFE\xFE\xFE\xCC\xFD\xFE')
print()
hex_dump(ciphertext)
print()


00000000  a9 a4 6e fe f3 fe fe fe f2 fe fe fe ff ff fe fe  ..n.............
00000010  46 fe fe fe fe fe fe fe be fe fe fe fe fe fe fe  F...............
00000020  fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe  ................
00000030  fe fe fe fe fe fe fe fe fe fe fe fe 1e fe fe fe  ................
00000040  e8 df 44 e8 fe 42 f5 29 dd 46 fd aa 29 dd a2 96  ..D..B.).F..)...
00000050  95 83 de 8e 8c 8f 97 8c 9d 89 de 93 9d 88 88 8f  ................
00000060  82 de 9c 91 de 8c 81 88 de 95 88 de b2 af a3 de  ................
00000070  89 8f 92 91 c8 e9 e9 f4 d2 fe fe fe fe fe fe fe  ................
00000080  13 fc 59 e2 57 93 03 b7 57 93 03 b7 57 93 03 b7  ..Y.W...W...W...
00000090  48 db 90 b7 40 93 03 b7 48 db 8e b7 29 93 03 b7  H...@...H...)...
000000a0  48 db 87 b7 61 93 03 b7 7e 51 76 b7 4a 93 03 b7  H...a...~Qv.J...
000000b0  57 93 0c b7 23 93 03 b7 48 db 85 b7 50 93 03 b7  W...#...H...P...
000000c0  48 db 9d b7 50 93 03 b7 48 db 9c b7 50 93 03 b7  H...P...H...P...
000000d0  a

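A quick eyeballing (or frequency analysis) of the bytes above suggests that the file was possibly XOR-encoded with 0xfe as the XOR key: 0xfe is by far the most common byte, and a PE header normally contains long runs of 0x00, which XOR with the key to yield the key itself. Let's XOR-decode the encoded bytes above: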

In [3]:
xor_decoded = list()
for c in ciphertext:
    xor_decoded.append(c ^ 0xFE)

print()
hex_dump(xor_decoded)
print()


00000000  57 5a 90 00 0d 00 00 00 0c 00 00 00 01 01 00 00  WZ..............
00000010  b8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00  ........@.......
00000020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000030  00 00 00 00 00 00 00 00 00 00 00 00 e0 00 00 00  ................
00000040  16 21 ba 16 00 bc 0b d7 23 b8 03 54 d7 23 5c 68  .!......#..T.#\h
00000050  6b 7d 20 70 72 71 69 72 63 77 20 6d 63 76 76 71  k} prqircw mcvvq
00000060  7c 20 62 6f 20 72 7f 76 20 6b 76 20 4c 51 5d 20  | bo rv kv LQ] 
00000070  77 71 6c 6f 36 17 17 0a 2c 00 00 00 00 00 00 00  wqlo6...,.......
00000080  ed 02 a7 1c a9 6d fd 49 a9 6d fd 49 a9 6d fd 49  .....m.I.m.I.m.I
00000090  b6 25 6e 49 be 6d fd 49 b6 25 70 49 d7 6d fd 49  .%nI.m.I.%pI.m.I
000000a0  b6 25 79 49 9f 6d fd 49 80 af 88 49 b4 6d fd 49  .%yI.m.I...I.m.I
000000b0  a9 6d f2 49 dd 6d fd 49 b6 25 7b 49 ae 6d fd 49  .m.I.m.I.%{I.m.I
000000c0  b6 25 63 49 ae 6d fd 49 b6 25 62 49 ae 6d fd 49  .%cI.m.I.%bI.m.I
000000d0  52
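
The frequency observation behind the 0xFE guess can be checked with a few lines of Python. This is a minimal sketch that is not part of the original notebook: `sample` holds only the first 64 ciphertext bytes from the dump above, `candidate_key` is a hypothetical name, and the approach assumes that the most common plaintext byte in a PE header is 0x00.

```python
from collections import Counter

# First 64 ciphertext bytes from the dump above (the DOS header region).
sample = bytes.fromhex(
    "a9a46efef3fefefef2fefefefffffefe"
    "46fefefefefefefebefefefefefefefe"
    "fefefefefefefefefefefefefefefefe"
    "fefefefefefefefefefefefe1efefefe"
)

# Count how often each byte value occurs and list the most common ones.
counts = Counter(sample)
for value, count in counts.most_common(3):
    print(f"{value:#04x}: {count} of {len(sample)} bytes")

# Assumption: the dominant ciphertext byte is 0x00 XORed with the key,
# which makes it the candidate single-byte XOR key.
candidate_key = counts.most_common(1)[0][0]
print(f"candidate XOR key: {candidate_key:#04x}")
```

A dominant byte only suggests a key; the XOR-decoded dump still has to look plausible, which is what the rest of the analysis examines.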

If you have ever dealt with PE files in hexadecimal/binary format, you will recognize that the bytes above resemble a PE header structure (albeit an obfuscated one). Let's walk the structure to see if we can glean any information that helps with decoding the file. The structure of the encoded PE header and certain visible bytes above suggest that we are not dealing with a Pascal, Delphi or otherwise uncommon file type, so we can make certain assumptions about the likely location of certain bytes in the PE header (more on that below). I have already highlighted some of the interesting fields that can help build a character mapping table that maps the highlighted bytes to their potential plaintext values. If we can map a decent number of bytes to their plaintext values, we might be able to learn something about the methodology used to encode the file.

![https://github.com/MalwareRE/notebooks/raw/master/img/PE.png](https://github.com/MalwareRE/notebooks/raw/master/img/PE.png)

**Legend & Highlighted Offsets:**
- 0x00-0x01: 0x57 0x5a (IMAGE_DOS_HEADER.e_magic) maps to plaintext bytes 0x4d 0x5a ("MZ")
- 0x02-0x03: 0x90 0x00 (IMAGE_DOS_HEADER.e_cblp) would usually map to 0x90 0x00 (could also be, for example, 0x50 0x00)
- 0x04-0x05: 0x0d 0x00 (IMAGE_DOS_HEADER.e_cp) would usually map to 0x03 0x00
- 0x08-0x09: 0x0c 0x00 (IMAGE_DOS_HEADER.e_cparhdr) would usually map to 0x04 0x00
- 0x10-0x11: 0xb8 0x00 (IMAGE_DOS_HEADER.e_sp) would usually map to 0xb8 0x00
- 0x18-0x19: 0x40 0x00 (IMAGE_DOS_HEADER.e_lfarlc) would usually map to 0x40 0x00
- 0x3c-0x3f: 0xe0 0x00 0x00 0x00 (IMAGE_DOS_HEADER.e_lfanew) contains the file offset of the NT Headers (aka PE signature). As shown above, offset 0xe0 does appear to contain the PE signature, hence we can assume that 0xe0 maps to 0xe0
- 0x4e-0x74: 0x5c .... .... 0x36 (part of the DOS Stub) would usually map to "This program cannot be run in DOS mode." (there are also a few 16-bit x86 real-mode machine code bytes before/after these offsets that we can safely use)
- 0xd0-0xd3: 0x52 0x6b 0x6d 0x68 (part of the Rich header) would usually map to 0x52 0x69 0x63 0x68 ("Rich")
- 0xe0-0xe3: 0x50 0x4f 0x00 0x00 (IMAGE_NT_HEADERS.Signature) would usually map to 0x50 0x45 0x00 0x00 ("PE\0\0")
- 0xe6-0xe7: 0x0f 0x00 (IMAGE_FILE_HEADER.NumberOfSections) contains the number of sections, which we can map once the section names below have been counted (see the last item in this list)
- 0xf4-0xf5: 0xe0 0x00 (IMAGE_FILE_HEADER.SizeOfOptionalHeader) would usually map to 0xe0 0x00
- 0xf8-0xf9: 0x15 0x03 (IMAGE_OPTIONAL_HEADER.Magic) would usually map to either 0x0b 0x01 (PE32) or 0x0b 0x02 (PE32+, i.e. 64-bit); either way we can map 0x15 to 0x0b
- 0x1d8-0x1dc: 0x36 0x7c 0x6f 0x78 0x7c would usually map to 0x2e 0x74 0x65 0x78 0x74 (".text"); note the presence of 0x7c at the 2nd and 5th positions of the encoded bytes, which would match the repeating t's in .text
- 0x200-0x205: 0x36 0x72 0x6c 0x63 0x7c 0x63 would usually map to 0x2e 0x72 0x64 0x61 0x74 0x61 (".rdata"); note the presence of 0x63 at the 4th and 6th positions of the encoded bytes, which would match the repeating a's in .rdata
- 0x228-0x22c: 0x36 0x6c 0x63 0x7c 0x63 would usually map to 0x2e 0x64 0x61 0x74 0x61 (".data")
- 0x250-0x254: 0x36 0x72 0x7d 0x72 0x6d would usually map to 0x2e 0x72 0x73 0x72 0x63 (".rsrc"); we already know from above that the encoded byte 0x72 maps to 'r', and there are two 0x72 bytes at the 2nd and 4th positions, hence ".rsrc" should be a safe guess
- 0x278-0x27d: 0x36 0x72 0x6f 0x74 0x71 0x6d would usually map to 0x2e 0x72 0x65 0x6c 0x6f 0x63 (".reloc"); we have already mapped 'r', 'e' and 'c' above, so ".reloc" should be a safe guess
- 0xe6-0xe7: 0x0f 0x00 (IMAGE_FILE_HEADER.NumberOfSections) should map to 0x05 0x00 since we have decoded 5 section names above
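
To illustrate how such a guess turns into mapping entries, the DOS stub message can be aligned byte by byte with the ciphertext at offsets 0x4e-0x74. This is a rough sketch rather than the code used for the original analysis: `stub_cipher`, `stub_plain` and `mapping` are hypothetical names, and the plaintext is assumed to be the usual DOS stub message quoted in the list above.

```python
# Ciphertext bytes at file offsets 0x4e-0x74, copied from the dump above.
stub_cipher = bytes.fromhex(
    "a296"
    "9583de8e8c8f978c9d89de939d88888f"
    "82de9c91de8c8188de9588deb2afa3de"
    "898f9291c8"
)
# Assumed plaintext: the usual DOS stub message.
stub_plain = b"This program cannot be run in DOS mode."
assert len(stub_cipher) == len(stub_plain)

mapping = {}  # XOR-decoded byte -> guessed plaintext byte
for c, p in zip(stub_cipher, stub_plain):
    x = c ^ 0xFE
    # A conflict here would mean the guess (or its alignment) is wrong.
    assert mapping.setdefault(x, p) == p, f"conflict for {x:#04x}"

for x in sorted(mapping):
    print(f"{x:#04x} -> {mapping[x]:#04x} ({chr(mapping[x])!r})")
```

The consistency check is the useful part: if the same XOR-decoded byte ever had to map to two different plaintext bytes, the guessed string would not fit a simple per-byte substitution.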

There are other fields in the PE header that we might be able to map with lower confidence if needed but for now let's use the character mappings we have discovered above to create a Pandas DataFrame named `df_all_known_mappings` (where the Ciphertext column contains bytes from the encoded file shared by Florian, XOR_Decoded column contains the XOR 0xFE decoded bytes used above and Plaintext column contains the plaintext bytes we have discovered above):


In [4]:
d = {'Ciphertext': [0xFE, 0xFF, 0xEB, 0xE9, 0xDE, 0xDD, 0xD6, 0xC8, 0xBE, 0xBC, 0xB1, 0xAE, 0xAF, 0xAC, 0xAA, 0xA9, 0x9E, 
                    0x9C, 0x9D, 0x96, 0x97, 0x8E, 0x8F, 0x8C, 0x8A, 0x88, 0x89, 0x86, 0x6E, 0x93, 0xF4, 0xF2, 0xF3, 0xF1, 
                    0xD2, 0xB2, 0xB1, 0xA4, 0xA2, 0xA3, 0x95, 0x92, 0x91, 0x82, 0x83, 0x81, 0x3E, 0x29, 0x1E],
    'XOR_Decoded': [0x00, 0x01, 0x15, 0x17, 0x20, 0x23, 0x28, 0x36, 0x40, 0x42, 0x4F, 0x50, 0x51, 0x52, 0x54, 0x57, 0x60, 
                    0x62, 0x63, 0x68, 0x69, 0x70, 0x71, 0x72, 0x74, 0x76, 0x77, 0x78, 0x90, 0x6D, 0x0A, 0x0C, 0x0D, 0x0F, 
                    0x2C, 0x4C, 0x4F, 0x5A, 0x5C, 0x5D, 0x6B, 0x6C, 0x6F, 0x7C, 0x7D, 0x7F, 0xC0, 0xD7, 0xE0], 
     'Plaintext':  [0x00, 0xFF, 0x0B, 0x0D, 0x20, 0x21, 0x28, 0x2E, 0x40, 0x42, 0x45, 0x50, 0x4F, 0x52, 0x4C, 0x4D, 0x60, 
                    0x62, 0x61, 0x68, 0x67, 0x70, 0x6F, 0x72, 0x6C, 0x6E, 0x6D, 0x78, 0x90, 0x63, 0x0A, 0x4, 0x3, 0x5, 
                    0x24, 0x44, 0x45, 0x5A, 0x54, 0x53, 0x69, 0x64, 0x65, 0x74, 0x73, 0x75, 0xC0, 0xCD, 0xE0]}

df_all_known_mappings = pd.DataFrame(data=d).sort_values(by=['XOR_Decoded']).reset_index(drop=True)
df_all_known_mappings.style.format({'Ciphertext':"{:#x}", 'XOR_Decoded':"{:#x}", 'Plaintext':"{:#x}"})

Unnamed: 0,Ciphertext,XOR_Decoded,Plaintext
0,0xfe,0x0,0x0
1,0xff,0x1,0xff
2,0xf4,0xa,0xa
3,0xf2,0xc,0x4
4,0xf3,0xd,0x3
5,0xf1,0xf,0x5
6,0xeb,0x15,0xb
7,0xe9,0x17,0xd
8,0xde,0x20,0x20
9,0xdd,0x23,0x21


Closer inspection of all XOR_Decoded-Plaintext pairs above revealed that the distance between the numbers in each pair ranges between 0 and 10. To get a better idea of the distance/difference/sub values, we can subtract each XOR_Decoded value from its corresponding Plaintext value and store the resulting value under a new column named Sub:


In [5]:
df_all_known_mappings['Sub'] = (df_all_known_mappings['Plaintext'] - df_all_known_mappings['XOR_Decoded']) & 0xFF  # mask with 0xFF to keep the result in 0-255; np.ubyte (or a list of ctypes c_ubyte's) could also be used to store bytes/chars as unsigned char
df_all_known_mappings[['XOR_Decoded', 'Plaintext', 'Sub']].style.format({'XOR_Decoded':"{:#x}", 'Plaintext':"{:#x}", 'Sub':"{:#x}"})

Unnamed: 0,XOR_Decoded,Plaintext,Sub
0,0x0,0x0,0x0
1,0x1,0xff,0xfe
2,0xa,0xa,0x0
3,0xc,0x4,0xf8
4,0xd,0x3,0xf6
5,0xf,0x5,0xf6
6,0x15,0xb,0xf6
7,0x17,0xd,0xf6
8,0x20,0x20,0x0
9,0x23,0x21,0xfe


A closer inspection of the values in the Sub column above revealed a repeating pattern consisting of 0xfe, 0xf8, 0xf6, and 0. Another notable observation is the relationship between the least-significant octet (LSO) of the values in the XOR_Decoded column and their corresponding values in the Sub column; "LSO" here refers to the last hex digit of the byte, i.e. its low 4 bits (strictly speaking a nibble rather than an octet). To make the pattern in the Sub column more visible, let's add a colormap to the dataframe (if the colormap is not visible, e.g. if you are viewing this on GitHub, try downloading a copy of this notebook or use the Binder link in the header to launch an interactive version):

In [6]:
df_all_known_mappings[['XOR_Decoded', 'Plaintext', 'Sub']].style.format({'XOR_Decoded':"{:#x}", 'Plaintext':"{:#x}", 'Sub':"{:#x}"}).background_gradient(cmap='plasma')

Unnamed: 0,XOR_Decoded,Plaintext,Sub
0,0x0,0x0,0x0
1,0x1,0xff,0xfe
2,0xa,0xa,0x0
3,0xc,0x4,0xf8
4,0xd,0x3,0xf6
5,0xf,0x5,0xf6
6,0x15,0xb,0xf6
7,0x17,0xd,0xf6
8,0x20,0x20,0x0
9,0x23,0x21,0xfe


As depicted above, there is a direct relationship between the least-significant octet (LSO) of the values in the XOR_Decoded column and their corresponding values in the Sub column. For example, if the LSO of the XOR_Decoded value is 4 (e.g. in 0x04, 0x14, 0x24, etc.) then by adding 0xF8 (or its signed char equivalent -0x8) to the XOR_Decoded value we can obtain its plaintext value!

**Formula:**
<br>
Plaintext = (XOR_Decoded + Sub) & 0xFF
<br><br>
**Example:**
<br>
XOR_Decoded: 0x54
<br>
LSO: 4  --> Sub: 0xf8
<br>
Plaintext = (0x54 + 0xf8) & 0xff = 0x14c & 0xff = 0x4c
<br><br>
The observed relationship between the LSO and Sub values is captured in the table below (note the repeating, symmetrical pattern across 0x0-0x7 and again across 0x8-0xf):


| LS Octet 	| Sub Value 	|
|:--------:	|:---------:	|
|     0    	|     0     	|
|     1    	|     0xfe    	|
|     2    	|     0     	|
|     3    	|     0xfe    	|
|     4    	|     0xf8    	|
|     5    	|     0xf6    	|
|     6    	|     0xf8    	|
|     7    	|     0xf6    	|
| -------- 	| --------- 	|
|     8    	|     0     	|
|     9    	|     0xfe    	|
|    0xa   	|     0     	|
|    0xb   	|     0xfe    	|
|    0xc   	|     0xf8    	|
|    0xd   	|     0xf6    	|
|    0xe   	|  Unknown  	|
|    0xf   	|     0xf6    	|


Since our known character mapping table does not contain an XOR_Decoded value that ends with the octet 0xe (i.e. LSO set to 0xe), the Sub value for 0xe is intentionally set to Unknown in the table above; however, using symmetry to our advantage, we can see that the Sub value of 0xe is most likely identical to that of 6 (i.e. 0xf8 from the table above).
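
The table above can also be derived mechanically from the known pairs. The following is a hedged sketch, not the notebook's original code: `known_pairs` is only a subset of the (XOR_Decoded, Plaintext) pairs listed earlier, `sub_by_nibble` and `missing` are hypothetical names, and filling the gap from the entry 8 positions away is the symmetry assumption described above, not a proven property of the encoder.

```python
# A subset of the (XOR_Decoded, Plaintext) pairs from df_all_known_mappings.
known_pairs = [
    (0x00, 0x00), (0x01, 0xFF), (0x0A, 0x0A), (0x0C, 0x04), (0x0D, 0x03),
    (0x0F, 0x05), (0x15, 0x0B), (0x17, 0x0D), (0x23, 0x21), (0x36, 0x2E),
    (0x42, 0x42), (0x54, 0x4C), (0x57, 0x4D), (0x68, 0x68), (0x69, 0x67),
    (0x6B, 0x69), (0x72, 0x72), (0x76, 0x6E),
]

sub_by_nibble = {}
for x, p in known_pairs:
    sub = (p - x) & 0xFF
    # Every pair that shares a low nibble should agree on the Sub value.
    assert sub_by_nibble.setdefault(x & 0xF, sub) == sub

# Low nibbles for which no known pair exists.
missing = [n for n in range(16) if n not in sub_by_nibble]
print("missing before fill:", [hex(n) for n in missing])

# Assumed symmetry: nibble n behaves like nibble n ^ 8.
for n in missing:
    if (n ^ 8) in sub_by_nibble:
        sub_by_nibble[n] = sub_by_nibble[n ^ 8]

for n in sorted(sub_by_nibble):
    print(f"{n:#x}: {sub_by_nibble[n]:#04x}")
```

Because the inner assertion would fail on any contradictory pair, a clean run also serves as a sanity check on the guessed plaintext values.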

At this point we can use the LSO-Sub mapping in the table above to generate a new dataframe that maps ***all*** possible XOR_Decoded bytes (i.e. 0x00-0xFF) to their corresponding Sub values and, better yet, to their corresponding plaintext values!

In [7]:
lso_sub_mapping = {
    0: 0,
    1: 0xfe,
    2: 0,
    3: 0xfe,
    4: 0xf8,
    5: 0xf6,
    6: 0xf8,
    7: 0xf6,

    8: 0,
    9: 0xfe,
    0xa: 0,
    0xb: 0xfe,
    0xc: 0xf8,
    0xd: 0xf6,
    0xe: 0xf8,
    0xf: 0xf6
}

sub_bytes = list()
plaintext_bytes = list()

for xor_decoded_byte in range(0, 256):
    sub_byte = lso_sub_mapping[xor_decoded_byte & 0xf]
    sub_bytes.append(sub_byte)
    plaintext_byte = (xor_decoded_byte + sub_byte ) & 0xFF # AND with 0xFF to discard carry out
    plaintext_bytes.append(plaintext_byte)

d = {'XOR_Decoded': range(0, 256), 'Plaintext':plaintext_bytes, 'Sub': sub_bytes}
df_final = pd.DataFrame.from_dict(data=d,dtype='int').sort_values(by=['XOR_Decoded']).reset_index(drop=True)

df_final.style.format({'XOR_Decoded':"{:#x}", 'Plaintext':"{:#x}", 'Sub':"{:#x}"}).background_gradient(cmap='plasma')

Unnamed: 0,XOR_Decoded,Plaintext,Sub
0,0x0,0x0,0x0
1,0x1,0xff,0xfe
2,0x2,0x2,0x0
3,0x3,0x1,0xfe
4,0x4,0xfc,0xf8
5,0x5,0xfb,0xf6
6,0x6,0xfe,0xf8
7,0x7,0xfd,0xf6
8,0x8,0x8,0x0
9,0x9,0x7,0xfe


As depicted above, the repeating Sub pattern is now easier to spot. Now let's bring back the Ciphertext column to view the full mapping between all Ciphertext, XOR_Decoded and Plaintext values!

In [8]:
df_final.insert(loc=0, column='Ciphertext', value=df_final['XOR_Decoded'] ^ 0xFE)

df_final = df_final.sort_values(by=['Ciphertext']).reset_index(drop=True)

df_final.style.format({'Ciphertext':"{:#x}", 'XOR_Decoded':"{:#x}", 'Plaintext':"{:#x}", 'Sub':"{:#x}"}).background_gradient(cmap='plasma')

Unnamed: 0,Ciphertext,XOR_Decoded,Plaintext,Sub
0,0x0,0xfe,0xf6,0xf8
1,0x1,0xff,0xf5,0xf6
2,0x2,0xfc,0xf4,0xf8
3,0x3,0xfd,0xf3,0xf6
4,0x4,0xfa,0xfa,0x0
5,0x5,0xfb,0xf9,0xfe
6,0x6,0xf8,0xf8,0x0
7,0x7,0xf9,0xf7,0xfe
8,0x8,0xf6,0xee,0xf8
9,0x9,0xf7,0xed,0xf6


At this point, we have a table that maps each ciphertext/encoded value to its corresponding plaintext value. Let's grab the first 4 bytes of the file shared by Florian and use the mapping table above to decode the bytes:

**Ciphertext** = 0xa9 0xa4 0x6e 0xfe
<br>
0xa9 --> 0x4d
<br>
0xa4 --> 0x5a
<br>
0x6e --> 0x90
<br>
0xfe --> 0x00
<br><br>
**Plaintext** = 0x4d 0x5a 0x90 0x00 (MZ\x90\x00), ***SUCCESS!***
<br><br><br>
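
The same lookup can be written without pandas. This is a minimal sketch under the assumptions made so far (XOR key 0xFE, and the Sub value for low nibble 0xe taken to be 0xf8 by symmetry); `SUB_BY_NIBBLE` and `decode` are hypothetical names and not part of the original notebook.

```python
# Sub value per low nibble of the XOR-decoded byte, as tabulated earlier.
# The entry for nibble 0xe is the value inferred by symmetry.
SUB_BY_NIBBLE = [0x00, 0xFE, 0x00, 0xFE, 0xF8, 0xF6, 0xF8, 0xF6] * 2

def decode(data: bytes, xor_key: int = 0xFE) -> bytes:
    """XOR each byte with xor_key, then add the nibble-dependent Sub value."""
    out = bytearray()
    for c in data:
        x = c ^ xor_key
        out.append((x + SUB_BY_NIBBLE[x & 0xF]) & 0xFF)  # discard the carry
    return bytes(out)

# First four bytes of the encoded file.
print(decode(b"\xa9\xa4\x6e\xfe"))        # b'MZ\x90\x00'
# First four bytes of the DOS stub message at offset 0x4e.
print(decode(bytes.fromhex("a2969583")))  # b'This'
```

A plain function like this is convenient for decoding a whole file in one pass, whereas the DataFrame version is better suited to inspecting the mapping visually.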
Now let's use the char mapping/lookup table to decode the first 256 bytes of the encoded file:

In [9]:
plaintext = list()
ciphertext = (b'\xA9\xA4\x6E\xFE\xF3\xFE\xFE\xFE\xF2\xFE\xFE\xFE\xFF\xFF\xFE\xFE'
b'\x46\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xBE\xFE\xFE\xFE\xFE\xFE\xFE\xFE'
b'\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE'
b'\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\x1E\xFE\xFE\xFE'
b'\xE8\xDF\x44\xE8\xFE\x42\xF5\x29\xDD\x46\xFD\xAA\x29\xDD\xA2\x96'
b'\x95\x83\xDE\x8E\x8C\x8F\x97\x8C\x9D\x89\xDE\x93\x9D\x88\x88\x8F'
b'\x82\xDE\x9C\x91\xDE\x8C\x81\x88\xDE\x95\x88\xDE\xB2\xAF\xA3\xDE'
b'\x89\x8F\x92\x91\xC8\xE9\xE9\xF4\xD2\xFE\xFE\xFE\xFE\xFE\xFE\xFE'
b'\x13\xFC\x59\xE2\x57\x93\x03\xB7\x57\x93\x03\xB7\x57\x93\x03\xB7'
b'\x48\xDB\x90\xB7\x40\x93\x03\xB7\x48\xDB\x8E\xB7\x29\x93\x03\xB7'
b'\x48\xDB\x87\xB7\x61\x93\x03\xB7\x7E\x51\x76\xB7\x4A\x93\x03\xB7'
b'\x57\x93\x0C\xB7\x23\x93\x03\xB7\x48\xDB\x85\xB7\x50\x93\x03\xB7'
b'\x48\xDB\x9D\xB7\x50\x93\x03\xB7\x48\xDB\x9C\xB7\x50\x93\x03\xB7'
b'\xAC\x95\x93\x96\x57\x93\x03\xB7\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE'
b'\xAE\xB1\xFE\xFE\xAA\xFD\xF1\xFE\xCF\x61\x16\xA7\xFE\xFE\xFE\xFE'
b'\xFE\xFE\xFE\xFE\x1E\xFE\xFC\xDD\xEB\xFD\xF5\xFE\xFE\x1C\xFE\xFE')

for c in ciphertext:
    plaintext.append(df_final.iloc[c]['Plaintext'] & 0xFF)

print('\n[+] Shared by Florian:\n')
hex_dump(ciphertext)
print('\n\n[+] Decoded:\n')
hex_dump(plaintext)
print('\n')


[+] Shared by Florian:

00000000  a9 a4 6e fe f3 fe fe fe f2 fe fe fe ff ff fe fe  ..n.............
00000010  46 fe fe fe fe fe fe fe be fe fe fe fe fe fe fe  F...............
00000020  fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe  ................
00000030  fe fe fe fe fe fe fe fe fe fe fe fe 1e fe fe fe  ................
00000040  e8 df 44 e8 fe 42 f5 29 dd 46 fd aa 29 dd a2 96  ..D..B.).F..)...
00000050  95 83 de 8e 8c 8f 97 8c 9d 89 de 93 9d 88 88 8f  ................
00000060  82 de 9c 91 de 8c 81 88 de 95 88 de b2 af a3 de  ................
00000070  89 8f 92 91 c8 e9 e9 f4 d2 fe fe fe fe fe fe fe  ................
00000080  13 fc 59 e2 57 93 03 b7 57 93 03 b7 57 93 03 b7  ..Y.W...W...W...
00000090  48 db 90 b7 40 93 03 b7 48 db 8e b7 29 93 03 b7  H...@...H...)...
000000a0  48 db 87 b7 61 93 03 b7 7e 51 76 b7 4a 93 03 b7  H...a...~Qv.J...
000000b0  57 93 0c b7 23 93 03 b7 48 db 85 b7 50 93 03 b7  W...#...H...P...
000000c0  48 db 9d b7 50 93 03 b7 48 db 9c b7 50 93 03 b7  H...

At this point I was still curious to know if there was anything else to the XOR+ADD operation logic, so I wrote a tiny brute-force script that implements a simple known-plaintext attack (KPA). Instead of using the whole file, or even the 256-byte header shown above, I decided to use only the first 4 bytes of the encoded file as ciphertext to speed up the brute-force process.
<br><br>
**Ciphertext:** 0xa9 0xa4 0x6e 0xfe
<br>
**Known plaintext:** 0x4d 0x5a 0x90 0x00 (just an educated guess, the last two bytes could vary)

Using the values above, I wrote a script that uses all combinations of single-byte XOR keys and ADD keys to decode the ciphertext value above (can be optimized but left unrolled for readability).

**1st attempt**: XOR Key: 0x01, ADD Key: 0x01
<br>
**2nd attempt**: XOR Key: 0x01, ADD Key: 0x02
<br>
**3rd attempt**: XOR Key: 0x01, ADD Key: 0x03
<br>
...
<br>
...
<br>
**65023rd attempt:** XOR Key: 0xff, ADD Key: 0xfd
<br>
**65024th attempt:** XOR Key: 0xff, ADD Key: 0xfe
<br>
**65025th attempt:** XOR Key: 0xff, ADD Key: 0xff
<br><br>
After each decoding attempt, the script compares the decoded value with the known plaintext value to assess the success of the decoding process (i.e. if the decoded and known plaintext values match then we have the correct XOR and ADD key). While I was at it, I also wrote an XOR-SUB routine.


In [10]:
cipher = b'\xA9\xA4\x6E\xFE'
plaintext = b'\x4d\x5a\x90\x00'

# XOR-ADD routine: try every non-zero single-byte XOR key with every non-zero ADD key
print('\n[*] (c ^ xor_key) + add_key:')
for xor_key in range(1, 256):      # XOR with 0 is a no-op
    for add_key in range(1, 256):  # adding 0 is a no-op
        if bytes(((c ^ xor_key) + add_key) & 0xFF for c in cipher) == plaintext:
            print('\t[+] (c ^ {0:#2X}) + {1:#2X}'.format(xor_key, add_key))

# XOR-SUB routine: same search, subtracting instead of adding
print('\n[*] (c ^ xor_key) - sub_key:')
for xor_key in range(1, 256):
    for sub_key in range(1, 256):
        if bytes(((c ^ xor_key) - sub_key) & 0xFF for c in cipher) == plaintext:
            print('\t[+] (c ^ {0:#2X}) - {1:#2X}'.format(xor_key, sub_key))
print()


[*] (c ^ xor_key) + add_key:
	[+] (c ^ 0X5B) + 0X5B
	[+] (c ^ 0X7B) + 0X7B
	[+] (c ^ 0XDB) + 0XDB
	[+] (c ^ 0XFB) + 0XFB

[*] (c ^ xor_key) - sub_key:
	[+] (c ^ 0X5B) - 0XA5
	[+] (c ^ 0X7B) - 0X85
	[+] (c ^ 0XDB) - 0X25
	[+] (c ^ 0XFB) - 0X5



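As shown above, for the XOR-ADD operation 0x5B, 0x7B, 0xDB and 0xFB are all matching XOR and ADD key values for the 4-byte sample. A match on four bytes does not guarantee a match on the whole file, though. 0x7B and 0xFB differ only in the top bit, and XORing with 0x80 is the same as adding 0x80 modulo 256, so those two key pairs are interchangeable for every byte. 0x5B and 0xDB, on the other hand, match here only because all four sample bytes happen to have bit 5 (0x20) set, so they should be verified against more of the header before being used on the file. The XOR-SUB results are the same keys in another form, since subtracting k modulo 256 equals adding 0x100 - k (e.g. 0x100 - 0xFB = 0x05). There is certainly a relationship between the Sub pattern described in the sections above and the matching XOR-ADD keys (especially considering XOR is technically a modulo-2 addition), but more research and analysis is required to describe it fully.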
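
One way to probe that relationship is to compare each candidate key pair against the Sub lookup table over all 256 possible byte values instead of just the four sample bytes. The sketch below is an illustration and not part of the original analysis: `table_decode`, `xor_add_decode` and `agreement` are hypothetical names, and the table again assumes 0xf8 for low nibble 0xe.

```python
# Sub value per low nibble of the XOR-decoded byte, as tabulated earlier.
SUB_BY_NIBBLE = [0x00, 0xFE, 0x00, 0xFE, 0xF8, 0xF6, 0xF8, 0xF6] * 2

def table_decode(c: int) -> int:
    """Decode one byte via XOR 0xFE plus the nibble-dependent Sub value."""
    x = c ^ 0xFE
    return (x + SUB_BY_NIBBLE[x & 0xF]) & 0xFF

def xor_add_decode(c: int, key: int) -> int:
    """Decode one byte as (c ^ key) + key, modulo 256."""
    return ((c ^ key) + key) & 0xFF

agreement = {}
for key in (0x5B, 0x7B, 0xDB, 0xFB):
    # Count the ciphertext byte values on which both decoders agree.
    agreement[key] = sum(
        xor_add_decode(c, key) == table_decode(c) for c in range(256)
    )
    print(f"key {key:#04x}: agrees on {agreement[key]} of 256 byte values")
```

Under these assumptions, 0xFB and 0x7B reproduce the lookup table for every byte value, while 0x5B and 0xDB agree on only half of them. A plausible explanation for the Sub pattern follows from the identity a ^ b = a + b - 2(a & b): since 0xFE ^ 0xFB = 0x05, decoding with (c ^ 0xFB) + 0xFB equals x - 2 * (x & 0x05) modulo 256, where x = c ^ 0xFE, and x & 0x05 can only be 0, 1, 4 or 5, which gives the Sub values 0, 0xfe, 0xf8 and 0xf6.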
