Known-Plaintext Attack on ZIP

The Vulnerability

The PKZIP program is one of the more widely used archive/compression programs on personal computers. PKZIP provides a stream cipher that lets users encrypt files with variable-length keys (passwords). The internal representation of the key can be found within a few hours on a PC using only a few bytes of known plaintext. This stream cipher was designed by Roger Schlafly.

We will take a look at the Forensics 3 challenge from RiftCTF2020, but first let's understand the weakness. The attack primarily finds the 96-bit internal representation of the key, which suffices to decrypt the whole file and any other file encrypted under the same key. The original password can be reconstructed from it afterwards.

The PKZIP Stream Cipher

PKZIP manages a ZIP file, which is an archive containing many files in compressed form, along with file headers describing (for each file) the file name, the compression method, whether the file is encrypted, the CRC-32 value, the original and compressed sizes of the file, and other auxiliary information. You can find this information in the local file header of each file. You can mess with ZIP files by changing these headers. For example, you can set the bit that says whether a file is encrypted (bit 0 of the general purpose bit flag); doing so can confuse the extracting software into asking for a password for an unencrypted ZIP file ;).
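A minimal sketch of reading a local file header with Python's struct module and toggling that encryption bit. The layout is the fixed 30-byte part of the standard local file header (signature PK\x03\x04, with the general purpose bit flag at byte offset 6); the function names and the in-memory demo archive are hypothetical illustrations. The same flag is repeated in the central directory, so tools that read the central directory may also need that copy changed.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a local file header (little-endian): signature,
# version needed, general purpose bit flag, compression method, mod time,
# mod date, CRC-32, compressed size, uncompressed size, name length, extra length.
LFH_FORMAT = "<IHHHHHIIIHH"
LFH_SIGNATURE = 0x04034B50  # b"PK\x03\x04"
FLAG_ENCRYPTED = 0x0001     # bit 0 of the general purpose bit flag
FLAGS_OFFSET = 6            # signature (4 bytes) + version needed (2 bytes)


def parse_local_header(data, offset=0):
    """Decode one local file header starting at `offset`."""
    fields = struct.unpack_from(LFH_FORMAT, data, offset)
    if fields[0] != LFH_SIGNATURE:
        raise ValueError("no local file header at this offset")
    name_start = offset + struct.calcsize(LFH_FORMAT)
    return {
        "encrypted": bool(fields[2] & FLAG_ENCRYPTED),
        "method": fields[3],
        "crc32": fields[6],
        "compressed_size": fields[7],
        "uncompressed_size": fields[8],
        "name": data[name_start:name_start + fields[9]].decode("utf-8", "replace"),
    }


def set_encrypted_flag(data: bytearray, offset=0, on=True):
    """Set or clear only the 'encrypted' bit; the file data is left untouched."""
    (flags,) = struct.unpack_from("<H", data, offset + FLAGS_OFFSET)
    flags = (flags | FLAG_ENCRYPTED) if on else (flags & ~FLAG_ENCRYPTED)
    struct.pack_into("<H", data, offset + FLAGS_OFFSET, flags)


def demo_archive_bytes():
    """Hypothetical helper: a one-file, unencrypted ZIP built in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("readme.txt", "hello")
    return bytearray(buf.getvalue())
```

For example, `raw = demo_archive_bytes(); set_encrypted_flag(raw)` marks the first entry as encrypted in its local header, although its bytes are still stored in the clear.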

The cipher is byte-oriented, encrypting under variable length keys. It has a 96-bit internal memory, divided into three 32-bit words called key0, key1, and key2. An 8-bit variable key3 (not part of the internal memory) is derived from key2. The key initializes the memory: each key has an equivalent internal representation as three 32-bit words. The plaintext bytes update the memory during encryption.

The main function of the cipher is called update_keys, and is used to update the internal memory and to derive the variable key3, for each given input (usually plaintext) byte:

update_keys(char):
    local unsigned short temp
    key0 <-- crc32(key0, char)
    key1 <-- (key1 + LSB(key0)) * 0x8088405 + 1 (mod 2^32)
    key2 <-- crc32(key2, MSB(key1))
    temp <-- (key2 | 3) & 0xFFFF
    key3 <-- LSB((temp * (temp ^ 0x1)) >> 0x8)
end update_keys
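To make the pseudocode concrete, here is a small Python sketch of the cipher state. The crc32(crc, char) in the pseudocode is a single table-driven CRC-32 step without zlib's initial and final inversion, which the sketch cross-checks against zlib.crc32. The initial constants 0x12345678, 0x23456789 and 0x34567890 come from PKWARE's APPNOTE description of traditional encryption; the class and method names are my own.

```python
import zlib

# CRC-32 lookup table for the reflected polynomial 0xEDB88320 used by ZIP.
CRC_TABLE = []
for n in range(256):
    c = n
    for _ in range(8):
        c = ((c >> 1) ^ 0xEDB88320) if c & 1 else (c >> 1)
    CRC_TABLE.append(c)


def crc32_byte(crc: int, b: int) -> int:
    """The crc32(crc, char) step: one byte, no pre/post inversion."""
    return (crc >> 8) ^ CRC_TABLE[(crc ^ b) & 0xFF]


def crc32_via_steps(data: bytes) -> int:
    """Full CRC-32 built from crc32_byte, only to cross-check against zlib."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = crc32_byte(crc, b)
    return crc ^ 0xFFFFFFFF


class ZipCryptoState:
    """The 96-bit internal memory (key0, key1, key2) of the PKZIP stream cipher."""

    def __init__(self, key0=0x12345678, key1=0x23456789, key2=0x34567890):
        self.key0, self.key1, self.key2 = key0, key1, key2

    @classmethod
    def from_password(cls, password: bytes) -> "ZipCryptoState":
        """Start from the fixed constants and feed each password byte through update_keys."""
        state = cls()
        for b in password:
            state.update_keys(b)
        return state

    def update_keys(self, char: int) -> None:
        self.key0 = crc32_byte(self.key0, char)
        self.key1 = ((self.key1 + (self.key0 & 0xFF)) * 0x08088405 + 1) & 0xFFFFFFFF
        self.key2 = crc32_byte(self.key2, self.key1 >> 24)  # MSB(key1)

    def key3(self) -> int:
        temp = (self.key2 | 3) & 0xFFFF  # only the 16 least significant bits matter
        return ((temp * (temp ^ 1)) >> 8) & 0xFF
```

To encrypt, each plaintext byte is XORed with key3() and then passed to update_keys; decryption XORs the ciphertext byte with key3() and feeds the recovered plaintext byte to update_keys. PKWARE's APPNOTE writes temp = key2 | 2 instead of | 3; both should give the same key3, because temp * (temp ^ 1) multiplies the same pair of adjacent values either way.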

Our goal is to find the internal representation of the key, i.e. the values of key0, key1 and key2.

In a known-plaintext attack, both the ciphertext and the plaintext are known. In the PKZIP cipher, given a plaintext byte and the corresponding ciphertext byte, the value of key3 used to encrypt that byte can be calculated as

key3 = P ^ C

Where P and C are plaintext and ciphertext bytes respectively.
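Recovering the key3 values is just an XOR once the known plaintext is lined up with the ciphertext. This sketch assumes traditional PKZIP encryption, where the stored data starts with a 12-byte encryption header, and that the known plaintext is in the same form as the stored data (compressed the same way, if the entry is compressed). The function name is hypothetical.

```python
from typing import List

# Assumption: traditional PKZIP encryption, so the stored entry data begins
# with a 12-byte encryption header in front of the (possibly compressed) file data.
ENCRYPTION_HEADER_LEN = 12


def known_keystream(ciphertext: bytes, plaintext: bytes,
                    offset: int = ENCRYPTION_HEADER_LEN) -> List[int]:
    """key3 = P ^ C for every known plaintext byte, aligned after the header."""
    aligned = ciphertext[offset:offset + len(plaintext)]
    if len(aligned) < len(plaintext):
        raise ValueError("ciphertext is shorter than the known plaintext")
    return [p ^ c for p, c in zip(plaintext, aligned)]
```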

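To see how key2, key1 and key0 are then derived from a sequence of key3 values, I would recommend the papers describing this attack, starting with Eli Biham and Paul Kocher's "A Known Plaintext Attack on the PKZIP Stream Cipher".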
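As a taste of the first step: key3 depends only on bits 2..15 of key2, so a lookup table from each key3 value to the 14-bit values that produce it narrows key2 down. This sketch builds only that table; extending the candidates through consecutive key2 values and then deriving key1 and key0, as the papers describe, is not shown. The names are illustrative.

```python
from collections import defaultdict


def key3_from_key2(key2: int) -> int:
    """key3 as in update_keys; depends only on bits 2..15 of key2."""
    temp = (key2 | 3) & 0xFFFF
    return ((temp * (temp ^ 1)) >> 8) & 0xFF


def build_key2_candidates() -> dict:
    """Map each key3 value to the low-16-bit patterns of key2 that produce it."""
    table = defaultdict(list)
    for high14 in range(1 << 14):
        key2_low16 = high14 << 2  # bits 0 and 1 are irrelevant: "| 3" forces them to 1
        table[key3_from_key2(key2_low16)].append(key2_low16)
    return dict(table)


CANDIDATES = build_key2_candidates()
```

Since 2^14 / 256 = 64, a single key3 value leaves on average 64 candidates for these 14 bits of key2 instead of 2^14.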

Once we have the keys, we can decrypt all the files in the ZIP file that were encrypted with the same password. One great tool that does both the key recovery and the decryption is pkcrack.
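pkcrack handles the decryption for us, but the step itself is short. This sketch assumes the three recovered values are the internal state right after password initialization (just before the 12-byte encryption header) and that the entry's raw data has already been extracted from the archive; all function names are hypothetical, and the encrypt function exists only to check the round trip.

```python
# CRC-32 table (polynomial 0xEDB88320), needed by update_keys.
CRC_TABLE = []
for n in range(256):
    c = n
    for _ in range(8):
        c = ((c >> 1) ^ 0xEDB88320) if c & 1 else (c >> 1)
    CRC_TABLE.append(c)

ENCRYPTION_HEADER_LEN = 12


def _crc32_byte(crc, b):
    return (crc >> 8) ^ CRC_TABLE[(crc ^ b) & 0xFF]


def _update(keys, p):
    """update_keys on a (key0, key1, key2) tuple, driven by the plaintext byte p."""
    key0, key1, key2 = keys
    key0 = _crc32_byte(key0, p)
    key1 = ((key1 + (key0 & 0xFF)) * 0x08088405 + 1) & 0xFFFFFFFF
    key2 = _crc32_byte(key2, key1 >> 24)
    return key0, key1, key2


def _key3(key2):
    temp = (key2 | 3) & 0xFFFF
    return ((temp * (temp ^ 1)) >> 8) & 0xFF


def keys_from_password(password: bytes):
    keys = (0x12345678, 0x23456789, 0x34567890)
    for b in password:
        keys = _update(keys, b)
    return keys


def decrypt_entry(data: bytes, keys) -> bytes:
    """Decrypt stored entry data with the internal keys; drop the 12-byte header."""
    out = bytearray()
    for c in data:
        p = c ^ _key3(keys[2])
        keys = _update(keys, p)
        out.append(p)
    return bytes(out[ENCRYPTION_HEADER_LEN:])


def encrypt_entry(header: bytes, payload: bytes, keys) -> bytes:
    """Inverse of decrypt_entry, only used to check the round trip."""
    out = bytearray()
    for p in header + payload:
        out.append(p ^ _key3(keys[2]))
        keys = _update(keys, p)
    return bytes(out)
```

If the entry was deflated, the returned bytes still need inflating, for example with `zlib.decompress(payload, -15)` for raw deflate data.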

Let's look at the Forensics 3 challenge from RiftCTF2020.

Given Files

We have enough files to decrypt the ZIP file and get flag.txt. If you look inside flag.zip, you will find two files:

  • flag.txt
  • readme.txt

We also have a copy of readme.txt, and it is long enough to serve as known plaintext for this attack. So we run pkcrack and wait, and wait, and wait some more.
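Before starting a long pkcrack run, it is worth checking that our copy of readme.txt really is the file inside flag.zip. The CRC-32 and sizes in the headers are not encrypted, so Python's zipfile module can compare them without the password. A sketch with hypothetical helper names; the demo archive is unencrypted because the standard library cannot create encrypted ZIPs.

```python
import io
import zipfile
import zlib

ENCRYPTION_HEADER_LEN = 12  # counted in compress_size for encrypted entries


def describe_entries(archive):
    """List (name, encrypted?, method, stored payload size, original size)."""
    rows = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            encrypted = bool(info.flag_bits & 0x1)
            payload = info.compress_size - (ENCRYPTION_HEADER_LEN if encrypted else 0)
            rows.append((info.filename, encrypted, info.compress_type, payload, info.file_size))
    return rows


def matching_entry(archive, known: bytes):
    """Name of the entry whose CRC-32 and size match `known`, or None."""
    crc = zlib.crc32(known)
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.CRC == crc and info.file_size == len(known):
                return info.filename
    return None


def build_demo_archive(files):
    """Hypothetical helper: an unencrypted in-memory ZIP, just to exercise the checks."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf
```

On the real challenge you would call `matching_entry("flag.zip", open("readme.txt", "rb").read())`; a match only shows the uncompressed contents agree, so the known plaintext must still be compressed the same way as the stored entry.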

After some time we see this.

[Screenshot: pkcrack output]

Decrypted Files

And the flag is:

[Screenshot: the flag]