Cryptopals Set 1 Write-Up

A few weeks ago, I started solving the Cryptopals challenges in my spare time. I have almost completed the first six sets of challenges, each set containing about 6–7 challenges.

We'll be solving the first set of them in this post. This set is relatively easy, that's why I'll solve all of them in one blog post.

This set kicks off with basic conversions between bytes and hexes, continues with XOR operations and ends with basic AES, a type of cryptographic algorithm and encryption/decryption challenges.

 

Table of Contents

Crypto Challenge Set 1

This is the qualifying set. We picked the exercises in it to ramp developers up gradually into coding cryptography, but also to verify that we were working with people who were ready to write code.

This set is relatively easy. With one exception, most of these exercises should take only a couple minutes. But don't beat yourself up if it takes longer than that. It took Alex two weeks to get through the set!

If you've written any crypto code in the past, you're going to feel like skipping a lot of this. Don't skip them. At least two of them (we won't say which) are important stepping stones to later attacks.

- Cryptopals Creators

 

Convert hex to base64

In this very first challenge, we'll just convert a hex string to base64. In all challenges, we'll use Python. Before we start, look at the challenge.

The string (hex):

49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d

Should produce:

SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t

To convert those properly, we must operate on raw bytes, not encoded strings as they have mentioned. We can use the base64 library for encoding and the built-in bytes.fromhex function to convert the hex string into raw bytes.

import base64

hex_string = "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d"
raw = bytes.fromhex(hex_string)
result = base64.b64encode(raw)
#SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t

Buckle up! We have a long way to go!

Fixed XOR

This challenge wants a function that takes two equal-length buffers and produces their XOR combination.

Before we start solving the challenge, let's dive into the world of XOR.

Exclusive-or (XOR) is a logical operator. A and B are bits (1 or 0). If A and B are not the same, the output is 1. Otherwise, the output is 0. In Python, we can XOR like this:

a  = 0
b = 1
c = a^b
print(c)
# -> c=1  

Don't confuse it with the power symbol, which is ** .

We can XOR anything that can be represented as bits, such as bytes, integers or strings.

 

a = 7
b = 8
c = a^b
print(c)

What would be the variable c?

Calculate it by converting a and b into binaries by hand.

7 -> 0111
8 -> 1000

Starting from the least significant bit, XOR one by one. If two of them are equal, the output is 0. Otherwise, the output is 1.

    0111
1000
XOR
----------
1111 (15)

Meaning that 7 ^ 8 = 15.

Back to our challenge, we will write a function that takes two equal-length buffers and XORs them.

 

def xor(b1 : bytes, b2: bytes) -> bytes:
   result = b''
   for i in range(len(b1)):
      result += chr(b1[i] ^ b2[i]).encode()
   return result

b1 = bytes.fromhex("1c0111001f010100061a024b53535009181c")
b2 = bytes.fromhex("686974207468652062756c6c277320657965")

print ( xor(b1,b2) )
#746865206b696420646f6e277420706c6179
#b"the kid don't play"

 

Single-byte XOR cipher

We are given a hex-encoded string and our goal is to decrypt it. We know that it's XOR'd against a single byte. If we brute force all possible bytes, which are the total possible combinations, the plain text is easily reachable.

One more thing: we have to generate a function that gets a string buffer and says whether it's meaningful or not in English. For instance, it must distinguish these two:

buff1 = b'hello world'
buff2 = b'\0x\\t0x.}{='

Which one seems like meaningful :D?

Before creating it, start with brute force. I'll introduce an XOR function that's very flexible and can be used across all challenges. This function comes from the pwn, which is a powerful exploitation library function though we'll use it only for XOR'ing.

#pip install pwntools
from pwn import xor

#It always returns bytes 
a = 7
b = 8
c = xor(a,b)
print(c[0])
# 15

# we can also pass parameters, it'll XOR consecutively
d = xor(a,b,c)
#  d is 0, why?

The XOR function can also perform single-byte XOR operations. If we provide two bytes of unequal length, it will copy the shorter one and make it equal in length to the other.

from pwn import xor
# b"hello" and b"aaaaa" will be XORed
single = xor(b"hello",b"a")

Thus, it is easy to create a single-byte cracker.

However, we need to create a scoring system for buffers. We can create a function based on Bhattacharyya distance.

import math

# Frequency dictionary for English letters and space
FREQ = {'a': 0.0651738, 'b': 0.0124248, 'c': 0.0217339, 'd': 0.0349835, 'e': 0.1041442, 'f': 0.0197881, 'g': 0.0158610,
        'h': 0.0492888, 'i': 0.0558094, 'j': 0.0009033, 'k': 0.0050529, 'l': 0.0331490, 'm': 0.0202124, 'n': 0.0564513,
        'o': 0.0596302, 'p': 0.0137645, 'q': 0.0008606, 'r': 0.0497563, 's': 0.0515760, 't': 0.0729357, 'u': 0.0225134,
        'v': 0.0082903, 'w': 0.0171272, 'x': 0.0013692, 'y': 0.0145984, 'z': 0.0007836, ' ': 0.1918182}

def score_string(word: bytes) -> float:
    curr_freq = {letter: 0 for letter in FREQ.keys()}

    num_letters = 0
    for i in word:
        if chr(i).lower() in FREQ.keys():
            curr_freq[chr(i).lower()] += 1
            num_letters += 1

    if num_letters != 0:
        curr_freq = {letter: val / num_letters for letter, val in curr_freq.items()}
    else:
        return 0

    distance = bhattacharyya_distance(FREQ, curr_freq)
    return 1 / distance

def bhattacharyya_distance(dist1: dict, dist2: dict) -> float:
    bc_coeff = 0
    for letter in FREQ.keys():
        bc_coeff += math.sqrt(dist1[letter] * dist2[letter])
    return -math.log(bc_coeff)


The score_string function calculates a similarity score based on the Bhattacharyya distance between the letter frequency distribution in the input byte sequence and a predefined distribution.

You can test the function by passing some values and observe the returned value.

Since it's XOR'ed with one byte, it's possible to retrieve the plaintext by trying all possible asci bytes, from 0 to 255, and the plaintext that has the highest Bhattacharyya distance score will be the probable plaintext.

from pwn import xor
ciphertext = bytes.fromhex("1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736")
highest_score = 0
plaintext = ""
for i in range(256):
    result = xor(ciphertext,bytes([i]))   
    score = score_string(result)
    if score > highest_score:
        highest_score = score
        plaintext = result
print(plaintext)
#b"Cooking MC's like a pound of bacon"

If all possible keys are not so much at all, it's very common in cryptography to brute force all possible keys.

Detect single-character XOR

We know that one of the 60-character strings in the given file has been encrypted by single-character XOR. Using the code that we used for previous challenge, the plaintext is easily retrievable. Why? Still, the number of all possible keys are so low, which are total 256*60= 15360 keys. It is not a big number in cryptography. 

import requests
from pwn import xor

url = "https://cryptopals.com/static/challenge-data/4.txt"
ciphertexts = requests.get(url).text.split("\n")

highest_score = 0
plaintext = ""
for ciphertext in ciphertexts:
    for i in range(256):
        result = xor(bytes.fromhex(ciphertext),bytes([i]))   
        score = score_string(result)
        if score > highest_score:
            highest_score = score
            plaintext = result
print(plaintext)
#b'Now that the party is jumping\n'

Implement repeating-key XOR

plaintext = b"Burning 'em, if you ain't quick and nimble\nI go crazy when I hear a cymbal"

KEY = b'ICE'

We need to extend our key as much as the length of the plaintext, such as ICEICEICE until until it has the same length as the plaintext or the ciphertext.

KEY = KEY * (len(plaintext) // len(KEY)) + KEY [: len(plaintext) % len(KEY)]
#b'ICEICEICEICEICEICEICEICEICEICEICEICEICEICEICEICEICEICEICEICEICEICEICEICEIC'

Now, we can encrypt with the repeating key.

Since we are working with the XOR function of the pwn library, we actually don't have to extend our key. It'll automatically apply repating-key XOR.

plaintext = b"Burning 'em, if you ain't quick and nimble\nI go crazy when I hear a cymbal"
KEY = b'ICE'

print(xor(plaintext,KEY).hex())
#0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a26226324272765272a282b2f20430a652e2c652a3124333a653e2b2027630c692b20283165286326302e27282f

 

Break repeating-key XOR

 
Write a function to compute the edit distance/Hamming distance between two strings. The Hamming distance is just the number of differing bits.
 
To calculate the hamming distance, two strings must be converted into binaries. This is why so many people failed at that challenge. 
Here is the function that calculates the hamming distance.
s1 = b'this is a test'
s2 = b"wokka wokka!!!"

def hamming_distance(s1:bytes, s2:bytes) -> int:
    s1_bin  = bin(int(s1.hex(), base=16))[2:]
    s2_bin  = bin(int(s2.hex(), base=16))[2:]

    if len(s1_bin) > len(s2_bin):
        s2_bin = "0" *  abs( len(s1_bin) - len(s2_bin)   )  + s2_bin
    elif len(s1_bin) < len(s2_bin):
        s1_bin = "0" *   abs( len(s1_bin) - len(s2_bin)   )  + s1_bin
    
    count = 0
    for i in range(len(s2_bin)):
        if s1_bin[i] != s2_bin[i]:
            count +=1
    return count

assert hamming_distance(s1,s2) == 37

That was easy. Calculating the probable key size is more tricky. Just follow the instructions given by CryptoPals:

For each KEYSIZE, take the first KEYSIZE worth of bytes, and the second KEYSIZE worth of bytes, and find the edit distance between them. Normalize this result by dividing by KEYSIZE.

The KEYSIZE with the smallest normalized edit distance is probably the key. You could proceed perhaps with the smallest 2-3 KEYSIZE values. Or take 4 KEYSIZE blocks instead of 2 and average the distances.

import statistics

ciphertext = requests.get("https://cryptopals.com/static/challenge-data/6.txt").content.strip()
ciphertext = base64.b64decode(ciphertext)

KEY_SIZE_MIN = 2
KEY_SIZE_MAX = 40

def calculateKeySize( ) -> int:
    min_dist = KEY_SIZE_MAX * 8
    best_key_size = KEY_SIZE_MIN

    for key_size in range(KEY_SIZE_MIN, KEY_SIZE_MAX):
        dist_list = []

        for i in range(len(ciphertext) // key_size - 1):
            block1 = ciphertext[i * key_size : (i + 1) * key_size]
            block2 = ciphertext[(i + 1) * key_size : (i + 2) * key_size]
            dist_list.append(hamming_distance(block1, block2))

        total_dist = statistics.mean(dist_list) / key_size
        if total_dist < min_dist:
            min_dist = total_dist
            best_key_size = key_size

    return best_key_size

keySize = calculateKeySize()
#29

We successfully calculated the size of the key as 29.

Now that you probably know the KEYSIZE: break the ciphertext into blocks of KEYSIZE length:

keySize = calculateKeySize()
CIPHED_BLOCKS = []
for k in range(keySize):
    BLOCK = []
    for i in range(k,len(ciphertext),keySize ):
        BLOCK.append(ciphertext[i])
    CIPHED_BLOCKS.append(BLOCK)

Solve each block as if it was single-character XOR. You already have code to do this.

We can use the code we wrote, which is one of the challenges we have completed so far.

 

from pwn import xor

def oneByteXor(ciphertext:bytes):
    highest_score = 0
    temp_key = b''
    for i in range(256):
        result = xor(  ciphertext ,bytes([i]))   
        score = score_string(result)
        if score > highest_score:
            highest_score = score
            temp_key = bytes([i])
    return temp_key

For each block, the single-byte XOR key that produces the best looking histogram is the repeating-key XOR key byte for that block. Put them together and you have the key.

key = b''
for block in CIPHED_BLOCKS:
    key = key + oneByteXor(block)  
    print(key)

It's time to decrypt!

print(xor(ciphertext,key))
#b"I'm back and I'm ringin' the bell \nA rockin' on the mike while the fly girls yell ..."

This challenge was kind of semi-guided. However, it was still one of the hardest challenges in this set.

Cooldown. The following challenges won't be as painful.

 

AES in ECB mode

Advanced Encryption Standard is one of the pioneers of symetric encryption. It has several encryiption modes of operation. GCM, CCM, ECB, CBC, CTR are some of the modes. You can visit that Wikipedia page to see all common modes.

In this challenge, we will use Electronic codebook mode. The first thing you need to know about this mode is that you must not use ECB mode in your projects and you probably encounter this mode in CTFs mostly.

The reason you must not use this mode can be summurised in one popular image about it.

The general idea behind block-ciphering modes is to split the data (to be encrypted) blocks, which are 16 bytes per block for this mode, and encrypt separately applying AES along with the special operations depends on the mode.

Identical input and identical key will produce an identical result.

We will use the pycryptodome library to encrypt and decrypt data in Python.

To install pycryptodome util:

 pip install pycryptodome 

Allright, let's write some code.

#AES in ECB mode
from Crypto.Cipher import AES
from Crypto.Util.Padding import unpad
import base64,requests
from Crypto.Util.Padding import pad

inp = requests.get('https://cryptopals.com/static/challenge-data/7.txt').content.strip()
key = b'YELLOW SUBMARINE' # The key  

# Read the data from the file
ciphered_data = base64.b64decode(inp) # decode base64
cipher = AES.new(key,AES.MODE_ECB) # initate cipher
original_data =  cipher.decrypt(ciphered_data)   # Decrypt  
# dont forget to unpad the data
original_data = unpad(original_data,AES.block_size)
print(original_data.decode())
"""
I'm back and I'm ringin' the bell 
A rockin' on the mike while the fly girls yell 
In ecstasy in the back of me 
Well that's my DJ Deshay cuttin' all them Z's 
...
"""

 

Detect AES in ECB mode

In this challenge, we are given a bunch of ciphertexts, and our goal is to detect ciphertexts that are produced by AES in ECB mode. This is a trivial one. Remember what the weakness of ECB was:

Identical input and identical key will produce an identical result.

In order to detect, we just need to split a ciphertext into 16-byte blocks and then search for identical bytes.

def detectAES_ECB(ciphertext):
    blocks = []
    for _ in range(len(ciphertext) // 16):
        if ciphertext[:16] in blocks:
            print('Detected AES_ECB!: ' + ciphertext)
            print('Block: ' + ciphertext[:16])
            print('-----------')
        blocks.append(ciphertext[:16])
        ciphertext = ciphertext[16:]

Now we can try our function.

import requests
ciphertexts = requests.get('https://cryptopals.com/static/challenge-data/8.txt').text.split('\n')

for ciphertext in ciphertexts:
 detectAES_ECB(ciphertext)

End of The Set

Frankly, this set was boring for me. To me, the second set is more intriguing. See you there!