Encoding Messages in Fake Spam Using Polymorphic Interfaces

There’s something interesting about spam. I’m not talking about the barrage of poorly-disguised junk mail everybody gets – I’m talking about the special kind of spam that combines just the right amount of unlikely phrase pairings and absurdist imagery to create a new kind of confusion unique to the modern age.

A few weeks ago, I made a steganography project, SpamEncoder, using this caliber of ‘spam as art’ as the medium to hide secret messages in.

It’s really simple to use – you just put the message you want to hide in the ‘Encode’ field, then press the button to instantly produce a coded version of the message. When you want to decode it, just paste the coded text into the ‘Decode’ field and press that button. Couldn’t be simpler!

However, getting this to work just how I wanted was a little trickier. Here’s how I did it.

1. Case Statement

For the very first version of the app, I wanted to get something working as quickly as possible. Once I had a prototype up, it’d be much easier to see where the issues and unexpected problems would come from.

The first incarnation was a straight letter substitution, using a hideous case statement to swap out each letter of the original message for a new encoded word.

  # Encode method

  def encode_letter_replace(letter)
    case letter
    when "a"
      "cialis "
    when "b"
      "loans "
    when "c"
      "enhancement "
    when "d"
      "pills "
    when "e"
      "sale "
    when "f"
      "nigeria "
    ...

  # Decode method

  def message_reverse_engineer(message)
    message.gsub!("cialis ","a")
    message.gsub!("loans ","b")
    message.gsub!("enhancement ","c")
    message.gsub!("pills ","d")
    message.gsub!("sale ","e")
    message.gsub!("nigeria ","f")
    ...

Despite being as clunky as possible, it worked well enough to prove out the concept and bring a few issues to light I hadn’t anticipated:

1. Handling lowercase and uppercase letters, as well as some special characters;

2. Swapping out individual characters for spam phrases instead of single spammy words; and

3. Preserving carriage returns in the original message.

Handling lowercase and uppercase letters was easy – I just added different phrase substitutions for all 52 uppercase and lowercase letters of the English alphabet, plus a few of the more common special characters. The others, however, were much more interesting.

2. Polymorphic Phrase Library

Replacing single characters with encoded phrases took a fair chunk of time to nail down. Because the first version duplicated the entire character-to-spam-word key in both the encode and decode methods, any time I changed one method, the other would break.

After working on this for a bit, I realized that a ‘phrase library’ store made the most sense. Using a hash for the data structure meant I could remove all duplication and condense the phrase library into just one method.

Depending on the operation the user wanted to do, this method could return either an encoded phrase for a character to be encoded, or the original character for all occurrences of a given spam phrase. I’ll come back to this in a bit.

# takes a hash with either a { letter: letter } or a { message: message } key-value pair
def library(input)
  phrase_library = {
    'a' => ["hilarious spam phrase 1 ", "mildly amusing spam phrase 2 "],
    'b' => ["delightful spam phrase 3 ", "hackneyed spam phrase 4 "],
    ...
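
To make the two-way idea concrete, here is a minimal runnable sketch. It is not the project's code: the phrases, the PHRASES constant, and the phrase_for and decode_phrases helpers are all made up for illustration. It only shows how one hash can serve both the encoding and the decoding direction.

```ruby
# Hypothetical two-entry library; the real project has many more phrases.
PHRASES = {
  'a' => ['act now ', 'free trial '],
  'b' => ['best rates ', 'big savings ']
}.freeze

# Encoding direction: character in, phrase out.
# Characters without an entry pass through unchanged.
def phrase_for(character, index)
  list = PHRASES[character]
  list ? list[index % list.length] : character
end

# Decoding direction: swap every known phrase back to its character.
def decode_phrases(text)
  PHRASES.reduce(text) do |result, (character, list)|
    list.reduce(result) { |partial, phrase| partial.gsub(phrase, character) }
  end
end

encoded = phrase_for('a', 0) + phrase_for('b', 1)
p encoded                 # "act now big savings "
p decode_phrases(encoded) # "ab"
```

Because decoding is a plain search and replace, a sketch like this only round-trips cleanly when no phrase is contained inside another phrase.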

3. Preserving Carriage Returns

The last issue was something I dealt with almost as soon as the first version started working. Because the encode operation strips out all spaces in the message so it can iterate over each character and make the substitutions, I was losing any carriage returns the user included in the original message. Since my goal was to preserve the original message completely, including superfluous carriage returns, this wouldn’t do.

To handle this, the program replaces every \n with the ♤ character (spaces get a similar stand-in, ‡), which is then replaced with spam phrases just like any other character. When decoding, the phrases associated with the spade character are replaced with the spade, which is finally swapped back to \n right before the whole message is restored to its decoded form and output to the user.

message.gsub(" ","‡").gsub(/\n/,"♤").split("").each do |letter| # replacing all spaces and carriage returns with temp characters
  # encode each individual character of the message
end
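
The placeholder trick can be sketched on its own, separate from the phrase substitution. The helper names below (mark_whitespace and restore_whitespace) and the two constants are assumptions for illustration, not names from the project.

```ruby
SPACE_MARK   = '‡' # stands in for a space
NEWLINE_MARK = '♤' # stands in for "\n"

# Swap whitespace for placeholder characters before encoding.
def mark_whitespace(message)
  message.gsub(' ', SPACE_MARK).gsub("\n", NEWLINE_MARK)
end

# Swap the placeholders back after decoding.
def restore_whitespace(text)
  text.gsub(SPACE_MARK, ' ').gsub(NEWLINE_MARK, "\n")
end

original = "hi there\n\nbye"
marked = mark_whitespace(original)
puts marked                                 # hi‡there♤♤bye
puts restore_whitespace(marked) == original # true
```

One limitation of this sketch: a message that already contains ‡ or ♤ would not survive the round trip, since those characters would come back as a space or a newline.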

4. Random Phrase Selection

Everything was looking good, except for one final problem. If you encoded a message, then added just one more character (or even a short word), it would just append more phrases to the existing encoded message. This wasn’t good enough! Even if you just changed one character, the encoded message should be totally different.

Solving this last problem turned out well. I couldn’t use a random number to select which phrase to use to replace a character, since it would introduce an element of randomness that felt out of place in a formalized cryptography system.

Instead, I relied on an obvious constant – the length of the message.

# in initialize, set the message text to @message and set the @message_length_key

def initialize(message_input)
  @message = message_input
  @message_length_key = @message.length.to_s[-1].to_i # last digit of the message length
end

# in the encode method, iterate over each character in the message and replace each with a phrase
  @message_length_key += 1 # increment the key for selecting phrases
  @message_length_key -= 10 if @message_length_key > 9 # wrap around to the first phrase

# in the library method
# if sending a character to the library to be encoded, we need to return a phrase

character = input[:letter]
if character
  if phrase_library.key?(character)
    return phrase_library[character][@message_length_key]
  else
    return character
  end

# otherwise, we have a message to be decoded, and we need the original character in the key

else
  message = input[:message]
  phrase_library.each do |char, phrase_list|
    phrase_list.each do |phrase|
      message.gsub!(phrase, char) if message.include?(phrase)
    end
  end
  message
end

When encoding, the message_length_key variable is initialized with the last digit of the length of the message. Then, for every character substitution, the key increases by one. Since I only wrote/found 10 spam phrases for each character, a check runs on message_length_key on every loop and reduces the value if it ever gets too high.
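
Here is a small end-to-end sketch of that rotating key. It makes a few assumptions that are not in the project: the library is cut down to three invented phrases per character instead of ten, so the key wraps with a modulo on the phrase count rather than on the last digit of the length, and the names PHRASE_LIBRARY, PHRASES_PER_CHARACTER, and encode are hypothetical.

```ruby
# Hypothetical mini library with three phrases per character (the post uses ten).
PHRASE_LIBRARY = {
  'h' => ['hot deals ', 'huge win ', 'hurry up '],
  'i' => ['instant cash ', 'invest today ', 'inbox full ']
}.freeze
PHRASES_PER_CHARACTER = 3

def encode(message)
  # Starting index derived from the message length.
  key = message.length % PHRASES_PER_CHARACTER
  message.each_char.map do |character|
    # Advance the key and wrap it before each lookup.
    key = (key + 1) % PHRASES_PER_CHARACTER
    list = PHRASE_LIBRARY[character]
    list ? list[key] : character
  end.join
end

p encode('hi')  # "hot deals invest today "
p encode('hih') # "huge win inbox full hot deals "
```

Adding a single character changes the starting key, so the phrases chosen for the first two characters change as well, rather than the longer message simply extending the shorter one.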

And that’s it! It was a pretty fun project, and a good excuse to see what some of the challenges are when developing cryptography systems. Plus, writing lots of gibberish turns out to be pretty fun.

The whole thing’s open source and up on GitHub here: https://github.com/kamoh/spam-encoder

sign your pitty virtuous checkboox thousands dogs chasing squirrels in TARGET 

dungeons and potlucks i am real person plus homelessness accreditation pine cone living wage certainly, –

bagels, and severe windblown acne. Chris is dead, Job for you! more and bots have been replaced with people new flavor? Pepsi Google told me in whispers the truth savings, a bust iced tea and beverage rappers opportunity of a lifetime, my bagels, and denial of sunrise attack textile gremlins eating more and 

classifieds? Job now on website valley of the getting rich quick was dead? explicit, yet full of romance filed in slowly, all
