There’s something interesting about spam. I’m not talking about the barrage of poorly disguised junk mail everybody gets – I’m talking about the special kind of spam that combines just the right amount of unlikely phrase pairings and absurdist imagery to create a new kind of confusion unique to the modern age.
It’s really simple to use – you just put the message you want to hide in the ‘Encode’ field, then press the button to instantly produce a coded version of the message. When you want to decode it, just copy the text in the ‘Decode’ field and press that button. Couldn’t be simpler!
However, getting this to work just how I wanted was a little trickier. Here’s how I did it.
1. Case Statement
For the very first version of the app, I wanted to get something working as quickly as possible. Once I had a prototype up, it’d be much easier to see where the issues and unexpected problems would come from.
The first incarnation was a straight letter substitution, using a hideous case statement to swap out each letter of the original message for a new encoded word.
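As a rough illustration, a straight letter-for-word case statement might look something like the Ruby sketch below. The language choice, the placeholder words, and the method names encode_letter and encode are assumptions made for this example, not the app’s actual key.

```ruby
# Hypothetical sketch: the words below are placeholders, not the real key.
def encode_letter(letter)
  case letter
  when "a" then "bagels"
  when "b" then "savings"
  when "c" then "gremlins"
  else letter # anything not in the key passes through unchanged
  end
end

# Swap every letter of the message for its word, separated by spaces.
def encode(message)
  message.chars.map { |letter| encode_letter(letter) }.join(" ")
end

puts encode("cab")
```

A matching decode method would need the same list written out in reverse, so the key ends up maintained in two places.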
Despite being as clunky as possible, it worked well enough to prove out the concept and bring a few issues to light I hadn’t anticipated:
1. Handling lowercase and uppercase letters, as well as some special characters;
2. Swapping out individual characters for spam phrases instead of single spammy words; and
3. Preserving carriage returns in the original message.
Handling lowercase and uppercase letters was easy – I just added different phrase substitutions for all 52 case variations of the English alphabet, plus a few of the more common special characters. The others, however, were much more interesting.
2. Polymorphic Phrase Library
Replacing single characters with encoded phrases took a fair chunk of time to nail down. The first version duplicated the entire character/spam word key in both the encode and decode methods, so any time I made a change to one method chain, the other would break.
After working on this for a bit, I realized that a ‘phrase library’ store made the most sense. Using a hash for the data structure meant I could remove all duplication and condense the phrase library into just one method.
Depending on the operation the user wanted to do, this method could return either an encoded phrase for a character to be encoded, or the original character for all occurrences of a given spam phrase. I’ll come back to this in a bit.
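To make the idea concrete, here is a minimal sketch of a single hash-backed lookup that serves both directions. It assumes Ruby; the PHRASES constant, the phrase_library method name, the :encode/:decode symbols, and the sample phrases are all hypothetical stand-ins rather than the app’s real code.

```ruby
# Placeholder library: each character maps to a list of spam phrases.
PHRASES = {
  "a" => ["opportunity of a lifetime", "Job for you!"],
  "b" => ["new flavor?", "getting rich quick"]
}.freeze

# One method for both operations:
#   :encode -> the list of phrases for a character
#   :decode -> the character that owns a given phrase
# Returns nil when the input is not in the library.
def phrase_library(operation, input)
  case operation
  when :encode
    PHRASES[input]
  when :decode
    pair = PHRASES.find { |_char, phrases| phrases.include?(input) }
    pair&.first
  end
end
```

Because both directions read from the same hash, adding or changing a phrase only has to happen in one place.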
3. Preserving Carriage Returns
The last issue was something I dealt with almost as soon as the first version started working. Because the encode operation strips out all spaces in the message so it can iterate over each character and make the substitutions, I was losing any carriage returns the user included in the original message. Since my goal was to preserve the original message completely, including superfluous carriage returns, this wouldn’t do.
To handle this, the program replaces all \n returns with the ♤ UTF-8 character, which is then replaced with spam phrases just like any other character. When decoding, the phrases associated with the spade character are replaced with the spade, then finally replaced with \n characters right before the whole message is restored to its decoded form and output back to the user.
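The newline swap itself can be sketched in a couple of lines. This assumes Ruby, and the names NEWLINE_MARKER, protect_newlines, and restore_newlines are invented for the example; only the ♤ stand-in comes from the description above.

```ruby
# Stand-in character for newlines, treated like any other character
# by the phrase substitution step.
NEWLINE_MARKER = "♤"

# Run before encoding: hide every newline behind the marker.
def protect_newlines(message)
  message.gsub("\n", NEWLINE_MARKER)
end

# Run as the last step of decoding: turn markers back into newlines.
def restore_newlines(message)
  message.gsub(NEWLINE_MARKER, "\n")
end
```

One caveat with this approach: a message that already contains a literal ♤ would come back with a newline in its place.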
4. Random Phrase Selection
Everything was looking good, except for one final problem. If you encoded a message, then added just one more character (or even a short word), it would just append more phrases to the existing encoded message. This wasn’t good enough! Even if you just changed one character, the encoded message should be totally different.
Solving this last problem turned out well. I couldn’t use a random number to select which phrase to use to replace a character, since it would introduce an element of randomness that felt out of place in a formalized cryptography system.
Instead, I relied on an obvious constant – the length of the message.
When encoding, the message_length_key variable is initialized with the last digit of the length of the message. Then, for every character substitution, the key increases by one. Since I only wrote/found 10 spam phrases for each character, a check runs on message_length_key for every loop and reduces the value if it ever gets too high.
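Here is a small sketch of that selection scheme, assuming Ruby. Only message_length_key comes from the description above; PHRASE_OPTIONS, PHRASES_PER_CHAR, select_phrases, and the numbered stand-in phrases are hypothetical, and wrapping the key back to zero is one guess at what "reduces the value" means.

```ruby
PHRASES_PER_CHAR = 10

# Placeholder library: ten numbered stand-in phrases per character.
PHRASE_OPTIONS = ("a".."c").map do |char|
  [char, (0...PHRASES_PER_CHAR).map { |i| "#{char}-phrase-#{i}" }]
end.to_h.freeze

def select_phrases(message)
  # Start from the last digit of the message length.
  message_length_key = message.length % 10

  message.chars.map do |char|
    phrase = PHRASE_OPTIONS.fetch(char)[message_length_key]
    # Advance the key for the next character, wrapping back to zero
    # (an assumption) once it runs past the last available phrase.
    message_length_key += 1
    message_length_key = 0 if message_length_key >= PHRASES_PER_CHAR
    phrase
  end
end

p select_phrases("ab")  # key starts at 2
p select_phrases("abc") # key starts at 3, so every phrase shifts
```

Since each phrase still belongs to exactly one character, decoding does not need to know the key; it only has to look up which character owns each phrase.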
And that’s it! It was a pretty fun project, and a good excuse to see what some challenges are when developing cryptography systems. Plus, writing lots of gibberish turns out to be pretty fun.
The whole thing’s open sourced and up on GitHub here: https://github.com/kamoh/spam-encoder
sign your pitty virtuous checkboox thousands dogs chasing squirrels in TARGET
dungeons and potlucks i am real person plus homelessness accreditation pine cone living wage certainly, –
bagels, and severe windblown acne. Chris is dead, Job for you! more and bots have been replaced with people new flavor? Pepsi Google told me in whispers the truth savings, a bust iced tea and beverage rappers opportunity of a lifetime, my bagels, and denial of sunrise attack textile gremlins eating more and
classifieds? Job now on website valley of the getting rich quick was dead? explicit, yet full of romance filed in slowly, all