Utterly Insane Data Structures

The process of learning something completely new is a strangely tiered experience. More than anything else, the past seven weeks have felt like falling down a very deep pit, and gradually climbing out. The tricky part is, since you’re so focused on climbing up, you can’t actually tell when you’ve climbed past where you fell in – all you know is to keep going.

Once you get into this groove, it can start to feel comfortable. On the other hand, staying humble and keeping a beginner’s mind is paramount to staying receptive and curious.

Which brings us to data structures!

At this point, I’ve grown comfortable using the common data structures used in software. Arrays, hashes (objects in JS), stacks and queues seem like reasonable enough entities. Even binary search trees make sense, despite being more complicated to wrangle.

Which led me to thinking – are there any stranger data structures out there? Could there be odd classifications of information out there in the wild, their mysteries coiled tight inside dense documentation, waiting to be stuffed with bits?

Of course there are – this would be a pretty short blog post if there weren’t!

Bloom Filters

The bloom filter is almost exactly what I was looking for – a bizarre, ungainly data structure with plenty of good points, and a few dashes of tragic weirdness.

Bloom filters bring a heaping helping of optimization to the table. Instead of storing long chunks of data verbatim into a hash table or some other literal configuration, they hold values in multiple places in memory in order to check for ‘set membership,’ or whether your queried item belongs in the particular set being searched. This cuts down significantly on the memory required to store the structure, as well as the number of queries needed to find the result.

When you want to access a value, the bloom filter will check all of the memory locations associated with that search term, and if the return values are positive, it’s a good indication that it exists.

Unfortunately, because memory locations are reused by multiple entries in the filter, it’s possible to generate false positives depending on how many values are being stored.

Oh, and you can never delete anything ever, because the memory locations corresponding to the term may be in use by other values. So weird!

So what the heck do you use this for? Until about two years ago, the Chrome browser used a bloom filter to detect malicious URLs. On the other hand, as of February 2014 Google appears to still use bloom filters to collect metrics stored from users.

For more clarification, Cube Drone has a pretty good three.5 minute summary of how these things work:

Tries

Pronounced “tree” (see? weird right off the bat), tries appear to ape the structure of binary search trees, but they’re unique animals. While binary search trees reshuffle themselves to accommodate new values with respect to the current makeup of the tree, tries traverse an existing selection of data to return a specific value – which is where they get their name, from reTRIEval. Tries also specialize in text queries, which comes into play regarding their real-world use cases.

And because they are used to retrieve values instead of generating an updated order, they offer a space-saving alternative to hash tables and other basic data indexing structures.

So how do they work? Given a list of words that all start with the same few letters – like ace, actor and aces – the trie would list utilize a separate node for every letter, following all the way down until each traversal terminates in a final word.

If you’re thinking that something like this would be good for spell-checking or autocomplete, you’re right! Tries are generally used for such text-intensive scenarios.

rabblerabbleRubble on YouTube breaks down the concept in under two minutes:

Skip Lists

Skip lists are the coolest things in the world. Skip lists provide fast searching capabilities within an ordered series of elements, commonly numbers. In essence, skip lists compare a current element with other elements you’ve previously inserted into your collection – including not only INFINITY on the right, but NEGATIVE FREAKING INFINITY on the left!

Once you’ve slotted where an element is supposed to go, the element may or may not be ‘promoted’ up to the next level of comparison based on a random decision by the search algorithm. Since some numbers don’t get promoted, future selections can skip entire swaths of set data, speeding up the entire query time – thanks to the flip of a coin!

In other words: Skip lists don’t care about what you just did. Skip lists don’t care about your plans for the future of your little algorithm. Skip lists are gonna do skip lists just how skip lists likes it.

Skip lists are so cool that this video showing them in action, even though the narrator’s voice is literally skipping because he set the speed too fast and the visual quality is near-abysmal, is by far a better explanation than any words I could provide.

Watch, enjoy – and learn.

Comments