System Design Algorithm and Interview Cheat Sheet

References

Steps to System Design Interview

Clarify Requirements
- Ask lots of questions
- Very important step
- Spend good amount of time on this step
Adjust scope and state assumptions
- Can limit scope to simplify and explain that you will come back to it at a later time
Back of the envelope calculations
- This should set upper bounds on the high level design
High Level Design
- Do drawings at this time
- Keep it high level and explain trade offs
- There is no one right answer
- Stay in scope but try to leave room for future scale and features
Detailed Design
- Discuss failure scenarios, how redundancy is provided, bottle necks,
- Discuss prevention of issues like thundering herd, hotspots, single point of failures etc
- Focus on security as well in this step

Algorithms and their Usage

Bloom filter

A Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set.

Build a Web Crawler

Consistent Hashing

It is used to evenly divide work among a bunch of resources while giving flexibility for resources to be added or removed. Mostly used in database sharding.

Hash Ring

Geohash / S2 Geometry

Geohash can be used by 1) dating apps to find romantic matches within a particular cell, and to create chat apps.2) Find nearby locations, and identify places of interest, restaurants, shops and accommodation establishments in an area. 3) Geohashers go on global expeditions to meet people and explore new places.

Location-based search results with DynamoDB and Geohash

Leaky bucket / Token bucket

A mechanism to control the amount and the rate of the traffic sent to the network

Lossy Counting

The lossy count algorithm is an algorithm to identify elements in a data stream whose frequency count exceeds a user-given threshold.

Operational transformation

Operational transformation (OT) is a technology for supporting a range of collaboration functionalities in advanced collaborative software systems.

Quadtree / Rtree

Spatial Indexing with Quadtrees
Find nearby interest points

Ray casting

Ray casting is the most basic of many computer graphics rendering algorithms that use the geometric algorithm of ray tracing. Given a point with longitude and latitude, return the Country of the point.

Ray Casting Algorithm

Reverse index

Reverse Index: a reverse index is an index of keywords which stores records of documents that contain the keywords in the list.

Rsync algorithm

The rsync algorithm is a technique for reducing the cost of a file transfer by avoiding the transfer of blocks that are already at the destination.

Streaming File Synchronization

Trie algorithm

Trie is an efficient information reTrieval data structure. Using Trie, search complexities can be brought to optimal limit (key length)

Frugal Streaming

Frugal Streaming uses only one unit of memory per group to compute a quantile for each group

Find the nth percentile of the data stream

Count-Min Sketch

Count frequency in a stream of data. Approximate algorithm similar to bloom filter.

Approximate counts with Hashing

Least Recently Used (LRU)

Very popular eviction algorithm for caches or other resources.

Implementation using doubly linked list and hash map

B-Trees

Self balancing tree used in systems where the contents of the tree cannot completely fit in main memory. The data structure is designed to reduce amount of disk accesses. It is heavily used in MySQL to keep track of indexes.

Intro to B-tree

Merkle Tree

Merkle tree or hash tree is a tree in which every leaf node is labelled with the cryptographic hash of a data block, and every non-leaf node is labelled with the cryptographic hash of the labels of its child nodes. Hash trees can be used to verify any kind of data stored, handled and transferred in and between computers. They can help ensure that data blocks received from other peers in a peer-to-peer network are received undamaged and unaltered, and even to check that the other peers do not lie and send fake blocks.

Used in Bitcoin implementation
Apache Cassandra uses Merkle trees to detect inconsistencies between replicas of entire databases.
Introduction to Merkle Trees