Steps to System Design Interview
- Clarify Requirements
- Ask lots of questions
- Very important step
- Spend good amount of time on this step
- Adjust scope and state assumptions
- Can limit scope to simplify and explain that you will come back to it at a later time
- Back of the envelope calculations
- This should set upper bounds on the high level design
- High Level Design
- Do drawings at this time
- Keep it high level and explain trade offs
- There is no one right answer
- Stay in scope but try to leave room for future scale and features
- Detailed Design
- Discuss failure scenarios, how redundancy is provided, bottle necks,
- Discuss prevention of issues like thundering herd, hotspots, single point of failures etc
- Focus on security as well in this step
Algorithms and their Usage
A Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set.
It is used to evenly divide work among a bunch of resources while giving flexibility for resources to be added or removed. Mostly used in database sharding.
Geohash / S2 Geometry
Geohash can be used by 1) dating apps to find romantic matches within a particular cell, and to create chat apps.2) Find nearby locations, and identify places of interest, restaurants, shops and accommodation establishments in an area. 3) Geohashers go on global expeditions to meet people and explore new places.
Leaky bucket / Token bucket
A mechanism to control the amount and the rate of the traffic sent to the network
- Everything You Need To Know About API Rate Limiting
- How to Design a Scalable Rate Limiting Algorithm
- How we built rate limiting capable of scaling to millions of domains
- Rate-limiting strategies and techniques
The lossy count algorithm is an algorithm to identify elements in a data stream whose frequency count exceeds a user-given threshold.
Operational transformation (OT) is a technology for supporting a range of collaboration functionalities in advanced collaborative software systems.
Quadtree / Rtree
- Spatial Indexing with Quadtrees
- Find nearby interest points
Ray casting is the most basic of many computer graphics rendering algorithms that use the geometric algorithm of ray tracing. Given a point with longitude and latitude, return the Country of the point.
Reverse Index: a reverse index is an index of keywords which stores records of documents that contain the keywords in the list.
The rsync algorithm is a technique for reducing the cost of a file transfer by avoiding the transfer of blocks that are already at the destination.
Trie is an efficient information reTrieval data structure. Using Trie, search complexities can be brought to optimal limit (key length)
- How to Design an Autocomplete System
- prefix matching words (IP Addresses, Phone Numbers)
- Auto-complete feature using Trie
Frugal Streaming uses only one unit of memory per group to compute a quantile for each group
- Find the nth percentile of the data stream
Count frequency in a stream of data. Approximate algorithm similar to bloom filter.
Least Recently Used (LRU)
Very popular eviction algorithm for caches or other resources.
Self balancing tree used in systems where the contents of the tree cannot completely fit in main memory. The data structure is designed to reduce amount of disk accesses. It is heavily used in MySQL to keep track of indexes.
Merkle tree or hash tree is a tree in which every leaf node is labelled with the cryptographic hash of a data block, and every non-leaf node is labelled with the cryptographic hash of the labels of its child nodes. Hash trees can be used to verify any kind of data stored, handled and transferred in and between computers. They can help ensure that data blocks received from other peers in a peer-to-peer network are received undamaged and unaltered, and even to check that the other peers do not lie and send fake blocks.
- Used in Bitcoin implementation
- Apache Cassandra uses Merkle trees to detect inconsistencies between replicas of entire databases.
- Introduction to Merkle Trees