If you are new to this blog, then read the How to ace the system design interview blog first. I have described the guidelines involved in solving a system interview problem. We will be applying those steps here.
Interviewer: Walks in the room. After the initial introductions, “ I would like to test you on your system design skills”
You: Sure! (Internally you are just hoping that the problem is a easy one :))
Interviewer: Let’s start by designing a system for a URL shortening service like bitly. How would you go about designing such a system?
You: OK. Here is what I would do
Then you follow the steps given below.
We will use the following high level steps to solve this problem.
High Level Steps:
- Scope the problem and clarify the requirements
- Do estimations
- High-Level Design
- Design core components
- Define API
- Detailed design (Depends on the time allocated for the interview)
- Resolve any bottlenecks in the design
- Summarize the solution
Step 1: Scope the problem and clarify the requirements
Define the functionality of the product (Bitly) to the interviewer so that you are aligned. Bitly is a URL shortening service to shorten, create and easily share links. The short links will redirect the user to the original URL.
The scope is limited to:
- The user enters a URL (https://www.myproductthoughts.com/index.php/2020/04/16/how-would-you-design-pastebin/) and app generates a unique short URL such as bit.ly/2ViKqTVbi
- User will be redirected to the original URL when the bitly link is clicked
- User will need to register for an account
- Links have a fixed expiry time
- The system is always available
- System is reliable
- Unique/ random URL should not be an easily guessable value
Out of scope:
- Custom URL links by the user
You can proceed to the next step only after confirming these assumptions with the interviewer
If the interviewer challenges you on out of scope requirements, then you can still stick to your script by letting them know that you will revisit the requirements at the end
Step 2: Do estimations
Before starting estimations, you would need to state some base assumptions to kickstart the calculations.
In this case, we are looking at the following assumptions and estimations…………
- Avg URL size is 1KB
- Read to write ratio is 100:1
- 10 million users
- 1 million paste a day and 365M paste a year
- 100 million reads per million writes/pastes
- Paste per sec = 1M/24*3600 = 12 pastes/sec
- Read per sec = 100M/24*3600 = 1160 reads/sec
- Data storage – Assume links are stored for 3 year
- Data stored in 1 day = 1KB*1M = 1 GB/day
- Data stored in 3 year = 1GB * 3* 365 = ~1.1TB data to be stored
You can use the above calculations to create a high level design.
Step 3: High Level Design
What are the components involved in this design? You have a web server that talks to the end client. The server takes the URL given by the user and shortens the URL with a unique string. The server then routes the encoded data to the database.
We have seen a similar service earlier by designing a system for Pastebin. The difference between Pastebin and Bitly design is that we do not need an Object storage container here. We only need to store the URL table data and the user table data for Bitly
Step 4: Design core components
We would store two separate data tables. One for the URL data and other for User data. URL table would include information such as the actual URL, timestamp, IP address, and expiry timeline. The URK table will use the encoded URL data as the key. The User data would use the user ID as the key and the table would include data such as user name, email, timestamp, and other details collected during sign up.
How do we generate the unique shortened URL?
Solving this piece of the puzzle is critical in the design process. We need to generate the URL string from a known data. We will hash the input URL provided by the user. Applying an algorithm such as MD5 should produce a 128-bit hash value. Next, we can encode the hash into Base 62 or 64.
Base 62 encodes to [a-zA-Z0-9] without any special characters. Base 64 includes special characters. There is only one hash result for the original input and Base 62 is deterministic. We can then pick the first 6 digits out of the hash to use as the URL string. The Bitly uses up to 9 characters in their encoding. We are sticking to six.
How many links are estimated if we save the data for 3 years = 1.1 B pastes. The number of combinations is 62^6 or 64^6. If we pick Base 62 encoding the number of keys generated is
62^6 = ~57 Billion strings should cover the possible cases that we need. This holds true even if we need to store the URL records for more than 3 years.
The encoding produces 128/6 = produces 21 characters. We are planning to use 6 characters to define the unique URL. We can pick the first 6 characters and then add that to the URL. We should have a check to ensure that the URL is not duplicated, before sending it to the client. If the character string is duplicated then we need to discard the data and regenerate the keys again.
API for read and write:
There are two API needed, one to write and one to read.
For writing, we can use a Write API with the following parameters – Original URL, timestamp, IP address, expiry date.
For read, we can use the Read API with the following parameters – shortened URL link
Step 5: Detailed Design
How do we speed up the design and how do we improve the responsiveness of the design?
We can improve the speed of the design by generating keys offline and then using those keys in real-time. No need to encode or create a short URL every time a new long URL is provided by the user. We can leverage an offline key service that generates the keys in advance and stores them in a database.
Every time we get a new paste request, we can append the 6 digit keys generated to the URL. No need to encode the original URL, timestamp or IP address. The encoded keys once used will be stored in a different database. Hence there will be two databases for the keys. One with the new keys and other with the used keys. We can use the MD5 hash on a random value and then use base 62 encodings to generate the keys. We need to allocate storage for the keys.
What is the storage size for offline keys? We have calculated earlier that the total possible combination of keys is 57 billion strings. There are six characters per key and assuming 1 byte per character
1 Key = 6 Bytes
57 Billion keys = 57 *6 = 342 GB storage for all possible keys.
Once we get a read request the server will first check the keys datastore to check if the key is used and then use the key to find the original URL string stored in the database.
We will add a load balancer to the design to improve the responsiveness. Load balancers can be added between the client and the webserver. The load balancers can also be added between the server to the data storage.
Step 6: Resolve Bottlenecks
The key generation service is a bottleneck and we will solve it by adding a backup key server that will be a mirror copy of the original server. We will cache 20% of the requests for faster response time.
Step 7: Summary
Finally, summarize the detailed design to the interviewer by going through the flow and confirming that the design meets the initial assumptions and constraints. Acknowledge that the next steps would be to work on excluded scope such as the custom URL option.
Hopefully, this example helps you understand solving system design questions. If you would like me to attempt other questions, then please leave a comment or reach out at firstname.lastname@example.org