If you are new to this blog, then read How to ace the system design interview blog first. I have described the guidelines involved in solving a system interview problem. We will be applying those steps here.
Interviewer: Walks in the room. After the initial introductions, “ I would like to test you on your system design skills”
You: Sure! (Internally you are just hoping that the problem is a easy one :))
Interviewer: Let’s start by assuming that you are a program manager working at Amazon. Are you familiar with the best seller category? Amazon updates the best selling products in each category every hour. The customer can check out these products and buy them via one click. How would you go about designing the best seller category? I am looking for technical system design for this feature. I am also interested in your thoughts about scaling this system to support the millions of Amazon customers.
You: OK. Here is what I would do
This design question can be asked in multiple ways. For example,
- How do you design a scale the bestseller list for Amazon
- How do you design a best seller category for products sold in eBay
We will use the following high level steps to solve this problem.
High Level Steps:
- Scope the problem and clarify the requirements
- Do estimations
- High-Level Design
- Design core components
- Define API
- Detailed design (Depends on the time allocated for the interview)
- Resolve any bottlenecks in the design
- Summarize the solution
Step 1: Scope the problem and clarify the requirements
Define the product and service first. What is your startup/ service?
Amazon’s bestseller list is the list of top-selling items for each category. The list is updated hourly and the category encompasses toys, clothing, electronics, garden equipment, food items, etc. Each category will contain its own bestseller list. There could be multiple subcategories within each category too. For example, there can be a TV subcategory within the electronics category.
The scope is limited to:
- The amazon service that will sort the bestseller list every hour
- User will be able to view the best seller items by clicking on the category
- Service will be highly available
- System is reliable
Out of scope:
- Search capabilities
- Design layout
You can proceed to the next step only after confirming these assumptions with the interviewer
If the interviewer challenges you on out of scope requirements, then you can still stick to your script by letting them know that you will revisit the requirements at the end
Step 2: Do estimations
Before starting estimations, you would need to state some base assumptions to kickstart the calculations.
In this case, we are looking at the following assumptions and estimations…………
- An item can appear in multiple categories
- We are ignoring the subcategories for now
- The results are updated hourly (This is a real-world design feature. Amazon does update its best seller item list every hour)
- Assume there are 100 million products selling on Amazon currently
- Assume there are 1000 categories
- 10 million transactions an hour on average. We will judge the popularity of the item based on the transaction data
- The customer invokes the read API every time, they click on the category to find the bestseller list. Assume there are 1 Billion reads per hour
- Read to write ratio is 100:1
- Writes per sec = 10M/3600 = 2800 transactions or writes/sec
- Read per sec = 1B/3600 = 280,000 reads/sec
- Data storage. What is included in the data? Each data here is the item ID, category ID, timestamp, seller ID, Buyer ID, quantity, price, item description – 1 KB
- Data stored in an hour = 1KB*10M = 10 GB/hour
- The transaction data stored for 10 years = 10GB * 24* 365 * 10 = ~3876TB data to be stored
You can use the above calculations to create a high level design.
Step 3: High Level Design
What are the components involved in this design?
The transaction sales data is written to the database via the application server or the webserver. This is done using the write API. The written data include the data table with the customer ID, product ID, timestamp, product description, number of products purchased, and the price.
We need to leverage the sales data and create a ranking table. The best-seller ranking service will work on the sales data to create the ranking data based on the number of transactions and the products purchased in each transaction.
Whenever the user clicks on the best seller items for a category, the read API is invoked. The data table created from the ranking service is sent back for every read API. The sales ranking table should be refreshed every hour.
Step 4: Design core components
We would store two separate data tables for this feature or service. One for the sales transaction data and the other is the ranking table.
API for read and write:
There are two API needed, one to write and one to read.
Step 5: Detailed Design
In the detailed design, we will talk about scaling the system. We have a couple of options to scale the design
We will add memory caching design to speed up the data read and reduce latency in the design. Data stored in an hour = 1KB*10M = 10 GB/hour. We will cache 20% of the daily requests for faster response time. We will use the least recently used (LRU) scheme for replacing the cache data. The application servers, before hitting backend storage, can quickly check if the cache has the desired URL. Whenever there is a cache miss, our servers would be hitting the database. Then we will update the cache and pass the new entry to all the cache replicas.
We will add a load balancer to the design to improve the responsiveness. Load balancers can be added between the client and the webserver. The load balancers can also be added between the server to the data storage.
As we start scaling to accommodate more sales transactions, more categories, and bestseller lists, we need to plan for storing a large amount of data. We need to store these efficiently and scalability is a big issue. We should follow data sharding.
Sharding at the core is splitting your data up to where it resides in smaller chunks, spread across distinct separate buckets. A bucket could be a table, a Postgres schema, or a different physical database. Then as you need to continue scaling you’re able to move your shards to new physical nodes thus improving performance.
While data sharding, we need a key to use for the data. We can use userID, or product ID to create data sharding.
Step 6: Resolve Bottlenecks
Add redundancy to the design by adding backup servers to the design. We would also have multiple distributed databases, due to data sharding. Hence, we need servers that would aggregate the data back from different shards. These aggregator servers will be connected to the application servers. We need to add load balancers to the design for traffic distribution.
We will add content distribution network (CDN) to the design for scaling purposes. A content delivery network (CDN) refers to a geographically distributed group of servers that work together to provide fast delivery of Internet content.
Step 7: Summary
Finally, summarize the detailed design to the interviewer by going through the flow and confirming that the design meets the initial assumptions and constraints. Acknowledge that the next steps would be to work on excluded scope such as the custom URL option.
Hopefully, this example helps you understand solving system design questions. If you would like me to attempt other questions, then please leave a comment or reach out at [email protected]