Designing systems to display the best selling products on Amazon

Designing systems to display the best selling products on Amazon

If you are new to this blog, then read How to ace the system design interview blog first. I have described the guidelines involved in solving a system interview problem. We will be applying those steps here.

Interview Scenario:

Interviewer: Walks in the room. After the initial introductions, “ I would like to test you on your system design skills”

You: Sure! (Internally you are just hoping that the problem is a easy one :))

Interviewer: Let’s start by assuming that you are a program manager working at Amazon. Are you familiar with the best seller category? Amazon updates the best selling products in each category every hour. The customer can check out these products and buy them via one click. How would you go about designing the best seller category? I am looking for technical system design for this feature. I am also interested in your thoughts about scaling this system to support the millions of Amazon customers.

You: OK. Here is what I would do

This design question can be asked in multiple ways. For example,

  • How do you design a scale the bestseller list for Amazon
  • How do you design a best seller category for products sold in eBay

We will use the following high level steps to solve this problem.

High Level Steps:

  1. Scope the problem and clarify the requirements
  2. Do estimations
  3. High-Level Design
  4. Design core components
    1. Define API
  5. Detailed design (Depends on the time allocated for the interview)
  6. Resolve any bottlenecks in the design
  7. Summarize the solution

Step 1: Scope the problem and clarify the requirements

Define the product and service first. What is your startup/ service?

Amazon’s bestseller list is the list of top-selling items for each category. The list is updated hourly and the category encompasses toys, clothing, electronics, garden equipment, food items, etc. Each category will contain its own bestseller list. There could be multiple subcategories within each category too. For example, there can be a TV subcategory within the electronics category.

The scope is limited to:

  • The amazon service that will sort the bestseller list every hour
  • User will be able to view the best seller items by clicking on the category
  • Service will be highly available
  • System is reliable

Out of scope:

  • Search capabilities
  • Subcategories
  • Design layout

You can proceed to the next step only after confirming these assumptions with the interviewer

If the interviewer challenges you on out of scope requirements, then you can still stick to your script by letting them know that you will revisit the requirements at the end

Step 2: Do estimations

Before starting estimations, you would need to state some base assumptions to kickstart the calculations.

In this case, we are looking at the following assumptions and estimations…………

  • An item can appear in multiple categories
  • We are ignoring the subcategories for now
  • The results are updated hourly (This is a real-world design feature. Amazon does update its best seller item list every hour)
  • Assume there are 100 million products selling on Amazon currently
  • Assume there are 1000 categories
  • 10 million transactions an hour on average. We will judge the popularity of the item based on the transaction data
  • The customer invokes the read API every time, they click on the category to find the bestseller list. Assume there are 1 Billion reads per hour
  • Read to write ratio is 100:1
  • Writes per sec = 10M/3600 = 2800 transactions or writes/sec
  • Read per sec = 1B/3600 = 280,000 reads/sec
  • Data storage. What is included in the data? Each data here is the item ID, category ID, timestamp, seller ID, Buyer ID, quantity, price, item description – 1 KB
  • Data stored in an hour = 1KB*10M = 10 GB/hour
  • The transaction data stored for 10 years = 10GB * 24* 365 * 10 = ~3876TB data to be stored

You can use the above calculations to create a high level design.

Step 3: High Level Design

What are the components involved in this design?

The transaction sales data is written to the database via the application server or the webserver. This is done using the write API. The written data include the data table with the customer ID, product ID, timestamp, product description, number of products purchased, and the price.

We need to leverage the sales data and create a ranking table. The best-seller ranking service will work on the sales data to create the ranking data based on the number of transactions and the products purchased in each transaction.

Whenever the user clicks on the best seller items for a category, the read API is invoked. The data table created from the ranking service is sent back for every read API. The sales ranking table should be refreshed every hour.

System Design For Amazon Sales Ranking Feature

Step 4: Design core components

Database Design:

We would store two separate data tables for this feature or service. One for the sales transaction data and the other is the ranking table.

API for read and write:

There are two API needed, one to write and one to read.

Step 5: Detailed Design

In the detailed design, we will talk about scaling the system. We have a couple of options to scale the design

We will add memory caching design to speed up the data read and reduce latency in the design. Data stored in an hour = 1KB*10M = 10 GB/hour. We will cache 20% of the daily requests for faster response time. We will use the least recently used (LRU) scheme for replacing the cache data. The application servers, before hitting backend storage, can quickly check if the cache has the desired URL. Whenever there is a cache miss, our servers would be hitting the database. Then we will update the cache and pass the new entry to all the cache replicas.

We will add a load balancer to the design to improve the responsiveness. Load balancers can be added between the client and the webserver. The load balancers can also be added between the server to the data storage.

Data Storage:

As we start scaling to accommodate more sales transactions, more categories, and bestseller lists, we need to plan for storing a large amount of data. We need to store these efficiently and scalability is a big issue. We should follow data sharding.

Sharding at the core is splitting your data up to where it resides in smaller chunks, spread across distinct separate buckets. A bucket could be a table, a Postgres schema, or a different physical database. Then as you need to continue scaling you’re able to move your shards to new physical nodes thus improving performance.

While data sharding, we need a key to use for the data. We can use userID, or product ID to create data sharding.

Detailed System Design For Amazon Sales Ranking Feature

Step 6: Resolve Bottlenecks

Add redundancy to the design by adding backup servers to the design. We would also have multiple distributed databases, due to data sharding. Hence, we need servers that would aggregate the data back from different shards. These aggregator servers will be connected to the application servers. We need to add load balancers to the design for traffic distribution.

We will add content distribution network (CDN) to the design for scaling purposes. A content delivery network (CDN) refers to a geographically distributed group of servers that work together to provide fast delivery of Internet content.

A CDN allows for the quick transfer of assets needed for loading Internet content including HTML pages, JavaScript files, stylesheets, images, and videos. The popularity of CDN services continues to grow, and today the majority of web traffic is served through CDNs, including traffic from major sites like Facebook, Netflix, and Amazon.

Step 7: Summary

Finally, summarize the detailed design to the interviewer by going through the flow and confirming that the design meets the initial assumptions and constraints. Acknowledge that the next steps would be to work on excluded scope such as the custom URL option.

Hopefully, this example helps you understand solving system design questions. If you would like me to attempt other questions, then please leave a comment or reach out at [email protected]

0 0 vote
Article Rating
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments