If you are new to this blog, read the "How to ace the system design interview" post first. It describes the guidelines for solving a system design interview problem, and we will apply those steps here.
Interviewer: (Walks into the room. After the initial introductions) “I would like to test you on your system design skills.”
You: Sure! (Internally you are just hoping that the problem is an easy one :))
Interviewer: Let’s start by designing a system for a service like the Twitter timeline. How would you go about designing such a system?
You: OK. Here is what I would do.
Then you follow the steps given below.
We will use the following high level steps to solve this problem.
High Level Steps:
- Scope the problem and clarify the requirements
- Do estimations
- High-Level Design
- Design core components
- Define API
- Detailed design (Depends on the time allocated for the interview)
- Resolve any bottlenecks in the design
- Summarize the solution
Step 1: Scope the problem and clarify the requirements
Define the product and service first. What is Twitter, and how does the Twitter timeline work?
Twitter is a social network where a user can post short messages, or “tweets”, of up to 280 characters (the limit was originally 140 characters). Users can also like, retweet, and share tweets from other users in the network. Only registered users can post tweets, share tweets, and follow other users; unregistered users can only read tweets.
The scope is limited to:
- User registration
- User can post tweets
- User can Like tweets
- User can view the timeline
- The timeline consists of top tweets from the Twitter accounts the user follows
- Timeline updates and refreshes every time the user logs in
- Service has high availability
- Tweets might include media such as photos or videos
Out of scope:
- Tweet search
- Replying to tweets
- Trending topic or Explore option in Twitter
- Private Twitter lists
- Twitter suggestions
- Twitter ads
You can proceed to the next step only after confirming these assumptions with the interviewer.
If the interviewer challenges you on out-of-scope requirements, you can still stick to your script by letting them know that you will revisit those requirements at the end.
Step 2: Do estimations
Before starting the estimations, state some base assumptions to kickstart the calculations.
In this case, we use the following assumptions and estimations:
- Number of users – 100 M users
- Number of tweets a day – 200 M/ day
- Tweets can be up to 280 characters, but we assume the average tweet is still about 140 characters
- Tweet size is 140 * 2 bytes = 280 bytes. Adding metadata, tweet size is about 300 bytes
- Storage for tweets per day = 200 M * 300bytes = 60 GB/day
- Media tweets assumption
  - Photo or video – one in ten tweets includes a photo, and one in ten includes a video
  - Photo tweet 500 KB and video tweet 5 MB
  - Media tweet storage per day = (200 M / 10 * 500 KB) + (200 M / 10 * 5 MB) = 10 TB + 100 TB = 110 TB/day
- Data storage – assume tweets and media are stored for 5 years
  - Data stored in 1 day = ~110 TB/day (media dominates; tweet text adds only 60 GB/day)
  - Data stored in 5 years = 110 TB * 5 * 365 = ~200 PB of data to be stored
- Twitter timeline calculations:
  - The user follows other users and also favorites tweets
  - The timeline for a user includes top tweets from the Twitter accounts followed
  - Assume each tweet is fanned out (sent) to 10 follower feeds
  - Number of fan-outs a day is 10 * 200 M/day = 2 B/day
You can use the above calculations to create a high level design.
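The arithmetic above is easy to double-check with a few lines of Python. This is only a sketch: every constant is one of the assumptions from the list, the function names are made up for illustration, and decimal units are used (1 KB = 1,000 bytes). The tweets-per-second figure is not in the list; it is simply derived from the 200 M/day assumption.

```python
# Back-of-envelope estimates. All inputs are the assumptions stated in the list.
TWEETS_PER_DAY = 200_000_000
TWEET_BYTES = 300              # 140 chars * 2 bytes, plus metadata
PHOTO_BYTES = 500 * 10**3      # 500 KB
VIDEO_BYTES = 5 * 10**6        # 5 MB
MEDIA_DIVISOR = 10             # one in ten tweets has a photo, one in ten a video
RETENTION_DAYS = 5 * 365       # 5 years
FANOUT_PER_TWEET = 10
SECONDS_PER_DAY = 24 * 60 * 60


def daily_text_bytes() -> int:
    return TWEETS_PER_DAY * TWEET_BYTES


def daily_media_bytes() -> int:
    media_tweets = TWEETS_PER_DAY // MEDIA_DIVISOR
    return media_tweets * PHOTO_BYTES + media_tweets * VIDEO_BYTES


def total_media_bytes() -> int:
    return daily_media_bytes() * RETENTION_DAYS


def daily_fanouts() -> int:
    return TWEETS_PER_DAY * FANOUT_PER_TWEET


def tweets_per_second() -> float:
    # Derived figure, useful when sizing the write path.
    return TWEETS_PER_DAY / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"text storage : {daily_text_bytes() / 10**9:.0f} GB/day")
    print(f"media storage: {daily_media_bytes() / 10**12:.0f} TB/day")
    print(f"5-year media : {total_media_bytes() / 10**15:.0f} PB")
    print(f"fan-outs     : {daily_fanouts() / 10**9:.0f} B/day")
    print(f"write rate   : {tweets_per_second():.0f} tweets/s")
```

Stating the numbers as named constants makes it easy to redo the estimate if the interviewer changes an assumption.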
Step 3: High Level Design
At a high level, the user sends a tweet from a PC or mobile device. The web or application server uses the write API to capture the tweet and store it in the database. The server pushes any media content to a separate object store.
The design also includes a server that creates the content displayed on the user’s timeline. The content for the timeline comes from the accounts that the user follows. We pick the latest tweets from those accounts (based on creation time) and merge them for display on the timeline. We limit the timeline to 50 tweets.
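The timeline assembly described above (latest tweets from followed accounts, ordered by creation time, capped at 50) might look roughly like the sketch below. The `Tweet` tuple, `build_timeline`, and the in-memory `tweets_by_user` dictionary are illustrative stand-ins for the real database, and the sketch assumes each account's tweets are already stored newest first.

```python
import heapq
from itertools import islice
from typing import Dict, Iterable, List, NamedTuple

TIMELINE_LIMIT = 50  # the display limit chosen in the high-level design


class Tweet(NamedTuple):
    tweet_id: int
    user_id: int
    text: str
    created_at: float  # Unix timestamp


def build_timeline(
    followed_ids: Iterable[int],
    tweets_by_user: Dict[int, List[Tweet]],
    limit: int = TIMELINE_LIMIT,
) -> List[Tweet]:
    """Merge the newest-first tweet lists of all followed accounts into
    one newest-first list of at most `limit` tweets."""
    streams = [tweets_by_user.get(uid, []) for uid in followed_ids]
    # heapq.merge is lazy, so only about `limit` tweets are actually pulled.
    merged = heapq.merge(*streams, key=lambda t: t.created_at, reverse=True)
    return list(islice(merged, limit))
```

Because the merge is lazy, the cost depends on the number of followed accounts and the limit, not on how many tweets each account has ever posted.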
Step 4: Design Core Components
We will design core components that are essential to define the timeline service and then create a detailed design.
The design includes three data tables (a real design would have more):
Tweet data: Tweet ID as key, user ID, tweet data, creation time, GPS info
User data: User ID as key, name, email, other user info captured
Followers: User ID as key, with the user IDs of all accounts followed by that user
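To make the three tables concrete, here is one way they could be written down as Python dataclasses. The field names and types are assumptions for illustration (the `media_key` pointer into the object store in particular is hypothetical); only the 280-character limit and the listed columns come from the design above.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

MAX_TWEET_CHARS = 280


@dataclass
class Tweet:
    tweet_id: int                      # primary key
    user_id: int                       # author
    text: str
    created_at: float                  # Unix timestamp
    gps: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    media_key: Optional[str] = None    # hypothetical object-store pointer

    def __post_init__(self) -> None:
        if len(self.text) > MAX_TWEET_CHARS:
            raise ValueError(f"tweet exceeds {MAX_TWEET_CHARS} characters")


@dataclass
class User:
    user_id: int                       # primary key
    name: str
    email: str


@dataclass
class Follows:
    user_id: int                       # primary key: the follower
    followed_ids: Set[int] = field(default_factory=set)
```

In a relational database the `Follows` set would more likely be one row per (follower, followed) pair; the set is just the simplest in-memory equivalent.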
Timeline generation is quite complex. You need a separate timeline generation server that connects to the web or application servers. The timeline service keeps track of the latest tweets from the accounts listed for the user in the followers table and refreshes the user’s timeline every time the user logs in. We do not design a ranking service here; we assume that the latest top 5 tweets from the accounts the user follows are displayed in the timeline, based on creation time. We can keep a cutoff of 50 tweets per refresh. Once that is reached, we stop refreshing or creating the timeline until the user refreshes the page.
Generating the user feed live, at request time, results in high latency and performance issues. Instead, we speed things up by building the feed offline so that it can be displayed instantaneously. Run dedicated timeline servers that constantly poll the application servers and update the stored feed based on creation time.
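A minimal sketch of a precomputed feed is shown below. Note one substitution: instead of timeline servers polling the application servers, this sketch updates each follower's stored feed at the moment a tweet is published (the fan-out counted in the estimations). The `TimelineStore` class and its method names are made up for illustration, and plain in-memory dictionaries stand in for a real cache or database.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Set

TIMELINE_LIMIT = 50


class TimelineStore:
    """Keeps a ready-to-serve, newest-first feed of tweet IDs per user."""

    def __init__(self, limit: int = TIMELINE_LIMIT) -> None:
        self._limit = limit
        self._followers: Dict[int, Set[int]] = defaultdict(set)
        self._timelines: Dict[int, Deque[int]] = {}

    def follow(self, follower_id: int, author_id: int) -> None:
        self._followers[author_id].add(follower_id)

    def publish(self, author_id: int, tweet_id: int) -> None:
        # Fan out: push the new tweet onto every follower's stored feed.
        for follower_id in self._followers[author_id]:
            feed = self._timelines.setdefault(
                follower_id, deque(maxlen=self._limit)
            )
            # With maxlen set, appendleft drops the oldest entry when full.
            feed.appendleft(tweet_id)

    def read(self, user_id: int) -> List[int]:
        # Reading is a plain lookup; no merging happens at request time.
        return list(self._timelines.get(user_id, ()))
```

The trade-off is that writes become more expensive (one update per follower) so that reads stay cheap, which suits a service where timelines are read far more often than tweets are posted.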
A ranking algorithm should look at key signals and assign weights to ensure that a user’s timeline is not dominated by content from one or a few of the accounts that the user follows. More specifically, we can select features that are relevant to the importance of any feed item, such as the number of likes, comments, shares, the time of the update, and whether the post has images or video. We can use these features to score each tweet and then order the tweets on the timeline by that score.
If one of the followed accounts posts far more often than the others, there is a high chance that the timeline is filled only by that account’s tweets. In this case, we need a strategy other than creation time alone to fill the feed.
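As a rough illustration of these two ideas (a weighted score and protection against one prolific account), consider the sketch below. The weights, the six-hour half-life, and the per-author cap are invented numbers, not tuned values, and comments are left out because replies are out of scope here.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Candidate:
    tweet_id: int
    author_id: int
    likes: int
    retweets: int
    has_media: bool
    age_hours: float


# Hypothetical weights; a real system would tune or learn these.
W_LIKES = 1.0
W_RETWEETS = 2.0
W_MEDIA = 0.5
HALF_LIFE_HOURS = 6.0


def score(c: Candidate) -> float:
    # log1p dampens very large like/retweet counts.
    engagement = W_LIKES * math.log1p(c.likes) + W_RETWEETS * math.log1p(c.retweets)
    media_bonus = W_MEDIA if c.has_media else 0.0
    decay = 0.5 ** (c.age_hours / HALF_LIFE_HOURS)  # halves every 6 hours
    return (1.0 + engagement + media_bonus) * decay


def rank(
    candidates: Iterable[Candidate], limit: int = 50, max_per_author: int = 3
) -> List[Candidate]:
    """Highest score first, with at most `max_per_author` tweets per account."""
    picked: List[Candidate] = []
    per_author: Counter = Counter()
    for c in sorted(candidates, key=score, reverse=True):
        if len(picked) >= limit:
            break
        if per_author[c.author_id] >= max_per_author:
            continue  # this account already has its share of the timeline
        picked.append(c)
        per_author[c.author_id] += 1
    return picked
```

The per-author cap is the simplest possible diversity rule; a score penalty that grows with each additional tweet from the same account would be a softer alternative.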
Should we always notify users when new posts are available for their timeline? It could be useful for users to get notified whenever new data is available. However, on mobile devices, where data usage is relatively expensive, pushing updates can consume unnecessary bandwidth. Hence, at least for mobile devices, we can choose not to push data and instead let users “Pull to Refresh” to get new posts.
We have a lot of data (in the petabyte range), so storing it efficiently and scaling the storage is a big issue. We should shard the data. Sharding, at its core, means splitting your data into smaller chunks spread across distinct, separate buckets. A bucket could be a table, a Postgres schema, or a different physical database. Then, as you need to continue scaling, you can move shards to new physical nodes, thus improving performance.
To shard the data, we need a shard key. We can use the user ID, the tweet ID, the tweet creation time, or a combination of these as the key.
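A simple way to picture key-based sharding is a hash of the chosen key modulo the number of shards, as in the sketch below. The shard count of 16 and the function names are assumptions for illustration; production systems often use consistent hashing or a lookup table instead so that shards can be moved without rehashing everything.

```python
import hashlib
from typing import Union

NUM_SHARDS = 16  # hypothetical shard count


def shard_for(key: Union[int, str], num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard number using a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_by_user(user_id: int) -> int:
    # All tweets by one user land on one shard: cheap per-user reads,
    # but a very active user can create a hot shard.
    return shard_for(user_id)


def shard_by_tweet(tweet_id: int) -> int:
    # Tweets spread evenly, but reading one user's tweets must ask every shard.
    return shard_for(tweet_id)
```

A stable hash such as SHA-256 is used rather than Python's built-in `hash()`, whose value for strings changes between interpreter runs.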
Step 5: Detailed Design
Let’s put together a detailed flow including the application servers and the timeline generation servers. We need multiple aggregation servers distributed across different geographic regions for faster data aggregation and faster responses to API requests.
We would also have multiple distributed databases and hence need servers that aggregate the data back from the different shards. These aggregator servers are connected to the application servers. Finally, we also need to add load balancers to the design for traffic distribution.
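The aggregator's job can be sketched as a scatter-gather: ask every shard for its newest matching tweets in parallel, then merge the partial results. In the sketch below, in-memory lists of `(created_at, tweet_id, user_id)` tuples stand in for the shard databases and a thread pool stands in for the network calls; both are simplifications, and the function names are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Set, Tuple

Row = Tuple[float, int, int]  # (created_at, tweet_id, user_id)


def query_shard(shard: List[Row], user_ids: Set[int], limit: int) -> List[Row]:
    """Return the `limit` newest rows on one shard written by `user_ids`."""
    rows = [row for row in shard if row[2] in user_ids]
    return heapq.nlargest(limit, rows)


def aggregate(
    shards: List[List[Row]], user_ids: Iterable[int], limit: int = 50
) -> List[Row]:
    """Query all shards in parallel and merge their answers, newest first."""
    wanted = set(user_ids)
    with ThreadPoolExecutor(max_workers=len(shards) or 1) as pool:
        partials = list(pool.map(lambda s: query_shard(s, wanted, limit), shards))
    # Each shard returned at most `limit` rows, so the final merge stays small.
    return heapq.nlargest(limit, (row for part in partials for row in part))
```

Asking each shard for only its top `limit` rows is what keeps the merge cheap: the aggregator never handles more than `limit` rows per shard.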
Step 6: Resolve Bottlenecks
The key generation service is a bottleneck, and we will address it by adding a backup key server that is a mirror copy of the original server. We will also cache the most frequent 20% of requests for faster response times.
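The caching idea can be sketched as a small least-recently-used (LRU) cache sized at 20% of the items. The eviction policy is not specified above, so LRU is an assumption, as are the `LRUCache` and `cache_capacity` names; a real deployment would more likely use a dedicated cache server than an in-process dictionary.

```python
from collections import OrderedDict
from typing import Any, Hashable


def cache_capacity(distinct_items: int, fraction: float = 0.2) -> int:
    """Size the cache as a fraction (20% by default) of the distinct items."""
    return max(1, int(distinct_items * fraction))


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, default: Any = None) -> Any:
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used
```

For example, `LRUCache(cache_capacity(1_000_000))` would hold up to 200,000 timelines in front of the timeline store.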
Step 7: Summary
Finally, summarize the detailed design to the interviewer by going through the flow and confirming that the design meets the initial assumptions and constraints. Acknowledge that the next steps would be to work on the excluded scope, such as the search option (which needs a search API), Twitter lists, etc.
Hopefully, this example helps you understand how to solve system design questions. If you would like me to attempt other questions, please leave a comment or reach out at firstname.lastname@example.org