# Study Guide: Designing Facebook Newsfeed

This guide synthesizes insights from **HelloInterview**, **Grokking the System Design Interview**, and **ByteByteGo**. It highlights the key concepts, requirements, challenges, and solutions for designing a scalable and performant newsfeed system.

---

## Problem Understanding and Requirements

### Functional Requirements

- Users should be able to:
  1. Create posts with text, images, or videos.
  2. Follow other users, pages, or groups.
  3. View a newsfeed of posts from their connections, ordered chronologically or by relevance.
  4. Fetch additional posts while scrolling (pagination).

### Non-Functional Requirements

- **High scalability** to support billions of users.
- **Low latency**:
  - Feed generation under 2 seconds.
  - Post propagation to followers’ feeds within 5 seconds.
- **Consistency**:
  - Eventual consistency is acceptable, with tolerable delays.
- **High availability**.

---

## High-Level Design

### Core Components

1. **Feed Publishing**
   - Handles post creation and storage in the database.
   - Updates followers’ feeds when a new post is created.
2. **Feed Generation**
   - Assembles and ranks posts for a user’s newsfeed.
   - Employs caching and precomputed feeds for efficiency.
3. **Notification Service**
   - Alerts users to new posts or activity.
4. **Media Storage**
   - Stores images and videos and serves them through a content delivery network (CDN) for fast access.
5. **Database and Cache**
   - Stores user data, posts, and feed data.
   - Uses a distributed cache for frequently accessed data.

---

## Database Design

### Tables

- **User**: Stores user details.
- **Follow**: Tracks relationships between users.
- **Post**: Contains post data with user and timestamp references.
- **Feed** (optional): Precomputed feeds for users.

### Indexes

- Use partition keys for efficient lookups:
  - User ID for posts.
  - Follower-followed relationships in the Follow table.

---

## Key Design Challenges and Solutions

### 1. Scaling Feed Generation

#### Challenges

- High fan-out for users with many followers.
- Efficient ranking and sorting of posts.

#### Solutions

- **Fan-Out Models**:
  - **Fan-out-on-Write** (Push):
    - Precompute feeds when a post is created.
    - Gives fast reads for users who check their feed frequently; costly for authors with large followings.
  - **Fan-out-on-Read** (Pull):
    - Assemble feeds dynamically on request.
    - Avoids wasted computation for inactive users.
  - **Hybrid**:
    - Push for regular users.
    - Pull for high-follower accounts (e.g., celebrities).
- **Async Workers**:
  - Use message queues and workers to process fan-out asynchronously.
  - Stripe tasks by follower segments for balanced load distribution.

### 2. Handling Users with Many Followers

#### Challenges

- Writing each post to millions of feeds.

#### Solutions

- Partition followers for parallel processing.
- Limit precomputation for inactive or low-priority users.

### 3. Ensuring Fast Reads

#### Challenges

- Large data volumes can slow down feed reads.

#### Solutions

- Use distributed caches (e.g., Redis) for hot data.
- Implement a replicated cache for highly popular posts to reduce hotspot issues.

### 4. Live Updates and Notifications

#### Challenges

- Delivering real-time updates for new posts.

#### Solutions

- **Polling**: Periodic client requests for updates.
- **Push Notifications**: Notify active users of new content.
- **Server-Sent Events (SSE)**: Efficient server-to-client updates for active users.

---

## APIs

### Feed Publishing API

- **Endpoint**: `POST /v1/posts`
- **Payload**: `{"content": "Hello World!", "media": "image.jpg"}`

### Feed Retrieval API

- **Endpoint**: `GET /v1/feed`
- **Params**: `user_id`, `last_post_id` (for pagination).
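
To connect the two APIs with the hybrid fan-out model, here is a minimal in-memory sketch of how `POST /v1/posts` and `GET /v1/feed` might behave behind the scenes. It is an illustration under stated assumptions, not an implementation taken from any of the three sources: the `FeedService` class, its method names, the `limit` parameter, and the tiny `CELEBRITY_THRESHOLD` are all hypothetical, and plain dictionaries stand in for the database and cache. Post IDs are assumed to increase over time, so `last_post_id` can act as a pagination cursor.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count

# Hypothetical cutoff for "high-follower" accounts; kept tiny for demonstration.
CELEBRITY_THRESHOLD = 2


@dataclass(frozen=True)
class Post:
    post_id: int    # assumed monotonically increasing: larger id means newer post
    author_id: str
    content: str


class FeedService:
    """In-memory stand-in for the post store, follow graph, and feed cache."""

    def __init__(self):
        self._next_id = count(1)
        self.followers = defaultdict(set)         # author_id -> follower ids
        self.following = defaultdict(set)         # user_id -> followed author ids
        self.posts_by_author = defaultdict(list)  # author_id -> that author's posts
        self.feeds = defaultdict(list)            # user_id -> posts pushed at write time

    def follow(self, follower_id, author_id):
        self.followers[author_id].add(follower_id)
        self.following[follower_id].add(author_id)

    def _is_celebrity(self, author_id):
        return len(self.followers[author_id]) >= CELEBRITY_THRESHOLD

    def create_post(self, author_id, content):
        """Roughly what POST /v1/posts would do: store, then push if affordable."""
        post = Post(next(self._next_id), author_id, content)
        self.posts_by_author[author_id].append(post)
        if not self._is_celebrity(author_id):
            # Fan-out-on-write: copy the post into every follower's feed.
            for follower_id in self.followers[author_id]:
                self.feeds[follower_id].append(post)
        return post

    def get_feed(self, user_id, last_post_id=None, limit=2):
        """Roughly what GET /v1/feed would do: merge pushed and pulled posts."""
        # Start from the precomputed (pushed) feed; keying by id removes duplicates.
        candidates = {p.post_id: p for p in self.feeds[user_id]}
        # Fan-out-on-read: pull posts from followed high-follower accounts.
        for author_id in self.following[user_id]:
            if self._is_celebrity(author_id):
                for p in self.posts_by_author[author_id]:
                    candidates[p.post_id] = p
        newest_first = sorted(
            candidates.values(), key=lambda p: p.post_id, reverse=True
        )
        if last_post_id is not None:
            # Cursor pagination: only return posts older than the cursor.
            newest_first = [p for p in newest_first if p.post_id < last_post_id]
        return newest_first[:limit]
```

A client would call `get_feed(user_id)` for the first page and then pass the `post_id` of the last post it received as `last_post_id` to fetch the next page. Using a cursor rather than a page offset keeps pages stable when new posts arrive between requests. This sketch orders purely by recency; relevance ranking would replace the sort key.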

---

## Comparative Insights from Sources

| Aspect | HelloInterview | Grokking | ByteByteGo |
|-----------------------|------------------------------------------|------------------------------------|------------------------------------------|
| **Fan-out Model** | Hybrid (Push & Pull) | Focus on Fan-out-on-Write | Fan-out-on-Write with optimizations |
| **Caching** | Distributed feed cache | Cache precomputed feeds | Separate feed and post caches |
| **Updates** | Polling and Push for real-time updates | Long polling for live feeds | Hybrid of polling and server-side push |
| **Scaling Solutions** | Partition followers for write efficiency | Pre-generate feeds offline | Consistent hashing to balance hot keys |
| **Latency Focus** | Low latency with eventual consistency | 2-second limit for feed generation | Push for active users, pull for inactive |

---

## Final Tips for Interview Success

1. **Clarity in Assumptions**:
   - Clarify use cases and constraints (e.g., follower limits, active user ratio).
2. **Iterative Design**:
   - Start with naive solutions, then incrementally optimize.
3. **Trade-offs Discussion**:
   - Highlight the pros/cons of design choices (e.g., Push vs. Pull).
4. **Focus on Scalability**:
   - Stress handling of fan-out and large user bases.
5. **Visualization**:
   - Use diagrams to show system components and flows.

---

This study guide combines the strengths of all three resources to provide a comprehensive overview for designing a scalable and robust newsfeed system.