Category: System Design

  • What Is Back-of-the-Envelope Estimation?

    Back-of-the-envelope estimation is a rough calculation used to estimate the feasibility, cost, or scale of a project or problem without delving into highly detailed analysis or precise data. It’s a quick and practical way to get an approximate answer based on reasonable assumptions.

    This technique is often used in system design, engineering, product planning, and brainstorming sessions to:

    • Determine whether an idea is worth pursuing.

    • Spot potential bottlenecks or challenges.

    • Validate the feasibility of solutions or architectures.

    Steps for a Back-of-the-Envelope Estimation:

    1. Define the Problem Clearly:

    • Understand what you’re estimating (e.g., storage requirements, bandwidth, number of servers).

    • Break the problem into smaller, manageable components.

    2. Make Assumptions:

    • Use approximate values based on industry standards or prior knowledge.

    • Err on the side of simplicity and round numbers for quick calculations.

    3. Perform Calculations:

    • Use basic arithmetic to combine assumptions and get the result.

    4. Validate Assumptions:

    • Ensure assumptions are reasonable.

    • Adjust estimates if the results seem unrealistic.

    5. State Assumptions Explicitly:

    • Communicate the assumptions clearly so others can understand and refine your calculations.

    Example 1: Estimating Storage for a Chat Application

    Problem: Estimate the annual storage requirements for a messaging app with 1 million daily active users (DAUs).

    Assumptions:

    1. Each user sends 50 messages per day.

    2. Average message size is 200 bytes.

    3. Messages are stored for 1 year.

    Calculation:

    1. Total messages per day = 1,000,000 users × 50 messages = 50,000,000 messages/day.

    2. Total storage per day = 50,000,000 messages × 200 bytes = 10 GB/day.

    3. Annual storage = 10 GB/day × 365 days = 3.65 TB/year.

    Result: You would need approximately 3.65 TB of storage per year.
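
    These assumptions are easy to encode in a short script so they can be revisited later. A minimal Python sketch of the calculation above (the constants mirror the assumptions; decimal units, 1 GB = 10^9 bytes):

    # Assumptions from Example 1 (decimal units: 1 GB = 1e9 bytes)
    daily_active_users = 1_000_000
    messages_per_user_per_day = 50
    avg_message_size_bytes = 200
    retention_days = 365

    messages_per_day = daily_active_users * messages_per_user_per_day      # 50,000,000
    storage_per_day_gb = messages_per_day * avg_message_size_bytes / 1e9   # 10 GB/day
    annual_storage_tb = storage_per_day_gb * retention_days / 1_000        # 3.65 TB/year

    print(f"{annual_storage_tb:.2f} TB/year")  # 3.65 TB/year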

    Example 2: Bandwidth for a Video Streaming Service

    Problem: Estimate the bandwidth required to stream 1080p videos to 10,000 concurrent users.

    Assumptions:

    1. 1080p video uses 5 Mbps.

    2. All users are watching simultaneously.

    Calculation:

    1. Bandwidth per user = 5 Mbps.

    2. Total bandwidth = 5 Mbps × 10,000 users = 50,000 Mbps = 50 Gbps.

    Result: You need a network capable of handling 50 Gbps of traffic.
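
    The same approach works here; a quick Python check of the unit conversion (Mbps × users → Gbps):

    bitrate_mbps = 5              # per-user 1080p bitrate from the assumption above
    concurrent_users = 10_000

    total_mbps = bitrate_mbps * concurrent_users   # 50,000 Mbps
    total_gbps = total_mbps / 1_000                # 50 Gbps

    print(f"{total_gbps:.0f} Gbps")  # 50 Gbps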

    Common Use Cases:

    1. System Design:

    • Estimating server requirements (CPU, memory, disk).

    • Predicting data throughput or storage needs.

    2. Capacity Planning:

    • Assessing whether existing infrastructure can handle growth.

    3. Financial Projections:

    • Quick revenue or cost estimations for business models.

    4. Engineering Problems:

    • Estimating power consumption, materials, or load capacities.

    Tips for Effective Back-of-the-Envelope Estimations:

    1. Use Reasonable Numbers:

    • Simplify complex calculations by rounding numbers (e.g., use 1M instead of 1,048,576).

    2. Focus on Key Factors:

    • Ignore minor details or edge cases unless they significantly impact the result.

    3. Cross-Check Results:

    • Compare your estimate to known benchmarks or real-world data.

    4. Iterate:

    • Refine your assumptions and calculations if more precise data becomes available.

    Back-of-the-envelope calculations are invaluable in system design, initial project discussions, and rapid brainstorming.

  • Securely Storing Passwords in System – Password with Salt

    Using a password with salt is a common best practice for securely storing passwords in a system. It protects against rainbow table attacks by introducing a random value (the salt) to each password before it is hashed.

    What Is Salt?

    • A salt is a unique, random value that is added to a password before it is hashed.

    • It ensures that even if two users have the same password, their hashed values will be different because the salt is unique for each user.

    Steps for Storing Passwords with Salt:

    1. Generate a Salt:

    • Use a cryptographically secure random generator to create a unique salt for each user.

    2. Combine Salt and Password:

    • Append or prepend the salt to the user’s password.

    3. Hash the Combined Value:

    • Use a secure hashing algorithm (e.g., SHA-256, bcrypt, Argon2).

    4. Store the Salt and Hash:

    • Save both the salt and the resulting hash in your database.

    • The salt can be stored in plaintext; it does not need to be kept secret.

    Example Implementation in Python

    Here’s an example using Python with the bcrypt library:

    Code Example:

    import bcrypt
    import hmac

    # Function to hash a password with salt
    def hash_password(password: str) -> tuple:
        # Generate a unique salt
        salt = bcrypt.gensalt()
        # Hash the password with the salt (bcrypt also embeds the salt in the hash)
        hashed_password = bcrypt.hashpw(password.encode('utf-8'), salt)
        return salt, hashed_password

    # Function to verify a password
    def verify_password(password: str, salt: bytes, stored_hash: bytes) -> bool:
        # Recompute the hash using the provided password and salt
        computed_hash = bcrypt.hashpw(password.encode('utf-8'), salt)
        # Constant-time comparison (bcrypt.checkpw(password_bytes, stored_hash) is equivalent)
        return hmac.compare_digest(computed_hash, stored_hash)

    # Example usage
    if __name__ == "__main__":
        user_password = "SecurePassword123!"

        # Hash the password
        salt, stored_hash = hash_password(user_password)
        print(f"Salt: {salt}")
        print(f"Stored Hash: {stored_hash}")

        # Verify the password
        is_valid = verify_password("SecurePassword123!", salt, stored_hash)
        print("Password is valid:", is_valid)

    Example Database Storage:

    Store the salt and hash separately in your database:

    User ID | Salt              | Hash
    1       | random_salt_value | hashed_password
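
    As a sketch, the same layout in a relational table using Python's built-in sqlite3 (the table and column names are illustrative, not a required schema):

    import sqlite3
    import bcrypt

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, salt BLOB, pw_hash BLOB)")

    # Store the salt and hash in separate columns, one row per user
    salt = bcrypt.gensalt()
    pw_hash = bcrypt.hashpw(b"SecurePassword123!", salt)
    conn.execute("INSERT INTO users (id, salt, pw_hash) VALUES (?, ?, ?)", (1, salt, pw_hash))
    conn.commit()

    # At login, look up the user's salt and hash, then re-verify
    stored_salt, stored_hash = conn.execute(
        "SELECT salt, pw_hash FROM users WHERE id = ?", (1,)
    ).fetchone()
    print(bcrypt.hashpw(b"SecurePassword123!", stored_salt) == stored_hash)  # True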

    Best Practices:

    1. Use a Secure Salt Generator:

    • Always use cryptographically secure random number generators for salts.

    • In Python, os.urandom() or libraries like bcrypt are recommended.

    2. Avoid Weak Hashing Algorithms:

    • Do not use MD5 or SHA-1, as they are vulnerable to attacks.

    • Use algorithms designed for passwords, such as:

    • bcrypt: Adds internal salt and is computationally expensive.

    • Argon2: Modern and highly secure (recommended).

    • PBKDF2: Widely used and secure (see the sketch after this list).

    3. Don’t Reuse Salts:

    • Always generate a unique salt for each password.

    4. Limit Login Attempts:

    • Use rate-limiting or captchas to protect against brute-force attacks.
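
    A minimal sketch of practices 1-3 using only Python's standard library (hashlib.pbkdf2_hmac and os.urandom); the iteration count is an assumed value and should be tuned to current guidance and your hardware:

    import hashlib
    import hmac
    import os

    ITERATIONS = 600_000  # assumed value; tune to current guidance

    def hash_password_pbkdf2(password: str) -> tuple:
        salt = os.urandom(16)  # unique, cryptographically secure salt per user
        pw_hash = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
        return salt, pw_hash

    def verify_password_pbkdf2(password: str, salt: bytes, stored_hash: bytes) -> bool:
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
        return hmac.compare_digest(candidate, stored_hash)  # constant-time comparison

    if __name__ == "__main__":
        salt, stored = hash_password_pbkdf2("SecurePassword123!")
        print(verify_password_pbkdf2("SecurePassword123!", salt, stored))  # True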

    Why Is Salting Important?

    Without a salt:

    • Identical passwords will have identical hashes.

    • Attackers can use precomputed hashes from rainbow tables to crack passwords.

    With a salt:

    • Each password gets a unique hash, making precomputed rainbow tables ineffective.

  • What is Kafka? Why is Kafka fast?

    What is Kafka?

    Apache Kafka is a distributed event streaming platform designed for high-throughput, fault-tolerant, and real-time data processing. It is commonly used to build real-time data pipelines and streaming applications. Kafka was originally developed at LinkedIn and later open-sourced as part of the Apache Software Foundation.

    Kafka serves three primary purposes:

    1. Message Broker: Decouples producers (senders) and consumers (receivers) by acting as an intermediary.

    2. Stream Processor: Provides tools to process streams of data in real-time.

    3. Storage System: Durable storage of message streams for later retrieval and processing.

    Core Concepts of Kafka

    1. Producers:

    • Applications or services that write data (messages) to Kafka topics.

    2. Consumers:

    • Applications or services that read data from Kafka topics.

    3. Topics:

    • Categories or logical channels to which producers send messages and consumers subscribe.

    4. Partitions:

    • A topic is divided into one or more partitions for scalability and parallelism. Each partition stores an ordered log of messages.

    5. Brokers:

    • Kafka servers that store topic data. A Kafka cluster consists of multiple brokers.

    6. Replication:

    • Kafka ensures fault tolerance by replicating partitions across multiple brokers.

    7. Zookeeper (or KRaft in newer versions):

    • Used to manage Kafka cluster metadata, leader elections, and configurations (KRaft is Kafka’s native replacement for Zookeeper).
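
    A minimal producer/consumer sketch ties these concepts together. It assumes the third-party kafka-python client, a broker at localhost:9092, and a topic named "events" (all illustrative):

    from kafka import KafkaProducer, KafkaConsumer

    # Producer: writes a message to the "events" topic
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("events", b"user_signed_up:42")
    producer.flush()

    # Consumer: subscribes to the topic and tracks its own offsets
    consumer = KafkaConsumer(
        "events",
        bootstrap_servers="localhost:9092",
        group_id="example-group",
        auto_offset_reset="earliest",
    )
    for record in consumer:
        print(record.partition, record.offset, record.value)
        break  # stop after one message for this sketch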

    Why is Kafka Fast?

    Kafka’s exceptional performance stems from its architectural design and optimized use of resources. Here are the key reasons:

    1. Sequential I/O and Batching

    • Kafka writes and reads messages sequentially to/from disk using append-only logs.

    • Sequential writes are much faster than random writes due to reduced disk seek overhead.

    • Messages are batched before being written to disk or sent over the network, further reducing I/O operations.

    2. Zero-Copy Mechanism

    • Kafka uses the sendfile() system call to transfer data directly from disk to network sockets without copying it into application memory.

    • This reduces context switches between user space and kernel space, leading to lower latency and higher throughput.

    3. Efficient Data Storage

    • Kafka does not track individual message acknowledgments; instead, it uses offset-based tracking.

    • Each message in a partition has a unique offset, and consumers maintain their own offsets.

    • This simplifies the storage model, reduces overhead, and increases throughput.

    4. Partitioning and Parallelism

    • Topics are split into partitions, which allow parallel processing.

    • Each partition can be independently processed by different consumers in a consumer group, making Kafka highly scalable.

    5. Replication with Leader-Follower Model

    • Each partition has a leader replica that handles all reads and writes, while follower replicas act as backups.

    • If the leader fails, a follower is promoted, so a single broker always serves a partition's client requests, keeping operations consistent and efficient.

    6. Asynchronous Processing

    • Producers and consumers communicate asynchronously, reducing blocking and latency.

    • Producers do not wait for consumers to process messages, and vice versa.

    7. Backpressure Handling

    • Kafka can handle a large number of messages without overwhelming the system by allowing consumers to process messages at their own pace.

    • Messages remain in Kafka (subject to the topic's retention policy) regardless of how quickly they are consumed; consumers simply commit offsets as they make progress.

    8. High Compression

    • Kafka supports message compression (e.g., GZIP, Snappy, LZ4) at the producer level.

    • Compressed messages reduce storage and network bandwidth usage.

    9. Minimized Coordination Overhead

    • Kafka reduces inter-node coordination by storing and replicating data at the partition level rather than managing individual messages.

    • Consumers maintain their own offsets, eliminating the need for Kafka to track message delivery statuses.
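
    Several of these optimizations (batching, compression, asynchronous sends, replication acknowledgments) are exposed as producer settings. A hedged sketch, again assuming the kafka-python client; the values are illustrative, not recommendations:

    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        batch_size=32 * 1024,     # batch up to 32 KB per partition before sending (point 1)
        linger_ms=10,             # wait up to 10 ms to fill a batch (point 1)
        compression_type="lz4",   # compress batches on the producer (point 8)
        acks="all",               # wait for the leader and in-sync replicas (point 5)
    )

    # send() is asynchronous: it returns a future and does not block the caller (point 6)
    future = producer.send("events", b"payload")
    producer.flush()
    print(future.get(timeout=10))  # RecordMetadata(topic, partition, offset, ...)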

    Kafka vs Traditional Message Brokers

    Feature             | Kafka                                          | Traditional Brokers (e.g., RabbitMQ)
    Data Persistence    | Stores messages persistently by default        | Often optimized for in-memory storage
    Scalability         | Horizontally scalable with partitions          | Limited scalability
    Message Consumption | Pull-based (consumers fetch messages)          | Push-based (brokers send messages)
    Performance         | Optimized for high throughput and low latency  | Generally slower
    Fault Tolerance     | Built-in replication                           | Depends on implementation

    Typical Use Cases for Kafka

    1. Log Aggregation:

    • Collect and centralize application logs for monitoring or analysis.

    2. Real-Time Data Pipelines:

    • Stream data between microservices or systems (e.g., ETL pipelines).

    3. Stream Processing:

    • Process data in real-time using Kafka Streams or tools like Apache Flink.

    4. Event Sourcing:

    • Store a record of every change to an application state.

    5. IoT and Sensor Data:

    • Collect and analyze data from IoT devices.

  • Designing an API rate limiter

    Designing an API rate limiter is crucial for maintaining system performance and preventing abuse. Here’s a high-level overview of how you can design a scalable API rate limiter:

    Components:

    1. Rate Limiter Middleware: This sits between the client and the server, handling rate limiting logic.
    2. Counter Storage: Stores the count of requests made by each user or IP address. In-memory data stores like Redis are commonly used for fast access.
    3. Rate Limiting Algorithm: Determines how to count and limit requests (e.g., token bucket, leaky bucket).

    High-Level Design:

    1. Client Request: The client sends a request to the API endpoint.
    2. Middleware Check: The rate limiter middleware checks the counter for the client’s IP/user ID.
    3. Counter Update: If the counter is below the limit, the middleware allows the request and increments the counter. If the limit is reached, the middleware blocks or throttles the request.
    4. Counter Expiry: The counter expires after a set time period, allowing the client to make requests again.

    Example Implementation:

    Here’s a simplified in-memory example in C++ (a production deployment would typically keep the counters in a shared store such as Redis; see the sketch at the end of this section):

    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <chrono>

    class RateLimiter {
    public:
        RateLimiter(int limit, std::chrono::seconds timeWindow) : limit(limit), timeWindow(timeWindow) {}

        bool allowRequest(const std::string& clientId) {
            auto now = std::chrono::steady_clock::now();
            auto& counter = counters[clientId];

            // Start a new window if the current one has expired
            if (now - counter.windowStart > timeWindow) {
                counter.count = 0;
                counter.windowStart = now;
            }

            if (counter.count < limit) {
                counter.count++;
                return true;
            }
            return false;
        }

    private:
        struct Counter {
            int count = 0;
            std::chrono::steady_clock::time_point windowStart;
        };

        int limit;
        std::chrono::seconds timeWindow;
        std::unordered_map<std::string, Counter> counters;
    };

    int main() {
        // Allow at most 10 requests per client per 60-second window
        RateLimiter rateLimiter(10, std::chrono::seconds(60));
        std::string clientId = "client123";

        for (int i = 0; i < 15; ++i) {
            if (rateLimiter.allowRequest(clientId)) {
                std::cout << "Request allowed" << std::endl;
            } else {
                std::cout << "Request denied" << std::endl;
            }
        }

        return 0;
    }

    Explanation:

    • RateLimiter Class: Manages counters for each client.
    • allowRequest Method: Checks if the client has exceeded the rate limit.
    • Counter Expiry: Resets the counter after the time window expires.
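
    The in-memory map above only works within a single server process. With multiple API servers, the per-client counters need to live in a shared store such as Redis (as mentioned under Components). A minimal fixed-window sketch, assuming the redis-py client and a Redis instance on localhost (key names and limits are illustrative):

    import redis

    r = redis.Redis(host="localhost", port=6379)

    LIMIT = 10           # max requests per client per window
    WINDOW_SECONDS = 60  # window length

    def allow_request(client_id: str) -> bool:
        key = f"rate:{client_id}"
        count = r.incr(key)                # atomic increment of the per-client counter
        if count == 1:
            r.expire(key, WINDOW_SECONDS)  # start the window on the first request
        return count <= LIMIT

    if __name__ == "__main__":
        for _ in range(15):
            print("allowed" if allow_request("client123") else "denied")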