Designing Data-Intensive Applications by Martin Kleppmann

4 fundamental ideas that we need in order to design data-intensive applications. Reliable, scalable, maintainable applications. Reliability means continuing to work correctly, even when things go wrong. Common faults and preventions include: Hardware faults: hard disks crash, blackout, incorrect network configuration,… Add redundancy to individual hardware components to reduce the failure rate. As long as we can restore a backup onto a new machine quickly, the downtime is not fatal....

July 5, 2020 · 25 min

High Level System Design Walkthrough

Photo/Video Sharing Service. A photo/video sharing service lets users upload photos/videos and share them with other users. Similar services: Instagram, YouTube, Netflix. Requirements clarifications. Functional requirements: Users can upload, view, like/dislike, comment on, and download photos/videos. Photo/video stats are recorded: number of likes/dislikes, views, etc. Users can upload as many photos/videos as they like. Users can search for photo/video titles and other usernames. Users can follow other users. Users can have a News Feed consisting of recent photos/videos from all the users they follow....

October 15, 2020 · 7 min

Modern Web Architectural Components

Tiers. A tier is a logical separation of components in an application or service: database, backend app, user interface, messaging, caching. Single tier: user interface, backend business logic, and database reside on the same machine. Pros: no network latency. Cons: hard to maintain once it is shipped. Two-tier: client (user interface, business logic) & server (database). Communication happens over the HTTP protocol (request-response model & stateless). A REST API takes advantage of the HTTP methods to establish communication between the client and the server. Three-tier: user interface, application logic, and database reside on different machines. N-tier: more than 3 components involved, such as cache, message queues, load balancers,… Single Responsibility Principle: a component has only a single responsibility. Separation of concerns: keep components separate, make them reusable. Scalability. Ability to withstand increased workload without sacrificing latency. Latency can be divided into 2 parts. Network latency: amount of time the network takes to send a data packet from point A to B. Application latency: amount of time the application takes to process a user request. Types of scalability. Vertical scaling/scaling up: adding more power to a server. Pros: not a lot of overhead on monitoring, operating, and maintaining. Cons: single point of failure. Horizontal scaling/scaling out: adding more hardware to the existing resource pool. Pros: cheaper, better fault tolerance. Cons: managing servers is hard; writing distributed computing programs is also challenging. Common bottlenecks that hurt scalability: database latency, poor application architecture, not caching wisely, inefficient configuration and load balancing, adding business logic to the database, badly written code. Common strategies to improve and test scalability: profile, cache wisely, use a CDN, compress data, avoid unnecessary round trips between client and server, run load & stress tests. High Availability. Ability to stay online despite failures at the infrastructure level in real time. Common reasons for system failures: software crashes, hardware crashes, human error, planned downtime. A common way to add more availability is redundancy: duplicating the components & keeping them on standby to take over in case the active instances go down. Monolithic & Microservices. Monolithic: entire application code in a single service. Pros: simple to develop, test, and deploy as everything resides in one repo. Cons: continuous deployment means re-deploying the entire application; single point of failure; hard to scale. Microservices: tasks are split into separate services forming a larger service as a whole. Pros: no single point of failure; easier to scale independently. Cons: difficult to manage; no strong consistency. Database. Forms of data. Structured: conforms to a certain structure, stored in a normalized fashion. Unstructured: no definite structure; could be text, image, video, multimedia files, machine-generated data. Semi-structured: mix of structured and unstructured data, stored in XML or JSON. User state: user logs and activity on the platform. Why the need for NoSQL while relational databases are still doing fine?...

March 28, 2020 · 6 min

A Brief History of Scaling LinkedIn

Started as a single monolithic application, Leo, that hosted various pages, handled business logic, and connected to a handful of databases. Needed to manage a network of member connections and scale independently of Leo, so built a new system for the member graph. Used Java RPC for communication and Apache Lucene for search capabilities. Introduced replica DBs as the site grew. To keep replica DBs in sync, built a data capture system, Databus, then open-sourced it. Observed that Leo was often going down in production, making it difficult for the team to troubleshoot, recover, and release new code. Killed Leo. Broke it up into many small services. Frontend: fetch data models from different domains, presentation logic. Mid-tier: provide API access to data models and add more layers of cache (memcache/couchbase/Voldemort). Backend: provide consistent access to its database. Developed data pipelines for streaming and queueing data that later became Apache Kafka. Empowered Hadoop jobs. Built realtime analytics. Improved monitoring and alerting. In 2011, kicked off an internal initiative, Inversion. Paused feature development. Focused on improving tooling and deployment, infrastructure, and developer productivity. Got rid of Java RPC because it was inconsistent across teams as well as tightly coupled, and built Rest....

March 24, 2020 · 2 min

Object Oriented Design Patterns

Facebook. Admin: add/modify members. Member: search for other members, groups, pages, posts, as well as send friend requests, create posts. System: send notifications for new messages, friend requests. Amazon. Admin: add/modify products and users. Member: search the catalog, add/remove items to the shopping cart, options to pay. System: send notifications for orders and shipping updates. LinkedIn. Member: search for other members, companies, or jobs, send requests for connection, create posts. System: send notifications for new messages, connection invites. Stack Overflow. Admin: add/modify members. Member: search/view/add/modify questions, answers, and comments. Moderator: same as member, in addition to which one can close/delete/undelete any question. System: send notifications, assign badges to members. Library Management System. Librarian: add/modify books and users. Member: search the catalog, check out, reserve, renew, return books. System: send notifications for overdue books, cancel reservations. Parking Lot. Admin: add/modify parking spots and attendants. Customer: have parking tickets, options to pay. Movie Ticket Booking. Admin: add/modify movies, tickets, customers. Customer: view movie schedules, book/cancel tickets. System: send notifications for new movies, bookings, cancellations. Card Game. Dealer: deal cards and resolve the game. Player: place bets, accept/decline offered resolution. Hotel Management System. Member: search the available rooms, make bookings. Receptionist: add/modify rooms, create room bookings, check in and check out customers. System: send notifications for room booking, cancellation. Manager: add/modify housekeeping/service record of rooms. Restaurant Management System. Receptionist: add/modify tables, layout, reservations. Waiter: take/modify orders. Manager: add/modify the menu. Chef: view/work on an order. Cashier: generate checks and process payments. System: send notifications for table reservations, cancellations. Stock Brokerage System. Admin: add/modify members. Member: search the stock inventory, buy/sell stocks. System: send notifications for stock orders. References:...

February 15, 2020 · 2 min