A Brief History of Scaling LinkedIn

- Started as a single monolithic application, Leo, that hosted various pages, handled business logic, and connected to a handful of databases
- Needed to manage a network of member connections and scale it independently of Leo, so built a new system for the member graph
  - Used Java RPC for communication and Apache Lucene for search
- Introduced replica DBs as the site grew
  - To keep the replica DBs in sync, built a data change capture system, Databus, and later open-sourced it
- Observed that Leo was often going down in production and was difficult for the team to troubleshoot, recover, and release new code for
- Killed Leo and broke it up into many small services
  - Frontend: fetches data models from different domains and handles presentation logic
  - Mid-tier: provides API access to data models and adds a caching layer (memcache/couchbase/Voldemort)
  - Backend: provides consistent access to its database
- Developed data pipelines for streaming and queueing data that later became Apache Kafka
  - Powered Hadoop jobs, real-time analytics, and improved monitoring and alerting
- In 2011, kicked off an internal initiative, Inversion
  - Paused feature development
  - Focused on improving tooling, deployment, infrastructure, and developer productivity
- Got rid of Java RPC because it was inconsistent across teams and tightly coupled, and built Rest.li instead
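
The frontend / mid-tier / backend split above relies on a read-through caching pattern at the mid-tier. Below is a minimal Java sketch of that idea, not LinkedIn's actual code: the `ProfileMidTier` class and `fetchFromBackend` method are hypothetical names, and an in-memory map stands in for an external cache such as memcache, Couchbase, or Voldemort.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical mid-tier service: serves reads from a cache when possible,
// and falls back to the backend service that owns the database otherwise.
public class ProfileMidTier {
    // Stand-in for an external cache cluster (memcache/couchbase/Voldemort).
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Serve from cache if present; otherwise fetch from the backend and cache the result.
    public String getProfile(String memberId) {
        return cache.computeIfAbsent(memberId, this::fetchFromBackend);
    }

    // Stand-in for a call to the backend service with consistent database access.
    private String fetchFromBackend(String memberId) {
        return "{\"memberId\": \"" + memberId + "\"}";
    }

    public static void main(String[] args) {
        ProfileMidTier midTier = new ProfileMidTier();
        System.out.println(midTier.getProfile("42")); // misses cache, hits backend
        System.out.println(midTier.getProfile("42")); // served from cache
    }
}
```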

March 24, 2020 · 2 min