• Tiers
    • A tier is a logical separation of components in an application or service - database, backend app, user interface, messaging, caching
    • Single tier: user interface, backend business logic, and database reside on the same machine
      • Pros: no network latency
      • Cons: hard to maintain and update once it's shipped
    • Two-tier: client (user interface, business logic) & server (database)
      • Communication happens over the HTTP protocol (request-response model & stateless)
      • A REST API uses the standard HTTP methods (GET, POST, PUT, DELETE) to establish communication between the client and the server
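The client-server, request-response model can be sketched as a tiny stateless dispatcher (the routes and data here are made-up for illustration; a real service would use an HTTP framework):

```python
# The "server-side" data store; each request carries everything needed,
# so the server keeps no per-client session state (statelessness).
users = {1: {"name": "Ada"}}

def handle(method, path):
    """Map an HTTP-style (method, path) request to a (status, body) response."""
    if method == "GET" and path.startswith("/users/"):
        uid = int(path.rsplit("/", 1)[1])
        return (200, users[uid]) if uid in users else (404, None)
    if method == "POST" and path == "/users":
        uid = max(users) + 1
        users[uid] = {"name": "new"}
        return (201, {"id": uid})
    return (405, None)  # method not allowed for this resource
```

Any two independent requests produce the same result regardless of order, which is what makes the two-tier split over HTTP work.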
    • Three-tier: user interface, application logic, and database reside on separate machines
    • N-tier: more than 3 components involved - cache, message queues, load balancers,…
      • Single Responsibility Principle: a component has only a single responsibility
      • Separation of concerns: keep components separate, make them reusable
  • Scalability
    • Ability to withstand increased workload without sacrificing latency
    • Latency can be divided into 2 parts:
      • Network latency: the time the network takes to send a data packet from point A to point B
      • Application latency: the time the application takes to process a user request
    • Types of scalability
      • Vertical scaling/scaling up: adding more power (CPU, RAM) to the existing server
        • Pros: little overhead for monitoring, operating, and maintaining
        • Cons: single point of failure
      • Horizontal scaling/scaling out: adding more hardware to the existing resource pool
        • Pros: cheaper, better fault-tolerance
        • Cons: managing many servers is hard, and writing programs for a distributed system is also challenging
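Horizontal scaling only helps if traffic is spread across the pool; a minimal sketch of round-robin distribution (the server names are hypothetical):

```python
import itertools

# Hypothetical pool of identical app servers added by scaling out.
servers = ["app-1", "app-2", "app-3"]
_next_server = itertools.cycle(servers)

def route(request):
    """Round-robin load balancing: each request goes to the next server in the pool."""
    return next(_next_server)
```

Real load balancers add health checks and weighting, but the core idea is the same: the pool, not any single machine, absorbs the workload.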
    • Common bottlenecks that hurt scalability
      • Database latency
      • Poor application architecture
      • Not caching wisely
      • Inefficient configuration and load balancing
      • Adding business logic to the database
      • Badly written code
    • Common strategies to improve and test the scalability
      • Profiling
      • Cache wisely
      • Use a CDN
      • Compress data
      • Avoid unnecessary round trips between client and server
      • Run load & stress tests
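Profiling, the first strategy above, can be as simple as wrapping a suspect code path with the standard-library profiler (the `slow_sum` hot path is a made-up example):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """Deliberately naive hot path to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

# Render the 3 most expensive calls by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(3)
report = out.getvalue()
```

The report pinpoints where time actually goes, so optimization effort targets real bottlenecks instead of guesses.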
  • High Availability
    • Ability to stay online in real time despite failures at the infrastructure level
    • Common reasons for system failures
      • Software crashes
      • Hardware crashes
      • Human error
      • Planned downtime
    • A common way to add more availability is to have redundancy - duplicating the components & keeping them on standby to take over in case the active instances go down
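The redundancy idea above can be sketched as a failover loop: try the active instance first, and fall back to a standby when it is unreachable (the replica functions here are stand-ins for real service endpoints):

```python
def call_with_failover(replicas, request):
    """Try each redundant replica in order; a standby takes over when the active fails."""
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError:
            continue  # this instance is down; fail over to the next one
    raise RuntimeError("all replicas are down")

def active(request):
    raise ConnectionError("active instance crashed")  # simulated outage

def standby(request):
    return f"handled: {request}"
```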
  • Monolithic & Microservices
    • Monolithic: entire application code in a single service
      • Pros: simple to develop, test, deploy as everything resides in one repo
      • Cons:
        • Continuous deployment means re-deploying the entire application
        • Single point of failure
        • Hard to scale
    • Microservices: tasks are split into separate services forming a larger service as a whole
      • Pros:
        • No single point of failure
        • Easier to scale independently
      • Cons:
        • Difficult to manage
        • No strong consistency: keeping data consistent across services is hard
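The trade-off shows up even in a toy sketch: in a monolith, modules call each other in-process; in microservices, the same call crosses the network and must tolerate the other service being down (function names here are hypothetical):

```python
# Monolith: billing is just another module in the same process.
def billing_charge(order):
    return f"charged {order}"

def place_order_monolith(order):
    return billing_charge(order)  # plain function call, cannot "be unreachable"

# Microservices: the same call is remote (e.g. HTTP/gRPC in practice),
# so the caller must handle the billing service failing independently.
def place_order_microservice(order, billing_rpc):
    try:
        return billing_rpc(order)
    except ConnectionError:
        return "order queued; billing service unavailable"
```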
  • Database
  • Caching
    • Ensure low latency and high throughput
    • Strategies
      • Cache Aside:
        • First look in the cache, return if present, else fetch from the database and update cache
        • Has a TTL (Time To Live) period to sync up data
        • Works well for read-heavy workloads like user profile data
      • Read-through
        • Similar to Cache Aside, but the cache fetches missing data from the database itself, so it always stays up-to-date
      • Write-through
        • Data is written to the cache first, then synchronously to the database
        • Works well for write-heavy workloads like MMOs
      • Write-back
        • Similar to Write-through, but the database write is delayed (batched); faster writes, at the risk of losing data if the cache fails before flushing
  • Message queue
  • Stream processing
    • Layers of data processing setup:
      • Data collection/query layer
      • Data standardization layer
      • Data processing layer
      • Data analysis layer
      • Data visualization layer
      • Data storage layer
      • Data security layer
    • Ways to ingest data:
      • Real-time
      • Batching
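Batch ingestion amounts to grouping an incoming record stream into fixed-size chunks before handing them to the processing layer; a minimal micro-batching sketch:

```python
def micro_batches(records, size):
    """Group an incoming record stream into fixed-size batches for ingestion."""
    buf = []
    for record in records:
        buf.append(record)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf  # flush the final partial batch
```

Real-time ingestion is the degenerate case where each record is forwarded as it arrives; batching trades latency for throughput.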
    • Challenges:
      • Formatting, standardizing, and converting data from multiple sources is a slow and tedious process
      • It’s resource-intensive
      • Moving data around is risky
    • Use cases:
      • Moving data into Hadoop
      • Streaming data into Elasticsearch
      • Log processing
      • Real-time streaming
    • Distributed data processing
      • Distribute large amounts of data across several nodes for parallel processing
      • Popular frameworks:
        • MapReduce - Apache Hadoop
        • Apache Spark
        • Apache Storm
        • Apache Kafka
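The MapReduce model behind frameworks like Hadoop can be illustrated with a single-process word count: each node counts its own chunk independently (map), then the partial counts are merged (reduce):

```python
from collections import Counter
from functools import reduce

def map_phase(chunk):
    """Map: each node counts words in its own chunk, with no coordination."""
    return Counter(chunk.split())

def reduce_phase(left, right):
    """Reduce: merge two partial counts into one."""
    left.update(right)
    return left

chunks = ["to be or", "not to be"]  # in a cluster these live on different nodes
word_counts = reduce(reduce_phase, map(map_phase, chunks))
```

Because the map phase is embarrassingly parallel, adding nodes scales the heavy part of the job almost linearly.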
    • Architecture
      • Lambda leverages both real-time and batch processing and consists of 3 layers
        • Batch: deals with results from the batching process
        • Speed: gets data from the real-time streaming process
        • Serving: combines the results from the Batch and Speed layers
      • Kappa has a single pipeline and contains only the Speed and Serving layers
        • Preferred if the batch and the streaming analytics results are fairly identical
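The Lambda serving layer's job reduces to merging the precomputed batch view with the speed layer's recent increments; a toy sketch (page-view counts here are made-up):

```python
def serving_layer(batch_view, speed_view):
    """Combine the precomputed batch view with real-time deltas from the speed layer."""
    merged = dict(batch_view)
    for key, delta in speed_view.items():
        merged[key] = merged.get(key, 0) + delta
    return merged

batch_view = {"page_a": 100, "page_b": 40}  # e.g. output of an hourly batch job
speed_view = {"page_a": 3, "page_c": 1}     # events seen since the last batch run
```

In Kappa there is no batch view to merge: the single streaming pipeline maintains the serving view directly.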
    • Real life implementations
  • Other architectures
    • Event-driven: capable of handling a large number of concurrent requests with minimal resources
    • WebHooks: an event-based mechanism that fires an HTTP callback to consumers only when new information is available
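The webhook push model can be sketched as a subscriber registry: consumers register a callback once, and the hub invokes it only when an event occurs, instead of consumers polling (in a real system each callback would be an HTTP POST to the consumer's registered URL):

```python
class WebhookHub:
    """Minimal publish/subscribe sketch of the webhook mechanism."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        """A consumer registers where it wants to be notified."""
        self._subscribers.append(callback)

    def publish(self, event):
        """Fire the event to every subscriber; no polling needed."""
        for callback in self._subscribers:
            callback(event)
```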
    • Shared Nothing: every module has its own environment, with no shared state between modules
    • Hexagonal:
      • Port: act as an API, interface
      • Adapter: an implementation of a Port; converts data from the Port into a form the Domain can consume
      • Domain: contain business logic
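The three hexagonal pieces map naturally onto an abstract base class (Port), a concrete implementation (Adapter), and adapter-agnostic business logic (Domain); the repository names here are hypothetical:

```python
from abc import ABC, abstractmethod

class UserPort(ABC):
    """Port: the interface the domain depends on."""
    @abstractmethod
    def find_name(self, user_id):
        ...

class InMemoryUserAdapter(UserPort):
    """Adapter: one implementation of the Port (could equally be SQL, HTTP, ...)."""
    def __init__(self, rows):
        self.rows = rows

    def find_name(self, user_id):
        return self.rows.get(user_id)

def greeting(port: UserPort, user_id):
    """Domain: business logic that never knows which adapter is behind the Port."""
    name = port.find_name(user_id)
    return f"Hello, {name}!" if name else "Unknown user"
```

Swapping the in-memory adapter for a database-backed one changes nothing in the domain code, which is the point of the pattern.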
    • Peer to Peer: nodes can communicate with each other without the need for a central server
    • Decentralized social network
