replication
Some notes from reading DDIA's replication chapter, the chapter kicked off with terms that I didn't understand like "read-your-writes" and "monotonic reads guarantees", and that was enough motivation to read the chapter and figure out what they ment!
- leader based replication, examples include:
- message brokers
- distributed file systems (whats the usecase of these? sounds interesting)
- replicated block devices (not sure what these are)
- questions/problems with leader base replication
- how do we know the write actually got to the leader
- what happens when leader goes out?
- asynchronous replication is widely used
- despite weak durability (leader dies, writes are loss cause they may not get broadcast-ed and replicated)
- "chain replication" (synchronous replication variant but addresses durability problem)
- pro is leader can continuously process writes
- despite weak durability (leader dies, writes are loss cause they may not get broadcast-ed and replicated)
- WAL shipping (don't really get this, should reread)
- row based log replication is what we did for ocaml_db?
- "triggers" and "stored procedures"
- triggers allow you to run code on the database system when performing a write/modification
- for now i'm thinking of this as the tee bash command, we can redirect part of the information elsewhere
- triggers allow you to run code on the database system when performing a write/modification
- "read-your-writes" (read-after-write consistency)
- users read the changes they submitted
- monotonic reads
- when you read data you can read not the latest data, but guarantees that you won't read less stale data
- in other words, you can't go back in time and read old data after reading newer datag
- one way to do this is to make each user always read from the same replica
- upon replica failure, we have to reroute users to a new replica (might cause problems here, such as reading stale data)
- "consistent prefix reads"
- guarantees that if writes happen in a particular order, anyone reading the writes will see them in that order
- typically needed in partitions/sharded databases (where writes could be out of order due to lag of partition updates)
- when do we want to use partitions (industry practice to use them) ?
- when working with eventually consistent systems, its good to ask how the application behaves if lag increased (to minutes or hours)
- multi-leader replication's major drawback is conflict resolution (how to resolve write conflicts)
- offline mode with each device being a leader, and once it's online device data then replicated (CouchDB designed for this)
- avoiding conflicts (force writes to go through a single leader) is the simplest solution
- last write wins is another strategy (prone to data loss)
- conflict resolution research
- crdt (two-way merge function)
- mergeable persistent data structures (three-way merge function)
- operational transformation (used by collaborative editors like google docs)
- there are many multi-leader topologies with different trade-offs
- circular and star - single point of failure could halt data propogating to other replicas
- all-to-all - could have race conditions resulting in conflicts between replicas
- version vectors (use this instead of timestamps/clocks to determine order)
- conflict detection is hard and its best to thoroughly read your db's documentation and test your db to make sure it provides the guarantees you want
- leaderless replication (became popular again because of dynamo, and is often called dynamo-style (don't confuse this with dynamodb which is single leader))
- typically defined by two mechanisms (not all dynamo systems implement both)
- read repair - detect stale data (typically with version number) and writes new data to replica with old data
- anti-entropy process - background process that monitors for inconsistent or missing data (does not copy writes in any particular order and may have long delays)
- typically defined by two mechanisms (not all dynamo systems implement both)
- quorum read and writes (n - the number of nodes; r - number of reads to confirm; w - number of writes to confirm)
- lower numbers for r/w means lower latency and higher availability, but also higher chance of stale data
- important to monitor staleness in dynamo systems (harder than leader systems)
- leader systems can track follower logs vs leader logs to see how far they lag
- leaderless systems there is no write order making it hard to track lag (Note: research area)
- sloppy quorums - optional in most dynamo implementations to increase write availability (makes system more durable)
- when you lose access to nodes that normally hold the data, but you have access to other nodes, temporarily write to those nodes
- when you have access to the correct nodes again, use "hinted handoff"
- concurrent write detection using version numbers (bit confused on this)
- extend version numbers to version vectors (vector clocks)