postgresql's mvcc
I got to learn more about MVCC, through this blog post The parts of PostgreSQL we hate the most. Even though the post is about PostgreSQL's MVCC, it introduces MVCC nicely and goes into the intricacies of implementation -- walking through how one design decision impacts the design of the system.
multi-version concurrency control (MVCC)
- mechanism allows dbms to have multiple read and write queries
- never over write tuples, instead maintain man versions of the tuple
- when query executes, the dbms decides which versions the query sees
- satisfying some version ordering (typically timestamp)
- multiple queries can read older versions of tuples without getting blocked by update query
- high level MVCC design decisions
- how to store updates to existing tuples
- how to find correct version of tuples for query at runtime
- how to remove expired versions that are no longer visible
postgresql's MVCC
- append-only version storage scheme
- when query updates the dbms makes a copy and applies changes to new version
- instead of overwriting the original tuple
- dbms creates a version chain (a linked list of the different versions old-to-new (O2N) or new-to-old (N2O))
- "The O2N approach avoids the need for the DBMS to update indexes to point to a newer version of the tuple each time it’s modified. However, it may take longer for the DBMS to find the latest version during query processing, potentially traversing a long version chain."
- how does O2N avoid updating indexes (if i'm updating on a name and i change the name from amy to xavier wouldn't the index be out of order?)
- mentions that postgresql uses O2N and is almost alone in this decision
- "The O2N approach avoids the need for the DBMS to update indexes to point to a newer version of the tuple each time it’s modified. However, it may take longer for the DBMS to find the latest version during query processing, potentially traversing a long version chain."
- adds a new tuple for each version (to avoid traversing long version chains)
- heap-only tuple (HOT) update, tries to store the new version on the same page as the old version (reduce disk i/o -- increase locality)
- version vacuum
- original postgresql never removes dead versions
- table only grows, longer version chains, add extra indexes to jump to new versions -- bloating system
- add vacuum that periodically scans table sequentially to remove dead versions (versions that seen by running transactions)
- original postgresql never removes dead versions
- append-only version storage scheme
why postgresql MVCC is the worst
- version copying
- copys the whole tuple instead of just storing the diff
- table bloat
- write heavy workflows can overwhelm the vaccuum
- vacuum only removes dead tuples and doens't reclaim the empty pages on disk
- to reclaim we run the
VACUUM FULL
command (does defragmentation) which is expensive
- secondary index maintenance
- upon updating a tuple, dbms creates a copy of the tuple and updates it, then all primary and secondary indexes have to update to point to that new tuple (unless it's a HOT write)
- 50+% are not HOT updates, in commercial use
- "analysis of OtterTune customers’ PostgreSQL databases shows that roughly 46% of updates use the HOT optimization on average."
- vacuum management
- poor vacuum health can degrade the performance of the database over time
- long transaction can blocks vacuum -> more dead tuples -> longer transactions
- version copying
links to other mvcc resources