CMU 15-445 Lecture #23: Distributed OLTP Databases

2024-06-17

CMU 15-445 Database Systems

We have not discussed how to ensure that all nodes agree to commit a txn and then to make sure it does commit if the DBMS decides it should.
- → What happens if a node fails?
- → What happens if messages show up late?
- → What happens if the system does not wait for every node to agree to commit?

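In two-phase commit (2PC), if any node votes ABORT during the prepare phase, the coordinator tells the application that the txn has been rolled back and then sends an ABORT message to all nodes.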
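The abort path of 2PC can be sketched with in-memory objects. This is a minimal sketch, not a real implementation: `Participant`, `Vote`, `two_phase_commit` and the `client` message list are hypothetical names, there is no network or disk, and a real participant would force a log record before answering the prepare request.

```python
from enum import Enum
from typing import List


class Vote(Enum):
    COMMIT = "COMMIT"
    ABORT = "ABORT"


class Participant:
    """Hypothetical in-memory participant; `will_commit` fixes its prepare vote."""

    def __init__(self, name: str, will_commit: bool) -> None:
        self.name = name
        self.will_commit = will_commit
        self.state = "ACTIVE"

    def prepare(self) -> Vote:
        # Phase 1: a real node would force a log record to disk before answering.
        self.state = "PREPARED" if self.will_commit else "ABORTING"
        return Vote.COMMIT if self.will_commit else Vote.ABORT

    def commit(self) -> None:
        self.state = "COMMITTED"

    def abort(self) -> None:
        self.state = "ABORTED"


def two_phase_commit(participants: List[Participant], client: List[str]) -> str:
    """Coordinator side of 2PC; `client` collects the messages sent to the application."""
    # Phase 1 (prepare): ask every participant for its vote.
    votes = [p.prepare() for p in participants]
    if any(v is Vote.ABORT for v in votes):
        # One ABORT vote is enough: tell the application, then abort everywhere.
        client.append("ABORTED")
        for p in participants:
            p.abort()
        return "ABORTED"
    # Phase 2 (commit): all votes were COMMIT.
    for p in participants:
        p.commit()
    client.append("COMMITTED")
    return "COMMITTED"
```

For example, with three participants where only the second one votes ABORT, the client receives "ABORTED" and every participant ends in state ABORTED.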
2PC OPTIMIZATIONS
- Early Prepare Voting (Rare)
  - → If you send a query to a remote node that you know will be the last one to execute in this txn, then that node will also return its vote for the prepare phase along with the query result. (If the node knows this is the txn's last query, it casts its vote right away instead of waiting for a separate prepare request.)
- Early Ack After Prepare (Common)
  - → If all nodes vote to commit a txn, the coordinator can send the client an acknowledgement that their txn was successful before the commit phase finishes. (The client hears about success before every node has actually committed, which carries some data-safety risk.)
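The "Early Ack After Prepare" ordering can be made concrete by recording the order of outgoing messages. A minimal sketch, assuming votes arrive as booleans (True = COMMIT) and using a hypothetical `commit_with_early_ack` function; the only point it shows is that the client acknowledgement is emitted before the phase-2 COMMIT messages.

```python
from typing import List


def commit_with_early_ack(votes: List[bool], events: List[str]) -> bool:
    """Hypothetical coordinator that acks the client right after a unanimous prepare."""
    # Phase 1 result: one boolean vote per node (True = COMMIT).
    if not all(votes):
        events.append("client: ABORTED")
        for i in range(len(votes)):
            events.append(f"node{i}: ABORT")
        return False
    # Early ack: the outcome is already decided once every vote is COMMIT,
    # so the client is told about success before phase 2 runs.
    events.append("client: COMMITTED")
    # Phase 2: COMMIT messages go out only after the client was acknowledged.
    for i in range(len(votes)):
        events.append(f"node{i}: COMMIT")
    return True
```

The risk mentioned above lives in the gap between the first event and the last ones: the client already believes the txn committed while the COMMIT messages are still in flight.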

Paxos is a consensus protocol where a coordinator proposes an outcome (e.g., commit or abort) and then the participants vote on whether that outcome should succeed.
It does not block as long as a majority of participants are available (the minority defers to the majority), and it has provably minimal message delays in the best case.
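To see why a majority is enough, here is a minimal single-decree Paxos sketch (the generic prepare/promise, accept/accepted rounds, not a commit-specific variant). Assumptions: `Acceptor` and `propose` are hypothetical names, messages are plain method calls, a failed node is modeled with `alive=False`, and proposal numbers are chosen by the caller.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1       # highest proposal number this acceptor promised
    accepted_n: int = -1     # number of the accepted proposal (-1 = none yet)
    accepted_v: Any = None   # value of the accepted proposal
    alive: bool = True       # False simulates a failed node

    def prepare(self, n: int) -> Optional[Tuple[int, Any]]:
        # Phase 1b: promise to ignore proposals numbered below n.
        if not self.alive or n <= self.promised:
            return None
        self.promised = n
        return (self.accepted_n, self.accepted_v)

    def accept(self, n: int, v: Any) -> bool:
        # Phase 2b: accept unless a higher-numbered prepare was already promised.
        if not self.alive or n < self.promised:
            return False
        self.promised = n
        self.accepted_n, self.accepted_v = n, v
        return True


def propose(acceptors: List[Acceptor], n: int, value: Any) -> Optional[Any]:
    """Run one round with proposal number n; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1
    # Phase 1a: send PREPARE(n) and collect promises.
    promises = [r for r in (a.prepare(n) for a in acceptors) if r is not None]
    if len(promises) < majority:
        return None  # not enough live acceptors: no progress this round
    # If some acceptor already accepted a value, re-propose the highest-numbered one.
    prior_n, prior_v = max(promises, key=lambda p: p[0])
    if prior_n >= 0:
        value = prior_v
    # Phase 2a: send ACCEPT(n, value); the value is chosen once a majority accepts.
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None
```

With three acceptors and one of them down, a proposal still succeeds; a later proposer with a different value ends up re-proposing the value that was already chosen, which is what keeps the decision stable.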

Two-Phase Commit: Blocks if the coordinator fails after the prepare message is sent, until the coordinator recovers.
Paxos: Nonblocking if a majority of participants are alive, provided there is a sufficiently long period without further failures.
Raft: Similar to Paxos but with fewer node types. Only nodes with the most up-to-date log can become leaders.
Multi-Paxos: If the system elects a single leader that oversees proposing changes for some period, then it can skip the propose phase. The system periodically renews who the leader is using another Paxos round. When there is a failure, the DBMS can fall back to full Paxos.
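The Raft rule that only nodes with the most up-to-date log can become leaders comes down to one comparison that every voter performs: compare the term of the last log entry first, then the log length. A simplified sketch, assuming hypothetical names `LogPosition`, `is_up_to_date` and `can_become_leader`; it ignores terms of the election itself and the one-vote-per-term rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogPosition:
    last_term: int   # term of the last entry in the node's log
    last_index: int  # index of the last entry in the node's log


def is_up_to_date(candidate: LogPosition, voter: LogPosition) -> bool:
    """Voter grants its vote only if the candidate's log is at least as up-to-date."""
    # A later last term wins; with equal last terms, the longer log wins.
    if candidate.last_term != voter.last_term:
        return candidate.last_term > voter.last_term
    return candidate.last_index >= voter.last_index


def can_become_leader(candidate: LogPosition, voters: List[LogPosition]) -> bool:
    # The candidate votes for itself and needs a majority of the whole cluster.
    cluster_size = len(voters) + 1
    votes = 1 + sum(is_up_to_date(candidate, v) for v in voters)
    return votes > cluster_size // 2
```

Note that a longer log does not win by itself: a node with entries up to term 2 loses to a node whose shorter log ends in term 3.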

Replication: the DBMS keeps redundant copies of data on multiple nodes to improve reliability.
Design Decisions:
- → Replica Configuration
- → Propagation Scheme
- → Propagation Timing
- → Update Method

Replica Configuration
Approach #1: Primary-Replica
- → All updates go to a designated primary for each object.
- → The primary propagates updates to its replicas by shipping logs.
- → Read-only txns may be allowed to access replicas.
- → If the primary goes down, then hold an election to select a new primary.
Approach #2: Multi-Primary
- → Txns can update data objects at any replica.
- → Replicas must synchronize with each other using an atomic commit protocol.
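The Primary-Replica approach can be sketched for a single group of objects. This is a minimal sketch under stated assumptions: `Node`, `PrimaryReplicaGroup`, `write`, `read_only` and `elect_primary` are hypothetical, log shipping is a direct method call, and the "election" simply picks the live node that has applied the most log records.

```python
from typing import Dict, List, Tuple


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.data: Dict[str, int] = {}
        self.applied_lsn = 0  # last log sequence number applied locally

    def apply(self, lsn: int, key: str, value: int) -> None:
        self.data[key] = value
        self.applied_lsn = lsn


class PrimaryReplicaGroup:
    """Hypothetical primary-replica setup for one group of objects."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = nodes
        self.primary = nodes[0]
        self.log: List[Tuple[int, str, int]] = []

    def write(self, key: str, value: int) -> None:
        # All updates go to the primary, which appends a log record ...
        lsn = len(self.log) + 1
        self.log.append((lsn, key, value))
        self.primary.apply(lsn, key, value)
        # ... and ships it to every live replica.
        for n in self.nodes:
            if n is not self.primary and n.alive:
                n.apply(lsn, key, value)

    def read_only(self, key: str, replica: Node) -> int:
        # Read-only txns may be served by a replica.
        return replica.data[key]

    def elect_primary(self) -> Node:
        # Called after the primary went down: pick the live node with the newest log.
        candidates = [n for n in self.nodes if n.alive]
        self.primary = max(candidates, key=lambda n: n.applied_lsn)
        return self.primary
```

Picking the node with the highest applied LSN mirrors the Raft idea above: a replica that missed log records should not become the new primary.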

Propagation Scheme
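- Synchronous scheme: when the primary commits a txn, it blocks and notifies the replicas, and it only tells the client the txn succeeded after the replicas have also committed.
- Asynchronous scheme: the primary returns as soon as it has committed locally, without waiting for the replicas.
- Compromise, semi-synchronous: the primary waits until the log has been delivered to the replica, but not until the replica has finished applying it (MySQL).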
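The three schemes differ only in what the primary waits for before acknowledging the client. A toy model, assuming a hypothetical `primary_commit` function, mode strings chosen just for this sketch, and a `network_ok` flag standing in for "the replica is reachable"; returning False means the primary cannot acknowledge yet, not that the txn aborted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    received: List[str] = field(default_factory=list)  # log records delivered
    applied: List[str] = field(default_factory=list)   # log records executed


def primary_commit(txn: str, replica: Replica, mode: str, network_ok: bool = True) -> bool:
    """Return True once the primary may tell the client the txn succeeded.

    mode is "sync", "semi-sync" or "async" (labels chosen for this sketch)."""
    if mode == "async":
        # Acknowledge immediately; log shipping happens later (not modeled here).
        return True
    if not network_ok:
        # sync and semi-sync must hear from the replica, so they cannot ack yet.
        return False
    replica.received.append(txn)   # the log record reaches the replica
    if mode == "semi-sync":
        return True                # wait for delivery only
    if mode == "sync":
        replica.applied.append(txn)  # also wait until the replica has committed
        return True
    raise ValueError(f"unknown mode: {mode}")
```

The trade-off shows up directly: async never waits but may leave the replica without the txn, while sync gives the strongest guarantee at the cost of blocking whenever the replica is unreachable.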

Propagation Timing
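- Continuous: the DBMS keeps sending log messages to the replicas as they are generated. The hard part is rollback: if the txn aborts, the primary and the replicas must both roll it back.
- On Commit: The DBMS only sends the log messages for a txn to the replicas once the txn commits, so no time is wasted rolling back aborted txns on the replicas.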
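The difference between the two timings is which log records ever leave the primary. A minimal sketch, assuming log records are hypothetical `(txn_id, operation)` tuples with explicit "COMMIT" and "ABORT" operations; `ship_continuous` and `ship_on_commit` are illustrative names.

```python
from typing import Dict, List, Tuple

LogRecord = Tuple[str, str]  # (txn_id, operation), e.g. ("T1", "UPDATE x")


def ship_continuous(records: List[LogRecord]) -> List[LogRecord]:
    # Every record is sent as soon as it is generated, including records of
    # txns that later abort; replicas then need the ABORT record to undo them.
    return list(records)


def ship_on_commit(records: List[LogRecord]) -> List[LogRecord]:
    # Buffer each txn's records and release them only when its COMMIT appears;
    # records of aborted txns never leave the primary.
    buffered: Dict[str, List[LogRecord]] = {}
    shipped: List[LogRecord] = []
    for txn, op in records:
        buffered.setdefault(txn, []).append((txn, op))
        if op == "COMMIT":
            shipped.extend(buffered.pop(txn))
        elif op == "ABORT":
            buffered.pop(txn)
    return shipped
```

For a log where T2 aborts and T1 commits, continuous shipping sends all four records, while on-commit shipping sends only T1's UPDATE and COMMIT. The price is latency: a long txn's records all arrive at the replica in one burst at commit time.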

Active vs Passive
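- Active-Active: the primary sends SQL statements to the replicas, and each replica has to execute them again.
- Active-Passive: the primary sends physical log records to the replicas, and each replica simply applies them.
- Physical logs are large while re-executing SQL is slow, so in practice systems often mix the two (MySQL).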
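The size-versus-work trade-off can be illustrated with a bulk update. A sketch under loud assumptions: the table is a Python dict, the "statement" record is a hypothetical `(SQL text, parameter)` pair, and per-row after-images stand in for the physical log; none of these names come from a real DBMS.

```python
from typing import Dict, List, Tuple

Table = Dict[int, int]            # row id -> balance
Statement = Tuple[str, int]       # (SQL text, pct parameter)
SQL = "UPDATE accounts SET balance = balance * (100 + :pct) / 100"


def run_update(table: Table, pct: int) -> None:
    # Executes the logic of SQL above; this is the per-row work a replica repeats.
    for rid in table:
        table[rid] = table[rid] * (100 + pct) // 100


def primary_update(table: Table, pct: int) -> Tuple[Statement, List[Tuple[int, int]]]:
    """Run the update on the primary and produce both kinds of replication record."""
    run_update(table, pct)
    statement: Statement = (SQL, pct)                      # one small record
    row_images = [(rid, bal) for rid, bal in table.items()]  # one record per row
    return statement, row_images


def replay_statement(replica: Table, stmt: Statement) -> None:
    # Active-Active style: the replica re-executes the statement itself.
    run_update(replica, stmt[1])


def apply_row_images(replica: Table, images: List[Tuple[int, int]]) -> None:
    # Active-Passive style: the replica just installs the after-images.
    for rid, bal in images:
        replica[rid] = bal
```

Both replicas end up identical to the primary here, but the statement path ships a single record and redoes the work for all 1000 rows, while the log path ships 1000 records and does almost no work. Statement replay also assumes the statement is deterministic, which is one reason systems fall back to row-level logs for some statements.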