CMU 15-445 Lecture #18: Timestamp Ordering Concurrency Control
CMU 15-445 Database Systems
Lecture #18: Timestamp Ordering Concurrency Control
Timestamp Ordering Concurrency Control
-
纯用锁很影响性能,锁是一个悲观的方法
-
乐观的方法:用时间戳
-
If $TS(T_i) < TS(T_j)$, then the DBMS must ensure that the execution schedule is equivalent to the serial schedule where $T_i$appears before $T_j$ .
-
Multiple implementation strategies:
- → System/Wall Clock.:不可能完全准确,一般不用
- → Logical Counter.:单机系统一般用这个
- → Hybrid.:分布式系统用这个
Basic Timestamp Ordering (BASIC T/O)
-
Every object X is tagged with timestamp of the last txn that successfully did read/write:时间戳也分两种
- → W-TS(X) – Write timestamp on X
- → R-TS(X) – Read timestamp on X
-
Check timestamps for every operation:
- → If txn tries to access an object “from the future”, it aborts and restarts.(不能操作“未来”的数据)
-
BASIC T/O – READS
- Don’t read stuff from the “future.”
- Action: Transaction Ti wants to read object X.
- If TS(Ti) < W-TS(X), this violates the timestamp order of Ti with regard to the writer of X.
- → Abort Ti and restart it with a new TS.
- Else:
- → Allow Ti to read X.
- → Update R-TS(X) to max(R-TS(X), TS(Ti))
- → Make a local copy of X to ensure repeatable reads for Ti.
-
BASIC T/O – WRITES
- Can’t write if a future transaction has read or written to the object.(不能写未来读过和写过的数据)
- Action: Transaction Ti wants to write object X.
- If TS(Ti) < R-TS(X) or TS(Ti) < W-TS(X)
- → Abort and restart Ti.
- Else:
- → Allow Ti to write X and update W-TS(X)
- → Also, make a local copy of X to ensure repeatable reads.
-
Thomas Write Rule
- 对上述理论进行优化
- If TS(Ti) < R-TS(X) (未来有事务读了这个数据)
- Abort and Restart Ti
- If TS(Ti) < R-WS(X) (未来有事务写了这个数据,等效成我写了然后未来被覆盖掉了)
- The DBMS can instead ignore the write and allow the transaction to continue instead of aborting and restarting it. This is called the Thomas Write Rule.
-
BASIC T/O总结
- 优点:无锁,无死锁
- 缺点:对于长的事务可能会饥饿(一直rollback),前面的事务一旦修改数据后回滚那么后面的事务会读到错误的数据,读数据的时候需要copy一份到本地,如果读的数据过多,那么开销会很大
Optimistic Concurrency Control (OCC)
- Also based on timestamp
-
The DBMS creates a private workspace for each txn.
- → Any object read is copied into workspace.
- Modifications are applied to workspace.(If data is wried, only applied to private workspace, no need to be wried to DBMS)
-
When a txn commits, the DBMS compares workspace write set to see whether it conflicts with other txns.(提交的时候DBMS看你workspace里面写的数据,和其他事务对比看看有无冲突)
-
If there are no conflicts, the write set is installed into the “global” database.(无冲突一把全部写回到数据库里面)
- OCC PHASES
- #1 – Read Phase: Track the read/write sets of txns and store their writes in a private workspace.
- #2 – Validation Phase: When a txn commits, check whether it conflicts with other txns.
- #3 – Write Phase: If validation succeeds, apply private changes to database. Otherwise abort and restart the txn.
-
OCC – READ PHASE
- Track the read/write sets of txns and store their writes in a private workspace.
- The DBMS copies every tuple that the txn accesses from the shared database to its workspace ensure repeatable reads.
-
OCC – VALIDATION PHASE
-
OCC – WRITE PHASE
- Serial Commits: → Use a global latch to limit a single txn to be in the Validation/Write phases at a time.(直接锁全表写,一是为了解决并发问题,二来由于写的数据都准备好了,所以写耗费的时间很短,对并发度的影响不高)
-
OCC works well when the # of conflicts is low:
- → All txns are read-only (ideal).
- → Txns access disjoint subsets of data.
-
If the database is large and the workload is not skewed, then there is a low probability of conflict, so again locking is wasteful.
-
OCC问题
- 本地Copy带来的额外开销
- commit的时候校验的逻辑很麻烦,消耗性能
- 写的步骤是锁表的,也可能会称为性能瓶颈
- 一旦出了问题前面干的全部回退,这也是一种浪费,2PL能够执行到一半发现死锁直接让这个事务回退,损失就比OCC要小
Dynamic Databases and The Phantom Problem
- 2PL和OCC在完全串行化上面都有BUG。。。$\rightarrow$ 幻读
- 前面讨论的问题都是read和update的问题,但是没有讨论insert和delete的问题
- 2PL和OCC有这个BUG的原因:我只能控制现存的数据,但是不管数据的插入/删除
-
THE PHANTOM PROBLEM
-
Approach #1: Re-Execute Scans
- 对可能产生幻读的行为(SELECT … FROM … GROUP BY …/insert/delete)进行记录,然后在提交的时候再扫描一遍检测有无并发的问题
- 缺点:这种扫描开销过大,性能上接受不了
-
Approach #2: Predicate Locking
-
最早由System R发明
-
Shared lock on the predicate in a WHERE clause of a SELECT query.
-
Exclusive lock on the predicate in a WHERE clause of any UPDATE, INSERT, or DELETE query
-
It is rarely implemented in systems; an example of a system that uses it is HyPer (precision locking).
-
-
Approach #3: Index Locking
- 有索引给索引上锁,没索引就要加大的列锁/表锁了
-
MySQL的解决方案:间隙锁
Isolation Levels
-
数据库很难做到完全串行,而且很多业务也不需要完全串行,所以有不同的隔离级别
-
Isolation Levels (Strongest to Weakest):
-
SERIALIZABLE: No Phantoms, all reads repeatable, and no dirty reads.
- Possible implementation: Index locks + Strict 2PL.
-
REPEATABLE READS: Phantoms may happen.
- Possible implementation: Strict 2PL.
-
READ-COMMITTED: Phantoms and unrepeatable reads may happen.
- Possible implementation: Strict 2PL for exclusive locks, immediate release of the shared lock after a read.
-
READ-UNCOMMITTED: All anomalies may happen.
- Possible implementation: Strict 2PL for exclusive locks, no shared locks for reads.
-
-
如果显式声明一个表是READ ONLY的话,那么数据库会进行优化(不加锁),还有的数据库会自动检测,如果没有写的操作会自动优化