目录

CMU 15-445 Lecture #18: Timestamp Ordering Concurrency Control


CMU 15-445 Database Systems

Lecture #18: Timestamp Ordering Concurrency Control

Timestamp Ordering Concurrency Control

  • 纯用锁很影响性能,锁是一个悲观的方法

  • 乐观的方法:用时间戳

  • If $TS(T_i) < TS(T_j)$, then the DBMS must ensure that the execution schedule is equivalent to the serial schedule where $T_i$appears before $T_j$ .

  • Multiple implementation strategies:

    • → System/Wall Clock.:不可能完全准确,一般不用
    • → Logical Counter.:单机系统一般用这个
    • → Hybrid.:分布式系统用这个

Basic Timestamp Ordering (BASIC T/O)

  • Every object X is tagged with timestamp of the last txn that successfully did read/write:时间戳也分两种

    • → W-TS(X) – Write timestamp on X
    • → R-TS(X) – Read timestamp on X
  • Check timestamps for every operation:

    • → If txn tries to access an object “from the future”, it aborts and restarts.(不能操作“未来”的数据)
  • BASIC T/O – READS

    • Don’t read stuff from the “future.”
    • Action: Transaction Ti wants to read object X.
    • If TS(Ti) < W-TS(X), this violates the timestamp order of Ti with regard to the writer of X.
      • → Abort Ti and restart it with a new TS.
    • Else:
      • → Allow Ti to read X.
      • → Update R-TS(X) to max(R-TS(X), TS(Ti))
      • → Make a local copy of X to ensure repeatable reads for Ti.
  • BASIC T/O – WRITES

    • Can’t write if a future transaction has read or written to the object.(不能写未来读过和写过的数据)
    • Action: Transaction Ti wants to write object X.
    • If TS(Ti) < R-TS(X) or TS(Ti) < W-TS(X)
      • → Abort and restart Ti.
    • Else:
      • → Allow Ti to write X and update W-TS(X)
      • → Also, make a local copy of X to ensure repeatable reads.
  • Thomas Write Rule

    • 对上述理论进行优化
    • If TS(Ti) < R-TS(X) (未来有事务读了这个数据)
      • Abort and Restart Ti
    • If TS(Ti) < R-WS(X) (未来有事务写了这个数据,等效成我写了然后未来被覆盖掉了)
      • The DBMS can instead ignore the write and allow the transaction to continue instead of aborting and restarting it. This is called the Thomas Write Rule.
  • BASIC T/O总结

    • 优点:无锁,无死锁
    • 缺点:对于长的事务可能会饥饿(一直rollback),前面的事务一旦修改数据后回滚那么后面的事务会读到错误的数据,读数据的时候需要copy一份到本地,如果读的数据过多,那么开销会很大
/img/CMU 15-445 Database Systems/chapter18-1.png
事务2的数据来源于事务1,事务1一旦回滚那么无法恢复事务2

Optimistic Concurrency Control (OCC)

  • Also based on timestamp

  • The DBMS creates a private workspace for each txn.

    • → Any object read is copied into workspace.
    • Modifications are applied to workspace.(If data is wried, only applied to private workspace, no need to be wried to DBMS)
  • When a txn commits, the DBMS compares workspace write set to see whether it conflicts with other txns.(提交的时候DBMS看你workspace里面写的数据,和其他事务对比看看有无冲突)

  • If there are no conflicts, the write set is installed into the “global” database.(无冲突一把全部写回到数据库里面)


  • OCC PHASES
    • #1 – Read Phase: Track the read/write sets of txns and store their writes in a private workspace.
    • #2 – Validation Phase: When a txn commits, check whether it conflicts with other txns.
    • #3 – Write Phase: If validation succeeds, apply private changes to database. Otherwise abort and restart the txn.
/img/CMU 15-445 Database Systems/chapter18-2.png
OCC在提交的时候才会分配时间戳
  • OCC – READ PHASE

    • Track the read/write sets of txns and store their writes in a private workspace.
    • The DBMS copies every tuple that the txn accesses from the shared database to its workspace ensure repeatable reads.
  • OCC – VALIDATION PHASE

  • /img/CMU 15-445 Database Systems/chapter18-3.png
    VALIDATION PHASE的决策图,不能提交的时候发现后面的事务读/写了自己写了的数据(因为按照串行的理论,应该是后面的事务要读自己提交后的数据,但是自己还没提交,后面的事务读的是自己之前的数据!)
    • Approach #1: Backward Validation:和前面的历史数据做校验
    • /img/CMU 15-445 Database Systems/chapter18-4.png
    • Approach #2: Forward Validation:和未来的事务做校验
    • /img/CMU 15-445 Database Systems/chapter18-5.png
  • OCC – WRITE PHASE

    • Serial Commits: → Use a global latch to limit a single txn to be in the Validation/Write phases at a time.(直接锁全表写,一是为了解决并发问题,二来由于写的数据都准备好了,所以写耗费的时间很短,对并发度的影响不高)
  • OCC works well when the # of conflicts is low:

    • → All txns are read-only (ideal).
    • → Txns access disjoint subsets of data.
  • If the database is large and the workload is not skewed, then there is a low probability of conflict, so again locking is wasteful.

  • OCC问题

    • 本地Copy带来的额外开销
    • commit的时候校验的逻辑很麻烦,消耗性能
    • 写的步骤是锁表的,也可能会称为性能瓶颈
    • 一旦出了问题前面干的全部回退,这也是一种浪费,2PL能够执行到一半发现死锁直接让这个事务回退,损失就比OCC要小

Dynamic Databases and The Phantom Problem

  • 2PL和OCC在完全串行化上面都有BUG。。。$\rightarrow$ 幻读
  • 前面讨论的问题都是read和update的问题,但是没有讨论insert和delete的问题
/img/CMU 15-445 Database Systems/chapter18-6.png
幻读的情景
  • 2PL和OCC有这个BUG的原因:我只能控制现存的数据,但是不管数据的插入/删除

  • THE PHANTOM PROBLEM

  • Approach #1: Re-Execute Scans

    • 对可能产生幻读的行为(SELECT … FROM … GROUP BY …/insert/delete)进行记录,然后在提交的时候再扫描一遍检测有无并发的问题
    • 缺点:这种扫描开销过大,性能上接受不了
  • Approach #2: Predicate Locking

    • 最早由System R发明

    • Shared lock on the predicate in a WHERE clause of a SELECT query.

    • Exclusive lock on the predicate in a WHERE clause of any UPDATE, INSERT, or DELETE query

    • It is rarely implemented in systems; an example of a system that uses it is HyPer (precision locking).

    • /img/CMU 15-445 Database Systems/chapter18-7.jpg
      谓词锁控制数据竞争
  • Approach #3: Index Locking

    • 有索引给索引上锁,没索引就要加大的列锁/表锁了
  • MySQL的解决方案:间隙锁

Isolation Levels

  • 数据库很难做到完全串行,而且很多业务也不需要完全串行,所以有不同的隔离级别

  • Isolation Levels (Strongest to Weakest):

    • SERIALIZABLE: No Phantoms, all reads repeatable, and no dirty reads.

      • Possible implementation: Index locks + Strict 2PL.
    • REPEATABLE READS: Phantoms may happen.

      • Possible implementation: Strict 2PL.
    • READ-COMMITTED: Phantoms and unrepeatable reads may happen.

      • Possible implementation: Strict 2PL for exclusive locks, immediate release of the shared lock after a read.
    • READ-UNCOMMITTED: All anomalies may happen.

      • Possible implementation: Strict 2PL for exclusive locks, no shared locks for reads.
  • /img/CMU 15-445 Database Systems/chapter18-8.png
  • /img/CMU 15-445 Database Systems/chapter18-9.png
  • 如果显式声明一个表是READ ONLY的话,那么数据库会进行优化(不加锁),还有的数据库会自动检测,如果没有写的操作会自动优化