What is LSM, and Why LSM?
目录
What is LSM, and Why LSM?
-
本文章是对LSM的基本介绍
-
LSM trees全称为Log-structured merge trees,是一种用于KV存储引擎的数据结构,它被广泛应用于分布式存储系统
-
使用其的工业产品
- TiDB and CockroachDB
- RocksDB, based on LevelDB
-
LSM tree是一种append-friendly data structure,和其他的KV存储引擎底层的数据结构,比如RB-Tree and B-Tree最大的不同就是当更改数据的时候,RB-Tree and B-Tree需要在内存或者磁盘把原有数据进行修改,但是对于LSM tree来说上面的修改操作从之前的同步操作改成了lazily applied to the storage,引擎可以一次性批量将前面修改数据的操作处理成SST (sorted string table) files,然后再将其写入磁盘(这个处理的速度要比前面的RB-Tree and B-Tree快得多,顺序追加日志要比查找到值再改快得多),一旦写入磁盘,引擎将不再直接修改它们,只需要后台的任务定时将以进行compaction即可
-
这种架构上的设计让LSM trees易于使用
- 数据在持久存储里面是不可变的,并发控制的问题因为这个变得简单,而且compaction的任务可以交给远程的服务器去做,这样的架构也非常适合现在的云原生(存储和compaction完全可以派发给不同的机器)
- 引擎的读和写的性能可以通过调整compaction算法/算法的参数来进行变化,这样可以让引擎适应多种不同的情境
-
一个LSM存储引擎大致可以分为下面三个部分
- Write-ahead log to persist temporary data for recovery.
- SSTs on the disk to maintain an LSM-tree structure.
- Mem-tables in memory for batching small writes.
-
一个存储引擎应该提供下面四个接口
Put(key, value)
: store a key-value pair in the LSM tree.Delete(key)
: remove a key and its corresponding value.Get(key)
: get the value corresponding to a key.Scan(range)
: get a range of key-value pairs.
-
为了保证持久化还会有一个接口
Sync()
: ensure all the operations beforesync
are persisted to the disk.
-
一些引擎会将
Put
和Write
的操作组合起来处理,称为WriteBatch
-
本教程中LSM的压缩算法选用leveled compaction algorithm,在工业界也很常用
- Write Path
- The write path of LSM contains four steps:
- Write the key-value pair to the write-ahead log so that it can be recovered after the storage engine crashes.
- Write the key-value pair to memtable. After (1) and (2) are completed, we can notify the user that the write operation is completed.
- (In the background) When a mem-table is full, we will freeze them into immutable mem-tables and flush them to the disk as SST files in the background.
- (In the background) The engine will compact some files in some levels into lower levels to maintain a good shape for the LSM tree so that the read amplification is low.
-
Read Path
-
When we want to read a key
- We will first probe all the mem-tables from the latest to the oldest.
- If the key is not found, we will then search the entire LSM tree containing SSTs to find the data.