目录

What is LSM, and Why LSM?


What is LSM, and Why LSM?

  • 本文章是对LSM的基本介绍

  • LSM trees全称为Log-structured merge trees,是一种用于KV存储引擎的数据结构,它被广泛应用于分布式存储系统

  • 使用其的工业产品

  • LSM tree是一种append-friendly data structure,和其他的KV存储引擎底层的数据结构,比如RB-Tree and B-Tree最大的不同就是当更改数据的时候,RB-Tree and B-Tree需要在内存或者磁盘把原有数据进行修改,但是对于LSM tree来说上面的修改操作从之前的同步操作改成了lazily applied to the storage,引擎可以一次性批量将前面修改数据的操作处理成SST (sorted string table) files,然后再将其写入磁盘(这个处理的速度要比前面的RB-Tree and B-Tree快得多,顺序追加日志要比查找到值再改快得多),一旦写入磁盘,引擎将不再直接修改它们,只需要后台的任务定时将以进行compaction即可

  • 这种架构上的设计让LSM trees易于使用

    • 数据在持久存储里面是不可变的,并发控制的问题因为这个变得简单,而且compaction的任务可以交给远程的服务器去做,这样的架构也非常适合现在的云原生(存储和compaction完全可以派发给不同的机器)
    • 引擎的读和写的性能可以通过调整compaction算法/算法的参数来进行变化,这样可以让引擎适应多种不同的情境

  • /img/mini-lsm/lsm-structure.jpeg
    lsm-structure
  • 一个LSM存储引擎大致可以分为下面三个部分

    • Write-ahead log to persist temporary data for recovery.
    • SSTs on the disk to maintain an LSM-tree structure.
    • Mem-tables in memory for batching small writes.
  • 一个存储引擎应该提供下面四个接口

    • Put(key, value): store a key-value pair in the LSM tree.
    • Delete(key): remove a key and its corresponding value.
    • Get(key): get the value corresponding to a key.
    • Scan(range): get a range of key-value pairs.
  • 为了保证持久化还会有一个接口

    • Sync(): ensure all the operations before sync are persisted to the disk.
  • 一些引擎会将PutWrite的操作组合起来处理,称为WriteBatch

  • 本教程中LSM的压缩算法选用leveled compaction algorithm,在工业界也很常用


  • Write Path
    /img/mini-lsm/write-path.jpeg
    write-path
  • The write path of LSM contains four steps:
    • Write the key-value pair to the write-ahead log so that it can be recovered after the storage engine crashes.
    • Write the key-value pair to memtable. After (1) and (2) are completed, we can notify the user that the write operation is completed.
    • (In the background) When a mem-table is full, we will freeze them into immutable mem-tables and flush them to the disk as SST files in the background.
    • (In the background) The engine will compact some files in some levels into lower levels to maintain a good shape for the LSM tree so that the read amplification is low.

  • Read Path

    /img/mini-lsm/read-path.jpeg
    read-path
  • When we want to read a key

    • We will first probe all the mem-tables from the latest to the oldest.
    • If the key is not found, we will then search the entire LSM tree containing SSTs to find the data.