Why Sequential Writes Are Still Faster on SSDs

February 2020 · 2 minute read

Many people think of an SSD as a random-access device, like memory. While SSDs present a random-access interface, they are more complicated than that, so their performance patterns are sometimes confusing. To understand why, we need to dig a little deeper.

An SSD is composed of a few NAND chips. Those NAND chips support 3 basic operations:

  1. Read a page. A page is the minimal addressable unit, usually 2KB or 4KB, and a read operation takes around 10us.
  2. Write (“program”) a page. A write can only be done on a page that is already erased, and takes around 100us.
  3. Erase a block. A block is usually 128 or 256 pages, and the erase operation takes a few ms.
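
The three operations above can be sketched as a toy model. This is only an illustration of the constraints, not how a real chip is driven: the `NandBlock` class, its method names, and the choice of 128 pages per block are assumptions made for the example.

```python
PAGES_PER_BLOCK = 128  # assumption: the smaller of the two sizes mentioned above


class NandBlock:
    """Toy model of one NAND block: pages are programmed one by one, erased together."""

    def __init__(self):
        # None marks an erased page that is ready to be programmed.
        self.pages = [None] * PAGES_PER_BLOCK
        self.erase_count = 0

    def read(self, page_no):
        return self.pages[page_no]

    def program(self, page_no, data):
        # A page cannot be overwritten in place; it must be erased first.
        if self.pages[page_no] is not None:
            raise ValueError("page must be erased before it can be programmed")
        self.pages[page_no] = data

    def erase(self):
        # Erasing is all-or-nothing for the whole block.
        self.pages = [None] * PAGES_PER_BLOCK
        self.erase_count += 1
```

Calling `program` twice on the same page raises an error until `erase` has wiped the whole block, which is the asymmetry the rest of the post builds on.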

Note that the unit of read and write operations is a page, while the unit of the erase operation is a block. I guess this is for cost reasons. So to overwrite data in place, a naive implementation would need to read the whole block, erase it, then write the updated data back to the block. That turns a single page update into a multi-millisecond operation, which is unacceptable.
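
To put a rough number on the naive approach, here is a back-of-the-envelope calculation using the timings listed earlier. The 2 ms erase time and the 128-page block are assumptions picked from the ranges given ("a few ms", "128 or 256 pages"); real devices vary.

```python
READ_US = 10        # read one page
PROGRAM_US = 100    # program one page
ERASE_US = 2000     # assumption: "a few ms" taken as 2 ms
PAGES_PER_BLOCK = 128


def naive_update_cost_us(pages_per_block=PAGES_PER_BLOCK):
    """Cost of updating one page by read-whole-block, erase, write-whole-block."""
    read_all = pages_per_block * READ_US
    write_all = pages_per_block * PROGRAM_US
    return read_all + ERASE_US + write_all


cost = naive_update_cost_us()
print(cost)               # 16080 (microseconds, about 16 ms)
print(cost / PROGRAM_US)  # 160.8 times the cost of programming a single page
```

Under these assumptions, updating a single page costs about 16 ms instead of 100us, and every such update also spends one erase cycle of the block.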

Furthermore, NAND flash has a limited lifetime, ranging from 10,000 to 100,000 P/E (Program/Erase) cycles. Effort must be taken to make sure blocks wear out evenly (wear leveling); otherwise, the SSD would lose capacity.
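
A minimal sketch of why even wear matters, assuming the simplest possible policy: always erase the block with the lowest erase count. The function name and the policy are illustrative assumptions; real firmware uses more elaborate schemes.

```python
def simulate_erases(num_blocks, num_erases, wear_leveling):
    """Return per-block erase counts after num_erases erase operations."""
    erase_counts = [0] * num_blocks
    for _ in range(num_erases):
        if wear_leveling:
            # Pick the least-worn block (ties go to the lowest block number).
            block = min(range(num_blocks), key=lambda b: erase_counts[b])
        else:
            # No leveling: the same block is reused every time.
            block = 0
        erase_counts[block] += 1
    return erase_counts


print(simulate_erases(4, 100, wear_leveling=True))   # [25, 25, 25, 25]
print(simulate_erases(4, 100, wear_leveling=False))  # [100, 0, 0, 0]
```

Without leveling, one block absorbs all the P/E cycles and would reach its limit four times sooner than in the leveled case.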

Because of these constraints, another level of abstraction lives inside the SSD firmware: the Flash Translation Layer (FTL). The FTL helps build the illusion of a random-access device. To achieve that, it employs an approach very similar to a Log-Structured Merge (LSM) tree: writes always go to new, already erased pages, while outdated data is garbage collected (GC) in the background. The FTL also needs to keep a map from the user’s logical addresses to physical addresses on the SSD, both in memory and persistently.
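
The idea can be sketched with a toy page-mapped FTL. Everything here is a simplifying assumption for illustration: the `ToyFTL` name, the tiny geometry, the greedy victim choice, GC running inline instead of in the background, and the map living only in memory. It also assumes at least three blocks, two of which are kept as spare space.

```python
class ToyFTL:
    """Toy FTL: append-only writes, a logical-to-physical map, greedy GC."""

    def __init__(self, num_blocks=4, pages_per_block=4):
        self.num_blocks = num_blocks
        self.pages_per_block = pages_per_block
        # Two blocks' worth of spare space guarantees GC always finds garbage.
        self.logical_pages = (num_blocks - 2) * pages_per_block
        # flash[b][p] is (logical_page, data) once programmed, None when erased.
        self.flash = [[None] * pages_per_block for _ in range(num_blocks)]
        self.mapping = {}                 # logical page -> (block, page)
        self.free_blocks = list(range(1, num_blocks))
        self.active = 0                   # block currently receiving writes
        self.next_page = 0                # next erased page in the active block
        self.programs = 0
        self.erases = 0

    def read(self, lpn):
        b, p = self.mapping[lpn]
        return self.flash[b][p][1]

    def write(self, lpn, data):
        if not 0 <= lpn < self.logical_pages:
            raise IndexError("logical page out of range")
        self.mapping.pop(lpn, None)       # the old copy, if any, is now garbage
        if self.next_page == self.pages_per_block and len(self.free_blocks) <= 1:
            self._collect_garbage()
        self._append(lpn, data)

    def _append(self, lpn, data):
        if self.next_page == self.pages_per_block:   # active block is full
            self.active = self.free_blocks.pop()
            self.next_page = 0
        self.flash[self.active][self.next_page] = (lpn, data)
        self.mapping[lpn] = (self.active, self.next_page)
        self.next_page += 1
        self.programs += 1

    def _is_valid(self, b, p):
        # A physical page is valid only if the map still points at it.
        entry = self.flash[b][p]
        return entry is not None and self.mapping.get(entry[0]) == (b, p)

    def _valid_count(self, b):
        return sum(self._is_valid(b, p) for p in range(self.pages_per_block))

    def _collect_garbage(self):
        used = [b for b in range(self.num_blocks) if b not in self.free_blocks]
        victim = min(used, key=self._valid_count)    # block with the most garbage
        survivors = [self.flash[victim][p] for p in range(self.pages_per_block)
                     if self._is_valid(victim, p)]
        for lpn, data in survivors:
            self._append(lpn, data)                  # relocate still-valid data
        self.flash[victim] = [None] * self.pages_per_block   # erase whole block
        self.erases += 1
        self.free_blocks.append(victim)
```

A caller only sees `write(lpn, data)` and `read(lpn)`, which is the random-access illusion. Comparing the `programs` counter with the number of host writes shows how many extra page copies garbage collection had to make.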

With the knowledge above, we can finally answer the question in the title: why are sequential writes faster than random writes on SSDs? Because for sequential writes:

References:

[1] Operating Systems: Three Easy Pieces

[2] The NAND chip picture