SSD Performance Optimization Techniques

From: thepipeline_xyz

When building a high-performance EVM, a key design decision is to use a custom state database, such as MonadDb, rather than a standard implementation like Pebble DB or RocksDB [00:00:05].

Identifying Blockchain Performance Bottlenecks

The most expensive operations in a blockchain client are typically cryptographic functions, such as elliptic curve cryptography and hashing, and state access [00:00:22]. In contrast, the business logic in most smart contracts, even when complex, is relatively inexpensive to execute [00:00:41].

Optimizing computation in modern blockchains yields little performance gain because there is not much computation involved [00:00:57]. Some clients already parallelize signature recovery, one of the most expensive parts of transaction execution [00:01:05].

When profiling code, it becomes clear that most time is spent on database operations [00:01:30]. A single read from an SSD can have a latency of 80 to 100 microseconds or more, depending on the model and generation [00:01:38]. This is orders of magnitude longer than executing a simple smart contract [00:01:56].

Sequential Database Reads

Executing a single transaction often involves multiple sequential database reads:

Reading the sender’s account to check their balance [00:02:07].
Reading the destination account [00:02:11].
Reading proxy accounts if applicable [00:02:13].
Reading storage slots, such as balances in ERC-20 tokens or data in Uniswap [00:02:17].

If these reads are performed sequentially and the data is not cached in main memory, their cumulative latency can make executing a single transaction slow [00:02:30].
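The cumulative cost can be estimated with a small back-of-the-envelope model. The sketch below assumes a flat 100-microsecond latency per uncached read (the upper end of the range cited earlier) and a hypothetical list of five dependent reads for a token transfer through a proxy; the read names and the helper `sequential_latency_us` are illustrative and do not come from any client's code.

```python
# Assumed latency of one uncached SSD read, in microseconds.
# The text cites 80 to 100 microseconds or more; 100 is used here.
SSD_READ_LATENCY_US = 100

# Hypothetical dependent reads for one token transfer through a proxy.
READS = [
    "sender account",
    "destination account",
    "proxy account",
    "storage slot: sender balance",
    "storage slot: recipient balance",
]


def sequential_latency_us(reads, latency_us=SSD_READ_LATENCY_US, cached=()):
    """Total wait when each uncached read starts only after the previous one ends."""
    return sum(latency_us for name in reads if name not in cached)


print(sequential_latency_us(READS))                             # 500
print(sequential_latency_us(READS, cached=("sender account",)))  # 400
```

Under these assumptions, one transaction spends about half a millisecond waiting on storage before any contract logic is counted.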

Limitations of General-Purpose Databases

Adding enough RAM to cache everything would avoid disk reads, but it requires very large and expensive hardware [00:02:53]. Modern SSDs are capable of very high performance, with some achieving 500,000 I/O operations per second [00:06:53]. However, many general-purpose databases commonly used by blockchain clients, like Pebble DB or RocksDB, fail to use this raw performance effectively [00:03:48].
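The gap between per-read latency and headline IOPS can be made concrete with Little's law (throughput equals requests in flight divided by latency). The sketch below is a rough model only: it assumes latency stays at 100 microseconds regardless of load, which real devices do not guarantee, and the function names are hypothetical.

```python
def reads_per_second(in_flight, latency_us):
    """Little's law: throughput = concurrency / latency."""
    return in_flight * 1_000_000 / latency_us


def in_flight_needed(target_iops, latency_us):
    """Concurrent requests required to sustain target_iops at a given latency."""
    return target_iops * latency_us / 1_000_000


# One read at a time at an assumed 100 microseconds per read.
print(reads_per_second(1, 100))        # 10000.0

# Requests that must be in flight to reach 500,000 IOPS at the same latency.
print(in_flight_needed(500_000, 100))  # 50.0
```

In this model, a database that issues one read at a time uses about 2% of the device's rated throughput; reaching the rated figure requires keeping dozens of requests outstanding at once.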

Common issues include:

Layered Data Structures: Embedding one data structure inside another on disk, leading to expensive double-traversal for every request [00:04:06].
General-Purpose Design: B+ tree databases (LMDB, MDBX) and LSM-tree databases (RocksDB, LevelDB) are designed for general applications [00:04:30]. They aim for good average performance rather than specialized, peak performance [00:05:02].

This general design leads to inefficient use of hardware capabilities [00:07:33]. Some databases make 20 requests to hardware for a basic lookup that a custom solution might achieve in one or two requests [00:08:07].
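A toy count shows how layering can multiply hardware requests. The sketch below assumes an outer structure whose nodes are stored as values in a generic key-value store, so every outer node visited triggers a full traversal of the store's own on-disk index. The depths, the `levels_in_memory` parameter, and both function names are hypothetical, chosen only to illustrate the arithmetic; the source does not describe how any specific database lays out its data.

```python
def layered_requests(outer_depth, inner_depth):
    """Outer structure embedded in a key-value store.

    Each outer node visited costs one full traversal of the store's
    own index, so the two depths multiply.
    """
    return outer_depth * inner_depth


def direct_requests(outer_depth, levels_in_memory=0):
    """Outer nodes addressed directly on disk: one request per uncached node."""
    return max(outer_depth - levels_in_memory, 0)


# Hypothetical depths, for illustration only.
OUTER_DEPTH = 5
INNER_DEPTH = 4

print(layered_requests(OUTER_DEPTH, INNER_DEPTH))         # 20
print(direct_requests(OUTER_DEPTH, levels_in_memory=3))   # 2
```

With these assumed depths, the layered layout needs 20 requests per lookup, while direct addressing with the top three levels held in memory needs 2.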

The Power of Customization

The approach for MonadDb is inspired by techniques from high-frequency trading (HFT) [00:05:08]. In HFT, standard libraries and general data structures are avoided because customizing the data structure to the specific trading model extracts significantly better performance from the hardware [00:05:14].

By knowing exactly how the data needs to be used and stored for blockchain applications, a custom database can be implemented to achieve optimal performance [00:05:32]. This allows for massive performance gains by customizing the use of SSDs to be maximally efficient [00:06:31]. It’s crucial not to forget about software optimizations, as even an algorithm with better computational complexity might perform worse if poorly implemented [00:07:12].
