From: thepipeline_xyz

Optimizing blockchain client performance requires a deep understanding of where execution time is actually spent. Complex business logic in smart contracts is relatively cheap to execute compared to traditional applications [00:00:41]; the most expensive parts of blockchain operation are cryptographic functions (such as elliptic curve cryptography and hashing) and state access [00:00:22]. Optimizing computation itself therefore does not yield significant gains [00:00:57]. Some clients already parallelize signature recovery, a major component of transaction execution [00:01:05].
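A minimal sketch of what that parallelism looks like. Here `recover_sender` is a hypothetical stand-in (a plain hash) for real secp256k1 public-key recovery, and a thread pool stands in for whatever worker scheme a real client uses; the point is only that each recovery is independent of the others:

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib

def recover_sender(raw_tx: bytes) -> str:
    # Stand-in for real ECDSA public-key recovery (secp256k1 "ecrecover");
    # the real operation is CPU-heavy, which is what makes parallelism pay off.
    digest = hashlib.sha256(raw_tx).hexdigest()
    return "0x" + digest[:40]

def recover_all(raw_txs, workers=8):
    # Each recovery depends only on its own transaction bytes, so a block's
    # worth of transactions can be fanned out across a worker pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(recover_sender, raw_txs))
```

`pool.map` preserves input order, so the recovered senders line up with the block's transaction order.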

Database as a Bottleneck

Profiling reveals that a significant amount of time is spent on database operations [00:01:30]. A single read from an SSD can have a latency of 80 to 100 microseconds or more, which is orders of magnitude longer than executing a simple smart contract [00:01:38]. A single transaction often requires multiple sequential reads, such as reading the sender’s account for balance, the destination account, proxy accounts, and storage slots for data like ERC20 balances or Uniswap data [00:02:02]. If this data is not cached in main memory, these sequential reads accumulate, leading to prolonged transaction execution times [00:02:30].
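Back-of-the-envelope arithmetic makes the point concrete. Assuming the ~100-microsecond read latency above, an illustrative five dependent reads per transaction, and ~1 microsecond of actual execution (the counts are assumptions for illustration, not measurements):

```python
def tx_latency_us(reads_per_tx: int = 5, ssd_read_us: int = 100, exec_us: int = 1) -> int:
    # Sequential, uncached reads add up: each dependent read must finish
    # before the next key is even known, so latencies sum rather than overlap.
    return reads_per_tx * ssd_read_us + exec_us
```

Five uncached reads put the transaction at roughly 500 microseconds, i.e. I/O outweighs execution time by a factor of several hundred.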

While one approach to mitigate this is to equip nodes with very large amounts of RAM to avoid disk reads entirely, significant performance optimization can still be achieved through optimized code that effectively leverages modern SSD capabilities [00:02:53].
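The RAM-heavy approach amounts to keeping hot state behind an in-memory cache so repeated reads never touch the disk. A toy sketch (the disk path and counter are hypothetical, standing in for a real storage backend):

```python
from functools import lru_cache

DISK_READS = {"count": 0}

def read_slot_from_disk(key: str) -> bytes:
    # Hypothetical slow path: each call models one ~100 microsecond SSD read.
    DISK_READS["count"] += 1
    return key.encode()

@lru_cache(maxsize=1_000_000)
def read_slot(key: str) -> bytes:
    # Hot state lives in RAM after the first access; only cold keys
    # ever reach the disk.
    return read_slot_from_disk(key)
```

The catch, as the talk notes, is that this only helps for state that fits in memory; the code path for cold reads still has to be fast.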

Challenges with Standard Databases

Many blockchain clients currently use standard databases such as Pebble or RocksDB [00:00:07], which fall into two broad families: B+-tree databases (e.g., LMDB, MDBX) and LSM-tree databases (e.g., LevelDB, RocksDB) [00:04:30]. However, these general-purpose databases are not designed for the specific needs of blockchain clients, which leads to suboptimal performance [00:04:50].

The issues include:

  • Layered data structures: Some implementations, like Go Ethereum (Geth), embed one data structure inside another on disk, leading to expensive double traversals for every request [00:04:06].
  • General-purpose design: Standard databases are built to perform well on average across a wide range of applications [00:05:02], so they are not optimized for the specific data access patterns of blockchain clients. For example, a standard database might issue 20 requests to the hardware to look up basic information that is not in cache, whereas a highly optimized database such as the one used by Monad might need only one or two [00:07:39].
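The double-traversal problem can be illustrated with a toy sketch, using Python dicts as stand-ins for on-disk structures (the layout and names are illustrative, not Geth's actual schema):

```python
import hashlib

# Nested layout (modeled loosely on a structure-inside-a-structure design):
# finding a storage slot means traversing to the account first, then
# traversing again inside its storage.
NESTED = {"0xabc": {"storage": {"0x01": b"\x2a"}}}

# Flat layout: hash (address, slot) into a single key so one lookup suffices.
FLAT = {}

def flat_key(address: str, slot: str) -> bytes:
    return hashlib.sha256(f"{address}:{slot}".encode()).digest()

FLAT[flat_key("0xabc", "0x01")] = b"\x2a"

def read_nested(address: str, slot: str) -> bytes:
    return NESTED[address]["storage"][slot]   # two dependent traversals

def read_flat(address: str, slot: str) -> bytes:
    return FLAT[flat_key(address, slot)]      # one traversal
```

In memory the difference is negligible; on disk, where every traversal step can mean another device request, collapsing two traversals into one matters.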

Custom Database Solutions

To achieve high-performance blockchains, a custom state database is crucial [00:00:17]. The approach borrows techniques from high-frequency trading (HFT), where standard libraries and data structures are avoided in favor of highly customized ones [00:05:10]. By tailoring the data structure to the specific usage model, significantly more performance can be extracted from the hardware [00:05:20].

This means:

  • Knowing the data usage: Developers know exactly how blockchain data is used and how it should be stored [00:05:32].
  • Leveraging modern SSDs: Modern SSDs offer impressive capabilities, with some reaching 500,000 IOPS (input/output operations per second) [00:06:53]. This raw performance is often wasted by inefficient general-purpose database implementations [00:03:44]; database optimizations designed specifically for blockchain workloads can unlock the hardware's full potential [00:07:04].
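Combining the 500,000 IOPS figure with the request counts cited earlier shows how much throughput the database layer can give away. It is a simple division, assuming lookups are IOPS-bound rather than bandwidth- or CPU-bound:

```python
def lookups_per_second(device_iops: int, requests_per_lookup: int) -> int:
    # Every device request a lookup costs divides the SSD's raw IOPS
    # budget by that factor.
    return device_iops // requests_per_lookup
```

At 20 device requests per lookup, a 500,000-IOPS SSD serves about 25,000 state lookups per second; at one or two requests, the same drive serves 250,000 to 500,000.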

By implementing custom databases designed specifically for blockchain data, clients can drastically reduce the number of disk requests needed for basic lookups, maximizing the throughput of the underlying hardware and addressing the growth and scalability challenges facing blockchain ecosystems [00:07:47].