Custom databases in blockchain technology

From: thepipeline_xyz
The Monad DB is a custom-built state database crucial for achieving high performance in blockchain technology, particularly for the Ethereum Virtual Machine (EVM) [00:04:51]. Its development stems from identifying significant bottlenecks in existing blockchain architectures.

The Bottleneck of State Access

While parallel EVM execution is a significant narrative in crypto, Monad discovered early on that merely implementing a parallel EVM algorithm did not lead to substantial performance improvements [00:03:18]. The primary bottleneck was identified as state access [00:03:32].

In Ethereum, state (account information and storage slot data) is stored on SSDs [00:03:41], and reading from an SSD is costly [00:03:48]. Crucially, the databases used by Ethereum and other EVM-compatible blockchains, such as PebbleDB or RocksDB, do not natively support parallel access [00:03:54]. So even if multiple virtual machines run in parallel, they still queue up on database access, and execution effectively proceeds single file [00:04:05].
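A toy model can make the "single-file" effect concrete. The sketch below is a simplification, not how PebbleDB or RocksDB work internally. It assumes every state read passes through one global lock. The names `SingleFileStateDB` and `run_parallel_vms` and the 10 ms read latency are hypothetical, and the latency is exaggerated so the effect is visible.

```python
import threading
import time

READ_LATENCY_S = 0.01  # hypothetical per-read latency, exaggerated for visibility


class SingleFileStateDB:
    """Toy state store in which every read holds one global lock (a modeling assumption)."""

    def __init__(self, state: dict):
        self._state = dict(state)
        self._lock = threading.Lock()

    def read(self, key):
        with self._lock:                # every "VM" thread queues here, one at a time
            time.sleep(READ_LATENCY_S)  # simulated storage round trip
            return self._state.get(key)


def run_parallel_vms(db: SingleFileStateDB, keys: list) -> tuple:
    """Start one thread per transaction; each performs a single state read.
    Business logic is omitted. Returns the values read and the wall-clock time taken."""
    results = {}

    def vm(key):
        results[key] = db.read(key)

    threads = [threading.Thread(target=vm, args=(k,)) for k in keys]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    keys = [f"account_{i}" for i in range(8)]
    db = SingleFileStateDB({k: i for i, k in enumerate(keys)})
    results, elapsed = run_parallel_vms(db, keys)
    # Eight threads, yet elapsed is at least 8 x READ_LATENCY_S because reads never overlap.
    print(f"{len(results)} reads in {elapsed:.3f}s")
```

Even though eight threads run concurrently, total time is bounded below by eight read latencies, because in this model the reads themselves can never overlap.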

The most expensive parts of blockchain operations are:

Cryptography functions (e.g., elliptic curve cryptography, hashing) [00:09:11].
State access [00:09:21].

The business logic within most smart contracts is relatively inexpensive to execute compared to state access [00:09:21]. Parallelizing computation alone therefore yields minimal gains [00:09:44].
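Amdahl's law gives a rough way to see why. In the sketch below, the split between parallelizable business logic and serialized state access (10% vs. 90% of per-transaction time) and the 16-worker count are hypothetical numbers chosen for illustration. They are not measurements from Monad.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: overall speedup when only `parallel_fraction` of the
    per-transaction time can be spread across `workers` parallel lanes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical: business logic is 10% of the time and parallelizes;
    # the remaining 90% (serialized state access) does not.
    print(f"{amdahl_speedup(0.10, 16):.2f}x")  # ≈ 1.10x
    # Same worker count, assuming state access could also run in parallel.
    print(f"{amdahl_speedup(0.90, 16):.2f}x")  # ≈ 6.40x
```

Under these assumptions, even unlimited workers cannot push the first case past about 1.11x (1 / 0.9), because the serialized fraction dominates.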

Limitations of General-Purpose Databases

Standard databases such as LMDB and MDBX (B+ trees) and RocksDB and LevelDB (LSM trees) are general-purpose solutions [00:13:14]. They are well suited to storing and searching general data, but they are not optimized for the specific access patterns and performance requirements of a blockchain [00:13:34]. Their generic design makes them "performant on average" rather than tuned for highly specialized, latency-sensitive workloads like blockchain state management [00:13:50].

For instance, Go-Ethereum (Geth) embeds one data structure inside another on disk: its Merkle Patricia Trie is stored within a general-purpose key-value store. Every request must therefore traverse two data structures, which makes it a very expensive operation [00:12:47].
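The cost of nesting compounds: each step through the outer structure (a trie) is a full lookup in the inner one (a key-value store). The sketch below is a hypothetical model, not Geth's actual schema. Trie nodes are keyed by path prefix rather than by hash. The LSM-style store probes every level with no Bloom filters or block cache, so the read counts are illustrative only. `ToyLSMStore`, `build_toy_trie` and `trie_lookup` are invented names.

```python
class ToyLSMStore:
    """Hypothetical LSM-like store: get() probes levels newest-to-oldest and
    counts one simulated disk read per level probed (no Bloom filters, no cache)."""

    def __init__(self, num_levels: int):
        self.levels = [dict() for _ in range(num_levels)]
        self.disk_reads = 0

    def put_oldest(self, key: str, value) -> None:
        # Assume long-lived state has been compacted into the deepest level.
        self.levels[-1][key] = value

    def get(self, key: str):
        for level in self.levels:
            self.disk_reads += 1
            if key in level:
                return level[key]
        raise KeyError(key)


def build_toy_trie(store: ToyLSMStore, path: str, leaf_value) -> None:
    """Store one root-to-leaf path; each node is a KV entry keyed by its prefix."""
    for i in range(len(path)):
        store.put_oldest(path[:i], f"branch<{path[:i]}>")
    store.put_oldest(path, leaf_value)


def trie_lookup(store: ToyLSMStore, path: str):
    """Walk the trie root to leaf; every trie node costs a full KV lookup."""
    node = None
    for i in range(len(path) + 1):
        node = store.get(path[:i])  # prefix "" is the root
    return node


if __name__ == "__main__":
    for levels in (4, 1):
        store = ToyLSMStore(num_levels=levels)
        build_toy_trie(store, "a3f9", {"balance": 42})
        value = trie_lookup(store, "a3f9")
        # 5 trie nodes x `levels` reads per KV get -> 20, then 5
        print(levels, value, store.disk_reads)
```

The point is the product structure: trie depth times reads per key-value lookup. Real engines shrink the inner factor with caching and filters, but nesting still means each trie step pays the inner store's lookup cost.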

The Monad DB Approach

The realization of the state access bottleneck led Monad to develop a custom database [00:04:26] [00:14:29]. This approach draws inspiration from high-frequency trading (HFT), where specialized systems are built from scratch, optimizing every component to shave off latency [00:05:25] [00:14:05]. In HFT, standard libraries are avoided because customizing data structures to the specific trading model extracts significantly better performance from the hardware [00:13:59].

Applying this philosophy to blockchain, Monad recognized that understanding the exact data usage and storage patterns allows for building a database that leverages hardware capabilities to the maximum [00:14:16].

SSD Performance and Optimization

Modern SSDs are highly performant devices, capable of hundreds of thousands of I/O operations per second (IOPS) [00:12:09] [00:15:43]. However, many current blockchain clients using general-purpose databases fail to fully utilize this potential, resulting in poor performance [00:12:39].
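Little's law offers one back-of-the-envelope way to see how a client can leave most of that capacity unused: sustained throughput equals the number of requests in flight divided by per-request latency. The source does not give this derivation. The 100 µs latency and 500,000 IOPS rating below are hypothetical round numbers.

```python
import math


def achievable_iops(requests_in_flight: int, latency_us: int) -> float:
    """Little's law at steady state: throughput = concurrency / latency."""
    return requests_in_flight * 1_000_000 / latency_us


def requests_in_flight_needed(target_iops: int, latency_us: int) -> int:
    """Concurrency required to sustain `target_iops` at the given latency."""
    return math.ceil(target_iops * latency_us / 1_000_000)


if __name__ == "__main__":
    LATENCY_US = 100       # hypothetical per-read latency
    RATED_IOPS = 500_000   # hypothetical device rating
    print(achievable_iops(1, LATENCY_US))                     # 10000.0
    print(requests_in_flight_needed(RATED_IOPS, LATENCY_US))  # 50
```

Under these assumptions, a client that waits for each read before issuing the next keeps one request in flight and tops out around 10,000 IOPS, a small fraction of the device's rating.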

A key aspect of Monad DB is its ability to drastically reduce the number of requests to the hardware. For instance, Monad DB might require only one or two requests to look up an account, whereas other data structures might make 20 requests for the same information if it’s not cached [00:16:46]. This “super optimization” extracts every last bit of performance from the SSD [00:16:59].
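The effect of request count on throughput is simple division. If the device's IOPS budget is the only constraint, uncached lookups per second equal that budget divided by the requests each lookup needs. The 500,000 IOPS figure is hypothetical. The 2 and 20 requests-per-lookup figures come from the comparison above.

```python
def max_uncached_lookups_per_s(device_iops: int, reads_per_lookup: int) -> float:
    """Upper bound on uncached account lookups per second when device IOPS is the
    only limit (ignores CPU, caching, and queueing effects)."""
    return device_iops / reads_per_lookup


if __name__ == "__main__":
    DEVICE_IOPS = 500_000  # hypothetical
    for reads in (2, 20):
        print(reads, max_uncached_lookups_per_s(DEVICE_IOPS, reads))
```

On the same hardware, cutting requests per lookup from 20 to 2 raises this ceiling tenfold.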

Avoiding Shortcuts

The “no shortcuts” philosophy adopted by Monad emphasizes deep optimization rather than simply throwing more expensive hardware at the problem [00:20:58]. A shortcut could be to require a very large amount of RAM to cache all state and avoid disk reads [00:11:37]. However, this is not sustainable for long-term growth and decentralization:

Cost: RAM is two orders of magnitude more expensive than SSDs (e.g., 2 TB of high-quality NVMe SSD costs about $200, while 2 TB of RAM costs around $20,000) [00:50:32].
Scalability: State will grow over time, and that growth is more feasible to accommodate with affordable SSDs than with ever-increasing RAM requirements [00:51:07].
Decentralization: High hardware requirements hinder participation from regular users, centralizing the network [00:21:17].
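The cost gap scales with state size. The sketch below uses only the prices quoted above, scaled linearly. The state sizes in the loop are hypothetical, and real prices change over time.

```python
import math

# Prices quoted in the text [00:50:32]; rough, time-sensitive figures.
SSD_USD_PER_2TB = 200
RAM_USD_PER_2TB = 20_000


def storage_cost_usd(state_tb: float, price_per_2tb: float) -> float:
    """Cost to hold `state_tb` of state at the quoted per-2-TB price (linear model)."""
    return state_tb * price_per_2tb / 2


if __name__ == "__main__":
    ratio = RAM_USD_PER_2TB / SSD_USD_PER_2TB
    print(ratio, math.log10(ratio))  # 100.0 2.0 -> two orders of magnitude
    for tb in (1, 2, 4):  # hypothetical state sizes
        print(tb, storage_cost_usd(tb, SSD_USD_PER_2TB), storage_cost_usd(tb, RAM_USD_PER_2TB))
```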

Instead of relying on excessive RAM, Monad focuses on:

Building systems from the ground up to be performant [00:07:07].
Deeply understanding the hardware and making informed engineering decisions [00:23:21].
Conducting extensive quantitative experimentation with different database types to identify and address engineering issues [00:23:25].
Focusing on micro-optimizations (e.g., optimizing Translation Lookaside Buffer (TLB) usage for a 5% gain) that collectively lead to significant speedups [00:30:09].
Questioning common assumptions (e.g., whether access lists are truly beneficial for performance) and validating them through measurement and experimentation [00:31:44].
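Small wins add up because they compound. If each optimization speeds up the whole pipeline by a fixed fraction relative to the already-optimized baseline, the gains multiply rather than add. The ten 5% gains below are a hypothetical scenario; only the single 5% TLB figure comes from the source. Real optimizations often overlap, so treat this as an idealized sketch.

```python
import math
from typing import Sequence


def combined_speedup(gains: Sequence[float]) -> float:
    """Compound independent fractional speedups multiplicatively;
    e.g. 0.05 means a 1.05x speedup on top of the previous baseline."""
    return math.prod(1.0 + g for g in gains)


if __name__ == "__main__":
    # Hypothetical: ten independent optimizations, each worth 5%.
    print(f"{combined_speedup([0.05] * 10):.2f}x")  # ≈ 1.63x
```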

This comprehensive approach, focusing on a custom state database and fine-grained optimizations, is essential for unlocking true performance gains and enabling the EVM to scale [00:33:03].
