A concept of default MerkleDb instance seems redundant #17002

artemananiev · 2024-12-09T22:52:00Z

Every virtual map has a data source, and all data sources for the virtual maps in the state are stored in a single folder on disk. This folder is a MerkleDb instance. My claim is that this MerkleDb instance concept is not needed: it adds complexity without providing any benefit. Each data source can reside in its own temp folder, as it did previously.

There are several good reasons for this:

Once all virtual maps are combined into a single mega map, there will be just one map and one data source
Taking a snapshot. Data source snapshots are taken one by one anyway, even though they belong to a single MerkleDb instance
Restoring a snapshot. When a data source snapshot is restored, it is restored into the default MerkleDb instance. On real consensus nodes this is not an issue, since there is only one instance in the process, created at startup. In tests, however, there may be multiple instances, and tests often conflict with each other. This is why the MerkleDb.resetDefaultInstancePath() workaround exists, but it's ugly
Data source copies. During reconnect, a data source copy is created for every virtual map, on both the teacher and learner sides. These copies are put into the same default MerkleDb instance, even though they are short-lived and should be removed after reconnect. Copies have the same table names as the corresponding data sources, so each table gets a unique ID to distinguish between them. This ID tracking is messy as well
A MerkleDb database has its own metadata (a set of table IDs and per-table metadata), but this metadata isn't used for any purpose, other than perhaps some integrity checks. It is a constant source of pain, though: all those "table already exists" and "table doesn't exist" exceptions in tests are very hard to debug

This ticket proposes a much simpler approach, somewhat similar to what was used in the JasperDB days:

Each data source has a dedicated folder on disk to store its data files
These folders are all in temp file space
Data source copies just create new independent temp folders
There must be a way to snapshot a data source to a specific folder (snapshot folder / table name)
To restore from a snapshot, yet another temp folder is used
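The proposed layout can be sketched in Java as follows. This is a minimal, hypothetical illustration, not the MerkleDb API: the class TempFolderDataSource and its methods create, copy, snapshot and restore are invented names, and the data files are treated as opaque files copied with java.nio.file. The sketch shows the core of the proposal: every data source, every copy and every restored snapshot owns a private temp folder, so there is no shared instance, no table registry and no table IDs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: a data source that owns one private temp folder. */
final class TempFolderDataSource implements AutoCloseable {
    private final String tableName;
    private final Path dir; // dedicated folder for this data source's files

    private TempFolderDataSource(String tableName, Path dir) {
        this.tableName = tableName;
        this.dir = dir;
    }

    /** Creates a new, empty data source in its own temp folder. */
    static TempFolderDataSource create(String tableName) throws IOException {
        return new TempFolderDataSource(tableName, Files.createTempDirectory(tableName + "-"));
    }

    Path dir() { return dir; }

    /** A copy (e.g. for reconnect) is just another independent temp folder. */
    TempFolderDataSource copy() throws IOException {
        TempFolderDataSource c = create(tableName);
        copyTree(dir, c.dir);
        return c;
    }

    /** Snapshots this data source into snapshotDir/tableName. */
    void snapshot(Path snapshotDir) throws IOException {
        copyTree(dir, snapshotDir.resolve(tableName));
    }

    /** Restores from snapshotDir/tableName into yet another temp folder. */
    static TempFolderDataSource restore(Path snapshotDir, String tableName) throws IOException {
        TempFolderDataSource ds = create(tableName);
        copyTree(snapshotDir.resolve(tableName), ds.dir);
        return ds;
    }

    /** Closing deletes only this folder; there is no shared state to clean up. */
    @Override
    public void close() throws IOException {
        deleteTree(dir);
    }

    // Recursively copies 'from' into 'to'; Files.walk visits parents before children.
    private static void copyTree(Path from, Path to) throws IOException {
        try (Stream<Path> paths = Files.walk(from)) {
            for (Path src : (Iterable<Path>) paths::iterator) {
                Path dst = to.resolve(from.relativize(src));
                if (Files.isDirectory(src)) {
                    Files.createDirectories(dst);
                } else {
                    Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    // Deletes a folder tree, children first (reverse lexicographic path order).
    private static void deleteTree(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            List<Path> all = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : all) {
                Files.delete(p);
            }
        }
    }
}
```

In this sketch, two tests can each create, copy, snapshot and restore data sources with the same table name without interfering, because nothing is shared between folders; no equivalent of MerkleDb.resetDefaultInstancePath() is needed.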


artemananiev added the Tech Debt Reduced, Platform Virtual Map, Platform Data Structures and Platform labels Dec 9, 2024

artemananiev assigned thenswan Dec 9, 2024
