Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: remove legacy orphans background #863

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions batch.go
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
package iavl

import (
"sync"

dbm "github.com/cosmos/cosmos-db"
)

Expand All @@ -11,6 +13,7 @@ type BatchWithFlusher struct {
db dbm.DB // This is only used to create new batch
batch dbm.Batch // Batched writing buffer.

mtx sync.Mutex
flushThreshold int // The threshold to flush the batch to disk.
}

Expand Down Expand Up @@ -46,6 +49,9 @@ func (b *BatchWithFlusher) estimateSizeAfterSetting(key []byte, value []byte) (i
// the batch is flushed to disk, cleared, and a new one is created with buffer pre-allocated to threshold.
// The addition entry is then added to the batch.
func (b *BatchWithFlusher) Set(key, value []byte) error {
b.mtx.Lock()
defer b.mtx.Unlock()

batchSizeAfter, err := b.estimateSizeAfterSetting(key, value)
if err != nil {
return err
Comment on lines 49 to 57
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [49-81]

Consider protecting Write, WriteSync, Close, and GetByteSize methods with the mutex to prevent potential race conditions.

Expand All @@ -67,6 +73,9 @@ func (b *BatchWithFlusher) Set(key, value []byte) error {
// the batch is flushed to disk, cleared, and a new one is created with buffer pre-allocated to threshold.
// The deletion entry is then added to the batch.
func (b *BatchWithFlusher) Delete(key []byte) error {
b.mtx.Lock()
defer b.mtx.Unlock()

batchSizeAfter, err := b.estimateSizeAfterSetting(key, []byte{})
if err != nil {
return err
Expand Down
36 changes: 33 additions & 3 deletions nodedb.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ import (
"strconv"
"strings"
"sync"
"time"

"cosmossdk.io/log"
dbm "github.com/cosmos/cosmos-db"
Expand Down Expand Up @@ -470,9 +471,38 @@ func (ndb *nodeDB) deleteLegacyVersions() error {
ndb.legacyLatestVersion = -1

// Delete all orphan nodes of the legacy versions
return ndb.traversePrefix(legacyOrphanKeyFormat.Key(), func(key, value []byte) error {
return ndb.batch.Delete(key)
})
go func() {
if err := ndb.deleteOrphans(); err != nil {
ndb.logger.Error("failed to clean legacy orphans", "err", err)
}
}()
Comment on lines +474 to +478
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of a goroutine to call deleteOrphans is a good approach to avoid blocking the main thread. However, error handling within the goroutine should be improved. Currently, if an error occurs, it is only logged. Consider adding a mechanism to handle these errors more robustly, such as a channel to communicate errors back to the main thread or a callback.


return nil
}

// deleteOrphans cleans all legacy orphans from the nodeDB.
func (ndb *nodeDB) deleteOrphans() error {
itr, err := dbm.IteratePrefix(ndb.db, legacyOrphanKeyFormat.Key())
if err != nil {
return err
}
defer itr.Close()

count := 0
for ; itr.Valid(); itr.Next() {
if err := ndb.batch.Delete(itr.Key()); err != nil {
return err
}

// Sleep for a while to avoid blocking the main thread i/o.
count++
if count > 1000 {
count = 0
time.Sleep(100 * time.Millisecond)
}
Comment on lines +491 to +502
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sleep mechanism in deleteOrphans is implemented to avoid blocking the main thread I/O by sleeping after every 1000 deletions. This is a simple rate-limiting strategy, but it may not be optimal. Consider using a more dynamic approach, such as monitoring the system's load or I/O wait time, to adjust the sleep duration or the number of operations between sleeps.

}

return nil
}

// DeleteVersionsFrom permanently deletes all tree versions from the given version upwards.
Expand Down
Loading