Problem Statement:
VirtualDev::alloc_blk takes a 16-bit nblks, but existing callers in homestore work with a 32-bit data size and pass a 32-bit nblks down to the vdev layer. When the data size is larger than 256MB this overflows, and the write can end up corrupted (worst case, a data mismatch).
The Meta Svc layer has some protection when the overflow happens: it cannot match the input context_sz against the size implied by nblks and hits a release assert. That still results in a crash, however.
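To make the 256MB threshold concrete, here is a minimal standalone sketch (assuming a 4 KB block size purely for illustration) of how a 32-bit block count silently narrows to 16 bits:

#include <cstdint>
#include <iostream>

int main() {
    constexpr uint32_t blk_size = 4096; // assumed 4 KB block size, for illustration only

    // 256 MiB of data needs 65536 blocks, 257 MiB needs 65792 blocks.
    for (uint64_t data_size : {256ull << 20, 257ull << 20}) {
        const uint32_t nblks_32 = static_cast<uint32_t>(data_size / blk_size);
        const uint16_t nblks_16 = static_cast<uint16_t>(nblks_32); // narrowing at the vdev boundary

        std::cout << (data_size >> 20) << " MiB -> nblks as 32-bit: " << nblks_32
                  << ", as 16-bit: " << nblks_16 << '\n';
        // Prints 65536 vs 0 for 256 MiB, 65792 vs 256 for 257 MiB.
    }
    return 0;
}

At exactly 256 MiB the 16-bit count wraps to 0 (the "SUCCESS with zero blocks" case below); above that it wraps to a value smaller than requested, which is the data-mismatch case.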
Code pointers:
repl_req_ctx::alloc_local_blks
--> auto status = data_service().alloc_blks(sisl::round_up(uint32_cast(data_size), // <<< overflow
BlkDataService::alloc_blks
--> return m_vdev->alloc_blks(nblks, hints, out_blkids); // <<< overflow
MetaBlkService::alloc_meta_blks
--> const auto ret = m_sb_vdev->alloc_blks(nblks, blk_alloc_hints{}, bids); // <<< overflow
When the overflow happens, the underlying blk allocators (both the bitmap and the append blk allocator) return SUCCESS with bids.size() equal to zero. There is no bug in the underlying blk allocator itself; the point is that there is also no assert in them to detect that the input nblks is actually 0, so they simply return SUCCESS with nothing allocated. When that happens, silent data corruption follows.
The fix needs to be done at the vdev layer.
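For illustration, a minimal sketch of the kind of guard the vdev layer could apply before the count narrows to 16 bits. blk_count_t stands in for the 16-bit type referenced above; BlkAllocStatus and its INVALID_INPUT value are placeholders rather than homestore's actual enum, and this is a sketch of the idea, not the proposed patch:

#include <cassert>
#include <cstdint>
#include <limits>

using blk_count_t = uint16_t;                                  // the 16-bit count the allocators consume
enum class BlkAllocStatus { SUCCESS, INVALID_INPUT, FAILED };  // placeholder status enum

// Sketch: validate the caller's 32-bit block count before it is narrowed,
// so an overflow can never reach the allocators as nblks == 0 or a wrapped value.
BlkAllocStatus vdev_alloc_blks_checked(uint32_t requested_nblks) {
    if (requested_nblks == 0 ||
        requested_nblks > std::numeric_limits<blk_count_t>::max()) {
        assert(false && "nblks is zero or exceeds 16-bit blk_count_t at vdev layer");
        return BlkAllocStatus::INVALID_INPUT; // fail loudly instead of "succeeding" with no blocks
    }
    const blk_count_t nblks = static_cast<blk_count_t>(requested_nblks); // now provably lossless
    // ... forward nblks to the underlying blk allocator exactly as today ...
    (void)nblks;
    return BlkAllocStatus::SUCCESS;
}

An alternative along the same lines would be for the vdev layer to split an oversized request into multiple allocations instead of rejecting it; the sketch above only covers the detection part.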