nvidia_p2p_get_pages(): Fix double-free in register-callback error path #557
A double-free in the rm_p2p_register_callback() error path of nv_p2p_get_pages() corrupts memory and leads to a kernel panic.
Fix this by adding a separate goto label for this error path so that it skips freeing the already-freed memory.
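The shape of the fix can be illustrated with a minimal userspace sketch. This is not the driver code: struct page_table, table_free(), register_callback() and get_pages() are hypothetical stand-ins, and the sketch assumes that when registration fails, the object has already been released by the time control returns to the caller. Under that assumption, the registration error path jumps to its own label, placed after the cleanup that the other error paths still need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the object the real function allocates. */
struct page_table {
    int *entries;
};

/* Counts releases so the behaviour of each path can be checked. */
static int free_count;

static void table_free(struct page_table *t)
{
    free(t->entries);
    free(t);
    free_count++;
}

/*
 * Hypothetical registration step. Assumption for this sketch: on
 * failure the table has already been released before returning.
 */
static int register_callback(struct page_table *t, bool fail)
{
    if (fail) {
        table_free(t);
        return -1;
    }
    return 0;
}

static int get_pages(bool fail_setup, bool fail_register,
                     struct page_table **out)
{
    int rc = -1;
    struct page_table *t;

    assert(out != NULL);

    t = calloc(1, sizeof(*t));
    if (!t)
        return -1;

    t->entries = calloc(4, sizeof(*t->entries));
    if (!t->entries || fail_setup)
        goto failed;            /* table is still ours: free it */

    rc = register_callback(t, fail_register);
    if (rc != 0)
        goto register_failed;   /* table already freed: skip cleanup */

    *out = t;
    return 0;

failed:
    table_free(t);
register_failed:
    return rc;
}
```

With a single shared label, the registration failure would reach table_free() a second time. Placing register_failed after the cleanup keeps the ordinary error paths unchanged while the registration path returns without touching the freed object.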
The double-free can be reproduced by calling nvidia_p2p_get_pages() on one CPU while simultaneously freeing, on another CPU, the GPU virtual address range passed to nvidia_p2p_get_pages(). Reproduction is timing-dependent and may take multiple attempts.
Booting with the 'slub_debug=FZ' kernel parameter shows the double-free:
[ 239.115091] =============================================================================
[ 239.124659] BUG kmalloc-16 (Tainted: G OE ): Object already free
[ 239.133011] -----------------------------------------------------------------------------
[ 239.144491] Slab 0xfffffa8bc4434140 objects=85 used=82 fp=0xffff9a3dd0d05910 flags=0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff)
[ 239.158997] Object 0xffff9a3dd0d05670 @offset=1648 fp=0x0000000000000000
[ 239.168766] Redzone ffff9a3dd0d05660: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[ 239.179633] Object ffff9a3dd0d05670: 10 00 00 00 00 00 00 00 e5 04 3f 13 96 18 8e 47 ..........?....G
[ 239.190641] Redzone ffff9a3dd0d05680: bb bb bb bb bb bb bb bb ........
[ 239.200739] Padding ffff9a3dd0d05688: 84 80 0e 00 00 00 00 00 ........
[ 239.210938] CPU: 0 PID: 3150 Comm: hfi-sdma-test Kdump: loaded Tainted: G OE 6.5.0-rc1+ #1
[ 239.221911] Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.1029.090220201031 09/02/2020
[ 239.233948] Call Trace:
[ 239.236992]
[ 239.239608] dump_stack_lvl+0x33/0x50
[ 239.244010] object_err+0x3a/0x80
[ 239.248014] free_debug_processing+0x265/0x360
[ 239.253392] ? nv_p2p_get_pages+0x163/0x590 [nvidia]
[ 239.259399] free_to_partial_list+0x80/0x280
[ 239.264478] ? nv_p2p_get_pages+0x163/0x590 [nvidia]
[ 239.270426] nv_p2p_get_pages+0x163/0x590 [nvidia]
[ 239.276303] ? __pfx_remove_nvidia_pages+0x10/0x10 [hfi1]
[ 239.282692] nvidia_p2p_get_pages+0x25/0x40 [nvidia]
[ 239.288601] ? __pfx_remove_nvidia_pages+0x10/0x10 [hfi1]
...
[ 239.498990]
[ 239.501662] Disabling lock debugging due to kernel taint
[ 239.507828] FIX kmalloc-16: Object at 0xffff9a3dd0d05670 not freed