
sgx_destroy_enclave blocks app in kernel #90

Open
jovanbulck opened this issue Oct 20, 2024 · 2 comments

@jovanbulck
Owner

Example dmesg:

[  484.355618] INFO: task app:8986 blocked for more than 120 seconds.
[  484.355643]       Tainted: G           OE     5.15.0-124-generic #134-Ubuntu
[  484.355665] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.355688] task:app             state:D stack:    0 pid: 8986 ppid:  8985 flags:0x00004002
[  484.355691] Call Trace:
[  484.355692]  <TASK>
[  484.355694]  __schedule+0x24e/0x590
[  484.355698]  schedule+0x69/0x110
[  484.355699]  schedule_timeout+0x105/0x140
[  484.355701]  ? __queue_delayed_work+0x5c/0xa0
[  484.355703]  ? queue_delayed_work_on+0x3d/0x60
[  484.355705]  __wait_for_common+0xab/0x150
[  484.355706]  ? usleep_range_state+0x90/0x90
[  484.355708]  wait_for_completion+0x24/0x30
[  484.355709]  __synchronize_srcu.part.0+0x7f/0xf0
[  484.355712]  ? __bpf_trace_rcu_stall_warning+0x10/0x10
[  484.355714]  synchronize_srcu+0xfb/0x120
[  484.355716]  mmu_notifier_unregister+0xbc/0xf0
[  484.355719]  sgx_release+0x94/0x140
[  484.355722]  __fput+0x9c/0x280
[  484.355723]  ____fput+0xe/0x20
[  484.355725]  task_work_run+0x6a/0xb0
[  484.355726]  exit_to_user_mode_loop+0x157/0x160
[  484.355729]  exit_to_user_mode_prepare+0xa0/0xb0
[  484.355731]  syscall_exit_to_user_mode+0x27/0x50
[  484.355733]  ? x64_sys_call+0x1e07/0x1fa0
[  484.355736]  do_syscall_64+0x63/0xb0
[  484.355738]  ? exit_to_user_mode_prepare+0x37/0xb0
[  484.355740]  ? syscall_exit_to_user_mode+0x2c/0x50
[  484.355741]  ? x64_sys_call+0x1de6/0x1fa0
[  484.355743]  ? do_syscall_64+0x63/0xb0
[  484.355744]  ? __x64_sys_openat+0x55/0x90
[  484.355746]  ? exit_to_user_mode_prepare+0x37/0xb0
[  484.355748]  ? syscall_exit_to_user_mode+0x2c/0x50
[  484.355750]  ? x64_sys_call+0x1a55/0x1fa0
[  484.355752]  ? do_syscall_64+0x63/0xb0
[  484.355753]  ? x64_sys_call+0x1e3e/0x1fa0
[  484.355755]  ? do_syscall_64+0x63/0xb0
[  484.355755]  ? clear_bhb_loop+0x45/0xa0
[  484.355758]  ? clear_bhb_loop+0x45/0xa0
[  484.355760]  ? clear_bhb_loop+0x45/0xa0
[  484.355762]  ? clear_bhb_loop+0x45/0xa0
[  484.355764]  ? clear_bhb_loop+0x45/0xa0
[  484.355766]  entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[  484.355768] RIP: 0033:0x7fa070ccba7b
[  484.355770] RSP: 002b:00007ffd1025fe18 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
[  484.355772] RAX: 0000000000000000 RBX: 00005580d4c8e8c0 RCX: 00007fa070ccba7b
[  484.355773] RDX: 0000000000000000 RSI: 0000000000200000 RDI: 00007fa070400000
[  484.355774] RBP: 00007ffd1025fe94 R08: 00005580d4c8e8c0 R09: 0000000000000000
[  484.355775] R10: 0000000000000000 R11: 0000000000000206 R12: 00007fa070bac1c0
[  484.355775] R13: 00007fa070bac188 R14: 00007fa070bac188 R15: 00007fa070bac200
[  484.355777]  </TASK>

The app is stuck in uninterruptible sleep (D state), and a reboot is the only remedy.

After some digging, it seems this is caused by an explicit call to sgx_destroy_enclave before process exit.

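From the call trace above, the hang occurs in sgx_release(), which is called from __fput() when the enclave file is released. sgx_release() calls mmu_notifier_unregister(), which in turn waits in synchronize_srcu() for an SRCU grace period that never completes.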

Possible hypothesis:

  • the kernel creates virtual memory areas (VMAs) for the enclave memory
  • sgx-step/app creates PTE/PMD pointers to the physical memory backing the enclave VMAs (via /dev/mem)
  • upon destroying the enclave VMAs, the kernel checks reference counters to see whether anyone still holds pointers
  • the kernel then waits indefinitely for sgx-step/app to release its pointers to the physical memory
@jovanbulck jovanbulck added the bug label Oct 20, 2024
@heavyimage
Contributor

heavyimage commented Oct 20, 2024

The problem occurs on a Comet Lake i9-10900K:

$ uname -srvp
Linux 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64

@jovanbulck
Owner Author

FWIW: I think what may go wrong here is this. RCU calls schedule_timeout(), so the kernel arms the APIC TSC-deadline timer. However, SGX-Step still has the APIC timer configured in one-shot mode, so the timer never fires and the RCU synchronization somehow blocks.
