Potential issue with timeout #34

gterzian · 2019-07-28T16:31:55Z

In the recv_timeout at

Line 226 in 35559ae

if let Ok(msg) = crate::recv_timeout(&self.receiver, TIMEOUT) {

Since it's called as part of perform_updates on the main-thread.

I think it might be possible for script to hijack the main-thread and make it unresponsive to user input in the XR device, since if script keeps sending messages within 5ms(for example using requestAnimationFrame every 5 ms, or even less since the main-thread will be a bit busy in between handling each message), the main-thread will be stuck there and I don't think it could then handle new input. In the meantime the script could keep running and the user wouldn't be able to do anything about it.

For example in the Hololens code, the loop would get stuck at https://github.com/servo/servo/blob/b52bfbe68aed9b0abb39a6e994c5dad37cf7d33c/support/hololens/ServoApp/BrowserPage.cpp#L168

cc @paulrouget

The text was updated successfully, but these errors were encountered:

gterzian · 2019-07-29T03:54:04Z

Ah no it's not true in all cases, since cancelling the raf callback still results in sending a SessionMsg::RenderAnimationFrame, which increments the timestamp and would break the main-thread out of the while loop.

https://github.com/servo/servo/blob/89bade45b5d33b2f9df20888b00fd95e1c08191d/components/script/dom/xrsession.rs#L227

So if script were to send a raf every 5 ms, and immediately cancel the callback, you'd still break out of the loop when the raf callback, cancelled or not, is executed and a message is sent back to the main-thread.

However, that requires the callback to execute, which means script has to run the task enqueued at https://github.com/servo/servo/blob/89bade45b5d33b2f9df20888b00fd95e1c08191d/components/script/dom/xrsession.rs#L310

So if script were to just start looping in a task, and send a raf every 5 ms, it could keep the main-thead looping and prevent the raf callback from executing by remaining in a long-running task.

However it would be obvious a script was long-running, so the UA could actually stop it.

So we can still say that while script is running inside the same task, and that task sends a raf every 5 ms, Servo would remain unresponsive to user input, since the main-thread would be stuck in a loop handling raf requests.

The "bad scenario" would look something like:

While the "main script" would stay inside a long-running task, sending raf every 5ms keeping the main-thread looping, it could have already spin-up a dedicated worker and thereby retaining access to a normally running event-loop, which means it could use fetch and other async apis from the worker.

All the while, the user is stuck in a frozen XR session, and clicking "stop" doesn't do anything.

If the constellation were to send a "ask the user to kill a script" message to the main-thread(which we currently don't have I think), that message wouldn't be handled since the main-thread is stuck in a raf loop. So the constellation would have to decide to kill the script on it's own, without confirmation from the user.

Now if the constellation blocks on the response from the "ask the user to kill a script" message, it would actually deadlock, all the while script can keep running until the user takes off the headset and manually turns off the machine or kills the all browser-related processes.

The better scenario sees the constellation suspend script first, then send a message to the main-thread asking whether to kill it, which would eventually be handled since the main-thread would have been "freed" from the raf loop that was kept going by script.

asajeffrey · 2019-07-29T21:56:38Z

In order to perform this attack, script would have to create an rAF callback that never ended (in order to block the RenderAnimationFrame message) but in that case script wouldn't send another RequestAnimationFrame message, and the timeout would expire.

That said, the code is a bit brittle, it might be better to use recv_deadline rather than recv_timeout.

gterzian · 2019-07-30T04:52:09Z

In order to perform this attack, script would have to create an rAF callback that never ended, but in that case script wouldn't send another RequestAnimationFrame message, and the timeout would expire.

The way I read RequestAnimationFrame

https://github.com/servo/servo/blob/9043f247d9b031ed285e880e4b90aa523d4a63ae/components/script/dom/xrsession.rs#L287

is that it doesn't prevent the currently running task in script from making an arbitrary number of calls, and each call sends a SessionMsg, via Session.request_animation_frame:

webxr/webxr-api/session.rs

Line 107 in fcbdddb

    
           pub fn request_animation_frame(&mut self, dest: Sender<(HighResTimeStamp, Frame)>) {

So the attack doesn't so much require a never ending raf callback, it requires a long-running task making calls to RequestAnimationFrame every 5 ms.

Each call sends a message to the main-thread, which resets the timeout, but doesn't increment the timestamp(that requires the raf callback to execute).

The long-running task prevents raf callback from executing, since they require xr_raf_callback task enqueued by the router to executed on the loop, which isn't going to happen so long as another task is keeping the loop busy.

https://github.com/servo/servo/blob/9043f247d9b031ed285e880e4b90aa523d4a63ae/components/script/dom/xrsession.rs#L310

This could be solved on the script side by requiring a RequestAnimationFrame to only queue a callback if the last callback was executed, unfortunately that doesn't appear spec compliant...
https://immersive-web.github.io/webxr/#dom-xrsession-requestanimationframe

Maybe you could fix this by only sending a SessionMsg::RequestAnimationFrame when a raf callback has executed since the last call to RequestAnimationFrame?

You need only one raf callback to executed all currently enqueued callbacks, so you need only to send one SessionMsg::RequestAnimationFrame per arbitrary number of raf requests, since all callbacks when execute when the response comes. So you only need to send a new message after a raf callback executes.

I've opened a PR with a potential fix servo/servo#23891

asajeffrey · 2019-07-30T13:28:31Z

We might also want to replace recv_timeout by recv_deadline as a belt-and-braces solution?

WebXr: allow for only a single raf message, until callbacks execute  See servo/webxr#34 ---  - [ ] `./mach build -d` does not report any errors - [ ] `./mach test-tidy` does not report any errors - [ ] These changes fix #___ (GitHub issue number if applicable)  - [ ] There are tests for these changes OR - [ ] These changes do not require tests because ___    --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/23891)

gterzian · 2019-07-31T04:21:51Z

We might also want to replace recv_timeout by recv_deadline as a belt-and-braces solution?

Yes I guess in crossbeam terms the below could be done. And the IPC implementation would be doing try_recv in a loop while checking the timeout?

The main point is that the timeout is not reset each time a message is received....

let deadline = after(timeout);
while true {
    select! {
        recv(r) -> msg => handle_msg(msg),
        recv(deadline) -> _ => break,
    }
}

Also I think it's worth re-highlighting that the underlying problem is that WebXR is running a "sub message-loop" inside perform_updates, which itself is one part of the "main device loop". And while the sub-loop is running, the device loop is not, which means user input is not handled(At least in Hololens).

And the danger is that since the sub-loop is driven by messages coming from script, it could find a way to keep the loop running and prevent UI events from being handled.

asajeffrey · 2019-07-31T14:40:06Z

Well yes, really we should just have one main thread loop, but I'm not sure how to achieve that without a drastic rewrite. I'm going to open a "main thread" metabug over in servo/servo to try to capture the discussion, which is currently happening in lots of different issues.

gterzian mentioned this issue Jul 29, 2019

Match event loop wakeups to expected messages servo/servo#23865

Open

gterzian mentioned this issue Jul 30, 2019

WebXr: allow for only a single raf message, until callbacks execute servo/servo#23891

Merged

5 tasks

asajeffrey mentioned this issue Jul 31, 2019

Main thread complexity servo/servo#23899

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential issue with timeout #34

Potential issue with timeout #34

gterzian commented Jul 28, 2019 •

edited

Loading

gterzian commented Jul 29, 2019 •

edited

Loading

asajeffrey commented Jul 29, 2019

gterzian commented Jul 30, 2019 •

edited

Loading

asajeffrey commented Jul 30, 2019

gterzian commented Jul 31, 2019 •

edited

Loading

asajeffrey commented Jul 31, 2019

Potential issue with timeout #34

Potential issue with timeout #34

Comments

gterzian commented Jul 28, 2019 • edited Loading

gterzian commented Jul 29, 2019 • edited Loading

asajeffrey commented Jul 29, 2019

gterzian commented Jul 30, 2019 • edited Loading

asajeffrey commented Jul 30, 2019

gterzian commented Jul 31, 2019 • edited Loading

asajeffrey commented Jul 31, 2019

gterzian commented Jul 28, 2019 •

edited

Loading

gterzian commented Jul 29, 2019 •

edited

Loading

gterzian commented Jul 30, 2019 •

edited

Loading

gterzian commented Jul 31, 2019 •

edited

Loading