Ability to slightly delay camera feed display (ARKit/ARCore only)? #44
Is there a way to predict the pose and/or timewarp the generated scene so it can be in sync with the real world?
@rcabanier I think the answer is "maybe yes?", and it could involve something like Microsoft's documented xCloud/Outatime approach (speculative rendering, plus timewarp when speculation fails, and maybe some AI in the mix to get solid results). Simpler serviceable methods might exist, and I've pondered some, but I think they'd at least involve getting a depth buffer to the client device, which is extra data I'd rather not be streaming if I can help it. It all sounds potentially doable but is quite complex compared to introducing a small camera playout delay on the client device, so I'm thankful that this far simpler possibility exists on ARCore/ARKit. With HoloLens, Magic Leap, or video-mixed AR headsets (where low latency is important as an anti-sickness measure), what you describe may be the only or best option...
If this is a common idiom for handheld devices, we should make it an option for them.
Another quick note: programmatic access to the contents of the camera output is not required for this -- only the ability to schedule when camera frames are output to the display is necessary (within reason, since the delay buffer can't be of unlimited size). Ideally, then, this wouldn't require a "This app is requesting access to your camera data: Yes/No" sort of prompt to the user. Also, while we're relatively early to this party, I do think we'll be seeing more of this kind of thing, because it's one of the very few (only?) ways of visualizing billion+ primitive datasets using handheld AR without decimation or other LOD compromises.
If you can write up an explainer with a proposed API surface, we can discuss it in the group.
I gave it a shot: I haven't been keeping super close tabs on the spec lately, but I did my best from memory. For example, it is my current understanding that baseLayer.framebuffer in some way contains the camera's framebuffer data, and that by calling glCtx.bindFramebuffer with that baseLayer.framebuffer, the camera's framebuffer content becomes your background, and you can then render on top of it. If this is a wrong assumption on my part, some aspects of the API I proposed won't make much sense, but it should still get the point across.
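(For context, here is a minimal sketch of the standard immersive-ar render loop being referred to, assuming an xrCompatible WebGL context `gl` and a `local` reference space. As clarified later in the thread, on handheld AR the camera image is composited behind this framebuffer by the browser rather than being readable from it.)

```ts
// Sketch of a standard WebXR 'immersive-ar' render loop (not the proposed API).
declare const gl: WebGLRenderingContext;       // created with { xrCompatible: true }
declare const localRefSpace: XRReferenceSpace; // from session.requestReferenceSpace('local')

function onXRFrame(time: DOMHighResTimeStamp, frame: XRFrame) {
  const session = frame.session;
  session.requestAnimationFrame(onXRFrame);

  // Bind the layer's (opaque) framebuffer; on handheld AR the browser
  // composites the camera feed behind whatever is drawn here.
  const glLayer = session.renderState.baseLayer!;
  gl.bindFramebuffer(gl.FRAMEBUFFER, glLayer.framebuffer);
  gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);

  const pose = frame.getViewerPose(localRefSpace);
  if (pose) {
    for (const view of pose.views) {
      const vp = glLayer.getViewport(view)!;
      gl.viewport(vp.x, vp.y, vp.width, vp.height);
      // drawScene(view.projectionMatrix, view.transform.inverse.matrix);
    }
  }
}
```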
You should probably add it as a required/optional feature in the requestSession call.
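(A purely illustrative sketch of that suggestion; 'camera-frame-delay' is an invented feature descriptor, not part of any spec.)

```ts
// Hypothetical: 'camera-frame-delay' is a made-up feature string, used only to
// illustrate gating the capability at session creation time.
async function startSession(): Promise<XRSession> {
  return navigator.xr!.requestSession('immersive-ar', {
    requiredFeatures: ['local'],
    optionalFeatures: ['camera-frame-delay'],
  });
}
```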
Just wanted to check in on this one. Has there been any discussion around it? It's still something that's needed, from our perspective...
I wanted to circle back on this once more, as we continue to get demand for this capability from users of our systems. For newcomers to the thread who want to catch up quickly, I've prepared an example of what this new API surface could look like here:
Oh, I misread which repo this was on; ignore the previous (now deleted) comment if you got an email. I kinda feel like this is sufficiently complex to require a separate incubation (see the process here). This seems like a pretty major feature, and it's not clear to me if there's implementor interest, which is crucial for adding something to an implemented spec. The proposed API won't work: the WebXR framebuffer does not have control over the camera feed; that's composited later. You'd at least need access to camera frames from the CV incubation, and you'd need an additional API that gave control over the composited camera frame. I know there's some interest from Google's side in exposing raw camera frames in AR, but I'm not sure if they'd be interested in adding control over which camera frames get composited. Perhaps you should open an issue on the proposals repo and see if you can get interest there?
@Manishearth To be clear, there's no theoretical reason I need actual access to the camera feed for this to work -- an opaque handle to a frame of it could be fine, coupled with the ability to control the timing of playout for a given handle. Does that make anything easier? CV seems to be more about access to the actual frame's contents, which typically also starts to involve user permissions and approvals, which I was really hoping to avoid with this. That is, ideally there'd be no need to ask the user for permission to access their camera, since the application wouldn't gain access to the actual camera frame data.
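(To make the "opaque handle" idea concrete, here is one entirely hypothetical shape such an API could take; none of these names exist in WebXR. The app never reads camera pixels -- it only tells the UA when a given frame may be composited.)

```ts
// Entirely hypothetical API surface, sketched for discussion only.
interface XRCameraFrameHandle {
  // Capture timestamp of the (opaque) camera frame; no pixel access is exposed.
  readonly captureTime: DOMHighResTimeStamp;
}

interface XRCameraPlayoutScheduler {
  // Maximum delay the UA is willing to buffer (the delay buffer is finite).
  readonly maxDelayMs: number;
  // Ask the UA to composite this frame no earlier than `displayTime`.
  scheduleDisplay(handle: XRCameraFrameHandle, displayTime: DOMHighResTimeStamp): void;
}

// Hold each camera frame back by the measured server round trip, clamped to
// whatever the UA allows.
function onCameraFrame(
  scheduler: XRCameraPlayoutScheduler,
  handle: XRCameraFrameHandle,
  roundTripMs: number,
): void {
  const delayMs = Math.min(roundTripMs, scheduler.maxDelayMs);
  scheduler.scheduleDisplay(handle, handle.captureTime + delayMs);
}
```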
@AndrewJDR Right, but then you're talking about some new and strange GL capabilities, making it even less likely to belong in an existing spec. I mention CV because they're looking at similar capabilities, but yes, it's not the same thing.
@Manishearth Okay, thanks. I think I'll try to formulate something for the proposals repo. I could probably be an implementor for Chromium if that would help bolster the case for it.
@AndrewJDR You misunderstand: when I say implementor I don't mean a particular person willing to write the code, I mean a particular browser willing to ship it -- it would be helpful (but not a prerequisite) if you can convince a browser to work with you and agree to ship it.
@Manishearth Regarding implementor: ah, understood. I think I'll wait until we have our non-browser-based implementation of this finalized before filing an issue in the proposals repo, so folks have a concrete use case they can see in action. Thanks again.
Note: I had a brief email exchange with @blairmacintyre, and he said it'd be best to open an issue both here and in the computer-vision repo regarding this, along with a reference to the computer-vision issue:
immersive-web/computer-vision#1
I'll also paste the issue contents below:
We've developed a 2D/3D asset management and inspection tool that uses server-side rendering. Rendered frames are encoded into a video stream and sent back to client devices for display. This allows dense assets to be interactive even on lower-end mobile devices. In addition to standard interaction models (touch/mouse/keyboard controls), we also have an AR viewing mode. You can see the AR functionality in action, visualizing a 1-billion-triangle mesh, in our SIGGRAPH Real Time Live demo at the 6:30 mark:
https://youtu.be/BQM9WyrXie4?t=389
This is running inside Chrome Canary on Android. You'll see at the 6:56 mark that the mesh, while workable, does lag slightly behind the real-world environment. The reason it lags is that this approach (and essentially all such server-side rendering approaches) must send the view transform from client to server, render a frame, encode the frame into a video stream, send that back to the client, decode it, and overlay it on the AR camera display. While this turnaround can be made fairly fast (a few tens of milliseconds), the 3D content will always be slightly behind the video.
If we put aside HMDs and focus on ARKit/ARCore: if we had some means of controlling when camera frames are displayed on the user's screen, it would be possible to introduce a slight delay and synchronize the display of camera frames with the rendered results from the server. Is there anything planned within WebXR to allow for this? Thanks.
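(To make the timing concrete, an illustrative sketch -- the names and the percentile choice are illustrative, not from the issue -- of how the camera playout delay could be chosen from the measured pose-to-display round trip described above.)

```ts
// Illustrative only: estimate how far the remote render lags the camera feed,
// so a matching camera playout delay could hide the gap.
interface RemoteFrameTiming {
  poseSentAt: DOMHighResTimeStamp;   // when the view transform left the client
  displayedAt: DOMHighResTimeStamp;  // when the decoded video frame was overlaid
}

function estimateCameraDelayMs(history: RemoteFrameTiming[]): number {
  if (history.length === 0) return 0;
  // Round trip = pose upload + server render + encode + network + decode.
  const roundTrips = history
    .map(f => f.displayedAt - f.poseSentAt)
    .sort((a, b) => a - b);
  // Use a high percentile so occasional slow frames do not break sync.
  return roundTrips[Math.floor(roundTrips.length * 0.95)];
}
```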