API for planes and meshes #11
Unlike planes, meshes are very big and change constantly. I think we can create a module that just deals with meshing. I'm a bit unsure what the best approach for the delivery of these meshes is. For world mesh data, the UA would pass chunks of vertices, each of them representing a mesh with a unique id. Subsequent calls could remove meshes, add new ones or update existing ones.
@cabanier why not just build on the current proposal? As described, there would be a way to request meshes, similar to planes, in `updateTrackingState`. The reason I liked that part of the proposal is that it could be called at any point and changed as the session goes on: the kind of data we get should probably be independent of session creation.

Session options should probably be limited to just "I want world geometry or not" for many apps, since we should encourage a pattern of (a) having apps do their best with what the UA can provide, and (b) having UAs provide some common fallback for world geometry that all apps can fall back to. I would suggest that "mesh" is that fallback, since planes, faces, objects, etc. can all be represented as meshes, even if not ideally. So I could do something, even if I'd prefer planes. Perhaps some apps would fail, but I would hope that frameworks will emerge that polyfill things (e.g., a UA that provides only meshes might work with an app+framework that polyfills a plane finder as some WASM blob running in a worker).

Given all that, meshes would be delivered each frame in the `worldInformation` field as proposed. This is exactly what I did in the sample code / API I posted in #6. I based the structure of the mesh data on the current Windows MR data returned by HoloLens; it would be interesting to see if that structure works for ML, or if there is something missing.
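To make that pattern concrete, here is a rough sketch, assuming the `updateWorldTrackingState` / `frame.worldInformation` names from the original proposal; the option-bag shape is a guess, not settled API:

```js
// Hypothetical configuration call; the option names are placeholders.
xrSession.updateWorldTrackingState({
  meshDetectionState: { enabled: true },
  planeDetectionState: { enabled: true }   // fall back to meshes if unsupported
});

// World geometry is then delivered every frame.
xrSession.requestAnimationFrame(function onFrame(time, frame) {
  const world = frame.worldInformation;
  if (world && world.meshes) {
    for (const mesh of world.meshes) {
      // mesh.vertices / mesh.indices would be fed to WebGL, or to a plane-finding
      // polyfill running in a worker, as described above.
    }
  }
  xrSession.requestAnimationFrame(onFrame);
});
```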
Are you referring to this proposal? Having the request for meshes (and potentially planes) be part of `requestSession` will make it so the browser can ask for the permissions at the time the immersive session starts.
It makes sense that you can turn meshing on and off, but that could be done with a boolean on the session. I suspect that most authors know what features and feature quality they want, so there's no need to change it at runtime.
I think the device itself will always know better what a plane is. Authors should not have to check what device they are running on and pick an algorithm based on that.
Yes, this structure could work for us.
I think we agree, @cabanier. Implicit (I should have made it explicit) in my current thinking is that we would have to ask for spatial geometry in `requestSession`; I would argue that we should ask for a general world-sensing permission there.

Regarding devices: I didn't say authors would check what device they are on and then choose. I said they would attempt to get the kind of information they want, and fall back if it's not available. There is no guarantee that any given device will provide all the variations of data we expose. For example, a given device might only provide meshes and leave it up to the app to find planes -- I believe that's what most HMDs do now, right? HoloLens doesn't provide planes, just meshes. Does ML currently provide planes? From the OS level, or via a utility library?
At least for us, hand sensing uses a completely different set of APIs and sensors. It's also less exposure from a privacy standpoint, I suspect, so it makes sense to make it a separate permission check.
I'm unsure what most devices do. ML has a rich set of APIs for plane detection, and it doesn't return meshes (which is why I disagreed with your previous statement that planes are meshes too). Since we're so close to agreement on how meshing would work, maybe I can draft a more complete IDL that we can discuss. Does that sound reasonable, @blairmacintyre?
(Aside: note that your definition of plane is different from the ARKit/ARCore one.) Sure, go ahead and draft something.
New classes to request or ask for support of world and local meshing:
Then supports/requestSession are extended:
Structures to hold the mesh data:
`XRMetaData` can contain an optional set of world or near meshes. Mesh data is then supplied to the page through an updated `XRFrameRequestCallback`.
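A rough JavaScript sketch of how these pieces could fit together from the app's point of view; the option names, the `metaData` argument, and the block fields below are placeholders, not the proposed IDL:

```js
// Inside an async function. Hypothetical usage sketch only; all names stand in
// for the (unshown) IDL classes described above.
const session = await navigator.xr.requestSession('immersive-ar', {
  worldMesh: true,   // placeholder for the world/local meshing request classes
  nearMesh: false
});

// The updated frame callback receives the metadata object as an extra argument
// alongside the usual time and frame.
session.requestAnimationFrame(function onFrame(time, frame, metaData) {
  if (metaData && metaData.worldMesh) {
    for (const block of metaData.worldMesh) {
      // block.id, block.vertices (Float32Array), block.indices, block.normals, …
    }
  }
  session.requestAnimationFrame(onFrame);
});
```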
I'd like @thetuvix to comment here, since the HoloLens APIs are also dealing with this.
Couple of questions and minor comments:
I'm not sure I understand the protocol for mesh updates across frames; can you elaborate a bit more on that? Sample code would be great!
In my current proposal, there is no way to reconfigure the mesh quality. I think it makes sense to provide a way to turn the meshing on and off. I have not seen any applications that ask for different qualities at runtime.
Even though that would make it easier to inspect the data in JavaScript, it would be a very expensive operation to pass that data to WebGL, because hundreds of meshes with thousands of vertices would need to be translated (and each vertex point would be a JavaScript object).
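For context, this is the cost in question: with typed arrays the vertex data goes straight into a WebGL call, whereas per-vertex JavaScript objects would have to be flattened first. A minimal sketch, assuming `gl` is a WebGLRenderingContext and `mesh.vertices` is a Float32Array:

```js
// Upload the mesh vertices as-is; no per-vertex JavaScript objects are created.
const vbo = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, vbo);
gl.bufferData(gl.ARRAY_BUFFER, mesh.vertices, gl.STATIC_DRAW);
// An array of {x, y, z} objects would have to be copied into a Float32Array
// first, touching every vertex in JavaScript.
```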
That sounds fine! What would be the advantage of doing it that way?
By passing it into the call, it's more clear that this is new, additional information.
ok.
Sure! Let me dig my test code up and I'll post it here.
@cabanier, @blairmacintyre, @thetuvix - what are the primary use cases for meshes? A couple that come to mind are: rendering / occlusion, physics simulation (e.g. bouncing virtual objects off of real ones), object placement (including reticle display), and AI (controlling virtual agents based on the environment they're in). For some of them, a GL-friendly format is definitely the way to go, but others might be better off with something more JS-friendly.
If the mesh block objects are reused across frames, the app could do something like this:

```js
let nearMeshBlocks = …; // Near mesh blocks set from previous frame.

// Below happens somewhere in the RAF callback.
metaData.nearMesh.forEach(mesh => {
  if (nearMeshBlocks.has(mesh)) {
    // Current mesh still contains that particular mesh block.
    // No need to create GL buffers, we can reuse the ones created in previous frame.
  } else {
    // We haven’t seen that block, create GL buffers,
    // store them in a mesh block for rendering & reuse.
    someMeshBlock.context = { /* … all the data needed to render the mesh block … */ };
  }
});

// Some time later, render the mesh block.
// ...

// Before exiting RAF callback, store the current near
// mesh blocks for the next frame to use.
nearMeshBlocks = metaData.nearMesh;
```

Can we revisit this after you post some example code to demonstrate how the updates are communicated to the app? That's the piece that I feel I'm still missing.
It just occurred to me: we should probably make the mesh contain a reference to the space its vertices are expressed in.
Here is my three.js sample code:
This is called in the `requestAnimationFrame` callback.
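As a hypothetical stand-in for the kind of three.js processing being described (one THREE.Mesh per mesh id, geometry created only for new ids, removal signalled by zero vertices), a sketch could look like this; only the three.js API calls are real, the mesh fields are assumptions:

```js
// Keep one THREE.Mesh per reported mesh id; only delta meshes arrive each frame.
const meshMap = new Map();   // id -> THREE.Mesh

function processWorldMeshes(meshes, scene) {
  for (const m of meshes) {
    if (m.vertices.length === 0) {            // removal signalled by 0 vertices
      const old = meshMap.get(m.id);
      if (old) { scene.remove(old); meshMap.delete(m.id); }
      continue;
    }
    let threeMesh = meshMap.get(m.id);
    if (!threeMesh) {                          // new mesh block
      threeMesh = new THREE.Mesh(new THREE.BufferGeometry(),
                                 new THREE.MeshBasicMaterial());
      meshMap.set(m.id, threeMesh);
      scene.add(threeMesh);
    }
    // New or updated block: (re)upload the typed arrays as-is.
    threeMesh.geometry.setAttribute('position', new THREE.BufferAttribute(m.vertices, 3));
    threeMesh.geometry.setIndex(new THREE.BufferAttribute(m.indices, 1));
  }
}
```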
The biggest use case is occlusion. For detecting objects and interacting with the environment, I suspect that simpler and more stable constructs are the better way to go. Generally, you just want to push a mesh directly into a GL call. Of course, if the platform can't support object detection, an author would have to pull apart the raw data.
Yes. I pasted my sample code above and that is what I'm doing: only update the meshes that are new and leave the others as-is.
Only deltas are passed into the callback but I guess appending them to xrFrame works.
Yes. I agree that xrFrame could become a container for more information.
Yes, I punted on that for now. The mesh should have an associated XRSpace. For now I assume it's in the same reference space as the one passed to `getViewerPose()`.
@blairmacintyre @bialpio I created a proposal on my personal GitHub: https://cabanier.github.io/real-world-geometry/webxrmeshing-1.html
Thanks, the example code helps a lot!
In that case, maybe we can design this API in a way that would directly return WebGL buffers and optionally provide a JS-friendly access only if needed by the app? This way we could maybe skip moving significant amounts of data?
Based on the examples, it seems that in your proposal you’re not reusing the mesh block objects across frames, so it might not change that much to use a
Now I think I see it - the mesh block "updates" are also passed in, but there is only one case of a mesh block update, and that is a mesh block removal (signaled by passing 0 vertices in an already-existing block). As for deltas: is it possible to get the full state of the currently detected mesh if the app missed some of the updates? Is "the app missed an update" something we should worry about? I'd generally try to make an API that still allows the app to get back in sync with the current state, but maybe that's not the goal here.
Thanks! I didn't have a chance to take a deeper look at it and the PR yet, but the early feedback I can give is that we usually start with an explainer that describes use cases and does not include a specific Web IDL proposal, since it's more difficult to reach consensus on a specific API proposal if we don't first agree on what we're trying to solve. :) Just so you know, we're not currently planning on spending too much time on mesh detection APIs - if you're interested in driving the discussions around the API, I can give you permissions to the repo so you won't have to wait for me to approve the PRs, and plane detection and meshes could still live side by side in this repo.
I'd still like to see @thetuvix's comments (or someone from Microsoft). They have a ton of experience with what people actually use the meshes for, and what they like and dislike about their native APIs. I'm not convinced occlusion is the main use, except when combined with lighting and physics. It would be great if we can provide the data in both GL and/or JS buffers, perhaps lazily -- I completely agree that avoiding copies and unnecessary work would be great. It is probably also the case that however we provide it, it should be in a format that can be sent off to a worker efficiently. This implies something transferable, like typed arrays.
I don't think that would work. What form would such an API take? By passing raw arrays, the author can plug them directly into WebGL and do all sorts of interesting things. The current proposal is very light and doesn't require the browser itself to keep state. Only the perception subsystem and the author need to know the current state of the mesh.
No. As the system learns more about the world, it can provide greater detail about previously passed in meshes (= update) or it can collapse multiple meshes into one (= update + delete).
No, that is not possible with this approach. I chose this to minimize the need for caching on the browser side.
The introduction in the proposal talks about what problems the meshes are solving. This thread started with @blairmacintyre proposing some pseudo-code based on the Hololens API. I just enhanced it and wrote up a proposal.
What repo is that?
Why do you think lighting and physics are needed for occlusion?
I agree about avoiding copies.
The ArrayBuffers backing typed arrays are transferable, so the meshes can be sent to workers without having to create a copy.
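A small sketch of that hand-off, assuming a `worker` and a `mesh` object with typed-array fields (both placeholders):

```js
// Move the mesh data to a worker without copying: the typed arrays are cloned
// cheaply because their backing ArrayBuffers are listed as transferables,
// which leaves them detached on the main thread afterwards.
worker.postMessage(
  { id: mesh.id, vertices: mesh.vertices, indices: mesh.indices },
  [mesh.vertices.buffer, mesh.indices.buffer]
);
```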
I meant that the meshes are useful for lighting (e.g., shadows, reflections) and physics. Obviously lighting and physics are not needed for occlusion.
Yes, they are indeed needed for physics. I put that in the spec introduction. Should I also put lighting in there? AFAIK, neither HoloLens nor Magic Leap uses meshes for that purpose.
Just catching up on this thread now! One thing to note is that we just released official platform support for detecting planes on HoloLens through the Scene Understanding API, allowing apps to specifically request walls, floors, ceilings and platforms. This may simplify things by allowing us to just establish planes as the app baseline for devices that do geometry detection at all, since they are currently supported by ARCore, ARKit, Magic Leap and HoloLens. For devices that support no geometry detection, a UA could even synthesize a single floor plane from a bounded reference space, which would at least support the "furniture placement" class of scenarios where you want to visualize how large an object is while you stand next to it.
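A sketch of that synthesized-floor fallback, using the existing `boundsGeometry` of a bounded reference space; the triangle-fan step assumes the bounds polygon is convex:

```js
// Inside an async function, with an active XRSession `session`.
const refSpace = await session.requestReferenceSpace('bounded-floor');
const pts = refSpace.boundsGeometry;           // DOMPointReadOnly[], y = 0 is the floor

// Synthesize a single floor "plane" as a triangle fan over the bounds polygon.
const vertices = new Float32Array(pts.length * 3);
pts.forEach((p, i) => vertices.set([p.x, 0, p.z], i * 3));
const indices = new Uint16Array((pts.length - 2) * 3);
for (let i = 1; i + 1 < pts.length; i++) {
  indices.set([0, i, i + 1], (i - 1) * 3);
}
```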
One pattern that is common on HoloLens is for an app to have a scanning phase and then a running phase. The user starts the app in a new room where the device hasn't been before. The app asks the user to look around until the mesh that's been scanned is sufficient for the app to do whatever analysis and feature isolation that it needs. At a minimum, we need to allow apps to scan for some period and then stop scanning. Note that even when a HoloLens app is scanning, it is common to not hydrate vertex/index buffers for all meshes. For example, an app may only care about mesh within 10m of the user, even if the user has previously scanned the entire building. Also, after the user places their primary anchor, the app may choose to only generate buffers for mesh within 5m of that anchor to save power. I see a few options here:
Given that getting to world mesh involves latent processing anyway, it may be less important to eagerly force finished mesh on apps ASAP. Instead, if the mesh metadata object has a `.requestMesh()` method that returns a promise, that could get us the best of both worlds, allowing apps to request the mesh they care about most first.
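A sketch of what that could look like from the app's side; `requestMesh()` is the method floated above, while the metadata collection, the option bag, and the helper functions are placeholders:

```js
// Hypothetical on-demand hydration: metadata is cheap and arrives every frame,
// buffers are generated only for the meshes the app actually asks for.
for (const meshInfo of frame.worldInformation.meshMetadata) {   // placeholder field
  if (distanceToUser(meshInfo) > 10) continue;        // e.g. ignore mesh beyond 10 m
  meshInfo.requestMesh({ includeNormals: false })     // skipping normals, see below
    .then(({ vertices, indices }) => uploadToGL(vertices, indices));
}
```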
Depending on the density of mesh requested, a mesh volume on HoloLens can sometimes have enough indices to require a 32-bit index buffer.
Some app scenarios on HoloLens end up not requiring normals, which aren't free to calculate. We should also allow apps to skip requesting normals in the first place.
The key mesh scenarios that we've seen on HoloLens are placement, occlusion, physics, navigation and visualization. You can see detailed descriptions of these scenarios in both our Spatial Mapping documentation and our new Scene Understanding documentation. Increasingly, we are seeing apps that are willing to accept even more latency around world mesh/planes, in order to get higher quality for the analysis that powers their placement and dynamic navigation. That is what led to the new Scene Understanding API design.
It may well be that near mesh intended to represent the user's hands would indeed be invalid at the next frame if it's expressed in world space or view space. To avoid this problem, HoloLens 2's HandMeshObserver API represents each hand mesh update relative to its own coordinate system. Note that the API we have on HoloLens for hand meshes has quite different characteristics from our world mesh APIs. For example:
The optimal APIs to allow users to both asynchronously request durable world mesh updates and also synchronously receive per-frame pose-predicted hand mesh updates may differ. We should consider how best to address each of these cases first, and then see if there's common ground worth combining.
One benefit to splitting world mesh handling into first delivering all metadata and then allowing on-demand async hydration of world mesh buffers is that we can give out all metadata every frame, without worrying about wasting effort delivering the same mesh over and over. Apps can determine which meshes are added and deleted by observing metadata entries coming and going, and can determine updates by seeing a newer update timestamp on a given mesh.
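A sketch of that bookkeeping on the app side; the `id` and `lastUpdateTime` fields, the `trackedMeshes` map, and the `hydrate()` helper are placeholders:

```js
// Diff this frame's metadata against what the app already tracks: unseen ids are
// additions, missing ids are deletions, and a newer timestamp means an update.
const seen = new Set();
for (const info of frameMeshMetadata) {
  seen.add(info.id);
  const known = trackedMeshes.get(info.id);        // trackedMeshes is a Map
  if (!known || info.lastUpdateTime > known.lastUpdateTime) {
    trackedMeshes.set(info.id, { lastUpdateTime: info.lastUpdateTime });
    hydrate(info);                                 // request or refresh the buffers
  }
}
for (const id of trackedMeshes.keys()) {
  if (!seen.has(id)) trackedMeshes.delete(id);     // mesh no longer reported
}
```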
Thanks for all your feedback, @thetuvix!
I agree, and I filed #15 about 2 weeks ago. #5 also touches on this.
This (immersive-web/real-world-geometry) repo. :)
I don't think the second part of this statement is correct - there are other ways of providing occlusion data than just handing the entire mesh to the application. Additionally, interactive hit testing is certainly possible on some of the platforms without exposing meshes to the app; the immersive-web/hit-test repo is attempting to expose an API that enables that.
What I mean by "interactive hit testing" is that the author can do the hit testing themselves, since they know the geometry of the real world. They are both hit testing, but the former can be done in real time.
In #6 we seem to have settled on the idea of allowing developers to request the kinds of data they actually want, be it planes of various kinds, meshes, or other things in the future (tables, the ground, etc. ... anything platforms might be able to provide). To do this, we need to update the proposal as follows. Initially, focusing on planes and meshes should be sufficient to test across a good number of platforms. (We might also be able to test faces in the WebXR Viewer.)
We need:

- A way to specify what kinds of data we want (currently `xrSession.updateWorldTrackingState`). (Suggestion: change this to `xrSession.updateWorldSensingState`, as we will probably move this to the "Core AR Module" when that happens, and use it for other sorts of sensing configuration as well.)
- A way to deliver that data each frame (currently `frame.worldInformation`).