
API for planes and meshes #11

Open
blairmacintyre opened this issue Jul 26, 2019 · 25 comments

Comments

@blairmacintyre

In #6 we seem to have settled on the idea of allowing developers to request the kinds of data they actually want, be it planes of various kinds, meshes, or other things in the future (tables, the ground, etc ... anything platforms might be able to provide). To do this, we need to update the proposal as follows. Initially, focusing on planes and meshes should be sufficient to test across a good number of platforms. (We might also be able to test faces in the WebXR Viewer).

We need:

  • a way of requesting (and checking the success of) the specific types. For example, I might ask for meshes, and if those are unavailable, ask for planes (and then convert them to meshes myself). This may be done by adding features to the main webxr api.
  • a way of configuring each of these (already there with xrSession.updateWorldTrackingState). (Suggestion: change this to xrSession.updateWorldSensingState, as we will probably move this to the "Core AR Module" when that happens, and use it for other sorts of sensing configuration as well)
  • a way to get the data (already there with frame.worldInformation)
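
A rough sketch of how these three pieces might fit together (the "world-sensing" feature string, the updateWorldSensingState option names, and the worldInformation fields below are illustrative assumptions, not settled API):

// Hypothetical sketch of the proposed flow; names are illustrative only.
navigator.xr.requestSession("immersive-ar", {
  optionalFeatures: ["world-sensing"]   // ask for the world-geometry permission up front
}).then((session) => {
  // Configure (and later reconfigure) the kind of world data we want.
  session.updateWorldSensingState({
    meshDetectionState: { enabled: true },
    planeDetectionState: { enabled: true }
  });

  session.requestAnimationFrame(function onFrame(time, frame) {
    const world = frame.worldInformation;
    if (world.meshes) {
      // use meshes directly
    } else if (world.planes) {
      // fall back: convert planes to meshes in the app or a framework
    }
    session.requestAnimationFrame(onFrame);
  });
});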
@cabanier
Member

cabanier commented Jul 30, 2019

Unlike planes, meshes are very big and change constantly. I think we can create a module that just deals with meshing.
The simplest approach would be to add an optional parameter to requestSession so an author can request additional permissions. The UA can then ask the user if it's ok to start an immersive session with world and hand meshing. For instance:
navigator.xr.requestSession("immersive-ar", { "hand-meshing": "high-quality", "world-meshing": "medium-quality" }).then((session) => { ... });

I'm a bit unsure what the best approach for the delivery of these meshes is.
We could do it with a custom event so one could write session.addEventListener("meshdata", function(mesh) { ... }); but that would require more bookkeeping on the UA side.
An alternate approach would be to add optional parameters to XRFrameRequestCallback.

For world mesh data, the UA would pass chunks of vertices, each of them representing a mesh with a unique id. Subsequent calls could remove meshes, add new ones or update existing ones.
For hand mesh data, the UA would pass a new set of vertices.
@blairmacintyre, what do you think?

@blairmacintyre
Author

@cabanier why not just build on the current proposal?

As described, there would be a way to request meshes, similar to planes, in updateTrackingState. The reason I liked that part of the proposal is that it could be called at any point, and changed as the session goes on: the kind of data we get should probably be independent of the session creation. Session options should probably be limited to just "I want world geometry or not" for many apps, since we should encourage a pattern of (a) having apps do their best with what the UA can provide, and (b) having UAs provide some common fallback for world geometry that all apps can fall back to.

I would suggest that "mesh" is that fallback. Since planes, faces, objects, etc can all be represented as meshes, even if not ideal. So, I could do something, even if I'd prefer planes. Perhaps some apps would fail, but I would hope that frameworks will emerge that will polyfill things (e.g., a UA that provides only meshes might work with an app+framework that polyfills a plane finder as some WASM blob running in a worker)
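
For example, a polyfill converting a detected plane into a mesh could be as simple as fanning triangles out of the plane's boundary polygon (the plane.polygon shape below is an assumption for illustration, not a settled API):

// Hypothetical fallback: triangulate a convex plane polygon into a mesh (triangle fan).
// Assumes `plane.polygon` is an array of {x, y, z} points in some known space.
function planeToMesh(plane) {
  const pts = plane.polygon;
  const vertices = new Float32Array(pts.length * 3);
  pts.forEach((p, i) => vertices.set([p.x, p.y, p.z], i * 3));

  const indices = [];
  for (let i = 1; i < pts.length - 1; i++) {
    indices.push(0, i, i + 1);   // fan out from the first vertex
  }
  return { vertices, indices: new Uint16Array(indices) };
}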

Given all that, meshes would be delivered each frame in the worldInformation field as proposed.

This is exactly what I did in the sample code / API I posted in #6

I based the structure of the mesh data on the current Windows MR data returned by Hololens; it would be interesting to see if that structure works for ML, or if there is something missing.

@cabanier
Member

@cabanier why not just build on the current proposal?

Are you referring to this proposal?
I think it's not that different from mine.

Having the request for meshes (and potentially planes) be part of requestSession will make it so the browser can ask for the permissions at the time that the immersive session starts. Your updateWorldSensingState call happens after the session is established, so you can no longer ask for permission.

The reason I liked that part of the proposal is that it could be called at any point, and changed as the session goes on

It makes sense that you can turn meshing on and off but that could be done with some boolean on the session. I suspect that most authors know what features and feature quality they want so there's no need to change it at runtime.

I would suggest that "mesh" is that fallback. Since planes, faces, objects, etc can all be represented as meshes, even if not ideal. So, I could do something, even if I'd prefer planes.

I think the device itself will always know better what a plane is. Authors should not have to check on what device they are running and pick an algorithm based on that.
Plane detection is a one-off API call (which will likely be promise based) and different from a mesh (which will continually update)

I based the structure of the mesh data on the current Windows MR data returned by Hololens; it would be interesting to see if that structure works for ML, or if there is something missing.

Yes, this structure could work for us.

@blairmacintyre
Author

I think we agree, @cabanier

Implicit (I should have made it explicit) in my current thinking is that we would have to ask for spatial geometry in requestSession, for permission management. Thanks for highlighting that.

I would argue that we should ask for a general world-geometry permission there, and then use updateWorldSensing to provide the details. This way, apps can change what they want over time. We can't turn it on or off with session-creation options if we want to do that within the same session.

Regarding devices: I didn't say authors would check what device they are on and then choose. I said they would attempt to get the kind of information they want, and fall back if it's not available. There is no guarantee any given device will provide all the variations of data that we expose. For example, a given device might only provide meshes, and leave it up to the app to find planes -- I believe that's what most HMDs do now, right? Hololens doesn't provide planes, just meshes. Does ML currently provide planes? From the OS level, or provide a utility library?

@cabanier
Member

cabanier commented Aug 1, 2019

I would argue that we should ask for a general world-geometry permission there, and then use updateWorldSensing to provide the details.

At least for us, hand sensing uses a completely different set of APIs and sensors. It's also less of a privacy exposure, I suspect, so it makes sense to make it a separate permission check.

Regarding devices: I didn't say authors would check what device they are on and then choose. I said they would attempt to get the kind of information they want, and fall back if it's not available. There is no guarantee any given device will provide all the variations of data that we expose. For example, a given device might only provide meshes, and leave it up to the app to find planes
-- I believe that's what most HMDs do now, right? Hololens doesn't provide planes, just meshes. Does ML currently provide planes? From the OS level, or provide a utility library?

I'm unsure what most devices do. ML has a rich set of APIs for plane detection and it doesn't return meshes (which is why I disagreed with your previous statement that planes are meshes too).
So, I think that for plane detection we need another structure.
For us, a "plane" is a rectangle of a certain width and height in 3d space.
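
For illustration only, such a plane result might look something like this (a hypothetical shape, not Magic Leap's actual API):

// Hypothetical rectangle-style plane: a pose plus extents, no mesh data.
const plane = {
  pose: poseInSomeSpace,   // center and orientation of the rectangle (placeholder)
  width: 1.2,              // meters
  height: 0.8              // meters
};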

Since we're so close in agreement on how meshing would work, maybe I can draft a more complete IDL that we can discuss. Does that sound reasonable @blairmacintyre ?

@blairmacintyre
Author

(Aside: note that your definition of plane is different from the ARKit/ARCore one)

Sure, go ahead and draft something.

@cabanier
Member

cabanier commented Aug 2, 2019

New classes to request or ask for support of world and local meshing:

enum XRMeshingOptions {
  "off",
  "low-quality",
  "medium-quality",
  "high-quality"
};

dictionary XRSessionOptions {
  XRMeshingOptions worldMeshQuality = "off";
  XRMeshingOptions nearMeshQuality = "off";
};

Then supports/requestSession are extended:

Promise<void> supportsSession(XRSessionMode mode, optional XRSessionOptions options);
Promise<XRSession> requestSession(XRSessionMode mode, optional XRSessionOptions options);

Structures to hold the mesh data:

dictionary XRMeshBlock {
  required Float32Array vertices;
  required Uint16Array indices;
  Float32Array? normals;
};

[
    SecureContext,
    Exposed=Window
] interface XRMesh {
  readonly maplike<DOMString, XRMeshBlock>;
};

dictionary XRMetaData {
  XRMesh? worldMesh;
  FrozenArray<XRMeshBlock>? nearMesh;
};

XRMetaData can contain an optional set of world or near meshes.
XRMesh contains a unique id for each mesh.

Mesh data is then supplied to the page with an updated XRFrameRequestCallback

callback XRFrameRequestCallback = void (DOMHighResTimeStamp time, XRFrame frame, optional XRMetaData metaData);
  • If local mesh data is provided, it will replace the existing local mesh. A vertex count of 0 means no local mesh.

  • If the same unique id is passed for a world mesh, it will replace the existing mesh.

  • If a world mesh with a unique id already exists and the new vertex count is zero, the mesh should be deleted.
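
To make the update protocol concrete, here is a minimal sketch of an app applying those rules, assuming the XRMetaData / XRMeshBlock shapes above (`session` is the active XRSession):

// Minimal sketch: apply the world/near mesh update rules from this proposal.
const worldMeshes = new Map();   // id -> XRMeshBlock, kept by the app across frames
let nearMeshBlocks = [];

function onXRFrame(time, frame, metaData) {
  if (metaData && metaData.worldMesh) {
    metaData.worldMesh.forEach((block, id) => {
      if (block.vertices.length === 0) {
        worldMeshes.delete(id);        // existing id + zero vertices: delete
      } else {
        worldMeshes.set(id, block);    // same id replaces, new id adds
      }
    });
  }
  if (metaData && metaData.nearMesh) {
    nearMeshBlocks = Array.from(metaData.nearMesh);  // near mesh replaces wholesale
  }
  session.requestAnimationFrame(onXRFrame);
}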

@blairmacintyre
Author

I'd like @thetuvix to comment here, since the Hololens APIs are also dealing with this.

@bialpio
Contributor

bialpio commented Aug 6, 2019

Couple of questions and minor comments:

  • Do we plan on allowing the application to reconfigure the mesh sensing after a session was already created, or can configuration only happen at session creation?
  • It might be better to use DOMPointReadOnlys for vertex data and for normals - this is what main WebXR spec seems to be using to represent points / vectors.
  • Does the order of the XRMeshBlocks matter in XRMetaData.nearMesh? If not, can we add an intermediate type XRMeshBlockSet? It could be a readonly setlike<XRMeshBlock> that we’d store in XRMetaData under nearMesh key.
  • I’d propose that we add the XRMetaData into xrFrame.worldInformation object - this way, we’ll keep all world-sensing data in one place.
  • Keys in a dictionary are optional by default, so the question marks are not needed for XRMeshBlock.normals, XRMetaData.worldMesh & XRMetaData.nearMesh. I don’t fully understand all other implications of using dictionaries here, I’ve seen them being used mostly for initialization purposes in main WebXR spec.

I’m not sure if I understand the protocol for mesh updates across frames, can you elaborate a bit more on that? Sample code would be great!

@cabanier
Member

cabanier commented Aug 7, 2019

  • Do we plan on allowing the application to reconfigure the mesh sensing after a session was already created, or can configuration only happen at session creation?

In my current proposal, there is no way to reconfigure the mesh quality. I think it makes sense to provide a way to turn the meshing on and off.
Meshing is an expensive operation so this could help performance if an author thinks that the mesh is "good enough".

I have not seen any applications that ask for different qualities at runtime.

  • It might be better to use DOMPointReadOnlys for vertex data and for normals - this is what main WebXR spec seems to be using to represent points / vectors.

Even though that would make it easier to inspect the data in JavaScript, it would be a very expensive operation to pass that data to WebGL because hundreds of meshes with thousands of vertices need to be translated (and each vertex point would be a JavaScript object).

  • Does the order of the XRMeshBlocks matter in XRMetaData.nearMesh? If not, can we add an intermediate type XRMeshBlockSet? It could be a readonly setlike<XRMeshBlock> that we’d store in XRMetaData under nearMesh key.

That sounds fine! What would be the advantage of doing it that way?

  • I’d propose that we add the XRMetaData into xrFrame.worldInformation object - this way, we’ll keep all world-sensing data in one place.

By passing it into the call, it's more clear that this is new additional information. Would xrFrame.worldInformation contain the entire mesh, or only the updated ones (and how would you detect that?)

  • Keys in a dictionary are optional by default, so the question marks are not needed for XRMeshBlock.normals, XRMetaData.worldMesh & XRMetaData.nearMesh. I don’t fully understand all other implications of using dictionaries here, I’ve seen them being used mostly for initialization purposes in main WebXR spec.

ok.

I’m not sure if I understand the protocol for mesh updates across frames, can you elaborate a bit more on that? Sample code would be great!

Sure! Let me dig my test code up and I'll post it here

@bialpio
Contributor

bialpio commented Aug 7, 2019

  • It might be better to use DOMPointReadOnlys for vertex data and for normals - this is what main WebXR spec seems to be using to represent points / vectors.

Even though that would make it easier to inspect the data in JavaScript, it would be a very expensive operation to pass that data to WebGL because hundreds of meshes with thousands of vertices need to be translated (and each vertex point would be a JavaScript object).

@cabanier, @blairmacintyre, @thetuvix - what are the primary use cases for meshes? A couple that come to my mind are: rendering / occlusion, physics simulation (ex. bouncing virtual objects off of real ones), object placement (including reticle display), AI (controlling virtual agents based on the environment they’re in). For some of them, GL-friendly format is definitely the way to go, but others might be better off with something more JS-friendly.

  • Does the order of the XRMeshBlocks matter in XRMetaData.nearMesh? If not, can we add an intermediate type XRMeshBlockSet? It could be a readonly setlike<XRMeshBlock> that we’d store in XRMetaData under nearMesh key.

That sounds fine! What would be the advantage of doing it that way?

If the XRMeshBlock object can be reused across frames, it’d be slightly simpler for the application to check that a particular mesh block was already rendered & potentially reuse the GL buffers (& it won’t imply that the order matters):

let nearMeshBlocks = new Set(); // Near mesh blocks set from previous frame.

// Below happens somewhere in the RAF callback.
metaData.nearMesh.forEach(mesh => {
  if(nearMeshBlocks.has(mesh)) {
    // Current mesh still contains that particular mesh block.
    // No need to create GL buffers, we can reuse the ones created in previous frame.
  } else {
    // We haven’t seen that block, create GL buffers,
    // store them in a mesh block for rendering & reuse.
    mesh.context = { /* … all the data needed to render the mesh block … */ };
  }
});

// Some time later, render the mesh block.
...
// Before exiting RAF callback, store the current near
// mesh blocks for the next frame to use.
nearMeshBlocks = metaData.nearMesh;

Can we revisit this after you post some example code to demonstrate how the updates are communicated to the app? That’s the piece that I feel I’m still missing.

  • I’d propose that we add the XRMetaData into xrFrame.worldInformation object - this way, we’ll keep all world-sensing data in one place.

By passing it into the call, it's more clear that this is new additional information. Would xrFrame.worldInformation contain the entire mesh, or only the updated ones (and how would you detect that?)

I’d suggest that xrFrame.worldInformation should contain exactly the same thing that you’d pass in to the request animation frame callback (IIUC, it’s an entire mesh now, w/ some information about mesh block removals). My comment was due to 2 reasons. First is that I think it’ll send a stronger signal that this mesh data is valid (potentially) only in this particular xrFrame if we make the XRMetaData a member of XRFrame. Second reason is that we’ll probably keep adding more and more objects that are potentially going to be changing on a frame by frame basis (planes, anchors, lighting estimation come to my mind) - depending on how we decide we’ll pass those to the application, it might turn out that the request animation frame callback suddenly has way more parameters than we’d like. I think it might be valuable to group all of those in some XRFrame member - that was one of the reasons for the xrFrame.worldInformation object.

It just occurred to me: we should probably make the mesh contain an XRSpace - otherwise, we’d hand out vertex coordinates w/o specifying what they are relative to.

@cabanier
Member

cabanier commented Aug 7, 2019

Here is my three.js sample code:

 let metadata = renderer.vr.consumeMetadata(); // returns metadata of the frame

 if (metadata !== null) {
     let vrWorldMesh = metadata.worldMesh;
     if (vrWorldMesh) {

        console.log("got meshdata");
        vrWorldMesh.forEach((block, uuid) => {
            if (uuid in meshes) {
                group.remove(meshes[uuid]);
            }
            if (block.vertices.length == 0) {
                return;
            }

            var geometry = new THREE.BufferGeometry();
            geometry.addAttribute('position', new THREE.Float32BufferAttribute(block.vertices, 3));
            geometry.setIndex(Array.prototype.slice.call(block.indices));

            meshes[uuid] = new THREE.Mesh(geometry, worldmaterial);
            meshes[uuid].castShadow = true;
            meshes[uuid].receiveShadow = false;
            meshes[uuid].doubleSided = true;
            group.add(meshes[uuid]);
        });
    }

    let vrNearMesh = metadata.nearMesh;

    if (vrNearMesh !== null) {
        console.log("got nearmeshdata");
        if (nearMesh !== undefined) {
            group.remove(nearMesh);
            nearMesh = undefined;
        }

        vrNearMesh.forEach(block => {
            if (block.vertices.length == 0) {
                return;
            }
            var geometry = new THREE.BufferGeometry();
            geometry.addAttribute('position', new THREE.Float32BufferAttribute(block.vertices, 3));
            geometry.setIndex(Array.prototype.slice.call(block.indices));

            // this code should be updated to handle more than 1 near mesh
            nearMesh = new THREE.Mesh(geometry, nearmaterial);
            nearMesh.castShadow = true;
            nearMesh.receiveShadow = true;
            nearMesh.doubleSided = true;
            group.add(nearMesh);
        });
    }
}

This is called in the render method

@cabanier
Member

cabanier commented Aug 7, 2019

  • It might be better to use DOMPointReadOnlys for vertex data and for normals - this is what main WebXR spec seems to be using to represent points / vectors.

Even though that would make it easier to inspect the data in JavaScript, it would be a very expensive operation to pass that data to WebGL because hundreds of meshes with thousands of vertices need to be translated (and each vertex point would be a JavaScript object).

@cabanier, @blairmacintyre, @thetuvix - what are the primary use cases for meshes? A couple that come to my mind are: rendering / occlusion, physics simulation (ex. bouncing virtual objects off of real ones), object placement (including reticle display), AI (controlling virtual agents based on the environment they’re in). For some of them, GL-friendly format is definitely the way to go, but others might be better off with something more JS-friendly.

The biggest use case is occlusion.
I personally like to use the near mesh to show my hands as a wireframe because it looks so cool :-)

For detecting objects and interacting with the environment, I suspect that simpler and stable constructs are the better way to go. Generally, you just want to push a mesh directly into a GL call.

Of course, if the platform can't support object detection, an author would have to pull apart the raw data.
I'm unaware if there are such platforms though...

  • Does the order of the XRMeshBlocks matter in XRMetaData.nearMesh? If not, can we add an intermediate type XRMeshBlockSet? It could be a readonly setlike<XRMeshBlock> that we’d store in XRMetaData under nearMesh key.

That sounds fine! What would be the advantage of doing it that way?

If the XRMeshBlock object can be reused across frames, it’d be slightly simpler for the application to check that a particular mesh block was already rendered & potentially reuse the GL buffers (& it won’t imply that the order matters):

let nearMeshBlocks = new Set(); // Near mesh blocks set from previous frame.

// Below happens somewhere in the RAF callback.
metaData.nearMesh.forEach(mesh => {
  if(nearMeshBlocks.has(mesh)) {
    // Current mesh still contains that particular mesh block.
    // No need to create GL buffers, we can reuse the ones created in previous frame.
  } else {
    // We haven’t seen that block, create GL buffers,
    // store them in a mesh block for rendering & reuse.
    mesh.context = { /* … all the data needed to render the mesh block … */ };
  }
});

// Some time later, render the mesh block.
...
// Before exiting RAF callback, store the current near
// mesh blocks for the next frame to use.
nearMeshBlocks = metaData.nearMesh;

Can we revisit this after you post some example code to demonstrate how the updates are communicated to the app? That’s the piece that I feel I’m still missing.

Yes. I pasted my sample code above and that is what I'm doing: only update the meshes that are new and leave the others as-is.

  • I’d propose that we add the XRMetaData into xrFrame.worldInformation object - this way, we’ll keep all world-sensing data in one place.

By passing it into the call, it's more clear that this is new additional information. Would xrFrame.worldInformation contain the entire mesh, or only the updated ones (and how would you detect that?)

I’d suggest that xrFrame.worldInformation should contain exactly the same thing that you’d pass in to the request animation frame callback (IIUC, it’s an entire mesh now, w/ some information about mesh block removals).

Only deltas are passed into the callback but I guess appending them to xrFrame works.

My comment was due to 2 reasons. First is that I think it’ll send a stronger signal that this mesh data is valid (potentially) only in this particular xrFrame if we make the XRMetaData a member of XRFrame. Second reason is that we’ll probably keep adding more and more objects that are potentially going to be changing on a frame by frame basis (planes, anchors, lighting estimation come to my mind) - depending on how we decide we’ll pass those to the application, it might turn out that the request animation frame callback suddenly has way more parameters than we’d like. I think it might be valuable to group all of those in some XRFrame member - that was one of the reasons for the xrFrame.worldInformation object.

Yes. I agree that xrFrame could become a container for more information.

It just occurred to me: we should probably make the mesh contain an XRSpace - otherwise, we’d hand out vertex coordinates w/o specifying what they are relative to.

Yes, I punted on that for now. The mesh should have an associated XRSpace. For now I assume it's in the same reference space as the one passed to getViewerPose

@cabanier
Member

@blairmacintyre @bialpio I created a proposal on my personal github: https://cabanier.github.io/real-world-geometry/webxrmeshing-1.html
Can you take a look and let me know what you think?

@bialpio
Contributor

bialpio commented Aug 26, 2019

Thanks, the example code helps a lot!

For detecting objects and interacting with the environment, I suspect that simpler and stable constructs are the better way to go. Generally, you just want to push a mesh directly into a GL call.

In that case, maybe we can design this API in a way that would directly return WebGL buffers and optionally provide a JS-friendly access only if needed by the app? This way we could maybe skip moving significant amounts of data?

Can we revisit this after you post some example code to demonstrate how the updates are communicated to the app? That’s the piece that I feel I’m still missing.

Yes. I pasted my sample code above and that is what I'm doing: only update the meshes that are new and leave the others as-is.

Based on the examples, it seems that in your proposal you’re not reusing the mesh block objects across frames, so it might not change that much to use a setlike<XRMeshBlock>. OTOH, the XRMesh doesn’t have to be a maplike<DOMString, XRMeshBlock> & your sample code would still work (it relies on the fact that XRMeshBlock has a uuid) - I think we could make XRMetaData simply contain 2 keys, nearMesh and worldMesh, that would both be XRMeshes, and XRMesh would be a setlike<XRMeshBlock>. All this is just me spit-balling the possibilities, take them or leave them. :)

Only deltas are passed into the callback but I guess appending them to xrFrame works.

Now I think I see it - the mesh block “updates” also are passed in, but there is only one case of a mesh block update and that is a mesh block removal (signaled by passing 0 vertices in an already-existing block).

As for deltas: is it possible to get the full state of the currently detected mesh if the app missed some of the updates? Is the “app missed an update” something that we should worry about? I’d generally try to make an API that still allows the app to get back in sync with current state, but maybe that’s not the goal here.

I created a proposal on my personal github: https://cabanier.github.io/real-world-geometry/webxrmeshing-1.html

Thanks! I didn’t have a chance to take a deeper look at it and the PR yet, but the early feedback that I can give is that usually we start with an explainer that describes use cases but does not include a specific Web IDL proposal, since it’s more difficult to reach consensus on a specific API proposal if we don’t first agree on what we’re trying to solve. :)

Just so you know, we’re not currently planning on spending too much time on mesh detection APIs - if you’re interested in driving the discussions around the API, I can give you permissions to the repo so you won’t have to wait for me to approve the PRs & plane detection and meshes could still live side by side in this repo.

@blairmacintyre
Author

I'd still like to see @thetuvix's comments (or someone from Microsoft). They have a ton of experience with what people actually use the meshes for, and what they like and dislike about their native APIs.

I'm not convinced occlusion is the main use; except when combined with lighting and physics. It would be great if we can provide the data in both GL and/or JS buffers, perhaps lazily -- I completely agree that avoiding copies and unnecessary work would be great.

It is probably also the case that however we provide it, it should be in a format that can be sent off to a worker efficiently. This implies Transferable buffers, and it implies that the buffers will not be reused by the implementation (since they will have been transferred). Or, at least, that the implementation will notice and not reuse them if they have been transferred. I don't know if it's feasible to support transferring them back and "returning" them to the underlying system somehow (to avoid excessive reallocation).

@cabanier
Member

cabanier commented Aug 26, 2019

Thanks, the example code helps a lot!

For detecting objects and interacting with the environment, I suspect that simpler and stable constructs are the better way to go. Generally, you just want to push a mesh directly into a GL call.

In that case, maybe we can design this API in a way that would directly return WebGL buffers and optionally provide a JS-friendly access only if needed by the app? This way we could maybe skip moving significant amounts of data?

I don't think that would work. What form would such an API take? By passing raw arrays, the author can plug them directly into WebGL and do all sorts of interesting things.

The current proposal is very light and doesn't require the browser itself to keep state. Only the perception subsystem and the author need to know the current state of the mesh.

Can we revisit this after you post some example code to demonstrate how the updates are communicated to the app? That’s the piece that I feel I’m still missing.

Yes. I pasted my sample code above and that is what I'm doing: only update the meshes that are new and leave the others as-is.

Based on the examples, it seems that in your proposal you’re not reusing the mesh block objects across frames, so it might not change that much to use a setlike<XRMeshBlock>. OTOH, the XRMesh doesn’t have to be a maplike<DOMString, XRMeshBlock> & your sample code would still work (it relies on the fact that XRMeshBlock has a uuid) - I think we could make XRMetaData simply contain 2 keys, nearMesh and worldMesh, that would both be XRMeshes, and XRMesh would be a setlike<XRMeshBlock>. All this is just me spit-balling the possibilities, take them or leave them. :)

Only deltas are passed into the callback but I guess appending them to xrFrame works.

Now I think I see it - the mesh block “updates” also are passed in, but there is only one case of a mesh block update and that is a mesh block removal (signaled by passing 0 vertices in an already-existing block).

No. As the system learns more about the world, it can provide greater detail about previously passed in meshes (= update) or it can collapse multiple meshes into one (= update + delete).

As for deltas: is it possible to get the full state of the currently detected mesh if the app missed some of the updates? Is the “app missed an update” something that we should worry about? I’d generally try to make an API that still allows the app to get back in sync with current state, but maybe that’s not the goal here.

No, that is not possible with this approach. I chose this to minimize the need for caching on the browser side.
It would be possible to enhance the API to allow this but in my experience, applications don't typically re-request the mesh.

I created a proposal on my personal github: https://cabanier.github.io/real-world-geometry/webxrmeshing-1.html

Thanks! I didn’t have a chance to take a deeper look at it and the PR yet, but the early feedback that I can give is that usually we start with an explainer that describes use cases but does not include a specific Web IDL proposal, since it’s more difficult to reach consensus on a specific API proposal if we don’t first agree on what we’re trying to solve. :)

The introduction in the proposal talks about what problems the meshes are solving.
I can pull it out and make it an explainer if that helps. :-)

This thread started with @blairmacintyre proposing some pseudo-code based on the Hololens API. I just enhanced it and wrote up a proposal.

Just so you know, we’re not currently planning on spending too much time on mesh detection APIs - if you’re interested in driving the discussions around the API, I can give you permissions to the repo so you won’t have to wait for me to approve the PRs & plane detection and meshes could still live side by side in this repo.

What repo is that?
For immersive AR, meshes are considerably more useful than plane detection. Without them, you can't have occlusion or interactive hit testing.
I'm happy to contribute to your API though. Sometimes it's useful to ask the system for a plane with certain properties and that can't be done efficiently with meshes.

@rcabanier

rcabanier commented Aug 26, 2019

I'm not convinced occlusion is the main use; except when combined with lighting and physics.

Why do you think lighting and physics are needed for occlusion?
On an additive display occlusion is done by rendering the mesh as solid black.

It would be great if we can provide the data in both GL and/or JS buffers, perhaps lazily -- I completely agree that avoiding copies and unnecessary work would be great.

I agree about avoiding copies.
I don't really know how to provide something as a GL buffer. I suspect only the author really knows how they are going to use the mesh data.
Are you maybe thinking of some sort of wrapper class for the mesh buffer that is understood by WebGL?

It is probably also the case that however we provide it, it should be in a format that can be sent off to a worker efficiently. This implies Transferable buffers, and it implies that the buffers will not be reused by the implementation (since they will have been transferred). Or, at least, that the implementation will notice and not reuse them if they have been transferred. I don't know if it's feasible to support transferring them back and "returning" them to the underlying system somehow (to avoid excessive reallocation).

Typed Arrays are defined to be transferable so the meshes can be sent to workers without having to create a copy.
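
For illustration, a minimal sketch of handing a mesh block's buffers to a worker without copying, by transferring the underlying ArrayBuffers (which detaches them on the main thread):

// Sketch: send a mesh block to a worker without copying the data.
// Assumes `block.vertices` and `block.indices` are typed arrays owned by the page.
const worker = new Worker("mesh-worker.js");

function sendMeshToWorker(id, block) {
  worker.postMessage(
    { id, vertices: block.vertices, indices: block.indices },
    [block.vertices.buffer, block.indices.buffer]   // transfer, don't clone
  );
  // After this call the buffers are detached on the main thread, so the
  // implementation must not hand these same buffers out again.
}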

@blairmacintyre
Author

I'm not convinced occlusion is the main use; except when combined with lighting and physics.

Why do you think lighting and physics are needed for occlusion?

I meant the meshes are useful for lighting (e.g., shadows, reflections) and physics. Obviously lighting and physics are not needed for occlusion.

@cabanier
Member

I meant the meshes are useful for lighting (e.g., shadows, reflections) and physics. Obviously lighting and physics are not needed for occlusion.

Yes, they are indeed needed for physics. I put that in the spec introduction. Should I also put lighting in there? AFAIK neither Hololens nor Magic Leap uses meshes for that purpose.

@thetuvix

Just catching up on this thread now!

One thing to note is that we just released official platform support for detecting planes on HoloLens through the Scene Understanding API, allowing apps to specifically request walls, floors, ceilings and platforms. This may simplify things by allowing us to just establish planes as the app baseline for devices that do geometry detection at all, since they are currently supported by ARCore, ARKit, Magic Leap and HoloLens. For devices that support no geometry detection, a UA could even synthesize a single floor plane from a bounded reference space, which would at least support the "furniture placement" class of scenarios where you want to visualize how large an object is while you stand next to it.

@cabanier:

It makes sense that you can turn meshing on and off but that could be done with some boolean on the session. I suspect that most authors know what features and feature quality they want so there's no need to change it at runtime.

One pattern that is common on HoloLens is for an app to have a scanning phase and then a running phase. The user starts the app in a new room where the device hasn't been before. The app asks the user to look around until the mesh that's been scanned is sufficient for the app to do whatever analysis and feature isolation that it needs. At a minimum, we need to allow apps to scan for some period and then stop scanning.

Note that even when a HoloLens app is scanning, it is common to not hydrate vertex/index buffers for all meshes. For example, an app may only care about mesh within 10m of the user, even if the user has previously scanned the entire building. Also, after the user places their primary anchor, the app may choose to only generate buffers for mesh within 5m of that anchor to save power.

I see a few options here:

  • We can provide metadata for all available mesh volumes, and allow apps to request vertex/index buffers on demand.
  • We can allow apps to specify a radius around the user within which mesh buffers are optimistically provided.

Given that getting to world mesh involves latent processing anyway, it may be less important to eagerly force finished mesh on apps ASAP. Instead, if the mesh metadata object has a .requestMesh() method that returns a promise, that could get us the best of both worlds, allowing apps to request the mesh they care about most earliest.
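
A rough sketch of that shape (the meshMetadata collection and the requestMesh() options below are hypothetical names to illustrate on-demand hydration; distanceToViewer and uploadToGL stand in for app code):

// Hypothetical on-demand hydration: metadata is cheap every frame, buffers arrive async.
for (const volume of frame.worldInformation.meshMetadata) {
  if (distanceToViewer(volume.bounds) < 10 /* meters */) {
    volume.requestMesh({ includeNormals: false }).then((mesh) => {
      uploadToGL(mesh.vertices, mesh.indices);   // arrives when the platform has it ready
    });
  }
}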

required Uint16Array indices;

Depending on the density of mesh requested, a mesh volume on HoloLens can sometimes have enough indices to require a 32-bit index buffer.

Float32Array? normals;

Some app scenarios on HoloLens end up not requiring normals, which aren't free to calculate. We should also allow apps to skip requesting normals in the first place.

@bialpio:

what are the primary use cases for meshes?

The key mesh scenarios that we've seen on HoloLens are placement, occlusion, physics, navigation and visualization. You can see detailed descriptions of these scenarios in both our Spatial Mapping documentation and our new Scene Understanding documentation.

Increasingly, we are seeing apps that are willing to accept even more latency around world mesh/planes, in order to get higher quality for the analysis that powers their placement and dynamic navigation. That is what led to the new Scene Understanding API design.

@bialpio:

First is that I think it’ll send a stronger signal that this mesh data is valid (potentially) only in this particular xrFrame if we make the XRMetaData a member of XRFrame.

It may well be that XRFrame is the most natural place to serve up mesh updates. However, we should avoid sending signals to apps that a given world mesh is only valid for use in a single frame. It will often take multiple frames of background work for the platform to synthesize a new world mesh update for a single volume. An app will then need to use all of that world mesh over many future frames.

Near mesh that is intended to represent the user's hands would indeed be invalid at the next frame if it's expressed in world space or view space. To avoid this problem, HoloLens 2's HandMeshObserver API represents each hand mesh update relative to its own SpatialCoordinateSystem (the equivalent of an XRSpace). This space tracks the user's hand for the timestamp requested.

Note that the API we have on HoloLens for hand meshes has quite different characteristics from our world mesh APIs. For example:

  • Hand meshes have a constant index buffer and then update their vertex buffer every frame. World meshes change both their vertex and index buffer when they update, but only update periodically.
  • Each hand mesh update is aligned with a particular articulated hand joint snapshot, ensuring that the app's joint colliders always fall within the rendered hand mesh. World meshes are expressed relative to world-locked spaces.
  • Hand meshes can be requested in a neutral pose, allowing apps to precalculate the UVs they will need to render hand visualizations. (this is another key reason to have a static index buffer)
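
A sketch of what the constant index buffer enables on the rendering side, assuming the hand mesh arrives as raw typed arrays: only the vertex data needs re-uploading each frame.

// Sketch: constant index buffer uploaded once, vertex buffer updated per frame.
// Assumes `handMesh.vertices` (Float32Array) and `handMesh.indices` (Uint16Array).
const vbo = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, vbo);
gl.bufferData(gl.ARRAY_BUFFER, handMesh.vertices, gl.DYNAMIC_DRAW);

const ibo = gl.createBuffer();
gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, ibo);
gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, handMesh.indices, gl.STATIC_DRAW);  // never changes

function onHandMeshUpdate(newVertices) {
  gl.bindBuffer(gl.ARRAY_BUFFER, vbo);
  gl.bufferSubData(gl.ARRAY_BUFFER, 0, newVertices);   // cheap per-frame update
}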

The optimal APIs to allow users to both asynchronously request durable world mesh updates and also synchronously receive per-frame pose-predicted hand mesh updates may differ. We should consider how best to address each of these cases first, and then see if there's common ground worth combining.

@bialpio:

As for deltas: is it possible to get the full state of the currently detected mesh if the app missed some of the updates? Is the “app missed an update” something that we should worry about? I’d generally try to make an API that still allows the app to get back in sync with current state, but maybe that’s not the goal here.

One benefit to splitting world mesh handling into first delivering all metadata and then allowing on-demand async hydration of world mesh buffers is that we can give out all metadata every frame, without worrying about wasting effort delivering the same mesh over and over. Apps can determine which meshes are added and deleted by observing metadata entries coming and going, and can determine updates by seeing a newer lastUpdateTime.
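
A sketch of the bookkeeping that model implies on the app side (the id and lastUpdateTime fields on each metadata entry are assumptions taken from the description above):

// Sketch: detect added / updated / removed mesh volumes from per-frame metadata.
const known = new Map();   // id -> lastUpdateTime seen so far

function diffMeshMetadata(metadataEntries) {
  const seen = new Set();
  for (const entry of metadataEntries) {
    seen.add(entry.id);
    const prev = known.get(entry.id);
    if (prev === undefined || entry.lastUpdateTime > prev) {
      known.set(entry.id, entry.lastUpdateTime);
      // added or updated: request / re-upload this volume's buffers
    }
  }
  for (const id of [...known.keys()]) {
    if (!seen.has(id)) {
      known.delete(id);    // removed: drop GL resources for this volume
    }
  }
}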

@rcabanier

rcabanier commented Aug 30, 2019

Just catching up on this thread now!

Thanks for all your feedback @thetuvix !
There's a lot of new information in your message so I will break this thread into a couple of issues.

One thing to note is that we just released official platform support for detecting planes on HoloLens through the Scene Understanding API, allowing apps to specifically request walls, floors, ceilings and platforms. This may simplify things by allowing us to just establish planes as the app baseline for devices that do geometry detection at all, since they are currently supported by ARCore, ARKit, Magic Leap and HoloLens. For devices that support no geometry detection, a UA could even synthesize a single floor plane from a bounded reference space, which would at least support the "furniture placement" class of scenarios where you want to visualize how large an object is while you stand next to it.

I agree and I filed #15 about 2 weeks ago. #5 also touches on this.
The perception system knows what the world looks like and should be able to give the author a list of planes according to certain criteria. The current proposal where frames come in continuously during requestAnimationFrame does not match that model.

@bialpio
Contributor

bialpio commented Sep 3, 2019

What repo is that?

This (immersive-web/real-world-geometry) repo. :)

For immersive AR, meshes are considerably more useful than plane detection. Without them, you can't have occlusion or interactive hit testing.

I don't think the second part of this statement is correct - there are other ways of providing occlusion data than just handing out the entire mesh to the application. Additionally, interactive hit testing is certainly possible on some of the platforms w/o exposing meshes to the app - the immersive-web/hit-test repo is attempting to expose the APIs which enable that.

@rcabanier

I don't think the second part of this statement is correct - there are other ways of providing occlusion data than just handing out the entire mesh to the application. Additionally, interactive hit testing is certainly possible on some of the platforms w/o exposing meshes to the app - the immersive-web/hit-test repo is attempting to expose the APIs which enable that.

What I mean by "interactive hit testing" is that the author can do the hit testing themselves, since they know the geometry of the real world.
That other proposal is for asynchronous hit testing where you ask the system for hit data and it will return it at some later point.

They are both hit testing but the former can be done in real time.
