Improve point-cloud schema extraction #928

Merged
merged 2 commits into from
Oct 30, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ _When adding new entries to the changelog, please include issue/PR numbers where
- Adds support for disabling the working-copy checkout of specific datasets using the commands `kart import DATASET --no-checkout` or `kart checkout --not-dataset=DATASET`, and re-enabling it using `kart checkout --dataset=DATASET`. [#926](https://github.com/koordinates/kart/pull/926)
- Adds information on referencing and citing Kart to `CITATION`. [#914](https://github.com/koordinates/kart/pull/914)
- Fixes a bug where Kart would misidentify a non-Kart repo as a Kart V1 repo in some circumstances. [#918](https://github.com/koordinates/kart/issues/918)
- Improve schema extraction for point cloud datasets. [#924](https://github.com/koordinates/kart/issues/924)

## 0.14.2

Expand Down
110 changes: 71 additions & 39 deletions docs/pages/development/pointcloud_v1.rst
Original file line number Diff line number Diff line change
Expand Up @@ -87,103 +87,135 @@ For example, this is the schema of a dataset using "PDRF 7":
[
{
"name": "X",
"dataType": "float",
"size": 64
"dataType": "integer",
"size": 32
},
{
"name": "Y",
"dataType": "float",
"size": 64
"dataType": "integer",
"size": 32
},
{
"name": "Z",
"dataType": "float",
"size": 64
"dataType": "integer",
"size": 32
},
{
"name": "Intensity",
"dataType": "integer",
"size": 16
"size": 16,
"unsigned": true
},
{
"name": "ReturnNumber",
"name": "Return Number",
"dataType": "integer",
"size": 8
"size": 4,
"unsigned": true
},
{
"name": "NumberOfReturns",
"name": "Number of Returns",
"dataType": "integer",
"size": 8
"size": 4,
"unsigned": true
},
{
"name": "ScanDirectionFlag",
"name": "Synthetic",
"dataType": "integer",
"size": 8
"size": 1
},
{
"name": "EdgeOfFlightLine",
"name": "Key-Point",
"dataType": "integer",
"size": 8
"size": 1
},
{
"name": "Classification",
"name": "Withheld",
"dataType": "integer",
"size": 8
"size": 1
},
{
"name": "ScanAngleRank",
"dataType": "float",
"size": 32
"name": "Overlap",
"dataType": "integer",
"size": 1
},
{
"name": "UserData",
"name": "Scanner Channel",
"dataType": "integer",
"size": 8
"size": 2,
"unsigned": true
},
{
"name": "PointSourceId",
"name": "Scan Direction Flag",
"dataType": "integer",
"size": 16
"size": 1
},
{
"name": "GpsTime",
"dataType": "float",
"size": 64
"name": "Edge of Flight Line",
"dataType": "integer",
"size": 1
},
{
"name": "ScanChannel",
"name": "Classification",
"dataType": "integer",
"size": 8
"size": 8,
"unsigned": true
},
{
"name": "ClassFlags",
"name": "User Data",
"dataType": "integer",
"size": 8
"size": 8,
"unsigned": true
},
{
"name": "Red",
"name": "Scan Angle",
"dataType": "integer",
"size": 16
},
{
"name": "Point Source ID",
"dataType": "integer",
"size": 16,
"unsigned": true
},
{
"name": "GPS Time",
"dataType": "float",
"size": 64
},
{
"name": "Red",
"dataType": "integer",
"size": 16,
"unsigned": true
},
{
"name": "Green",
"dataType": "integer",
"size": 16
"size": 16,
"unsigned": true
},
{
"name": "Blue",
"dataType": "integer",
"size": 16
"size": 16,
"unsigned": true
}
]

Kart uses `PDAL <pdal_>`_ internally to read and write LAS files. For certain fields, PDAL modifies the type of the field as it reads it, for either of the following reasons:

* The native type of the field is "fixed point" - for the sake of simplicity, PDAL converts these to the more widely-used floating point type.
* The native type of the field has changed over time. In order that the field can be read in a consistent way without worrying about the LAS version, PDAL converts
these fields to a type expressive enough that both old and new data can be stored in the same type.
Note: Kart vs PDAL schema extraction
####################################

Kart uses `PDAL <pdal_>`_ internally to read and write LAS files. PDAL is an abstraction layer that can read data from a variety of different
types of point cloud files, and as such, it interprets the schema in its own way to make it more interoperable with the rest of PDAL.
The schema that Kart conveys is the schema of the LAS file as it is stored or specified, not as PDAL reads it, although the two are very similar. Here are some differences between the stored / specified schema and PDAL's interpretation:

* Where the specification gives a dimension's name as multiple words, e.g. "Number of Returns", PDAL reports it in CamelCase, e.g. "NumberOfReturns".
* PDAL converts some dimensions that are technically stored as integers to floating point values as it applies scaling factors to them - for example, X, Y, and Z.
* Sometimes PDAL loads newer and older versions of a particular dimension in a version-independent way - e.g. the older 8-bit field "Scan Angle Rank" and the newer 16-bit field "Scan Angle" are both loaded as "ScanAngleRank", and both converted to floating point.

Kart exposes the schema as read by PDAL (not as it is actually stored) - all of the same changes are made.
If you need to see PDAL's interpretation of a schema instead of Kart's, you can run ``pdal info --schema <FILENAME>``.
A PDAL command-line executable can be found in the directory where Kart is installed.
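
The first two differences described in the note can be illustrated with a small sketch. Both function names are hypothetical and are not part of Kart or PDAL. The name conversion is deliberately naive: it only covers the simple multi-word case, and PDAL's real naming has exceptions (as the note says, "Scan Angle" is loaded as "ScanAngleRank"). The scaling function assumes the usual LAS convention that a stored integer coordinate is multiplied by a scale factor and shifted by an offset.

```python
def spec_name_to_camel_case(name):
    """Join the words of a specification-style name into CamelCase (naive sketch)."""
    words = name.replace("-", " ").split()
    return "".join(word.capitalize() for word in words)


def apply_scale_and_offset(stored_value, scale, offset):
    """Turn a stored integer coordinate into a floating point coordinate."""
    return stored_value * scale + offset


if __name__ == "__main__":
    print(spec_name_to_camel_case("Number of Returns"))  # NumberOfReturns
    # Scale and offset values here are made up for illustration.
    print(apply_scale_and_offset(123456, 0.01, 500000.0))
```

This is why the X, Y and Z entries appear as 32-bit integers in the stored schema but as 64-bit floats in PDAL's view.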

``meta/crs.wkt``
^^^^^^^^^^^^^^^^
Expand Down
61 changes: 45 additions & 16 deletions kart/point_cloud/metadata_util.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
import base64
from enum import IntFlag
import logging
import json
Expand All @@ -15,10 +16,9 @@
from kart.lfs_util import get_hash_and_size_of_file
from kart.geometry import ring_as_wkt
from kart.point_cloud.schema_util import (
get_schema_from_pdrf,
get_schema_from_pdrf_and_vlr,
get_record_length_from_pdrf,
equivalent_copc_pdrf,
pdal_schema_to_kart_schema,
)
from kart import subprocess_util as subprocess

Expand Down Expand Up @@ -84,14 +84,23 @@ def rewrite_format(tile_metadata, rewrite_metadata=RewriteMetadata.NO_REWRITE):
elif RewriteMetadata.AS_IF_CONVERTED_TO_COPC in rewrite_metadata:
orig_pdrf = orig_format["pointDataRecordFormat"]
new_pdrf = equivalent_copc_pdrf(orig_pdrf)
return {
"compression": "laz",
"lasVersion": "1.4",
"optimization": "copc",
"optimizationVersion": "1.0",
"pointDataRecordFormat": new_pdrf,
"pointDataRecordLength": get_record_length_from_pdrf(new_pdrf),
}
orig_length = orig_format["pointDataRecordLength"]
new_length = (
orig_length
- get_record_length_from_pdrf(orig_pdrf)
+ get_record_length_from_pdrf(new_pdrf)
)
return _remove_nones(
{
"compression": "laz",
"lasVersion": "1.4",
"optimization": "copc",
"optimizationVersion": "1.0",
"pointDataRecordFormat": new_pdrf,
"pointDataRecordLength": new_length,
"extraBytesVlr": orig_format.get("extraBytesVlr"),
}
)
else:
return orig_format
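
The record-length arithmetic in this hunk can be checked with a worked example. This is a sketch under two assumptions: that `get_record_length_from_pdrf` returns the base record length the LAS 1.4 specification defines for each format (the table below is taken from the specification, not from Kart's source), and, purely for illustration, that a PDRF 1 tile is rewritten as PDRF 6. The function name is hypothetical.

```python
# Base point data record lengths in bytes, per the LAS 1.4 specification.
BASE_RECORD_LENGTH = {
    0: 20, 1: 28, 2: 26, 3: 34, 4: 57, 5: 63,
    6: 30, 7: 36, 8: 38, 9: 59, 10: 67,
}


def record_length_after_pdrf_change(orig_length, orig_pdrf, new_pdrf):
    """Keep any extra bytes per point while swapping the base record format."""
    extra_bytes = orig_length - BASE_RECORD_LENGTH[orig_pdrf]
    return BASE_RECORD_LENGTH[new_pdrf] + extra_bytes


if __name__ == "__main__":
    # A PDRF 1 record (28 bytes) carrying 4 extra bytes is 32 bytes long.
    print(record_length_after_pdrf_change(32, 1, 6))  # 34
```

With no extra bytes the result is simply the base length of the new format, which matches what the old code returned.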

Expand All @@ -103,7 +112,7 @@ def rewrite_schema(tile_metadata, rewrite_metadata=RewriteMetadata.NO_REWRITE):
orig_schema = tile_metadata["schema.json"]
if RewriteMetadata.AS_IF_CONVERTED_TO_COPC in rewrite_metadata:
orig_pdrf = tile_metadata["format.json"]["pointDataRecordFormat"]
return get_schema_from_pdrf(equivalent_copc_pdrf(orig_pdrf))
return get_schema_from_pdrf_and_vlr(equivalent_copc_pdrf(orig_pdrf), None)
else:
return orig_schema

Expand Down Expand Up @@ -188,16 +197,20 @@ def extract_pc_tile_metadata(pc_tile_path, oid_and_size=None):
compound_crs = metadata["srs"].get("compoundwkt")
horizontal_crs = metadata["srs"].get("wkt")
is_copc = metadata.get("copc") or False
pdrf = metadata["dataformat_id"]
format_json = {
"compression": "laz" if metadata["compressed"] else "las",
"lasVersion": f"{metadata['major_version']}.{metadata['minor_version']}",
"optimization": "copc" if is_copc else None,
"optimizationVersion": get_copc_version(metadata) if is_copc else None,
"pointDataRecordFormat": metadata["dataformat_id"],
"pointDataRecordFormat": pdrf,
"pointDataRecordLength": metadata["point_length"],
}
extra_bytes_vlr = find_extra_bytes_vlr(metadata)
if extra_bytes_vlr:
format_json["extraBytesVlr"] = True

schema_json = pdal_schema_to_kart_schema(output["schema"])
schema_json = get_schema_from_pdrf_and_vlr(pdrf, extra_bytes_vlr)
if oid_and_size:
oid, size = oid_and_size
else:
Expand All @@ -219,14 +232,12 @@ def extract_pc_tile_metadata(pc_tile_path, oid_and_size=None):
"oid": f"sha256:{oid}",
"size": size,
}
if not url:
tile_info.pop("url", None)

result = {
"format.json": format_json,
"schema.json": schema_json,
"crs.wkt": normalise_wkt(compound_crs or horizontal_crs),
"tile": tile_info,
"tile": _remove_nones(tile_info),
}

return result
Expand Down Expand Up @@ -260,6 +271,8 @@ def _calc_crs84_extent(src_extent, src_crs):
"""
Given a 3D extent with a particular CRS, return a CRS84 extent that surrounds that extent.
"""
if not src_crs:
return None
src_srs = osr.SpatialReference()
src_srs.ImportFromWkt(src_crs)
src_srs.SetAxisMappingStrategy(osr.OAMS_TRADITIONAL_GIS_ORDER)
Expand Down Expand Up @@ -307,3 +320,19 @@ def extract_format(tile_format):
if "format" in tile_format:
return tile_format["format"]
return tile_format


def find_extra_bytes_vlr(metadata):
    return find_vlr(metadata, "LASF_Spec", 4)


def find_vlr(metadata, user_id, record_id):
    for key, value in metadata.items():
        if not key.startswith("vlr"):
            continue
        if value["user_id"] == user_id and value["record_id"] == record_id:
            return base64.b64decode(value["data"])


def _remove_nones(input_dict):
    return {key: value for key, value in input_dict.items() if value is not None}
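
A minimal sketch of how the VLR lookup behaves, run against made-up metadata shaped the way the code above expects (keys starting with `vlr`, each holding `user_id`, `record_id` and base64-encoded `data`). `find_vlr` is copied from the diff with an explicit `return None`. `extra_dimension_names` is a hypothetical helper, not part of Kart; it assumes the LAS 1.4 Extra Bytes layout of 192-byte descriptors that begin with 2 reserved bytes, a data type byte, an options byte and a 32-byte name.

```python
import base64
import struct

EXTRA_BYTES_RECORD_SIZE = 192  # one Extra Bytes descriptor, per LAS 1.4


def find_vlr(metadata, user_id, record_id):
    # Same logic as the helper added in this PR.
    for key, value in metadata.items():
        if not key.startswith("vlr"):
            continue
        if value["user_id"] == user_id and value["record_id"] == record_id:
            return base64.b64decode(value["data"])
    return None


def extra_dimension_names(vlr_data):
    # Hypothetical helper: read the name field of each 192-byte descriptor.
    names = []
    for start in range(0, len(vlr_data), EXTRA_BYTES_RECORD_SIZE):
        record = vlr_data[start : start + EXTRA_BYTES_RECORD_SIZE]
        _reserved, _data_type, _options, raw_name = struct.unpack_from(
            "<2sBB32s", record
        )
        names.append(raw_name.split(b"\0", 1)[0].decode("ascii"))
    return names


# Made-up descriptor for illustration: data type 9 (float), named "Amplitude".
descriptor = struct.pack("<2sBB32s", b"\0\0", 9, 0, b"Amplitude").ljust(
    EXTRA_BYTES_RECORD_SIZE, b"\0"
)
sample_metadata = {
    "major_version": 1,
    "vlr_0": {
        "user_id": "LASF_Spec",
        "record_id": 4,
        "data": base64.b64encode(descriptor).decode("ascii"),
    },
}

if __name__ == "__main__":
    vlr = find_vlr(sample_metadata, "LASF_Spec", 4)
    print(extra_dimension_names(vlr))  # ['Amplitude']
```

When no matching VLR exists the lookup returns `None`, which is why `extract_pc_tile_metadata` only sets `extraBytesVlr` when the result is truthy.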
7 changes: 2 additions & 5 deletions kart/point_cloud/pdal_convert.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,11 +23,7 @@ def convert_tile_to_copc(source, dest):
"type": "readers.las",
"filename": str(source),
},
{
"type": "writers.copc",
"filename": str(dest),
"forward": "all",
},
{"type": "writers.copc", "filename": str(dest), "forward": "all"},
]
try:
pdal_execute_pipeline(pipeline)
Expand All @@ -54,6 +50,7 @@ def convert_tile_to_laz(source, dest, target_format):
"type": "writers.las",
"filename": str(dest),
"forward": "all",
"extra_dims": "all",
"compression": True,
"major_version": major_version,
"minor_version": minor_version,
Expand Down
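
For reference, the writer stage after this change might be assembled as in the sketch below. `build_laz_pipeline` is a hypothetical helper and the reader stage is an assumption, since the diff only shows part of `convert_tile_to_laz`. The sketch only builds and prints the pipeline description; it does not call PDAL.

```python
import json


def build_laz_pipeline(source, dest, major_version, minor_version):
    # Hypothetical helper mirroring the stages visible in the diff.
    return [
        {"type": "readers.las", "filename": str(source)},
        {
            "type": "writers.las",
            "filename": str(dest),
            "forward": "all",
            # Added in this PR: also write dimensions beyond the base PDRF.
            "extra_dims": "all",
            "compression": True,
            "major_version": major_version,
            "minor_version": minor_version,
        },
    ]


pipeline = build_laz_pipeline("in.las", "out.laz", 1, 4)

if __name__ == "__main__":
    print(json.dumps(pipeline, indent=2))
```

Without `extra_dims`, a converted tile would presumably lose any extra-bytes dimensions, which would contradict the `extraBytesVlr` flag now carried through `rewrite_format`.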