Add support for publicly hosted USGS LiDAR #26

Open
rosepearson opened this issue May 20, 2022 · 2 comments
@rosepearson
Collaborator

rosepearson commented May 20, 2022

The USGS is moving to publicly host its LiDAR data on a public AWS server. Details can be found at: https://registry.opendata.aws/usgs-lidar/

The key information is:
location = us-west-2 -> s3.us-west-2.amazonaws.com
bucket = usgs-lidar-public

The AWS key appears to be in the URL to the dataset on OpenTopography. For instance, the key for https://portal.opentopography.org/usgsDataset?dsid=USGS_LPC_AL_25Co_B3_2017 is USGS_LPC_AL_25Co_B3_2017
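
A minimal sketch of pulling the key out of such a URL, assuming the key is always carried in the `dsid` query parameter as in the example above. The helper name `dataset_key_from_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def dataset_key_from_url(url: str) -> str:
    """Return the value of the 'dsid' query parameter of an OpenTopography URL."""
    query = parse_qs(urlparse(url).query)
    values = query.get("dsid")
    if not values:
        raise ValueError(f"no 'dsid' query parameter in {url!r}")
    # Assumption: the dsid value is the same string as the bucket key
    return values[0]


url = "https://portal.opentopography.org/usgsDataset?dsid=USGS_LPC_AL_25Co_B3_2017"
print(dataset_key_from_url(url))  # USGS_LPC_AL_25Co_B3_2017
```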

The only challenge seems to be tracking down the key of the dataset, as this information doesn't seem to be listed in the metadata (see link for an example).
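
One possible way around this is to ask the bucket itself which keys exist. The sketch below assumes every dataset sits in its own top-level folder of usgs-lidar-public, and lists those folders with the `list_objects_v2` paginator and `Delimiter='/'`. The helper names `make_anonymous_client` and `list_dataset_keys` are hypothetical.

```python
def make_anonymous_client():
    """Create an unsigned S3 client for the us-west-2 endpoint."""
    # Imported here so list_dataset_keys can be used with any client object
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    return boto3.client("s3", endpoint_url="https://s3.us-west-2.amazonaws.com",
                        config=Config(signature_version=UNSIGNED))


def list_dataset_keys(client, bucket="usgs-lidar-public"):
    """Return the top-level folder names of the bucket."""
    paginator = client.get_paginator("list_objects_v2")
    keys = []
    # Delimiter='/' groups objects by their first path component, so each
    # page reports the top-level folders under 'CommonPrefixes'
    for page in paginator.paginate(Bucket=bucket, Delimiter="/"):
        for entry in page.get("CommonPrefixes", []):
            keys.append(entry["Prefix"].rstrip("/"))
    return keys
```

Calling `list_dataset_keys(make_anonymous_client())` should then return names that can be compared against the OpenTopography dataset IDs.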

@rosepearson
Collaborator Author

rosepearson commented May 20, 2022

A quick example of some code for connecting to the relevant bucket, either via boto3.client or boto3.resource, and printing out some of the contained objects.

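import urllib.parse

import boto3
import botocore
import botocore.config

NETLOC_DATA = "s3.us-west-2.amazonaws.com"
SCHEME = "https"
aws_endpoint_url = urllib.parse.urlunparse((SCHEME, NETLOC_DATA, "", "", "", ""))

# The bucket is public, so requests are sent unsigned (no AWS credentials needed)
unsigned_config = botocore.config.Config(signature_version=botocore.UNSIGNED)

# Option 1: low-level client
client = boto3.client('s3', endpoint_url=aws_endpoint_url, config=unsigned_config)

# Option 2: higher-level resource
s3 = boto3.resource('s3', endpoint_url=aws_endpoint_url, config=unsigned_config)

my_bucket = s3.Bucket('usgs-lidar-public')

# Print a handful of objects rather than iterating over the whole bucket
for my_bucket_object in my_bucket.objects.limit(10):
    print(my_bucket_object)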

@rosepearson rosepearson self-assigned this May 20, 2022
@rosepearson
Collaborator Author

rosepearson commented May 20, 2022

So I've had a look into the contents of the AWS bucket for dataset USGS_LPC_AL_25Co_B3_2017.

I created a boto3 client for interrogating the dataset as:

import urllib.parse

import boto3
import botocore
import botocore.config

NETLOC_DATA = "s3.us-west-2.amazonaws.com"
SCHEME = "https"
aws_endpoint_url = urllib.parse.urlunparse((SCHEME, NETLOC_DATA, "", "", "", ""))
# Unsigned requests: the bucket is public, so no AWS credentials are needed
client = boto3.client('s3', endpoint_url=aws_endpoint_url,
                      config=botocore.config.Config(signature_version=botocore.UNSIGNED))

I then had a look at the folder structure of the bucket using client.list_objects_v2(Bucket='usgs-lidar-public', Prefix='USGS_LPC_AL_25Co_B3_2017/', Delimiter='/'), which returned the following 'common prefixes':

[{'Prefix': 'USGS_LPC_AL_25Co_B3_2017/ept-backup/'},
  {'Prefix': 'USGS_LPC_AL_25Co_B3_2017/ept-data/'},
  {'Prefix': 'USGS_LPC_AL_25Co_B3_2017/ept-hierarchy/'},
  {'Prefix': 'USGS_LPC_AL_25Co_B3_2017/ept-sources/'},
  {'Prefix': 'USGS_LPC_AL_25Co_B3_2017/info/'}]
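
For reference, the common prefixes sit under the 'CommonPrefixes' entry of the response dictionary. A small sketch of reducing them to bare folder names is given below; the helper name `subfolder_names` is hypothetical, and the response is a hand-written copy of the listing above rather than a live call.

```python
def subfolder_names(response):
    """Return the last path component of each common prefix in a listing."""
    names = []
    for entry in response.get("CommonPrefixes", []):
        # 'USGS_LPC_AL_25Co_B3_2017/ept-data/' -> 'ept-data'
        names.append(entry["Prefix"].rstrip("/").split("/")[-1])
    return names


# Hand-written stand-in for the list_objects_v2 response shown above
response = {"CommonPrefixes": [
    {"Prefix": "USGS_LPC_AL_25Co_B3_2017/ept-backup/"},
    {"Prefix": "USGS_LPC_AL_25Co_B3_2017/ept-data/"},
    {"Prefix": "USGS_LPC_AL_25Co_B3_2017/ept-hierarchy/"},
    {"Prefix": "USGS_LPC_AL_25Co_B3_2017/ept-sources/"},
    {"Prefix": "USGS_LPC_AL_25Co_B3_2017/info/"},
]}
print(subfolder_names(response))
# ['ept-backup', 'ept-data', 'ept-hierarchy', 'ept-sources', 'info']
```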

The .laz files appear to be contained in the ept-data folder. The other folders appear to contain .json files which may or may not have information about the naming convention or spatial distribution of the .laz files.
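
On the naming convention: the folder names look like the Entwine Point Tile (EPT) layout. This is an assumption based on the `ept-` prefixes and has not been checked against this bucket. In EPT, each file in ept-data is named D-X-Y-Z.laz, where D is the octree depth and X, Y, Z are the node indices at that depth. A minimal sketch of parsing such a name follows; the helper name `parse_ept_node` and the example key are hypothetical.

```python
from pathlib import PurePosixPath


def parse_ept_node(key):
    """Split an EPT-style data key into (depth, x, y, z) integers."""
    # '.../ept-data/3-1-2-0.laz' -> '3-1-2-0'
    stem = PurePosixPath(key).stem
    parts = stem.split("-")
    if len(parts) != 4:
        raise ValueError(f"{key!r} does not look like a D-X-Y-Z node name")
    depth, x, y, z = (int(part) for part in parts)
    return depth, x, y, z


# Hypothetical key, used only to illustrate the assumed naming scheme
print(parse_ept_node("USGS_LPC_AL_25Co_B3_2017/ept-data/3-1-2-0.laz"))  # (3, 1, 2, 0)
```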

Looking at the contents of a JSON file

An example of the contents of USGS_LPC_AL_25Co_B3_2017_16S_FB_3393.json from the `info` folder is shown below.
[image: contents of USGS_LPC_AL_25Co_B3_2017_16S_FB_3393.json]

client.download_file('usgs-lidar-public', 'USGS_LPC_AL_25Co_B3_2017/info/USGS_LPC_AL_25Co_B3_2017_16S_FB_3393.json', r"path/to/download/USGS_LPC_AL_25Co_B3_2017/USGS_LPC_AL_25Co_B3_2017_16S_FB_3393.json")
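
As an alternative to saving the file to disk, the JSON could be read straight into memory with `get_object`. This is a minimal sketch, assuming the object is small enough to hold in memory; the helper name `read_json_object` is hypothetical.

```python
import json


def read_json_object(client, bucket, key):
    """Fetch an S3 object and parse its body as JSON."""
    response = client.get_object(Bucket=bucket, Key=key)
    # 'Body' is a streaming object; read() returns the raw bytes
    return json.loads(response["Body"].read())
```

With the client created earlier, this would be called as `read_json_object(client, 'usgs-lidar-public', 'USGS_LPC_AL_25Co_B3_2017/info/USGS_LPC_AL_25Co_B3_2017_16S_FB_3393.json')`.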

Downloading a .laz file

Filtering is only supported by prefix and not by file type, so we first restrict the object list to the ept-data prefix, where the .laz files are.

prefix = 'USGS_LPC_AL_25Co_B3_2017/ept-data/'
file_list = client.list_objects_v2(Bucket='usgs-lidar-public', Prefix=prefix, Delimiter='/')
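
Note that a single `list_objects_v2` call returns at most 1000 keys, so a large dataset needs several calls. A sketch of one way to handle this with the boto3 paginator, filtering on the file extension on the client side, is given below; the helper name `iter_laz_keys` is hypothetical.

```python
def iter_laz_keys(client, bucket, prefix):
    """Yield every key directly under `prefix` that ends in '.laz'."""
    paginator = client.get_paginator("list_objects_v2")
    # The paginator repeats the request with continuation tokens until
    # the full listing has been returned
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        for file_object in page.get("Contents", []):
            # S3 cannot filter by extension, so check the key here
            if file_object["Key"].lower().endswith(".laz"):
                yield file_object["Key"]
```

For example, `list(iter_laz_keys(client, 'usgs-lidar-public', prefix))` would collect all matching keys for the dataset.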

Download each file in the returned file list:

import pathlib

download_path = pathlib.Path("/path/to/download/location/")
# Recreate the bucket's folder structure locally so each key can be used as a relative path
(download_path / prefix).mkdir(parents=True, exist_ok=True)

for file_object in file_list['Contents']:
    client.download_file('usgs-lidar-public', file_object['Key'],
                         str(download_path / file_object['Key']))
