Add geolocation querying to DynamoDB tables

russellsteadman/dynamo-locx

Geo Querying for Amazon DynamoDB

This project is a Node.js library that adds geohash-based location querying to Amazon DynamoDB. It is based on AWS's awslabs/dynamodb-geo and Rob Hogan's robhogan/dynamodb-geo.js, and it works directly with the DynamoDB client from the AWS SDK for JavaScript (@aws-sdk/client-dynamodb).

Features

  • Box Queries: Return all of the items that fall within a pair of geo points that define a rectangle as projected onto a sphere.
  • Radius Queries: Return all of the items that are within a given radius of a geo point.
  • Basic CRUD Operations: Create, retrieve, update, and delete geospatial data items.
  • Customizable: Access to raw request and result objects from the AWS SDK for JavaScript.
  • Fully Typed: This port is written in TypeScript, and declaration files are bundled into releases.

Installation

npm install dynamo-locx

Getting started

Start by setting up the DynamoDB client, exactly as you would for any other DynamoDB application. To test locally, you can run docker run -p 8000:8000 deangiberson/aws-dynamodb-local to start a local DynamoDB instance in Docker, exposed on port 8000.

Create an instance of GeoTable for each geospatial table. This allows you to configure per-table options, but at minimum you must provide a DynamoDBClient instance and a table name. See the configuration reference for more details.

import GeoTable from "dynamo-locx";
// Or, if you are using CommonJS:
// const GeoTable = require("dynamo-locx").default;
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";

const ddb = new DynamoDBClient({
  endpoint: "http://localhost:8000", // For local development only
  region: "us-east-1",
});

const locx = new GeoTable({
  client: ddb,
  tableName: "MyGeoTable",
  hashKeyLength: 3, // See below for explanation
  // See configuration reference for more options...
});

Choosing a hash key length

The hashKeyLength is the number of most significant digits (in base 10) of the 64-bit geo hash to use as the hash key. Larger numbers will allow small geographical areas to be spread across DynamoDB partitions, but at the cost of performance as more queries need to be executed for box/radius searches that span hash keys. See these tests for an idea of how query performance scales with hashKeyLength for different search radii.
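To make the definition concrete, here is a minimal sketch of how a hash key could be derived from a geohash by keeping its most significant base-10 digits. The helper name hashKeyFromGeohash and the sample geohash value are hypothetical and chosen only for illustration; this is not the library's own implementation.

```javascript
// Hypothetical helper: keep the `hashKeyLength` most significant base-10
// digits of a 64-bit geohash. BigInt avoids precision loss above 2^53.
function hashKeyFromGeohash(geohash, hashKeyLength) {
  const value = BigInt(geohash);
  // Count digits on the absolute value so a minus sign is not counted.
  const digits = (value < 0n ? -value : value).toString(10).length;
  if (digits <= hashKeyLength) return value;
  // BigInt division truncates toward zero, so the sign is preserved.
  return value / 10n ** BigInt(digits - hashKeyLength);
}

const sample = 5221366118452580119n; // made-up geohash for illustration
console.log(hashKeyFromGeohash(sample, 3)); // 522n
console.log(hashKeyFromGeohash(sample, 6)); // 522136n
```

In this sketch, every point whose geohash starts with 522 shares one hash key when hashKeyLength is 3. At 6, the same points are spread over up to 1,000 distinct hash keys, which is why wide searches then need more queries.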

If your data is sparse, a large hashKeyLength will consume more RCUs, since more empty queries will be executed and each has a minimum cost. However, if your data is dense and hashKeyLength is too small, more RCUs will be needed to read each hash key, and a higher proportion of the items read will be discarded by server-side filtering.

From the AWS Query documentation:

DynamoDB calculates the number of read capacity units consumed based on item size, not on the amount of data that is returned to an application. ... The number will also be the same whether or not you use a FilterExpression
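As a rough illustration of this trade-off, the sketch below estimates the read cost of a single Query from the number of bytes it reads. It assumes the standard DynamoDB accounting of one RCU per 4 KB for strongly consistent reads and half that for eventually consistent reads, and it assumes that a query which reads nothing is still charged for one 4 KB unit. The helper estimateQueryRcu is hypothetical and is not part of this library or the AWS SDK.

```javascript
// Assumption: reads are billed in 4 KB units, rounded up per query.
const RCU_BLOCK_BYTES = 4096;

// Hypothetical estimator. `bytesRead` is the total size of the items the
// query evaluates, before any FilterExpression is applied.
function estimateQueryRcu(bytesRead, consistentRead = false) {
  // Assumption: even an empty query is charged for one block.
  const blocks = Math.max(1, Math.ceil(bytesRead / RCU_BLOCK_BYTES));
  return consistentRead ? blocks : blocks / 2;
}

// Sparse data, large hashKeyLength: 8 queries that each read nothing.
console.log(8 * estimateQueryRcu(0)); // 4

// Dense data, small hashKeyLength: one query reads 40 KB, however few
// items survive filtering.
console.log(estimateQueryRcu(40 * 1024)); // 5
```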

Optimally, you should pick the largest hashKeyLength your usage scenario allows. The wider your typical radius/box queries, the smaller it will need to be. Changing your hashKeyLength would require you to recreate your table.

Creating a GeoTable

GeoTable provides a getCreateTableRequest method that builds a DynamoDB CreateTable request from your configuration. This request can be edited as desired before being sent to DynamoDB. Alternatively, you can create the table by other means, as long as it has the correct schema and indexing. See the table setup for details.

import {
  CreateTableCommand,
  waitUntilTableExists,
} from "@aws-sdk/client-dynamodb";

const createTableInput = locx.getCreateTableRequest({
  BillingMode: "PROVISIONED",
  ProvisionedThroughput: {
    ReadCapacityUnits: 5,
    WriteCapacityUnits: 5,
  },
  // Configure any CreateTableCommandInput options here
});

// Create the table
ddb
  .send(new CreateTableCommand(createTableInput))
  // Wait for it to become ready, using the SDK's table-exists waiter
  // (maxWaitTime is in seconds)
  .then(() =>
    waitUntilTableExists(
      { client: ddb, maxWaitTime: 20 },
      { TableName: locx.tableName }
    )
  )
  .then(() => {
    console.log("Table created and ready!");
  });

Adding a GeoPoint

locx
  .putPoint({
    RangeKeyValue: { S: "1234" }, // Use this to ensure uniqueness of the hash/range pairs
    GeoPoint: {
      // An object specifying latitude and longitude as plain numbers
      // These are used to build the geohash, the hash key, and the GeoJSON data
      latitude: 51.51,
      longitude: -0.13,
    },
    PutItemCommandInput: {
      // Passed through to the underlying PutItem request, TableName is prefilled
      Item: {
        // The primary key, geohash, and GeoJSON data are prefilled
        country: { S: "UK" }, // Specify attribute values using the AttributeValue type
        capital: { S: "London" },
      },
      // ... Anything else to pass through to PutItem request, e.g. ConditionExpression
    },
  })
  .then(function () {
    console.log("Done!");
  });

See also DynamoDB PutItem request

Updating a GeoPoint

The hash key, range key, geohash and geoJson cannot be updated. To change these, recreate the record.

You must specify a RangeKeyValue, a GeoPoint, and an UpdateItemCommandInput matching the DynamoDB UpdateItem request (TableName and Key are filled in for you).

locx
  .updatePoint({
    RangeKeyValue: { S: "1234" },
    GeoPoint: {
      // An object specifying latitude and longitude as plain numbers.
      latitude: 51.51,
      longitude: -0.13,
    },
    UpdateItemCommandInput: {
      // TableName and Key are filled in for you
      UpdateExpression: "SET country = :newName",
      ExpressionAttributeValues: {
        ":newName": { S: "United Kingdom" },
      },
    },
  })
  .then(function () {
    console.log("Done!");
  });

Deleting a GeoPoint

You must specify a RangeKeyValue and a GeoPoint. Optionally, you can pass a DeleteItemCommandInput matching the DynamoDB DeleteItem request (TableName and Key are filled in for you).

locx
  .deletePoint({
    RangeKeyValue: { S: "1234" },
    GeoPoint: {
      // An object specifying latitude and longitude as plain numbers.
      latitude: 51.51,
      longitude: -0.13,
    },
    DeleteItemCommandInput: {
      // Optional, any additional parameters to pass through.
      // TableName and Key are filled in for you
      // Example: Only delete if the point does not have a country name set
      ConditionExpression: "attribute_not_exists(country)",
    },
  })
  .then(function () {
    console.log("Done!");
  });

Rectangular queries

Query by rectangle by specifying a MinPoint and MaxPoint.

// Querying a rectangle
locx
  .queryRectangle({
    MinPoint: {
      latitude: 52.22573,
      longitude: 0.149593,
    },
    MaxPoint: {
      latitude: 52.889499,
      longitude: 0.848383,
    },
  })
  // Print the results, an array of DynamoDB.AttributeMaps
  .then(console.log);

Radius queries

Query by radius by specifying a CenterPoint and RadiusInMeter.

// Querying 100km from Cambridge, UK
locx
  .queryRadius({
    RadiusInMeter: 100000,
    CenterPoint: {
      latitude: 52.22573,
      longitude: 0.149593,
    },
  })
  // Print the results, an array of DynamoDB.AttributeMaps
  .then(console.log);
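This README does not say whether results come back ordered by distance. If you need that ordering, one option is to compute great-circle distances on the client. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km, which is an approximation and may differ from the geometry the library uses internally. It works on plain { latitude, longitude } objects that you would first extract from each returned item. The helpers haversineMeters and sortByDistance are hypothetical, not library functions.

```javascript
// Assumption: spherical Earth with a mean radius of 6,371 km.
const EARTH_RADIUS_METERS = 6371000;
const toRadians = (degrees) => (degrees * Math.PI) / 180;

// Great-circle distance in meters between two { latitude, longitude } points.
function haversineMeters(a, b) {
  const dLat = toRadians(b.latitude - a.latitude);
  const dLon = toRadians(b.longitude - a.longitude);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.latitude)) *
      Math.cos(toRadians(b.latitude)) *
      Math.sin(dLon / 2) ** 2;
  // Clamp to 1 to guard against floating-point overshoot.
  return 2 * EARTH_RADIUS_METERS * Math.asin(Math.min(1, Math.sqrt(h)));
}

// Returns [{ point, meters }] ordered from nearest to farthest.
function sortByDistance(center, points) {
  return points
    .map((point) => ({ point, meters: haversineMeters(center, point) }))
    .sort((x, y) => x.meters - y.meters);
}

const cambridge = { latitude: 52.22573, longitude: 0.149593 };
const london = { latitude: 51.51, longitude: -0.13 };
console.log(Math.round(haversineMeters(cambridge, london) / 1000), "km");
```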

Batch operations

TODO: Docs (see the example for an example of a batch write)
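Until this section is written, one general constraint is worth keeping in mind: DynamoDB's BatchWriteItem accepts at most 25 put or delete requests per call. The sketch below splits a list of points into groups of that size. It deliberately does not call any library method, since the batch API is not documented here, and the library may already handle this internally. The helper name chunk is hypothetical.

```javascript
// DynamoDB's BatchWriteItem limit on put/delete requests per call.
const MAX_BATCH_WRITE_ITEMS = 25;

// Hypothetical helper: split `items` into arrays of at most `size` elements.
function chunk(items, size = MAX_BATCH_WRITE_ITEMS) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const points = Array.from({ length: 60 }, (_, i) => ({ id: String(i) }));
console.log(chunk(points).map((batch) => batch.length)); // [ 25, 25, 10 ]
```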

Configuration reference

client: DynamoDBClient

(Required) The DynamoDBClient to use.

tableName: string

(Required) The name of the DynamoDB table to use.

consistentRead: boolean = false

Whether queries use the ConsistentRead option (for strongly consistent reads) or not (for eventually consistent reads, at half the cost).

This can also be overridden for individual queries as a query config option.

longitudeFirst: boolean = true

This library automatically adds GeoJSON-style position data to your stored items. The GeoJSON standard uses [lon, lat] ordering, but awslabs/dynamodb-geo uses [lat, lon].

This fork allows you to choose between awslabs/dynamodb-geo compatibility and GeoJSON standard compliance.

  • Use false ([lat, lon]) for compatibility with awslabs/dynamodb-geo
  • Use true ([lon, lat]) for GeoJSON standard compliance. (default)

Note that this value should match the state of your existing data - if you change it you must update your database manually, or you'll end up with ambiguously mixed data.
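The difference between the two orderings can be sketched as follows. The helper toGeoJsonPoint is hypothetical and only illustrates the shape of the point object; how the library actually serializes and stores this value is not specified here.

```javascript
// Hypothetical helper showing how `longitudeFirst` and `geoJsonPointType`
// affect the resulting GeoJSON-style point.
function toGeoJsonPoint(
  { latitude, longitude },
  { longitudeFirst = true, geoJsonPointType = "Point" } = {}
) {
  return {
    type: geoJsonPointType,
    coordinates: longitudeFirst
      ? [longitude, latitude] // GeoJSON standard order
      : [latitude, longitude], // awslabs/dynamodb-geo order
  };
}

const london = { latitude: 51.51, longitude: -0.13 };

console.log(JSON.stringify(toGeoJsonPoint(london)));
// {"type":"Point","coordinates":[-0.13,51.51]}

console.log(
  JSON.stringify(
    toGeoJsonPoint(london, { longitudeFirst: false, geoJsonPointType: "POINT" })
  )
);
// {"type":"POINT","coordinates":[51.51,-0.13]}
```

Because both orderings are plain two-element arrays, nothing in the stored value records which one was used, which is why mixing them in one table is hard to untangle later.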

geoJsonPointType: "Point" | "POINT" = "Point"

The value of the type attribute in recorded GeoJSON points. Should normally be "Point", which is standards compliant.

Use "POINT" for compatibility with awslabs/dynamodb-geo.

This setting is only relevant for writes. This library doesn't inspect or set this value when reading/querying.

geohashAttributeName: string = "geohash"

The name of the attribute storing the full 64-bit geohash. Its value is auto-generated based on item coordinates.

hashKeyAttributeName: string = "hashKey"

The name of the attribute storing the first hashKeyLength digits (default 2) of the geo hash, used as the hash (aka partition) part of a hash/range primary key pair. Its value is auto-generated based on item coordinates.

hashKeyLength: number = 2

See above.

rangeKeyAttributeName: string = "rangeKey"

The name of the attribute storing the range key, used as the range (aka sort) part of a hash/range key primary key pair. Its value must be specified by you (hash-range pairs must be unique).

geoJsonAttributeName: string = "geoJson"

The name of the attribute which will contain the longitude/latitude pair in a GeoJSON-style point (see also longitudeFirst).

geohashIndexName: string = "geohash-index"

The name of the index to be created against the geohash. Only used for creating new tables.

Table Setup

The library requires a table with a hash/range primary key pair. The hash key is a number, and the range key is a string. The hash key is specified by the hashKeyAttributeName configuration option, and the range key is specified by the rangeKeyAttributeName configuration option.

The library also requires a local secondary index with the same hash key and a different range key. This index range key is a number, and it is specified by the geohashAttributeName configuration option. The name of the index is specified by the geohashIndexName configuration option. This index must project at least the hash key, range key, geohash, and GeoJSON attributes.

Summary of table setup

  • Primary hash key: hashKeyAttributeName, default hashKey (number)
  • Primary range key: rangeKeyAttributeName, default rangeKey (string)
  • Local secondary index hash key: same as primary hash key
  • Local secondary index range key: geohashAttributeName, default geohash (number)
  • Local secondary index name: geohashIndexName, default geohash-index
  • Local secondary index projection: ALL (or at least hashKeyAttributeName, rangeKeyAttributeName, geohashAttributeName, and geoJsonAttributeName)
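If you create the table yourself, the summary above corresponds roughly to the following CreateTable input. This is a sketch using the standard CreateTable fields and the default attribute names listed above. The helper buildCreateTableInput is hypothetical, the choice of on-demand billing is an assumption made to keep the example short, and the output of getCreateTableRequest may differ in detail.

```javascript
// Hypothetical helper: builds a CreateTable input matching the schema
// summarized above. Defaults mirror the library's documented defaults.
function buildCreateTableInput({
  tableName,
  hashKeyAttributeName = "hashKey",
  rangeKeyAttributeName = "rangeKey",
  geohashAttributeName = "geohash",
  geohashIndexName = "geohash-index",
}) {
  return {
    TableName: tableName,
    BillingMode: "PAY_PER_REQUEST", // assumption: on-demand billing
    AttributeDefinitions: [
      { AttributeName: hashKeyAttributeName, AttributeType: "N" },
      { AttributeName: rangeKeyAttributeName, AttributeType: "S" },
      { AttributeName: geohashAttributeName, AttributeType: "N" },
    ],
    KeySchema: [
      { AttributeName: hashKeyAttributeName, KeyType: "HASH" },
      { AttributeName: rangeKeyAttributeName, KeyType: "RANGE" },
    ],
    LocalSecondaryIndexes: [
      {
        IndexName: geohashIndexName,
        KeySchema: [
          { AttributeName: hashKeyAttributeName, KeyType: "HASH" },
          { AttributeName: geohashAttributeName, KeyType: "RANGE" },
        ],
        Projection: { ProjectionType: "ALL" },
      },
    ],
  };
}

const input = buildCreateTableInput({ tableName: "MyGeoTable" });
console.log(JSON.stringify(input, null, 2));
```

The resulting object has the shape expected by CreateTableCommand from @aws-sdk/client-dynamodb.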

Example

See the example on GitHub

Limitations

No composite key support

Currently, the library does not support composite keys. For example, you may want to tag locations as restaurant, bar, or coffee shop and then search only within one category, but this is not currently possible. Instead, you need to create a separate table for each tag and store the items separately.

Queries retrieve all paginated data

Although low-level DynamoDB Query requests return paginated results, this library automatically pages through the entire result set. When querying a large area with many points, a lot of Read Capacity Units may be consumed.

More Read Capacity Units

The library retrieves candidate Geo points from the cells that intersect the requested bounds. The library then post-processes the candidate data, filtering out the specific points that are outside the requested bounds. Therefore, the consumed Read Capacity Units will be higher than the final results dataset. Typically 8 queries are executed per radius or box search.
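The post-processing step for a box search can be pictured as a simple bounds check over the candidates. The sketch below is a simplified stand-in, not the library's implementation: it compares raw latitude and longitude values and does not handle rectangles that cross the antimeridian. The helper withinRectangle and the candidate data are hypothetical.

```javascript
// Simplified bounds check on plain { latitude, longitude } objects.
// Assumption: the rectangle does not cross the antimeridian.
function withinRectangle(point, minPoint, maxPoint) {
  return (
    point.latitude >= minPoint.latitude &&
    point.latitude <= maxPoint.latitude &&
    point.longitude >= minPoint.longitude &&
    point.longitude <= maxPoint.longitude
  );
}

const minPoint = { latitude: 52.22573, longitude: 0.149593 };
const maxPoint = { latitude: 52.889499, longitude: 0.848383 };

// Made-up candidates, as if read from the cells that intersect the box.
const candidates = [
  { name: "inside", latitude: 52.5, longitude: 0.5 },
  { name: "north of box", latitude: 53.1, longitude: 0.5 },
  { name: "west of box", latitude: 52.5, longitude: 0.1 },
];

const kept = candidates.filter((c) => withinRectangle(c, minPoint, maxPoint));
console.log(kept.map((c) => c.name)); // [ 'inside' ]
```

All three candidates were read and paid for, but only one is returned, which is the gap between consumed capacity and result size described above.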

High memory consumption

Because all paginated Query results are loaded into memory and processed, it may consume substantial amounts of memory for large datasets.

Dataset density limitation

The geohash used in this library has roughly centimeter precision. The library is therefore not suitable if the points in your dataset are packed more densely than that.

License

Where not otherwise noted, this project is copyright its contributors and licensed under the Apache 2.0 License. See LICENSE for full details.

Legal Disclaimer

This project is not affiliated with or endorsed by Amazon Technologies, Inc. or any of its affiliates. Amazon and DynamoDB are trademarks of Amazon Technologies, Inc. and used nominatively only.