Improve and expand client protocol docs
Include the new spooling protocol and its configuration for CLI and JDBC
driver.
mosabua committed Nov 27, 2024
1 parent b854cb7 commit b2dc49a
Showing 10 changed files with 461 additions and 48 deletions.
1 change: 1 addition & 0 deletions docs/src/main/sphinx/admin.md
@@ -63,6 +63,7 @@ admin/properties

* [Properties reference overview](admin/properties)
* [](admin/properties-general)
* [](admin/properties-client-protocol)
* [](admin/properties-http-server)
* [](admin/properties-resource-management)
* [](admin/properties-query-management)
232 changes: 232 additions & 0 deletions docs/src/main/sphinx/admin/properties-client-protocol.md
@@ -0,0 +1,232 @@
# Client protocol properties

The following sections provide a reference for all properties related to the
[client protocol](/client/client-protocol).

(prop-protocol-spooling)=
## Spooling protocol properties

The following properties are related to the [](protocol-spooling).

### `protocol.spooling.enabled`

- **Type:** [](prop-type-boolean)
- **Default value:** `true`

Enable support for the client [](protocol-spooling). The protocol is used if
client drivers and applications request it; otherwise, the direct protocol is
used automatically.

### `protocol.spooling.shared-secret-key`

- **Type:** [](prop-type-string)

A required 256-bit, base64-encoded secret key used to secure spooled metadata
exchanged with the client.
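For example, a suitable key can be generated with standard tooling. The
following minimal Python sketch produces a 256-bit, base64-encoded value;
running `openssl rand -base64 32` in a shell is an equivalent alternative:

```python
import base64
import secrets

# Generate 32 random bytes (256 bits) and base64-encode them for use as
# the protocol.spooling.shared-secret-key value.
key = base64.b64encode(secrets.token_bytes(32)).decode("ascii")
print(key)
```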

### `protocol.spooling.retrieval-mode`

- **Type:** [](prop-type-string)
- **Default value:** `STORAGE`

Determines how the client retrieves each segment. The following values are
possible:

* `STORAGE` - client accesses the storage directly with the pre-signed URI. Uses
one client HTTP request per data segment.
* `COORDINATOR_STORAGE_REDIRECT` - client first accesses the coordinator, which
redirects the client to the storage with the pre-signed URI. Uses two client
HTTP requests per data segment.
* `COORDINATOR_PROXY` - client accesses the coordinator and gets data segment
through it. Uses one client HTTP request per data segment, but requires a
coordinator HTTP request to the storage.
* `WORKER_PROXY` - client accesses the coordinator, which redirects to an
available worker node. The worker fetches the data from the storage and
provides it to the client. Uses two client HTTP requests per data segment, and
requires a worker request to the storage.
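As an example, the following configuration sketch routes all segment retrieval
through the coordinator, which can be useful when clients cannot reach the
object storage directly:

```properties
protocol.spooling.retrieval-mode=COORDINATOR_PROXY
```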

### `protocol.spooling.encoding.json.enabled`

- **Type:** [](prop-type-boolean)
- **Default value:** `true`

Activate support for using uncompressed JSON encoding for spooled segments.

### `protocol.spooling.encoding.json+zstd.enabled`

- **Type:** [](prop-type-boolean)
- **Default value:** `true`

Activate support for using JSON encoding with Zstandard compression for spooled
segments.

### `protocol.spooling.encoding.json+lz4.enabled`

- **Type:** [](prop-type-boolean)
- **Default value:** `true`

Activate support for using JSON encoding with LZ4 compression for spooled
segments.
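For example, to restrict clients to compressed encodings, the uncompressed JSON
encoding can be turned off. This is an illustrative configuration sketch; the
two compressed encodings are shown with their default values for clarity:

```properties
protocol.spooling.encoding.json.enabled=false
protocol.spooling.encoding.json+zstd.enabled=true
protocol.spooling.encoding.json+lz4.enabled=true
```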

### `protocol.spooling.initial-segment-size`

- **Type:** [](prop-type-data-size)
- **Default value:** 8MB

Initial size of the spooled segments.

### `protocol.spooling.maximum-segment-size`

- **Type:** [](prop-type-data-size)
- **Default value:** 16MB

Maximum size for each spooled segment.

### `protocol.spooling.inlining.enabled`

- **Type:** [](prop-type-boolean)
- **Default value:** `true`

Allow the spooling protocol to inline initial rows to decrease the time to
return the first row.

### `protocol.spooling.inlining.max-rows`

- **Type:** [](prop-type-integer)
- **Default value:** 1000

Maximum number of rows to inline per worker.

### `protocol.spooling.inlining.max-size`

- **Type:** [](prop-type-data-size)
- **Default value:** 128kB

Maximum size of rows to inline per worker.
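The three inlining properties work together; the following sketch combines them
with illustrative, non-default values:

```properties
protocol.spooling.inlining.enabled=true
protocol.spooling.inlining.max-rows=500
protocol.spooling.inlining.max-size=64kB
```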

(prop-spooling-file-system)=
## Spooling file system properties

The following properties are used to configure the object storage used with the
[](protocol-spooling).

### `fs.azure.enabled`

- **Type:** [](prop-type-boolean)
- **Default value:** `false`

Activate [](/object-storage/file-system-azure) for spooling segments.

### `fs.s3.enabled`

- **Type:** [](prop-type-boolean)
- **Default value:** `false`

Activate [](/object-storage/file-system-s3) for spooling segments.

### `fs.gcs.enabled`

- **Type:** [](prop-type-boolean)
- **Default value:** `false`

Activate [](/object-storage/file-system-gcs) for spooling segments.

### `fs.location`

- **Type:** [](prop-type-string)

The object storage location to use for spooling segments. It must be accessible
by the coordinator and all workers. With the `protocol.spooling.retrieval-mode`
values `STORAGE` and `COORDINATOR_STORAGE_REDIRECT`, the location must also be
accessible by all clients. Valid location values vary by object storage type,
and typically follow the pattern `scheme://bucketName/path/`.

Examples:

* `s3://my-spooling-bucket/my-segments/`

:::{caution}
When using the same object storage for spooling from multiple Trino clusters,
you must use separate locations for each cluster. For example:

* `s3://my-spooling-bucket/my-segments/cluster1`
* `s3://my-spooling-bucket/my-segments/cluster2`
:::
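For example, enabling S3 as the spooling file system might look as follows. The
bucket name is hypothetical, and any additional credentials configuration
required by [](/object-storage/file-system-s3) is omitted from this sketch:

```properties
fs.s3.enabled=true
fs.location=s3://my-spooling-bucket/my-segments/cluster1
```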

### `fs.segment.ttl`

- **Type:** [](prop-type-duration)
- **Default value:** `12h`

Maximum available time for the client to retrieve a spooled segment before it
expires and is pruned.

### `fs.segment.direct.ttl`

- **Type:** [](prop-type-duration)
- **Default value:** `1h`

Maximum available time for the client to retrieve a spooled segment using the
pre-signed URI.

### `fs.segment.encryption`

- **Type:** [](prop-type-boolean)
- **Default value:** `true`

Encrypt segments with ephemeral keys using Server-Side Encryption with
Customer-Provided Keys (SSE-C).

### `fs.segment.explicit-ack`

- **Type:** [](prop-type-boolean)
- **Default value:** `true`

Activate pruning of segments on client acknowledgment of a successful read of
each segment.

### `fs.segment.pruning.enabled`

- **Type:** [](prop-type-boolean)
- **Default value:** `true`

Activate periodic pruning of expired segments.

### `fs.segment.pruning.interval`

- **Type:** [](prop-type-duration)
- **Default value:** `5m`

Interval to prune expired segments.

### `fs.segment.pruning.batch-size`

- **Type:** [](prop-type-integer)
- **Default value:** `250`

Number of expired segments to prune as a single batch operation.
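An illustrative sketch that tunes segment expiration and pruning together, with
shorter lifetimes and a larger pruning batch than the defaults:

```properties
fs.segment.ttl=6h
fs.segment.pruning.enabled=true
fs.segment.pruning.interval=10m
fs.segment.pruning.batch-size=500
```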

(prop-protocol-shared)=
## Shared protocol properties

The following properties are related to the [](protocol-spooling) and the
[](protocol-direct), formerly named the V1 protocol.

### `protocol.v1.prepared-statement-compression.length-threshold`

- **Type:** [](prop-type-integer)
- **Default value:** `2048`

Prepared statements that are submitted to Trino for processing, and are longer
than the value of this property, are compressed for transport via the HTTP
header to improve handling, and to avoid failures due to hitting HTTP header
size limits.

### `protocol.v1.prepared-statement-compression.min-gain`

- **Type:** [](prop-type-integer)
- **Default value:** `512`

Prepared statement compression is not applied if the size gain is less than the
configured value. Smaller statements do not benefit from compression, and are
left uncompressed.
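The interaction of the two settings can be sketched as follows. This is an
illustrative model of the decision only, not Trino's actual implementation; the
zlib codec and base64 transport encoding are assumptions chosen for the sketch:

```python
import base64
import zlib

LENGTH_THRESHOLD = 2048  # protocol.v1.prepared-statement-compression.length-threshold
MIN_GAIN = 512           # protocol.v1.prepared-statement-compression.min-gain

def maybe_compress(statement: str) -> tuple[str, bool]:
    """Return the (possibly compressed) statement and whether compression applied."""
    raw = statement.encode("utf-8")
    # Statements at or below the length threshold are always sent uncompressed.
    if len(raw) <= LENGTH_THRESHOLD:
        return statement, False
    compressed = base64.b64encode(zlib.compress(raw)).decode("ascii")
    # Skip compression when the size gain is below the configured minimum.
    if len(raw) - len(compressed) < MIN_GAIN:
        return statement, False
    return compressed, True
```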

19 changes: 0 additions & 19 deletions docs/src/main/sphinx/admin/properties-general.md
@@ -33,25 +33,6 @@ across nodes in the cluster. It can be disabled, when it is known that the
output data set is not skewed, in order to avoid the overhead of hashing and
redistributing all the data across the network.

## `protocol.v1.prepared-statement-compression.length-threshold`

- **Type:** {ref}`prop-type-integer`
- **Default value:** `2048`

Prepared statements that are submitted to Trino for processing, and are longer
than the value of this property, are compressed for transport via the HTTP
header to improve handling, and to avoid failures due to hitting HTTP header
size limits.

## `protocol.v1.prepared-statement-compression.min-gain`

- **Type:** {ref}`prop-type-integer`
- **Default value:** `512`

Prepared statement compression is not applied if the size gain is less than the
configured value. Smaller statements do not benefit from compression, and are
left uncompressed.

(file-compression)=
## File compression and decompression

1 change: 1 addition & 0 deletions docs/src/main/sphinx/admin/properties.md
@@ -15,6 +15,7 @@ properties, refer to the {doc}`connector documentation </connector/>`.
:titlesonly: true
General <properties-general>
Client protocol <properties-client-protocol>
HTTP server <properties-http-server>
Resource management <properties-resource-management>
Query management <properties-query-management>
54 changes: 38 additions & 16 deletions docs/src/main/sphinx/client.md
@@ -1,27 +1,49 @@
# Clients

A [client](trino-concept-client) is used to send queries to Trino and receive
results, or otherwise interact with Trino and the connected data sources.
A [client](trino-concept-client) is used to send SQL queries to Trino, and
therefore any [connected data sources](trino-concept-data-source), and receive
results.

Some clients, such as the [command line interface](/client/cli), can provide a
user interface directly. Clients like the [JDBC driver](/client/jdbc), provide a
mechanism for other applications, including your own custom applications, to
connect to Trino.
## Client drivers

The following clients are available as part of every Trino release:
Client drivers, also called client libraries, provide a mechanism for other
applications to connect to Trino. These applications are called client
applications and include your own custom applications or scripts. The Trino
project maintains the following client drivers:

* [Trino JDBC driver](/client/jdbc)
* [trino-go-client](https://github.com/trinodb/trino-go-client)
* [trino-js-client](https://github.com/trinodb/trino-js-client)
* [trino-python-client](https://github.com/trinodb/trino-python-client)

Other communities and vendors provide [other client
drivers](https://trino.io/ecosystem/client.html).

## Client applications

Client applications provide a user interface and other user-facing features to
run queries with Trino. You can inspect the results, perform analytics with
further queries, and create visualizations. Client applications typically use a
client driver.

The Trino project maintains the [Trino command line interface](/client/cli) as a
client application.

Other communities and vendors provide [numerous other client
applications](https://trino.io/ecosystem/client.html).

## Client protocol

All client drivers and client applications communicate with the Trino
coordinator using the [client protocol](/client/client-protocol).

Configure support for the [spooling protocol](protocol-spooling) on the cluster
to improve throughput for client interactions with higher data transfer demands.

```{toctree}
:maxdepth: 1
client/client-protocol
client/cli
client/jdbc
```

The Trino project maintains the following other client libraries:

* [trino-go-client](https://github.com/trinodb/trino-go-client)
* [trino-js-client](https://github.com/trinodb/trino-js-client)
* [trino-python-client](https://github.com/trinodb/trino-python-client)

In addition, other communities and vendors provide [numerous other client
libraries, drivers, and applications](https://trino.io/ecosystem/client)
14 changes: 14 additions & 0 deletions docs/src/main/sphinx/client/cli.md
@@ -604,6 +604,20 @@ Query 20200707_170726_00030_2iup9 failed: line 1:25: Column 'region' cannot be r
SELECT nationkey, name, region FROM tpch.sf1.nation LIMIT 3
```

(cli-spooling-protocol)=
## Spooling protocol

The Trino CLI automatically uses the spooling protocol to improve throughput
for client interactions with higher data transfer demands, if the
[](protocol-spooling) is configured on the cluster.

Optionally use the `--encoding` option to configure a desired encoding that
differs from the cluster default. The available values are `json+zstd`
(recommended) for JSON with Zstandard compression, `json+lz4` for JSON with LZ4
compression, and `json` for uncompressed JSON.

The CLI process must have network access to the spooling object storage.
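For example, the following invocation requests Zstandard-compressed JSON
encoding; the server URL is a placeholder:

```text
trino --server https://trino.example.com:8443 --encoding json+zstd
```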

(cli-output-format)=
## Output formats
