PureScript library and code generator for Google Protocol Buffers version 3.

This library operates on `ArrayBuffer`, so it will run both in Node.js and in browser environments.
We aim to support binary-encoded protobuf for `syntax = "proto3";` descriptor files. Many `syntax = "proto2";` descriptor files will also work, as long as they don't use "proto2" features, especially *groups*, which we do not support. We also do not support "proto2" *extensions*.

We do not support services.

We do not support JSON Mapping.
In this version, we pass all 684 of the Google conformance tests of binary-wire-format proto3 for Protocol Buffers v28.2. See `conformance/README.md` in this repository for details. We also have our own unit tests; see `test/README.md` in this repository.
The `nix develop` environment provides:

- The PureScript toolchain: `purs`, `spago`, and `node`.
- The `protoc` compiler.
- The `protoc-gen-purescript` executable plugin for `protoc`, on the `PATH` so that `protoc` can find it.
```
$ nix develop
PureScript Protobuf development environment
libprotoc 28.2
node v20.15.1
purs 0.15.15
spago 0.93.40
```
To build the `protoc` compiler plugin, run:

```
cd plugin
spago build
```

To compile PureScript `.purs` files from `.proto` files, run, for example:

```
protoc --purescript_out=. google/protobuf/timestamp.proto
```
We can test out code generation immediately by generating `.purs` files for any of Google’s built-in “well-known types” in the `google.protobuf` package namespace. Try the command

```
protoc --purescript_out=. google/protobuf/any.proto
```

or

```
protoc --purescript_out=. google/protobuf/timestamp.proto
```

To see all of the `.proto` definitions included with the Nix PureScript Protobuf installation, including the “well-known types”:

```
ls $(nix path-info .#protobuf)/src/google/protobuf/*.proto
```
If you don't want to use Nix:

- Install the PureScript toolchain and `protoc`.
- Build the PureScript plugin for `protoc`.
- Run `protoc` with the path to the PureScript plugin executable, for example:

```
protoc --plugin=bin/protoc-gen-purescript --purescript_out=. google/protobuf/timestamp.proto
```
Suppose we have a `Rectangle` message declared in a `shapes.proto` descriptor file:

```protobuf
syntax = "proto3";
package interproc;
message Rectangle {
  double width = 1;
  double height = 2;
}
```
We run

```
protoc --purescript_out=. shapes.proto
```

which will generate a file `shapes.Interproc.purs`. The code generator uses the `package` statement in the `.proto` file and the base `.proto` file name to form the PureScript module name for that file. The generated `shapes.Interproc.purs` file will export these four names from module `Interproc.Shapes`.
- A message data type.

  ```purescript
  newtype Rectangle = Rectangle { width :: Maybe Number, height :: Maybe Number }
  ```

  The message data type will also include an `__unknown_fields` array field for holding received fields which were not in the `.proto` descriptor file. We can ignore `__unknown_fields` if we want to.

- A message maker which constructs a message from a `Record` with some message fields.

  ```purescript
  mkRectangle :: forall r. Record r -> Rectangle
  ```

  All message fields are optional, and can be omitted when making a message. There are some extra type constraints, not shown here, which will cause a compiler error if we try to add a field which is not in the message data type.

  If we want the compiler to check that we’ve explicitly supplied all the fields, then we can use the ordinary message data type constructor `Rectangle`.

- A message serializer which works with arraybuffer-builder.

  ```purescript
  putRectangle :: forall m. MonadEffect m => Rectangle -> PutM m Unit
  ```

- A message deserializer which works with parsing-dataview.

  ```purescript
  parseRectangle :: forall m. MonadEffect m => ByteLength -> ParserT DataView m Rectangle
  ```

  The message parser needs an argument which tells it the length of the message which it’s about to parse, because “the Protocol Buffer wire format is not self-delimiting.”
In our program, our imports will look something like this. The only module from this package which we will import into our program will be the `Protobuf.Library` module. We'll import the message modules from the generated `.purs` files. We'll also import modules for reading and writing `ArrayBuffer`s.

```purescript
import Protobuf.Library (Bytes(..))
import Interproc.Shapes (Rectangle(..), mkRectangle, putRectangle, parseRectangle)
import Parsing (runParserT, ParseError, liftMaybe, fail)
import Data.ArrayBuffer.Builder (execPutM)
import Data.ArrayBuffer.DataView (whole)
import Data.ArrayBuffer.ArrayBuffer (byteLength)
import Data.Either (Either)
import Data.Maybe (Maybe(..))
import Data.Tuple (Tuple)
import Data.Newtype (unwrap)
```
This is how we serialize a `Rectangle` to an `ArrayBuffer`. We must be in a `MonadEffect`.

```purescript
do
  arraybuffer <- execPutM $ putRectangle $ mkRectangle
    { width: Just 3.0
    , height: Just 4.0
    }
```
Next we’ll deserialize `Rectangle` from the `ArrayBuffer` that we just made.

```purescript
  result :: Either ParseError { width :: Number, height :: Number }
    <- runParserT (whole arraybuffer) do
        rectangle :: Rectangle <- parseRectangle (byteLength arraybuffer)
```
At this point we’ve consumed all of the parser input and constructed our `Rectangle` message, but we’re not finished parsing. We want to “validate” the `Rectangle` message to make sure it has all of the fields that we require, because in proto3, all fields are optional. Fortunately we are already in the `ParserT` monad, so we can do better than to “validate”: *Parse, don't validate.* We will construct a record `{ width :: Number, height :: Number }` with the width and height of the `Rectangle`. If the width or height are missing from the `Rectangle` message, then we will fail in the `ParserT` monad.

For this validation step, pattern matching on the `Rectangle` message type works well, so we could validate this way:
```purescript
        case rectangle of
          Rectangle { width: Just width, height: Just height } ->
            pure { width, height }
          _ -> fail "Missing required width or height"
```
Or we might want to use `liftMaybe` for more fine-grained validation:

```purescript
        width <- liftMaybe "Missing required width" (unwrap rectangle).width
        height <- liftMaybe "Missing required height" (unwrap rectangle).height
        pure { width, height }
```
And now the `result` is either a parsing error or a fully validated rectangle.
The generated code modules will import modules from this package. The generated code depends on packages which are all in the PureScript Registry.

If the runtime environment is Node.js, then it must be at least v11, because that is the version in which `TextDecoder` and `TextEncoder` were added to the Node.js globals.
All of the generated message types have instances of `Eq`, `Show`, `Generic`, and `Newtype`.
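As a brief sketch of what those instances give us, continuing the `Rectangle` example from above (these definitions are illustrative, not part of the generated code):

```purescript
-- Sketch: hypothetical uses of the derived instances for Rectangle.
r1 = mkRectangle { width: Just 3.0 }
r2 = mkRectangle { width: Just 3.0 }

sameRect :: Boolean
sameRect = r1 == r2              -- Eq lets us compare messages

printed :: String
printed = show r1                -- Show lets us log messages

widthField :: Maybe Number
widthField = (unwrap r1).width   -- Newtype lets us unwrap the inner record
```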
This repository contains three executable Node.js programs which use code generated by this library. Refer to these for further examples of how to use the generated code.

- The `protoc` compiler plugin. The code generator imports generated code; this program writes itself.
- The unit test suite.
- The Google conformance test program.
The Protobuf Decoder Explainer shows an example of how to use this library to parse binary Protobuf when we don’t have access to the `.proto` descriptor schema file and can’t generate message-reading code.
This is how field presence works in our implementation.

A message field will always be `Just` when the field is present on the wire. A message field will always be `Nothing` when the field is not present on the wire, even if it’s a “no presence” field. If we want to interpret a missing “no presence” field as a default value, then we have the `Protobuf.Library.toDefault` function for that.

A “no presence” field will not be serialized on the wire when it is `Nothing`, or when it is `Just` the default value. An explicit presence (`optional`) field will not be serialized on the wire when it is `Nothing`. It will be serialized when it is `Just` the default value.
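For example, a minimal sketch of reading a missing `width` as the proto3 default `0.0`. The exact signature of `toDefault` is not shown in this README; this sketch assumes it behaves roughly like `Maybe a -> a` for field types that have a default value.

```purescript
-- Sketch, assuming toDefault :: Maybe a -> a (with a default-value
-- constraint on a). A missing (Nothing) width is read as 0.0.
width :: Number
width = toDefault (unwrap rectangle).width
```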
When the parser encounters an invalid encoding in the Protobuf input stream, then it will fail to parse. When `runParserT` fails, it will return a `ParseError String (Position { index :: Int, line :: Int, column :: Int })`. The byte offset at which the parse failure occurred is given by the `index`.

The path to the Protobuf definition which failed to parse will be included in the `ParseError` `String` and delimited by `/`, something like `"Message1 / string_field_1 / Invalid UTF8 encoding."`.
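Continuing the `Rectangle` example, one way we might handle `result` is a sketch like this (it assumes the `Position` record shape described above):

```purescript
-- Sketch: report the byte offset and message on parse failure.
case result of
  Left (ParseError message (Position { index })) ->
    log $ "Parse failed at byte " <> show index <> ": " <> message
  Right { width, height } ->
    log $ "Got rectangle " <> show width <> " x " <> show height
```

This sketch assumes `import Parsing (ParseError(..), Position(..))` and `import Effect.Console (log)` in addition to the imports shown earlier.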
The Protobuf `import` statement allows Protobuf messages to have fields consisting of Protobuf messages imported from another file, and qualified by the package name in that file. In order to generate the correct PureScript module name qualifier on the types of imported message fields, the code generator must be able to look up the `package` statement in the imported file. For that reason, we can only use top-level (not nested) `message` and `enum` types from a Protobuf `import`.
The generated PureScript code will usually have module imports which cause the `purs` compiler to emit `ImplicitQualifiedImport` warnings. Sorry. If this causes trouble, then the imports can be fixed automatically in a precompiling pass with the command-line tool suggest. Or you can censor the warnings.
The `flake.nix` provides a package `protoc-gen-purescript` so that we can run the `.proto` → `.purs` generation step as part of a Nix derivation. Include `protoc-gen-purescript` and `protobuf` as `nativeBuildInputs`. Then `protoc --purescript_out=path_to_output file.proto` will be runnable in our derivation phases.
The `flake.nix` provides the Google Protocol Buffers conformance tests as an `app`. To run the conformance tests right now, without installing or cloning anything:

```
nix run github:xc-jp/purescript-protobuf#conformance
```
Pull requests welcome.
- The justifill package may be useful for message construction.
- The morello package may be useful for message validation.
- Third-Party Add-ons for Protocol Buffers: Google’s list of Protocol Buffers language implementations.
- A vision for data interchange in Elm: a comparison of JSON, ProtoBuf, and GraphQL by Evan Czaplicki.