A thin GraphQL wrapper around spaCy

Python 3.6+

An example of a basic Starlette app using spaCy and Graphene.

The main goal is to be able to use the amazing power of spaCy from other languages and retrieve only the information you need, thanks to the GraphQL query definition.

The GraphQL schema tries to mimic the original spaCy API as closely as possible, with the classes Doc, Span and Token.

Simple batch processing with pagination of results is also implemented.
- Set up the dev environment and install the dependencies:

./scripts/install

- Activate the virtualenv:

. venv/bin/activate

- From the virtualenv, download your favorite spaCy models:

python -m spacy download en

- Run the tests from the virtualenv:

pytest

- Run the server from the virtualenv:

python -m app.main
- Kotlin: see gracyql-kotlin
Navigate to http://localhost:8990 in your browser to access the GraphiQL console and start making queries, or to http://localhost:8990/schema to introspect the GraphQL schema.
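Since the server speaks plain GraphQL over HTTP, any language can consume it. Below is a minimal Python client sketch; it assumes the server started with `python -m app.main` accepts POSTed JSON GraphQL requests at the root path of http://localhost:8990/ (the exact endpoint path, and the use of GraphQL variables, are assumptions, not confirmed here):

```python
# Minimal client sketch for querying the gracyql server from Python.
# Uses only the standard library; the endpoint path is an assumption.
import json
from urllib import request

QUERY = """
query PosTaggerQuery($text: String!) {
  nlp(model: "en") {
    doc(text: $text) {
      tokens { id pos lemma }
    }
  }
}
"""

def build_payload(text: str) -> bytes:
    """Encode a GraphQL request body with the text passed as a variable."""
    return json.dumps({"query": QUERY, "variables": {"text": text}}).encode()

def pos_tag(text: str, url: str = "http://localhost:8990/"):
    """POST the query to the server and return the decoded JSON response."""
    req = request.Request(
        url,
        data=build_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

With the server running, `pos_tag("How are you Bob?")` would return the same shape of response as the GraphiQL console, with only the requested token fields.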
fragment PosTagger on Token {
  id
  start
  end
  pos
  lemma
}

query PosTaggerQuery {
  nlp(model: "en") {
    doc(text: "How are you Bob? What time is it in London?") {
      text
      tokens {
        ...PosTagger
      }
    }
  }
}
fragment PosTagger on Token {
  id
  start
  end
  pos
  lemma
}

query PosTaggerWithSentencesQuery {
  nlp(model: "en") {
    doc(text: "How are you Bob? What time is it in London?") {
      text
      sents {
        start
        end
        text
        tokens {
          ...PosTagger
        }
      }
    }
  }
}
query ParserQuery {
  nlp(model: "en") {
    doc(text: "How are you Bob? What time is it in London?") {
      text
      tokens {
        id
        start
        end
        pos
        lemma
        dep
        children {
          id
          dep
        }
      }
    }
  }
}
query NERQuery {
  nlp(model: "en") {
    doc(text: "How are you Bob? What time is it in London?") {
      text
      ents {
        start
        end
        label
        text
      }
    }
  }
}
query ParserDisabledQuery {
  nlp(model: "en", disable: ["parser", "ner"]) {
    doc(text: "I live in Grenoble, France") {
      text
      tokens {
        id
        pos
        lemma
        dep
      }
      ents {
        start
        end
        label
      }
    }
  }
}
query ModelMetaQuery {
  nlp(model: "en") {
    meta {
      author
      description
      lang
      license
      name
      pipeline
      sources
      spacy_version
      version
    }
  }
}
query MultidocsQuery {
  nlp(model: "en") {
    batch(texts: [
      "Hello world1!",
      "Hello world2!",
      "Hello world3!",
      "Hello world4!",
      "Hello world5!",
      "Hello world6!",
      "Hello world7!",
      "Hello world8!",
      "Hello world9!",
      "Hello world10!"]) {
      docs {
        text
      }
    }
  }
}
- texts : the list of texts to process
- batch_size : the size of the batch passed to spaCy's nlp.pipe to achieve multi-threading speedups
- next : the number of documents to retrieve as the result of the query (next < batch_size, of course)
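To illustrate the semantics of these parameters, here is a stand-alone sketch of how such pagination behaves. This is illustrative only, not the server's actual implementation, and the `Batch` class is hypothetical:

```python
# Illustrative sketch of batch pagination: the batch is created once,
# then successive calls with `next` pull slices until exhaustion.
import uuid

class Batch:
    def __init__(self, texts):
        # In the real service the texts would go through nlp.pipe(...)
        # with the given batch_size; here we just keep the raw texts.
        self.batch_id = str(uuid.uuid4())  # returned to the client
        self._docs = list(texts)
        self._cursor = 0

    def next_docs(self, next):
        """Return the next `next` documents and advance the cursor."""
        chunk = self._docs[self._cursor:self._cursor + next]
        self._cursor += next
        return chunk
```

For `Batch(["t1", "t2", "t3"])`, calling `next_docs(2)` returns the first two texts, a second call returns the last one, and any further call returns an empty list.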
query BatchMultidocsQuery {
  nlp(model: "en") {
    batch(texts: [
      "Hello world1!",
      "Hello world2!",
      "Hello world3!",
      "Hello world4!",
      "Hello world5!",
      "Hello world6!",
      "Hello world7!",
      "Hello world8!",
      "Hello world9!",
      "Hello world10!"],
      batch_size: 10, next: 2) {
      batch_id
      docs {
        text
      }
    }
  }
}
The result contains a batch_id UUID that will be used in subsequent calls:
{
  "data": {
    "nlp": {
      "batch": {
        "batch_id": "5654106e-62a7-4847-80e6-7ba3d0ec7b6a",
        "docs": [
          {
            "text": "Hello world1!"
          },
          {
            "text": "Hello world2!"
          }
        ]
      }
    }
  },
  "errors": null
}
- batch_id : the UUID referencing the previous batch
- next : the number of documents to retrieve as result of the query
query BatchMultidocsQuery {
  nlp(model: "en") {
    batch(batch_id: "5654106e-62a7-4847-80e6-7ba3d0ec7b6a",
          next: 2) {
      batch_id
      docs {
        text
      }
    }
  }
}
The result contains the next 2 documents:
{
  "data": {
    "nlp": {
      "batch": {
        "batch_id": "5654106e-62a7-4847-80e6-7ba3d0ec7b6a",
        "docs": [
          {
            "text": "Hello world3!"
          },
          {
            "text": "Hello world4!"
          }
        ]
      }
    }
  },
  "errors": null
}
And you can issue the same query again and again until the batch is exhausted.
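Client-side, that exhaustion loop can be sketched as follows, where `fetch_batch` is a hypothetical callable that posts the BatchMultidocsQuery above and returns the `docs` list from the response:

```python
def drain(fetch_batch, batch_id, page_size=2):
    """Keep requesting pages of `page_size` docs until an empty page
    signals that the batch is exhausted, then return everything collected."""
    docs = []
    while True:
        page = fetch_batch(batch_id, page_size)
        if not page:
            break
        docs.extend(page)
    return docs
```

The empty-page stopping condition is an assumption about how an exhausted batch presents itself; adapt it to whatever the server actually returns once the batch runs out.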