A thin GraphQL wrapper around spaCy

Python 3.6+

An example of a basic Starlette app using spaCy and Graphene.

The main goal is to be able to use the amazing power of spaCy from other languages and retrieve only the information you need, thanks to the GraphQL query definition.

The GraphQL schema tries to mimic the original spaCy API as closely as possible, with the classes Doc, Span and Token.

Simple batch processing with pagination of results is also implemented.
- Set up the dev environment and install the dependencies:

./scripts/install

- Activate the virtualenv:

. venv/bin/activate

- From the virtualenv, download your favorite spaCy models:

python -m spacy download en

- Run the tests from the virtualenv:

pytest

- Run the server from the virtualenv:

python -m app.main
- Kotlin: see gracyql-kotlin
Navigate to http://localhost:8990 in your browser to access the GraphiQL console and start making queries, or to http://localhost:8990/schema to introspect the GraphQL schema.
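Since the server speaks plain GraphQL over HTTP, any language can consume it. Below is a minimal Python client sketch; it assumes the server started with `python -m app.main` accepts POSTed JSON GraphQL requests at the root path of http://localhost:8990/ (the exact endpoint path, and the use of GraphQL variables, are assumptions, not confirmed here):

```python
# Minimal client sketch for querying the gracyql server from Python.
# Uses only the standard library; the endpoint path is an assumption.
import json
from urllib import request

QUERY = """
query PosTaggerQuery($text: String!) {
  nlp(model: "en") {
    doc(text: $text) {
      tokens { id pos lemma }
    }
  }
}
"""

def build_payload(text: str) -> bytes:
    """Encode a GraphQL request body with the text passed as a variable."""
    return json.dumps({"query": QUERY, "variables": {"text": text}}).encode()

def pos_tag(text: str, url: str = "http://localhost:8990/"):
    """POST the query to the server and return the decoded JSON response."""
    req = request.Request(
        url,
        data=build_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

With the server running, `pos_tag("How are you Bob?")` would return the same shape of response as the GraphiQL console, with only the requested token fields.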
fragment PosTagger on Token {
  id
  start
  end
  pos
  lemma
}

query PosTaggerQuery {
  nlp(model: "en") {
    doc(text: "How are you Bob? What time is it in London?") {
      text
      tokens {
        ...PosTagger
      }
    }
  }
}
fragment PosTagger on Token {
  id
  start
  end
  pos
  lemma
}

query PosTaggerWithSentencesQuery {
  nlp(model: "en") {
    doc(text: "How are you Bob? What time is it in London?") {
      text
      sents {
        start
        end
        text
        tokens {
          ...PosTagger
        }
      }
    }
  }
}
query ParserQuery {
  nlp(model: "en") {
    doc(text: "How are you Bob? What time is it in London?") {
      text
      tokens {
        id
        start
        end
        pos
        lemma
        dep
        children {
          id
          dep
        }
      }
    }
  }
}
query NERQuery {
  nlp(model: "en") {
    doc(text: "How are you Bob? What time is it in London?") {
      text
      ents {
        start
        end
        label
        text
      }
    }
  }
}
query ParserDisabledQuery {
  nlp(model: "en", disable: ["parser", "ner"]) {
    doc(text: "I live in Grenoble, France") {
      text
      tokens {
        id
        pos
        lemma
        dep
      }
      ents {
        start
        end
        label
      }
    }
  }
}
query ModelMetaQuery {
  nlp(model: "en") {
    meta {
      author
      description
      lang
      license
      name
      pipeline
      sources
      spacy_version
      version
    }
  }
}
query MultidocsQuery {
  nlp(model: "en") {
    batch(texts: [
      "Hello world1!",
      "Hello world2!",
      "Hello world3!",
      "Hello world4!",
      "Hello world5!",
      "Hello world6!",
      "Hello world7!",
      "Hello world8!",
      "Hello world9!",
      "Hello world10!"]) {
      docs {
        text
      }
    }
  }
}
- texts : the list of texts to process
- batch_size : the size of the batch passed to spaCy's nlp.pipe to achieve multi-threading speedups
- next : the number of documents to retrieve as the result of the query (next < batch_size, of course)
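To illustrate the semantics of these parameters, here is a stand-alone sketch of how such pagination behaves. This is illustrative only, not the server's actual implementation, and the `Batch` class is hypothetical:

```python
# Illustrative sketch of batch pagination: the batch is created once,
# then successive calls with `next` pull slices until exhaustion.
import uuid

class Batch:
    def __init__(self, texts):
        # In the real service the texts would go through nlp.pipe(...)
        # with the given batch_size; here we just keep the raw texts.
        self.batch_id = str(uuid.uuid4())  # returned to the client
        self._docs = list(texts)
        self._cursor = 0

    def next_docs(self, next):
        """Return the next `next` documents and advance the cursor."""
        chunk = self._docs[self._cursor:self._cursor + next]
        self._cursor += next
        return chunk
```

For `Batch(["t1", "t2", "t3"])`, calling `next_docs(2)` returns the first two texts, a second call returns the last one, and any further call returns an empty list.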
query BatchMultidocsQuery {
  nlp(model: "en") {
    batch(texts: [
      "Hello world1!",
      "Hello world2!",
      "Hello world3!",
      "Hello world4!",
      "Hello world5!",
      "Hello world6!",
      "Hello world7!",
      "Hello world8!",
      "Hello world9!",
      "Hello world10!"],
      batch_size: 10, next: 2) {
      batch_id
      docs {
        text
      }
    }
  }
}
The result contains a batch_id UUID that will be used in subsequent calls:
{
  "data": {
    "nlp": {
      "batch": {
        "batch_id": "5654106e-62a7-4847-80e6-7ba3d0ec7b6a",
        "docs": [
          {
            "text": "Hello world1!"
          },
          {
            "text": "Hello world2!"
          }
        ]
      }
    }
  },
  "errors": null
}
- batch_id : the UUID referencing the previous batch
- next : the number of documents to retrieve as result of the query
query BatchMultidocsQuery {
  nlp(model: "en") {
    batch(batch_id: "5654106e-62a7-4847-80e6-7ba3d0ec7b6a",
          next: 2) {
      batch_id
      docs {
        text
      }
    }
  }
}
The result contains the next 2 documents:
{
  "data": {
    "nlp": {
      "batch": {
        "batch_id": "5654106e-62a7-4847-80e6-7ba3d0ec7b6a",
        "docs": [
          {
            "text": "Hello world3!"
          },
          {
            "text": "Hello world4!"
          }
        ]
      }
    }
  },
  "errors": null
}
And you can issue the same query again and again until the batch is exhausted.
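Client-side, that exhaustion loop can be sketched as follows, where `fetch_batch` is a hypothetical callable that posts the BatchMultidocsQuery above and returns the `docs` list from the response:

```python
def drain(fetch_batch, batch_id, page_size=2):
    """Keep requesting pages of `page_size` docs until an empty page
    signals that the batch is exhausted, then return everything collected."""
    docs = []
    while True:
        page = fetch_batch(batch_id, page_size)
        if not page:
            break
        docs.extend(page)
    return docs
```

The empty-page stopping condition is an assumption about how an exhausted batch presents itself; adapt it to whatever the server actually returns once the batch runs out.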