2.0.0 Beta 1

Added Grid-based Data Extraction and Corpus Querying

This update extends the application's analytical capabilities: structured data can now be extracted from documents automatically as a background task, which improves efficiency and scalability.

We've added several new models on the backend:

Extract: Represents a headless, background annotation task linked to a Corpus and Fieldset.
Fieldset: Defines a reusable set of fields for Extracts, linked to Columns.
Column: Represents a discrete data structure to extract from a document, with properties such as query, match_text, output_type, and more.
Datacell: Represents the data extracted for a given column and document, stored as JSON.
LanguageModel: Represents a language model to be used in the extraction process.
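To make the relationships between these models easier to picture, here is a minimal sketch using plain Python dataclasses. It is not the project's actual backend code: the classes are simplified stand-ins, and everything beyond the names listed above (query, match_text, output_type, JSON data) is an assumption, including `corpus_id`, `document_id`, and attaching the LanguageModel to the Extract.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any, Optional


# Hypothetical, simplified stand-ins for the backend models; not the real schema.
@dataclass
class LanguageModel:
    name: str  # assumed identifier for the model used during extraction


@dataclass
class Column:
    name: str
    query: Optional[str] = None       # question to ask of each document
    match_text: Optional[str] = None  # optional text to match against
    output_type: str = "str"          # expected type of the extracted value


@dataclass
class Fieldset:
    # A reusable set of columns that several extracts can share.
    name: str
    columns: list[Column] = field(default_factory=list)


@dataclass
class Extract:
    # A background extraction task over a corpus using a fieldset.
    name: str
    corpus_id: int                 # assumption: corpus referenced by id
    fieldset: Fieldset
    language_model: LanguageModel  # assumption: model attached per extract


@dataclass
class Datacell:
    # One extracted value for a single (document, column) pair.
    extract: Extract
    column: Column
    document_id: int               # assumption: document referenced by id
    data: dict[str, Any]           # JSON-serializable extracted data


# Usage: one column in a fieldset, one extract, one resulting datacell.
parties = Column(name="parties",
                 query="Who are the parties to this agreement?",
                 output_type="list[str]")
fieldset = Fieldset(name="Contract basics", columns=[parties])
extract = Extract(name="Example extract", corpus_id=1, fieldset=fieldset,
                  language_model=LanguageModel(name="example-llm"))
cell = Datacell(extract=extract, column=parties, document_id=42,
                data={"value": ["Acme Corp", "Globex"]})
print(json.dumps(cell.data))
```

Keeping columns inside a separate Fieldset mirrors the note that fieldsets are reusable: several extracts can point at the same fieldset without redefining their columns.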

Improved Test Suite

LlamaIndex is now tested with vcr.py, so the corpus query and corpus extract tasks have realistic tests and mocks.
Added many GraphQL query and endpoint tests.

New GUI Elements

There is now an Extract tab, along with a number of GUI elements that make it easy to construct an extract grid from documents, corpora, and reusable columns.
Within the Corpus view, there is a Query tab you can use to ask questions of the corpus.
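One way to picture the extract grid is as a table with one row per document and one column per Column, where each cell holds a datacell's JSON data. The sketch below is purely illustrative and is not how the GUI is implemented: `GridCell`, `collect_document_ids`, and `build_grid` are hypothetical helpers, and it assumes that adding a corpus to the grid contributes all of its documents as rows, with duplicates removed.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass(frozen=True)
class GridCell:
    # Hypothetical minimal datacell: its position plus the extracted JSON data.
    document_id: int
    column_name: str
    data: Any


def collect_document_ids(document_ids: Iterable[int],
                         corpora: Iterable[Iterable[int]]) -> list[int]:
    """Merge explicitly added documents with documents from added corpora.

    Assumption: each corpus is given as a list of its document ids.
    Order is preserved and duplicates are dropped (dict keys keep insertion order).
    """
    merged = [*document_ids, *(d for corpus in corpora for d in corpus)]
    return list(dict.fromkeys(merged))


def build_grid(document_ids: list[int], column_names: list[str],
               cells: Iterable[GridCell]) -> list[list[Optional[Any]]]:
    """Arrange cells into rows (documents) x columns; missing cells become None."""
    lookup = {(c.document_id, c.column_name): c.data for c in cells}
    return [[lookup.get((doc_id, col)) for col in column_names]
            for doc_id in document_ids]


# Usage: document 1 added directly, plus a corpus containing documents 1 and 2.
rows = collect_document_ids([1], [[1, 2]])
columns = ["parties", "effective_date"]
cells = [
    GridCell(1, "parties", ["Acme Corp", "Globex"]),
    GridCell(1, "effective_date", "2024-01-01"),
    GridCell(2, "parties", ["Initech"]),
]
grid = build_grid(rows, columns, cells)
for doc_id, row in zip(rows, grid):
    print(doc_id, row)
```

Cells that have not been extracted yet show up as `None`, which is a simple way to render a partially completed background extract.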

What's Changed

Add Data Extraction by @JSv4 in #117

Full Changelog: v1.3.0...v2.0.0b1


v2.0.0 b1 - Add Data Extract and Corpus Querying
