Fulltext search service with JSON API implemented on top of SphinxSearch - for indexing Jekyll websites and blog posts.
Initial deployment: http://search.klokantech.com/
Indexing for projects:
- Kartenportal.CH (blog or pages)
Deployed via Docker. The environment variable DOMAINS defines the list of allowed domains (or URL prefixes) for indexing.
The service always downloads and indexes the file:
http://[domain]/search.tsv
Example: http://www.kartenportal.ch/search.tsv
tsvpipe has the tab character as a hardcoded delimiter and has no quoting rules.
Each value is interpreted as a string inside SphinxSearch, regardless of quotes. Using a tab character inside text values is not possible!
TSV format with fixed columns without header line:
url - only stored, not indexed
title - boosted rank fulltext
content - fulltext
type - filter
lang - filter
date - filter, in ISO 8601 format: YYYY-MM-DDTHH:MM:SS+HH:MM, required
tags - filter on a set + fulltext; comma-separated
custom_data - only stored, not indexed, no filter
product - filter on a set + fulltext; comma-separated, optional (can be omitted)
All values are tab-separated. The website must provide valid TSV (no tabs in the content).
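A minimal sketch of how a site generator could emit one row of search.tsv in the column order listed above. The helper names (clean, tsv_row) and the sample page are hypothetical; the only rule taken from this README is that tabs must never appear inside a value. Collapsing line breaks as well is an assumption, made because a row has to stay on one line.

```python
# Column order as listed above; "product" is optional and comes last.
COLUMNS = ["url", "title", "content", "type", "lang", "date", "tags", "custom_data"]


def clean(value):
    """Collapse every run of whitespace (tabs, newlines, spaces) to one space."""
    return " ".join(str(value).split())


def tsv_row(page, include_product=False):
    """Build one tab-separated row without a header and without quoting."""
    columns = COLUMNS + (["product"] if include_product else [])
    return "\t".join(clean(page.get(name, "")) for name in columns)


# Hypothetical page data, for illustration only.
page = {
    "url": "http://example.com/post/",
    "title": "Hello\tworld",
    "content": "line one\nline two",
    "type": "post",
    "lang": "en",
    "date": "2016-05-19T11:06:41+02:00",
    "tags": "a,b",
    "custom_data": "",
}
row = tsv_row(page)
```

Because tsvpipe has no quoting rules, the values are cleaned rather than escaped.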
Note: The date column is required, because this component filters via date_end, which defaults to the current time. This makes it possible to prepare content dated in the future (for example, an article scheduled for later publication) without it appearing in search results.
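The effect of the default date_end can be sketched roughly as follows. This is an illustration only: the real filtering happens inside SphinxSearch, the function names are hypothetical, and treating the boundary as inclusive is an assumption.

```python
from datetime import datetime, timezone


def to_timestamp(iso_date):
    """Convert an ISO 8601 date with UTC offset to a Unix timestamp."""
    return datetime.fromisoformat(iso_date).timestamp()


def is_visible(iso_date, date_end=None):
    """Return True if the date is not later than date_end (default: now)."""
    if date_end is None:
        date_end = datetime.now(timezone.utc).timestamp()
    return to_timestamp(iso_date) <= date_end


published = is_visible("2016-05-19T11:06:41+02:00")   # dated in the past
scheduled = is_visible("2999-01-01T00:00:00+00:00")   # dated in the future
```

Under these assumptions, a row dated in the future stays hidden until its date has passed.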
Endpoint for update of the fulltext index:
POST /update/{domain}
It downloads http://[domain]/search.tsv and creates the index for this domain.
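A reindex could be triggered along these lines, for example from a deploy script. The base URL is the initial deployment named in this README; sending an empty body and needing no authentication are assumptions, and build_update_request is a hypothetical helper.

```python
import urllib.request

BASE_URL = "http://search.klokantech.com"  # initial deployment named in this README


def build_update_request(domain, base_url=BASE_URL):
    """Build (but do not send) the POST request that triggers reindexing."""
    url = f"{base_url}/update/{domain}"
    # Empty body: whether the endpoint expects any payload is an assumption.
    return urllib.request.Request(url, data=b"", method="POST")


request = build_update_request("www.kartenportal.ch")
# To actually send it: urllib.request.urlopen(request)
```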
Endpoint for search:
GET /search?domain={domain}&q={q}&type=post&lang=en&date=?????&tags=a,b,c
Paging via OpenSearch query parameters (count, startIndex)
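A search URL with filters and paging could be assembled as in the sketch below. The parameter names come from the endpoint above; build_search_url is a hypothetical helper, and the date filter is left out because its format is not specified here.

```python
from urllib.parse import urlencode

BASE_URL = "http://search.klokantech.com"  # initial deployment named in this README


def build_search_url(domain, q, count=None, start_index=None,
                     base_url=BASE_URL, **filters):
    """Build a GET /search URL; filters are e.g. type, lang, tags."""
    params = {"domain": domain, "q": q}
    if count is not None:
        params["count"] = count
    if start_index is not None:
        params["startIndex"] = start_index
    params.update(filters)
    # urlencode percent-encodes values, so "a,b,c" is sent as "a%2Cb%2Cc".
    return f"{base_url}/search?{urlencode(params)}"


url = build_search_url("www.kartenportal.ch", "map", count=20, start_index=20,
                       type="post", lang="en", tags="a,b,c")
```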
JSONP or JSON with CORS
{
"count": 20,
"nextIndex": 20,
"startIndex": 0,
"totalResults": 31,
"results": [
{
"lang": "en",
"tags": "<tags>",
"url": "<url>",
"title": "<title>",
"rank": 31548,
"content": "xxx",
"date": "2016-05-19T11:06:41+02:00",
"date_filter": 1463648801.0,
"type": "<type>",
"custom_data": "xxx",
"id": 21
}
]
}
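A client might read such a response as sketched below. The field names are taken from the sample above; the helper names are hypothetical, and the rule for detecting the last page (nextIndex missing or not below totalResults) is an assumption, since the behaviour at the end of the result set is not documented here.

```python
import json


def summarize(body):
    """Parse a JSON response body and return (title, url) pairs."""
    response = json.loads(body)
    return [(hit["title"], hit["url"]) for hit in response["results"]]


def next_start_index(response):
    """Return the startIndex for the next page, or None if there is none."""
    next_index = response.get("nextIndex")
    if next_index is None or next_index >= response["totalResults"]:
        return None
    return next_index


# Trimmed version of the sample response above.
sample = json.dumps({
    "count": 20, "nextIndex": 20, "startIndex": 0, "totalResults": 31,
    "results": [{"title": "<title>", "url": "<url>", "id": 21}],
})
hits = summarize(sample)
following = next_start_index(json.loads(sample))
```

The value returned by next_start_index can be passed as startIndex in the next request.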
Related links: https://developers.google.com/custom-search/json-api/v1/overview#data_format http://www.opensearch.org/Community/JSON_Formats