
Klogproc

Klogproc is a service for processing and archiving logs generated by applications run by the Institute of the Czech National Corpus (CNC).

In general, Klogproc continuously reads an application-specific log record format from a file, parses individual lines and converts them into a target format which is then stored in an ElasticSearch database.

In the CNC, Klogproc replaced LogStash as a less resource-hungry alternative. All the processing (reading, writing, handling multiple files) is performed concurrently, which makes it quite fast.

Overview

Supported applications

| Name | Config code | Versions | Scripting | Note |
|------|-------------|----------|-----------|------|
| Akalex | akalex | | | a Shiny app with a custom log (*) |
| APIGuard | apiguard | | | CNC's internal API proxy and watchdog |
| Calc | calc | | | a Shiny app with a custom log (*) |
| CNC-VLO | vlo | | | a custom CNC node for the Clarin VLO (JSONL log) |
| Gramatikat | gramatikat | | | a Shiny app with a custom log (*) |
| KonText | kontext | 0.13, 0.14, 0.15, 0.16, 0.17, 0.18 | | |
| KorpusDB | korpus-db | | | |
| Kwords | kwords | 1, 2 | | |
| Lists | lists | | | a Shiny app with a custom log (*) |
| Mapka | mapka | 1, 2, 3 | ✅ (v3) | using Nginx/Apache access log |
| Morfio | morfio | | | |
| MQuery-SRU | mquery-sru | | | a Clarin FCS endpoint (JSONL log) |
| QuitaUP | quita-up | | | a Shiny app with a custom log (*) |
| SkE | ske | | | using Nginx/Apache access log |
| SyD | syd | | | a custom app log |
| Treq | treq | current, v1-api | | a custom app log |
| WaG | wag | 0.6, 0.7 | | web access log, currently without user credentials |

(*) All the Shiny apps use the same log format.

The program can work in two modes: batch and tail.

Batch - ad-hoc processing of a directory or a file

For non-regular imports, e.g. when migrating older data or when debugging log processing routines, batch mode allows importing multiple files from a single directory. The contents of the directory can even change over time as newer log records are added; Klogproc imports only the new items, as it keeps a worklog with the newest record processed so far.
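A batch run is a one-shot invocation of the binary over a configuration file containing a logFiles section (an example is shown in the Time-zone notes below). Assuming a batch subcommand analogous to the tail subcommand used later in this README, the invocation might look like this:

klogproc batch /opt/klogproc/etc/klogproc.json

Thanks to the worklog, repeated runs over the same directory skip already imported records.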

Tail - listening for changes in multiple files

This is the mode that replaced the CNC's LogStash solution, and it is the typical mode of use. One or more log file listeners can be configured to read newly added lines. The log files are checked at regular intervals (i.e. a change is not detected immediately). Klogproc remembers the current inode and seek position for each watched file, so it should be able to continue after outages etc. (as long as the log files are not overwritten in the meantime due to log rotation).
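Tail mode is started by passing the tail subcommand along with a path to the configuration file; this is the same invocation used in the systemd unit in the Installation section:

klogproc tail /opt/klogproc/etc/klogproc.json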

Installation

Install the Go language if it is not already available on your system.
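E.g. on Linux, you can check that the toolchain is available:

go version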

Clone the klogproc project:

git clone https://github.com/czcorpus/klogproc.git

Build the project:

make

Copy the binary somewhere:

sudo cp klogproc /opt/klogproc/bin

Create a config file (e.g. in /opt/klogproc/etc/klogproc.json):

{
  "logging": {
    "path": "/opt/klogproc/var/log/klogproc.log"
  },
  "logTail": {
    "intervalSecs": 15,
    "worklogDir": "/opt/klogproc/var/worklog-tail",
    "files": [
      {"path": "/var/log/ucnk/syd.log", "appType": "syd"},
      {"path": "/var/log/treq/treq.log", "appType": "treq"},
      {"path": "/var/log/ucnk/morfio.log", "appType": "morfio"},
      {"path": "/var/log/ucnk/kwords.log", "appType": "kwords", "tzShift": -120}
      {"path": "/var/log/wag/current.log", "appType": "wag", "version": "0.7"}
    ]
  },
  "elasticSearch": {
    "majorVersion": 6,
    "server": "http://elastic:9200",
    "index": "app",
    "pushChunkSize": 500,
    "scrollTtl": "3m",
    "reqTimeoutSecs": 10
  },
  "geoIPDbPath": "/opt/klogproc/var/GeoLite2-City.mmdb",
  "anonymousUsers": [0, 1, 2]
}

Notes:

  • Do not forget to create the directories for logging and the worklog, and to download and save the GeoLite2-City database (see the commands below).
  • The tzShift applied to the kwords app is just an example; it should be used only if the stored datetime values have an incorrect time-zone (e.g. they look like UTC time but the actual values represent local time) - see the Time-zone notes section for more info.
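For the example configuration above, preparing the directories might look like this (the GeoLite2-City.mmdb file itself has to be downloaded from MaxMind separately):

mkdir -p /opt/klogproc/var/log /opt/klogproc/var/worklog-tail
cp GeoLite2-City.mmdb /opt/klogproc/var/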

Configure systemd (/etc/systemd/system/klogproc.service):

[Unit]
Description=A custom agent for collecting UCNK application logs
After=network.target

[Service]
Type=simple
ExecStart=/opt/klogproc/bin/klogproc tail /opt/klogproc/etc/klogproc.json
User=klogproc
Group=klogproc

[Install]
WantedBy=multi-user.target

Reload systemd config:

systemctl daemon-reload

Start the service:

systemctl start klogproc
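Optionally, make the service start at boot:

systemctl enable klogproc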

Time-zone notes

Klogproc treats each log type individually when parsing, but it converts all timestamps to UTC. In case an application stores incorrect values (e.g. missing time-zone info even though the time values are actually non-UTC), it is possible to use the tzShift setting, which defines the number of minutes Klogproc should add to (or subtract from) the logged values (e.g. a tzShift of 120 shifts each logged time by two hours).

For the tail action, the config is as follows:

{
  "logTail": {
    "intervalSecs": 5,
    "worklogDir": "/path/to/tail-worklog",
    "numErrorsAlarm": 0,
    "errCountTimeRangeSecs": 15,
    "files": [
        {
          "path": "/path/to/application.log",
          "appType": "korpus-db",
          "tzShift": 120
        }
    ]
  }
}

For the batch mode, the config is like this:

{
  "logFiles": {
    "appType": "korpus-db",
    "worklogDir": "/path/to/batch-worklog",
    "srcPath": "/path/to/log/files/dir",
    "tzShift": 120,
    "partiallyMatchingFiles": false
  }
}

Note: setting partiallyMatchingFiles to true allows processing of files which are partially older than the requested minimum datetime (but still, only the matching records will be accepted).

ElasticSearch compatibility notes

Because ElasticSearch underwent some backward-incompatible changes between versions 5 and 6, the configuration contains the majorVersion key, which specifies how Klogproc stores the data.

ElasticSearch 5

This version supports multiple data types ("mappings") per index, which was also the default approach to storing CNC applications' logs: a single index with multiple document types (one per application). In this case, the configuration directive elasticSearch.index directly specifies the name of the index Klogproc works with. Individual document types can be distinguished either via the internal ES _type property or via the regular type property created by Klogproc.

ElasticSearch 6

In ES6, multiple data mappings per index have been removed. In this case, Klogproc uses its elasticSearch.index key as a prefix for the index names created for individual applications. E.g. the index log_archive with configured treq and morfio apps expects you to have two indices: log_archive_treq and log_archive_morfio. Please note that Klogproc does not create the indices for you. The type property is still present in documents for backward compatibility.
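This means the indices must be created beforehand, e.g. via the ElasticSearch REST API. A minimal sketch for the log_archive example above (any application-specific mappings are omitted):

curl -XPUT http://elastic:9200/log_archive_treq
curl -XPUT http://elastic:9200/log_archive_morfio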

Customizing log processing with Lua scripts

See the docs/scripting.md page.
