v1.5.0 Release
To download and unpack prebuilt binaries:
$ # Linux
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.5.0/tsv-utils-v1.5.0_linux-x86_64_ldc2.tar.gz | tar xz
$ # MacOS
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.5.0/tsv-utils-v1.5.0_osx-x86_64_ldc2.tar.gz | tar xz
Installation instructions are in the ReleasePackageReadme.txt file in the release package.
To be notified of new releases:
GitHub supports notification of new releases. Click the "Watch" button on the repository page and select "Releases Only".
Release 1.5.0 Changes:
-
Prebuilt binaries have been updated to use the latest LDC compiler (1.20.0).
-
tsv-filter: Field list support (PR #259). Field lists provide a compact way to specify multiple fields for a command. Most tsv-utils tools already support field lists; now tsv-filter does as well. Examples:
$ # Select lines where fields 1-10 are not empty.
$ tsv-filter --not-empty 1-10 data.tsv
$ # Select lines where fields 1-5 and 17 are less than 100.
$ tsv-filter --lt 1-5,17:100 data.tsv
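A field list combines comma-separated field numbers with dash-separated ranges, so `1-5,17` names fields 1, 2, 3, 4, 5 and 17. The bash sketch below illustrates that expansion. It is not tsv-filter's parser, which is written in D. The function name `expand_field_list` is hypothetical, and the sketch handles only plain numbers and ascending ranges.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of tsv-utils): expand a field list such as
# "1-5,17" into space-separated field numbers. Only plain numbers and
# ascending ranges are handled; anything else is rejected.
expand_field_list() {
    local list="$1" entry start end i
    local -a parts fields=()
    IFS=',' read -r -a parts <<< "$list"   # split the list on commas
    for entry in "${parts[@]}"; do
        if [[ $entry =~ ^([0-9]+)-([0-9]+)$ ]]; then
            # Range entry: force base 10 so values like "08" are not read as octal.
            start=$((10#${BASH_REMATCH[1]}))
            end=$((10#${BASH_REMATCH[2]}))
            if (( start > end )); then
                echo "unsupported range: $entry" >&2
                return 1
            fi
            for (( i = start; i <= end; i++ )); do
                fields+=("$i")
            done
        elif [[ $entry =~ ^[0-9]+$ ]]; then
            fields+=("$((10#$entry))")     # single field number
        else
            echo "invalid field list entry: $entry" >&2
            return 1
        fi
    done
    echo "${fields[*]}"                     # joined with spaces (default IFS)
}

expand_field_list "1-5,17"                  # prints: 1 2 3 4 5 17
```

Entries are emitted in the order they appear in the list. This is a choice made for the sketch, not a statement about how tsv-filter orders or deduplicates fields.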
-
tsv-filter: New field length tests based on either characters or bytes (PR #258). The new operators allow filtering on field length. Field length can be measured in either characters or bytes. (In UTF-8, a character can occupy multiple bytes.) Examples:
$ # Keep only lines where field 3 is less than 50 characters.
$ tsv-filter --char-len-lt 3:50 data.tsv
$ # Find lines where field 5 is more than 20 bytes.
$ tsv-filter --byte-len-gt 5:20 data.tsv
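The difference between character length and byte length only matters for non-ASCII text. The bash sketch below measures a UTF-8 string both ways with standard tools. It is an illustration and does not show how tsv-filter computes lengths. `wc -c` counts bytes. Every UTF-8 character has exactly one byte that is not a continuation byte (0x80-0xBF), so deleting the continuation bytes before counting gives the number of characters (code points). The sketch assumes valid UTF-8 input, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating byte length vs. character length in UTF-8.

# Byte length: count raw bytes (tr strips the padding some wc versions add).
byte_len() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# Character length: drop UTF-8 continuation bytes (octal 200-277 = 0x80-0xBF)
# and count what remains, one byte per character. Assumes valid UTF-8.
# LC_ALL=C makes tr treat the input strictly as bytes.
char_len() {
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' '
}

s=$'caf\xc3\xa9'    # "café", with é encoded as the two bytes C3 A9
echo "bytes=$(byte_len "$s") chars=$(char_len "$s")"   # prints: bytes=5 chars=4
```

Suppose a field holds exactly `café`. By the same reasoning, `--char-len-eq 1:4` and `--byte-len-eq 1:5` would both be expected to match it as field 1.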
Character length tests have names of the form: --char-len-[eq|ne|lt|le|gt|ge]. Byte length tests have names of the form: --byte-len-[eq|ne|lt|le|gt|ge].
-
tsv-filter: Improved error messages when invalid regular expressions are used. The error message printed by tsv-filter now includes the error text provided by the D regular expression engine. This is helpful when trying to debug complex regular expressions. Examples:
$ # Old error message (tsv-filter 1.4.4)
$ tsv-filter --regex 4:'abc(d|e' data.tsv
[tsv-filter] Error processing command line arguments: Invalid values in option: '--regex 4:abc(d|e'. Expected: '--regex <field>:<val>' where <field> is a number and <val> is a regular expression.
$ # New error message (tsv-filter 1.5.0)
[tsv-filter] Error processing command line arguments: Invalid regular expression: '--regex 4:abc(d|e'. no matching ')'
Pattern with error: `abc(d|e` <--HERE-- ``
Expected: '--regex <field>:<val>' or '--regex <field-list>:<val>' where <val> is a regular expression.
The formatting of the message can be improved and is likely to be updated in the future.
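When building a complex pattern, it can help to check it on its own first. The bash sketch below shows one hypothetical way to do that with `grep -E`. grep exits with status 2 when the pattern is invalid, and with status 1 when the pattern is valid but matches nothing. grep uses POSIX extended regular expressions, not the D regex engine that tsv-filter uses, so this is only an approximate pre-check: a pattern accepted by one engine may be rejected by the other. The function name `check_regex` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check for a regular expression using grep -E (POSIX ERE).
# grep exits 0 on a match, 1 on no match, and 2 on an error such as an
# invalid pattern. With empty input, a valid pattern yields status 1.
check_regex() {
    local status=0
    printf '' | grep -E -e "$1" >/dev/null 2>&1 || status=$?
    if (( status == 2 )); then
        echo "invalid regex: $1" >&2
        return 1
    fi
    return 0
}

check_regex 'abc(d|e)' && echo "ok"
check_regex 'abc(d|e'  || echo "rejected"
```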
-
tsv-uniq: Performance improvements (PRs #234, #235). Better memory management and other changes improved tsv-uniq performance by 5-35%, depending on the operation.
-
tsv-sample: Performance improvements reading large data blocks from standard input (PR #238). Sampling and shuffling operations that require reading all data into memory were unnecessarily slow when large amounts of data were read from standard input. The performance issues were noticed with data sizes larger than 10 GB. This is now fixed.
-
Sample bash scripts included in release package (PR #254). Sample versions of the tsv-sort and tsv-sort-fast scripts described on the Tips and Tricks page are now included in the repository and in the prebuilt binary packages.
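The general idea behind a TSV sort wrapper is to call the standard `sort` utility with tab as the field delimiter, so that key options such as `-k2,2n` refer to TSV fields. One common way to make `sort` faster is to set `LC_ALL=C`, which compares bytes instead of applying locale collation rules. The bash functions below are a minimal sketch under those assumptions. They are not the contents of the scripts shipped with tsv-utils, and the names `tsv_sort` and `tsv_sort_fast` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers sketching the idea; not the scripts shipped with tsv-utils.

# Sort TSV data using tab as the field separator; other arguments pass through to sort.
tsv_sort() {
    sort -t $'\t' "$@"
}

# Same, but with LC_ALL=C so comparisons use byte order rather than the
# locale's collation rules, which is typically faster.
tsv_sort_fast() {
    LC_ALL=C sort -t $'\t' "$@"
}

# Sort by field 2, numerically.
printf 'a\t3\nb\t1\nc\t2\n' | tsv_sort -k2,2n
```

Because all remaining arguments are passed straight through, options such as `-r`, `-u`, or multiple `-k` keys work the same way they do with `sort`.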