-
Notifications
You must be signed in to change notification settings - Fork 1
/
Changes
335 lines (277 loc) · 10.4 KB
/
Changes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
0.59 2024-11-14
- UsePath::Iterator::Rule instead of Mojo::File.
0.58 2024-09-11
- Remove Directory::Iterator and replace it with Mojo::File.
- Add performance hint.
0.57 2024-07-26
- Support award notes in i5.
- Add support for idno (with @rend) in i5.
- Add support for ISBN in i5.
- Translator is now indexed as Text in i5, when
K2K_TRANSLATOR_TEXT is set as an environment
variable.
0.56 2024-06-05
- Add support für corpusexplorer.
0.55 2024-06-04
- Add support for xenodata to i5.
0.54 2024-05-02
- Fix 'cache' parameter. (reported by kupietz)
- Fix cache deletion for certain scenarios.
- Improve information on the number of jobs
running in parallel.
- Add support for KoKoKom <u> attributes.
0.53 2024-03-20
- Added Spacy support. (kupietz)
- Support 'pos' as an alternative to 'ctag'
in Treetagger. (kupietz)
- Change default certainty value in TreeTagger
to 1.
0.52 2024-01-23
- Introduced 'quiet' flag.
0.51 2023-12-23
- Support ICC meta.
- Fix date handling for years of length < 2.
- Improve emoji detection (rebecca).
- Upgrade minimum perl version required.
0.50 2023-02-13
- Fix 'temporary-extract' configuration
information.
0.49 2023-02-12
- Support for UDPipe POS, lemma and dependency
annotations (kupietz).
- Remove last bit of Sys::Info dependency.
(fixes #9)
0.48 2022-11-15
- Improve support for text siglen including
underscore in corpus parts.
- Split morphological features in NKJP.
0.47 2022-08-08
- Support for preferred language transformation.
- Support for NKJP taxonomies.
- Support for NKJP 'orig' values.
0.46 2022-07-21
- Support NKJP Meta, Morpho and NamedEntities.
0.45 2022-03-04
- Due to problems installing Archive::Tar::Builder
in certain environments, this is now optional,
with a pure perl fallback archiver.
- Support externalLink and internalLink universally in
i5 meta data.
0.44 2022-02-17
- Improve Gingko Metadata support.
- Fix data-URIs by always refering to UTF-8.
- Warn on wrong token order.
- Improve Gingko Metadata support.
- Updated all dependencies.
0.43 2022-01-17
- Fix temporary extract handling when defined
in a config file.
- Improve handling of invalid certainty values
in TreeTagger.
- Add log slimming function.
0.42 2021-10-11
- Replaced Log4perl with Log::Any.
- Ignore level < 0 structures in DeReKo, but support
them for base annotations.
- Define resources in Makefile.
- Add GitHub action for CI.
- Remove MANIFEST file from repo.
- Introduce Gingko support.
- Fix data URIs to always encode percentage-wise.
0.41 2020-08-10
- Added support for RWK annotations.
- Improved DGD support.
- Fixed bug in RWK support that broke on
some KorAP-XML files.
- Separate "real data" test suite from artificial
tests to prepare for CPAN release.
- Optimizations and cleanup based on profiling.
- Remove MultiTerm->add() in favor of
MultiTerm->add_by_term().
- Optimization by reducing calls to _offset().
- Introduced add_span() method to MultiTermToken.
- Removed deprecated 'primary' flag.
- Removed deprecated 'pretty' flag.
- Fix RWK paragraph handling.
- Updated 'Clone' dependency in Makefile.
- Make Sys::Info optional.
- Fixes a bug in XIP::Dependency and added
dependency checks for all annotation libraries.
0.40 2020-03-03
- Fixed XIP parser.
- Added example corpus of the
Redewiedergabe-Korpus.
- Fixed span offset bug.
- Fixed milestones behind the last
token bug.
- Fixed gap behind last token bug.
- Fixed <base/s:t> length.
0.39 2020-02-19
- Added Talismane support.
- Added "distributor" field to I5 metadata.
- Added DGD link field to I5 metadata.
- Improve logging.
- Added support for DGD pseudo-sentences
based on anchor milestones.
- Added brief explanation of the format.
- Fixed parsing of editionStmt.
- Added documentation for supported I5 metadata
fields.
- Added integrated benchmark mechanism.
0.38 2019-05-22
- Stop file processing when base tokenization
is wrong.
- Added DGD support.
0.37 2019-03-06
- Support for 'koral:field' array.
- Support for Koral versioning.
- Added tests for english sources.
- Added support for external links for
Wikipedia resources.
- Ignore temporary extraction
on directory archiving.
- Remove extract_text and extract_doc in
favor of extract_sigle for archives.
0.36 2019-01-22
- Support for non-word tokens (fixes #5).
0.35 2018-09-24
- Lift minimum version of Perl to 5.16 as for
"fc"-feature.
0.34 2018-07-19
- Preliminary support for HNC.
0.33 2018-02-01
- Added LWC support.
- Fixed TreeTagger certainties.
0.32 2017-10-24
- Fixed tar building process in script.
- Support file extensions in base tokenization parameter.
0.31 2017-06-30
- Fixed exit codes in script.
- Use CORE::fc for case folding.
0.30 2017-06-19
- Fixed permission handling in test suite.
- Added preliminary CMC support.
0.29 2017-04-23
- support --to-tar flag.
0.28 2017-04-12
- Improved overwriting behaviour for unzip.
- Introduced --sequential-extraction flag.
0.27 2017-04-10
- Support configuration files.
- Support temporary extraction.
- Support serial conversion.
- Support input-base.
0.26 2017-04-06
- Support wildcards on input.
0.25 2017-03-14
- Updated to Mojolicious 7.20
- Fixed meta treatment in case analytic and monogr
are available
- Added DRuKoLa support to script
- Liberated document and text sigle handling to be
compliant with CoRoLa.
- Added support for pagebreak annotations.
- Renamed "pages" to "srcPages".
- Fixed handling of prefixes for text sigles.
- Support for MarMoT.
- Fix case insensitivity.
- Added preliminary support for diacritic insensitivity.
0.24 2016-12-21
- Added --base-sentences and --base-paragraphs options
0.23 2016-11-03
- Added wildcard support for document extraction
- Fixed archive iteration to not duplicate the first archive
- Added parallel extraction for document sigles
- Improved return value for existing files
- Don't warn on recursion in CoreNLP/Constituency
0.22 2016-10-26
- Added support for document extraction
- Fixed archive naming
0.21 2016-10-24
- Improved Windows support
0.20 2016-10-15
- Fixed treatment of temporary folders in script
0.19 2016-08-17
- Added test for direct I5 support.
- Fixed support for Mojolicious 7.
- Added script test.
- Fixed setting multiple annotations in
script.
- Fixed output of version and help messages.
- Added script test for extraction.
- Fixed extraction with multiple archives and prefix
negation support.
- Added script test for archives.
0.18 2016-07-08
- Added REI test.
- Added multiple archive support to korapxml2krill.
- Added support for prefix negation in korapxml2krill.
- Added support for Malt#Dependency.
- Improved test suite for caching and REI.
- Added support for MDParser annotation.
- Added batch processing class for documents.
0.17 2016-03-22
- Rewrite siglen to use slashes as separators.
- Zip listing optimized. Does no longer work with primary data
in text.xml files.
0.16 2016-03-18
- Added caching mechanism for
metadata.
0.15 2016-03-17
- Modularized metadata handling.
- Simplified metadata handling.
- Added --meta option to script.
- Removed deprecated --human option from script.
0.14 2016-03-15
- Renamed ::Index to ::Annotate and ::Field to ::Index.
- Renamed 'allow' to 'anno' as parameters of the script.
- Added readme.
0.13 2016-03-10
- Removed korapxml2krill_dir.
- Renamed dependency nodes.
- Made dependency relations more effective (trimmed down TUIs)
! This is currently very slow !
0.12 2016-02-28
- Added extract method to korapxml2krill.
- Fixed Mate/Dependency.
- Fixed skip flag in korapxml2krill.
- Ignore spans outside the token range
(i.e. character offsets end before tokens have started).
0.11 2016-02-23
- Merged korapxml2krill and korapxml2krill_dir.
0.10 2016-02-15
- Added EXPERIMENTAL support for parallel jobs.
0.09 2016-02-15
- Fixed temporary directory handling in scripts.
- Improved skipping for archive handling in scripts.
0.08 2016-02-14
- Added support for archive streaming.
- Improved scripts.
0.07 2016-02-13
- Improved support for Schreibgebrauch meta data
(IDS flavour).
0.06 2016-02-11
- Improved support for Schreibgebrauch meta data
(Duden flavour).
0.05 2016-02-04
- Changed KorAP::Document to KorAP::XML::Krill.
- Renamed "Schreibgebrauch" to "Sgbr".
- Preparation for GitHub release.
0.04 2016-01-28
- Added PTI to all payloads.
- Added support for empty elements.
- Added support for element attributes in struct.
- Added meta data support for Schreibgebrauch.
- Fixed test suite for meta data.
0.03 2014-11-03
- Added new metadata scheme.
- Fixed a minor bug in the constituency tree building.
- Sorted terms in tokens a priori.
0.02 2014-07-21
- Sentence annotations for all providing foundries
- Starting subtokenization
0.01 2014-04-15
- [bugfix] for first token annotations
- Sentences are now available from all foundries that have it
- <>:p is now <>:base/para
- Added <>:base/text