Strange lines in eng.tagged corpus #20
Comments
I'm guessing the analyser didn't have If you want to handle the apertium stream format, you should expect to see this kind of thing all the time. You could use
(speaking of, we should probably get apertium-cleanstream into https://github.com/apertium/apertium/ )
I should open an issue there, shouldn't I?
> (speaking of, we should probably get apertium-cleanstream into https://github.com/apertium/apertium/ )
I should open an issue there, shouldn't I?
That'd be nice :)
I am currently using the `texts/eng.tagged` file for testing the new weighting algorithms. While using the file, I noticed that it has some lines with just a single double quotation character!
(Example: https://github.com/apertium/apertium-eng/blob/master/texts/eng.tagged#L823)
Should these lines be fixed?
I don't want to handle it in my script if it's a bug in the tagged corpus, and I believe fixing these lines is just a simple find-and-replace that any text editor can do easily.
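If the lines do have to be skipped on the reading side in the meantime, a minimal sketch could look like the following. It assumes the stray lines contain nothing but a single `"` (plus optional whitespace). The function name `iter_corpus_lines` and the sample lines are made up for illustration and are not taken from `texts/eng.tagged`.

```python
import io


def iter_corpus_lines(stream):
    """Yield lines from a tagged corpus, skipping lone double-quote lines.

    Assumption: a "strange" line is one whose only non-whitespace
    content is a single double quotation character.
    """
    for line in stream:
        if line.strip() == '"':
            continue  # drop the stray quote line
        yield line.rstrip("\n")


# Illustrative input only; these are not real lines from eng.tagged.
sample = io.StringIO('^The/the<det><def><sp>$\n"\n^end/end<n><sg>$\n')
kept = list(iter_corpus_lines(sample))
```

This only hides the lines from the script; it does not change the corpus, so the question of whether the file itself should be fixed still stands.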