Wrong identification for windows-1252 #42
It seems uchardet has improved detection, so it should be imported into this project.
Do you mean the Mozilla Universal Charset Detector? Unfortunately, this project is a refactor of a port, so re-porting could be a lot of work.
Possibly, but it's almost unusable in my case. And there is no port of uchardet, which is now a freedesktop project.
Would you accept some new languages ported from the uchardet project?
Now, with v2.3.0, the status log shows:
Get confidence:
SBCS 0.014343185: [koi8-r]
SBCS 0: [iso-8859-5]
SBCS 0.020249203: [x-mac-cyrillic]
SBCS 0: [ibm866]
SBCS 0.00659974: [ibm855]
SBCS 0.026152553: [iso-8859-7]
SBCS 0.026152553: [windows-1253]
SBCS 0: [iso-8859-5]
SBCS 0.0031166344: [windows-1251]
SBCS 0: [windows-1255]
SBCS 0.04902641: [windows-1255]
SBCS 0.050912045: [windows-1255]
SBCS 0.013214358: [tis-620]
SBCS 0.013214358: [iso-8859-11]
SBCS 0.093243085: [iso-8859-1]
SBCS 0.093243085: [iso-8859-15]
SBCS 0.093243085: [windows-1252]
SBCS 0.09324489: [iso-8859-1]
SBCS 0.09324489: [iso-8859-15]
SBCS 0.09324489: [windows-1252]
SBCS 0.14311144: [iso-8859-2]
SBCS 0.14311144: [windows-1250]
SBCS 0.12198714: [iso-8859-1]
SBCS 0.12198714: [windows-1252]
SBCS 0.09350189: [iso-8859-3]
SBCS 0.14065312: [iso-8859-3]
SBCS 0.14065312: [iso-8859-9]
SBCS inactive: [iso-8859-6] (i.e. confidence is too low).
SBCS 0.084189065: [viscii]
SBCS 0.057199046: [windows-1258]
SBCS 0.1763441: [iso-8859-15]
SBCS 0.1763441: [iso-8859-1]
SBCS 0.1763441: [windows-1252]
SBCS 0.09554723: [iso-8859-13]
SBCS 0.09554723: [iso-8859-10]
SBCS 0.09554723: [iso-8859-4]
SBCS 0.09578463: [iso-8859-13]
SBCS 0.09578463: [iso-8859-10]
SBCS 0.09578463: [iso-8859-4]
SBCS 0.09340608: [iso-8859-1]
SBCS 0.09340608: [iso-8859-9]
SBCS 0.09340608: [iso-8859-15]
SBCS 0.09340608: [windows-1252]
SBCS 0.24882: [iso-8859-3]
SBCS 0.095001444: [windows-1250]
SBCS 0.095001444: [iso-8859-2]
SBCS 0.13669409: [x-mac-ce]
SBCS 0.1854423: [ibm852]
SBCS 0.081335865: [windows-1250]
SBCS 0.081335865: [iso-8859-2]
SBCS 0.13743466: [x-mac-ce]
SBCS 0.1760888: [ibm852]
SBCS 0.13817607: [windows-1250]
SBCS 0.13817607: [iso-8859-2]
SBCS 0.13817607: [iso-8859-13]
SBCS 0.12337148: [iso-8859-16]
SBCS 0.21631232: [x-mac-ce]
SBCS 0.3023013: [ibm852]
SBCS 0.47388184: [iso-8859-1]
SBCS 0.47388184: [iso-8859-4]
SBCS 0.47388184: [iso-8859-9]
SBCS 0.47388184: [iso-8859-13]
SBCS 0.47388184: [iso-8859-15]
SBCS 0.47388184: [windows-1252]
SBCS 0.13686267: [iso-8859-1]
SBCS 0.13686267: [iso-8859-3]
SBCS 0.13686267: [iso-8859-9]
SBCS 0.13686267: [iso-8859-15]
SBCS 0.13686267: [windows-1252]
SBCS 0.08758995: [windows-1250]
SBCS 0.08758995: [iso-8859-2]
SBCS 0.08758995: [iso-8859-13]
SBCS 0.08798097: [iso-8859-16]
SBCS 0.12607843: [x-mac-ce]
SBCS 0.16955028: [ibm852]
SBCS 0.37495747: [windows-1252]
SBCS 0.37495747: [windows-1257]
SBCS 0.37495747: [iso-8859-4]
SBCS 0.37495747: [iso-8859-13]
SBCS 0.37495747: [iso-8859-15]
SBCS 0.093210384: [iso-8859-1]
SBCS 0.093210384: [iso-8859-9]
SBCS 0.093210384: [iso-8859-15]
SBCS 0.093210384: [windows-1252]
SBCS 0.09317723: [windows-1250]
SBCS 0.09317723: [iso-8859-2]
SBCS 0.09317723: [iso-8859-16]
SBCS 0.18036576: [ibm852]
SBCS 0.09312218: [windows-1250]
SBCS 0.09312218: [iso-8859-2]
SBCS 0.09312218: [iso-8859-16]
SBCS 0.13316554: [x-mac-ce]
SBCS 0.18025918: [ibm852]
SBCS 0.23395953: [iso-8859-1]
SBCS 0.23395953: [iso-8859-4]
SBCS 0.23395953: [iso-8859-9]
SBCS 0.23395953: [iso-8859-15]
SBCS 0.23395953: [windows-1252]
SBCS Group found best match [iso-8859-1] confidence 0.47388184.

This is consistent with the Finnish model:
UTF-unknown/src/Core/Probers/SBCSGroupProber.cs
Lines 197 to 203 in d52af8d
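To make the last line of the log easier to follow, here is a minimal Python sketch of how a best match could be picked from entries like these. It is not the project's actual code: `best_match`, `ENTRY` and `LOG_EXCERPT` are hypothetical names, the excerpt is only a handful of entries copied from the log, and the tie-breaking rule (the first entry with the highest confidence wins) is an assumption that merely happens to agree with the reported result.

```python
import re

# A few entries copied from the status log above (not the full log).
LOG_EXCERPT = """
SBCS 0.3023013: [ibm852]
SBCS 0.47388184: [iso-8859-1]
SBCS 0.47388184: [windows-1252]
SBCS 0.37495747: [windows-1257]
SBCS inactive: [iso-8859-6] (i.e. confidence is too low).
"""

# Matches "SBCS <number>: [<charset>]"; "inactive" entries do not match.
ENTRY = re.compile(r"SBCS\s+([0-9.]+):\s+\[([^\]]+)\]")


def best_match(log_text):
    """Return (charset, confidence) for the highest-confidence entry.

    Assumption: ties are broken in favour of the earliest entry,
    because the comparison below is strict.
    """
    best_name, best_conf = None, 0.0
    for conf_text, name in ENTRY.findall(log_text):
        conf = float(conf_text)
        if conf > best_conf:
            best_name, best_conf = name, conf
    return best_name, best_conf


print(best_match(LOG_EXCERPT))
```

Under that assumption, iso-8859-1 and windows-1252 tie at 0.47388184 in the log, and iso-8859-1 is reported only because it is listed first.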
Now the problem is the same as in #77.
Hello, I am trying to identify the encoding of a file that should be windows-1252, but the detector finds a better match for windows-1255.
my.txt
It contains, for instance, the byte 0xC5, which is Å in windows-1252, but the file is identified as windows-1255, which does not contain that character at all.
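The point about byte C5 can be checked with a short Python sketch. It assumes Python's built-in `cp1252` and `cp1255` codecs match the windows-1252 and windows-1255 code pages the detector has in mind; `encodable` is a hypothetical helper added for illustration.

```python
# The single byte discussed above.
raw = bytes([0xC5])

as_1252 = raw.decode("cp1252")  # windows-1252 reading of the byte
as_1255 = raw.decode("cp1255")  # windows-1255 reading of the same byte

print(repr(as_1252), hex(ord(as_1252)))
print(repr(as_1255), hex(ord(as_1255)))


def encodable(ch, codec):
    """Return True if the character exists in the given code page."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(encodable("Å", "cp1252"), encodable("Å", "cp1255"))
```

In windows-1255 the same byte maps to a Hebrew vowel point rather than a letter, so a text full of C5 bytes between Latin letters is an unlikely fit for that code page.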