'iso-8859-15' is detected instead of the encoding 'iso-8859-1' #77
Comments
In the Status Log, the following metrics are identical:
They correspond to a single language (see lines 153 to 156 of UTF-unknown/src/Core/Probers/SBCSGroupProber.cs at commit d52af8d).
The same metrics also appear in the log for other languages.
As I understand it, in this case it is easy to end up with identical statistics.
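The status log supports this: iso-8859-15 reaches 0.8360017 first ("new match found ... index 32"), and iso-8859-1 and windows-1252, listed immediately after it with exactly the same score, never trigger another "new match found" line. That pattern suggests the group prober only replaces its current best when a later prober scores strictly higher. The minimal sketch below shows that tie-breaking behaviour; `pick_best` is a hypothetical name and this is not the library's actual code.

```python
from typing import List, Tuple

# (charset, confidence) pairs in prober order; the values are copied from
# the top-scoring group in the status log of this issue.
candidates: List[Tuple[str, float]] = [
    ("iso-8859-15", 0.8360017),
    ("iso-8859-1", 0.8360017),
    ("windows-1252", 0.8360017),
]


def pick_best(pairs: List[Tuple[str, float]]) -> Tuple[str, float]:
    """Return the first pair with the highest confidence (hypothetical helper).

    Because the comparison is a strict '>', a later prober with an equal
    score never replaces an earlier one, so ties are decided by prober order.
    """
    best_name, best_conf = pairs[0]
    for name, conf in pairs[1:]:
        if conf > best_conf:
            best_name, best_conf = name, conf
    return best_name, best_conf


print(pick_best(candidates))  # ('iso-8859-15', 0.8360017)
```

Under this assumption, the detected encoding for a tie depends only on the order in which the probers are registered, not on anything in the text itself.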
Can we come up with a workaround, or will we have to do as in #80?
It seems that, to keep the possibility of narrowing down the encoding later, we need to change the API so that it returns a collection of results. That way, we can return all encodings that share the same confidence.
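A minimal sketch of the "return a collection" idea, assuming a hypothetical `Candidate` result type and `best_candidates` function (neither is part of UTF-unknown's API): instead of keeping only the first top scorer, every candidate tied for the top confidence is returned in prober order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical result type for this sketch; not UTF-unknown's actual API."""
    charset: str
    confidence: float


def best_candidates(candidates: List[Candidate], tolerance: float = 1e-6) -> List[Candidate]:
    """Return every candidate whose confidence is within `tolerance` of the top score.

    Tied encodings are kept in prober order instead of being reduced to the
    first one, so the caller can apply its own preference among them.
    """
    if not candidates:
        return []
    top = max(c.confidence for c in candidates)
    return [c for c in candidates if top - c.confidence <= tolerance]


# Example: the top-scoring group from the status log in this issue.
results = [
    Candidate("iso-8859-2", 0.4129595),
    Candidate("iso-8859-15", 0.8360017),
    Candidate("iso-8859-1", 0.8360017),
    Candidate("windows-1252", 0.8360017),
]
print([c.charset for c in best_candidates(results)])
# ['iso-8859-15', 'iso-8859-1', 'windows-1252']
```

The `tolerance` parameter is an assumption added to absorb floating-point noise; in this log the tied values are identical, so exact equality would also work. Keeping a single-result accessor alongside such a list (for example, returning its first element) might be one way to soften the breaking change, though whether that fits the library's API is an open question.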
So we could fix this with a breaking change?
As far as I remember, the last thing I considered was revisiting how the coefficients are compiled, to get more accurate detection, but that does not seem to be an easy task. The proposed option of returning all similar encodings is only a possible workaround.
@rstm-sf could we fix this for 3.0?
Hello!
'iso-8859-15' is detected instead of the expected 'iso-8859-1'.
Test file: iso-8859-1.txt from the uchardet tests.
Status Log
Get confidence:
-- new match found: confidence 0.01, index 0, charset windows-1251.
-- new match found: confidence 0.05902827, index 6, charset iso-8859-7.
-- new match found: confidence 0.067115635, index 13, charset tis-620.
-- new match found: confidence 0.3858822, index 15, charset iso-8859-1.
-- new match found: confidence 0.40375984, index 18, charset iso-8859-1.
-- new match found: confidence 0.41295946, index 21, charset iso-8859-2.
-- new match found: confidence 0.42356956, index 23, charset iso-8859-1.
-- new match found: confidence 0.8360017, index 32, charset iso-8859-15.
Get confidence done.
SBCS Group Prober --------begin status
SBCS 0.01: [windows-1251]
SBCS: 0.01 [windows-1251]
SBCS 0.01: [koi8-r]
SBCS: 0.01 [koi8-r]
SBCS 0.01: [iso-8859-5]
SBCS: 0.01 [iso-8859-5]
SBCS 0.01: [x-mac-cyrillic]
SBCS: 0.01 [x-mac-cyrillic]
SBCS 0.01: [ibm866]
SBCS: 0.01 [ibm866]
SBCS 0.01: [ibm855]
SBCS: 0.01 [ibm855]
SBCS 0.05902827: [iso-8859-7]
SBCS: 0.05902827 [iso-8859-7]
SBCS 0.05902827: [windows-1253]
SBCS: 0.05902827 [windows-1253]
SBCS 0.01: [iso-8859-5]
SBCS: 0.01 [iso-8859-5]
SBCS 0: [windows-1251]
SBCS: 0.00 [windows-1251]
SBCS 0: [windows-1255]
HEB: 0 - 0 [Logical-Visual score]
SBCS 0: [windows-1255]
SBCS: 0.00 [windows-1255]
SBCS 0: [windows-1255]
SBCS: 0.00 [windows-1255]
SBCS 0.067115635: [tis-620]
SBCS: 0.06711563 [tis-620]
SBCS 0.067115635: [iso-8859-11]
SBCS: 0.06711563 [iso-8859-11]
SBCS 0.3858822: [iso-8859-1]
SBCS: 0.3858822 [iso-8859-1]
SBCS 0.3858822: [iso-8859-15]
SBCS: 0.3858822 [iso-8859-15]
SBCS 0.3858822: [windows-1252]
SBCS: 0.3858822 [windows-1252]
SBCS 0.40375984: [iso-8859-1]
SBCS: 0.4037598 [iso-8859-1]
SBCS 0.40375984: [iso-8859-15]
SBCS: 0.4037598 [iso-8859-15]
SBCS 0.40375984: [windows-1252]
SBCS: 0.4037598 [windows-1252]
SBCS 0.41295946: [iso-8859-2]
SBCS: 0.4129595 [iso-8859-2]
SBCS 0.41295946: [windows-1250]
SBCS: 0.4129595 [windows-1250]
SBCS 0.42356956: [iso-8859-1]
SBCS: 0.4235696 [iso-8859-1]
SBCS 0.42356956: [windows-1252]
SBCS: 0.4235696 [windows-1252]
SBCS 0.41898435: [iso-8859-3]
SBCS: 0.4189844 [iso-8859-3]
SBCS 0.38790238: [iso-8859-3]
SBCS: 0.3879024 [iso-8859-3]
SBCS 0.38790238: [iso-8859-9]
SBCS: 0.3879024 [iso-8859-9]
SBCS inactive: [iso-8859-6] (i.e. confidence is too low).
SBCS 0: [windows-1256]
SBCS: 0.00 [windows-1256]
SBCS 0.16577692: [viscii]
SBCS: 0.1657769 [viscii]
SBCS 0.18163893: [windows-1258]
SBCS: 0.1816389 [windows-1258]
SBCS 0.8360017: [iso-8859-15]
SBCS: 0.8360017 [iso-8859-15]
SBCS 0.8360017: [iso-8859-1]
SBCS: 0.8360017 [iso-8859-1]
SBCS 0.8360017: [windows-1252]
SBCS: 0.8360017 [windows-1252]
SBCS 0.43422332: [iso-8859-13]
SBCS: 0.4342233 [iso-8859-13]
SBCS 0.40545458: [iso-8859-10]
SBCS: 0.4054546 [iso-8859-10]
SBCS 0.40545458: [iso-8859-4]
SBCS: 0.4054546 [iso-8859-4]
SBCS 0.42485002: [iso-8859-13]
SBCS: 0.42485 [iso-8859-13]
SBCS 0.42485002: [iso-8859-10]
SBCS: 0.42485 [iso-8859-10]
SBCS 0.42485002: [iso-8859-4]
SBCS: 0.42485 [iso-8859-4]
SBCS 0.366608: [iso-8859-1]
SBCS: 0.366608 [iso-8859-1]
SBCS 0.366608: [iso-8859-9]
SBCS: 0.366608 [iso-8859-9]
SBCS 0.366608: [iso-8859-15]
SBCS: 0.366608 [iso-8859-15]
SBCS 0.366608: [windows-1252]
SBCS: 0.366608 [windows-1252]
SBCS 0.36032423: [iso-8859-3]
SBCS: 0.3603242 [iso-8859-3]
SBCS 0.3647504: [windows-1250]
SBCS: 0.3647504 [windows-1250]
SBCS 0.3647504: [iso-8859-2]
SBCS: 0.3647504 [iso-8859-2]
SBCS 0.42094523: [MAC-CENTRALEUROPE]
SBCS: 0.4209452 [MAC-CENTRALEUROPE]
SBCS 0.40236503: [ibm852]
SBCS: 0.402365 [ibm852]
SBCS 0.32631624: [windows-1250]
SBCS: 0.3263162 [windows-1250]
SBCS 0.32631624: [iso-8859-2]
SBCS: 0.3263162 [iso-8859-2]
SBCS 0.40557358: [MAC-CENTRALEUROPE]
SBCS: 0.4055736 [MAC-CENTRALEUROPE]
SBCS 0.36612508: [ibm852]
SBCS: 0.3661251 [ibm852]
SBCS 0.35397846: [windows-1250]
SBCS: 0.3539785 [windows-1250]
SBCS 0.35397846: [iso-8859-2]
SBCS: 0.3539785 [iso-8859-2]
SBCS 0.41416448: [iso-8859-13]
SBCS: 0.4141645 [iso-8859-13]
SBCS 0.33398414: [iso-8859-16]
SBCS: 0.3339841 [iso-8859-16]
SBCS 0.3964395: [MAC-CENTRALEUROPE]
SBCS: 0.3964395 [MAC-CENTRALEUROPE]
SBCS 0.43202174: [ibm852]
SBCS: 0.4320217 [ibm852]
SBCS 0.42139196: [iso-8859-1]
SBCS: 0.421392 [iso-8859-1]
SBCS 0.42139196: [iso-8859-4]
SBCS: 0.421392 [iso-8859-4]
SBCS 0.42139196: [iso-8859-9]
SBCS: 0.421392 [iso-8859-9]
SBCS 0.42139196: [iso-8859-13]
SBCS: 0.421392 [iso-8859-13]
SBCS 0.42139196: [iso-8859-15]
SBCS: 0.421392 [iso-8859-15]
SBCS 0.42139196: [windows-1252]
SBCS: 0.421392 [windows-1252]
SBCS 0.42121872: [iso-8859-1]
SBCS: 0.4212187 [iso-8859-1]
SBCS 0.42121872: [iso-8859-3]
SBCS: 0.4212187 [iso-8859-3]
SBCS 0.42121872: [iso-8859-9]
SBCS: 0.4212187 [iso-8859-9]
SBCS 0.42121872: [iso-8859-15]
SBCS: 0.4212187 [iso-8859-15]
SBCS 0.42121872: [windows-1252]
SBCS: 0.4212187 [windows-1252]
SBCS 0.36684126: [windows-1250]
SBCS: 0.3668413 [windows-1250]
SBCS 0.36684126: [iso-8859-2]
SBCS: 0.3668413 [iso-8859-2]
SBCS 0.40297794: [iso-8859-13]
SBCS: 0.4029779 [iso-8859-13]
SBCS 0.37994418: [iso-8859-16]
SBCS: 0.3799442 [iso-8859-16]
SBCS 0.40297794: [MAC-CENTRALEUROPE]
SBCS: 0.4029779 [MAC-CENTRALEUROPE]
SBCS 0.4339976: [ibm852]
SBCS: 0.4339976 [ibm852]
SBCS 0.42192674: [windows-1252]
SBCS: 0.4219267 [windows-1252]
SBCS 0.42192674: [windows-1257]
SBCS: 0.4219267 [windows-1257]
SBCS 0.42192674: [iso-8859-4]
SBCS: 0.4219267 [iso-8859-4]
SBCS 0.42192674: [iso-8859-13]
SBCS: 0.4219267 [iso-8859-13]
SBCS 0.42192674: [iso-8859-15]
SBCS: 0.4219267 [iso-8859-15]
SBCS 0.38324198: [iso-8859-1]
SBCS: 0.383242 [iso-8859-1]
SBCS 0.38324198: [iso-8859-9]
SBCS: 0.383242 [iso-8859-9]
SBCS 0.38324198: [iso-8859-15]
SBCS: 0.383242 [iso-8859-15]
SBCS 0.38324198: [windows-1252]
SBCS: 0.383242 [windows-1252]
SBCS 0.40346685: [windows-1250]
SBCS: 0.4034669 [windows-1250]
SBCS 0.40346685: [iso-8859-2]
SBCS: 0.4034669 [iso-8859-2]
SBCS 0.40346685: [iso-8859-16]
SBCS: 0.4034669 [iso-8859-16]
SBCS 0.4482638: [ibm852]
SBCS: 0.4482638 [ibm852]
SBCS 0.4214702: [windows-1250]
SBCS: 0.4214702 [windows-1250]
SBCS 0.4214702: [iso-8859-2]
SBCS: 0.4214702 [iso-8859-2]
SBCS 0.4214702: [iso-8859-16]
SBCS: 0.4214702 [iso-8859-16]
SBCS 0.4214702: [MAC-CENTRALEUROPE]
SBCS: 0.4214702 [MAC-CENTRALEUROPE]
SBCS 0.4533166: [ibm852]
SBCS: 0.4533166 [ibm852]
SBCS 0.60846615: [iso-8859-1]
SBCS: 0.6084661 [iso-8859-1]
SBCS 0.60846615: [iso-8859-4]
SBCS: 0.6084661 [iso-8859-4]
SBCS 0.60846615: [iso-8859-9]
SBCS: 0.6084661 [iso-8859-9]
SBCS 0.60846615: [iso-8859-15]
SBCS: 0.6084661 [iso-8859-15]
SBCS 0.60846615: [windows-1252]
SBCS: 0.6084661 [windows-1252]
SBCS Group found best match [iso-8859-15] confidence 0.8360017.
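To see which encodings are tied with the reported best match, the "SBCS <confidence>: [<charset>]" lines of a status log like the one above can be scanned for the top value. This is a small diagnostic sketch that assumes the exact line format shown in this log; `tied_for_best` is a hypothetical helper, not part of the library.

```python
import re
from typing import List, Tuple

# Matches status lines such as "SBCS 0.8360017: [iso-8859-15]".
# "SBCS: ..." and "SBCS inactive: ..." lines deliberately do not match.
STATUS_LINE = re.compile(r"^SBCS (\d+(?:\.\d+)?): \[([^\]]+)\]$")


def tied_for_best(log_text: str) -> Tuple[float, List[str]]:
    """Return the top confidence and every charset (in log order) that reports it."""
    entries = []
    for line in log_text.splitlines():
        m = STATUS_LINE.match(line.strip())
        if m:
            entries.append((float(m.group(1)), m.group(2)))
    if not entries:
        return 0.0, []
    top = max(conf for conf, _ in entries)
    names: List[str] = []
    for conf, name in entries:
        if conf == top and name not in names:
            names.append(name)
    return top, names


sample = """\
SBCS 0.18163893: [windows-1258]
SBCS: 0.1816389 [windows-1258]
SBCS 0.8360017: [iso-8859-15]
SBCS: 0.8360017 [iso-8859-15]
SBCS 0.8360017: [iso-8859-1]
SBCS: 0.8360017 [iso-8859-1]
SBCS 0.8360017: [windows-1252]
SBCS: 0.8360017 [windows-1252]
SBCS inactive: [iso-8859-6] (i.e. confidence is too low).
"""
print(tied_for_best(sample))
# (0.8360017, ['iso-8859-15', 'iso-8859-1', 'windows-1252'])
```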