Handle non-ascii characters in url #193

kolesar-andras · 2019-11-11T04:40:52Z

The Zombie driver fails when a URL contains "high bytes", i.e. non-ASCII characters. The following example contains a valid Hungarian word with accented characters:

https://hu.wikipedia.org/wiki/Műemlék

Desktop browsers and the Mink Goutte driver translate the high bytes correctly:

https://hu.wikipedia.org/wiki/M%C5%B1eml%C3%A9k

The Zombie driver sends the string as-is to JavaScript, and bytes above 0x7f then go wrong somewhere in Zombie:

https://hu.wikipedia.org/wiki/Mqeml\xe9k

It's a bit strange how characters are truncated:

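Characters that don't exist in the ISO-8859-1 encoding are replaced with regular letters: for example, ű (U+0171) becomes q (0x71), which looks like only the low byte of the code point is kept. This damage is irreversible, because the original character cannot be recovered from the q.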
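The truncation can be reproduced with a small Python sketch. It is only a model of the symptom, assuming each code point is cut down to its lowest 8 bits; the function name `truncate_to_low_byte` is hypothetical, and whether Zombie does exactly this internally has not been verified.

```python
def truncate_to_low_byte(text: str) -> str:
    """Keep only the lowest 8 bits of each code point (assumed model of the bug)."""
    out = []
    for ch in text:
        byte = ord(ch) & 0xFF  # e.g. U+0171 (ű) -> 0x71, U+00E9 (é) -> 0xe9
        # Printable ASCII stays readable; any other byte is shown as a \xNN escape.
        out.append(chr(byte) if 0x20 <= byte < 0x7F else f"\\x{byte:02x}")
    return "".join(out)


print(truncate_to_low_byte("Műemlék"))  # prints Mqeml\xe9k
```

Under that assumption the output matches the broken URL above: ű collapses to a plain q, while é keeps its ISO-8859-1 byte 0xe9.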

The example shows that desktop browsers translate non-ASCII characters to percent-encoded bytes, using the UTF-8 encoding of each character.

That's correct, web servers expect urls in this way.
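For illustration, the expected translation can be sketched in Python with the standard library. This is not the driver's code (the driver is not written in Python), and the helper name `encode_url_path` is hypothetical; it only shows the UTF-8 percent-encoding that browsers apply to the path.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_path(url: str) -> str:
    """Percent-encode non-ASCII characters in the URL path as UTF-8 bytes."""
    parts = urlsplit(url)
    # Keep "/" as the path separator and "%" so already-encoded input is not
    # encoded twice; quote() encodes as UTF-8 by default.
    encoded_path = quote(parts.path, safe="/%")
    return urlunsplit(parts._replace(path=encoded_path))


print(encode_url_path("https://hu.wikipedia.org/wiki/Műemlék"))
# prints https://hu.wikipedia.org/wiki/M%C5%B1eml%C3%A9k
```

Leaving `%` in the safe set means a URL that is already percent-encoded passes through unchanged, so the helper can be applied to both forms.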


kolesar-andras linked a pull request Nov 11, 2019 that will close this issue

Handle non-ascii characters in url #194

Open
