xml.etree.ElementTree can't read some XMLs #1300

NValerij · 2016-06-17T15:39:04Z

Hello.
There are 3 issues with this file.
Code to reproduce:

import xml.etree.ElementTree as ET
ET.parse('test.xml')

1. The BOM is not recognized: xmllib.Error: Syntax error at line 1: illegal data at start of file.
   OK, I can work around it with ET.parse(codecs.open(r'D:\NLC\LexicalSpanAnnotator\TestData\test.xml', 'r', encoding='utf-8'))
2. The symbol with code 8233 breaks parsing: xmllib.Error: Syntax error at line 3: illegal character in content.
   I can also work around it (load the text and replace this symbol with its mnemonic), but that is not a good idea in general.
3. There is no empty line at the end, and the message about it is very strange: xmllib.Error: Syntax error at line 4: data not in content
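
For reference, the three workarounds described above can be combined into one helper. This is only a sketch: `parse_tolerantly` and the sample bytes are made up for illustration, it assumes the file is UTF-8, and it has been reasoned through against CPython only, so its behaviour on the xmllib-based parser is an assumption.

```python
import xml.etree.ElementTree as ET

PARAGRAPH_SEPARATOR = u"\u2029"  # the symbol with code 8233


def parse_tolerantly(raw_bytes):
    """Parse UTF-8 XML bytes after normalising the three trouble spots."""
    # 'utf-8-sig' drops a leading BOM if one is present.
    text = raw_bytes.decode("utf-8-sig")
    # Replace the raw character with a numeric character reference.
    text = text.replace(PARAGRAPH_SEPARATOR, u"&#8233;")
    # Make sure the document ends with a newline.
    if not text.endswith(u"\n"):
        text += u"\n"
    return ET.fromstring(text.encode("utf-8"))


# Hypothetical sample: BOM, U+2029 in content, no trailing newline.
sample = b"\xef\xbb\xbf<root><item>a\xe2\x80\xa9b</item></root>"
root = parse_tolerantly(sample)
```

The replacement is applied to the whole text, not only to element content, which is one reason it is not a good fix in general.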

I've checked this file with the ElementTree parser from Python 3.4 (sorry, no 2.7 installed) and with the msxml parser. Both parsed it without errors.
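
The original test.xml is not reproduced in this thread. A file with the same three properties (UTF-8 BOM, a U+2029 character on line 3, no newline after line 4) could be generated roughly like this; the element names and text are invented, and `write_sample` is a hypothetical helper.

```python
import codecs
import os
import tempfile

# Hypothetical content; only the three properties come from the report.
body = (
    u'<?xml version="1.0" encoding="utf-8"?>\n'
    u"<root>\n"
    u"<item>before\u2029after</item>\n"
    u"</root>"  # deliberately no trailing newline
)


def write_sample(path):
    """Write the sample document as UTF-8 with a leading BOM."""
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8)
        f.write(body.encode("utf-8"))


sample_path = os.path.join(tempfile.mkdtemp(), "test.xml")
write_sample(sample_path)
```

Passing `sample_path` to ET.parse should then exercise the same code paths as the report, assuming these three properties are what triggers the errors.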


slide · 2016-07-29T05:26:42Z

This happens because we don't have pyexpat implemented.

kunom · 2016-08-12T06:21:54Z

To be a bit more detailed: the ElementTree implementation was patched to use xmllib instead of pyexpat as the underlying XML parser. xmllib has been deprecated since Python 2.0, but it is a pure-Python implementation, which makes integration into IronPython much easier.
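
A quick way to see which situation a given interpreter is in is to try importing pyexpat. This is a rough sketch, and `describe_xml_backend` is a made-up name. It only reports whether the expat binding can be imported; it does not inspect which parser ElementTree actually uses.

```python
def describe_xml_backend():
    """Report whether the expat binding can be imported in this interpreter."""
    try:
        import pyexpat  # C extension module in CPython
    except ImportError:
        return "pyexpat unavailable"
    return "pyexpat available (%s)" % pyexpat.EXPAT_VERSION


print(describe_xml_backend())
```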

See also the checkin comment of commit cb73948.

kunom · 2016-08-16T07:15:00Z
