Not able to identify escaped/unescaped html entity in the text nodes #2206
Tried several methods on `TextNode` to get the original input text content, but none of them worked. Example:
Expected (from any one of the methods):
Can you explain the value of this suggestion? What's a real example where this would be helpful?
With this bug we won't be able to differentiate whether
Sure, but I am not clear on what you are actually trying to achieve. What feature are you trying to build that would use functionality like this? jsoup parses HTML. That's fundamentally what it does. If we didn't decode, we wouldn't be parsing. Escape modes control the output, not the input. Input HTML is always decoded using the full entity set.
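The point that decoding is part of parsing can be seen in a toy model. The sketch below is plain Java, not jsoup's API: `DecodeIsLossy`, `decode` and `encode` are hypothetical names, and the entity table is deliberately tiny (`&amp;`, `&lt;`, `&gt;`), standing in for the parser's input decoding and the serializer's output escaping. Because decoding maps both `&amp;` and a bare `&` to the same character, the original spelling is already gone once the text exists; output escaping is a separate, later policy.

```java
import java.util.Map;

class DecodeIsLossy {
    // Hypothetical, tiny entity table; a real HTML parser knows the full named set.
    private static final Map<String, String> ENTITIES = Map.of(
            "&amp;", "&",
            "&lt;", "<",
            "&gt;", ">");

    // Input side: replace known entity references with the characters they name.
    // Anything else, including a bare '&', is copied through unchanged.
    static String decode(String raw) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < raw.length()) {
            String match = null;
            if (raw.charAt(i) == '&') {
                for (String entity : ENTITIES.keySet()) {
                    if (raw.startsWith(entity, i)) {
                        match = entity;
                        break;
                    }
                }
            }
            if (match != null) {
                out.append(ENTITIES.get(match));
                i += match.length();
            } else {
                out.append(raw.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // Output side: escape characters that must not appear raw in HTML text.
    // '&' is replaced first so the '&' in "&lt;" and "&gt;" is not escaped again.
    static String encode(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String escaped = "Tom &amp; Jerry";
        String bare = "Tom & Jerry";
        // Both inputs decode to the same text content...
        System.out.println(decode(escaped).equals(decode(bare))); // true
        // ...so serialization cannot tell them apart either.
        System.out.println(encode(decode(escaped)));              // Tom &amp; Jerry
        System.out.println(encode(decode(bare)));                 // Tom &amp; Jerry
    }
}
```

In this model, choosing a different `encode` policy changes only what is written out; it cannot bring back information that `decode` discarded.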
Not able to identify whether the input document has `&amp;` or a bare `&` in the text node, since jsoup decodes the entity when it builds the text node. The same goes for other entities, such as `&lt;` versus `<`. This gives jsoup users no way to act on how the input was written. For example, we may want to remove a literal `<` character from a text node but preserve it when it was given as the entity `&lt;`.
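Since the distinction only exists in the raw markup, one possible workaround is to inspect the source text yourself rather than the decoded text node. The sketch below is plain Java and assumes you already have the raw character data of a single text run with no tags inside it; `TextProvenance`, `Piece`, `scan` and `dropLiteralLessThan` are hypothetical names, not jsoup API. It records, for each decoded character, whether the source spelled it as an entity, then applies the example rule above: drop a literal `<`, keep one written as `&lt;`.

```java
import java.util.ArrayList;
import java.util.List;

class TextProvenance {
    // One decoded character plus whether the source spelled it as an entity.
    static final class Piece {
        final char ch;
        final boolean fromEntity;

        Piece(char ch, boolean fromEntity) {
            this.ch = ch;
            this.fromEntity = fromEntity;
        }
    }

    // Hypothetical scanner over the raw markup of one text run. Only &lt; &gt; &amp;
    // are recognized; everything else is treated as a literal character.
    static List<Piece> scan(String raw) {
        List<Piece> pieces = new ArrayList<>();
        int i = 0;
        while (i < raw.length()) {
            if (raw.startsWith("&lt;", i)) {
                pieces.add(new Piece('<', true));
                i += 4;
            } else if (raw.startsWith("&gt;", i)) {
                pieces.add(new Piece('>', true));
                i += 4;
            } else if (raw.startsWith("&amp;", i)) {
                pieces.add(new Piece('&', true));
                i += 5;
            } else {
                pieces.add(new Piece(raw.charAt(i), false));
                i++;
            }
        }
        return pieces;
    }

    // Example rule: remove a literal '<' but keep one that came from &lt;.
    static String dropLiteralLessThan(String raw) {
        StringBuilder out = new StringBuilder();
        for (Piece p : scan(raw)) {
            if (p.ch == '<' && !p.fromEntity) {
                continue; // literal '<' in the source: drop it
            }
            out.append(p.ch);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dropLiteralLessThan("a < b &lt; c")); // a  b < c
    }
}
```

This only works on text that has already been isolated from the markup; locating text runs in a full document, and handling numeric references such as `&#60;`, would need a real HTML tokenizer.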
Note: Please let me know if there is already a way to differentiate this.
An option that tells jsoup not to modify text nodes would be very helpful, giving users more flexibility and control.
@jhy