Improvements
Optimized the performance of complex CSS selectors by adding a cost-based query planner. Evaluators are sorted by their relative execution cost and executed from lowest to highest cost. This speeds up matching by ensuring that simpler evaluations (such as a tag name match) run before more complex ones (such as an attribute regex, or a deep child scan with a :has).
Added support for <svg> and <math> tags (and their children). This includes tag namespaces and case preservation on applicable tags and attributes. #2008
When converting jsoup Documents to W3C Documents in W3CDom, HTML documents are now placed in the http://www.w3.org/1999/xhtml namespace by default, per the HTML5 spec. This can be turned off by calling W3CDom#namespaceAware(false). #1848
Sped up the Structural Evaluators by memoizing previous evaluations. The ~ (any preceding sibling) and :nth-of-type selectors in particular are improved. #1956
Tweaked the performance of the Element methods nextElementSibling, previousElementSibling, firstElementSibling, lastElementSibling, firstElementChild, and lastElementChild. They now filter/skip in place in the child-node list, rather than allocating and scanning a complete Element-filtered list.
Optimized internal methods that previously called Element.children() to use filter/skip child-node list accessors instead, reducing new Element List allocations.
Tweaked the performance of parsing :pseudo selectors.
When using the :empty pseudo-selector, blank text nodes are now considered empty. Previously, an element containing only whitespace was not considered empty. #1976
In forms, <input type="image"> is now excluded from Element.formData() (and hence from form submissions). #2010
In Safelist, made isSafeTag() and isSafeAttribute() public methods, for extensibility. #1780
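The cost-ordering idea behind the query planner can be sketched with plain JDK types. This is only an illustration, not jsoup's internal code: the `CostOrderedMatcher` and `Check` names, the numeric costs, and the map-based toy "element" are all assumptions made for the example. It shows that when every check must pass, running the cheap ones first lets a mismatch short-circuit before the expensive regex is ever evaluated.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch, NOT jsoup's implementation: cost-ordered AND evaluation.
class CostOrderedMatcher {
    /** A named test over a toy element, paired with an assumed relative cost. */
    static final class Check {
        final String name;
        final int cost;
        final Predicate<Map<String, String>> test;

        Check(String name, int cost, Predicate<Map<String, String>> test) {
            this.name = name;
            this.cost = cost;
            this.test = test;
        }
    }

    private final List<Check> checks;
    /** Names of the checks that actually ran during the last matches() call. */
    final List<String> trace = new ArrayList<>();

    CostOrderedMatcher(List<Check> unordered) {
        this.checks = new ArrayList<>(unordered);
        // The "planner": sort once, cheapest check first.
        this.checks.sort(Comparator.comparingInt(c -> c.cost));
    }

    /** All checks must pass; stops at the first failure. */
    boolean matches(Map<String, String> element) {
        trace.clear();
        for (Check c : checks) {
            trace.add(c.name);
            if (!c.test.test(element)) return false;
        }
        return true;
    }

    /** Builds a matcher with the expensive check listed first, then tests a non-matching element. */
    static List<String> demo() {
        List<Check> checks = new ArrayList<>();
        checks.add(new Check("href-regex", 10,
                el -> el.getOrDefault("href", "").matches("^https://.*\\.pdf$")));
        checks.add(new Check("tag", 1, el -> "a".equals(el.get("tag"))));

        Map<String, String> div = new HashMap<>();
        div.put("tag", "div");
        div.put("href", "https://example.com/file.pdf");

        CostOrderedMatcher matcher = new CostOrderedMatcher(checks);
        matcher.matches(div);
        return new ArrayList<>(matcher.trace);
    }

    public static void main(String[] args) {
        // Only the cheap tag check runs; the regex is skipped.
        System.out.println(demo());
    }
}
```

Sorting happens once at construction time, so the ordering cost is paid per query rather than per element.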
Bug Fixes
Bugfix: form elements and empty elements (such as img) did not have their attributes de-duplicated. #1950
In Jsoup.connect(String url), URL paths containing a %2B were incorrectly recoded to a '+', or a '+' was recoded to a ' '. Fixed by reverting to the previous behavior of not encoding supplied paths, other than normalizing to ASCII. #1952
In Jsoup.connect(String url), strings containing supplementary characters (e.g. emoji) were not URL escaped correctly.
In Jsoup.connect(String url), the ConstrainableInputStream would clear Thread interrupts when reading the body. This precluded callers from spawning a thread, running a number of requests for a length of time, then joining that thread after interrupting it. #1991
When tracking HTML source positions, the closing tags for H1...H6 elements were not tracked correctly. #1987
When calling Element.cssSelector() on an extremely deeply nested element, a StackOverflowError could occur. Further, a StackOverflowError may occur when running the query. #2001
Appending a node back to its original Element after empty() would throw an Index out of bounds exception. Also, the child nodes removed by empty() now have their parent node cleared, fully detaching them from the original parent. #2013
In Connection when adding headers, the value may have been assumed to be an incorrectly decoded ISO_8859_1 string, and re-encoded as UTF-8. The value is now left as-is.
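The `%2B` and `+` problem in URL paths described above comes from how form-style decoding treats those characters. The following sketch uses only the JDK's `java.net.URLDecoder` and is not claimed to be the code path jsoup used; the `PathRecodeDemo` class and the sample paths are made up for illustration. It shows why decoding a supplied path and re-encoding it can change its meaning, which is the motivation for leaving supplied paths untouched.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Illustrative only: shows form-style decoding rules, not jsoup internals.
class PathRecodeDemo {
    /** Decodes using application/x-www-form-urlencoded rules ('+' means space). */
    static String formDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // An escaped plus sign becomes a bare '+' ...
        System.out.println(formDecode("/search/a%2Bb")); // /search/a+b
        // ... and a bare '+' becomes a space, so the two inputs cannot be told apart afterwards.
        System.out.println(formDecode("/search/a+b"));   // /search/a b
    }
}
```

In a URL path, `+` is an ordinary character, so applying these form rules to a path silently alters it.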
Changes
Removed previously deprecated methods Document.normalise(), Element.forEach(org.jsoup.helper.Consumer<>), Node.forEach(org.jsoup.helper.Consumer<>), and the org.jsoup.helper.Consumer interface; the latter being a previously required compatibility shim prior to Android's de-sugaring support.
The previous compatibility shim org.jsoup.UncheckedIOException is deprecated in favor of the now supported java.io.UncheckedIOException. If you are catching the former, modify your code to catch the latter instead. #1989
Blocked noscript tags from being added to Safelists, due to incompatibilities between parsers with and without script-mode enabled.
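For the UncheckedIOException change above, the migration amounts to changing the type named in the catch clause. A minimal sketch, using only the JDK: the `load` method below is a hypothetical stand-in for any call that wraps an `IOException`, not a real jsoup method, and the class name is made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch of catching java.io.UncheckedIOException; load() is a hypothetical stand-in.
class UncheckedCatchDemo {
    /** Simulates a library call that wraps an IOException in the JDK's unchecked type. */
    static String load(boolean fail) {
        if (fail) {
            throw new UncheckedIOException(new IOException("simulated read failure"));
        }
        return "ok";
    }

    /** Catches the java.io type (rather than the deprecated org.jsoup shim). */
    static String safeLoad(boolean fail) {
        try {
            return load(fail);
        } catch (UncheckedIOException e) {
            // getCause() returns the wrapped IOException.
            return "failed: " + e.getCause().getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(safeLoad(false)); // ok
        System.out.println(safeLoad(true));  // failed: simulated read failure
    }
}
```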