sparklemotion/nokogiri

Update parse options to reflect modern libxml usage

Open

#3439 opened on Feb 19, 2025

Labels: help wanted, topic/documentation, upstream/libxml2

Description

Context

In #3360 the libxml2 maintainer left some suggestions about how we're exposing and documenting some of the parse options.

He mentioned:

  • DTDATTR and DTDVALID imply DTDLOAD and are unsafe as well.
  • SAX1 should probably not be exposed.
  • NODICT should probably not be exposed.
  • XINCLUDE, NOXINCNODE and NOBASEFIX are only used by the XML Reader and XInclude API.
  • HUGE is safe these days (since 2.10)

and some forward-looking statements about the upcoming 2.14 release:

  • UNZIP: Enable decompression. This option has no real effect for now. The plan is that users who really need decompression start to add the option. At a later point, it will be required to enable decompression.
  • NO_SYS_CATALOG: Don't use system catalogs when resolving DTDs or entities.
  • CATALOG_PI: Enable oasis-xml-catalog PIs. This is a really obscure feature that should have never been enabled by default. I don't think your users need it.

Actions

The documentation actions I'd like to take:

  • Make the following bits :nodoc:: SAX1, NODICT
  • Update documentation for DTDATTR and DTDVALID to state that they imply DTDLOAD, and include safety warnings
    • And double-check that these are all off by default
  • Update documentation for the XINCLUDE set to specify they're only used by Reader and Node#process_xincludes
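The "off by default" double-check could be sketched roughly as below. This is a minimal, self-contained sketch and not Nokogiri's code: the bit positions are assumed to mirror libxml2's `xmlParserOption` enum (Nokogiri exposes the same names under `Nokogiri::XML::ParseOptions`), `enabled_dtd_bits` is a hypothetical helper, and the example bitset is illustrative rather than read from the library.

```ruby
# Bit positions assumed to mirror libxml2's xmlParserOption enum.
DTDLOAD  = 1 << 2
DTDATTR  = 1 << 3
DTDVALID = 1 << 4

DTD_BITS = { DTDLOAD: DTDLOAD, DTDATTR: DTDATTR, DTDVALID: DTDVALID }.freeze

# Hypothetical helper: returns the names of the DTD-related bits set in bitset.
def enabled_dtd_bits(bitset)
  DTD_BITS.select { |_name, bit| (bitset & bit) != 0 }.keys
end

# Illustrative bitset only (RECOVER | NONET | BIG_LINES), not read from Nokogiri.
example_default = (1 << 0) | (1 << 11) | (1 << 22)

puts enabled_dtd_bits(example_default).inspect            # no DTD bits set
puts enabled_dtd_bits(example_default | DTDVALID).inspect # DTDVALID reported
```

In a real check, the same helper would be run against each of the library's default bitsets instead of the illustrative value.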

And the functional action I'd like to take:

  • Add HUGE to all the default bitsets if the libxml2 version is >= 2.10.0
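One way the version gate might look, as a hedged sketch rather than the eventual implementation: `with_huge_if_safe` and `HUGE_SAFE_SINCE` are hypothetical names, the `HUGE` bit position is assumed to match libxml2's `XML_PARSE_HUGE`, and the version string is passed in explicitly (in Nokogiri it would presumably come from `Nokogiri::LIBXML_LOADED_VERSION`).

```ruby
require "rubygems" # provides Gem::Version; loaded by default in modern Ruby

# Assumed to match libxml2's XML_PARSE_HUGE bit.
HUGE = 1 << 19

# Per the maintainer's note, HUGE is considered safe since libxml2 2.10.
HUGE_SAFE_SINCE = Gem::Version.new("2.10.0")

# Hypothetical helper: returns bitset with HUGE added when the given libxml2
# version is at least 2.10.0, and returns bitset unchanged otherwise.
def with_huge_if_safe(bitset, libxml2_version)
  return bitset if Gem::Version.new(libxml2_version) < HUGE_SAFE_SINCE

  bitset | HUGE
end

base = (1 << 0) | (1 << 11) # illustrative bitset: RECOVER | NONET

puts (with_huge_if_safe(base, "2.9.14") & HUGE) != 0 # older libxml2: HUGE not added
puts (with_huge_if_safe(base, "2.13.5") & HUGE) != 0 # newer libxml2: HUGE added
```

Comparing with `Gem::Version` rather than raw strings avoids the lexical trap where "2.9.14" would sort after "2.10.0".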

I'd like to wait until the UNZIP bit is useful before adding it. We don't expose the catalog bits, so nothing to do there.
