danfickle/openhtmltopdf

RFC - Roadmap for version 1

Open

#170 opened on Jan 16, 2018

Labels: help wanted, question

Description

These are my thoughts on the issues that need to be addressed before version 1 is released, in no particular order:

  • #161 - MathML support - COMPLETE
  • NO-ISSUE-YET, COMPLETE - Entity support such as nbsp for HTML, InvisibleTimes for MathML and SVG entities. This is tricky as there is no way to programmatically inject entities using the Java XML parser. I proposed adding a doctype dynamically to the start of the XML input, with the desired entities. However, this would have meant reading the XML input into a string, rather than just passing a file or input stream to the XML reader. The builder can be used to specify which entities to load. In the end this was implemented with custom doc types and an external entity resolver instead. DOCUMENT.
  • #38 - Transforms. A few issues remaining to implement:
    • Link placing doesn't take account of transforms.
    • Translate is not implemented.
    • Some work for transformed boxes in page margins.
    • Testing. Do transforms work on MathML, SVG and custom objects?
  • NO-ISSUE-YET - Logging / error handling overhaul - Currently error handling is ad hoc. For example, should we continue on a load failure or throw a fatal exception? I propose to make this configurable by allowing the user to hook logging on a per-run basis and halt on any log message (which will be changed to enum constants) with a poison exception.
  • #60 - CSS3 Columns - Currently implemented for text only. Need to debug to allow other box types in columns.
  • #126 - Overflowing pages - Currently content that goes past the right margin is cut off silently. This is mostly a problem with tables. I propose a CSS property that allows cut off content to be printed on the next page. DOCUMENT.
  • #204 - Multi run cache - Currently there is a multi-run cache hook method, but the objects stored may not be thread safe. This means it is unsuitable for many use-cases. Propose to remove all caches except font metrics cache.
  • NO-ISSUE-YET - Per run cache - Need to make sure nothing is being placed into a PDF document more than once. For example, is an img from the img tag and a background image from the same url embedded twice?
  • #83 - Unicode font justification fix - There is a fix in #143 but we are waiting for PDFBox 2.0.9 to implement it.
  • #123 - RTL table layout - Altering table layout to correct RTL scares me, but there have been a couple of requests so we should try.
  • NO-ISSUE-YET - Remove remnants of configuration class and move all config to builders. There are still some config values that are coming from various file locations.
  • #145 - Padding with percentages not working - It appears that it is resolving padding percentage values with a zero base value.
  • NO-ISSUE-YET - Make sure all dependencies are up to date. Do this after test system introduced.
  • #208 - Semi-automatic testing. I propose some sort of semi-automatic testing with image diff. This would allow you to run it before and after changes to make sure nothing has been broken. Unfortunately, we can't have a single true source of reference results, as font handling, etc. reportedly can change slightly between JREs.
  • NO-ISSUE-YET - Java2D cleanup. Make sure all Java2D functionality is in the Java2D module and delete broken code samples and tools. Also make sure Java2D RTL works.
  • NO-ISSUE-YET - Documentation. Review and complete the template author's guide and the integration guide, and create a comparison with other solutions such as Flying Saucer, headless browsers, etc.
  • #180 - Performance and memory improvements - IN PROGRESS.
  • #143 - Other improvements from this pull-request.
  • NO-ISSUE-YET - Floating elements escape elements with overflow:hidden set.
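
To illustrate the entity support item above, here is a minimal sketch of the general technique of pairing a doctype with an external entity resolver, using only the standard JAXP/SAX APIs. It is not the project's actual implementation: the class name, the public identifier and the `entities.dtd` system identifier are made up for this example, and only two entities are declared.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class EntityDoctypeSketch {
    // Hypothetical public identifier, invented for this illustration only.
    public static final String PUBLIC_ID = "-//EXAMPLE//DTD ENTITIES//EN";

    // Entity declarations served in place of a real external DTD.
    static final String ENTITIES =
        "<!ENTITY nbsp \"&#160;\">\n" +
        "<!ENTITY InvisibleTimes \"&#8290;\">\n";

    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();

        // Answer the doctype's external subset from memory. Any other
        // external entity gets an empty source, so nothing is fetched
        // from disk or the network.
        EntityResolver resolver = (publicId, systemId) ->
            new InputSource(new StringReader(
                PUBLIC_ID.equals(publicId) ? ENTITIES : ""));
        builder.setEntityResolver(resolver);

        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<!DOCTYPE html PUBLIC \"" + PUBLIC_ID + "\" \"entities.dtd\">"
            + "<html><body>a&nbsp;b</body></html>";
        String text = parse(xml).getDocumentElement().getTextContent();
        // The &nbsp; reference should have been expanded to U+00A0.
        System.out.println(text.equals("a\u00A0b"));
    }
}
```

Because the resolver supplies the declarations, the input can still be handed to the parser as a stream; only the doctype line has to name the identifier the resolver recognises.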

Hopefully, most of the other open issues can wait for subsequent releases. NOTE: There will be several more release candidate versions before version 1.
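
As a rough idea of what the per-run logging hook proposed in the list above might look like, here is a small sketch. Everything in it is hypothetical: `LogMessageId`, `RunLogListener`, `HaltRunException` and the fake `run` method are names invented for this example and do not exist in the library.

```java
import java.util.EnumSet;
import java.util.Set;

public class HaltingLogSketch {
    // Hypothetical message identifiers standing in for the proposed enum constants.
    public enum LogMessageId { RESOURCE_LOAD_FAILED, CSS_PARSE_WARNING }

    // Hypothetical hook that the user would register for a single run.
    public interface RunLogListener {
        void onMessage(LogMessageId id, String detail);
    }

    // The "poison" exception: unchecked, so it unwinds the whole run.
    public static final class HaltRunException extends RuntimeException {
        public final LogMessageId id;

        public HaltRunException(LogMessageId id, String detail) {
            super(id + ": " + detail);
            this.id = id;
        }
    }

    // Builds a listener that halts the run on any of the given message ids
    // and silently ignores everything else.
    public static RunLogListener haltOn(Set<LogMessageId> fatal) {
        return (id, detail) -> {
            if (fatal.contains(id)) {
                throw new HaltRunException(id, detail);
            }
        };
    }

    // Stand-in for a conversion run that reports two messages and
    // returns how many steps it completed.
    public static int run(RunLogListener listener) {
        int completedSteps = 0;
        listener.onMessage(LogMessageId.CSS_PARSE_WARNING, "unknown property");
        completedSteps++;
        listener.onMessage(LogMessageId.RESOURCE_LOAD_FAILED, "missing.png");
        completedSteps++;
        return completedSteps;
    }

    public static void main(String[] args) {
        // Lenient run: nothing is fatal, so both steps complete.
        System.out.println(run(haltOn(EnumSet.noneOf(LogMessageId.class))));
        try {
            // Strict run: a load failure aborts the run.
            run(haltOn(EnumSet.of(LogMessageId.RESOURCE_LOAD_FAILED)));
        } catch (HaltRunException e) {
            System.out.println("halted on " + e.id);
        }
    }
}
```

With this shape, the continue-or-throw decision lives entirely in the listener the user supplies for that run, rather than in global configuration.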

I'd appreciate feedback from anybody, especially @rototor. Any other issues that need to be addressed before version 1?
