docling-project/docling

add_node_items doesn't update caption references

Open

#2298 opened on Sep 22, 2025

Labels: bug, docling-document, good first issue

Description

Bug

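DoclingDocument.add_node_items() copies the given items into the new document and remaps their child references, but it does not remap their caption references. The captions keep pointing at text items of the source document.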
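The expected behaviour is that captions are remapped the same way children already are. Below is a minimal sketch of that remapping on the exported-dict form, where a table's `children` and `captions` lists hold `{"cref": ...}` entries. The helper `remap_refs` and the `ref_map` argument (source ref to new ref) are hypothetical illustrations, not docling-core's implementation.

```python
from typing import Any

# Reference lists on a table item that point at other items in the document.
# Only these two appear in this issue; other list fields are not handled here.
REF_FIELDS = ("children", "captions")


def remap_refs(item: dict[str, Any], ref_map: dict[str, str]) -> dict[str, Any]:
    """Return a copy of `item` with every cref in REF_FIELDS rewritten via ref_map.

    Hypothetical helper: refs not present in ref_map are left unchanged.
    """
    out = dict(item)
    for field in REF_FIELDS:
        if field in item:
            out[field] = [
                {"cref": ref_map.get(ref["cref"], ref["cref"])} for ref in item[field]
            ]
    return out


# In the source document the caption text is both a child and the caption.
source_table = {
    "self_ref": "#/tables/0",
    "children": [{"cref": "#/texts/72"}],
    "captions": [{"cref": "#/texts/72"}],
}
copied = remap_refs(source_table, {"#/texts/72": "#/texts/0"})
print(copied["captions"])  # [{'cref': '#/texts/0'}]
```

Remapping both lists through the same map keeps `children` and `captions` consistent, which is exactly the invariant the report says is broken.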

Steps to reproduce

Convert a document:

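from docling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"  # local path or URL
converter = DocumentConverter()
result = converter.convert(source)
document = result.document  # the converted DoclingDocument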

Create a new document containing only the first table:

from docling_core.types.doc import DoclingDocument

new_document = DoclingDocument(name="Table0")
new_document.add_node_items(
    doc=document,
    parent=new_document.body,
    node_items=[document.tables[0]],
)

The table's child reference is remapped to '#/texts/0', but its caption reference remains '#/texts/72' (the caption's index in the source document) instead of '#/texts/0':

  "tables": [
    {
      "self_ref": "#/tables/0",
      "parent": {
        "cref": "#/body"
      },
      "children": [
        {
          "cref": "#/texts/0"
        }
      ],
      "content_layer": "body",
      "label": "table",
      "prov": [
        {
          "page_no": 5,
          "bbox": {
            "l": 133.27708435058594,
            "t": 634.9401245117188,
            "r": 478.2610168457031,
            "b": 542.4001007080078,
            "coord_origin": "BOTTOMLEFT"
          },
          "charspan": [
            0,
            0
          ]
        }
      ],
      "captions": [
        {
          "cref": "#/texts/72"
        }
      ],
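To spot this in any copied document, one can check that every table caption ref resolves to a text item of the same document. A minimal sketch on the dict form of the document (for example the output of `export_to_dict()`), assuming refs are matched against each text item's `self_ref`; the helper name is hypothetical.

```python
from typing import Any


def find_dangling_caption_refs(doc: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (table_ref, caption_ref) pairs whose caption ref does not resolve.

    A caption ref such as "#/texts/72" counts as resolved only if some item in
    doc["texts"] has exactly that self_ref.
    """
    known_texts = {text["self_ref"] for text in doc.get("texts", [])}
    dangling = []
    for table in doc.get("tables", []):
        for caption in table.get("captions", []):
            if caption["cref"] not in known_texts:
                dangling.append((table["self_ref"], caption["cref"]))
    return dangling


# Reduced version of the new document from this report: one text, one table.
new_doc = {
    "texts": [{"self_ref": "#/texts/0"}],
    "tables": [
        {
            "self_ref": "#/tables/0",
            "children": [{"cref": "#/texts/0"}],
            "captions": [{"cref": "#/texts/72"}],
        }
    ],
}
print(find_dangling_caption_refs(new_doc))  # [('#/tables/0', '#/texts/72')]
```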

Docling version

2.53.0

Python version

3.12
