docling-project/docling

When converting a docx document, inexplicable blank images appear, and a line of text disappears.

Open

#3315 opened on Apr 16, 2026

bug, docx, good first issue

Description

Bug

When converting a docx document, inexplicable blank images appear, and a line of text disappears.

Steps to reproduce

  1. Download https://www.3gpp.org/ftp/tsg_ran/WG1_RL1/TSGR1_124b/Docs/R1-2601816.zip and unzip it to get
    a file named "R1-2601816 Discussion on other aspects of CSI acquisition and report for 6GR.docx". Then download https://www.3gpp.org/ftp/tsg_ran/WG1_RL1/TSGR1_124b/Docs/R1-2601793.zip and unzip it to get
    a file named R1-2601793.docx.
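  2. Use docling to convert the docx file to a Markdown file:
            import os
            import subprocess

            # Placeholder paths (not from my real script); use either file from step 1.
            doc_path = "R1-2601793.docx"
            md_path = os.path.abspath("R1-2601793.md")

            # Build the docling CLI call: docx in, Markdown out.
            cmd = [
                "docling",
                "--from", "docx",
                "--to", "md",
                "--output", os.path.dirname(md_path),
                doc_path
            ]

            print(f"run command: {' '.join(cmd)}")
            # check=True raises CalledProcessError if docling exits non-zero.
            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                check=True
            )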
  3. I obtained the Markdown file and carefully compared its contents with the docx file (I use Typora to read the Markdown file).
  4. I found the following errors:

"Agenda Item : 10.5.3.3" disappears.
"3GPP TSG RAN WG1 Meeting #124bis R1-2601793" disappears.
Inexplicable blank images appear.
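
The comparison above was done by eye. As a minimal sketch, the same check could be scripted so the regression is easy to re-run. The helper name `check_markdown` and the image pattern are assumptions for illustration: the pattern treats both Markdown image syntax and an HTML-comment placeholder such as `<!-- image -->` as image markers, which may not match every export mode.

```python
import re

# Strings the report expects to find in the converted Markdown.
EXPECTED_LINES = [
    "Agenda Item : 10.5.3.3",
    "3GPP TSG RAN WG1 Meeting #124bis R1-2601793",
]

# Assumption: an image appears either as Markdown image syntax or as an
# HTML-comment placeholder such as "<!-- image -->".
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\([^)]*\)|<!--\s*image\s*-->")


def normalize(text: str) -> str:
    """Collapse runs of whitespace so spacing differences do not matter."""
    return re.sub(r"\s+", " ", text).strip()


def check_markdown(markdown: str, expected=EXPECTED_LINES):
    """Return (expected strings missing from the Markdown, image marker count)."""
    flat = normalize(markdown)
    missing = [line for line in expected if normalize(line) not in flat]
    return missing, len(IMAGE_PATTERN.findall(markdown))
```

To use it, read the generated Markdown file into a string and pass it to `check_markdown`. A non-empty first element reproduces the missing-text symptom, and the second element can be compared with the number of pictures actually present in the docx.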

Docling version

Docling version: 2.88.0
Docling Core version: 2.72.0
Docling IBM Models version: 3.13.0
Docling Parse version: 5.8.0
Python: cpython-312 (3.12.12)
Platform: Linux-6.8.0-90-generic-x86_64-with-glibc2.39

Python version

Python 3.12.12
