docling-project/docling

Docling cannot obtain the images inserted into the text box of docx file

Open

#3314 opened on Apr 16, 2026

Labels: bug, docx, good first issue

Description

Bug

Docling does not extract images that are placed inside a text box in a DOCX file.

Steps to reproduce

  1. Download the zip from https://www.3gpp.org/ftp/tsg_ran/WG1_RL1/TSGR1_124b/Docs/R1-2602023.zip , unzip it, and you get
    a file named R1-2602023_Discussion on other aspects of CSI acquisition for 6GR.docx
  2. Use docling to convert the DOCX file to Markdown:
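            import os
            import subprocess

            # Placeholder paths: point doc_path at the unzipped DOCX file.
            doc_path = "R1-2602023_Discussion on other aspects of CSI acquisition for 6GR.docx"
            md_path = os.path.join("output", "R1-2602023.md")
            os.makedirs(os.path.dirname(md_path), exist_ok=True)

            # Convert the DOCX file to Markdown with the docling CLI.
            cmd = [
                "docling",
                "--from", "docx",
                "--to", "md",
                "--output", os.path.dirname(md_path),
                doc_path,
            ]

            print(f"run command: {' '.join(cmd)}")
            # check=True raises CalledProcessError on a non-zero exit status.
            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                check=True,
            )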
  3. I compared the resulting Markdown file carefully against the DOCX file (I use Typora to read the Markdown file).
  4. The images inside the text boxes are missing from the Markdown output. The Markdown file renders as follows:
The DOCX file renders as follows:
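
To confirm that the missing images really sit inside text boxes, the DOCX package can be inspected directly. The following is a minimal sketch using only the standard library. It is not part of docling, and `textbox_image_rids` is a hypothetical helper name. It assumes the images are DrawingML pictures (`a:blip`) nested under `w:txbxContent` in `word/document.xml`; images stored only in the VML fallback (`v:imagedata`) would not be counted.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard OOXML namespace URIs.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def textbox_image_rids(docx_path):
    """Return the relationship IDs of DrawingML images nested in text boxes.

    Hypothetical diagnostic helper; docx_path may be a path or a file-like object.
    """
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    rids = []
    # w:txbxContent wraps the paragraphs that belong to a text box.
    for txbx in root.iter(f"{{{W_NS}}}txbxContent"):
        # a:blip references the embedded image part through r:embed.
        for blip in txbx.iter(f"{{{A_NS}}}blip"):
            rid = blip.get(f"{{{R_NS}}}embed")
            if rid is not None:
                rids.append(rid)
    return rids
```

If this returns a non-empty list for the attached file while the Markdown output contains no corresponding pictures, that would support the diagnosis that text-box images are being skipped. Nested text boxes can cause the same ID to appear more than once.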

Docling version

Docling version: 2.88.0
Docling Core version: 2.72.0
Docling IBM Models version: 3.13.0
Docling Parse version: 5.8.0
Python: cpython-312 (3.12.12)
Platform: Linux-6.8.0-90-generic-x86_64-with-glibc2.39

Python version

Python 3.12.12
