docling-project/docling

Docling cannot obtain the images inserted into the text box of docx file

Open

#3314 created on April 16, 2026

1 comment · 0 reactions · 1 assignee · Python · 59,751 stars · 4,140 forks
Labels: bug, docx, good first issue

Description

Bug

Docling cannot extract images that are inserted into a text box in a docx file.

Steps to reproduce

  1. Download the zip from https://www.3gpp.org/ftp/tsg_ran/WG1_RL1/TSGR1_124b/Docs/R1-2602023.zip and unzip it to get
    a file named R1-2602023_Discussion on other aspects of CSI acquisition for 6GR.docx
  2. Use docling to convert the docx file to a Markdown file:
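            import os
            import subprocess

            # doc_path (the .docx file) and md_path (the target Markdown path)
            # must be defined before this snippet runs.
            cmd = [
                "docling",
                "--from", "docx",
                "--to", "md",
                "--output", os.path.dirname(md_path),
                doc_path,
            ]

            print(f"run command: {' '.join(cmd)}")
            # check=True raises CalledProcessError if docling exits with a non-zero status.
            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                check=True,
            )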
  3. I obtained the Markdown file and carefully compared its contents with the docx file (I use Typora to read the Markdown file).
  4. I found the error: the images inside the text box are missing from the Markdown output. The Markdown file renders as follows:
The docx file renders as follows:
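To confirm independently that the source file really does carry images inside text boxes, the docx package can be inspected directly. The following is a minimal sketch using only the Python standard library; it does not reproduce Docling's parsing. The helper name `textbox_image_ids` is hypothetical. It assumes that pictures are referenced through `a:blip` (DrawingML) or `v:imagedata` (VML) elements and that text-box content lives under `w:txbxContent` in `word/document.xml`.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard OOXML / VML namespace URIs.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
A = "http://schemas.openxmlformats.org/drawingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
V = "urn:schemas-microsoft-com:vml"


def _image_rel_ids(element):
    """Collect relationship ids of images referenced anywhere below element."""
    ids = set()
    # DrawingML pictures reference their image part via r:embed on <a:blip>.
    for blip in element.iter(f"{{{A}}}blip"):
        rid = blip.get(f"{{{R}}}embed")
        if rid:
            ids.add(rid)
    # Legacy VML pictures reference their image part via r:id on <v:imagedata>.
    for img in element.iter(f"{{{V}}}imagedata"):
        rid = img.get(f"{{{R}}}id")
        if rid:
            ids.add(rid)
    return ids


def textbox_image_ids(docx_source):
    """Hypothetical helper: return (ids_in_text_boxes, all_ids) for a .docx.

    docx_source may be a file path or a binary file-like object.
    Sets are used because Word often stores a text box twice (a modern
    version plus a VML fallback), which would otherwise double-count images.
    """
    with zipfile.ZipFile(docx_source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    in_boxes = set()
    for box in root.iter(f"{{{W}}}txbxContent"):
        in_boxes |= _image_rel_ids(box)
    return in_boxes, _image_rel_ids(root)
```

Calling `textbox_image_ids(doc_path)` on the file from step 1 returns two sets of relationship ids. A non-empty first set means the document contains images inside text boxes, which can then be compared against the images that appear in the Markdown output.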

Docling version

Docling version: 2.88.0
Docling Core version: 2.72.0
Docling IBM Models version: 3.13.0
Docling Parse version: 5.8.0
Python: cpython-312 (3.12.12)
Platform: Linux-6.8.0-90-generic-x86_64-with-glibc2.39

Python version

Python 3.12.12
