docling-project/docling
Docling cannot obtain the images inserted into the text box of docx file
Open
#3314 opened on Apr 16, 2026
Labels: bug, docx, good first issue
Description
Bug
Docling does not extract images that are placed inside a text box in a docx file; they are missing from the converted output.
Steps to reproduce
- Download the zip from
https://www.3gpp.org/ftp/tsg_ran/WG1_RL1/TSGR1_124b/Docs/R1-2602023.zip and unzip it to get
a file named R1-2602023_Discussion on other aspects of CSI acquisition for 6GR.docx
- Use docling to convert the docx file to a markdown file:
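import os
import shlex
import subprocess

# Example paths; adjust to your setup. md_path only determines the output directory.
doc_path = "R1-2602023_Discussion on other aspects of CSI acquisition for 6GR.docx"
md_path = os.path.join("output", "R1-2602023.md")

cmd = [
    "docling",
    "--from", "docx",
    "--to", "md",
    "--output", os.path.dirname(md_path),
    doc_path,
]
# shlex.join quotes the file name, which contains spaces
print(f"run command: {shlex.join(cmd)}")
result = subprocess.run(
    cmd,
    capture_output=True,
    text=True,
    check=True,  # raise CalledProcessError if docling exits with a non-zero status
)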
- I obtained the markdown file and carefully compared its contents with the docx file (I use Typora to read the markdown file).
- I found the error. The details are as follows:
The markdown file renders as follows:
The docx file renders as follows:
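To confirm independently of Docling that the docx file really stores images inside text boxes, one can inspect the package directly, since a docx file is a zip archive whose main part is word/document.xml. The sketch below is only a diagnostic under stated assumptions: the function name textbox_image_rel_ids is hypothetical, and it assumes the text box content is stored in w:txbxContent elements with images referenced through a:blip (DrawingML) or v:imagedata (legacy VML). It does not use or reflect Docling's own parsing code.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard Office Open XML / VML namespaces
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
A = "http://schemas.openxmlformats.org/drawingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
V = "urn:schemas-microsoft-com:vml"


def textbox_image_rel_ids(docx_file):
    """Return the relationship ids of images nested inside text boxes.

    Hypothetical helper: docx_file is a path or a binary file-like object.
    A set is used because Word often writes the same text box twice
    (a modern version and a VML fallback).
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    rel_ids = set()
    for txbx in root.iter(f"{{{W}}}txbxContent"):
        # DrawingML pictures: <a:blip r:embed="rIdN"/>
        for blip in txbx.iter(f"{{{A}}}blip"):
            rid = blip.get(f"{{{R}}}embed")
            if rid:
                rel_ids.add(rid)
        # Legacy VML pictures: <v:imagedata r:id="rIdN"/>
        for imagedata in txbx.iter(f"{{{V}}}imagedata"):
            rid = imagedata.get(f"{{{R}}}id")
            if rid:
                rel_ids.add(rid)
    return rel_ids
```

If calling textbox_image_rel_ids(doc_path) on the file above returns a non-empty set while the generated markdown contains no corresponding image entries, that would be consistent with the images being dropped during conversion rather than being absent from the source file.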
Docling version
Docling version: 2.88.0
Docling Core version: 2.72.0
Docling IBM Models version: 3.13.0
Docling Parse version: 5.8.0
Python: cpython-312 (3.12.12)
Platform: Linux-6.8.0-90-generic-x86_64-with-glibc2.39
Python version
Python 3.12.12