docling-document, enhancement, good first issue
Description
Everything works fine so far, but the image count problem I mentioned applies in the same (or a similar) way to tables.
Yes, you can convert tables to Markdown directly, but if you export them to JSON (which is much better for embedding), it is hard to find the right place to copy them back into the Markdown.
So I built the tables in JSON format myself from the converted document:
tables_data = []
for table_ix, table in enumerate(conv_res.document.tables):
    if not hasattr(table, 'export_to_dataframe'):
        _log.warning(f"Table {table_ix} has no export method.")
        continue
    try:
        # Assuming clean_text is defined globally or passed in.
        # If missing, you must define it or use str(val).replace('\n', ' ')
        table_df = table.export_to_dataframe()
        num_rows = len(table_df)
        if hasattr(table_df, 'columns'):
            columns_list = list(table_df.columns)
            records = [
                {col: clean_text(str(val)) for col, val in row.items()}
                for row in table_df.to_dict(orient="records")
            ]
            table_info = {
                "table_index": table_ix + 1,
                "num_rows": num_rows,
                "num_columns": len(columns_list),
                "columns": columns_list,
                "data": records,
            }
            tables_data.append(table_info)
    except Exception as e:
        _log.error(f"Error processing table {table_ix}: {e}")
# Now doc_filename is safe to use here
json_filename = output_dir / f"{doc_filename}-tables.json"
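The snippet above stops at building `json_filename` and does not show the write step. A minimal sketch of how the tables and the plain text could go into one file, using only the standard library, might look like this. The helper name `save_tables_with_text` and the payload keys `text`, `num_tables` and `tables` are hypothetical, chosen for illustration; they are not part of Docling.

```python
import json
from pathlib import Path


def save_tables_with_text(json_path, text, tables):
    """Write plain text and a list of table dicts into a single JSON file.

    Hypothetical helper, not a Docling API. `tables` is expected to look
    like the `tables_data` list built above.
    """
    json_path = Path(json_path)
    # Create the output directory if it does not exist yet.
    json_path.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "text": text,
        "num_tables": len(tables),
        "tables": tables,
    }
    with json_path.open("w", encoding="utf-8") as fh:
        # default=str is a fallback for values json cannot serialize natively.
        json.dump(payload, fh, ensure_ascii=False, indent=2, default=str)
    return json_path
```

It could then be called as `save_tables_with_text(json_filename, conv_res.document.export_to_markdown(), tables_data)`, assuming the Markdown export is the text wanted alongside the tables. This keeps both in one file, but it still does not record where each table sits in the text.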
Is there an easier way to save the tables in JSON format together with the plain text?