rapidsai/cudf

[BUG] Add support for `force_ascii=False` when writing to JSON with cuDF engine

Open

#15,211 opened on March 1, 2024

Labels: Python, bug, cuIO, good first issue, libcudf

Description

Describe the bug

Ideally, we should eventually support engine="cudf" and force_ascii=False together in to_json. For now, we should update the documentation and/or emit a warning for users.

Steps/Code to reproduce bug

import cudf

df = cudf.DataFrame({"a": [1,2,3], "b": ["4","5","🌱"]})
df.to_json("test.jsonl", orient="records", lines=True, engine="cudf", force_ascii=False)

produces a TypeError: write_json() got an unexpected keyword argument 'force_ascii'.

Calling df.to_json("test.jsonl", orient="records", lines=True, force_ascii=False) without engine="cudf" works, and the emoji appears as-is in the .jsonl file. Calling df.to_json("test.jsonl", orient="records", lines=True, engine="cudf") without force_ascii also works, but the emoji is written as the escape sequence "\ud83c\udf31". What I cannot do is write the emoji unescaped while also using the cuDF engine.
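Until the two options work together, one possible stopgap is to post-process the escaped output on the CPU. The sketch below uses only Python's standard json module, not cuDF: it shows that ensure_ascii=True yields the same "\ud83c\udf31" escape described above, and that re-serializing each record with ensure_ascii=False restores the emoji. The helper name unescape_jsonl is hypothetical, and the example assumes the file is small enough that an extra CPU pass is acceptable.

```python
import json


def unescape_jsonl(lines):
    """Re-encode JSON Lines records so non-ASCII characters are written as-is.

    Hypothetical helper for illustration; it is not part of cuDF.
    """
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # "\ud83c\udf31" decodes back to the emoji
        out.append(json.dumps(record, ensure_ascii=False))
    return out


# Default json.dumps escapes non-ASCII characters, like the escaped output above.
escaped = json.dumps({"a": 3, "b": "🌱"})
print(escaped)

# Re-serializing with ensure_ascii=False writes the emoji unescaped.
print(unescape_jsonl([escaped])[0])
```

Note that json.dumps inserts a space after each comma and colon by default, so the whitespace of the rewritten lines may differ from what the cuDF writer produced; passing separators=(",", ":") gives compact output instead.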

Environment details

I tested this with the latest cuDF version.
