rapidsai/cudf

[BUG] Add support for `force_ascii=False` when writing to JSON with cuDF engine

Open

#15211 opened on March 1, 2024

4 comments · 1 reaction · 0 assignees · C++ · 6,000 stars · 735 forks
Labels: Python, bug, cuIO, good first issue, libcudf

Description

Describe the bug

Ideally, `to_json` should eventually support `engine="cudf"` together with `force_ascii=False`. Until then, we should update the documentation and/or emit a warning for users.
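As a rough illustration of the "provide a warning" option, the check could look something like the sketch below. `check_json_writer_options` is a hypothetical helper invented for this example; it is not part of cuDF and does not reflect how cuDF validates its arguments internally.

```python
import warnings


def check_json_writer_options(engine, force_ascii):
    """Hypothetical helper (not a cuDF API): warn about the unsupported combination.

    Returns True when the requested options can be honored as given,
    False when force_ascii=False was requested together with engine="cudf".
    """
    if engine == "cudf" and not force_ascii:
        warnings.warn(
            "force_ascii=False is not supported with engine='cudf'; "
            "non-ASCII characters will be written as \\uXXXX escapes.",
            UserWarning,
        )
        return False
    return True
```

A warning keeps the call from failing outright, at the cost of silently producing escaped output; raising a clearer error message would be the stricter alternative.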

Steps/Code to reproduce bug

import cudf

df = cudf.DataFrame({"a": [1,2,3], "b": ["4","5","🌱"]})
df.to_json("test.jsonl", orient="records", lines=True, engine="cudf", force_ascii=False)

This produces `TypeError: write_json() got an unexpected keyword argument 'force_ascii'`.

Calling `df.to_json("test.jsonl", orient="records", lines=True, force_ascii=False)` without the cuDF engine writes the emoji as-is to the .jsonl file. Calling `df.to_json("test.jsonl", orient="records", lines=True, engine="cudf")` writes the emoji as the escape sequence "\ud83c\udf31". What I cannot do is write with the cuDF engine and also have the emoji appear unescaped in the file.
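Until the two options work together, one possible workaround is to post-process the escaped output on the CPU. The sketch below uses only Python's standard `json` module. `unescape_json_lines` is a hypothetical helper, and the sample line is an assumed shape for one record of the escaped output, not captured cuDF output.

```python
import json


def unescape_json_lines(lines):
    """Re-serialize JSON Lines records so non-ASCII characters are not escaped."""
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # decodes \ud83c\udf31 into a single code point
        # ensure_ascii=False keeps non-ASCII characters as-is in the output
        out.append(json.dumps(record, ensure_ascii=False, separators=(",", ":")))
    return out


# Assumed sample of one escaped record (illustrative only).
escaped = ['{"a":3,"b":"\\ud83c\\udf31"}']
print(unescape_json_lines(escaped))
```

This round-trips every record through Python objects, so it gives up the speed benefit of the cuDF writer and may change how numbers are formatted. When writing the result to disk, open the file with `encoding="utf-8"`.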

Environment details

I tested this with the latest cuDF version.

