rapidsai/cudf

[FEA] Pass column indices as `index_col` in `read_csv`

Open

#15127, created on February 23, 2024

7 comments, 0 reactions, 0 assignees. Repository language: C++ (6,000 stars, 735 forks)
Labels: 0 - Backlog, Python, feature request, good first issue

Description

If I want to use the index_col parameter to set certain columns as indices when reading a csv file, I cannot pass a list of column indices (like in pandas). I can pass a list of column labels though:

cudf.read_csv(filepath, index_col=[0])
KeyError: 'None of [0] are in the columns'

cudf.read_csv(filepath, index_col=['family'])

While this is not a huge issue, I imagine the following is a common scenario: you know that the first 3 columns are index columns, but you don't know exactly how each is spelled ('date' vs 'Date', etc.). In this case, if passing a list of column indices were possible, index_col=[0,1,2] would work fine; as it stands, you have to read the file without specifying index columns and set the index afterwards (or guess the column labels by trial and error).

Is it possible for index_col to accept a list of column indices, like in pandas?
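
Until that is supported, one possible workaround is to resolve the positions to labels before calling read_csv. The sketch below is only an illustration, not anything from cudf itself: resolve_index_labels is a hypothetical helper name, and it assumes the file has a single header row and uses the default comma delimiter. It reads only the first row with Python's standard csv module.

```python
import csv


def resolve_index_labels(path, positions):
    """Map 0-based column positions to header labels.

    Hypothetical helper (not part of cudf). Assumes the first row of the
    file is the header and that the delimiter is a comma.
    """
    with open(path, newline="") as f:
        header = next(csv.reader(f))  # read only the header row
    return [header[i] for i in positions]


# Possible usage (requires cudf and a GPU, so it is left as a comment):
# import cudf
# labels = resolve_index_labels(filepath, [0, 1, 2])
# df = cudf.read_csv(filepath, index_col=labels)
```

This avoids guessing the spelling of the labels, at the cost of opening the file twice.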

