rapidsai/cudf

[FEA] Pass column indices as `index_col` in `read_csv`

Open

#15127 opened on February 23, 2024

Labels: 0 - Backlog, Python, feature request, good first issue

Description

If I want to use the index_col parameter to set certain columns as indices when reading a CSV file, I cannot pass a list of column indices (as pandas allows). I can pass a list of column labels though:

cudf.read_csv(filepath, index_col=[0])
KeyError: 'None of [0] are in the columns'

cudf.read_csv(filepath, index_col=['family'])

While this is not a huge issue, I imagine the following is a common scenario: you know that the first 3 columns are index columns, but you don't know exactly how each is spelled ('date' vs 'Date' etc.). In this case, if passing a list of column indices were possible, index_col=[0,1,2] would have worked fine; as it stands, you have to read the file without specifying index columns and set the index later (or guess the column labels by trial and error).
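In the meantime, one way to avoid guessing the labels is to read only the header row first and translate positions into labels, which index_col already accepts. The following is a minimal sketch, not part of cudf: `resolve_index_labels` is a hypothetical helper, and it assumes the header is on the first row and the file is comma-delimited. The demo file and its column names are made up for illustration.

```python
import csv
import os
import tempfile


def resolve_index_labels(filepath, positions):
    """Map 0-based column positions to the labels found in the CSV header row.

    Hypothetical helper: assumes the first row is the header and the
    delimiter is a comma.
    """
    with open(filepath, newline="") as f:
        header = next(csv.reader(f))
    return [header[i] for i in positions]


# Demo on a throwaway file with made-up columns.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline=""
) as tmp:
    tmp.write("Date,family,store,sales\n2024-02-23,A,1,10\n")
    demo_path = tmp.name

labels = resolve_index_labels(demo_path, [0, 1, 2])
print(labels)
os.remove(demo_path)

# With cudf installed, the resolved labels can be passed the same way as
# the label-based call shown earlier in this issue:
# df = cudf.read_csv(filepath, index_col=labels)
```

This costs one extra read of the header line, but it avoids loading the whole file before the index is set.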

Is it possible for index_col to accept a list of indices like in pandas?
