pola-rs/polars

Add add_filename to pl.read_csv (and other read operations)

Open

#19266 opened on Oct 16, 2024

Labels: enhancement, good first issue

Description

Occasionally, the filename of a data file carries critical information.

For example, with one file per user:

```
Users/
   Alice.csv
   Bob.csv
   Charlie.csv
```

When these files are read with a glob pattern, the filename is lost, which all but forces the user to loop over the files and read each one manually.


```python
from glob import glob

import polars as pl

# This does not preserve which row belongs to which user
df = pl.read_csv('Users/*.csv')

# This works, but is a bit long
df = pl.concat([
    pl.read_csv(file).with_columns(filename=pl.lit(file))
    for file in glob('Users/*.csv')
])
```

It would be nice to have a parameter that adds a column containing the source filename when reading data via a glob pattern.
