pola-rs/polars

Add add_filename to pl.read_csv (and other read operations)

Open

#19266 opened on Oct 16, 2024

Labels: enhancement, good first issue

Description

It is occasionally the case that the filename of a data file is itself critical information.

For example:

Users/
   Alice.csv
   Bob.csv
   Charlie.csv

When using glob patterns to read this data, the file name itself is lost - which all but forces the user to loop over the files and read them manually.


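from glob import glob

import polars as pl

# This does not preserve which row belongs to which user
df = pl.read_csv('Users/*.csv')

# This works, but is a bit long
df = pl.concat([
    # Tag every row of each file with the path it was read from
    pl.read_csv(file).with_columns(filename=pl.lit(file))
    for file in glob('Users/*.csv')
])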

A parameter that adds a column with the source file name when reading data via a glob pattern would be nice to have.
