pola-rs/polars

Add add_filename to pl.read_csv (and other read operations)

Open

#19266 opened on Oct 16, 2024

(6 comments) (3 reactions) (0 assignees) Rust (2,826 forks) batch import
Labels: enhancement, good first issue


Description


Sometimes the filename of a data file carries critical information.

For example:

Users/
   Alice.csv
   Bob.csv
   Charlie.csv

When this data is read with a glob pattern, the filename itself is lost, which all but forces the user to loop over the files and read each one manually.


import polars as pl
from glob import glob

# This does not preserve which row belongs to which user
df = pl.read_csv('Users/*.csv')

# This works, but it is verbose
df = pl.concat([
    pl.read_csv(file).with_columns(filename=pl.lit(file))
    for file in glob('Users/*.csv')
])

It would be nice to have a parameter that adds a column holding each row's source filename when reading data via a glob pattern.
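To make the requested behavior concrete, here is a minimal, library-agnostic sketch using only the Python standard library (not polars). `read_csv_glob` and its `filename_column` argument are hypothetical names chosen for illustration; they are not an existing polars API, and the real parameter name would be up to the maintainers.

```python
import csv
import glob
import os
import tempfile


def read_csv_glob(pattern, filename_column="filename"):
    """Read every CSV matching `pattern` into one list of dicts.

    Hypothetical helper: each row gets an extra `filename_column` key
    holding the path of the file it came from, mimicking the proposed
    read_csv parameter.
    """
    rows = []
    # sorted() makes the row order deterministic across platforms
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as fh:
            for row in csv.DictReader(fh):
                # Note: silently overwrites an existing column of that name
                row[filename_column] = path
                rows.append(row)
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Recreate a Users/-style layout in a temporary directory
        for user, score in [("Alice", 1), ("Bob", 2), ("Charlie", 3)]:
            with open(os.path.join(tmp, f"{user}.csv"), "w", newline="") as fh:
                fh.write(f"score\n{score}\n")
        for row in read_csv_glob(os.path.join(tmp, "*.csv")):
            # The file stem recovers the user name that a plain glob read loses
            user = os.path.splitext(os.path.basename(row["filename"]))[0]
            print(user, row["score"])
```

Running it prints one line per user (`Alice 1`, `Bob 2`, `Charlie 3`), recovering the user name from the file stem. In this sketch a CSV that already contains a column named like `filename_column` is silently overwritten; a real implementation would presumably need to raise an error or let the user choose another name.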
