pola-rs/polars

Add add_filename to pl.read_csv (and other read operations)

Open

#19266 opened on Oct 16, 2024

Labels: enhancement, good first issue

Description

Occasionally, the filename of a data file is itself fairly critical information.

For example:

Users/
   Alice.csv
   Bob.csv
   Charlie.csv

When using glob patterns to read this data, the file name itself is lost - which all but forces the user to loop over the files and read them manually.


import polars as pl
from glob import glob

# This does not preserve which row belongs to which user
df = pl.read_csv('Users/*.csv')

# This works, but is a bit long
df = pl.concat([
    pl.read_csv(file).with_columns(filename=pl.lit(file))
    for file in glob('Users/*.csv')
])

A parameter that adds a column with the source file name when reading data via a glob pattern would be nice to have.
