pola-rs/polars

Add add_filename to pl.read_csv (and other read operations)

Open

#19266 opened on Oct 16, 2024

enhancement, good first issue

Description

Occasionally, the filename of a data file is fairly critical information.

For example:

Users/
   Alice.csv
   Bob.csv
   Charlie.csv

When using glob patterns to read this data, the file name itself is lost, which all but forces the user to loop over the files and read them manually.


from glob import glob

import polars as pl

# This does not preserve which row belongs to which user
df = pl.read_csv('Users/*.csv')

# This works, but is a bit long
df = pl.concat([
    pl.read_csv(file).with_columns(filename=pl.lit(file))
    for file in glob('Users/*.csv')
])

A parameter that adds a column with the source file name when reading data via a glob pattern would be nice to have.
