pola-rs/polars
Add add_filename to pl.read_csv (and other read operations)
Open
#19266 opened on Oct 16, 2024
enhancement, good first issue
Description
Occasionally, the filename of a data file is itself critical information.
For example, given a directory layout like this:
Users/
Alice.csv
Bob.csv
Charlie.csv
When using glob patterns to read this data, the file name itself is lost - which all but forces the user to loop over the files and read them manually.
from glob import glob

import polars as pl

# This does not preserve which row belongs to which user
df = pl.read_csv('Users/*.csv')

# This works, but is a bit long
df = pl.concat([
    pl.read_csv(file).with_columns(filename=pl.lit(file))
    for file in glob('Users/*.csv')
])
A parameter that adds a column with the source file name when reading data via a glob pattern would be nice to have.