splitCsv handles commas incorrectly when cells contain commas within quoted strings
#4874 opened on Apr 2, 2024
Description
Bug report
I have a CSV file whose cells contain paired-end FASTQ globs formatted for use as inputs to fromFilePairs(), so the globs themselves contain commas (e.g. "data{1,2}.fq.gz"). When a comma is used as the CSV separator, the editor (LibreOffice) wraps these strings in double quotes. When the file is loaded with splitCsv(), the commas inside the quotes are treated as cell delimiters rather than as part of the cell contents.
Expected behavior and actual behavior
Given a demo.csv file with the following plaintext contents:
path,ref
"file{1,2}.fq.gz",hg38
I expect that running the following program:
workflow { channel.fromPath("demo.csv") | splitCsv(header: true) | view }
will produce the output:
[path:file{1,2}.fq.gz, ref:hg38]
It actually outputs:
[path:"file{1, ref:2}.fq.gz"]
Changing the file format to TSV and calling splitCsv with sep: '\t' solves the problem, but I'd like to be able to support CSV files as well.
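To make the difference between the expected and actual output concrete, here is a minimal sketch in Python (not Nextflow) that parses the same demo.csv contents two ways. It is only an illustration under stated assumptions: naive_rows and quoted_rows are hypothetical helper names, and nothing here claims to reflect how splitCsv is implemented internally. The naive version splits on every comma and happens to produce the same shape as the actual output above; the quote-aware version uses Python's standard csv module and produces the expected one.

```python
import csv
import io

# Contents of demo.csv from the report above.
DEMO = 'path,ref\n"file{1,2}.fq.gz",hg38\n'


def naive_rows(text):
    """Split every line on commas and ignore quotes (hypothetical helper)."""
    header, *lines = text.splitlines()
    keys = header.split(",")
    # zip() pairs the two header names with the first two fields only,
    # so the surplus third field ("hg38") is silently dropped.
    return [dict(zip(keys, line.split(","))) for line in lines]


def quoted_rows(text):
    """Parse with the standard csv module, which honours double quotes."""
    return list(csv.DictReader(io.StringIO(text)))


print(naive_rows(DEMO))
print(quoted_rows(DEMO))
```

In the naive result the path value is cut at the first comma and the ref value holds the remainder of the glob, which matches the behaviour reported here. In the quote-aware result the path value is the full glob without the surrounding quotes and ref is hg38.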
Environment
- Nextflow version: 23.10.1 build 5891
- Java version: openjdk 20.0.2-internal 2023-07-18
- Operating system: Linux
- Bash version: GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu)