Labels: dataframe, good first issue, io, needs info
Description
With #3284, we can now read ORC files into dask dataframes. It would be good to benchmark this implementation, which has never been done, and see if there are any easy gains we're missing. Ideally this would be done at two levels:
- Pandas/Arrow: are we getting the bandwidth we'd expect from the ORC C++ reader?
- Dask: are we getting parallelism, and is our overhead low?
We'd need to use some other system (Spark, Hive, etc.) to generate test files, as no Python writer exists.
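As a starting point, the two levels could be timed with something like the rough sketch below. It assumes a single local ORC file already generated by another system. The helper names (`time_call`, `bandwidth_mb_s`, `benchmark_orc`) and the example path are made up for illustration. The only library calls used are `pyarrow.orc.ORCFile(...).read()` and `dd.read_orc`.

```python
import time
from statistics import median


def time_call(fn, repeats=3):
    """Return the median wall-clock seconds over `repeats` calls of fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return median(timings)


def bandwidth_mb_s(n_bytes, seconds):
    """Convert a byte count and a duration into MB/s (1 MB = 1e6 bytes)."""
    return n_bytes / seconds / 1e6


def benchmark_orc(path, repeats=3):
    """Time the Arrow-level read and the dask-level read of one ORC file.

    `path` is assumed to be a local file written by e.g. Spark or Hive.
    """
    # Imported here so the helpers above work without these installed.
    import os
    import pyarrow.orc as orc
    import dask.dataframe as dd

    size = os.path.getsize(path)
    # Level 1: raw ORC C++ reader through pyarrow, converted to pandas.
    arrow_s = time_call(lambda: orc.ORCFile(path).read().to_pandas(), repeats)
    # Level 2: the same data through dask, including graph overhead.
    dask_s = time_call(lambda: dd.read_orc(path).compute(), repeats)
    return {
        "arrow_mb_s": bandwidth_mb_s(size, arrow_s),
        "dask_mb_s": bandwidth_mb_s(size, dask_s),
        "dask_overhead_s": dask_s - arrow_s,
    }
```

Calling `benchmark_orc("test.orc")` on a hypothetical file would give a first comparison of the two levels. Measuring parallelism would need a dataset with multiple files or stripes, and a run under each scheduler.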