BurntSushi/xsv

feature: binned histograms

Open

#84 opened on June 13, 2017

(3 comments) (2 reactions) (0 assignees) Rust (9,730 stars) (315 forks) batch import
enhancement, help wanted

Description

Histograms could be done in numerous ways. Here are some thoughts:

  • like most of xsv, should operate over huge tables with a single pass
  • .idx files could store metadata helpful for binning? e.g., min and max values per column
  • "choose N evenly spaced numerical bins" seems to require more than one pass (or keeping all values in memory). Keeping a tree of round-sized bins and merging them when the tree gets too big would avoid that
  • logarithmic bins
  • power-of-two or power-of-1024 (e.g., for file sizes)
  • binning of strings or decimal numbers by prefix
  • "other" / "NaN" / null bins
  • csv/tsv output by default, then a separate mode like xsv table to pretty print bars
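
To make the "tree of round-sized bins" idea concrete, here is a minimal sketch in Rust. It is not xsv code: `StreamingHistogram`, `pow2_bin`, and every field and parameter name are hypothetical, and doubling the bin width is just one possible merge policy. Bins live in a `BTreeMap` keyed by `floor(value / width)`. When the map grows past `max_bins`, the width doubles and adjacent pairs of bins are merged, so the data is read once and memory stays bounded. Fields that do not parse as a finite number go into a single "other" bucket.

```rust
use std::collections::BTreeMap;

/// Hypothetical single-pass histogram with bounded memory (not part of xsv).
struct StreamingHistogram {
    width: f64,                // current bin width; doubles on each merge
    max_bins: usize,           // upper bound on the number of live bins
    bins: BTreeMap<i64, u64>,  // bin index -> count; bin i covers [i*width, (i+1)*width)
    other: u64,                // empty, unparsable, NaN, or infinite fields
}

impl StreamingHistogram {
    fn new(initial_width: f64, max_bins: usize) -> Self {
        // max_bins >= 2 is needed because bins -1 and 0 can never merge by halving.
        assert!(initial_width > 0.0 && max_bins >= 2);
        StreamingHistogram { width: initial_width, max_bins, bins: BTreeMap::new(), other: 0 }
    }

    /// Record one raw CSV field.
    fn add(&mut self, field: &str) {
        match field.trim().parse::<f64>() {
            Ok(v) if v.is_finite() => {
                let idx = (v / self.width).floor() as i64;
                *self.bins.entry(idx).or_insert(0) += 1;
                while self.bins.len() > self.max_bins {
                    self.coarsen();
                }
            }
            _ => self.other += 1,
        }
    }

    /// Double the bin width, merging bins 2j and 2j+1 into bin j.
    fn coarsen(&mut self) {
        let old = std::mem::take(&mut self.bins);
        for (idx, count) in old {
            // div_euclid rounds toward negative infinity, matching floor().
            *self.bins.entry(idx.div_euclid(2)).or_insert(0) += count;
        }
        self.width *= 2.0;
    }

    /// (lower bound, upper bound, count) per bin, in ascending order.
    fn rows(&self) -> Vec<(f64, f64, u64)> {
        self.bins
            .iter()
            .map(|(&i, &c)| (i as f64 * self.width, (i + 1) as f64 * self.width, c))
            .collect()
    }
}

/// Hypothetical power-of-two bin for sizes: floor(log2(n)), or None for 0.
fn pow2_bin(n: u64) -> Option<u32> {
    if n == 0 { None } else { Some(63 - n.leading_zeros()) }
}
```

Because the width only ever doubles, the final bin edges depend on the initial width rather than on the observed min and max. That is the trade-off against the two-pass "N evenly spaced bins" approach. The `rows()` output maps directly onto the csv/tsv-by-default suggestion, with pretty-printed bars left to a separate step.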
