BurntSushi/xsv

feature: binned histograms

Open

#84 opened on Jun 13, 2017

enhancement, help wanted

Description

Histograms could be done in numerous ways. Here are some thoughts:

  • like most of xsv, it should operate over huge tables in a single pass
  • .idx files could store metadata helpful for binning (e.g., min and max values per column)?
  • "choose N evenly spaced numerical bins" seems to require more than one pass (or keeping all values in memory). Keeping a tree of round-sized bins and merging them when the tree gets too big would avoid that
  • logarithmic bins
  • power-of-two or power-of-1024 bins (e.g., for file sizes)
  • binning of strings or decimal numbers by prefix
  • "other" / "NaN" / null bins
  • csv/tsv output by default, then a separate mode (like xsv table) to pretty-print bars
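
To make the "tree of round-sized bins" idea concrete, here is a rough sketch of how a single-pass version might look. Nothing here is existing xsv code: `Histogram`, `max_bins`, `initial_width`, `coarsen` and `rows` are all hypothetical names, and doubling the bin width is just one assumed way to merge bins while keeping the edges round.

```rust
use std::collections::BTreeMap;

/// Single-pass histogram with "round" bin edges (hypothetical type, not part of xsv).
pub struct Histogram {
    max_bins: usize,
    /// Current bin width; it only ever doubles, so bin edges stay aligned.
    pub width: f64,
    /// Bin index (floor(value / width)) -> count, kept sorted by the tree.
    bins: BTreeMap<i64, u64>,
    /// Count of NaN and infinite values (the "other" bin).
    pub other: u64,
}

impl Histogram {
    pub fn new(max_bins: usize, initial_width: f64) -> Self {
        // With fewer than 2 bins, values on both sides of zero could never fit.
        assert!(max_bins >= 2 && initial_width > 0.0 && initial_width.is_finite());
        Histogram { max_bins, width: initial_width, bins: BTreeMap::new(), other: 0 }
    }

    /// Record one value; memory use is bounded by `max_bins`, not by row count.
    pub fn add(&mut self, value: f64) {
        if !value.is_finite() {
            self.other += 1;
            return;
        }
        // Note: the `as i64` cast saturates for extreme magnitudes.
        let idx = (value / self.width).floor() as i64;
        *self.bins.entry(idx).or_insert(0) += 1;
        // Merge neighbouring bins until the tree is small enough again.
        while self.bins.len() > self.max_bins {
            self.coarsen();
        }
    }

    /// Double the bin width: old bins 2j and 2j+1 merge into new bin j.
    fn coarsen(&mut self) {
        let old = std::mem::take(&mut self.bins);
        for (idx, count) in old {
            // div_euclid rounds toward negative infinity, so negative
            // indices merge into the correct wider bin as well.
            *self.bins.entry(idx.div_euclid(2)).or_insert(0) += count;
        }
        self.width *= 2.0;
    }

    /// (lower edge, upper edge, count) per non-empty bin, in ascending order.
    pub fn rows(&self) -> Vec<(f64, f64, u64)> {
        self.bins
            .iter()
            .map(|(&idx, &count)| {
                let lower = idx as f64 * self.width;
                (lower, lower + self.width, count)
            })
            .collect()
    }
}
```

Starting from a width of 1 this naturally ends up with power-of-two bins, and the output of `rows` maps directly onto a csv/tsv table of lower edge, upper edge and count. The trade-off is that the final bin count lands somewhere between roughly `max_bins / 2` and `max_bins` rather than at an exact N.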
