Labels: Benchmarks, Good first issue, Internal, Test, status: triaged, tech debt
Description
Currently the validation benchmarks don't seem to have any golden tests ensuring that the results the scripts evaluate to don't change, unlike e.g. the nofib or bitwise benchmarks (and the lists benchmarks have some property tests). We should fix that; otherwise an apparent optimization might just as well turn out to be a bug.
Does the same apply to the marlowe benchmarks?
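
For illustration, a golden test of the kind described above could look roughly like the sketch below. It assumes the `tasty` and `tasty-golden` packages. `evaluateScript`, `goldenPath`, the script names and the `golden/` directory are hypothetical placeholders, not existing code in the repository; the real version would run the actual validation scripts and render their evaluation results.

```haskell
module Main (main) where

import qualified Data.ByteString.Lazy.Char8 as BSL
import Test.Tasty (TestTree, defaultMain, testGroup)
import Test.Tasty.Golden (goldenVsString)

-- Hypothetical stand-in: evaluate a named validation script and
-- render the result as text. A real version would run the evaluator
-- on the benchmark script and pretty-print what it evaluates to.
evaluateScript :: String -> IO String
evaluateScript name = pure ("<result of " ++ name ++ ">")

-- Hypothetical layout: one golden file per script.
goldenPath :: String -> FilePath
goldenPath name = "golden/" ++ name ++ ".golden"

-- One golden test per script: the rendered result is compared
-- against the contents of the corresponding golden file.
goldenFor :: String -> TestTree
goldenFor name =
  goldenVsString name (goldenPath name) (BSL.pack <$> evaluateScript name)

main :: IO ()
main =
  defaultMain $
    testGroup "validation golden tests" (map goldenFor ["script-a", "script-b"])
```

With a setup like this, an optimization that changes what a script evaluates to would show up as a golden test failure instead of passing silently as a speedup. Intentional changes can be recorded by rerunning the test suite with tasty-golden's `--accept` option.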