swiftlang/swift

[SR-8905] Gaps in String benchmarking

Open

#51411 opened on Oct 3, 2018

benchmarks, good first issue, standard library

Description

Previous ID SR-8905
Radar None
Original Reporter @milseman
Type Task
Votes 1
Component/s Standard Library
Labels Task, Benchmark, StarterBug
Assignee None
Priority Medium

md5: 310808e3ff76ad0c9651e7a3550b6d27

Sub-Tasks:

  • SR-9226 Breadcrumb benchmarks
  • SR-10855 Benchmark non-ASCII with AngryPhonebook

Issue Description:

While inspecting our benchmarking story, I found many micro-benchmark gaps that we should fill. This bug holds a listing of such gaps and is a starter task. Anyone interested in covering a gap should create a new bug for it, assign it to themselves, and apply the fix. I can review the PR or provide further guidance.

  • Some String RangeReplaceableCollection operations. We're missing benchmarking for:

    • insert<C: Collection>(contentsOf: C, at: Index)

      • Arguments of types String, Substring, Array<Character>, Repeated<Character>, etc.
    • See benchmark/single-source/RemoveWhere.swift for an example of some RRC operations.
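As a rough illustration of the insertion workloads listed above, here is a minimal sketch in plain Swift. The function name `insertAtMidpoint` and the payloads are hypothetical and are not taken from the existing suite. A real benchmark would wrap each call in the suite's harness (see RemoveWhere.swift for the actual conventions) and would use much larger inputs.

```swift
// Hypothetical workload (not from the existing suite): repeatedly insert a
// collection of Characters at the midpoint of a String.
func insertAtMidpoint<C: Collection>(
  _ payload: C, into base: String, times: Int
) -> String where C.Element == Character {
  var result = base
  for _ in 0..<times {
    // Note: `count` is O(n) on String, so a real benchmark may want to
    // precompute the insertion index outside the measured region.
    let mid = result.index(result.startIndex, offsetBy: result.count / 2)
    result.insert(contentsOf: payload, at: mid)
  }
  return result
}

let base = "abcdefgh"
let padded: String = "__XY__"

// One payload per argument type mentioned in the issue.
let stringPayload: String = "XY"
let substringPayload: Substring = padded.dropFirst(2).dropLast(2)
let arrayPayload: [Character] = ["X", "Y"]
let repeatedPayload: Repeated<Character> = repeatElement(Character("X"), count: 2)

let fromString = insertAtMidpoint(stringPayload, into: base, times: 2)
let fromSubstring = insertAtMidpoint(substringPayload, into: base, times: 2)
let fromArray = insertAtMidpoint(arrayPayload, into: base, times: 2)
let fromRepeated = insertAtMidpoint(repeatedPayload, into: base, times: 2)
```

The first three payloads hold the same two characters, so they should produce identical results. That makes any timing difference attributable to the argument type rather than to the content.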

  • Grapheme breaking

    • We have many benchmarks present, but they're disabled (for bad historical reasons). We should re-enable them.

    • We also have unicode-scalar-breaking variants that are enabled. However, this suite is highly redundant, because unicode-scalar breaking is far more uniform. The list of workloads for unicode-scalar breaking should be pruned to ascii, russian, chinese, and emoji, and renamed so that it does not imply Character iteration.

    • We don't benchmark grapheme breaking on bridged NSStrings. Historically this hasn't exhibited much perf difference, but it's a blind spot currently.

    • We also count via iteration, but in theory, String.count could be made faster than raw iteration. We should pick one workload to just run String.count on.

    • See benchmark/single-source/StringWalk.swift.gyb
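To make the distinction between the grapheme-breaking measurements concrete, the sketch below contrasts counting Characters by iteration, calling `String.count` directly, and walking unicode scalars. The function names and the tiny workloads are illustrative assumptions; the real workloads live in StringWalk.swift.gyb. Bridged NSStrings are not covered here.

```swift
// Counts Characters by walking the string, which exercises grapheme breaking.
func countCharactersByIteration(_ s: String) -> Int {
  var n = 0
  for _ in s { n += 1 }
  return n
}

// Counts unicode scalars; no grapheme breaking is involved.
func countScalarsByIteration(_ s: String) -> Int {
  var n = 0
  for _ in s.unicodeScalars { n += 1 }
  return n
}

// Hypothetical miniature workloads, for illustration only.
let workloads: [(name: String, text: String)] = [
  ("ascii", "Hello, world"),
  ("russian", "Привет"),
  ("combining", "cafe\u{301}"),  // "e" followed by a combining acute accent
]

for workload in workloads {
  let direct = workload.text.count  // candidate for a dedicated benchmark
  let iterated = countCharactersByIteration(workload.text)
  let scalars = countScalarsByIteration(workload.text)
  print(workload.name, direct, iterated, scalars)
}
```

The two Character counts must always agree. The scalar count differs from them only when a grapheme cluster spans several scalars, as in the "combining" workload.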

  • Substring-based comparison/hashing and benchmark unification

    • We have some benchmarks for Substring in Substring.swift, but without a very diverse payload, and diverse payloads only for String in StringComparison.swift.

    • We should merge these two benchmarks together, getting the diversity of StringComparison.swift and the same-buffer-but-different-pointer variants of Substring.swift, applied to Substrings as well as Strings.

    • Similarly for Strings and Substrings from bridged NSStrings, though we might prune the datasets to avoid a combinatorial explosion in the number of benchmarks.

    • See benchmark/single-source/Substring.swift and benchmark/single-source/StringComparison.swift
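One way to picture the "same-buffer-but-different-pointer" case is two Substrings that slice the same underlying String at different offsets but hold equal contents. The sketch below is an assumption about what such a workload might look like, with hypothetical names and data. It is not the code in Substring.swift.

```swift
// Two slices of one buffer: equal contents, different positions.
let buffer = "helloworld_helloworld"
let first: Substring = buffer.prefix(10)
let second: Substring = buffer.suffix(10)

// Helper that hashes any Hashable value with a fresh Hasher.
func hashOf<T: Hashable>(_ value: T) -> Int {
  var hasher = Hasher()
  hasher.combine(value)
  return hasher.finalize()
}

// Substring vs. Substring comparison and hashing.
let sameContents = (first == second)
let sameHash = (hashOf(first) == hashOf(second))

// Mixed String vs. Substring comparison, one of the combinations a
// unified benchmark could cover.
let mixedEqual = (String(first) == second)

// A more diverse payload: canonically equivalent but differently encoded.
let precomposed = "caf\u{E9}"
let decomposed = "cafe\u{301}"
let canonicallyEqual = (precomposed == decomposed)
```

Equal values are required to hash equally within a single process, so the hashing comparison holds regardless of the per-process seed.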

  • Transcoding chunks of data from one encoding to another

    • Each encoding in Unicode.Encoding has a transcode<>() method, which we can benchmark.

    • UTF8 -> UTF16 is likely to be an increasingly important one for the future.

    • See benchmark/single-source/UTF8Decode.swift for some inspiration.
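A UTF8 -> UTF16 workload could be sketched as follows. This uses the standard library's free function `transcode(_:from:to:stoppingOnError:into:)`, which processes a whole stream of code units, rather than the per-scalar `transcode` method on each encoding that the issue mentions. The wrapper name `utf8ToUTF16` and the sample text are hypothetical.

```swift
// Transcodes a buffer of UTF-8 code units into UTF-16 code units.
func utf8ToUTF16(_ bytes: [UInt8]) -> [UInt16] {
  var output: [UInt16] = []
  output.reserveCapacity(bytes.count)
  // With stoppingOnError set to false, invalid input is repaired with
  // U+FFFD replacement characters instead of aborting the transcode.
  _ = transcode(
    bytes.makeIterator(),
    from: UTF8.self,
    to: UTF16.self,
    stoppingOnError: false
  ) { codeUnit in
    output.append(codeUnit)
  }
  return output
}

let sample = "héllo 🌍"
let transcoded = utf8ToUTF16(Array(sample.utf8))
```

For valid input, the result should match the string's own `utf16` view, which gives a simple correctness check to run alongside the timing.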

  • Case conversion: String.lowercased() and String.uppercased()

    • ASCII and non-ASCII bridged NSStrings

    • See benchmark/single-source/AngryPhonebook.swift for some guidance.
