Description
| Previous ID | SR-8905 |
| Radar | None |
| Original Reporter | @milseman |
| Type | Task |
| Votes | 1 |
| Component/s | Standard Library |
| Labels | Task, Benchmark, StarterBug |
| Assignee | None |
| Priority | Medium |
Sub-Tasks:
Issue Description:
While inspecting our benchmarking story, I found many micro-benchmark gaps that we should fill. This bug lists those gaps and serves as a starter task. Anyone interested in covering a gap should create a new bug for it, assign it to themselves, and submit a fix. I can review the PR or provide further guidance.
- Some String RangeReplaceableCollection operations. We're missing benchmarking for:
  - insert<C: Collection>(_: C)
    - Arguments of types String, Substring, Array<Character>, Repeated<Character>, etc.
  - See benchmark/single-source/RemoveWhere.swift for an example of some RRC operations.
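As a rough illustration of the argument types listed above, here is a minimal sketch that inserts each kind of collection into the middle of a string. It uses the standard library's `insert(contentsOf:at:)`; the payloads and the helper name `insertingInMiddle` are hypothetical and are not taken from the benchmark suite, and the harness registration a real benchmark needs is omitted.

```swift
// Hypothetical base payload; a real benchmark would pick its own workloads.
let base = "The quick brown fox jumps over the lazy dog"

// Hypothetical helper: inserts any collection of Characters into the
// middle of a fresh copy of `base` and returns the result.
func insertingInMiddle<C: Collection>(_ newElements: C) -> String
  where C.Element == Character {
  var s = base
  let mid = s.index(s.startIndex, offsetBy: s.count / 2)
  s.insert(contentsOf: newElements, at: mid)
  return s
}

// One argument per collection type named in the list above.
let asString: String = "café"
let source = "xx café xx"
let asSubstring: Substring = source.dropFirst(3).dropLast(3)
let asArray: [Character] = Array("café")
let asRepeated: Repeated<Character> = repeatElement("é", count: 4)

let results = [
  insertingInMiddle(asString),
  insertingInMiddle(asSubstring),
  insertingInMiddle(asArray),
  insertingInMiddle(asRepeated),
]
```

Each call goes through the same generic entry point, so a benchmark built this way could expose differences between the argument types.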
- Grapheme breaking
  - We have many benchmarks present, but they're disabled (for bad historical reasons). We should re-enable them.
  - We also have unicode-scalar breaking variants that are enabled. However, this suite is highly redundant, as unicode-scalar breaking is far more uniform. The list of workloads for unicode-scalar breaking should be pruned to ascii, russian, chinese, and emoji, and the benchmarks renamed so they do not imply Character iteration.
  - We don't benchmark grapheme breaking on bridged NSStrings. Historically this hasn't exhibited much perf difference, but it's currently a blind spot.
  - We also count via iteration, but in theory String.count could be made faster than raw iteration. We should pick one workload to run just String.count on.
  - See benchmark/single-source/StringWalk.swift.gyb
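To make the distinction between the three measurements concrete, the sketch below counts Characters by iteration, calls `String.count` directly, and walks the `unicodeScalars` view. The four payloads only mirror the workload names above; they are hypothetical and are not the ones in StringWalk.swift.gyb.

```swift
// Hypothetical payloads named after the workloads listed above.
let workloads: [(name: String, text: String)] = [
  ("ascii", "Hello, world!"),
  ("russian", "Привет, мир!"),
  ("chinese", "你好，世界"),
  // Thumbs-up with a skin-tone modifier, then a regional-indicator flag pair.
  ("emoji", "\u{1F44D}\u{1F3FD}\u{1F1FA}\u{1F1F8}"),
]

// Grapheme breaking: count Characters by walking the string.
func countByIteration(_ s: String) -> Int {
  var n = 0
  for _ in s { n += 1 }
  return n
}

// Unicode-scalar breaking: walk the unicodeScalars view instead.
func countScalars(_ s: String) -> Int {
  var n = 0
  for _ in s.unicodeScalars { n += 1 }
  return n
}

for (name, text) in workloads {
  // text.count is the call that could, in theory, beat raw iteration.
  print(name, countByIteration(text), text.count, countScalars(text))
}
```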
- Substring-based comparison/hashing and benchmark unification
  - We have some benchmarks for Substring in Substring.swift, but without a very diverse payload, and diverse payloads only for String in StringComparison.swift.
  - We should merge these two benchmarks together, getting the diversity of StringComparison.swift and the same-buffer-but-different-pointer variants of Substring.swift, applied to Substrings as well as Strings.
  - Similarly for Strings and Substrings from bridged NSStrings, though we might prune the datasets to avoid a combinatorial explosion in the number of benchmarks.
  - See benchmark/single-source/Substring.swift and benchmark/single-source/StringComparison.swift
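One way to picture the "same buffer but different pointer" case is two Substrings with equal contents sliced from different offsets of one String, compared against a copy in independent storage. This is only a sketch under that reading; the payload and the `hashOf` helper are hypothetical.

```swift
// Hypothetical payload: the same ten Characters appear twice.
let text = "naïve café, naïve café"

// Equal contents, same underlying buffer, different start offsets.
let first: Substring = text.prefix(10)
let second: Substring = text.suffix(10)

// Equal contents copied into independent String storage.
let copy = String(first)

// Hypothetical helper that hashes any Hashable value with Hasher.
func hashOf<T: Hashable>(_ value: T) -> Int {
  var hasher = Hasher()
  hasher.combine(value)
  return hasher.finalize()
}

// The operations a unified benchmark would exercise.
let substringsEqual = first == second
let substringEqualsString = first == copy
let substringsOrdered = first < second
let hashesMatch = hashOf(first) == hashOf(second)
```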
- Transcoding chunks of data from one encoding to another
  - Each encoding in Unicode.Encoding has a transcode<>() method, which we can benchmark.
  - UTF8 -> UTF16 is likely to be an increasingly important one for the future.
  - See benchmark/single-source/UTF8Decode.swift for some inspiration
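As a starting point for the UTF8 -> UTF16 case, here is a minimal sketch. Note that it uses the standard library's free function `transcode(_:from:to:stoppingOnError:into:)` rather than the per-encoding `transcode` method mentioned above, and that the payload is hypothetical.

```swift
// Hypothetical payload mixing 1-, 2-, 3- and 4-byte UTF-8 sequences.
let text = "héllo, 世界 \u{1F30D}"
let utf8Input: [UInt8] = Array(text.utf8)

// Collect the UTF-16 code units produced by the transcoder.
var utf16Output: [UInt16] = []
let hadError = transcode(
  utf8Input.makeIterator(),
  from: UTF8.self,
  to: UTF16.self,
  stoppingOnError: true,
  into: { utf16Output.append($0) }
)

// hadError is true only if the input contained encoding errors.
print(hadError, utf16Output.count)
```

A real benchmark would probably preallocate the output and run over larger chunks, so that array growth does not dominate the measurement.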
- Case conversion: String.lowercased() and String.uppercased()
  - ASCII and non-ASCII bridged NSStrings
  - See benchmark/single-source/AngryPhonebook.swift for some guidance
Completed
- String Breadcrumbing - https://bugs.swift.org/browse/SR-9226
- insert(_:Character) - https://bugs.swift.org/browse/SR-8908
- replaceSubrange<C: Collection>(_:C) - https://github.com/apple/swift/pull/25310
- case conversion ASCII and non-ASCII - https://bugs.swift.org/browse/SR-10855
Waiting for approval:
- UTF16 decoding - https://github.com/apple/swift/pull/34435