[SR-8905] Gaps in String benchmarking · swiftlang/swift#51411

2018-10-03T20:00:27.000Z



Labels: benchmarks, good first issue, standard library


Description


Previous ID	SR-8905
Radar	None
Original Reporter	@milseman
Type	Task


Votes	1
Component/s	Standard Library
Labels	Task, Benchmark, StarterBug
Assignee	None
Priority	Medium

md5: 310808e3ff76ad0c9651e7a3550b6d27

Sub-Tasks:

SR-9226 Breadcrumb benchmarks
SR-10855 Benchmark non-ASCII with AngryPhonebook

Issue Description:

While inspecting our benchmarking story, we found many micro-benchmark gaps that we should fill. This bug lists those gaps and is a starter task. Anyone interested in covering a gap should create a new bug for it, assign it to themselves, and apply the fix. I can review the PR or provide further guidance.

Some String RangeReplaceableCollection operations. We're missing benchmarks for:
- insert<C: Collection>(contentsOf: C, at: Index)
  - Arguments of types String, Substring, Array<Character>, Repeated<Character>, etc.
- See benchmark/single-source/RemoveWhere.swift for an example of some RRC operations.
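A minimal sketch of what the missing `insert(contentsOf:at:)` workloads could look like, covering the argument types listed above. It is standalone Swift rather than a file for the benchmark suite: `blackHole` here is a local stand-in for the harness's sink, and the payloads and names (`insertBase`, `runInsertContents`, etc.) are hypothetical. Adapting it would mean registering each argument type as its own benchmark, following the pattern in RemoveWhere.swift.

```swift
// Hypothetical standalone sketch; the in-tree suite uses its own harness.

// Local stand-in that keeps results observable to the optimizer.
@inline(never)
func blackHole<T>(_ x: T) {}

let insertBase = "The quick brown fox jumps over the lazy dog"
let insertString = "\u{1F98A} and friends "                  // String
let insertSubstring = insertString.dropLast()                  // Substring
let insertArray = Array(insertString)                          // Array<Character>
let insertRepeated = repeatElement(Character("x"), count: 16)  // Repeated<Character>

// Generic over the payload type, so each argument type gets its own
// specialization, as a separate per-type benchmark would.
@inline(never)
func runInsertContents<C: Collection>(_ n: Int, _ payload: C) -> String
where C.Element == Character {
  var last = ""
  for _ in 0..<n {
    var s = insertBase
    // Insert in the middle so the existing tail has to be moved.
    let mid = s.index(s.startIndex, offsetBy: s.count / 2)
    s.insert(contentsOf: payload, at: mid)
    last = s
  }
  blackHole(last)
  return last
}

let viaString = runInsertContents(1_000, insertString)
let viaSubstring = runInsertContents(1_000, insertSubstring)
let viaArray = runInsertContents(1_000, insertArray)
let viaRepeated = runInsertContents(1_000, insertRepeated)
```

Keeping the workload generic means one body serves all argument types while still measuring each concrete `insert` specialization separately.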
Grapheme breaking
- We have many benchmarks present, but they're disabled (for bad historical reasons). We should re-enable them.
- We also have unicode-scalar-breaking variants enabled, but that suite is highly redundant because unicode-scalar breaking behaves far more uniformly across inputs. Its workloads should be pruned to ascii, russian, chinese, and emoji, and renamed so they don't imply Character iteration.
- We don't benchmark grapheme breaking on bridged NSStrings. Historically this hasn't shown much performance difference, but it's currently a blind spot.
- We also count Characters via iteration, but in theory String.count could be made faster than raw iteration. We should pick one workload that just runs String.count.
- See benchmark/single-source/StringWalk.swift.gyb
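A sketch of the three counting strategies discussed above (Character iteration, unicode-scalar iteration, and a direct `String.count`), run over the four workloads suggested for unicode-scalar breaking. The payload strings and function names are hypothetical, not the ones in StringWalk.swift.gyb, and bridged-NSString inputs are left out because they require Foundation.

```swift
// Hypothetical payloads for the ascii, russian, chinese, and emoji workloads.
let walkWorkloads: [(name: String, text: String)] = [
  ("ascii", "Swift is a general-purpose programming language."),
  ("russian", "Привет, мир"),
  ("chinese", "你好，世界"),
  // Thumbs up + skin tone, a ZWJ family sequence, and a flag.
  ("emoji", "\u{1F44D}\u{1F3FD}\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}\u{1F1FA}\u{1F1F8}"),
]

// Grapheme breaking: iterate Characters.
@inline(never)
func countCharactersByIteration(_ s: String) -> Int {
  var n = 0
  for _ in s { n += 1 }
  return n
}

// Unicode-scalar breaking: no grapheme-cluster rules involved.
@inline(never)
func countScalarsByIteration(_ s: String) -> Int {
  var n = 0
  for _ in s.unicodeScalars { n += 1 }
  return n
}

// The dedicated String.count workload. It must return the same value as
// Character iteration; only its speed is free to differ.
@inline(never)
func countDirectly(_ s: String) -> Int {
  return s.count
}

for w in walkWorkloads {
  print(w.name,
        countCharactersByIteration(w.text),
        countScalarsByIteration(w.text),
        countDirectly(w.text))
}
```

Only the emoji workload shows a gap between Character and scalar counts, which is why the scalar-breaking variants gain little from the full set of workloads.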
Substring-based comparison/hashing and benchmark unification
- Substring.swift has some Substring benchmarks, but without very diverse payloads; StringComparison.swift has diverse payloads, but only for String.
- We should merge these two benchmarks, combining the payload diversity of StringComparison.swift with the same-buffer-but-different-pointer variants of Substring.swift, and apply them to Substrings as well as Strings.
- Do the same for Strings and Substrings from bridged NSStrings, though we might prune the datasets to avoid a combinatorial explosion in the number of benchmarks.
- See benchmark/single-source/Substring.swift and benchmark/single-source/StringComparison.swift
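A sketch of a unified comparison and hashing workload. Writing each workload once over `StringProtocol` lets the same body run on both `String` and `Substring`; the first Substring pair slices one buffer at two different offsets, mirroring the same-buffer-but-different-pointer idea. The payloads and names are hypothetical; the canonically equivalent "café" pair only hints at the kind of diversity StringComparison.swift provides.

```swift
// Hypothetical payloads.
let ascii10 = "abcdefghij"
let asciiTwice = ascii10 + ascii10
let composed = "caf\u{E9} au lait"      // é as one scalar (U+00E9)
let decomposed = "cafe\u{301} au lait"  // e + combining acute accent (U+0301)

// First pair: equal contents in the same buffer at offsets 0 and 10.
let substringPairs: [(Substring, Substring)] = [
  (asciiTwice.prefix(10), asciiTwice.suffix(10)),
  (composed[...], decomposed[...]),
]
let stringPairs: [(String, String)] = [
  (ascii10, String(asciiTwice.suffix(10))),
  (composed, decomposed),
]

// Comparison workload, generic so one body serves String and Substring.
@inline(never)
func countEqualPairs<S: StringProtocol>(_ pairs: [(S, S)], iterations n: Int) -> Int {
  var equal = 0
  for _ in 0..<n {
    for (x, y) in pairs where x == y { equal += 1 }
  }
  return equal
}

// Hashing workload: feeds every value into a single Hasher.
@inline(never)
func hashAll<S: StringProtocol>(_ values: [S]) -> Int {
  var hasher = Hasher()
  for v in values { hasher.combine(v) }
  return hasher.finalize()
}
```

Because Swift compares Strings by canonical equivalence, both pairs compare equal and must hash equally, so the workloads exercise the normalization-aware paths as well as the fast ASCII ones.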
Transcoding chunks of data from one encoding to another
- Each type conforming to Unicode.Encoding has a static transcode(_:from:) method, which we can benchmark.
- UTF-8 -> UTF-16 is likely to become increasingly important in the future.
- See benchmark/single-source/UTF8Decode.swift for some inspiration
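A sketch of a UTF-8 to UTF-16 transcoding workload. It uses the standard library's free function `transcode(_:from:to:stoppingOnError:into:)` over a whole buffer rather than calling an encoding's static `transcode(_:from:)` scalar by scalar; the payload and `transcodeUTF8ToUTF16` are hypothetical.

```swift
// Hypothetical payload mixing 1-, 2-, 3-, and 4-byte UTF-8 sequences.
let transcodePayload = "Hello, \u{43C}\u{438}\u{440}, \u{4F60}\u{597D}, \u{1F600}"
let transcodeBytes = Array(transcodePayload.utf8)

// Decodes UTF-8 and re-encodes as UTF-16. With stoppingOnError: false,
// each ill-formed sequence is replaced by U+FFFD instead of aborting.
@inline(never)
func transcodeUTF8ToUTF16(_ bytes: [UInt8]) -> (units: [UInt16], hadError: Bool) {
  var units: [UInt16] = []
  // The UTF-16 output never has more code units than there are input bytes.
  units.reserveCapacity(bytes.count)
  let hadError = transcode(
    bytes.makeIterator(),
    from: UTF8.self,
    to: UTF16.self,
    stoppingOnError: false,
    into: { units.append($0) })
  return (units, hadError)
}
```

Checking the output against `String.utf16` keeps the benchmark honest, and an ill-formed input variant would cover the replacement-character path.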
Case conversion: String.lowercased() and String.uppercased()
- ASCII and non-ASCII bridged NSStrings
- See benchmark/single-source/AngryPhonebook.swift for some guidance
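A sketch of a case-conversion workload over ASCII and non-ASCII native Strings, in the spirit of AngryPhonebook.swift. The name lists and `runCaseConversion` are hypothetical; the bridged-NSString variants the issue asks for would build the same inputs from `NSString` on Apple platforms, which this portable sketch leaves out. Summing UTF-8 lengths keeps results observable and surfaces length-changing mappings such as "ß" uppercasing to "SS".

```swift
// Hypothetical payloads.
let asciiNames = ["Alice", "BOB", "charlie", "Dana Scully"]
let nonASCIINames = ["Ärger", "ПРИВЕТ", "straße", "Ελλάδα"]

// Lowercases and uppercases every input, returning the total UTF-8 length
// of all results so the work cannot be optimized away.
@inline(never)
func runCaseConversion(_ inputs: [String], iterations n: Int) -> Int {
  var totalUTF8 = 0
  for _ in 0..<n {
    for s in inputs {
      totalUTF8 += s.lowercased().utf8.count
      totalUTF8 += s.uppercased().utf8.count
    }
  }
  return totalUTF8
}

let asciiTotal = runCaseConversion(asciiNames, iterations: 2)
let nonASCIITotal = runCaseConversion(nonASCIINames, iterations: 2)
```

ASCII case mapping never changes the byte length, so the ASCII total is easy to predict; the non-ASCII set is where full Unicode case mapping does extra work.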

Completed

String Breadcrumbing - https://bugs.swift.org/browse/SR-9226
insert(_:Character) - https://bugs.swift.org/browse/SR-8908
replaceSubrange<C: Collection>(_:C) - https://github.com/apple/swift/pull/25310
case conversion ASCII and non-ASCII - https://bugs.swift.org/browse/SR-10855

Awaiting approval:

UTF16 decoding - https://github.com/apple/swift/pull/34435

Contributor guide

Research direction: Select one of the listed benchmark gaps (e.g., insert operations, grapheme breaking, substring comparison, transcoding, case conversion). Study existing benchmark files such as RemoveWhere.swift, StringWalk.swift.gyb, Substring.swift, and StringComparison.swift in benchmark/single-source/ to understand the pattern. Implement a new benchmark for the chosen gap, open a pull request, and follow the guidance from the issue reporter.
Tech stack: swift
Domain: performance
Issue type: Test
Difficulty: 2
Estimated time: Half day
Activity status: Active
Clarity: Clear
Prerequisites: Swift, Git
Newbie friendliness: 75

