Unexpected slug when header contains both of parentheses and Japanese characters · Redocly/redoc#1754

Repository metrics

Stars: 21,877
PR merge metrics: no merged PRs in the last 30 days

Description

Describe the bug

When a markdown header contains both Japanese characters and parentheses, the Japanese characters are dropped from the generated slug. For example, the headers below produce the slugs shown to the right of each arrow.

# (a) → (a)

# (あ) → ()

# (い)  → ()

As we can see, the same slug will be generated even if the values inside the parentheses are different.

Expected behavior

# (a) → (a)

# (あ) → (あ)

# (い)  → (い)

The unexpected slugs can be reproduced by adding the test below to https://github.com/Redocly/redoc/blob/master/src/utils/__tests__/helpers.test.ts . The assertions describe the current (buggy) output, so the test passes today.

test('safeSlugify drops Japanese characters when the value contains parentheses', () => {
  expect(safeSlugify('(a)')).toEqual('(a)');
  // Current behavior: the Japanese character is removed, so both slugs collide.
  expect(safeSlugify('(あ)')).toEqual('()');
  expect(safeSlugify('(い)')).toEqual('()');
});

Possible solutions

This behavior occurs because the slugify package removes Japanese characters by default. The slugify function accepts an optional remove regex that specifies which characters to strip, so we can fix the problem by adding the Japanese Unicode ranges to the allowed set in the default regex, as below:

import slugify from 'slugify';

export function safeSlugify(value: string): string {
  // Default regex: https://github.com/simov/slugify/blob/1142e000f2b99552afb13d4118acbc25177df140/slugify.js#L38
  // Japanese Unicode ranges: https://stackoverflow.com/questions/19899554/unicode-range-for-japanese
  const slug =
    slugify(value, {
      remove: /[^\w\s$*_+~.()'"!\-:@\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uffef\u4e00-\u9faf]+/g,
    }) ||
    value
      .toString()
      .toLowerCase()
      .replace(/\s+/g, '-') // Replace spaces with -
      .replace(/&/g, '-and-') // Replace & with 'and'
      .replace(/\--+/g, '-') // Replace multiple - with single -
      .replace(/^-+/, '') // Trim - from start of text
      .replace(/-+$/, ''); // Trim - from end of text
  return slug;
}
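
To see why the extra ranges matter, the removal step can be tried in isolation. The following is a minimal sketch, not slugify's actual implementation: it assumes the default remove pattern is the character class from the regex above without the Japanese ranges (worth checking against the linked slugify.js source), and `stripDisallowed` is a hypothetical helper that applies only that one replacement.

```typescript
// Assumed default: the regex from the issue with the Japanese ranges removed.
const DEFAULT_REMOVE = /[^\w\s$*_+~.()'"!\-:@]+/g;

// The same pattern extended with the Japanese ranges proposed in the issue.
const JAPANESE_AWARE_REMOVE =
  /[^\w\s$*_+~.()'"!\-:@\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uffef\u4e00-\u9faf]+/g;

// Hypothetical helper: performs only the removal step, not slugify's full pipeline.
function stripDisallowed(value: string, remove: RegExp): string {
  return value.replace(remove, '');
}

const headers = ['(a)', '(あ)', '(い)'];

// Without the Japanese ranges, the two Japanese headers collapse to the same value.
const withDefault = headers.map((h) => stripDisallowed(h, DEFAULT_REMOVE));
// ['(a)', '()', '()']

// With the ranges added to the allowed set, each header keeps its character.
const withJapanese = headers.map((h) => stripDisallowed(h, JAPANESE_AWARE_REMOVE));
// ['(a)', '(あ)', '(い)']
```

Because `\w` in a JavaScript regex without the `u` flag only covers `[A-Za-z0-9_]`, any character outside the listed set, including hiragana, katakana, and kanji, falls into the negated class and is removed unless its range is listed explicitly.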

I would like to send a pull request containing these changes. Could the maintainers confirm whether this approach looks good?

Contributor guide

Research direction: Inspect the safeSlugify function and add Japanese Unicode ranges to the slugify remove option.
Tech stack: TypeScript
Domain: backend, documentation
Issue type: Bug
Difficulty: 2
Estimated time: 1-3 hours
Activity status: Active
Clarity: Clear
Prerequisites: Git, TypeScript
Newbie friendliness: 75
