Redocly/redoc

Unexpected slug when a header contains both parentheses and Japanese characters

Open

#1754 opened on 20 Sep 2021

help wanted


Description

Describe the bug

Japanese characters disappear from the generated slug when a markdown header contains both Japanese characters and parentheses. For example, the markdown headers below produce the slugs shown after each arrow.

# (a) → (a)

# (あ) → ()

# (い) → ()

As a result, the same slug, (), is generated even though the text inside the parentheses differs.

Expected behavior

# (a) → (a)

# (あ) → (あ)

# (い) → (い)

Adding the following test to https://github.com/Redocly/redoc/blob/master/src/utils/__tests__/helpers.test.ts reproduces the unexpected slug (the assertions describe the current, incorrect output):

test('safeSlugify drops Japanese characters inside parentheses', () => {
  expect(safeSlugify('(a)')).toEqual('(a)');
  expect(safeSlugify('(あ)')).toEqual('()');
  expect(safeSlugify('(い)')).toEqual('()');
});
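The collapse can also be reproduced without the slugify dependency. The sketch below is a simplified model, not Redoc's or slugify's code: it assumes that, for these inputs, only slugify's default remove pattern matters, and it leaves out slugify's character map and option handling. The names simplifiedSlugify and simplifiedSafeSlugify are hypothetical.

```typescript
// Simplified model (assumption): of everything slugify does, only the
// default `remove` step matters for these inputs. Its character map and
// option handling are deliberately left out.
const DEFAULT_REMOVE = /[^\w\s$*_+~.()'"!\-:@]+/g;

// Hypothetical stand-in for slugify(value) with default options.
function simplifiedSlugify(value: string): string {
  return value.replace(DEFAULT_REMOVE, '').trim().replace(/\s+/g, '-');
}

// Same shape as safeSlugify: the fallback runs only when the slug is empty.
function simplifiedSafeSlugify(value: string): string {
  return (
    simplifiedSlugify(value) ||
    value
      .toLowerCase()
      .replace(/\s+/g, '-') // Replace spaces with -
      .replace(/&/g, '-and-') // Replace & with 'and'
      .replace(/-{2,}/g, '-') // Replace multiple - with single -
      .replace(/^-+/, '') // Trim - from start of text
      .replace(/-+$/, '') // Trim - from end of text
  );
}

for (const header of ['(a)', '(あ)', '(い)', 'あ']) {
  console.log(`${header} -> ${simplifiedSafeSlugify(header)}`);
}
// (a) -> (a)
// (あ) -> ()
// (い) -> ()
// あ -> あ
```

In this model the parentheses survive the removal step, so the slug is the non-empty string () and the fallback never runs. A header made only of Japanese characters produces an empty slug and falls back to the lowercased original text.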

Possible solutions

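This happens because the slugify package removes Japanese characters: its default remove pattern keeps only \w (ASCII letters, digits and underscore), whitespace and a few punctuation characters. For (あ), slugify therefore returns (), which is a non-empty string, so the fallback branch of safeSlugify (which only runs when slugify returns an empty string) is never reached. Since slugify accepts an optional remove pattern, we can fix this by passing a pattern that keeps the Japanese character ranges in addition to the default set, as below: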

export function safeSlugify(value: string): string {
  // Default `remove` regex in slugify:
  // https://github.com/simov/slugify/blob/1142e000f2b99552afb13d4118acbc25177df140/slugify.js#L38
  // Japanese Unicode ranges:
  // https://stackoverflow.com/questions/19899554/unicode-range-for-japanese
  const remove =
    /[^\w\s$*_+~.()'"!\-:@\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uffef\u4e00-\u9faf]+/g;
  return (
    slugify(value, { remove }) ||
    value
      .toString()
      .toLowerCase()
      .replace(/\s+/g, '-') // Replace spaces with -
      .replace(/&/g, '-and-') // Replace & with 'and'
      .replace(/\--+/g, '-') // Replace multiple - with single -
      .replace(/^-+/, '') // Trim - from start of text
      .replace(/-+$/, '') // Trim - from end of text
  );
}
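To check the proposed pattern in isolation, the sketch below applies both the default remove pattern and the proposed one directly with String.prototype.replace, without calling slugify. DEFAULT_REMOVE, JAPANESE_AWARE_REMOVE and removeWith are illustrative names, and only the character-removal step is modelled.

```typescript
// Default `remove` pattern used by slugify (see the link in the code above).
const DEFAULT_REMOVE = /[^\w\s$*_+~.()'"!\-:@]+/g;

// Pattern proposed in this issue: the default set plus CJK punctuation,
// hiragana, katakana, full-width/half-width forms and CJK ideographs.
const JAPANESE_AWARE_REMOVE =
  /[^\w\s$*_+~.()'"!\-:@\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uffef\u4e00-\u9faf]+/g;

// Illustrative helper: only the character-removal step, not full slugify.
function removeWith(pattern: RegExp, value: string): string {
  return value.replace(pattern, '');
}

const samples = ['(a)', '(あ)', '(カ)', '(漢字)', '(a)★', '(가)'];
for (const s of samples) {
  const before = removeWith(DEFAULT_REMOVE, s);
  const after = removeWith(JAPANESE_AWARE_REMOVE, s);
  console.log(`${s} default=${before} proposed=${after}`);
}
// (a) default=(a) proposed=(a)
// (あ) default=() proposed=(あ)
// (カ) default=() proposed=(カ)
// (漢字) default=() proposed=(漢字)
// (a)★ default=(a) proposed=(a)
// (가) default=() proposed=()
```

The last sample shows a limitation of listing ranges explicitly: Hangul (from U+AC00) lies outside every added range, so (가) still collapses to (). If other scripts matter, a pattern built on Unicode property escapes such as \p{L} (with the u flag) is one option, though it keeps a different set of characters and would need its own review.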

I would like to send a pull request containing these changes. Could the maintainers let me know whether this approach looks good?
