Redocly/redoc

Unexpected slug when header contains both parentheses and Japanese characters

Open

#1754 opened on Sep 20, 2021

help wanted

Description

Describe the bug

We found that Japanese characters disappear from the generated slug when a markdown header contains both Japanese characters and parentheses. For example, given the headers below, each one produces the slug shown after the arrow.

# (a) → (a)

# (あ) → ()

# (い) → ()

As shown, the same slug is generated even when the values inside the parentheses differ.

Expected behavior

# (a) → (a)

# (あ) → (あ)

# (い) → (い)

The bug can be reproduced by adding the test below to https://github.com/Redocly/redoc/blob/master/src/utils/__tests__/helpers.test.ts . The test passes against the current code, which confirms the unexpected behavior.

test('safeSlugify disappears Japanese word when contains parentheses', () => {
  expect(safeSlugify('(a)')).toEqual('(a)');
  expect(safeSlugify('(あ)')).toEqual('()');
  expect(safeSlugify('(い)')).toEqual('()');
});
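
The collision can also be seen in isolation, without Redoc or the slugify package, by applying only the character-removal step. This is a minimal sketch, not the full slugify pipeline: the pattern is assumed to be slugify's default `remove` regex (the pattern proposed in this issue minus the Japanese ranges), and `stripRemoved` is a hypothetical helper name introduced only for illustration.

```typescript
// Assumed default `remove` pattern of slugify: every character that is NOT
// in this class gets deleted from the input.
const defaultRemove = /[^\w\s$*_+~.()'"!\-:@]+/g;

// Hypothetical helper: applies only the removal step to a header.
function stripRemoved(header: string): string {
  return header.replace(defaultRemove, '');
}

const headers: string[] = ['(a)', '(あ)', '(い)'];
const stripped: string[] = headers.map(stripRemoved);

// The two Japanese headers collapse to the same string, so their slugs collide.
console.log(stripped.join(' | ')); // (a) | () | ()
```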

Possible solutions

This behavior occurs because the slugify package removes Japanese characters by default. The slugify function accepts an optional `remove` regex that specifies which characters to strip, so we can fix the problem by adding the Japanese character ranges to the negated character class of the default pattern, which keeps those characters in the slug, as below:

import slugify from 'slugify';

export function safeSlugify(value: string): string {
  // Default regex is here: https://github.com/simov/slugify/blob/1142e000f2b99552afb13d4118acbc25177df140/slugify.js#L38
  // Japanese Unicode ranges are here: https://stackoverflow.com/questions/19899554/unicode-range-for-japanese
  const slug =
    slugify(value, {
      remove: /[^\w\s$*_+~.()'"!\-:@\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uffef\u4e00-\u9faf]+/g,
    }) ||
    value
      .toString()
      .toLowerCase()
      .replace(/\s+/g, '-') // Replace spaces with -
      .replace(/&/g, '-and-') // Replace & with '-and-'
      .replace(/--+/g, '-') // Replace multiple - with single -
      .replace(/^-+/, '') // Trim - from start of text
      .replace(/-+$/, ''); // Trim - from end of text
  return slug;
}
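
To check which characters the extended pattern keeps, the removal step can be run on its own. This is a rough sketch that applies only the regex, not slugify itself; `keepJapanese` and the sample headers are hypothetical names and inputs chosen for illustration.

```typescript
// The `remove` pattern proposed in this issue: the default class plus the
// Japanese ranges (punctuation, hiragana, katakana, full-width forms, kanji).
const japaneseAwareRemove =
  /[^\w\s$*_+~.()'"!\-:@\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uffef\u4e00-\u9faf]+/g;

// Hypothetical helper: applies only the removal step to a header.
function keepJapanese(header: string): string {
  return header.replace(japaneseAwareRemove, '');
}

// Hiragana, katakana and kanji fall inside the added ranges and survive.
const kept: string[] = ['(あ)', '(い)', '(ア)', '(漢字)'].map(keepJapanese);
console.log(kept.join(' | ')); // (あ) | (い) | (ア) | (漢字)

// Characters outside the added ranges, such as Hangul, are still removed.
const hangul: string = keepJapanese('(한)');
console.log(hangul); // ()
```

With this pattern the headers from the report no longer collapse to the same string. Scripts outside the listed ranges are unaffected by the change and would still be stripped.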

I would like to send a pull request containing these changes, so I would like to ask the maintainers whether this approach looks good.
