Unexpected slug when header contains both parentheses and Japanese characters
#1754 opened on Sep 20, 2021
Description
Describe the bug
We found that Japanese characters disappear from the generated slug when a Markdown header contains both Japanese characters and parentheses. For example, the headers below produce the slugs shown after each arrow:
# (a) → (a)
# (あ) → ()
# (い) → ()
As shown, (あ) and (い) both produce the slug (), even though the text inside the parentheses differs, so the two headers collide.
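The role of the parentheses can be illustrated with a simplified, self-contained model of the current behavior. This is a sketch under stated assumptions, not Redoc's or slugify's actual code: modelSlugify is a hypothetical stand-in that only applies slugify's default remove pattern (copied from the slugify source linked under Possible solutions) and turns whitespace into hyphens, and modelSafeSlugify is a hypothetical mirror of the slugify-or-fallback structure of safeSlugify.

```typescript
// Default `remove` pattern of the slugify package (copied from its source).
const DEFAULT_REMOVE = /[^\w\s$*_+~.()'"!\-:@]+/g;

// Hypothetical, simplified stand-in for slugify(): it only strips characters
// matched by the default pattern and turns whitespace runs into '-'.
// The real package also applies charMap/locale transliteration and options.
function modelSlugify(value: string): string {
  return value.replace(DEFAULT_REMOVE, '').trim().replace(/\s+/g, '-');
}

// Hypothetical mirror of safeSlugify: use the slugify result when it is
// non-empty, otherwise fall back to a lowercase, hyphenated copy of the input.
function modelSafeSlugify(value: string): string {
  return (
    modelSlugify(value) ||
    value
      .toLowerCase()
      .replace(/\s+/g, '-') // Replace spaces with -
      .replace(/&/g, '-and-') // Replace & with 'and'
      .replace(/--+/g, '-') // Replace multiple - with single -
      .replace(/^-+/, '') // Trim - from start of text
      .replace(/-+$/, '') // Trim - from end of text
  );
}

for (const header of ['(a)', '(あ)', '(い)', 'あ']) {
  console.log(`${header} -> ${modelSafeSlugify(header)}`);
}
// (a) -> (a)
// (あ) -> ()
// (い) -> ()
// あ -> あ
```

In this model, (あ) still leaves () behind after the remove step, so the fallback is skipped and the Japanese character is lost, while a bare あ produces an empty string and takes the fallback path.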
Expected behavior
# (a) → (a)
# (あ) → (あ)
# (い) → (い)
The unexpected slug can be reproduced by adding the test below to https://github.com/Redocly/redoc/blob/master/src/utils/__tests__/helpers.test.ts; it asserts the current (incorrect) output.
test('safeSlugify drops Japanese characters when the header contains parentheses', () => {
  expect(safeSlugify('(a)')).toEqual('(a)');
  // The Japanese characters are stripped, so both headers collapse to '()'.
  expect(safeSlugify('(あ)')).toEqual('()');
  expect(safeSlugify('(い)')).toEqual('()');
});
Possible solutions
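This happens because the slugify package's default remove pattern deletes every character outside a small allowlist of word characters, whitespace, and a few symbols, and Japanese characters are not in that list. A header made only of Japanese text is unaffected, because slugify then returns an empty string and safeSlugify falls back to its own lowercase-and-hyphenate logic; with parentheses, slugify returns the non-empty string (), so the fallback never runs. Since slugify accepts an optional remove regex, we can fix this by passing a pattern that allows the Japanese character ranges in addition to the default allowlist: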
import slugify from 'slugify';

export function safeSlugify(value: string): string {
  // The default regex is here: https://github.com/simov/slugify/blob/1142e000f2b99552afb13d4118acbc25177df140/slugify.js#L38
  // The Japanese Unicode ranges are listed here: https://stackoverflow.com/questions/19899554/unicode-range-for-japanese
  // Keep everything the default keeps, plus CJK punctuation, Hiragana, Katakana,
  // full-width/half-width forms, and CJK unified ideographs.
  const remove =
    /[^\w\s$*_+~.()'"!\-:@\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uffef\u4e00-\u9faf]+/g;

  return (
    slugify(value, { remove }) ||
    value
      .toString()
      .toLowerCase()
      .replace(/\s+/g, '-') // Replace spaces with -
      .replace(/&/g, '-and-') // Replace & with 'and'
      .replace(/--+/g, '-') // Replace multiple - with single -
      .replace(/^-+/, '') // Trim - from start of text
      .replace(/-+$/, '') // Trim - from end of text
  );
}
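The proposed pattern can be sanity-checked without pulling in the slugify package. The self-contained sketch below compares it with the default pattern using plain String.prototype.replace; the names DEFAULT_REMOVE, JAPANESE_AWARE_REMOVE, and keep are hypothetical, and the check only exercises the remove step, not slugify's transliteration or whitespace handling.

```typescript
// Default `remove` pattern of the slugify package.
const DEFAULT_REMOVE = /[^\w\s$*_+~.()'"!\-:@]+/g;
// Proposed pattern: the default allowlist plus the Japanese ranges.
const JAPANESE_AWARE_REMOVE =
  /[^\w\s$*_+~.()'"!\-:@\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uffef\u4e00-\u9faf]+/g;

// Returns what is left of `text` after deleting every match of `remove`.
function keep(text: string, remove: RegExp): string {
  return text.replace(remove, '');
}

// 1. On printable ASCII, the proposed pattern keeps exactly what the default keeps.
for (let code = 0x20; code <= 0x7e; code++) {
  const ch = String.fromCharCode(code);
  if (keep(ch, DEFAULT_REMOVE) !== keep(ch, JAPANESE_AWARE_REMOVE)) {
    throw new Error(`patterns disagree on ${JSON.stringify(ch)}`);
  }
}

// 2. Samples from each added range survive under the proposed pattern.
for (const header of ['(あ)', '(い)', '(ア)', '(漢字)', '(ｱ)', '(、)']) {
  const before = keep(header, DEFAULT_REMOVE);
  const after = keep(header, JAPANESE_AWARE_REMOVE);
  console.log(`${header}: default="${before}" proposed="${after}"`);
}
// First line printed: (あ): default="()" proposed="(あ)"
```

Because the pattern is still an allowlist, non-ASCII text outside the listed ranges (for example Hangul such as 한) is stripped exactly as before.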
I would like to send a pull request containing these changes. Could the maintainers let me know whether this approach looks good?