BurntSushi/ripgrep

document Windows locale and path separator settings

Open

#1667 opened on Aug 27, 2020

Labels: doc, help wanted

Description

What version of ripgrep are you using?

ripgrep 12.1.1 (rev 7cb211378a) -SIMD -AVX (compiled) +SIMD -AVX (runtime)

How did you install ripgrep?

Unzipped the Windows 64-bit release (the MSVC and GNU builds behave identically; the output below is from MSVC).

What operating system are you using ripgrep on?

Windows 10 Pro Build 18362.19h1_release.190318-1202

Describe your bug.

Umlauts are displayed inconsistently on Windows depending on whether ripgrep is run from CMD or from Git Bash.

What are the steps to reproduce the behavior?

Both shells are configured to use Lucida Console font.

Using standard CMD (PowerShell not tested):

cd ripgrep-test

type subdir\ExampleWithUmlauts.cs

  • Displays garbled umlauts

rg SubjectCodes

  • Displays file name with Windows-like backslash
  • Displays correct umlauts

Using Git Bash (MINGW64):

cd ripgrep-test

cat subdir/ExampleWithUmlauts.cs

  • Displays correct umlauts

rg SubjectCodes

  • Displays file name with an unwanted backslash instead of a UNIX-like (forward) slash
  • Displays garbled umlauts

What is the actual behavior?

CMD:

C:\Users\Sandra.Eickel\Documents\ripgrep-test>type subdir\ExampleWithUmlauts.cs
namespace ZUGFeRD_Test
{
    class ZugFerd1ExtendedWarenrechnungGenerator
    {
        private InvoiceDescriptor _generateDescriptor()
        {
            desc.AddNote("Es bestehen Rabatt- oder Bonusvereinbarungen.", SubjectCodes.AAK, ContentCodes.ST3);
            desc.AddNote("Der Verkäufer bleibt Eigentümer der Waren bis zu vollständigen Erfüllung der Kaufpreisforderung.", SubjectCodes.AAJ, ContentCodes.EEV);
            desc.AddNote("MUSTERLIEFERANT GMBH BAHNHOFSTRASSE 99 99199 MUSTERHAUSEN Geschäftsführung: Max Mustermann USt-IdNr: DE123456789 Telefon: +49 932 431 0 www.musterlieferant.de HRB Nr. 372876 Amtsgericht Musterstadt GLN 4304171000002 WEEE-Reg-Nr.: DE87654321",
                         SubjectCodes.REG);
            desc.AddNote("Leergutwert: 46,50");
        };
    }
}

C:\Users\Sandra.Eickel\Documents\ripgrep-test>rg --debug SubjectCodes
DEBUG|grep_regex::literal|crates\regex\src\literal.rs:58: literal prefixes detected: Literals { lits: [Complete(SubjectCodes)], limit_size: 250, limit_class: 10 }
DEBUG|globset|crates\globset\src\lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates\globset\src\lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates\globset\src\lib.rs:431: built glob set; 0 literals, 0 basenames, subdir\ExampleWithUmlauts.cs
7:            desc.AddNote("Es bestehen Rabatt- oder Bonusvereinbarungen.", SubjectCodes.AAK, ContentCodes.ST3);
8:            desc.AddNote("Der Verkäufer bleibt Eigentümer der Waren bis zu vollständigen Erfüllung der Kaufpreisforderung.", SubjectCodes.AAJ, ContentCodes.EEV);
10:                         SubjectCodes.REG);
12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates\globset\src\lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates\globset\src\lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes

Git Bash:

$ cat subdir/ExampleWithUmlauts.cs
namespace ZUGFeRD_Test
{
    class ZugFerd1ExtendedWarenrechnungGenerator
    {
        private InvoiceDescriptor _generateDescriptor()
        {
            desc.AddNote("Es bestehen Rabatt- oder Bonusvereinbarungen.", SubjectCodes.AAK, ContentCodes.ST3);
            desc.AddNote("Der Verkäufer bleibt Eigentümer der Waren bis zu vollständigen Erfüllung der Kaufpreisforderung.", SubjectCodes.AAJ, ContentCodes.EEV);
            desc.AddNote("MUSTERLIEFERANT GMBH BAHNHOFSTRASSE 99 99199 MUSTERHAUSEN Geschäftsführung: Max Mustermann USt-IdNr: DE123456789 Telefon: +49 932 431 0 www.musterlieferant.de HRB Nr. 372876 Amtsgericht Musterstadt GLN 4304171000002 WEEE-Reg-Nr.: DE87654321",
                         SubjectCodes.REG);
            desc.AddNote("Leergutwert: 46,50");
        };
    }
}

$ rg --debug SubjectCodes
DEBUG|grep_regex::literal|crates\regex\src\literal.rs:58: literal prefixes detected: Literals { lits: [Complete(SubjectCodes)], limit_size: 250, limit_class: 10 }
DEBUG|globset|crates\globset\src\lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates\globset\src\lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates\globset\src\lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates\globset\src\lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates\globset\src\lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
subdir\ExampleWithUmlauts.cs
7:            desc.AddNote("Es bestehen Rabatt- oder Bonusvereinbarungen.", SubjectCodes.AAK, ContentCodes.ST3);
8:            desc.AddNote("Der Verkäufer bleibt Eigentümer der Waren bis zu vollständigen Erfüllung der Kaufpreisforderung.", SubjectCodes.AAJ, ContentCodes.EEV);
10:                         SubjectCodes.REG);

What is the expected behavior?

It is nice that ripgrep produces readable umlauts when run from CMD, but its behaviour when run via Git Bash is not that helpful. There it should also output readable umlauts, and it should print paths with UNIX-like forward slashes instead of problematic backslashes, including support for drives, e.g. "/c/path" for "C:\path". I did not test with paths containing spaces or umlauts, which (at least for spaces) might have to be treated differently depending on the environment.
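To make the path part of this request concrete, here is a rough sketch in Rust of the mapping I have in mind. It is only an illustration of the desired output format, not ripgrep's code: the helper name `to_msys_path` is made up, and it assumes that a leading drive letter followed by a colon is the only prefix that needs special treatment (UNC paths, for example, are simply slash-converted).

```rust
/// Hypothetical helper (not part of ripgrep): convert a Windows-style path
/// such as `C:\path\file.cs` into the Git Bash style `/c/path/file.cs`.
fn to_msys_path(windows_path: &str) -> String {
    // Turn every backslash into a forward slash first.
    let slashed = windows_path.replace('\\', "/");
    let bytes = slashed.as_bytes();

    // A drive prefix is an ASCII letter plus ':', followed by '/' or nothing.
    let has_drive = bytes.len() >= 2
        && bytes[0].is_ascii_alphabetic()
        && bytes[1] == b':'
        && (bytes.len() == 2 || bytes[2] == b'/');

    if has_drive {
        // Both prefix bytes are ASCII, so slicing at index 2 is safe.
        let drive = (bytes[0] as char).to_ascii_lowercase();
        format!("/{}{}", drive, &slashed[2..])
    } else {
        // Relative paths only need the separator swapped.
        slashed
    }
}

fn main() {
    println!("{}", to_msys_path(r"subdir\ExampleWithUmlauts.cs"));
    println!("{}", to_msys_path(r"C:\path"));
}
```

With this mapping the match heading from the runs above would be printed as subdir/ExampleWithUmlauts.cs under Git Bash, while CMD would keep the backslash form.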

ripgrep-test.zip

See also #234 and #530 - even if those refer to file name globbing differences.
