dotnet/roslyn

Order of evaluation for string concatenation still does not follow C# specification

Open

#38641 opened on Sep 11, 2019

Labels: Area-Compilers, Bug, Community, help wanted

Description

Version Used: VS2019.3 Preview 3 / Roslyn 3.3.0-beta2

#35006 claims to have fixed #522, but the new behavior implemented does not match the C# specification either.

Expected Behavior: If I understood the specification correctly, the + operator applied to object + string should:

  1. Evaluate the left-hand side as an object
  2. Evaluate the right-hand side as a string
  3. Call ToString on the object in order to convert it to a string
  4. Concatenate the two strings

Actual Behavior: Starting in Roslyn 3.3.0-beta2, the actual behavior no longer matches the specification even with only two operands. (The old behavior differed from the specification only when there were more than two operands; see #522 for details.) The newly implemented behavior is:

  1. Evaluate the left-hand side as an object
  2. Call ToString on the object in order to convert it to a string
  3. Evaluate the right-hand side as a string
  4. Concatenate the two strings

Full example for 2 operands on SharpLab
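
To make the two orderings above concrete, here is a minimal sketch of a repro. The SharpLab example itself is not reproduced in this issue text, so the bodies of `C` and `GetString`, the `Log` list, and the `TwoOperandDemo` class are assumptions made for illustration. `CompilerOrder` uses the `+` operator, so its log depends on the Roslyn version used to build it; `SpecOrder` spells out by hand the order I read from the specification.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical repro; all names here are illustrative, not taken from the SharpLab link.
public static class TwoOperandDemo
{
    // Records side effects in the order they happen.
    public static readonly List<string> Log = new List<string>();

    public sealed class C
    {
        private readonly int _id;
        public C(int id) { _id = id; Log.Add($"new C({_id})"); }
        public override string ToString()
        {
            Log.Add($"C({_id}).ToString()");
            return $"C{_id}";
        }
    }

    public static string GetString() { Log.Add("GetString()"); return " "; }

    // Order chosen by the compiler for object + string; varies by Roslyn version.
    public static List<string> CompilerOrder()
    {
        Log.Clear();
        _ = new C(0) + GetString();
        return new List<string>(Log);
    }

    // Hand-written order per my reading of the spec:
    // evaluate both operands first, then convert, then concatenate.
    public static List<string> SpecOrder()
    {
        Log.Clear();
        object left = new C(0);
        string right = GetString();
        _ = string.Concat(left.ToString(), right);
        return new List<string>(Log);
    }

    public static void Main()
    {
        Console.WriteLine("compiler: " + string.Join(", ", CompilerOrder()));
        Console.WriteLine("spec:     " + string.Join(", ", SpecOrder()));
    }
}
```

If the two printed lines differ, the compiler in use does not follow the order listed under Expected Behavior.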

More than two operands: The expected behavior gets more tricky when there are multiple operands involved. For

new C(0) + GetString() + new C(1)

The + operator is left-associative, so we're evaluating (new C(0) + GetString()) + new C(1). The outer + first evaluates the LHS, so from the spec I would expect the evaluation order:

  • new C(0)
  • GetString()
  • C(0).ToString()
  • new C(1)
  • C(1).ToString()

But that's not what the Roslyn master currently does; it instead results in:

  • new C(0)
  • C(0).ToString()
  • GetString()
  • new C(1)
  • C(1).ToString()

Effectively, Roslyn master immediately converts the non-string to string, before even evaluating the other argument to string.op_Addition(object, string). Full example for 3 operands on SharpLab
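
A rough sketch of the three-operand case, under the same assumptions as before: `C`, `GetString`, `Log`, and `ThreeOperandDemo` are illustrative names, not the code behind the SharpLab link. `SpecOrder` writes out the left-associative grouping step by step, so it always logs the order I would expect from the spec, whereas `CompilerOrder` logs whatever the compiler in use emits.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical repro; all names here are illustrative.
public static class ThreeOperandDemo
{
    public static readonly List<string> Log = new List<string>();

    public sealed class C
    {
        private readonly int _id;
        public C(int id) { _id = id; Log.Add($"new C({_id})"); }
        public override string ToString()
        {
            Log.Add($"C({_id}).ToString()");
            return $"C{_id}";
        }
    }

    public static string GetString() { Log.Add("GetString()"); return " "; }

    // Order chosen by the compiler; varies by Roslyn version.
    public static List<string> CompilerOrder()
    {
        Log.Clear();
        _ = new C(0) + GetString() + new C(1);
        return new List<string>(Log);
    }

    // (new C(0) + GetString()) + new C(1), spelled out per my reading of the spec.
    public static List<string> SpecOrder()
    {
        Log.Clear();
        object c0 = new C(0);
        string s = GetString();
        string inner = string.Concat(c0.ToString(), s); // inner + completes first
        object c1 = new C(1);
        _ = string.Concat(inner, c1.ToString());        // then the outer +
        return new List<string>(Log);
    }

    public static void Main()
    {
        Console.WriteLine("compiler: " + string.Join(", ", CompilerOrder()));
        Console.WriteLine("spec:     " + string.Join(", ", SpecOrder()));
    }
}
```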

It gets even more interesting if we use parentheses to re-group the concatenation:

Console.WriteLine(new C(0) + (GetString() + new C(1)));

Now from the spec I would expect this to evaluate as op_Addition(object, op_Addition(string, object)), so:

  • new C(0)
  • GetString()
  • new C(1)
  • C(1).ToString()
  • C(0).ToString()

Roslyn master evaluates this as:

  • new C(0)
  • C(0).ToString()
  • GetString()
  • new C(1)
  • C(1).ToString()
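
The parenthesized grouping can be sketched the same way. Again, `C`, `GetString`, `Log`, and `GroupedDemo` are assumed names for illustration only. In `SpecOrder`, the outer left operand is evaluated first but converted last, because under my reading of the spec the whole right-hand side, including the inner concatenation, has to be evaluated before the outer + does any conversion.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical repro; all names here are illustrative.
public static class GroupedDemo
{
    public static readonly List<string> Log = new List<string>();

    public sealed class C
    {
        private readonly int _id;
        public C(int id) { _id = id; Log.Add($"new C({_id})"); }
        public override string ToString()
        {
            Log.Add($"C({_id}).ToString()");
            return $"C{_id}";
        }
    }

    public static string GetString() { Log.Add("GetString()"); return " "; }

    // Order chosen by the compiler; varies by Roslyn version.
    public static List<string> CompilerOrder()
    {
        Log.Clear();
        _ = new C(0) + (GetString() + new C(1));
        return new List<string>(Log);
    }

    // new C(0) + (GetString() + new C(1)), spelled out per my reading of the spec.
    public static List<string> SpecOrder()
    {
        Log.Clear();
        object c0 = new C(0);                           // outer left operand
        string s = GetString();                         // inner left operand
        object c1 = new C(1);                           // inner right operand
        string inner = string.Concat(s, c1.ToString()); // inner + completes
        _ = string.Concat(c0.ToString(), inner);        // outer + converts c0 last
        return new List<string>(Log);
    }

    public static void Main()
    {
        Console.WriteLine("compiler: " + string.Join(", ", CompilerOrder()));
        Console.WriteLine("spec:     " + string.Join(", ", SpecOrder()));
    }
}
```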

So my takeaway is: the old compiler optimized string concatenation in a way that doesn't follow the language spec, and the new compiler optimizes it further in a way that still doesn't follow the spec. Arguably, though, this time it's the spec that is broken. The new compiler behavior makes intuitive sense: every object is converted to a string immediately, before any further operands are evaluated.

@gafter @canton7
