astral-sh/ruff

`non-augmented-assignment` (`PLR6104`): Can we reduce the number of unsafe recommendations from this rule?

Open

#12,890 opened on 2024年8月14日

GitHub で見る
 (2 comments) (3 reactions) (0 assignees)Rust (47,527 stars) (2,088 forks)batch import
help wantedrule

説明

In #12858, I proposed stabilising non-augmented-assignment (PLR6104). However, this was reverted in https://github.com/astral-sh/ruff/pull/12884 after concerns were raised that the rule too often gives recommendations that would cause developers to make unsafe changes to their code. This issue is to discuss if there are any ways of improving the rule's heuristics to reduce this problem.


In https://github.com/astral-sh/ruff/pull/12858#issuecomment-2287060832, @kdebrab wrote:

@AlexWaygood I'm a bit concerned about this rule, because there are quite some cases where the application would unintentionally introduce nasty and hard-to-find bugs, in particular when it is applied on input arguments of a function. I'm afraid one could sometimes not be aware that applying the fix causes bugs for these cases. It is especially prevalent for data science libraries when working with numpy arrays or pandas Series / DataFrames, but it could also happen in case the argument is a dictionary or list. In my view, these are false positives which could and should be avoided by not applying the rule on variables that are function arguments.

Skimming through the examples above from the ecosystem, I found 10 such cases where applying the fix would lead to (not always evident) bugs:

PlasmaPy/PlasmaPy

pandas-dev/pandas

Would it be possible to somehow avoid raising PLR6104 for variables that are function arguments?

I agree that changing code from x = x + y to x += y is often an unsafe change if x is a mutable object like a list or array. It's even something that the rule itself is already partly aware of: the autofix for the rule is marked as unsafe unless we're able to trivially infer that x is an instance of an immutable type such as int or str.

Unfortunately, detecting in general whether an object is an instance of a mutable type is a hard problem without generalised type inference (which we won't have for a long time). So the question becomes: without that capability, how can we reduce unsafe recommendations from this rule? One option would be to make the rule much more conservative. We could match our current behaviour when it comes to deciding whether the autofix is safe or not: we could only emit a diagnostic if the object in question can be trivially inferred to be an instance of an immutable type. Another option would be to apply some kind of heuristic like @kdebrab suggests: we could only emit the diagnostic on variables that are local to a function, since local variables are likely to have shorter lifetimes, so mutating them is less likely to be unsafe even if they are instances of mutable types.

I think applying a heuristic would be likely to reduce the false positives, but I'm not a massive fan of the idea, as it would be unlikely to get rid of false positives completely, it would also increase false negatives, and it might be hard for users to understand the logic we're applying. Applying augmented assignment operators to variables bound via function parameters is perfectly safe and idiomatic if you know that the function is only going to ever be passed instances of immutable types. For example, here's a function from the standard library's _pylong.py module that seems quite idiomatic to me:

def _div2n1n(a, b, n):
    """Divide a 2n-bit nonnegative integer a by an n-bit positive integer
    b, using a recursive divide-and-conquer algorithm.

    Inputs:
      n is a positive integer
      b is a positive integer with exactly n bits
      a is a nonnegative integer such that a < 2**n * b

    Output:
      (q, r) such that a = b*q+r and 0 <= r < b.

    """
    if a.bit_length() - n <= _DIV_LIMIT:
        return divmod(a, b)
    pad = n & 1
    if pad:
        a <<= 1
        b <<= 1
        n += 1
    half_n = n >> 1
    mask = (1 << half_n) - 1
    b1, b2 = b >> half_n, b & mask
    q1, r = _div3n2n(a >> n, (a >> half_n) & mask, b, b1, b2, half_n)
    q2, r = _div3n2n(r, a & mask, b, b1, b2, half_n)
    if pad:
        r >>= 1
    return q1 << half_n | q2, r

I'm very interested in hearing whether any members of the Ruff community have any other ideas about how we could reduce false positives in a principled way here, or what they think of the ideas I posted above!

コントリビューターガイド