typecheck failure for assignment of `empty(DynArray[..., N])` with mismatched length · vyperlang/vyper#3465

Labels: bug - type 0, help wanted

Description

Version Information

vyper Version (output of vyper --version): 0.3.8+commit.036f153
OS: linux
Python Version (output of python --version): 3.11.3

What's your issue about?

This behaviour was introduced in Vyper 0.3.8. TL;DR: since Vyper 0.3.8, compilation fails if a function returns a DynArray whose declared maximum length is smaller than that of the function's declared return type:

vyper.exceptions.TypeCheckFailure: Bad type for clearing bytes: expected DynArray[String[4], 1368] but got DynArray[String[4], 1]
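Vyper's `TypeCheckFailure` appears to be an internal-compiler-error exception, so the message suggests that the `return` passed semantic analysis and only failed later, when the compiler emitted code to zero out ("clear") the return buffer and required an exact type match. The compatibility rule this report expects can be sketched in Python as follows. `StringT`, `DynArrayT`, and `is_assignable` are hypothetical names used for illustration, not the Vyper compiler's own classes, and the rule itself (a bounded type accepts any source whose bound is not larger) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringT:
    """Hypothetical stand-in for Vyper's String[N]."""
    max_length: int


@dataclass(frozen=True)
class DynArrayT:
    """Hypothetical stand-in for Vyper's DynArray[T, N]."""
    value_type: object
    max_length: int


def is_assignable(dst, src) -> bool:
    """Return True if a value of type `src` fits in a location of type `dst`.

    Assumed rule: a bounded destination accepts any source whose bound is
    not larger, and DynArray element types are checked recursively.
    """
    if isinstance(dst, StringT) and isinstance(src, StringT):
        return src.max_length <= dst.max_length
    if isinstance(dst, DynArrayT) and isinstance(src, DynArrayT):
        return (src.max_length <= dst.max_length
                and is_assignable(dst.value_type, src.value_type))
    return False


# Declared return type of `encode` below: DynArray[String[4], 1368].
RETURN_T = DynArrayT(StringT(4), 1368)

print(is_assignable(RETURN_T, DynArrayT(StringT(4), 1)))  # True
print(is_assignable(DynArrayT(StringT(4), 1), RETURN_T))  # False
```

Under this rule `empty(DynArray[String[4], 1])` is a valid value for the declared return type, while the reverse direction is rejected; that asymmetry is what the report asks the code generator to respect as well.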

Example (encode):

# @dev Sets the maximum input and output length
# allowed. For an n-byte input to be encoded, the
# space required for the Base64-encoded content
# (without line breaks) is "4 * ceil(n/3)" characters.
_DATA_INPUT_BOUND: constant(uint256) = 1024
_DATA_OUTPUT_BOUND: constant(uint256) = 1368


# @dev Defines the Base64 encoding tables. For encoding
# with a URL and filename-safe alphabet, please refer to:
# https://www.rfc-editor.org/rfc/rfc4648#section-5.
_TABLE_STD_CHARS: constant(String[65]) = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
_TABLE_URL_CHARS: constant(String[65]) = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_="


@external
@pure
def encode(data: Bytes[_DATA_INPUT_BOUND], base64_url: bool) -> DynArray[String[4], _DATA_OUTPUT_BOUND]:
    """
    @dev Encodes a `Bytes` array using the Base64
         binary-to-text encoding scheme.
    @notice Due to the Vyper design with fixed-size
            string parameters, string concatenations
            with itself in a loop can lead to length
            mismatches (the underlying issue is that
            Vyper does not support a mutable `Bytes`
            type). To circumvent this issue, we choose
            a dynamic array as the return type.
    @param data The data to be Base64-encoded, at most
           1024 bytes long.
    @param base64_url The Boolean variable that specifies
           whether to use a URL and filename-safe alphabet
           or not.
    @return DynArray The array of 4-character strings
            that, concatenated, form the Base64 encoding
            of `data`.
    """
    data_length: uint256 = len(data)
    if (data_length == empty(uint256)):
        return empty(DynArray[String[4], 1])

    # If the length of the unencoded input is not
    # a multiple of three, the encoded output must
    # have padding added so that its length is a
    # multiple of four.
    padding: uint256 = data_length % 3
    data_padded: Bytes[_DATA_INPUT_BOUND + 2] = b""
    if (padding == 1):
        data_padded = concat(data, b"\x00\x00")
    elif (padding == 2):
        data_padded = concat(data, b"\x00")
    else:
        data_padded = data

    char_chunks: DynArray[String[4], _DATA_OUTPUT_BOUND] = []
    idx: uint256 = 0
    for _ in range(_DATA_INPUT_BOUND):
        # For the Base64 encoding, three bytes (= chunk)
        # of the bytestream (= 24 bits) are divided into
        # four 6-bit blocks.
        chunk: uint256 = convert(slice(data_padded, idx, 3), uint256)

        # To write each character, we right shift the 3-byte
        # chunk (= 24 bits) four times in blocks of six bits
        # for each character (18, 12, 6, 0). Note that masking
        # is not required for the first part of the block, as
        # 6 bits are already extracted when the chunk is shifted
        # to the right by 18 bits (out of 24 bits). To illustrate
        # why, here is an example:
        # Example case for `c1`:
        #   6bit   6bit   6bit   6bit
        # │------│------│------│------│
        #  011100 000111 100101 110100
        #
        # `>> 18` (right shift `c1` by 18 bits)
        #   6bit   6bit   6bit   6bit
        # │------│------│------│------│
        #  000000 000000 000000 011100
        #
        # 63 (or `0x3F`) is `000000000000000000111111` in binary.
        # Thus, the bitwise `AND` operation is redundant.
        c1: uint256 = shift(chunk, -18)
        c2: uint256 = shift(chunk, -12) & 63
        c3: uint256 = shift(chunk, -6) & 63
        c4: uint256 = chunk & 63

        # Base64 encoding with a URL and filename-safe
        # alphabet.
        if (base64_url):
            char_chunks.append(concat(slice(_TABLE_URL_CHARS, c1, 1), slice(_TABLE_URL_CHARS, c2, 1), slice(_TABLE_URL_CHARS, c3, 1),\
                                      slice(_TABLE_URL_CHARS, c4, 1)))
        # Base64 encoding using the standard characters.
        else:
            char_chunks.append(concat(slice(_TABLE_STD_CHARS, c1, 1), slice(_TABLE_STD_CHARS, c2, 1), slice(_TABLE_STD_CHARS, c3, 1),\
                                      slice(_TABLE_STD_CHARS, c4, 1)))

        # The following line cannot overflow because we have
        # limited the for loop by the `constant` parameter
        # `_DATA_INPUT_BOUND`, which is bounded by the
        # maximum value of `1024`.
        idx = unsafe_add(idx, 3)

        # We break the loop once we reach the end of `data`
        # (including padding).
        if (idx == len(data_padded)):
            break

    # Case 1: padding of "==" added.
    if (padding == 1):
        last_chunk: String[2] = slice(char_chunks.pop(), 0, 2)
        char_chunks.append(concat(last_chunk, "=="))
    # Case 2: padding of "=" added.
    elif (padding == 2):
        last_chunk: String[3] = slice(char_chunks.pop(), 0, 3)
        char_chunks.append(concat(last_chunk, "="))

    return char_chunks
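To check the contract's chunking and padding logic independently of the compiler, the Python model below mirrors `encode`: shifts by 18, 12, 6 and 0 bits, `& 63` masks, and replacement of the padding-derived characters with `=`. `encode_chunks` and `output_bound` are illustrative names, and this is a sketch rather than a line-by-line port (it iterates over the padded input directly instead of breaking out of a bounded loop).

```python
import base64

# Same alphabets as `_TABLE_STD_CHARS` and `_TABLE_URL_CHARS` in the contract.
TABLE_STD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
TABLE_URL = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_="


def output_bound(n: int) -> int:
    """Characters needed to Base64-encode n bytes: 4 * ceil(n / 3)."""
    return 4 * ((n + 2) // 3)


def encode_chunks(data: bytes, base64_url: bool = False) -> list[str]:
    """Model of the contract's `encode`: one 4-character string per 3-byte chunk."""
    if len(data) == 0:
        return []  # the contract returns an empty DynArray here
    table = TABLE_URL if base64_url else TABLE_STD
    padding = len(data) % 3
    # Zero-pad to a multiple of three bytes, as `data_padded` does.
    padded = data + b"\x00" * ((3 - padding) % 3)
    chunks = []
    for idx in range(0, len(padded), 3):
        chunk = int.from_bytes(padded[idx:idx + 3], "big")  # 24 bits
        c1 = chunk >> 18          # top 6 bits, no mask needed
        c2 = (chunk >> 12) & 63
        c3 = (chunk >> 6) & 63
        c4 = chunk & 63
        chunks.append(table[c1] + table[c2] + table[c3] + table[c4])
    # Characters produced only by the zero padding become "=".
    if padding == 1:
        chunks[-1] = chunks[-1][:2] + "=="
    elif padding == 2:
        chunks[-1] = chunks[-1][:3] + "="
    return chunks


encoded = "".join(encode_chunks(b"Vyper"))
print(encoded)                                             # VnlwZXI=
print(encoded == base64.b64encode(b"Vyper").decode())      # True
print(output_bound(1024))                                  # 1368
```

`output_bound(1024)` reproduces `_DATA_OUTPUT_BOUND`. Since each array element holds four characters, a 1024-byte input yields at most 342 elements, so the DynArray bound of 1368 (a character count) is larger than the element count strictly requires.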

The line that throws is `return empty(DynArray[String[4], 1])`. Using `return []` or `return empty(DynArray[String[4], _DATA_OUTPUT_BOUND])` avoids the error, but I am wondering why this behaviour was introduced.

How can it be fixed?

Allow returning `empty(DynArray[T, M])` from a function declared to return `DynArray[T, N]` when `M <= N`, as was possible before 0.3.8.
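One reason this fix should be safe: a DynArray's contents are determined by its length word, and an empty array has length 0 whatever its declared bound. The Python sketch below models that. The memory layout (one length word followed by element slots, with 64 bytes per `String[4]` element) is an assumption rather than something taken from the compiler, and `clear_dynarray` and the other helpers are hypothetical. The ABI part follows the standard contract ABI encoding of a single dynamic-array return value.

```python
WORD = 32  # EVM word size in bytes
ELEM = 64  # assumed memory size of one String[4]: length word + one data word


def dynarray_size(max_length: int, elem_size: int = ELEM) -> int:
    """Assumed memory footprint of DynArray[T, max_length]."""
    return WORD + max_length * elem_size


def clear_dynarray(buf: bytearray, offset: int) -> None:
    """Empty the DynArray stored at `offset` by zeroing its length word only."""
    buf[offset:offset + WORD] = bytes(WORD)


def dynarray_length(buf: bytes, offset: int) -> int:
    """Read the length word of the DynArray stored at `offset`."""
    return int.from_bytes(buf[offset:offset + WORD], "big")


def abi_encode_empty_dynarray() -> bytes:
    """ABI encoding of an empty dynamic array as the only return value:
    a 32-byte offset (0x20) followed by a 32-byte length of zero.
    The declared bound does not appear anywhere in the encoding."""
    return WORD.to_bytes(WORD, "big") + (0).to_bytes(WORD, "big")


# A return buffer sized for DynArray[String[4], 1368], holding stale data.
buf = bytearray(b"\xff" * dynarray_size(1368))
clear_dynarray(buf, 0)
print(dynarray_length(buf, 0))  # 0
```

In this model, clearing only the length word gives the same empty value for a buffer of either bound, and the ABI-encoded return data is identical for `DynArray[String[4], 1]` and `DynArray[String[4], 1368]`.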
