crytic/slither

[Bug-Candidate]: Consider non-ASCII when source mapping

Open

#1,164 opened on Apr 11, 2022

Labels: enhancement, help wanted

Description

Describe the issue:

Hey!

Just noticed that the source_mapping offsets are misaligned when they are used to slice a .sol file that contains non-ASCII characters. Maybe "misaligned" isn't the right word, since it depends on how the .sol file is opened in Python. If I open it the way Slither and crytic-compile do, as a normal text read with encoding='utf-8', the slice is misaligned, but it works fine when the file is read as bytes ('rb').

Not sure if this has any implications for Slither itself, but it might be handy to know when trying to slice the source based on Slither output.

Code example to reproduce the issue:

from slither.slither import Slither


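# Write a simple test contract containing non-ASCII characters.
# Raw string, so the \x.. escapes in the Solidity comment stay literal ASCII text.
with open('test.sol', 'w', encoding='utf-8') as sol:
    sol.write(
        r''' 
        // 有趣的  // <- will cause a shift with a utf-8 text read; as bytes this is \xe6\x9c\x89\xe8\xb6\xa3\xe7\x9a\x84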
        contract A {
            
            uint public x;
        
            constructor() public {
                x = 1;
            }
        
        }
        '''
    )

# Parse with Slither
slither = Slither('test.sol')

# Open file with utf-8 as done in Slither and print contract positions
with open('test.sol', encoding='utf-8') as sol_file:
    source_code = sol_file.read()
    contract_mapping = slither.contracts[0].source_mapping
    start, end = contract_mapping['start'], contract_mapping['start'] + contract_mapping['length']
    print(source_code[start:end])

Will print the incorrect part of the source code:

ct A {
            
            uint public x;
        
            constructor() public {
                x = 1;
            }
        
        }
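The six missing characters ("contra") match the gap between the character count and the UTF-8 byte count of the non-ASCII text in the comment. A quick check, assuming the offsets in source_mapping count UTF-8 bytes rather than characters (an assumption, though it is consistent with the byte-based read working):

```python
# Assumption: source_mapping offsets count UTF-8 bytes, not characters.
text = '有趣的'
char_count = len(text)                  # number of Unicode characters
byte_count = len(text.encode('utf-8'))  # number of bytes once encoded
shift = byte_count - char_count         # how far a str slice overshoots

print(char_count, byte_count, shift)    # 3 9 6
print('contract A {'[shift:])           # ct A {
```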

Changing it to a binary read and decoding the slice afterwards gives the expected result:

with open('test.sol', 'rb') as sol_file: # read binary
    source_code = sol_file.read()
    contract_mapping = slither.contracts[0].source_mapping
    start, end = contract_mapping['start'], contract_mapping['start'] + contract_mapping['length']
    print(source_code[start:end].decode('utf-8')) # decode binary

Will print:

contract A {
            
            uint public x;
        
            constructor() public {
                x = 1;
            }
        
        }
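For anyone who already holds the source as a str, here is a minimal sketch of two helpers that apply byte-based offsets to it. The names slice_by_byte_offsets and byte_to_char_offset are hypothetical and not part of Slither, and the sketch assumes the offsets are UTF-8 byte offsets. It runs without Slither by computing the offsets by hand:

```python
def slice_by_byte_offsets(source_text, start, length, encoding='utf-8'):
    """Slice text using byte-based offsets (hypothetical helper)."""
    raw = source_text.encode(encoding)
    # Raises UnicodeDecodeError if an offset lands inside a multi-byte character.
    return raw[start:start + length].decode(encoding)


def byte_to_char_offset(source_text, byte_offset, encoding='utf-8'):
    """Convert a byte offset into a character index (hypothetical helper)."""
    return len(source_text.encode(encoding)[:byte_offset].decode(encoding))


source = '// 有趣的\ncontract A {}\n'

# Stand-ins for source_mapping['start'] and source_mapping['length'].
start = source.encode('utf-8').index(b'contract')
length = len(b'contract A {}')

print(repr(source[start:start + length]))                    # 'ct A {}\n'
print(repr(slice_by_byte_offsets(source, start, length)))    # 'contract A {}'
print(start, byte_to_char_offset(source, start))             # 13 7
```

A text-mode read can also translate line endings, which would shift offsets further on files with \r\n, so reading with 'rb' is probably still the simplest option when the file is available.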

Version:

Slither: 0.8.2

Relevant log output:

No response
