disasm: Return values #13

Open
shazow opened this issue Oct 31, 2022 · 7 comments

Comments

@shazow
Owner

shazow commented Oct 31, 2022

Unfortunately selector hashes don't include the return value, so none of the 4byte databases include return types.

Questions:

  1. How do we detect whether a function has a return value at all?
  2. If it does, can we do anything to guess the type or size?

What we have:

  • Function selectors with instruction pointers
  • Boundaries for selectors' functions (they seem to be assembled contiguously based on a few anecdotal examinations).

Updated challenges:

  • Old Solidity (e.g. WETH compiled with 0.4.x) assembles functions with simple return macros, so those are fairly easily detectable by looking back for RETURN from the end of each selector function's boundary.
  • Modern Solidity assembles returns through chains of helper branches that prepare the data. I can't think of a way to resolve these in a ~single pass. Anyone have ideas?
    • One of the helper branches is a STOP branch, which shouldn't be too hard to find in isolation (basically JUMPDEST STOP; sometimes there are several, not sure why). Could we use the absence of a STOP, or of a JUMP to a STOP offset, as an indicator of whether there is a return value of some kind?
  • In either case, I'm having trouble finding a reliable pattern for extracting the size of the return values, even in the old-Solidity simple case.
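
A rough sketch of the two checks described above: scanning a function's instruction window for RETURN (the old-Solidity case) and locating `JUMPDEST STOP` helper branches. The function names and the `start`/`end` boundaries are hypothetical, and it assumes the bytecode can be walked linearly from offset 0; it is not how the library does it.

```python
# EVM opcodes used by this sketch (values from the EVM instruction set).
STOP, JUMPDEST, PUSH1, PUSH32, RETURN = 0x00, 0x5B, 0x60, 0x7F, 0xF3


def disassemble(code: bytes):
    """Yield (offset, opcode) pairs, skipping over PUSH immediates."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        yield pc, op
        # PUSH1..PUSH32 carry 1..32 bytes of inline data that are not opcodes.
        pc += 1 + (op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0)


def classify_window(code: bytes, start: int, end: int) -> str:
    """Guess whether the function body at offsets [start, end) returns data.

    `start` and `end` are hypothetical boundaries taken from the selector
    jump table; only the inline (old-Solidity) case is recognised.
    """
    ops = [op for pc, op in disassemble(code) if start <= pc < end]
    if RETURN in ops:
        return "returns"
    if STOP in ops:
        return "no-return"
    return "unknown"  # e.g. modern Solidity jumping out to shared helpers


def stop_branches(code: bytes) -> list:
    """Offsets of `JUMPDEST STOP` pairs, candidate no-return helper branches."""
    ops = list(disassemble(code))
    return [pc for (pc, op), (_, nxt) in zip(ops, ops[1:])
            if op == JUMPDEST and nxt == STOP]
```

Walking from offset 0 matters because a 0xF3 or 0x00 byte inside PUSH data would otherwise be mistaken for RETURN or STOP. A window classified as "unknown" is where the chained-helper problem starts.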
@shazow
Owner Author

shazow commented Oct 31, 2022

I'm assuming the RETURN opcode with non-zero size will indicate if a function returns a value, but relying on that means we'd need to construct instruction ranges for each function (should be possible assuming the selector table yields back-to-back functions). [Update: This looks fine]

On the upside, that should be sufficient to give us the return size, which is often a good proxy for guessing what the type is (e.g. 160 bits -> probably address). [Update: This is false]

@peetzweg

Using a dummy output value of [{type:"bytes32"}] seems to work to get at least a "readable" value for uint256 and address types. string gets butchered, and probably tuples etc. as well.
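
To illustrate why a single bytes32 placeholder reads fine for uint256 and address but not for string, here is a small sketch that hand-encodes return data following the standard ABI layout and then looks only at the first word. The helper names are made up for the example, and the address is just a sample value.

```python
def word(value: int) -> bytes:
    """Left-pad an integer into one 32-byte ABI word."""
    return value.to_bytes(32, "big")


def encode_string(text: str) -> bytes:
    """ABI-encode a single `string` return: offset word, length word, padded data."""
    data = text.encode("utf-8")
    padded = data + b"\x00" * (-len(data) % 32)
    return word(0x20) + word(len(data)) + padded


def as_bytes32(returndata: bytes) -> str:
    """Read only the first word, roughly what a bytes32 placeholder would see."""
    return "0x" + returndata[:32].hex()


uint_return = word(1000)
address_return = word(0xC02AAA39B223FE8D0A0E5C4F27EAD9083C756CC2)  # sample address
string_return = encode_string("Wrapped Ether")

print(as_bytes32(uint_return))     # the value, left-padded with zeros
print(as_bytes32(address_return))  # the address, left-padded with zeros
print(as_bytes32(string_return))   # only the 0x20 offset word, not the text
```

For the dynamic type, the first word is the offset to the data rather than the data itself, which is why the string comes out butchered.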

@shazow
Owner Author

shazow commented Oct 31, 2022

If a function returns a size that is larger than bytes32, what's a good strategy for returning an undecoded type to fit it? Say it's 32+16+32 = 80 bytes (but we don't know the layout, we just see 80 bytes). The naive approach feels like returning 32,32,16 (basically bin packing from largest to smallest). Is there something better we could do?

Or maybe it's better to just use string type for anything >32?
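
A minimal sketch of the naive largest-first split mentioned above, assuming only the total return size is known. `split_words` and `placeholder_outputs` are hypothetical names; the result describes sizes only and makes no claim about the real layout.

```python
def split_words(size: int, word_size: int = 32) -> list:
    """Split an opaque byte count into full words plus one smaller remainder."""
    full, rest = divmod(size, word_size)
    return [word_size] * full + ([rest] if rest else [])


def placeholder_outputs(size: int) -> list:
    """Build ABI-style placeholder outputs, one fixed-size bytesN per chunk."""
    return [{"type": f"bytes{n}"} for n in split_words(size)]


print(split_words(80))          # [32, 32, 16]
print(placeholder_outputs(64))  # [{'type': 'bytes32'}, {'type': 'bytes32'}]
```

Since standard ABI encoding pads every value to a 32-byte word, a remainder chunk should be rare in practice, so the split mostly reduces to a list of bytes32 entries.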

@shazow
Owner Author

shazow commented Nov 3, 2022

Started a WIP PR in #14, here are the vibes so far (from PR):

Still in the research phase, trying to find a way to detect output sizes but that's looking harder than I hoped.

It looks like modern Solidity wraps most outputs through a chain of jumps that prepares the data. It's going to be quite hard to do this with a single-pass static analysis.

Older Solidity (e.g. the WETH contract with v0.4.x) does a simpler return macro per function window. Those aren't hard to detect, but extracting sizing reliably still seems hard.

Also I thought it'd be easier to detect address type outputs because they're 20 bytes rather than the usual 32, but I forgot that things get padded so it still ends up being 32 bytes.
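
The padding point can be checked by hand. This sketch left-pads a 20-byte address into an ABI word and compares it with a uint256 word; the helper names and the sample address are assumptions for illustration.

```python
def encode_address(addr_hex: str) -> bytes:
    """ABI-encode an address: 20 raw bytes left-padded with zeros to 32 bytes."""
    digits = addr_hex[2:] if addr_hex.startswith("0x") else addr_hex
    raw = bytes.fromhex(digits)
    if len(raw) != 20:
        raise ValueError("an address is exactly 20 bytes")
    return raw.rjust(32, b"\x00")


def encode_uint256(value: int) -> bytes:
    """ABI-encode a uint256 as one big-endian 32-byte word."""
    return value.to_bytes(32, "big")


addr_word = encode_address("0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2")
uint_word = encode_uint256(1000)

# Both occupy one full word, so return size alone cannot tell them apart.
print(len(addr_word), len(uint_word))
```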

I probably need to sleep on this in case there are other clever solutions, but it's not looking great for single-pass static analysis right now. 😅

@shazow
Owner Author

shazow commented Nov 4, 2022

Updated the current state and challenges in the issue description, going to pass it around to some folks to see if anyone else has ideas. Feel free to re-share. :)

@shazow
Owner Author

shazow commented Jan 22, 2023

I just merged a branch which does more advanced static analysis into master, haven't done a release yet.

In some cases, it manages to successfully guess whether there are inputs or outputs (not super reliable, I'd say like... 60%?), but there have been major changes behind the scenes with how the static analysis works so we can do more advanced things moving forward.

Also we now have stateMutability included in the ABI, which is reliable in detecting payable functions, but not reliable in distinguishing nonpayable vs view yet.

Would appreciate some testing and feedback before I do a proper release. :)

@shazow
Owner Author

shazow commented Jan 22, 2023

Next release issue is here: #18
