disasm: Return values #13

shazow · 2022-10-31T20:19:30Z

Unfortunately, selector hashes only cover the function name and argument types, not the return value, so none of the 4byte databases include return types.

Questions:

- How do we detect whether a function has a return value at all?
- If it does, can we do anything to guess the type or size?

What we have:

- Function selectors with instruction pointers
- Boundaries for selectors' functions (they seem to be assembled contiguously based on a few anecdotal examinations).

Updated challenges:

- Old Solidity (e.g. WETH compiled with 0.4.x) assembles functions with simple return macros, so those are fairly easily detectable by looking back for RETURN from the end of each selector function's boundary.
- Modern Solidity assembles returns through chains of helper branches that prepare the data. I can't think of a way to resolve these in a ~single pass. Anyone have ideas?
  - One of the helper branches is a STOP branch, which shouldn't be too hard to find in isolation (basically JUMPDEST STOP; sometimes there are multiples, not sure why). Could we just use the absence of a STOP, or of a JUMP to a STOP offset, as an indicator of whether there is a return value of some kind?

In either case, I'm having trouble finding a reliable pattern for extracting the size of the return values, even in the old-Solidity simple case.
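
To make the two heuristics above concrete, here is a minimal sketch of a linear disassembly pass that (a) lists JUMPDEST STOP helper branches and (b) checks whether a function's window contains a RETURN. The function names and the sample bytecode are made up for illustration, and this is not how whatsabi implements it. The opcode values (STOP 0x00, JUMPDEST 0x5b, RETURN 0xf3, PUSH1..PUSH32 0x60..0x7f) are the standard EVM ones.

```python
# Standard EVM opcode values.
STOP, JUMPDEST, RETURN = 0x00, 0x5B, 0xF3
PUSH1, PUSH32 = 0x60, 0x7F


def disassemble(code: bytes):
    """Yield (offset, opcode) pairs, skipping over PUSH immediates."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        yield pc, op
        # PUSH1..PUSH32 carry 1..32 bytes of inline data that are not opcodes.
        pc += 1 + (op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0)


def stop_branches(code: bytes) -> list:
    """Offsets of every JUMPDEST that is immediately followed by STOP."""
    ops = list(disassemble(code))
    return [
        off
        for (off, op), (_, nxt) in zip(ops, ops[1:])
        if op == JUMPDEST and nxt == STOP
    ]


def window_has_return(code: bytes, start: int, end: int) -> bool:
    """True if a RETURN opcode sits at an offset in [start, end)."""
    return any(op == RETURN for off, op in disassemble(code) if start <= off < end)


# Hypothetical bytecode: JUMPDEST STOP JUMPDEST PUSH1 0xf3 PUSH1 0x00 RETURN
sample = bytes.fromhex("5b005b60f36000f3")
print(stop_branches(sample))            # [0]
print(window_has_return(sample, 2, 8))  # True
print(window_has_return(sample, 0, 5))  # False: the 0xf3 at offset 4 is PUSH data
```

This only covers the old-Solidity case where RETURN sits inside the function's own window. It does not follow jumps, so it says nothing about the modern helper-branch chains.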

shazow · 2022-10-31T20:26:26Z

I'm assuming the RETURN opcode with non-zero size will indicate if a function returns a value, but relying on that means we'd need to construct instruction ranges for each function (should be possible assuming the selector table yields back-to-back functions). [Update: This looks fine]

On the upside, that should be sufficient to give us the return size, which is often a good proxy for guessing what the type is (e.g. 160 bits -> probably address). [Update: This is false]
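
As a rough sketch of the "instruction ranges for each function" idea, assuming the selector table really does yield back-to-back functions: sort the selectors by entry offset and let each function end where the next one starts. The helper name and the offsets below are hypothetical; only the two selectors are real (`totalSupply()` and `balanceOf(address)`).

```python
def function_ranges(selector_offsets: dict, code_length: int) -> dict:
    """Map each selector to a (start, end) window, end exclusive.

    Assumes functions are laid out contiguously, so a function ends where
    the next one (by offset) begins; the last one runs to the end of code.
    """
    ordered = sorted(selector_offsets.items(), key=lambda item: item[1])
    ranges = {}
    for i, (selector, start) in enumerate(ordered):
        end = ordered[i + 1][1] if i + 1 < len(ordered) else code_length
        ranges[selector] = (start, end)
    return ranges


# Hypothetical entry offsets for two well-known selectors.
table = {"0x70a08231": 0x120, "0x18160ddd": 0x0A0}
print(function_ranges(table, 0x200))
# {'0x18160ddd': (160, 288), '0x70a08231': (288, 512)}
```

The last window is an over-estimate, since it also swallows any shared helper code and metadata that follows the final function.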

peetzweg · 2022-10-31T20:49:51Z

Using a dummy output value of [{type:"bytes32"}] seems to work to get at least a "readable" value for uint256 and address types. string gets butchered, and probably tuples etc. as well.
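
A small illustration of why the single-word guess reads fine for static types but not for string, written against the standard ABI encoding rather than any particular library: a uint256 is a single 32-byte word, while a string comes back as an offset word, a length word, and then the padded data. The helper name is made up for this sketch.

```python
def words(data: bytes) -> list:
    """Split ABI return data into 32-byte words."""
    return [data[i:i + 32] for i in range(0, len(data), 32)]


# uint256(42): a single word holding the value.
uint_ret = (42).to_bytes(32, "big")

# string "hi": offset to the data (0x20), then length (2), then padded bytes.
string_ret = (
    (32).to_bytes(32, "big")
    + (2).to_bytes(32, "big")
    + b"hi".ljust(32, b"\x00")
)

print(int.from_bytes(words(uint_ret)[0], "big"))    # 42
print(int.from_bytes(words(string_ret)[0], "big"))  # 32, the offset, not the text
print(words(string_ret)[2].rstrip(b"\x00"))         # b'hi'
```

Decoding the string result as one bytes32 only surfaces the first word, which is the offset.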

shazow · 2022-10-31T21:01:19Z

If a function returns a size that is larger than bytes32, what's a good strategy for returning an undecoded type to fit it? Like say it's 32+16+32 = 80 bytes (but we don't know the layout, we just see 80 bytes). Naive approach feels like returning 32,32,16 (basically binpacking from largest to smallest). Is there something better we could do?

Or maybe it's better to just use string type for anything >32?
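
For reference, the naive largest-to-smallest packing could look something like the sketch below. The function name is hypothetical, and it only produces placeholder types from a byte count; it knows nothing about the real layout.

```python
def guess_output_layout(size: int) -> list:
    """Pack an opaque return size into bytes32 words plus one smaller bytesN."""
    full_words, remainder = divmod(size, 32)
    layout = ["bytes32"] * full_words
    if remainder:
        # bytes1..bytes31 are valid fixed-size types for the leftover.
        layout.append(f"bytes{remainder}")
    return layout


print(guess_output_layout(80))  # ['bytes32', 'bytes32', 'bytes16']
print(guess_output_layout(32))  # ['bytes32']
```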

shazow · 2022-11-03T17:30:48Z

Started a WIP PR in #14, here are the vibes so far (from PR):

Still in the research phase, trying to find a way to detect output sizes but that's looking harder than I hoped.

It looks like modern Solidity wraps most outputs through a chain of jumps that prepares the data. It's going to be quite hard to do this with a single-pass static analysis.

Older Solidity (e.g. WETH contract with v0.4.x) does a simpler return macro per function window; those aren't hard to detect, but extracting sizing reliably still seems hard.

Also I thought it'd be easier to detect address type outputs because they're 20 bytes rather than the usual 32, but I forgot that things get padded so it still ends up being 32 bytes.
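
To illustrate the padding point: under the standard ABI encoding, a 20-byte address is left-padded with zeros to a full 32-byte word, so its return size is the same as a uint256. The helper name and the address below are placeholders, not taken from any real contract.

```python
def abi_encode_address(addr_hex: str) -> bytes:
    """Left-pad a 20-byte address to one 32-byte ABI word."""
    digits = addr_hex[2:] if addr_hex.startswith("0x") else addr_hex
    raw = bytes.fromhex(digits)
    if len(raw) != 20:
        raise ValueError("address must be exactly 20 bytes")
    return raw.rjust(32, b"\x00")


placeholder = "0x" + "11" * 20  # made-up address for illustration
encoded = abi_encode_address(placeholder)

print(len(encoded))                       # 32
print(len((1).to_bytes(32, "big")))       # 32, same size as a uint256
print(encoded[:12] == b"\x00" * 12)       # True: twelve leading zero bytes
```

Twelve leading zero bytes in a returned word are at best a weak hint of an address, since any small enough uint256 looks the same.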

I probably need to sleep on this in case there are other clever solutions, but it's not looking great for single-pass static analysis right now. 😅

shazow · 2022-11-04T15:11:50Z

Updated the current state and challenges in the issue description, going to pass it around to some folks to see if anyone else has ideas. Feel free to re-share. :)

shazow · 2023-01-22T16:55:22Z

I just merged a branch which does more advanced static analysis into master, haven't done a release yet.

In some cases, it manages to successfully guess whether there are inputs or outputs (not super reliable, I'd say like... 60%?), but there have been major changes behind the scenes with how the static analysis works so we can do more advanced things moving forward.

Also we now have stateMutability included in the ABI, which is reliable in detecting payable functions, but not reliable in distinguishing nonpayable vs view yet.

Would appreciate some testing and feedback before I do a proper release. :)

shazow · 2023-01-22T17:01:11Z

Next release issue is here: #18

shazow mentioned this issue Nov 3, 2022

disasm: Detect output values #14

Merged

peetzweg mentioned this issue Dec 16, 2022

Make use of Whatsabi for Unknown ABIs peetzweg/notar#7

Open

shazow added this to the Analysis Coverage & Reliability milestone Jul 17, 2023
