Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Function argument fixes #198

Open
wants to merge 4 commits into
base: staging
Choose a base branch
from
Open

Conversation

mpinjr
Copy link
Contributor

@mpinjr mpinjr commented Sep 29, 2023

Fix argument counting error which cuts the argument capacity in half.
Fix conversion of uninitialized function arguments into arrays.
Plug some leaks and eliminate some uses-after-free.

Hope everyone is doing well,
Miguel

It's impossible for a node in a user-defined function to reference
arguments beyond those which are defined, so don't store them in the
arg lists. Evaluate them for side-effects and discard.

This change also plugs a tempcell leak in args[i] for i >= ndef.

===============> cat leak.sh
#/bin/sh

# Monitor awk's memory consumption while repeatedly calling a
# user-defined function with a couple of extra arguments.

${1:-./a.out} '
function f(x) { x }
BEGIN {
	mem = "ps -p $PPID -o rsz="
	system(mem)
	while (++i <= 100000) {
		f(i, i, i)
		if (i % 10000 == 0)
			system(mem)
	}
}
' 2>/dev/null
===============> paste <(./leak.sh ./master.out) <(./leak.sh ./a.out)
 2432	 2304
 3456	 2304
 4608	 2304
 5632	 2304
 6656	 2304
 7680	 2304
 8832	 2304
 9856	 2304
11008	 2304
12160	 2304
13312	 2304
When checking if a function call exceeds the NARGS maximum argument
count, arguments for defined parameters were counted twice, reducing
the call capacity in half.

===============> cat nargs.awk
# NARGS is currently 50

function args25(p1, p2, p3, p4, p5, p6, p7, p8, p9, p10,
	     p11, p12, p13, p14, p15, p16, p17, p18, p19, p20,
	     p21, p22, p23, p24, p25) {
    print "25 arguments"
}

function args26(p1, p2, p3, p4, p5, p6, p7, p8, p9, p10,
	     p11, p12, p13, p14, p15, p16, p17, p18, p19, p20,
	     p21, p22, p23, p24, p25, p26) {
    print "26 arguments"
}

function args50(p1, p2, p3, p4, p5, p6, p7, p8, p9, p10,
	     p11, p12, p13, p14, p15, p16, p17, p18, p19, p20,
	     p21, p22, p23, p24, p25, p26, p27, p28, p29, p30,
	     p31, p32, p33, p34, p35, p36, p37, p38, p39, p40,
	     p41, p42, p43, p44, p45, p46, p47, p48, p49, p50) {
    print "50 arguments"
}

function args51(p1, p2, p3, p4, p5, p6, p7, p8, p9, p10,
	     p11, p12, p13, p14, p15, p16, p17, p18, p19, p20,
	     p21, p22, p23, p24, p25, p26, p27, p28, p29, p30,
	     p31, p32, p33, p34, p35, p36, p37, p38, p39, p40,
	     p41, p42, p43, p44, p45, p46, p47, p48, p49, p50, p51) {
    print "51 arguments"
}

BEGIN {
    args25(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10,
	a11, a12, a13, a14, a15, a16, a17, a18, a19, a20,
	a21, a22, a23, a24, a25)

    args26(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10,
	a11, a12, a13, a14, a15, a16, a17, a18, a19, a20,
	a21, a22, a23, a24, a25, a26)

    args50(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10,
	a11, a12, a13, a14, a15, a16, a17, a18, a19, a20,
	a21, a22, a23, a24, a25, a26, a27, a28, a29, a30,
	a31, a32, a33, a34, a35, a36, a37, a38, a39, a40,
	a41, a42, a43, a44, a45, a46, a47, a48, a49, a50)

    # Call args50 with 51 arguments. The 51st does not correspond
    # to a defined parameter, so it doesn't count against NARGS
    # (since excess args are not stored in the argument lists).
    args50(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10,
	a11, a12, a13, a14, a15, a16, a17, a18, a19, a20,
	a21, a22, a23, a24, a25, a26, a27, a28, a29, a30,
	a31, a32, a33, a34, a35, a36, a37, a38, a39, a40,
	a41, a42, a43, a44, a45, a46, a47, a48, a49, a50, a51)

    args51(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10,
	a11, a12, a13, a14, a15, a16, a17, a18, a19, a20,
	a21, a22, a23, a24, a25, a26, a27, a28, a29, a30,
	a31, a32, a33, a34, a35, a36, a37, a38, a39, a40,
	a41, a42, a43, a44, a45, a46, a47, a48, a49, a50, a51)
}
===============>$ ./master.out -f nargs.awk
25 arguments
./master.out: function args26 has 52 arguments, limit 50
 source line number 38
===============>$ ./a.out -f nargs.awk
25 arguments
26 arguments
50 arguments
./a.out: function args50 called with 51 args, uses only 50
 source line number 53
50 arguments
./a.out: function args51 has 51 defined arguments, limit 50
 source line number 59
There are at least two reasons for why the propagation of the
conversion of an uninitialized argument into an array cannot wait
until the function has finished.

The first is that otherwise any arguments in the active frame that
share the origin of the converted argument will persist for the
life of the function as independent copies of the uninitialized
value instead of referring to the new array.

The second is that the exit statement's longjmp entirely skips
the function epilogue (where the conversion had been handled),
opening the door to incorrect values in END blocks.

Instead of waiting for the epilogue, toarray immediately initiates
a scan of the stack frames to update those arguments that have the
same origin as the converted argument.

The argument list is now strictly of a single cell subtype, CCOPY. It
had been allowed to contain a mix of subtypes (CCOPY and CVAR). This
restriction simplifies the logic of the epilogue significantly.

The oargs list is no more. CCOPY cells use cnext to record their
origin (oargs recorded the parent, but the origin may be a more
distant ancestor).

Passing by reference had been done by setting the argument to the
address of the array's cell. This is not allowed under the new
regime. Instead it uses a pointer to the array's symbol table.
Which means that the table must be immobilzed or the stack frames
must be scanned upon relocation. Enter clearsymtab to empty arrays
without moving their symtabs.

This commit also fixes several other defects, among them a symbol
table leak (in the epilogue, unaware of each other, undetected
siblings would assign their private symbol tables to the same
parent), the use-after-free discovered by @millert in onetrueawk#157
(apologies for the delay in getting this out), and the following
use-after-free from a dangling name pointer (manifests only when
debugging is enabled):

----->$ cat dangling-nval.sh

awk=${1:-nawk}

valgrind --leak-check=full $awk -d '
function f(x) {
	g()
	print x
}

function g() {
	delete a["hi"]
}

BEGIN {
	a["hi"] = "world"
	f(a["hi"])
}'
----->$

For the moment, function arguments are anonymous. This change forced
a small modification to a test in T.misc.
@mpinjr mpinjr changed the title Fix functions Function argument fixes Sep 29, 2023
run.c Outdated
*
* For now, args are anonymous. Debugging can use frame/arg indices.
*/
y->nval = NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of a concern since nval is printed by the debug and warning code. On Linux and BSD this is safe, since printf(3) will print "(null)" but on other systems it will cause a NULL dereference. So either we need to wrap nval with the NN macro everywhere it is printed or set it to some non-NULL value. Setting to x->nval(or a copy of it) is not really correct but perhaps is better than NULL.

Copy link
Contributor Author

@mpinjr mpinjr Sep 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The debugging statements need to be able to cope with anonymous cells. The temp cells used for expressions (including function return values) are anonymous. The jump cells are as well.

A quick grep found 13 instances of nval in DPRINTF statements, 10 wrapped, 3 unwrapped. I will wrap them all.

Also, parsing a function leaks the previous function definition's arglist. Compare 'function f(a) {}' with 'function f(a) {} function g() {}'. I will see about preserving the argument names while also eliminating the leak.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not just DPRINTF calls, also SYNTAX, WARNING and FATAL. Some of those may not be reachable but the ones in funnyvar() are.

Copy link
Collaborator

@plan9 plan9 Sep 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a lot to digest and review here. miguel, please keep nval NN wrapping as a separate PR. thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plan9, I will keep the NN wrapping separate. I hope to add a commit to this branch later today (perhaps tomorrow) to provide meaningful names for the args.

@mpinjr mpinjr marked this pull request as draft September 30, 2023 14:47
Use the new struct Function in much the same way as struct Array
(to make an object's internals reachable via Cell->sval).

While recording argument names, plug the leak of arglist nodes. In
the following program, f's arglist is leaked at the time of g's
definition:

	function f(a,b,c) {} function g() {}

The leak happens when varlist assigns to arglist in awkgram.y.

Note that though the parsing and defining of a function are now
limited only by available memory, invocations are still limited to
NARGS (an int).
@mpinjr mpinjr marked this pull request as ready for review October 3, 2023 05:37
Copy link
Contributor Author

@mpinjr mpinjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the commit message:

Note that though the parsing and defining of a function are now
limited only by available memory

That's incorrect. yylval.i limits it to INT_MAX.

@mpinjr
Copy link
Contributor Author

mpinjr commented Oct 4, 2023

I recommend against using that last commit (b5e23fd). There are a few things about it that require tidying up. Also, what I had thought of as an unrelated branch of work on temp cells may intersect with this. I would like a little time to explore that.

The anonymous arg cells should be okay for now. There are already anonymous cells (CTEMP mostly) floating around when compile_time == RUNNING. If printing a NULL nval causes a crash on a non-BSD/Linux system, it would have served to uncover a print statement that needs fixing.

As @plan9 mentioned, there's already plenty to digest here. If we're in agreement, should I force push to my branch a reset to the previous HEAD or just leave things as they are? (I know how to use git, but github not so much).

Take care,
Miguel

@millert
Copy link
Contributor

millert commented Oct 4, 2023

@mpinjr I think it is fine to go ahead with the anonymous arg cells. I can work on a PR to add missing NN wrapping if you'd like.

@mpinjr
Copy link
Contributor Author

mpinjr commented Oct 4, 2023

@millert Thanks, but it's possible that it would be a waste of time. Here's what I have in mind...

As you know, it's not safe to call execute if there exists an unhandled cell from an earlier execute. To avoid use-after-free (and other) bugs, whatever is needed from the first execute's cell must be obtained before the next execute begins.

It occurred to me that if this discipline were applied consistently, we would only need one CTEMP cell. That cell could be statically allocated with access to it guarded by a boolean. An attempt to acquire it while busy would be a fatal error.

(Like the function calling work, this is one of the ideas I was exploring last year.)

So, if function arguments were named and if there were only a single tempcell, the implementation would be left with just 9 anonymous cells (defined near the top of run.c). Those could be named and then wrapping nval would be undesirable (as it would prevent sensitive platforms from denouncing the existence of an anonymous cell that should not be).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants