-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
util: add getCWDURL
internal util fn
#48434
Conversation
Review requested:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be applied to all usage of cwd
within internal/modules
. I think there are 2 others where it's applicable:
node/lib/internal/modules/cjs/loader.js
Line 1028 in bd7a808
const parentPath = parent?.filename ?? process.cwd() + path.sep; |
node/lib/internal/modules/esm/resolve.js
Line 1021 in bd7a808
parentURL = pathToFileURL(`${process.cwd()}/`).href; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙌
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case CI needs me to re-approve after your push (sometimes it does, sometimes it doesn't 🤪)
lib/internal/util.js
Outdated
/** | ||
* Get the current working directory while accounting for the possibility that it has been deleted. | ||
* `process.cwd()` can fail if the parent directory is deleted while the process runs. | ||
* @returns {string} The current working directory or the volume root if it cannot be determined. | ||
*/ | ||
function getCwdSafe() { | ||
const { sep } = require('path'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be cached at module level, since users can change it?
/** | |
* Get the current working directory while accounting for the possibility that it has been deleted. | |
* `process.cwd()` can fail if the parent directory is deleted while the process runs. | |
* @returns {string} The current working directory or the volume root if it cannot be determined. | |
*/ | |
function getCwdSafe() { | |
const { sep } = require('path'); | |
const { sep } = require('path'); | |
/** | |
* Get the current working directory while accounting for the possibility that it has been deleted. | |
* `process.cwd()` can fail if the parent directory is deleted while the process runs. | |
* @returns {string} The current working directory or the volume root if it cannot be determined. | |
*/ | |
function getCwdSafe() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a legitimate reason to change this and expect it to continue to work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Jordan: it's called getCwdSafe
, let's make it safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay sure. But what is the use-case of a user changing sep and expecting things to work? Wouldn't that break heaps of stuff in node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the word “safe” in the function name causes this friction. May I recommend removing the word safe from it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no use case I'm aware of to do it, but the platform shouldn't be breakable by user action.
That sounds out of scope then: even if we do this here, node will have exploded long before reaching this. If we need to make sep
safe, I think that should be handled all together (in a different PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i mean sure, but requiring path at module level seems like the default and idiomatic thing to do; requiring things in function bodies is usually only done when using a module-level caching mechanism. why require it inline here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's imagine a user has this snippet somewhere in their code:
Object.defineProperty(require("node:path"), "sep", { __proto__:null, get() { throw this } });
It makes getCwdSafe
throw, when the whole point of the function is to never throw.
node will have exploded long before reaching this
Assuming this is true (I don't think it is), we should fix the other places rather than taking it as a reason for not making internals more robust 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why require it inline here?
Ack; when are we supposed to require it JIT? I thought the point of Joyee's huge PR was to reduce startup overhead (which is incurred by root-level requires).
when the whole point of the function is to never throw
Ah, okay. I do see your point: if that get() → throw is triggered outside, it isn't triggered by the "safe" util.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the point of Joyee's huge PR was to reduce startup overhead (which is incurred by root-level requires)
That's true for non-snapshoted modules, node:path
is part of the snapshot so the overhead is the same at the top-level as it is inlined. My understanding is that lazy-loading modules already in the snapshot is slightly worth than no lazy-loading, but that's beside the point: we can make this function actually safe, so we should do it IMO :)
The snapshot creation is explained in
node/lib/internal/bootstrap/node.js
Lines 7 to 16 in 4ff55ec
// By default, Node.js binaries come with an embedded V8 startup snapshot | |
// that is generated at build-time with a `node_mksnapshot` executable. | |
// The snapshot generation code can be found in `SnapshotBuilder::Generate()` | |
// from `src/node_snapshotable.cc`. | |
// This snapshot captures the V8 heap initialized by scripts under | |
// `lib/internal/bootstrap/`, including this file. When initializing the main | |
// thread, Node.js deserializes the heap from the snapshot, instead of actually | |
// running this script and others in `lib/internal/bootstrap/`. To disable this | |
// behavior, pass `--no-node-snapshot` when starting the process so that | |
// Node.js actually runs this script to initialize the heap. |
We have a test to assert that no new modules are added to the bootstrap sequence, and you can see that node:path
is already there:
'NativeModule path', |
when are we supposed to require it
I'd say at the top-level, like Jordan suggested.
@JakobJingleheimer I think this issue still exists. Ref: nodejs/node-core-utils#677 |
lib/internal/util.js
Outdated
function getCwdSafe() { | ||
const { sep } = require('path'); | ||
let cwd = ''; | ||
|
||
try { | ||
cwd = process.cwd(); | ||
} catch { | ||
/**/ | ||
} | ||
|
||
return cwd + sep; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in the previous PR, let’s change this to always return the same as process.cwd()
(so no trailing slash) and update all the call sites to expect as such.
With regard to path.sep
, I don’t think we need to use it? Using /
as the fallback should work everywhere, including Windows, I would think? Then that avoids the prototype issue.
function getCwdSafe() { | |
const { sep } = require('path'); | |
let cwd = ''; | |
try { | |
cwd = process.cwd(); | |
} catch { | |
/**/ | |
} | |
return cwd + sep; | |
} | |
function getCwdSafe() { | |
try { | |
return process.cwd(); | |
} catch { | |
return '/'; | |
} | |
} |
Or does this need the trailing slash because if the fallback is /
, then the code using this needs to assume a trailing slash because when the fallback is returned it would have a trailing slash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree, that's going to create path that look like C:\Users/
or //
instead of C:\Users\
and /
. It's going to make this change breaking for no good reason.
Also it's not a prototype issue that we are discussing above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, if we need the trailing slash at the end for some reason, we should do something like ${getCwdSafe()}/
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree
With using /
instead of path.sep
, or with this not returning a trailing slash?
I’m fine with using path.sep
if you have a way of doing so safely, which I’m sure you do. As for the trailing slash, if it’s required because of the fallback, then we should call this function getCwdWithTrailingSlash
or getCwdWithTrailingSlashSafe
to make it clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bunch of tests are failing locally with the above implementation, @aduh95 is right.
I have experienced some troubles in the old PR when using process.cwd()
without the trailing slash at the end, but we are talking about a safe process.cwd()
, and not safe process.cwd()/
.
All of my changes do need the trailing slash at the end, but we will certainly have scenarios where we will not need it, shouldn't we remove it and add only where it's really necessary?
Or maybe we should add an argument to the function?
function getCwdSafe(endTrailingSlash = false) {
let cwd = '';
try {
cwd = process.cwd();
if (endTrailingSlash) {
cwd += sep;
}
return cwd;
} catch {
return sep;
}
}
EDIT: Or do something like @GeoffreyBooth said, getCwdWithTrailingSlashSafe()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we want a
getSafeCwdURL
util, as I think that’s where the trailing slash is very important?
All but one of the call sites want a URL. Maybe this could be getSafeCwdURL
like this?
function getCwdSafe() { | |
const { sep } = require('path'); | |
let cwd = ''; | |
try { | |
cwd = process.cwd(); | |
} catch { | |
/**/ | |
} | |
return cwd + sep; | |
} | |
function getSafeCwdURL() { | |
try { | |
return pathToFileURL(process.cwd()); | |
} catch { | |
return new URL('file:///'); | |
} | |
} |
And the one call site that wants a path could just have the old logic inlined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GeoffreyBooth If the result of getSafeCwdURL
is not mutated, we can memorize the new URL('file:///')
creation for better performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the result of
getSafeCwdURL
is not mutated, we can memorize thenew URL('file:///')
creation for better performance.
The fallback is used very, very rarely: only if the cwd has been deleted since the Node app started. So it may not be worth the memory used to save it.
The value of pathToFileURL(process.cwd())
, on the other hand, would be worth memorizing assuming that process.cwd
cannot change after the app starts? Either as an instantiated URL object or as a string URL. If this gets called often enough, it might be worth creating something like process.cwdFileURL
that returns the cwd as a string URL directly from C++?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a really good optimization @GeoffreyBooth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file:///
is not a valid file: URL for Windows unfortunately, we'd have to use pathToFileURL('/')
to be portable unfortunately.
[90m at Module._compile (node:internal*modules*cjs*loader:1256:14)[39m | ||
[90m at Module._extensions..js (node:internal*modules*cjs*loader:1310:10)[39m | ||
[90m at Module.load (node:internal*modules*cjs*loader:1114:32)[39m | ||
[90m at Module._load (node:internal*modules*cjs*loader:961:12)[39m |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MoLow shouldn’t these line/column numbers be obscured so that unrelated changes don’t require updates to snapshots?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Maybe we should replace them with *
Something we should discuss is whether the fallback behavior in the original const parentPath = parent?.filename ?? getCwdSafe();
const pkg = readPackageScope(parentPath) || {}; What What if instead we cached the CWD on application startup and just always returned that? Then there would be no need for the fallback or the |
Would |
My goal when creating I agree that it should be a case-by-case decision, but we have to pay attention to the fact that I'm not using So maybe I should remove it from: node/lib/internal/modules/cjs/loader.js Line 1029 in eb37022
And: node/lib/internal/modules/esm/resolve.js Line 1022 in eb37022
Also, IMO, renaming it to something like |
The one in Since all the others expect a URL, you could make the helper function return a URL. Looking a little closer, all three of the remaining call sites are using the helper to define const cwdAtStartup = pathToFileURL(process.cwd());
function getCwdAtStartup() {
return cwdAtStartup;
} And I don’t think we need a So then the question becomes, does anything break if we define a What do others think? |
The CWD can be deleted by something that's not the application, in all likelihood Note that node/lib/internal/bootstrap/switches/does_own_process_state.js Lines 106 to 108 in ab89428
node/lib/internal/bootstrap/switches/does_own_process_state.js Lines 124 to 128 in ab89428
|
I think we need to create a util that looks like let cachedURL;
let cachedCWD;
function getCWDURL() {
let cwd;
try {
cwd = process.cwd() + sep;
} catch {}
if (cwd != null && cwd !== cachedCWD) {
cachedURL = pathToFileURL(cwd);
cachedCWD = cwd;
}
return cachedURL;
} I think that would take care of all the edge cases, while avoid re-creating |
If that’s the case, why/when would it ever throw? It tries to validate the path’s existence even though it’s been cached? So another solution could be |
No it always return the cached value when there's one, there's no validation, look at the code I shared above. There's no cached value when node doesn't own its own state (that's a build flag).
|
I’m asking about |
I'm not sure what to do to make myself clear, I'm soon going to run out of code snippets to share 😅 Wether if Lines 614 to 619 in 718f62b
it's going to affect which file between Lines 339 to 342 in ac0853c
In the default case (i.e. all the official binaries), node own its own state, meaning it can meaningfully cache the value of In the case where node is built by an embedder (i.e. Electron and the likes) that has turned off the built flag, node doesn't own its own state, so we can't cache and the raw libuv function (the one that can throw) is used every time. TL;DR I'd like to point out that I'm not an expert in |
To me, there are two issues here:
let parentURL;
try {
parentURL = pathToFileURL(cwd() + sep);
} catch {
/* Sometimes cwd() throws, in which case parentURL remains undefined */
}
// Following code should be able to handle a possibly undefined parentURL This can be abstracted if desired, but the same goal would apply: rather than returning the root of the volume as the parent URL when CWD cannot be determined, just return undefined; and the code using that return value should be able to continue working when that return value is undefined. Is such a refactoring possible @jlenon7? Alternatively, if it’s required that |
People, sorry for disappearing 😅. I was moving to Lisbon. Now things are much calmer and I can get back with this PR. I'm going to follow the implementation that @aduh95 mentioned, wdyt?
|
eb37022
to
0cfc616
Compare
This function was first implemented in nodejs#46826, but at some point of the PR implementation this fn was no longer related to the PR. Refs: nodejs#46826 (comment)
Use `getCwdSafe` in other scenarios of `internal/modules` package.
Increment the line number in `.snapshot` file since we have add the `require()` instruction of `getCwdSafe` in `internal/modules/cjs/loader.js` file.
Implement a function that can handle a second cachedCWD when Node.js process doesn't owns it own state.
b151075
to
269f624
Compare
Co-authored-by: Antoine du Hamel <[email protected]>
Co-authored-by: Antoine du Hamel <[email protected]>
Landed in c2cd744 |
This function was first implemented in #46826, but at some point of the PR implementation this fn was no longer related to the PR. Refs: #46826 (comment) PR-URL: #48434 Reviewed-By: Jacob Smith <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Antoine du Hamel <[email protected]>
This function was first implemented in nodejs#46826, but at some point of the PR implementation this fn was no longer related to the PR. Refs: nodejs#46826 (comment) PR-URL: nodejs#48434 Reviewed-By: Jacob Smith <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Antoine du Hamel <[email protected]>
This function was first implemented in nodejs#46826, but at some point of the PR implementation this fn was no longer related to the PR. Refs: nodejs#46826 (comment) PR-URL: nodejs#48434 Reviewed-By: Jacob Smith <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Antoine du Hamel <[email protected]>
This function was first implemented in nodejs#46826, but at some point of the PR implementation this fn was no longer related to the PR. Refs: nodejs#46826 (comment) PR-URL: nodejs#48434 Reviewed-By: Jacob Smith <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Antoine du Hamel <[email protected]>
This function was first implemented in #46826, but at some point of the PR implementation this fn was no longer related to the PR. Refs: #46826 (comment) PR-URL: #48434 Backport-PR-URL: #50669 Reviewed-By: Jacob Smith <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Antoine du Hamel <[email protected]>
This function was first implemented in #46826, but at some point of the PR implementation this fn was no longer related to the PR. Refs: nodejs/node#46826 (comment) PR-URL: nodejs/node#48434 Backport-PR-URL: nodejs/node#50669 Reviewed-By: Jacob Smith <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Antoine du Hamel <[email protected]>
This function was first implemented in #46826, but at some point of the PR implementation this fn was no longer related to the PR. Refs: nodejs/node#46826 (comment) PR-URL: nodejs/node#48434 Backport-PR-URL: nodejs/node#50669 Reviewed-By: Jacob Smith <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Antoine du Hamel <[email protected]>
getCWDURL
internal util generated by module: implementregister
utility #46826Refs: #46826 (comment)