-
Notifications
You must be signed in to change notification settings - Fork 442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[P4fmt]: attaching comments to IR Nodes #4845
base: main
Are you sure you want to change the base?
Conversation
82edb45
to
37344eb
Compare
This takes care of attachment of comments to basic parent Node types(Typedef, Type_Name, control blocks, structs etc). |
backends/p4fmt/attach.cpp
Outdated
} | ||
|
||
//////////////////////// TYPES //////////////////// | ||
bool Attach::preorder(const IR::Type_Bits *t) { return preorderImpl(t); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the criterion for a supported node here? Otherwise I would use IR::Statement
, IR::Type_Declaration
, or IR::StatOrDecl
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding of the algorithm presented in the blog post (the Bazel approach) is that the preorder traversal will visit all AST nodes, and it can pick the right AST node to attach comments to automatically. The first node with a comment right before it will get that comment attached to it, and that comment will be removed from the list of comments to be attached, so that any child AST node on the same line won't get the same comment attached again.
This algorithm can guarantee a pretty reasonable attachment of line comments. And a similar algorithm can be applied in a postorder traversal to attach the inline trailing/suffix comments.
@fruffy I don't have enough knowledge on this, but is there a more convenient way in P4C to express that I want to do the same thing for all AST nodes regardless of its type? Maybe as you mentioned in another comment, adding IR::Node
to a preorder
is the way to go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think enumerating all/most node types is a good idea (the fact the actual implementation is the same is a good hint for that!). It defeats the purpose of having the nodes in a class hierarchy and it would make this very fragile to IR node addition. From what @qobilidop described as the high-level algorithm, it might even work to have just a single preorder(Node *)
and postoder(Node *)
. Comment attachment does not really seem to dependent on whether the node is e.g. an addition or subtraction, it even does not seem depend on whether it is e.g. an expression or a type definition. Even in cases of comments inside statements like x = foo(/* block size */ a + /* block count */ b); // get ...
one would probably get a good enough comment attachment (the only question is really if the trailing comment belongs to the assignment or the call (where postorder
would likely place it. But I guess that can't be really said in general and either will be good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the point of the compiler in general, I think there should not the the mutable
in the SourceInfo
and I am concern if this could impact compiler performance.
Other than that, I don't think naming all the node types is a good approach.
lib/source_file.h
Outdated
mutable std::vector<Util::Comment *> before; | ||
mutable std::vector<Util::Comment *> after; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another point -- this make the SourceInfo
, which is ubiquitous in the compiler, quite a bit larger (by 6 pointers with the common impl of vector
). It would be good to have some benchmarks to see this does not unduly show down the compiler (not p4fmt
backend) or increase its memory consumption too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we'd go this way, I'd strongly suggest using small vectors that could hold up to 1 entry inline. But overall I concur that enlarging every IR node by this amount should be carefully benchmarked. We do create lots of nodes. There is huge malloc traffic everywhere. And while another 48 bytes might look small, it is an overhead paid on every node. Even if there are zero comments.
Also, comments are quite rare and here the price is paid by every node. Maybe the better solution is to use the technique similar to TrailingObjects
in LLVM / clang.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of storing those prefix and suffix comments as a part of SourceInfo
, I was thinking how about storing that extra object in a hashmap, Hashmap<node-id, extra-comment-object>
. This would avoid the unnecessary overhead to the compiler.
Would this be a good and straightforward solution for the time being?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is your plan to update this side map on IR change? Overall, side long-living maps should be discouraged as they could easily go stale
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to add some more context. We were thinking about using this side map only in p4fmt. With this constraint, there probably isn't any IR change expected I guess.
@snapdgn has also looked into using TrailingObjects
. I agree that would be a better solution. My suggestion is to go with a simple solution for now, to unblock further p4fmt experimentation, as long as the solution doesn't influence the rest of P4C. Then switching to the TrailingObjects
technique could be left as a future optimization.
@asl What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this side map is local, then certainly we're fine.
backends/p4fmt/attach.cpp
Outdated
} | ||
|
||
//////////////////////// TYPES //////////////////// | ||
bool Attach::preorder(const IR::Type_Bits *t) { return preorderImpl(t); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think enumerating all/most node types is a good idea (the fact the actual implementation is the same is a good hint for that!). It defeats the purpose of having the nodes in a class hierarchy and it would make this very fragile to IR node addition. From what @qobilidop described as the high-level algorithm, it might even work to have just a single preorder(Node *)
and postoder(Node *)
. Comment attachment does not really seem to dependent on whether the node is e.g. an addition or subtraction, it even does not seem depend on whether it is e.g. an expression or a type definition. Even in cases of comments inside statements like x = foo(/* block size */ a + /* block count */ b); // get ...
one would probably get a good enough comment attachment (the only question is really if the trailing comment belongs to the assignment or the call (where postorder
would likely place it. But I guess that can't be really said in general and either will be good enough.
2aff0aa
to
da4b314
Compare
Comments are stored in a side map for now, as discussed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick round of comments.
It would be good to have a reference output here. We can use the checker that you wrote. This way we can see the current behavior.
There are merge conflicts with the main branch. Please rebase/merge and resolve them. |
b87606d
to
0dfc44a
Compare
Almost all of the suggestions and comments have been addressed. I'm planning on taking care of optimizations in a separate PR. (Sorting comments for easy lookup, etc). I would appreciate a final review to confirm if this is ready for merging. |
Signed-off-by: Nitish <[email protected]>
Signed-off-by: Nitish <[email protected]>
Signed-off-by: Nitish <[email protected]>
Signed-off-by: Nitish <[email protected]>
Signed-off-by: Nitish <[email protected]>
Signed-off-by: Nitish <[email protected]>
Signed-off-by: Nitish <[email protected]>
Signed-off-by: Nitish <[email protected]>
Signed-off-by: Nitish <[email protected]>
Signed-off-by: Nitish <[email protected]>
Signed-off-by: Nitish <[email protected]> Signed-off-by: Nitish <[email protected]>
Signed-off-by: Nitish <[email protected]>
e4677d2
to
4dfe245
Compare
backends/p4fmt/p4fmt.cpp
Outdated
|
||
namespace P4::P4Fmt { | ||
|
||
std::optional<std::pair<const IR::P4Program *, const Util::InputSources *>> parseProgram( | ||
const ParserOptions &options) { | ||
auto *file = fopen(options.file.c_str(), "r"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose of fopen
here? Just for "no such file"
diagnostics as file
is unused below? What does parseProgramSources
reports in case no input file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to std::filesystem::exists
, if that helps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really need this? What does parseProgramSources repors in case non-existent input file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is necessary as parseProgramSources
doesn't report anything if there's no input file, it just fails to parse and returns null. There are no other checks to validate this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments. There are no tests as well.
@qobilidop See my comments on concerns |
Signed-off-by: Nitish <[email protected]>
The tests are intended to be included as part of this PR. |
Signed-off-by: Nitish <[email protected]>
private: | ||
/// This Hashmap tracks each comment’s attachment status to IR nodes. Initially, all comments | ||
/// are set to 'false'. | ||
std::unordered_map<const Util::Comment *, bool> processedComments; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really need bool
value here? Why can't you simply have std::unordered_set<Util::Comment *>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially, I deleted the comment from the set once it was attached, but modifying the container while iterating over it didn't seem like a good approach. I could use a separate 'Used Comments' set for this, if that seems better? Open to suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see why you need to modify this set during the iteration. Essentially, your code is:
for (auto &[comment, isAttached] : processedComments) {
// Skip if already attached
if (isAttached) {
continue;
}
switch (...) {
case A:
...
isAttached = true;
case B:
...
isAttached = true;
default:
error();
}
}
Why do you need to modify anything?
- The first
continue
seems to be redundant to me, as you always either attaching everything or bail out - You are always attaching the whole set.
So why can't you simply attach everything and then do processedComments.clear()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the dealyed reply, missed the notification :( . Also Sorry for not clarifying enough earlier. processedComments
is a shared list of all the comments from the file that may get attached to different nodes. I'm using the marking thing to ensure that the same comment won't get attached to another node later. Wouldn't clearing it remove everything from the container, making it difficult to use them for attaching comments to other nodes in the future ? Please let me know if I'm still missing something. :)
No description provided.