[P4fmt]: attaching comments to IR Nodes #4845

snapdgn · 2024-08-01T14:14:55Z

No description provided.

backends/p4fmt/p4fmt.cpp

backends/p4fmt/attach.cpp

backends/p4fmt/attach.h

snapdgn · 2024-08-08T15:01:21Z

This handles attaching comments to basic parent node types (Typedef, Type_Name, control blocks, structs, etc.).
The list of supported nodes is not yet exhaustive; support for further nodes will be added gradually.
It currently stores the comments related to each node in two lists: 'before' (comments directly above the node) and 'after' (comments on the same line as the node).
The plan is to hold all the comments related to a node and attach each of them at the appropriate position based on column information.
The comment list will store a list of code points, with each entry containing a pair of the SourcePosition and the corresponding comment.
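
As a rough illustration of the layout described above, the following sketch keeps two lists per node and a position-keyed comment list. `SourcePosition`, `Comment`, `NodeComments`, and `attachByLine` are simplified, hypothetical stand-ins rather than the actual P4C types, and "directly above" is assumed to mean "on the immediately preceding line".

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; the real P4C types differ.
struct SourcePosition {
    unsigned line;
    unsigned column;
};

struct Comment {
    std::string text;
};

// Comments associated with one node, split by where they appear.
struct NodeComments {
    std::vector<const Comment *> before;  // comments on the line directly above the node
    std::vector<const Comment *> after;   // comments on the same line as the node
};

// The planned "comment list": each entry pairs a position with its comment.
using CommentList = std::vector<std::pair<SourcePosition, Comment>>;

// Classify one comment relative to a node that starts on nodeLine
// (assumption: same line -> after, preceding line -> before).
// Returns true if the comment was attached to the node.
inline bool attachByLine(NodeComments &nc, unsigned nodeLine,
                         const std::pair<SourcePosition, Comment> &entry) {
    if (entry.first.line == nodeLine) {
        nc.after.push_back(&entry.second);
        return true;
    }
    if (entry.first.line + 1 == nodeLine) {
        nc.before.push_back(&entry.second);
        return true;
    }
    return false;
}
```

The lists hold pointers into the comment list, so the comment list has to be fully populated (and left unmodified) before anything is attached.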

backends/p4fmt/attach.cpp

fruffy · 2024-08-09T06:56:43Z

backends/p4fmt/attach.cpp

+}
+
+//////////////////////// TYPES ////////////////////
+bool Attach::preorder(const IR::Type_Bits *t) { return preorderImpl(t); }


What's the criterion for a supported node here? Otherwise I would use IR::Statement, IR::Type_Declaration, or IR::StatOrDecl here?

My understanding of the algorithm presented in the blog post (the Bazel approach) is that the preorder traversal will visit all AST nodes, and it can pick the right AST node to attach comments to automatically. The first node with a comment right before it will get that comment attached to it, and that comment will be removed from the list of comments to be attached, so that any child AST node on the same line won't get the same comment attached again.

This algorithm can guarantee a pretty reasonable attachment of line comments. And a similar algorithm can be applied in a postorder traversal to attach the inline trailing/suffix comments.
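
To make the preorder idea above concrete, here is a minimal sketch on a toy AST. `Node`, `Comment`, `Attachments`, and `attachLeading` are hypothetical names, not P4C or Bazel code, and "right before" is simplified to "on the line directly above".

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy AST node for illustration only; P4C's IR::Node is far richer.
struct Node {
    std::string name;
    unsigned line;  // line on which the node starts
    std::vector<Node> children;
};

struct Comment {
    unsigned line;
    std::string text;
};

// Maps a node name to the comment texts attached to it.
using Attachments = std::map<std::string, std::vector<std::string>>;

// Preorder walk: the first node visited that has a pending comment on the
// line directly above it claims that comment. The comment is then erased
// from the pending list, so a child starting on the same line cannot claim
// it a second time.
inline void attachLeading(const Node &node, std::vector<Comment> &pending, Attachments &out) {
    for (std::size_t i = 0; i < pending.size();) {
        if (pending[i].line + 1 == node.line) {
            out[node.name].push_back(pending[i].text);
            pending.erase(pending.begin() + static_cast<std::ptrdiff_t>(i));
        } else {
            ++i;
        }
    }
    for (const Node &child : node.children) attachLeading(child, pending, out);
}
```

A postorder counterpart for trailing comments would presumably compare against the line on which a node ends and run after the children have been visited.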

@fruffy I don't have enough knowledge on this, but is there a more convenient way in P4C to express that I want to do the same thing for all AST nodes regardless of their type? Maybe, as you mentioned in another comment, adding IR::Node to a preorder is the way to go?

I don't think enumerating all/most node types is a good idea (the fact that the actual implementation is the same for each is a good hint of that!). It defeats the purpose of having the nodes in a class hierarchy, and it would make this very fragile when IR nodes are added. From what @qobilidop described as the high-level algorithm, it might even work to have just a single preorder(Node *) and postorder(Node *). Comment attachment does not really seem to depend on whether the node is e.g. an addition or a subtraction; it does not even seem to depend on whether it is e.g. an expression or a type definition. Even for comments inside statements like x = foo(/* block size */ a + /* block count */ b); // get ... one would probably get a good enough comment attachment. The only real question is whether the trailing comment belongs to the assignment or to the call (where postorder would likely place it), but I guess that can't really be decided in general and either will be good enough.
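
The point about relying on the class hierarchy can be sketched with a toy example. The classes below are hypothetical and are not P4C's IR or visitor classes; they only show one handler on the base type covering every subclass.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy hierarchy for illustration; these are not P4C's IR classes.
struct Node {
    virtual ~Node() = default;
    virtual std::string kind() const = 0;
};
struct TypeBits : Node {
    std::string kind() const override { return "TypeBits"; }
};
struct Add : Node {
    std::string kind() const override { return "Add"; }
};
struct Assign : Node {
    std::string kind() const override { return "Assign"; }
};

// One handler on the base class covers every current and future subclass,
// so nothing has to be enumerated per node type.
struct AttachSketch {
    std::vector<std::string> visited;

    bool preorder(const Node &n) {
        visited.push_back(n.kind());
        return true;  // keep descending, as a visitor typically would
    }
};
```

Adding a new node type to this toy hierarchy requires no change to `AttachSketch`, which is the fragility argument made above.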

vlstill

From the point of view of the compiler in general, I think there should not be a mutable member in SourceInfo, and I am concerned that this could impact compiler performance.

Other than that, I don't think naming all the node types is a good approach.

lib/source_file.h

vlstill · 2024-08-12T07:24:45Z

lib/source_file.h

+    mutable std::vector<Util::Comment *> before;
+    mutable std::vector<Util::Comment *> after;


Another point -- this makes SourceInfo, which is ubiquitous in the compiler, quite a bit larger (by 6 pointers with the common implementation of vector). It would be good to have some benchmarks to see that this does not unduly slow down the compiler (not the p4fmt backend) or increase its memory consumption too much.

If we'd go this way, I'd strongly suggest using small vectors that could hold up to 1 entry inline. But overall I concur that enlarging every IR node by this amount should be carefully benchmarked. We do create lots of nodes. There is huge malloc traffic everywhere. And while another 48 bytes might look small, it is an overhead paid on every node. Even if there are zero comments.
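
The per-object cost discussed here can be checked with a small sketch. `PlainInfo` and `InfoWithComments` are hypothetical, simplified records, not the real Util::SourceInfo; the exact size of `std::vector` is implementation-defined, although three pointers (24 bytes on 64-bit targets) is the common layout that yields the 48 bytes mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Comment;  // opaque for this sketch; only pointers to it are stored

// Hypothetical, simplified position record (not the real Util::SourceInfo).
struct PlainInfo {
    unsigned line;
    unsigned column;
};

// The same record with two always-present comment vectors added.
struct InfoWithComments {
    unsigned line;
    unsigned column;
    std::vector<Comment *> before;
    std::vector<Comment *> after;
};

// Extra bytes paid by every instance, even when both vectors are empty.
constexpr std::size_t kOverhead = sizeof(InfoWithComments) - sizeof(PlainInfo);

static_assert(kOverhead >= 2 * sizeof(std::vector<Comment *>),
              "both vectors are stored inline in every object");
```

Printing `kOverhead` on the target platform gives the concrete number to weigh against how many such objects the compiler creates.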

Also, comments are quite rare, and here the price is paid by every node. Maybe a better solution is to use a technique similar to TrailingObjects in LLVM / clang.

Instead of storing those prefix and suffix comments as part of SourceInfo, how about storing that extra object in a hashmap, Hashmap<node-id, extra-comment-object>? This would avoid the unnecessary overhead in the compiler.
Would this be a good and straightforward solution for the time being?

What is your plan for updating this side map when the IR changes? Overall, long-lived side maps should be discouraged, as they can easily go stale.

Just want to add some more context. We were thinking about using this side map only in p4fmt. With this constraint, there probably isn't any IR change expected I guess.

@snapdgn has also looked into using TrailingObjects. I agree that would be a better solution. My suggestion is to go with a simple solution for now, to unblock further p4fmt experimentation, as long as the solution doesn't influence the rest of P4C. Then switching to the TrailingObjects technique could be left as a future optimization.

@asl What do you think?

If this side map is local, then certainly we're fine.
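
A minimal sketch of the side-map idea from this thread, assuming an integer node id. `NodeId`, `NodeComments`, and `CommentTable` are hypothetical names chosen for illustration, not code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the id type and the comment record are assumptions.
using NodeId = int;

struct NodeComments {
    std::vector<std::string> prefix;
    std::vector<std::string> suffix;
};

// Side table owned by the formatter pass only. Nodes without comments have
// no entry, so they add nothing to the table.
class CommentTable {
 public:
    void addPrefix(NodeId id, std::string text) { table_[id].prefix.push_back(std::move(text)); }
    void addSuffix(NodeId id, std::string text) { table_[id].suffix.push_back(std::move(text)); }

    // Returns nullptr when the node has no comments attached.
    const NodeComments *find(NodeId id) const {
        auto it = table_.find(id);
        return it == table_.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return table_.size(); }

 private:
    std::unordered_map<NodeId, NodeComments> table_;
};
```

Keeping the table as a member of a single pass, rather than a global, matches the "local side map" condition agreed on above: it cannot outlive the IR it was built from.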

lib/source_file.h

vlstill · 2024-08-12T07:43:45Z

snapdgn · 2024-08-22T07:17:27Z

Comments are stored in a side map for now, as discussed.

fruffy

Quick round of comments.

It would be good to have a reference output here. We can use the checker that you wrote. This way we can see the current behavior.

qobilidop · 2024-08-22T15:49:34Z

There are merge conflicts with the main branch. Please rebase/merge and resolve them.

snapdgn · 2024-09-07T18:14:40Z

Almost all of the suggestions and comments have been addressed.

I'm planning on taking care of optimizations in a separate PR. (Sorting comments for easy lookup, etc).

I would appreciate a final review to confirm if this is ready for merging.

qobilidop · 2024-09-12T03:33:54Z

@asl @vlstill Any remaining concerns? I'll try to merge this if no new issues are raised.

backends/p4fmt/attach.cpp

backends/p4fmt/attach.h

asl · 2024-09-12T20:54:11Z

backends/p4fmt/p4fmt.cpp


 namespace P4::P4Fmt {

+std::optional<std::pair<const IR::P4Program *, const Util::InputSources *>> parseProgram(
+    const ParserOptions &options) {
+    auto *file = fopen(options.file.c_str(), "r");


What is the purpose of fopen here? Is it just for the "no such file" diagnostic, given that file is unused below? What does parseProgramSources report when there is no input file?

Switched to std::filesystem::exists, if that helps?

Do you really need this? What does parseProgramSources report for a non-existent input file?

I believe this is necessary as parseProgramSources doesn't report anything if there's no input file, it just fails to parse and returns null. There are no other checks to validate this.
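
The up-front existence check discussed in this thread might look roughly like the sketch below. `Parsed` and `parseIfExists` are hypothetical; the real parseProgram and parseProgramSources signatures are not reproduced here. Only `std::filesystem::exists` is the call mentioned in the thread.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical result type standing in for the parsed program.
struct Parsed {
    std::string path;
};

// Checks the path up front so that a missing file gets its own diagnostic
// instead of surfacing only as a null parse result.
inline std::optional<Parsed> parseIfExists(const std::filesystem::path &file, std::string &error) {
    std::error_code ec;  // non-throwing overload; any failure is reported as "missing" here
    if (!std::filesystem::exists(file, ec)) {
        error = "no such file: " + file.string();
        return std::nullopt;
    }
    // The real parse call would go here; this sketch just records the path.
    return Parsed{file.string()};
}
```

Compared with opening the file via fopen only to test for its presence, this leaves no handle to close and makes the intent explicit.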

backends/p4fmt/p4fmt.cpp

asl

See comments. There are no tests as well.

backends/p4fmt/attach.h

asl · 2024-09-12T21:06:37Z

@asl @vlstill Any remaining concerns? I'll try to merge this if no new issues are raised.

@qobilidop See my comments on concerns

snapdgn · 2024-09-13T21:00:05Z

See comments. There are no tests as well.

The tests are intended to be included as part of this PR.

asl · 2024-09-27T22:17:41Z

backends/p4fmt/attach.h

+ private:
+    /// This Hashmap tracks each comment’s attachment status to IR nodes. Initially, all comments
+    /// are set to 'false'.
+    std::unordered_map<const Util::Comment *, bool> processedComments;


Do you really need a bool value here? Why can't you simply have std::unordered_set<Util::Comment *>?

Initially, I deleted the comment from the set once it was attached, but modifying the container while iterating over it didn't seem like a good approach. I could use a separate 'Used Comments' set for this, if that seems better? Open to suggestions.

I do not see why you need to modify this set during the iteration. Essentially, your code is:

    for (auto &[comment, isAttached] : processedComments) {
        // Skip if already attached
        if (isAttached) {
            continue;
        }
        switch (...) {
            case A:
                ...
                isAttached = true;
            case B:
                ...
                isAttached = true;
            default:
                error();
        }
    }

Why do you need to modify anything?

The first continue seems redundant to me, as you always either attach everything or bail out.

You are always attaching the whole set.

So why can't you simply attach everything and then do processedComments.clear()?

Sorry for the delayed reply, I missed the notification :( . Also, sorry for not clarifying enough earlier. processedComments is a shared list of all the comments from the file, which may get attached to different nodes. I'm using the marking to ensure that the same comment won't get attached to another node later. Wouldn't clearing it remove everything from the container, making it difficult to use the comments for attaching to other nodes in the future? Please let me know if I'm still missing something. :)
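
One possible reading of the "separate used-comments set" idea from this thread is sketched below. `Comment` and `AttachTracker` are hypothetical names, not the code in attach.h; the sketch only shows that set membership can stand in for a per-comment bool while the shared list stays untouched.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

struct Comment {
    unsigned line;
    std::string text;
};

// Tracks which comments from the shared, file-wide list are already claimed.
// Membership in the set replaces a per-comment bool flag, and the shared list
// itself is never modified while it is being iterated.
class AttachTracker {
 public:
    explicit AttachTracker(const std::vector<Comment> &all) : all_(all) {}

    // Returns the not-yet-attached comments on the given line and marks them used.
    std::vector<const Comment *> claimOnLine(unsigned line) {
        std::vector<const Comment *> claimed;
        for (const Comment &c : all_) {
            // insert() reports false in .second if the comment was claimed earlier.
            if (c.line == line && attached_.insert(&c).second) claimed.push_back(&c);
        }
        return claimed;
    }

 private:
    const std::vector<Comment> &all_;  // must outlive the tracker
    std::unordered_set<const Comment *> attached_;
};
```

A second `claimOnLine` call for the same line returns nothing, which is the "never attach the same comment twice" guarantee the map of bools was providing.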

ChrisDodd reviewed Aug 1, 2024

backends/p4fmt/p4fmt.cpp

asl reviewed Aug 1, 2024

backends/p4fmt/attach.cpp (outdated)

asl reviewed Aug 1, 2024

backends/p4fmt/attach.cpp (outdated)

asl reviewed Aug 1, 2024

backends/p4fmt/attach.cpp (outdated)

asl requested changes Aug 1, 2024

snapdgn force-pushed the comments-handling branch 3 times, most recently from 82edb45 to 37344eb Compare August 8, 2024 13:13

snapdgn marked this pull request as ready for review August 8, 2024 13:14

snapdgn marked this pull request as draft August 8, 2024 14:33

snapdgn marked this pull request as ready for review August 8, 2024 14:43

snapdgn marked this pull request as draft August 8, 2024 14:49

qobilidop reviewed Aug 8, 2024

backends/p4fmt/attach.cpp (outdated)

qobilidop reviewed Aug 8, 2024

backends/p4fmt/attach.cpp (outdated)

fruffy reviewed Aug 9, 2024

vlstill requested changes Aug 12, 2024

snapdgn force-pushed the comments-handling branch 4 times, most recently from 2aff0aa to da4b314 Compare August 20, 2024 08:30

snapdgn marked this pull request as ready for review August 21, 2024 05:52

fruffy reviewed Aug 22, 2024

qobilidop reviewed Aug 22, 2024

snapdgn force-pushed the comments-handling branch 4 times, most recently from b87606d to 0dfc44a Compare August 26, 2024 10:04

qobilidop requested review from fruffy and vlstill September 7, 2024 18:16

fruffy approved these changes Sep 8, 2024

snapdgn added 12 commits September 9, 2024 14:24 (each signed off by Nitish)

21c7fac  [P4fmt]: attach comments
64174bf  fix: resolve issue with modifying container during iteration
13ba1f1  refactor: rename variables, remove unused headers
9a10d7d  doc comments
4924399  use node's clone-id
781c810  rebase & resolve nits
d67ad3c  expose IHasSourceInfo for the Comment class
629bab8  extract comments from InputSources instead of partial SourceInfo
b534708  remove extraneous method
5440d9d  fix missing toString() override
6413157  chore: comment fixes & minor refactoring
4dfe245  refactor parseProgram fn

snapdgn force-pushed the comments-handling branch from e4677d2 to 4dfe245 Compare September 9, 2024 08:55

asl reviewed Sep 12, 2024

backends/p4fmt/attach.cpp (outdated)

asl reviewed Sep 12, 2024

backends/p4fmt/attach.h (outdated)

asl reviewed Sep 12, 2024

backends/p4fmt/p4fmt.cpp (outdated)

asl reviewed Sep 12, 2024

backends/p4fmt/attach.h (outdated)

622d1bd  resolve: nits & suggestions
19d55e4  switch to Inspector pass

snapdgn requested a review from asl September 24, 2024 15:50

asl reviewed Sep 27, 2024

qobilidop added the p4fmt label (Topics related to P4 formatter) Sep 30, 2024
