-
Notifications
You must be signed in to change notification settings - Fork 393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve link checker (possible rewrite) #1838
Comments
Another failure today: https://github.com/iterative/dvc.org/actions/runs/289778559 And I got an email from my name. |
This comment has been minimized.
This comment has been minimized.
https://dev.to/bnb/markdown-link-checking-in-github-with-actions-5dp0 - can be useful potentially |
Great resource! Making our own link check is shockingly tough, so leaning on something like Linkinator would be better for everyone if we can do it. |
I think it's worth noting that Check links with linkcheck is the most starred link check action on the marketplace by far, and it claims feats such as "It is 40 times faster than the only tool that goes to the same depth (linkchecker).". Once we start work on this, I think it's worth following the stars and giving |
This is the highest priority issue (somewhat) related to the website. I put it at the top in the project for now. Are there plans to work on it? Should we change the priority otherwise? Or are there any other website/engine-related issues you have under your radar, e.g. maybe #1063, iterative/gatsby-theme-iterative#156 (would be my personal pick as I keep bumping with that), #2394, #1115, or anything else from here ? Cc @rogermparent @julieg18 thanks |
I don't think this has even been considered for a while, so I suppose it's not quite a p0. Unless we decide to pick this up, it may stand to be demoted to p1- the link checker is considered important, so hopefully it doesn't fall by the wayside. If someone were to pick the link checker back up and fix its core issues described in the OP, it may warrant a rewrite that leans more into the CLI interface and lets programs use that- using GHA Checks directly caused 90% of pain on the original run of that project. |
I can take a look at this after the dataset/scenario updates, in two weeks.
|
It looks like the Rust library behind https://github.com/lycheeverse/lychee-action is modifiable even if the features don't cover our use cases. |
I can also use the tool @rogermparent mentioned. It's a Dart library and though may not cover everything we need, I can use it in a custom library. If you would like a link-checker from scratch, I can write it too. |
@iesahin My plan of attack would probably be:
While |
These steps are similar to what I planned. @rogermparent I'll test these 2 first and see if they are good enough. I think a thin wrapper around them may be needed at most. |
I started looking into other link checkers and it looks like 🤔 Though @rogermparent, are we wanting to replace our custom link-check entirely or have it use something like |
My preference would be to use That said, we can stand to change/omit some aspects of the current link check that are harder to replicate- for example, instead of diff mode grabbing links directly from the diff we can just check all links in modified files, since IIRC |
Makes sense that a wrapper would be needed! Though I'm inexperienced in this and I'm not sure what steps to take to create this "wrapper"... First thing that comes to mind is running The other way I can think of is to use Does anyone know of the best way to create this "wrapper"? cc @iterative/websites |
The best wrapper would be no wrapper, followed by as minimal a wrapper as possible. Hopefully we'll be able to use A few examples: If we want to run only on files added in the diff, I would think the ideal implementation would be a simple GitHub Actions workflow that gets a list of those added files via For something like keeping track of consecutive fails, we could run If we wanted to limit checked links to only those in the diff and ignore other links in modified files, we may be able to use There's also the option of adding If we must drop down into a node script that executes |
Thanks for the info and examples! I'll start working on a github action that works with |
Experimented with But it has a couple bugs:
The main issue is the 429, and I've got a couple of ideas on how we could fix this:
|
It's unfortunate that Lychee seemingly doesn't have per-URL configuration; we got around this with the current link-check by limiting concurrency to 1 and rate-limiting to one GH check per second, but doing so with Lychee would require we either wrap it or limit the rate/concurrency of all links. Looks like there's another workaround for GitHub specificially where you can use an access token to authenticate and ease the rate limiting greatly. This works for GitHub specifically, but thinking about this happening with other sites is a little worrying.
We only handle this with the current link checker by using a lot of exclusions, which should be able to translate into Lychee well enough |
We are using the |
I didn't notice that, I see the lychee docs suggest using the |
I think a big improvement we can get is focusing on CLI, as we can go from there to any other reporting mechanism like GitHub Actions and simplify away the issue of having multiple build types for CLI and GHA as well as possible usage on other platforms. |
Link checker seems to be broken in certain scenarios, has false positives, noisy reporting:
Ivan Shcheklein
Test on fork, test of force push.
The text was updated successfully, but these errors were encountered: