Requests for comments: how does opam-repository scale? #23789
Comments
As a first observation: the opam repository is a git repository, so anyone who wants to install old releases can always use an old commit. One idea I discussed with @mro is to add an expiration date on each opam package when publishing, i.e. "this is being maintained for 3 months". I was initially skeptical, but I think this is a fine idea, since various contributors (authors) and companies have different release cycles, and for me personally, code that I don't use or touch for several years should either be released as stable (with a 1.x) with an infinite expiration, or it will bitrot. And for reference, I found these PRs and their discussion pretty insightful: Especially the comment #11400 (comment) is worth reading, where different people have different opinions (with respect to whether it "is good for reproducibility to remove packages"). And note to myself, there's |
Looks like nobody else cares (and Louis cared earlier, as outlined in closed issues and pull request). Closing. |
I do actually care, but I don't have a good solution yet and, worse, there is not yet a real consensus on what to do. I hope we will settle on some periodic stable releases and an archive repository where old packages go to rest, or some other compromise that will make things slimmer. |
If you ask me we should leave this open as an opportunity to get more ideas. There was also a discuss (or even more than one) thread where we had a discussion about this. If I can find it, I'll link it |
I also care. But what I would like to see before deleting packages is a process to move these deprecated/unmaintained packages to a separate repository. Keeping them around is really helpful when you need to test large-scale language features (for new releases, but also to check language statistics for old releases). I think the main blocker for this right now is having the appropriate tooling (for instance, to split an existing opam-repo easily). |
I doubt this is a tooling issue to be honest, since there's It feels more like a policy decision about what should opam-repository achieve - together with what compilers to support (see related issues #24868 ocaml/opam#5748 -- but also ocaml/infrastructure#87 ocaml/infrastructure#48).
What prevents you from adding a tag before deleting anything in this repository, and establishing a (e.g.) quarterly routine on what to delete (i.e. have a quarterly tag, 2023Q1..Q4)? Maybe it is just me, using a ~10 year old laptop, who suffers from this issue -- but it looks like the "opam.ocaml.org update job" is having quite some trouble (related to the number of packages). To have a concrete proposal that can be discussed:
Yes, this will break "opam lockfiles" - in theory at least. In practice, feel free to point to hyperlinks of lockfiles that are broken by such a mechanism (and yes, I really really think these lockfiles need to be fixed to include the repository commit so that they are sustainable). How can we "discuss" such a policy? Is this issue good for it? How can we reach something that we then act on? Or will we just sit down and wait until it becomes so annoying that nobody is using it anymore? |
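To make the quarterly tag idea concrete, here is a minimal sketch of how a tag name in the 2023Q1..Q4 style could be derived from a date. The function name `quarterly_tag` is hypothetical and not part of opam or of any existing tooling; it only illustrates the naming scheme.

```python
from datetime import date


def quarterly_tag(day: date) -> str:
    """Return a tag name such as '2023Q1' for the quarter containing `day`.

    Hypothetical helper: it only illustrates the 2023Q1..Q4 naming scheme.
    """
    quarter = (day.month - 1) // 3 + 1  # months 1-3 -> 1, 4-6 -> 2, ...
    return f"{day.year}Q{quarter}"


if __name__ == "__main__":
    # Tag that would be created before a deletion pass run on this day.
    print(quarterly_tag(date(2023, 2, 14)))
```

A lockfile that records such a tag (or the underlying commit) next to its package versions would keep pointing at a repository state in which those packages still exist.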
On opam’s side we’re thinking about adding a compressed format for the repository after 2.2.0 (ocaml/opam#5648). Such a format would eliminate the issue for For the solver/clutter/git-growth/… though the problem still remains and removing packages could be a solution. It is also worth noting that we currently have nobody knowledgeable enough to work on the solver on a regular basis. |
What I do not really like about tags, and about saying that you can always look back at the history to install old packages, is that we are constantly improving the quality of the metadata of opam-repository. So when you use an old opam-repo, you lose these improvements unless you have a (custom) merge of this old repo with the live repo. My ideal workflow would be to have layered repositories with a well-defined merge strategy - so our automated system could easily test old packages with recent metadata fixes. |
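As an illustration of what "layered repositories with a well-defined merge strategy" could mean, here is a minimal sketch. It assumes one particular strategy (later layers override earlier ones field by field), which is only one of several possible choices and not something decided in this thread. The `Repo` type, the `merge_layers` name and the example package are all hypothetical.

```python
from typing import Dict

Metadata = Dict[str, str]    # field name -> field value (simplified)
Repo = Dict[str, Metadata]   # "name.version" -> metadata


def merge_layers(*layers: Repo) -> Repo:
    """Merge repositories, lowest priority first.

    Assumed strategy: the union of all packages, where a later layer
    overrides individual fields of a package found in an earlier layer.
    """
    merged: Repo = {}
    for layer in layers:
        for pkg, fields in layer.items():
            # Copy into a fresh dict so the input layers are never mutated.
            merged.setdefault(pkg, {}).update(fields)
    return merged


# An archived package, plus a recent metadata fix kept in a separate layer.
archive = {"foo.1.0": {"depends": "ocaml", "build": "make"}}
fixes = {"foo.1.0": {"depends": "ocaml < 5.0"}}
combined = merge_layers(archive, fixes)
```

With this strategy, an automated system could build `foo.1.0` from the archive layer while still picking up the tightened `depends` field from the fixes layer.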
So a few more details on what I have in mind. I'm describing a hypothetical workflow - we can get there incrementally or not.
What I am trying to get is some kind of guarantee that unstable/old/deprecated package metadata continues to be maintained somehow, but without putting too much hassle on our CI infrastructure and on opam-repo gatekeepers. And I'm also keen to have a system where "submitting a simple package for fun" doesn't add more load to our system/process. What do you think? |
I am new and started volunteer work in opam admin a few weeks ago. I haven't done much yet except attend a few weekly meetings. I shall have more time from this week. I am so glad to see this issue discussing the scaling problem. Here are some topics appearing in your discussions and my opinions.
Retiring old packages
I don't agree with retiring old packages directly. The direct reason is that we cannot guarantee that packages will observe the plausible invariant from semantic versioning, so that e.g. projects or libraries depending on 6.0.0 can always run without problems after changing the dependency to 6.0.1. Reproducibility is such a painful problem.
However, I do agree that we should suggest a better alternative to a package version, whether when users try installing a package with a non-optimal version, or when opam (`.opam`) detects that a non-optimal version is specified. But does it require the user to always have a most recent view of the latest opam-repo?
So my point is that package versioning suggests a good package choice but is not reliable. That's the reason lockfiles are widely used.
Using GitHub as the backend
Another point unclear to me: when evaluating a design suggestion above, shall we assume that the opam repo will stick to using the GitHub (git) backend, or is that also open to discussion? It would be helpful to consider whether a suggestion is mainly for a better design or a better implementation.
> We have 3 official opam repositories instead of one: unstable/stable/archive
It has some Debian flavor, but I have to say using opam, where there is just one official central repo, is much easier than using Ubuntu, where I have had to manually tune the src-list many times. And there are some popular repos, e.g. `https://coq.inria.fr/opam/released`. Does the repo kind (unstable/stable/archive) also apply to other repos, or is it actually package metadata?
CI burden, Package author burden, Opam repo admin burden
I would like to observe for some more time and do more hands-on work to figure out where the CI burden and repo admin burden come from.
I think `x-opam-repository-expiry` may increase the package author burden because it's too complicated and rarely possible to anticipate. I may be missing some context or experience here.
I agree with the fact that CI sometimes runs slowly enough to block things. However, in my very limited observation, some CI issues due to the service itself could be improved (accidental error). Some CI errors are caused by package incompatibility (essential error), and they do rescue users from suffering from this problem. That's one big achievement of the current admin team, and the opam repo workflow outperforms many other package management tools I have used, in my opinion. I don't think changing the CI arrangement can reduce this problem. Or maybe it helps: if more packages stay in unstable without moving into stable, then the overall CI checking can be reduced.
If we can offload some checking to the publishing side, some burdens are transferred from the CI/admin side to the author side.
p.s. In recap, I have been very interested in this problem for a long time. I have some plans to survey (and have surveyed a little, but not sorted it out yet) more package managers, especially for programming languages. |
My take on scaling cares about growth and reliability.
Growth
There is no such thing as eternal growth, so if we design for a liveable future, we should reach a saturation in size. IMO this is simple via expiration. It can be 10 years for "LTS" versions (e.g. the last one before a major/minor jump) but it has to be less than "eternity". There is no maintenance promised during these 10 years, just mere existence: an HTTP 200 instead of a 404. The EOL date should be in the metadata inside the tarball (and in an HTTP header).
Whoever needs it longer (we come to that in a second) should be able to easily mirror/vendor single-version tarballs.
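A minimal sketch of the expiration idea, assuming each release carries an EOL date in its metadata. The mapping from "name.version" to a date, the function name `expired` and the example packages are assumptions made for illustration; no such field exists in opam today.

```python
from datetime import date
from typing import Dict, List


def expired(packages: Dict[str, date], today: date) -> List[str]:
    """Return the releases whose EOL date lies strictly before `today`.

    `packages` maps a hypothetical "name.version" key to its EOL date.
    A release is still served on its EOL day itself.
    """
    return sorted(name for name, eol in packages.items() if eol < today)


# Hypothetical data: a short-lived release and an "LTS" one kept for 10 years.
packages = {
    "foo.0.3.1": date(2024, 1, 1),
    "foo.1.0.0": date(2034, 1, 1),
}
to_drop = expired(packages, today=date(2025, 6, 1))
```

Anyone who needs `foo.0.3.1` beyond its EOL date would have to mirror or vendor that single tarball before it stops being served.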
Reliability
IMO we should embrace reproducible builds. This means we need reliably retrievable sources incl. toolchain. Certain versions may do, it doesn't have to be any version. But once declared such (think "best before Jan 1st 2040") it must be available under the same URL (or redirected). No need for five-nines availability, and neither bandwidth nor latency should matter too much, as you can easily cache/download stuff.
Maybe signing the tarballs may be useful. (Web of trust, uh)
In the end it's a URL registry with signed, expiring entries.
For reliability (legal, technical, moral) we shouldn't tie ourselves to any third party concerning the build toolchain. GitHub may have been fashionable recently, but IMO the opam repository should be consumable without touching billionaire-run infrastructure let alone namespaces. I would want my project to be buildable by peers in Tehran, Moscow, Paris, Beijing and New York alike. And if not, it should be our decision (ocaml.org) and not a billionaire-clerk's.
Monitoring may be helpful.
Hope this makes sense,
Marcus
…On Tue, 12 Dec 2023 21:43:20 -0800 Weng Shiwei 翁士伟 ***@***.***> wrote:
I am new and started volunteer work in opam admin a few weeks ago. I haven't done much yet except for just attending a few weekly meetings. I shall have more time from this week. I am so glad to see this issue discussing the scaling problem. Here are some topics appearing in your discussions and my opinion.
**Retiring old packages**
I don't agree with retiring old packages directly. The direct reason is we cannot guarantee the packages will observe the plausible invariant from _semantic versioning_ so that e.g. projects or libraries depending on 6.0.0 can always run without problems after changing dependency to 6.0.1. Reproducibility is such a painful problem.
However, I do agree that we should suggest to use a better alternative of a package version. No matter when users try installing a package with a non-optimal version, or opam (`.opam`) detects a non-optimal version is specified. But does it require the user to always have a most recent view of the latest opam-repo?
So my point is the package versioning suggests a good package choice but not reliable. That's the reason lockfiles are widely used.
**Using GitHub as the backend**
Another unclear point to me is when evaluating a design suggestion above, shall we assume that opam repo will stick to using GitHub (git) backend or is it also discussable? It will be helpful to consider whether a suggest is mainly for better design or a better implementation.
> We have 3 official opam repositories instead of one: unstable/stable/archive
It has some Debian flavor, but I have to say using opam, where there is just one official central repo, is much easier than using Ubuntu, where I have had to manually tune the src-list many times. And there are some popular repos, e.g. `https://coq.inria.fr/opam/released`. Does the repo kind (unstable/stable/archive) also apply to other repos, or is it actually package metadata?
**CI burden, Package author burden, Opam repo admin burden**
I would like to observe for some more time and do more hands-on work to figure out where the CI burden and repo admin burden come from.
I think `x-opam-repository-expiry` may increase the _Package author burden_ because it's too complicated and rarely possible to anticipate. I may be missing some context or experience here.
I agree with the fact that CI sometimes runs slowly enough to block things. However, in my _very limited observation_, some CI issues due to the service itself could be improved (_accidental error_). Some CI errors are caused by package incompatibility (_essential error_), and they do rescue users from suffering from this problem. That's one big achievement of the current admin team, and the opam repo workflow outperforms many other package management tools I have used, in my opinion. I don't think changing the CI arrangement can reduce this problem. Or maybe it helps: if more packages stay in unstable without moving into stable, then the overall CI checking can be reduced.
If we can offload some checking to the publishing side, some burdens are transferred from the CI/admin side to the author side.
p.s. In recap, I have been very interested in this problem for a long time. I have some plans to survey (and have surveyed a little, but not sorted it out yet) more package managers, especially for programming languages.
|
@arbipher you say "lockfiles are widely used" -- would you mind elaborating on that? From what I can see, some people use lockfiles. Do you have specific insight into which domain "lockfiles are widely used" in? There are some loud voices on "lockfiles" that prevent any progress on pressing issues since "lockfiles may break". FWIW, if you're keen on reproducibility - lockfiles don't achieve this at all. You'll need to put in some more effort (and include all the metadata in your build information, together with the environment variables and system packages that have been used). If you're fine with "sometimes you may be able to reproduce the same version / behaviour", obviously lockfiles are fine. |
@hannesm Ah, I realize I used a vague word. I mean lockfiles are widely used for projects within other package managers, e.g. npm, RubyGems, cabal, etc. I don't observe that they are widely used in opam. I am not concerned about reproducibility most of the time, except for some manual instructions for building from source (when preparing a research artifact). I do agree with you that lockfiles are far from ensuring reproducibility; they are just a freeze of explicit package dependencies. |
Surely it must be possible to comb through opam-repository.git to get a maximal opam-repository with the latest and greatest metadata so your lock files don't break? Given that this problem is solvable I think it shouldn't block removing packages. Personally I have at least two packages I would like removed. They are let-if and ssh-agent-unix. The former was a fun experiment to show that you can get something similar to the "if let" construct from e.g. Rust, but it is not something I would use or recommend anyone using, and I have not heard of or found any usage of this library. The latter was accidentally published due to dune-release by default publishing all opam packages in a repository. The initially published package was of very poor quality (I didn't intend on publishing it), I don't think anyone uses it and I don't really want to maintain that package (though I am happy to maintain ssh-agent). I am sure there are many other packages that are in a similar poor state. Given the many, many CPU cycles, bandwidth and energy being wasted on this due to the number of users I think it is important to take a decision sooner rather than later. I believe this would make a great impact towards the carbon footprint policy, too. |
Even though this is a partial answer, the removal can already be achieved with PS I have not yet added a comment here since I don't yet have a clear vision on how to make it work properly, but I am following the thread and I would like to reach a good and scalable solution |
That makes me sad. I thought that was something of the past, and would hope that such a bugfix would be ported to the 2.1 series and released as soon as possible. While a |
It is not currently slotted for 2.2.0 (though that could change) but could be slotted for 2.2.1. This is ocaml/opam#5400 and it needs a review from @dra27 for Windows who is not available at the moment |
TL;DR I suggest approaching package management much more like Debian and providing a community repo which is entirely maintained by package authors.
I think there's some tension around opam being held to the expectations of popular language package managers like npm while opam's philosophy is much closer to that of Debian's package manager.
As an npm user I expect old packages to always be available (possibly with loud warnings on install) and I expect to release my own packages with very little effort to myself or npm's maintainers. Further, I don't expect npm to handle my package's CI, and I accept the risk that I might accidentally install a malicious package from npm. All the maintenance burden is placed on package authors, so the human cost of repo maintenance is distributed in a way that scales as the repo grows.
As a Debian user I expect that every package in its repo has been vetted and tested by the Debian repo maintainers (whom I implicitly trust as a Debian user), but I don't expect to be able to install old versions of a package. Since the repo maintainers choose which packages are in the repo, and there are fewer versions of each package in the repo, the human and CO2 cost of maintaining the repo is limited. The Debian repo isn't expected to grow very quickly, as new software that's both useful and mature enough to consider including doesn't appear that frequently; however, as OCaml becomes more popular it's likely we'll see the rate of new packages being released go up.
Opam attempts to have the security benefits of Debian, but it's expensive to scale since random people can suggest adding new packages via a PR, which creates work for the human maintainers, and every new package in the repo adds an ongoing maintenance cost. There's no blessed alternative way to release OCaml packages, and so all the new users who want to make their hobby projects available by releasing them will be inadvertently adding to the repo maintenance burden.
I propose three things:
|
@gridbugs what you propose is similar to what I am saying in #23789 (comment). Any specific difference you want to highlight? This is also compatible with the short-term goal of trimming the central repository. So, I propose we aim for the following final state:
To get there:
WDYT? |
That generally sounds good to me. A small thing but let's not call it
@samoht What do you think about the idea of cutting official releases of the stable repo semi-regularly and reserving breaking changes for those releases? Kinda like how nixos has two big releases a year but also has an unstable branch you can follow if you want to subscribe to the rolling release model. |
I hope you're not suggesting that, if a user publishes a new version of their package one day, they might have to wait months before having it available on opam? If it's only a cadence for removing old releases that are subsumed, I imagine it can be useful, though. |
Only if they want to release their package into the stable repo. I'm proposing to use the unstable/community repo for quick releases and the stable repo for slow, debian-esque releases. |
As someone using lockfiles, I'd say that lockfiles done right are not an aspect that creates additional problems. True, if you just have a list of packages and versions ( The approach in both Dune and opam-monorepo lockfiles doesn't have a problem with packages disappearing from the repo because once locked neither of them needs access to the opam-repository. opam-monorepo doesn't because it assumes all packages can be built with dune; Dune doesn't because it copies the build instructions from the packages at lock-time. Of course, creating a new, updated lockfile for a project that depends on potentially removed packages is still a problem, but it is the same problem as you'd have with |
I disagree with the approach of splitting "opam-repository" into three branches. My intuition is that, similar to Debian, this mainly adds confusion for newcomers and tardiness in getting software out. What I value about the opam-repository is the quickness of updates (bugfixes, new packages), how easy it is to install a package released yesterday, and also the impressively high quality (thanks to both manual checking and CI systems). I don't want to lose any of these properties. From a package author's perspective, I'd be confused whether to pick the "bazaar"/"community" or "stable" branch to submit something. How would I decide? As an opam user, how would I decide which branch to use? As a "zero-configuration CI system", which branch would be used (why stable? why community?)? If we're moving to a repository where it is fine to remove (or archive) packages, it is fine to submit to the "stable" branch, or am I misunderstanding something?
Goal
The goal I have in mind, since the start of this issue, is to reduce the repository size. The impact is manifold:
And all of that while keeping the user experience (and workflows/tooling) for (a) package authors and (b) OCaml users the same (for 99.9% of users). Even though I don't see much value of an "archive" branch, I'm fine to have this around, and instead of "removing packages", keeping them in the archive branch (so people can do bulk builds and fix packages). The workflows could be:
Things needed
This also means that not much needs to be changed:
There's explicitly nothing needed in terms of "adding things to dune-release/opam-publish" or "revising documentation for package authors / opam users". Please let me know what you think, and which goal(s) you have in mind that are not covered above. Thanks for reading. |
I'm not sure what "opam export" is, at least with opam 2.1.5 this is not a valid subcommand. Are these 'plenty of "locked" builds' publicly available? Or are these locked builds locked away from the free and open source community?
Maybe you want to look into "opam switch export --full --freeze" as well - here you get something independent of opam-repository, including all patch files, everything versioned. From my experience, opam-monorepo does some intermediate step where it uses "opam lock" (maybe the "opam-monorepo lock" command?), thus if you consider the "lock" and "pull" step separately (and exchange the output of "lock"), you can get stuck. I've no clue about "dune package management" and their lockfiles. But I trust you to have a good sense of what is needed in there to be independent of opam-repository changes. |
Excuse me, I misremembered the command, I meant
These locked builds are locked away from the FLOSS community because the software in question was a proprietary CRUD backend. However, I don't think that this is an issue, I believe we should also have the OCaml ecosystem work for proprietary software. And selling OCaml to your managers is hard if you need to fix a supposedly locked docker build every 2 weeks because some (well-meant!) change in opam-repository made the docker build fall over.
Ah, yes that's nice indeed! Unfortunately also well hidden. In any case, I don't want to derail this into a discussion into lockfiles, just wanted to point out that for lockfiles and locked builds the way opam-repository is scaled does not make that much of a difference in my opinion. |
@kit-ty-kate, if I remember correctly, a few parties agreed with @yawaramin in that first meeting -- it was mentioned that removing packages from the repository is not the only way to scale our ecosystem -- a lot of package ecosystems live very well with a few orders of magnitude more packages than While I'm not opposed to splitting the repositories into various parts to ease some of the maintenance burden in the short term, I really would also like us to think about what will happen next and what needs to be done to ease the lives of opam-repository maintainers (do we need more maintainers? if yes, what is blocking people from contributing here?) and to reduce the size/cost/power consumption of our CI cluster. Edit: Also, it would be helpful to be super clear and state that "growing the number of available packages in opam" is an essential goal of the opam-repository team -- as this is a proxy metric of a healthy ecosystem. |
Exactly my thought process here. If we are going to chop up the opam-repository into two or more slices, we should recognize that this is a temporary solution and in the long run we need a more practical solution which does not require more chopping up in the future. It seems that the main issue is that as the repository scales up, more data is being moved around and stored, which is making opam clients and other processes more inefficient. I think we need to find a repository format that does not scale essentially linearly with the number of packages and that is efficient to query, download, and install packages even as it continues to scale up. Trees, anyone? |
One thing I've wanted from the very beginning of opam was the ability to refer to multiple repositories from within the metadata. @samoht ran out of time to build it into opam 1.0, but we're only feeling the pain a decade down the road ;-) It would be really nice if we could clone a lightweight "latest version of opam repository" but then bringing in other overlay repositories. Ideally if we break up opam-repository into multiple versions, it should be possible to have a reasonable way to build combined versions without having to remember all the various repository URLs. Perhaps all that is needed is an opam-repo CLI tool that can combine/split up an opam repo into multiple repositories, and rewrite metadata appropriately so that they are either standalone or overlays. Most of the opam CIs have had this ability to distinguish standalone opam repos from overlay ones designed to be merged with an existing repo. @raphael-proust to answer your questions above:
That's exactly the reason why it's easier. The opam-repo is one big datastructure, and when maintainers apply fixes, they are applied to a batch of packages. Therefore the CI should be building the one datastructure that we intend to merge, and not individual changes. From the maintainer perspective, there will just be a "queue of incoming packages" and then we look at breakages and push the right metadata to make that work into the mergeq. Right now it's very laborious to spot some revdeps breakage, then open a separate PR to fix those, then look for more breakage uncovered, repeat a few times, and it takes 5-6 PRs to merge a package, and a huge amount of unnecessary CI work. My goal with the mergeq is to reduce the amount of CI time needed by an order of magnitude, by removing all the duplicated work due to the lack of synchronisation around branches.
This will be the opam-repo maintainers, just as it happens today. Except instead of pushing revdeps failures in separate PRs and branches, you should focus the effort on the merge queue. When it passes, it becomes the main branch, and a new set of PRs can be chucked in the queue. What does "pass" mean? That's why at the very beginning of the doc (https://github.com/avsm/opam-repo-roadmap-thoughts) before anything about merge queues, I put down a set of candidate metrics for how to measure the opam-repo health. I'm looking forward to reading your draft resolution to see how they align up -- I no doubt missed a bunch of ideas that are coming to light from your meetings. (unfortunately, I've got a standing conflict at 2pm UK time with our faculty meetings at the university, so I can never make the community meetings you are organising during the academic term. I may be able to attend a synchronous meeting at some other time) |
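The batching argument behind the merge queue can be sketched as follows. This is only an illustration under simplifying assumptions, not the actual mergeq design: the repository is modelled as a dictionary, each queued PR as a dictionary of changed entries, and `check` stands in for a full CI run. All names here are hypothetical.

```python
from typing import Callable, Dict, List

Repo = Dict[str, str]  # hypothetical: "name.version" -> opam metadata text


def process_merge_queue(
    main: Repo, queue: List[Repo], check: Callable[[Repo], bool]
) -> Repo:
    """Apply all queued changes to a copy of main and test the result once.

    If the combined state passes, it becomes the new main branch;
    otherwise main is returned unchanged.
    """
    candidate = dict(main)      # work on a copy; main itself is untouched
    for change in queue:        # apply queued changes in submission order
        candidate.update(change)
    # One CI run over the combined state instead of one run per PR.
    return candidate if check(candidate) else main
```

In this model a batch of N queued PRs costs a single `check` call rather than N, which is where the hoped-for reduction in CI time would come from. Fixes for reverse-dependency breakage would be pushed into the same queue instead of separate PRs.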
Dear all, thanks for continuing the discussion. I attach here the notes of last meeting. In particular I would like to stress the following point from the conclusions:
Discussing the future of opam-repository
Public meeting on 2024-02-21.
Roll call
Were present during this meeting:
Notes
Conclusion
Short term solution - Draft
Phase 0 (this here draft)
Phase 1 (setup + unavailable/broken packages)
Phase 2 (OCaml < 4.05 -- the least aggressive pruning)
Phase 3 (spring cleaning)
Discussion
|
Reminder for everyone who wants to come to the next meeting to please fill the form as soon as you know when you are available, so that we can plan when the meeting is going to be. |
Based on the above poll, it looks like most of the main interested parties so far are available on:
so we propose to have the next public meeting at that time and date on https://meet.jit.si/opam-repo-meeting |
Meeting notes (2024/03/04)
Present: Kate, Anil, Marcello, reynir, Thomas, rjbou, Shon, Hannes, Shiwei, Ryan
Yawar was marked present but was in fact the victim of a bug, see https://discuss.ocaml.org/t/discussions-on-the-future-of-the-opam-repository/13898/11. If it happens again in the future for anyone else, please ping us on Slack or Discord. We'll also try to keep an eye on the Discuss page for people who have neither.
Agenda
Notes
Clarifications about the draft:
Communications will need to be modified to include some quality metrics (installability, etc.) where they have historically focused on quantity ("we have that many packages and it's growing").
In addition to the repository health (size, scalability, etc.) there are also CI cost concerns in terms of energy use.
Should retired packages have their documentation built for ocaml.org?
Actionable: talk to the CI team responsible for the docs-ci in ocaml.org and see what they say.
If we want to be able to publish users' packages in a faster, more npm-like manner, one of the values for the fields that specify the maintenance intent could be "don't touch my package ever, don't change metadata, nothing"; in this case the package could be moved immediately to the archive as soon as it is broken.
When a package is moved to the archive it would be good to have a comment/commit-message/x-field/… that tells people why the package has been moved to the archive
Conclusion
Draft
Phase 0 (this here draft)
Phase 1 (setup + unavailable/broken packages)
Phase 2 (OCaml < 4.05 -- the least aggressive pruning)
Phase 3 (spring cleaning)
Phase 4 (OCaml < 4.08)
Phase 5 (coasting)
|
The next meeting will be on the same day / time again, the week after the next one. If that doesn't fit with your calendar this time, please don't hesitate to tell us.
|
I haven't been able to attend the meetings due to time zone and life constraints but the plan seems great. Thanks for the hard work. |
The meeting is starting now, is anyone else coming? |
Hello, Thanks for this discussion! Sorry I am late. I don't have much to contribute on the cleanup plan, thank you for the work there! I'd like to comment on the future of opam and its growth, and bring up an idea for us to consider. It's in the realm of creating more flexible workflows for users, in a way that scales up. I'm not sure if it's been mentioned before, but I find it interesting and potentially beneficial for the future growth of opam. What I was thinking of was to create a new repository for opam, which would list custom opam repositories that various users have (examples here, here, etc). This "meta" repository would be a registry of links to other repos, so in essence a map from a name to a git URL. This is an "opam-repos-registry" rather than an "opam-packages-registry". I am imagining that the process of adding a new entry to this repo could be similar to that of the existing main opam repository. Users would add an entry with a link to their repo, reserving a part of a new global namespace composed of the repo name (GitHub username, company name, etc.) and the name of individual packages defined there. The barrier for inclusion could include a linting step for the added configuration files, with some basic verification that the URL given is indeed pointing to a valid opam-repository. We could certainly draw inspiration from the current process in the main opam-repository for deciding what other considerations are important before merging PRs. By design, the opam maintainers would defer to the maintainer of each repo for defining the policy on maintenance, lifetime, and quality of packages listed in their custom repo. This is not a fully fleshed-out proposal; it's an idea that we could implement now. It could provide useful data from the community and potentially be part of a larger, incremental solution. If the repo gains traction, we could consider modifying some tools to gradually expose its data. 
For example, including the packages there into tools like In a world where you'll soon be able to specify package dependencies directly with I'm open to discussing this further if some find this idea appealing. I believe it has several beneficial properties, including the fact that it doesn't require immediate changes to existing tools (opam handles custom opam-repositories beautifully), and it relates to the topics discussed here. And just to be clear, despite the timing of this post, this isn't an April Fools' joke. Unless the joke is on me and this idea has already been discussed or such a meta repo already exists and I am just not aware of it!! 😄 |
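The registry idea above (a map from a repo name to a git URL, guarded by a lint step) could be sketched roughly as follows. This is only an illustration under stated assumptions: the entries, the naming rule, the accepted URL schemes, and the `repo/package` separator are all hypothetical and not part of any existing opam tooling.

```python
import re
from urllib.parse import urlparse

# Hypothetical registry: repo name -> git URL of a custom opam repository.
# Names and URLs are invented for illustration.
REGISTRY = {
    "alice": "https://github.com/alice/opam-repository.git",
    "acme-corp": "https://git.example.com/acme/opam-repo.git",
}

# Assumed naming rule: lowercase letters, digits and dashes.
NAME_RE = re.compile(r"^[a-z0-9][a-z0-9-]*$")


def lint_entry(name, url):
    """Return a list of problems with a registry entry (empty if none found)."""
    problems = []
    if not NAME_RE.match(name):
        problems.append(f"invalid repo name: {name!r}")
    parsed = urlparse(url)
    # Assumed set of acceptable schemes for this sketch.
    if parsed.scheme not in ("https", "git", "ssh"):
        problems.append(f"unsupported URL scheme: {parsed.scheme!r}")
    if not parsed.netloc:
        problems.append("URL has no host")
    return problems


def qualified_name(repo, package):
    """Global name made of the repo name and the package name inside it."""
    return f"{repo}/{package}"
```

This only checks the shape of an entry. Verifying that the URL really points to a valid opam-repository, as the comment suggests, would presumably need a further step that fetches the repository and inspects its layout.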
Dear all, here are the notes from the last meeting. Sorry for the lengthy delay.
Meeting notes (2024/03/18)
Present: Kate, Marcello, rjbou, Ryan
Agenda
Notes
Conclusion
Plan
Phase 0 (this here draft)
Phase 1 (setup + unavailable/broken packages)
Phase 2 (OCaml < 4.05 -- the least aggressive pruning)
Phase 3 (spring cleaning)
Phase 4 (OCaml < 4.08)
Phase 5 (coasting)
|
Dear everyone, thanks for being involved and writing down the very nice plan. I now wonder what is the timeline? And is there anything I can do to move this? |
This is just an update to let you know that things are moving slowly but moving.
Philosophy
The Plan
Phase 0
Phase 1
Ready to implement
Preliminaries for Phase 2 from infra team #CI
Phase 2 (OCaml < 4.08 -- the least aggressive pruning) - aim: 3 months from Phase 1
The above steps will be repeated each time we revisit this phase with a more recent version of the compiler as the bound.
Phase 3 (spring cleaning) - aim: 3-6 months from phase 2
To repeat each time
Tooling (basically to do once)
Phase 4 (coasting)
|
Hello, all 👋 We have drafted a policy that seeks to explicate and formalize the recurring practices and stable criteria described in the plan. You can read the draft here: https://github.com/ocaml/opam-repository/wiki/Package-Archiving:-Policy Review and critique by any interested parties would be appreciated. Comments may be left on that document via HackMD, or made on this issue. Please let me know if you have trouble accessing the document! 🦥 |
Thanks for writing this up. The only thing I stumbled upon is the mention of "OCaml platform" out of the blue. Just to be clear, from my perspective this is all about the opam-repository and scaling issues, and I would be glad if we kept the topic & perspective clear. |
Thanks for the review. I removed that phrase. |
I've moved the policy and plan to |
Thanks a lot!! |
I've observed over the years that there's the sentiment that "no package shall be removed from opam-repository" (I still don't quite understand the reasoning behind it -- maybe it is lock files that fail to register the git commit of the opam-repository? Maybe it is for some platforms that require the opam-repository for removing opam packages?). So, I'd like to raise the question of why this sentiment exists. Answers (or pointers to arguments) are highly welcome.
Why am I asking this? Well, several years back Louis worked on "removing packages that aren't ever being installed", with the reasoning that if foo in version 6.0.0 and in version 6.0.1 (with the very same dependencies) are both available, 6.0.1 is always chosen (which makes sense, since it may very well be a bugfix release).
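The reasoning can be illustrated with a small sketch that flags releases shadowed by a newer version of the same package with identical dependencies. It is not Louis's actual implementation: the data shape, the `superseded` helper, and the simplified dotted-integer version ordering are assumptions made for this example (opam's real version ordering is more involved), and the sketch ignores reverse dependencies that pin an exact version.

```python
from collections import defaultdict


def version_key(version):
    # Simplifying assumption: versions are dot-separated integers.
    return tuple(int(part) for part in version.split("."))


def superseded(releases):
    """Find releases shadowed by a newer one with the same dependencies.

    `releases` is an iterable of (name, version, frozenset_of_dependencies).
    Returns a sorted list of (name, version) pairs.
    """
    groups = defaultdict(list)
    for name, version, deps in releases:
        # Group versions sharing both the package name and the dependency set.
        groups[(name, deps)].append(version)

    shadowed = []
    for (name, _deps), versions in groups.items():
        newest = max(versions, key=version_key)
        shadowed.extend((name, v) for v in versions if v != newest)
    return sorted(shadowed)


releases = [
    ("foo", "5.0.0", frozenset({"ocaml>=4.02"})),
    ("foo", "6.0.0", frozenset({"ocaml>=4.08"})),
    ("foo", "6.0.1", frozenset({"ocaml>=4.08"})),
]
candidates = superseded(releases)
```

Here only foo 6.0.0 ends up in `candidates`: 6.0.1 has the same dependencies and is newer, while 5.0.0 has a different dependency set and is kept.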
Now, I also observe that old package releases only rarely, really rarely, get a minor version bump -- i.e. the "long term support" of opam packages does not really exist (happy if you prove me wrong): bug fixes are incorporated together with new features and API changes, and so new (major) versions are released.
Taking a step back, it is pretty clear that collecting more and more packages will lead to a larger amount of work for the solver (which needs all the packages to be parsed, and then has to find a solution). This means that the solver needs to improve speed-wise roughly every year. This is rather challenging, and in the end leads to opam not being very usable on smaller computers (Raspberry Pi, or even tinier computers...).
Also, with carbon footprint in mind, I fail to see why opam-repository may not delete packages. In the end, it is a git repository -- if you wish to install a compiler in an arcane version, it is fine to roll back your opam-repository git commit to an arcane commit that fits nicely. The amount of work for the opam-repo CI could also be reduced by removing old opam packages (since a revdep run would be much smaller).
Please bear with me if you've already answered this, and have written up the design rationale (and maybe the answer to how opam-repository will scale). Comments and feedback are welcome. Once I understand the reasoning, it'll be much easier for me to figure out how to move forward. Thanks a lot. //cc @AltGr @avsm @kit-ty-kate @mseri @samoht