Skip to content
arienmalec edited this page Nov 24, 2022 · 69 revisions

Porting status

The current state of play:

  1. mathport more or less works.
    • Lots of warts, but it gives an output that is much easier to clean up than to translate from scratch.
    • mathport is getting minor fixes and improvements from Mario and Gabriel, but likely there's no full-time attention on it in the near future
  2. mathlib4 tactics are still pretty primitive
    • There is a lot of work-in-progress and resources allocated to porting more tactics to mathlib4.
    • Many advanced parts of the library are not portable with the current state of tactics.
    • However a lot of basic material is portable now, with only minor changes
    • Even that material is going to regularly discover issues with the existing ports of tactics, or issues in lean4 (but this is highly desirable).
  3. The theory development in mathlib4 is at a cusp, transitioning from "a mix of directly ported stuff and adhoc by-hand porting or experimentation", to "most files in it are good approximations of a direct port from mathlib3"
  4. Porting theory development is helpful for tactic porting (judicious use of sorry gets you a long way, but there's no particular need for theory development to lag behind).
  5. There is (hopefully!!!) lots of latent enthusiasm for putting in the work to port the theory development in mathlib3.
  6. But it is not immediately obvious what to do.

Porting procedure

Note there is a video showing the whole process on a very simple file.

  1. Identify a file X.lean in mathlib4 which
    • either doesn't exist, or exists in a partially-ported or adhoc state
    • but for which all imports are "approximately directly ported" You can run scripts/port_status.py in the main mathlib directory (on a recent commit) to identify these files.
  2. Open the mathlib3port repository and copy down the mathlib3 SHA on which your current mathlib3port is based; this is found in the README file. You will need this for the PR to mathlib4.
  3. If you would like to "claim" that you are working on the file, you can visit the port status page and change No to No: WIP by [your name] [mathlib3 SHA you just found]
  4. Open that file X.lean in the mathlib3port repository.
  5. Copy and paste it into a new branch in the mathlib4 repository, and commit it as-is. This allows reviewers to see the diff between mathport and the manually ported version, making it easy to check aligns and similar statements.
  6. Perform a bunch of mechanical updates on the file (i.e. things that in principle mathport could do, but doesn't yet), see below.
  7. Try to get everything compiling again. It's usually useful to have the original mathlib file open in a separate window, so you can typecheck, discover where instances are coming from, etc. Potential obstacles include
    • Discovering that an import Y.lean which you thought was sufficiently well ported already in fact isn't. (Either patch it up, or go back to step 1 with X = Y.)
    • Finding that a tactic T used in a proof is still missing. (Either, hesitantly, replace it with simpler tactics, or decide to pause on this file, perhaps noting somewhere that X.lean is now waiting on T.)
    • Encountering a surprising change in behavior in either a ported tactic or Lean core. (Ask on Zulip, create an issue, hesitantly work around or preferably note the X.lean is waiting on that issue.)
    • Some small changes are just going to have to happen. Leave TODO or FIXME liberally, and note any name changes you make using #align so mathport can follow your changes in its next run.
    • Realizing that it is not a trivial port, e.g. because it depends fundamentally on the significant changes to coercions or typeclass resolution in Lean4. (e.g. data.nat.cast.basic and data.int.cast.basic are on the nearby horizon in this category). Time to assemble the relevant experts.
  8. At this point you've either noted the file as blocked on other changes, or successfully ported the entire file! In this case, add it to the Mathlib.lean file, that is supposed to import all files of Mathlib.
  9. If it's a successful port, make the PR to mathlib4 and be sure to include the mathlib3 SHA you found in Step 2 to the PR description.
  10. Once it is accepted
  • Go to the port status page and change No to Yes mathlib4#XXX [mathlib3 SHA], to indicate that has been ported and specify the mathlib3 SHA on which it was based. This will update the results that scripts/port_status.py produces.
  • Wait for CI to update port comments in mathlib
  • Once this mathlib PR is merged, the port_status will flag your file as changed after the SHA you wrote in the port status (because of your PR!). You need to update the SHA to this newer mathlib3 version, but please be sure no other modification have been done to the file.

Implicit in this proposed scenario there is no "flag day" for all of mathlib: we just work our way up from the bottom.

In the optimistic scenario, the problems in step 5 become less and less frequent, and mathport gets some tweaks that reduce the effort required in step 4, and it becomes possible to do multiple files, or whole directories or areas in one go!

For any file that isn't already a "complete port" but has been manually partly ported, we take "starting over" as the default route.

This might involve:

  • move the existing X.lean to X2.lean
  • take X.lean from mathlib3port, and get it working as X.lean
  • import X.lean in X2.lean
  • delete everything in X2.lean that by now has already been defined in X.lean
  • find homes for or discard anything that remains (judgement required here!)
  • get everything downstream working again

You can also port from Lean 3 core, as well as from mathlib3. If you find something is missing that was in Lean 3 core, please put it in the Mathlib/Init/ directory. There is a repository lean3port which contains the output of mathport running on Lean 3 core, analogous to mathlib3port.

Some common fixes

  • change import Mathbin to import Mathlib
  • Add explicit imports to resolve missing dependencies where transitive imports have changed (often by identifying what the mathlib equivalent resolves to & what the mathlib4 equivalent is)
  • If a file imports another file from the tactic or meta folder: look at the status of which tactics that still need to be ported in this list. Tactics, attributes and commands that are not on this list should be working in Lean 4. If the tactics mentioned in the imports are already ported, you can import the corresponding mathlib4 files (the file name might have changed a bit). If not, you can replace the 'No' on https://github.com/leanprover-community/mathlib/wiki/mathlib4-port-status by 'Waiting on tactic X' or something.
  • remove in names
  • Missing attributes:
    • For refl:
      import Mathlib.Tactic.Relation.Rfl
      
      @[refl]
      theorem foo : 1 = 1 := rfl
      
    • trans: for now use import Mathlib.Mathport.Attributes.
    • If you run into other attributes, hopefully you will find them noted as needing porting in Mathlib/Mathport/Syntax.lean (which you can freely import if needed, I guess), or they should already be implemented somewhere, and searching for register.*attr should help you locate it.
  • See also the Lean 4 documentation about differences with Lean 3.
  • When declaring a structure, if a field needs to be put on a new line, then each fields need to be on a new line, or at least the first after the with keyword must be on a new line.
  • The command include and omit are gone in Lean4, you can safely delete the corresponding lines. Lean should be able to do the right thing by default.

Linting

Run lints by adding #lint at the bottom of the file.

If the file you are importing don't give you access to the #lint command then you need to add

import Std.Tactic.Lint.Frontend
import Std.Tactic.Lint.Misc

Note that we now expect docstrings for each field of each inductive type. For instance

/-- The natural numbers. -/
inductive Nat where
  /-- `Nat.zero`, normally written `0 : Nat`, is the smallest natural number.
  This is one of the two constructors of `Nat`. -/
  | zero : Nat
  /-- The successor function on natural numbers, `succ n = n + 1`.
  This is one of the two constructors of `Nat`. -/
  | succ (n : Nat) : Nat

We also expect documentation of notations. You should usually use the @[inherit_doc] attribute, which will copy the documentation from another declaration. This attribute takes a declaration name as an optional parameter in case Lean doesn't figure out which declaration should give its documentation to the notation.

Aligning names

In Mathlib.Mathport.Rename there are a bunch of commands of the form #align lean3_name lean4Name, but you can place them anywhere. These instruct mathport to adjust its naming on a case-by-case basis, and we need to be using it a lot more than we are. Note that mathport#187 removes the \_x's in exchange for now quite often running into severe type errors and having to stub stuff out. Basically, it will now assume that any name clashes are intentional and will use the clashed name even if it's not type correct (instead of making up \_x names), but that means that some definitions that actually do need to be different, like CoeT or Stream, need to be manually marked for realignment.

But even beyond this, whenever mathport gets a name wrong you want to use an #align directive to change the name to the desired thing, even if you have already fixed the definition, because that influences all future references of that definition.

If Lean complains it does not know about the #align command then you need to import Mathlib.Mathport.Rename.

Note that #align does not automatically include namespaces.

Naming convention

(note: These are work in progress)

We use the following conventions

  1. Types (including structures, inductive types, and type classes) are named with UpperCamelCase
  2. Most terms (e.g. inductive constructors, structure fields, type class members) are lowerCamelCase
  3. Prop-valued terms are snake_case
  4. Other names, including those of theorems are snake_case
  5. In a snake_case name, references to types and terms are lowerCamelCase
  6. Acronyms like LE are written upper-/lowercase as a group, depending on what the first character would be.

There are some rare counter-exceptions to preserve local naming symmetry: e.g., we use Ne rather than NE to follow the example of Eq; any such counter-exceptions should be discussed on the Zulip.

Note that if a theorem contains such a lowerCamelCase component corresponding to a definition with multiple words, mathport will usually get the name wrong and it has to be aligned by hand.

Examples

def Set.univ {α : Type _} : Set α := sorry
-- `Set` is UpperCamelCase as a type (rule 1), `univ` follows rule 4

class NeZero : Prop := sorry
#align ne_zero NeZero
-- rule 1 with rare counter-exception for `Ne`

theorem : NeZero.ne : sorry := sorry

theorem : neZero_iff : sorry := sorry
#align ne_zero_iff neZero_iff
-- reference to `NeZero` follows rule 5

class OrElse (α : Type u) where
  orElse  : α → (Unit → α) → α 
-- `OrElse` UpperCamelCase as a type, `orElse` lowerCamelCase following rule 2

class LT (α : Type u) where
  lt : α → α → Prop
-- names follow rule 6

class Infinite (α : Sort _) : Prop where
  not_finite : ¬Finite α
-- would be `notFinite` by rule 2, but rule 3 overrides.

theorem le_iff_lt_or_eq {a b : α} : a ≤ b ↔ a < b ∨ a = b := sorry
-- lower_snake case following rule 4, references to acronyms use rule 6

Further conventions

Type Classes

Where Lean 3 often named type classes by the predicate implementing them implied, Lean 4 prefers to name the noun of the attribute implied by implementing them.

Examples

  • Lean 3: has_mul, Lean 4: Mul
  • Lean 3: is_lawful_monad, Lean 4: LawfulMonad

Categories

  • Categories are suffixed with Cat, e.g. LatticeCat.

Coercions

In Lean 4 coercions are elaborated directly, so given

variable (f : α ≃ β) (a : α)

the application f a is syntactically equal to f.toFun a.

Similarly

variable (g : α → Prop) (x : Subtype g)

the coercion (x : α) is elaborated as x.val.

There are several different typeclasses for coercions, the most common ones being Coe, CoeFun and CoeTail. Usually, mathlib3's coe corresponds to Coe, coe_fn to CoeFun and coeTC to CoeTail.

Consequences for porting

This means that we can get rid of lemmas that used to do the simplification f.to_fun x to ⇑f x as these lemmas are now syntactic tautologies. This lemma for Subtype is not needed anymore in mathlib4:

theorem val_eq_coe {x : Subtype p} : x.1 = ↑x :=
  rfl

Another consequence is that certain instances become unnecessary:

instance coe_trans [Zero M] [Coe R S] [CoeTail S M] [h : NeZero (r : M)] : NeZero ((r : S) : M) :=
  ⟨h.out⟩

If the coercion function appears explicitly in a theorem, then it needs to be replaced by the function that is used in the coercion. So for example

theorem cast_injective : injective (coe : ℕ → R) := sorry

becomes

theorem cast_injective : injective (Nat.cast : ℕ → R) := sorry

Coercion tag

To define coercions, there exists now the coe attribute to automatically create the delaborator and the norm_cast attribute. A complete definition of a coercion might look like this:

@[coe] def cast [AddGroupWithOne R] : ℤ → R := AddGroupWithOne.intCast

instance [AddGroupWithOne R] : CoeTail ℤ R where coe := cast

Known tactic issues

  • change e defined in Mathlib.Tactic.Basic as an alias for show e, but the at form and change .. with .. are both defined without implementation in lean 4 core. These are just missing an implementation (and they should either be PR'd to lean 4 or moved to mathlib4).
  • to_additive attribute doesn't correctly translate some names, e.g. MonoidHomClass.toOneHomClass -> AddMonoidHomClass.toZeroHomClass. It has issues with flipping variable order in pow -> nsmul and zpow -> zsmul. In such cases, the equivalent additive theorems and definitions need to be manually stated. One may also need to append attribute [to_additive my_additive_theorem] my_multiplicative_theorem to help the tactic with name translations.

Tips and tricks

Auto-implicits

Lean 4 has automatic generation for implicit types, so that the following code is valid without explicitly mentioning α or β:

variable (f : α ≃ β)

When porting this may cause errors due to mathlib3port not aligning names correctly, so it can help to turn auto-implicits off with

set_option autoImplicit false

Finding instances

#synth is useful for discovering if and where instances are available (e.g. as #synth Semiring Nat)