Some Cardinal Arithmetic

Some time ago, Asaf Karagila wrote wonderful post wherein he shows that, even without assuming the axiom of choice one can always find four cardinals \(\mathfrak{p} \lt \mathfrak{q}\) and \(\mathfrak{r} \lt \mathfrak{s}\) such that \(\mathfrak{p}^{\mathfrak{r}} = \mathfrak{q}^{\mathfrak{s}}.\) In the comments, Harvey Friedman asks:

Your theorem is an example of an existential sentence about cardinals in the language with only \(\lt \) and exponentiation. Can you determine which sentences in that language are provable in ZF? More generally, expand the set of sentences about cardinals considered, obviously to include addition and multiplication, and perhaps alternating quantifiers.

After Peter Krautzberger’s tiny blogging challenge, I decided to spend 30 minutes thinking and writing about this…

… 

 

Super HoTT

This is a follow-up to my recent post on the structure of universes in HoTT. In this post I will outline one of the possible alternative ways of handling universes in HoTT, which I will colorfully call Super HoTT for the time being. This is not an original idea; something like that (not in the context of HoTT) can be found in Erik Palmgren’s paper On Universes in Type Theory.1 The idea comes from a comment by Andrej Bauer where he described a universe as "a structure \((\mathcal{U},\tau,\pi,\sigma,\ldots)\) where \(\mathcal{U}\) is a type, \(\tau\) is a type family indexed by \(\mathcal{U},\) \(\pi\) and \(\sigma\) compute codes of dependent products and dependent sums, etc." This is a perfectly reasonable way of describing universes. By the Foundational Thesis of HoTT, this approach should be formalizable in some flavor of HoTT. This is indeed possible and the flavor (or rather flavor family) Super HoTT is one way to do that.

The issue with Bauer’s description is the type family \(\tau.\) Formally, a type family indexed by a type \(\mathcal{U}\) is an element \(\tau:\mathcal{U}\to\mathcal{V},\) where \(\mathcal{V}\) is some larger universe. This type \(\tau\) becomes a type family only after applying Tarski-style coercion associated to \(\mathcal{V}\) that promotes elements of \(\mathcal{V}\) to actual types. On the face of it, this is circular: \(\tau\) is the Tarski-style coercion associated to the universe \(\mathcal{U},\) which is defined in terms of that of \(\mathcal{V},\) which…

One way to break this circularity is to have a designated super universe \(\mathcal{U}^\ast\) — hence the name Super HoTT — whose associated Tarski-style coercion and associated machinery are implemented in the formal system itself. In particular, \(\mathcal{U}^\ast\) is associated to a primitive decoding function \(T\) such that \(T(A)\ \mathsf{type}\) for any element \(A:\mathcal{U}^\ast.\) In Super HoTT, this is the only such decoding function. Other universes arise à la Bauer, i.e., as structures \((\mathcal{U},\tau,\pi,\sigma,\ldots)\) where now \(\tau:\mathcal{U}\to\mathcal{U}^\ast\) is a type family that realizes this universe’s Tarski-style coercion via composition with the super universe’s decoding function \(T.\) In fact, since there is only one super universe, the Russell-style approach doesn’t lead to many problems. In other words, there is little harm in suppressing the decoding function \(T\) in favor of a simple Russell-style rule \[\frac{A:\mathcal{U}^\ast}{A\ \mathsf{type}}.\] In fact, the \(\mathsf{type}\) judgment is often informally confounded with an actual universal type \(\mathsf{Type}.\) This type of identification can lead to serious issues such as Girard’s Paradox; the super universe approach is one of the many safe ways around these problems.

So far, this is pure type theory, we haven’t said anything HoTT specific. To make this a flavor of HoTT, we postulate that there are enough univalent universes or even lots of them. Universes à la Bauer are mere structures, they can be univalent or not, they can have extra structure or lack some structure. Univalent universes can be identified among these structures so we can formalize a rule saying that any two \(A,B:\mathcal{U}^\ast\) are contained in a univalent universe. One can also formalize rules saying that one can have cumulative univalent universes, cumulative univalent universes with propositional resizing, cumulative classical univalent universes, etc. The possibilities for specialized flavors of Super HoTT are vast and varied!

Note that the super universe doesn’t have to be univalent. It doesn’t need to have much fancy structure at all. It’s not even necessary for \(\mathcal{U}^\ast\) to have a natural number type. All of that can happen inside the univalent universes à la Bauer. In fact, a bare-bones super universe is probably more flexible framework than a rich one. Different choices of structure for the super universe and how it interacts with the universes inside it will lead to different flavors of Super HoTT with variable strength. Richer super universe structure allows one to express stronger and stronger properties for the univalent universes it contains.

One advantage of this approach is that universes can arise "on their own" in the sense that any suitable structure \((\mathcal{U},\tau,\pi,\sigma,\ldots)\) is a universe, no matter how it comes about. Though this can be a mixed blessing if one wants more hands-on control over which types are universes. One potential drawback of Super HoTT is that not all types can belong to univalent universes. In particular, the super universe \(\mathcal{U}^\ast,\) cannot belong to any univalent universe inside \(\mathcal{U}^\ast.\) In this way, Super HoTT is more like \(\mathsf{NBG}\) or \(\mathsf{MK}\) than \(\mathsf{ZF}.\) For ontological reasons, it might be preferable to have a more self-contained system analogously to \(\mathsf{ZF}.\) In any case, Super HoTT is one of the more flexible formal ways of handling universes in HoTT and it is a perfectly viable alternative to the formal system presented in the appendix of the HoTT book.2

Notes

  1. E. Palmgren, On universes in type theory. In Twenty-five years of constructive type theory (Venice, 1995), Oxford Logic Guides 36, Oxford University Press, New York, 1998, pages 191–204.

  2. Univalent Foundations Program, Homotopy Type Theory: Univalent Foundations of Mathematics, Institute for Advanced Study, Princeton, 2013.

 

On the structure of universes in HoTT

In this post, I want to outline some subtle and interesting aspects and issues with respect to universes in HoTT. Some of these issues were brought up in a comment discussion with Mike Shulman some time ago and they came up again more recently in a comment discussion with Andrej Bauer and Alexander Kuklev. I will use the HoTT book as a reference though this post is not intended as a criticism of the handling of universes in the book. Indeed, to serve its purpose, the book is deliberately vague about the exact handling of universes in HoTT. In fact, this post is a good illustration why this is necessary!

Also note that the issues I bring up are mostly irrelevant to conducting everyday mathematics in HoTT. While this post is largely for people interested in technical aspects of HoTT, I will nevertheless try to keep the the post at a medium level of technicality since these aspects of HoTT do come occasionally up in everyday mathematics.

… 

 

HoTT Math 4: Local rings and fields

It’s been a while since the last edition of HoTT Math. Fall is always very busy for me and I’ve been composing this installment one \(\varepsilon\) at a time… We are finally arriving at our destination: fields!

The main issue with fields is to correctly handle the implicit negation in the term nonzero. Since the natural logic of HoTT is intuitionistic rather than classical, this is much more subtle than one would expect.


One issue with the basic commutative ring theory we developed last time is that there is a one-element ring. This is a feature of every equational theory since all identities are satisfied in the trivial algebra with one element. The problem is that equational logic doesn’t have negation so there is no way to state a nontriviality axiom like \(\mathsf{O}\not\approx\mathsf{I}\) in this context. These issues are not new and not special to HoTT but the problem is amplified since logic in HoTT is may have more truth values than just true and false. Consequently, negation in HoTT always needs a little more care.

The usual way to say \(\mathsf{O}\not\approx\mathsf{I}\) is the strongest one, namely that \(\mathsf{O}=_R \mathsf{I}\) is the empty type \(\mathbf{0},\) or: \[{\small\fbox{$\phantom{b}$nontrivial$\phantom{q}$}} :\equiv(\mathsf{O}=_R \mathsf{I}) \to \mathbf{0}.\] This is stronger than just saying that \(\mathsf{O}=_R \mathsf{I}\) isn’t inhabited and it may exclude more than just the one element ring in the absence of the law of excluded middle [§3.4].

In general, we define inequality by \[(x \neq_R y) :\equiv(x =_R y) \to \mathbf{0}.\] So we can simply write \(\mathsf{O}\neq_R \mathsf{I}\) instead of \({\small\fbox{$\phantom{b}$nontrivial$\phantom{q}$}}.\) To see this in action, let’s look at nonunits. The usual definition gives \[\operatorname{\mathsf{nonunit}}(x) :\equiv\prod_{y:R} (x \cdot y \neq_R \mathsf{I}).\] The recursion and induction rules for \(\Sigma\)-types [§1.6] show that \(\operatorname{\mathsf{nonunit}}(x)\) is equivalent to the negation \(\lnot\operatorname{\mathsf{unit}}(x) :\equiv\operatorname{\mathsf{unit}}(x) \to \mathbf{0}.\)1 In classical mathematics, a local ring is one where the nonunits form an ideal, which is then necessarily the unique maximal ideal of the ring. Assuming \({\small\fbox{$\phantom{b}$nontrivial$\phantom{q}$}},\) we see that \(\operatorname{\mathsf{nonunit}}(\mathsf{O})\) is inhabited. It is also easy to see that \[\prod_{x,y:R} (\operatorname{\mathsf{nonunit}}(x)+\operatorname{\mathsf{nonunit}}(y) \to \operatorname{\mathsf{nonunit}}(x \cdot y))\] is always inhabited. Thus, all that is missing is the axiom: \[{\small\fbox{$\phantom{b}$local${}^-\!\!\phantom{q}$}} :\equiv\prod_{x,y:R} (\operatorname{\mathsf{nonunit}}(x)\times\operatorname{\mathsf{nonunit}}(y) \to \operatorname{\mathsf{nonunit}}(x+y)).\] Classically, fields are local rings where the maximal ideal has only one element \(\mathsf{O}.\) So a reasonable axiom for fields is \[{\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}} :\equiv\prod_{x:R} (\operatorname{\mathsf{nonunit}}(x) \to x =_R \mathsf{O}).\] (Note that \({\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}}\) implies \({\small\fbox{$\phantom{b}$local${}^-\!\!\phantom{q}$}}.\)) Classically, these are indeed the fields we know and love but things break down in an intuitionistic setting. If these were fields, then we should be able to conclude that nonzero elements are units. However, from \({\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}}\) we can only conclude that \(x \neq_R \mathsf{O}\to \lnot\operatorname{\mathsf{nonunit}}(x)\) and the conclusion there is the double negative \(\lnot\lnot\operatorname{\mathsf{unit}}(x)\) rather than the positive statement \(\operatorname{\mathsf{unit}}(x).\) The fact is that we cannot conclude that every nonzero element is a unit from this axiom without using the law of excluded middle! A similar issue arises with local rings as defined above: the axiom \({\small\fbox{$\phantom{b}$local${}^-\!\!\phantom{q}$}}\) does not imply the property that if \(x + y\) is a unit then at least one of \(x\) and \(y\) is a unit.

The solution to this isssue is to reverse the relation between equality and inequality. What if instead of inequality being the negation of equality we had that equality was the negation of inequality? This may sound counterintuitive at first but this is often how things work. Consider the case of two computable real numbers \(a\) and \(b,\) that is real numbers for which we have an algorithm which on input \(k\) gives us a rational approximation within \(2^{-k}\) of the real number in question. How do I determine whether \(a\) and \(b\) are equal?

  • To show that they are different, I can use the two algorithms compute sufficiently good rational approximations to \(a\) and \(b\) to tell them apart.
  • To show that they are equal, I need to argue that they are not different: given any input \(k\) the two algorithms give rational approximations that are within \(2^{1-k}\) of each other.

So, for computable real numbers, inequality is more primitive than equality!

This leads us to apartness relations: a binary relation \(\mathrel{\rlap{\neq\,}{\,\neq}_{}}\) that satisfies the following two axioms \[\begin{gathered}
\lnot(x \mathrel{\rlap{\neq\,}{\,\neq}_{}}x) \\
x \mathrel{\rlap{\neq\,}{\,\neq}_{}}y \to x \mathrel{\rlap{\neq\,}{\,\neq}_{}}z \lor y \mathrel{\rlap{\neq\,}{\,\neq}_{}}z
\end{gathered}\]
These axioms ensure that the negation of \(x \mathrel{\rlap{\neq\,}{\,\neq}_{}}y\) is an equivalence relation; a tight apartness relation is one where \[\lnot(x \mathrel{\rlap{\neq\,}{\,\neq}_{}}y) \to x = y\] and thus the negation of \(x \mathrel{\rlap{\neq\,}{\,\neq}_{}}y\) is equality. In classical logic, every apartness relation is the negation of an equivalence relation but this is not true intuitionistically. In fact, the negation of equality is not necessarily an apartness relation!

Without further ado, let us state exactly how we can define local rings using apartness. For example, a more positive way to define local rings is that \[x \mathrel{\rlap{\neq\,}{\,\neq}_{R}} y :\equiv\operatorname{\mathsf{unit}}(x – y)\] is an apartness relation. The first apartness axiom \(\lnot(x \mathrel{\rlap{\neq\,}{\,\neq}_{R}} x)\) actually follows from \({\small\fbox{$\phantom{b}$nontrivial$\phantom{q}$}}\) since \(\operatorname{\mathsf{unit}}(\mathsf{O}) \to (\mathsf{O}=_R \mathsf{I}).\) The second apartness axiom boils down to to the fact discussed above that if \(x+y\) is a unit then at least one of \(x\) and \(y\) is a unit. It is tempting to use the type \[\prod_{x:R} (\operatorname{\mathsf{unit}}(x+y) \to \operatorname{\mathsf{unit}}(x) + \operatorname{\mathsf{unit}}(y))\] to describe this but this doesn’t work because of the exclusive nature of sum types. Instead, we must use the propositional truncation \(\Vert\operatorname{\mathsf{unit}}(x) + \operatorname{\mathsf{unit}}(y)\Vert,\) which is how the logical disjunction \(\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)\) is defined [§3.7]. Thus a strongly local ring is a a nontrivial ring that satisfies \[{\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}} :\equiv\prod_{x,y:R} (\operatorname{\mathsf{unit}}(x+y) \to \operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)).\]

Before going further with these ideas, let’s practice some our intuitionistic reasoning by verifying that \[{\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}} \to {\small\fbox{$\phantom{b}$local${}^-\!\!\phantom{q}$}}.\] Actually, we won’t be practicing our intuitionistic reasoning, we’ll just verify our logic by thinking in terms of functions between types rather than logical implication between propositions. Although an implication is not always equivalent to its contrapositive in intuitionistic logic, we do always have that \[(P \to Q) \to (\lnot Q \to \lnot P).\] In terms of types and functions, this the special case of \[(A \to B) \to ((B \to C) \to (A \to C)),\] where \(A :\equiv{P},\) \(B :\equiv{Q},\) \(C :\equiv\mathbf{0}.\) This looks even more familiar when uncurried: \[(A \to B) \times (B \to C) \to (A \to C).\] Thus, we see that \[{\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}} \to (\lnot(\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)) \to \lnot\operatorname{\mathsf{unit}}(x+y)).\] Now recall that the disjunction \(\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)\) is defined as the propositional truncation \(\Vert\operatorname{\mathsf{unit}}(x) + \operatorname{\mathsf{unit}}(y)\Vert\) [§3.7]. This means that \(\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)\) is a proposition such that \[\operatorname{\mathsf{unit}}(x) + \operatorname{\mathsf{unit}}(y)) \to P \quad\simeq\quad (\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)) \to P\] for any proposition \(P.\)2 Since \(\mathbf{0}\) is a proposition, we see that \[\lnot\operatorname{\mathsf{unit}}(x) \times\lnot\operatorname{\mathsf{unit}}(y) \ \simeq\ (\operatorname{\mathsf{unit}}(x) + \operatorname{\mathsf{unit}}(y)) \to \mathbf{0}\ \simeq\ \lnot(\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y)),\] where the left equivalence is the special case \(A :\equiv{P},\) \(B :\equiv{Q},\) \(C :\equiv\mathbf{0}\) of \[(A + B) \to C \quad\simeq\quad (A \to C) \times (B \to C).\] Finally, recombining the above, we obtain \[{\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}} \to (\lnot\operatorname{\mathsf{unit}}(x)\times\lnot\operatorname{\mathsf{unit}}(y) \to \lnot\operatorname{\mathsf{unit}}(x+y)),\] which is equivalent to \({\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}} \to {\small\fbox{$\phantom{b}$local${}^-\!\!\phantom{q}$}}\) because \(\lnot\operatorname{\mathsf{unit}}(x) \leftrightarrow\operatorname{\mathsf{nonunit}}(x).\)

Analogously to positive axiom for local rings, the positive axiom for fields is \[{\small\fbox{$\phantom{b}$field${}^+\!\!\phantom{q}$}} :\equiv\prod_{x:R} (\operatorname{\mathsf{unit}}(x) \lor x =_R \mathsf{O}),\] which implies both \({\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}}\) and \({\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}}.\) To see that \({\small\fbox{$\phantom{b}$field${}^+\!\!\phantom{q}$}} \to {\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}},\) we will make use of the distributive law \[(P_1 \lor P_2) \land Q \quad\leftrightarrow\quad (P_1 \land Q) \lor (P_2 \land Q),\] which follows from the general equivalence \[(A_1 + A_2) \times B \quad\simeq\quad A_1 \times B + A_2 \times B\] by taking propositional truncations. In particular, \[(\operatorname{\mathsf{unit}}(x) \lor x =_R \mathsf{O}) \times \operatorname{\mathsf{unit}}(x+y) \leftrightarrow\operatorname{\mathsf{unit}}(x)\times\operatorname{\mathsf{unit}}(x+y) \lor (x =_R \mathsf{O}) \times \operatorname{\mathsf{unit}}(x+y).\] Since \(\operatorname{\mathsf{unit}}(x) \times \operatorname{\mathsf{unit}}(x+y) \to \operatorname{\mathsf{unit}}(x)\) and \((x =_R \mathsf{O}) \times \operatorname{\mathsf{unit}}(x+y) \to \operatorname{\mathsf{unit}}(y),\) we see that \[(\operatorname{\mathsf{unit}}(x) \lor x =_R \mathsf{O}) \times \operatorname{\mathsf{unit}}(x+y) \to (\operatorname{\mathsf{unit}}(x) \lor \operatorname{\mathsf{unit}}(y))\] and it follows that \({\small\fbox{$\phantom{b}$field${}^+\!\!\phantom{q}$}} \to {\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}}.\) To see that \({\small\fbox{$\phantom{b}$field${}^+\!\!\phantom{q}$}} \to {\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}},\) we will make use of the fact that \[(P \lor Q) \land \lnot Q \to P.\] This also follows from the distributive law above. Indeed, \[(P \lor Q) \land \lnot Q \ \leftrightarrow\ (P \land \lnot Q) \lor (Q \land \lnot Q) \ \leftrightarrow\ P \land \lnot Q\] because \(Q \land \lnot Q\) is \(\mathbf{0}.\) Note that \({\small\fbox{$\phantom{b}$field${}^+\!\!\phantom{q}$}}\) is stronger than \({\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}}\times{\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}}\) since it implies that \[\prod_{x,y:R} (x \mathrel{\rlap{\neq\,}{\,\neq}_{R}} y \lor x =_R y)\] and thus that \(R\) has decidable equality [§3.4]. In other words, \({\small\fbox{$\phantom{b}$field${}^+\!\!\phantom{q}$}}\) gives the equivalence \[x \neq_R y \leftrightarrow x \mathrel{\rlap{\neq\,}{\,\neq}_{R}} y,\] which means that \(\neq_R\) is an apartness relation, whereas \({\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}}\times{\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}}\) only gives that \(\mathrel{\rlap{\neq\,}{\,\neq}_{R}}\) is a tight apartness relation: \[x =_R y \leftrightarrow\lnot(x \mathrel{\rlap{\neq\,}{\,\neq}_{R}} y).\]

The analysis above gives three plausible definitions for fields:

  • A residual field is a nontrivial ring that satisfies \({\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}}.\)
  • A Heyting field is a nontrivial ring that satisfies \({\small\fbox{$\phantom{b}$field${}^-\!\!\phantom{q}$}}\) and \({\small\fbox{$\phantom{b}$local${}^+\!\!\phantom{q}$}}.\)
  • A discrete field is a nontrivial ring that satisfies \({\small\fbox{$\phantom{b}$field${}^+\!\!\phantom{q}$}}.\)

While each is stronger than the previous, they all have their uses and there is no choice which is always better than the others. Discrete fields behave the same as fields in classical logic since \(R\) has decidable equality. Heyting fields are preferred in constructive circles and most definitions of the field of real numbers yield Heyting fields that are not necessarily discrete [§11.2, §11.3]. While residual fields are harder to work with, they do have their uses since the quotient of a nontrivial ring by a maximal ideal is a residual field which is not necessarily Heyting.

It is common for classical equivalences to break down in intuitionistic logic and therefore many natural concepts that are classically equivalent separate in intuitionistic settings. While this can be an annoyance, it is quite natural in the context of proof-relevant mathematics. Indeed, when we prove that a theorem about fields in a proof-relevant way, we cannot merely argue that every field satisfies the conclusion, we must proceed from known properties of fields and work our way to the desired conclusion. The precise properties used in this process naturally leads to a more refined understanding of the hypotheses of the theorem.


Before closing this installment of HoTT Math, let me draw your attention to an important lesson from the above:

While the logic of HoTT is intuitionistic, knowledge of intuitionistic logic is not necessary to work in HoTT.

Thinking in terms of types and functions as we did above works very well. To illustrate once more, consider the following classical equivalences: \[\begin{aligned}
P \to \lnot Q \quad&\leftrightarrow\quad Q \to \lnot P, \\
\lnot P \to Q \quad&\leftrightarrow\quad \lnot Q \to P.
\end{aligned}\]
Which are valid in intuitionistic logic? When translated into types and functions, we get that these are the special cases of \[\begin{aligned}
A \to (B \to C) \quad&\simeq\quad B \to (A \to C), \\
(A \to C) \to B \quad&\simeq\quad (B \to C) \to A,
\end{aligned}\]
where \(A :\equiv{P},\) \(B :\equiv{Q},\) \(C :\equiv\mathbf{0}.\) The first classical equivalence is intuitionistically valid. After uncurrying both sides of \[A \to (B \to C) \quad\simeq\quad B \to (A \to C),\] we obtain the obvious equivalence \[A \times B \to C \quad\simeq\quad B \times A \to C.\] On the other hand, the general form of the second equivalence makes no sense at all. Taking \(C\) to be just about anything other than \(\mathbf{0}\) leads to obviously false equivalences such as \(\mathbf{1}\to A \simeq\mathbf{1}\to B\) when \(C :\equiv\mathbf{1}.\) Indeed, the second classical equivalence is not intuitionistically valid since taking \(Q :\equiv\lnot P\) leads to \[\lnot P \to \lnot P \quad\leftrightarrow\quad \lnot \lnot P \to P\] and double negation elimination is not intuitionistically valid.


Notes

  1. The equivalence of \[\textstyle\prod_{x:A} (B(x) \to C) \quad\simeq\quad \left({\textstyle\sum_{x:A} B(x)}\right)\to C\] is the dependent type analogue of currying/uncurrying. Indeed, if instead of the dependent type \(B:A \to \mathcal{U}\) we have a simple type \(B,\) we obtain the familiar equivalence \[A \to (B \to C) \quad\simeq\quad (A \times B) \to C.\] In full generality, we have the unwieldly equivalence \[\prod_{(a,b):\sum_{x:A} B(x)} C(a,b) \quad\simeq\quad \prod_{a:A} \prod_{b:B(a)} C(a,b),\] where \(B:A \to \mathcal{U}\) and \(C:\sum_{x:A} B(x) \to \mathcal{U}\) [§2.15].

  2. This is the standard trick to work with propositional truncations: so long as \(P\) is a proposition, \(\Vert A \Vert \to P\) is equivalent to \(A \to P.\) This is very important for handling disjunctions and existential quantifiers. Since \(P \lor Q\) "forgets" which of \(P\) and \(Q\) holds, it is not generally possible to do a case analysis. However, if \(R\) is a proposition then \(P \lor Q \to R\) is equivalent to \(P + Q \to R\) and the coproduct \(P + Q\) does allow for case analysis. This explains why in addition to \[\lnot P \land \lnot Q \quad\leftrightarrow\quad \lnot(P \lor Q),\] the following "halves" of De Morgan’s laws hold in intuitionistic logic \[\begin{aligned}
    P \land Q \quad&\to\quad \lnot(\lnot P \lor \lnot Q), \\
    P \lor Q \quad&\to\quad \lnot(\lnot P \land \lnot Q), \\
    \lnot P \lor \lnot Q \quad&\to\quad \lnot(P \land Q),
    \end{aligned}\]
    while their converses do not.

 

HoTT Math 3: Unit group of a ring

In this installment of HoTT Math, we are taking one more step toward elementary field theory by exploring the unit group of a ring.


A commutative (unital) ring is a set \(R\) with two constants \(\mathsf{O},\mathsf{I}:1\to R\), one unary operation \({-\square}:R \to R\), two binary operations \({\square+\square},{\square\cdot\square}:R \times R \to R\) along with the usual axioms: \({\small\fbox{$\phantom{b}$group$\phantom{q}$}}(\mathsf{O},-\square,\square+\square)\), \({\small\fbox{$\phantom{b}$monoid$\phantom{q}$}}(\mathsf{I},{\square\cdot\square})\), \[{\small\fbox{$\phantom{b}$distributivity$\phantom{q}$}}({\square+\square},{\square\cdot\square}) : \prod_{x,y,z:R} ((x+y)\cdot z =_R x \cdot z + y \cdot z) \times \prod_{x,y,z:R} (x\cdot(y+z) =_R x \cdot y+x \cdot z),\] and \[{\small\fbox{$\phantom{b}$commutativity$\phantom{q}$}}({\square\cdot\square}) : \prod_{x,y:R} (x \cdot y =_R y \cdot x).\] Commutativity of addition can be derived as usual using the fact that \[x+(x+y)+y =_R (x+y)\cdot(\mathsf{I}+\mathsf{I}) =_R x + (y + x) + y.\] If you want to test your path manipulation skills, you can try to write down the resulting term for \[{\small\fbox{$\phantom{b}$commutativity$\phantom{q}$}}({\square+\square}) : \prod_{x,y:R} (x + y =_R y+x).\] However, as we discussed last time, this is not necessary since the proof is ultimately not relevant when the carrier of an algebra is a set.

Reading off the usual definition of unit, we obtain the type \[\operatorname{\mathsf{unit}}(x) :\equiv\sum_{y:R} (x\cdot y =_R \mathsf{I}).\] Elements of type \(\operatorname{\mathsf{unit}}(x)\) are pairs \((y,p)\), where \(y:R\) and \(p: x \cdot y =_R \mathsf{I}\). Because \(R\) is a set and inverses are unique, there is always at most one such \((y,p)\), which means that \(\operatorname{\mathsf{unit}}(x)\) is a proposition. Therefore, we can comprehend the set of units: \[R^\times :\equiv\{x:R \mid \operatorname{\mathsf{unit}}(x)\} :\equiv\sum_{x:R} \operatorname{\mathsf{unit}}(x).\] (Subset comprehension is discussed in §3.5 of the book.) Technically, elements of \(R^\times\) are triples \((x,y,p)\) where \(x,y:R\) and \(p:(x \cdot y =_R \mathsf{I})\). Thus \(R^\times\) is not merely a subset of \(R\) but each element of \(R^\times\) comes equipped with a justification for being in \(R^\times\). Since for each \(x:R\) there is at most one \((x,y,p):R^\times\), we can identify the two without much confusion but the extra coordinates are actually helpful for proving things about \(R^\times\).

To illustrate this, let’s outline the verification that \(R^\times\) is a group under multiplication. We will view elements of \(R^\times\) as pairs \((x,y)\) but we will forget the path witnessing that \(x \cdot y =_R \mathsf{I}\). Since we’re working with sets, we shouldn’t worry about paths anyway.1 To begin, we need to isolate the group operations:

  • First, we see that \((\mathsf{I},\mathsf{I}):R^\times\) since we have a path \(\mathsf{I}\cdot\mathsf{I}=_R \mathsf{I}\) by the identity axiom; this will be our group identity.

  • Next, we see that if \((x_0,y_0),(x_1,y_1):R^\times\) then \((x_0 \cdot x_1,y_1 \cdot y_0):R^\times\). Indeed, if \(x_0 \cdot y_0 =_R \mathsf{I}\) and \(x_1 \cdot y_1 =_R \mathsf{I}\) then \[x_0 \cdot x_1 \cdot y_1 \cdot y_0 =_R x_0 \cdot \mathsf{I}\cdot y_0 =_R x_0 \cdot y_0 =_R \mathsf{I}.\] From this, we directly extract our group multiplication \(\square\cdot\square:R^\times \times R^\times \to R^\times\).

  • Finally, we see that if \((x,y):R^\times\) then \((y,x):R^\times\). (This is immediate from commutativity but a slightly longer argument shows that this works even without this assumption.) As a result, we get the group inverse \(\square^{-1}:R^\times \to R^\times\).

To conclude the argument, it suffices to verify that \({\small\fbox{$\phantom{b}$group$\phantom{q}$}}(R^\times,\square^{-1},\square\cdot\square)\) is inhabited. Associativity and identity are immediate consequences of \({\small\fbox{$\phantom{b}$monoid$\phantom{q}$}}(R,\mathsf{I},\square\cdot\square)\). The tacit path coordinates are handy for inverses: the path \((x,y)\cdot(y,x) =_{R^\times} (\mathsf{I},\mathsf{I})\) simply consists of the two implied paths \(x \cdot y =_R \mathsf{I}\) and \(y \cdot x =_R \mathsf{I}\)!

The key difference between this proof and the classical proof that \(R^\times\) is a group is the way we used the definition of \(R^\times\). At each step of the classical proof, we need to invoke the definition of \(R^\times\) to get the relevant inverses, do the same computations, and then forget the extra information. In the proof-relevant argument, the subset \(R^\times\) incorporates its definition and the relevant inverses are already there to be used in computations. To keep the argument from involving too many (ultimately irrelevant) path computations, we still forgot one piece of the definition of \(R^\times\) in the outline above. This kind of selective unraveling can be useful since formal definitions can rapidly become unwieldy. We did end up invoking the path coordinates at the very end to verify the identity axiom. However, we didn’t really need the paths themselves, we just needed to remember that elements \((x,y)\) aren’t arbitrary pairs, they are pairs such that \(x \cdot y =_R \mathsf{I}\). This is what the seemingly irrelevant path coordinate does, the relevant data is not the path itself, it is the type \(x \cdot y =_R \mathsf{I}\) of the path that matters.


The lesson is that while proof-relevant mathematics does force you to carry extra baggage, that baggage is actually handy. Moreover, you can always manage the burden by selectively unraveling the contents of the baggage. In fact, you also carry this baggage when doing classical mathematics except that you conceal the baggage in your memory, recalling each item as you need it. The problem with this strategy is that it’s easy to forget things if you are not careful. Proof-relevant mathematics forces you to keep track of everything!


Post Scriptum

In the comments, Toby Bartels pointed out that the argument above contains a major type-theoretic faux pas: using type declarations as propositions. The cause of this is the deliberate omission of the path coordinate which leads to phrases like "we see that \((\mathsf{I},\mathsf{I}):R^\times\) since […]" and "we see that if \((x,y):R^\times\) then \((y,x):R^\times\)." The problem is that type declarations are either correct or not (and this is decidable!) but not true or false. Indeed, true and false are both values that can be used but correct and incorrect are not — there is no value in being incorrect! This may seem like a subtle difference but it is actually a very important one in type theory since there is little other separation of syntax and semantics.

Below, I produced three more variants of the above argument that avoid the trouble with Version 0 above. The first is simply does not omit the path coordinates. The second uses the membership relation \(\in\) where the original used an erroneous type declaration. The third is a direct translation of the classical proof, also using the membership relation. To facilitate comparison, I tried to keep the three versions as close as possible to the original. I am curious to know which of the four versions you prefer, or if you have yet another version that you prefer!


Version 1. Let’s outline the verification that \(R^\times\) is a group under multiplication. To begin, we need to isolate the group operations:

  • First, \((\mathsf{I},\mathsf{I},i):R^\times\) is our group identity where \(i:\mathsf{I}\cdot\mathsf{I}=_R \mathsf{I}\) is an instance of the identity axiom.

  • Next, the group product is defined by \((x_0,y_0,p_0)\cdot(x_1,y_1,p_1) := (x_0 \cdot x_1,y_1 \cdot y_0,q)\), where \[q:x_0 \cdot x_1 \cdot y_1 \cdot y_0 =_R x_0 \cdot \mathsf{I}\cdot y_0 =_R x_0 \cdot y_0 =_R \mathsf{I}.\]

  • Finally, group inverses are defined by \((x,y,p)^{-1} := (y,x,q)\), where \(q:y \cdot x =_R x \cdot y =_R \mathsf{I}.\) (A slightly longer argument shows that this works even without the commutativity assumption.)

To conclude the argument, it suffices to verify that \({\small\fbox{$\phantom{b}$group$\phantom{q}$}}(R^\times,\square^{-1},\square\cdot\square)\) is inhabited. Associativity and identity are immediate consequences of \({\small\fbox{$\phantom{b}$monoid$\phantom{q}$}}(R,\mathsf{I},\square\cdot\square)\). The path \((x,y,p)\cdot(y,x,q) =_{R^\times} (\mathsf{I},\mathsf{I},i)\) is composed of \(p:x \cdot y =_R \mathsf{I}\), \(q:y \cdot x =_R \mathsf{I}\), and the paths must match since \(R\) is a set.


Version 2. Let’s outline the verification that \(R^\times\) is a group under multiplication. By virtue of univalence, there are multiple ways to look at \(R^\times\). Instead of viewing it as a subset of \(R\) we can also view it as a subset of \(R \times R\), namely \[R^\times = \{(x,y):R \times R \mid x \cdot y =_R \mathsf{I}\} = \sum_{x,y:R} (x \cdot y =_R \mathsf{I}).\] Thus \((x,y) \in_{R \times R} R^\times\) denotes the proposition \(x \cdot y =_R \mathsf{I}.\) It turns out that this view of \(R^\times\) will be easier to work with. To begin, we need to isolate the group operations:

  • First, we see that \((\mathsf{I},\mathsf{I}) \in_{R\times R} R^\times\) since we have a path \(\mathsf{I}\cdot\mathsf{I}=_R \mathsf{I}\) by the identity axiom; this will be our group identity.

  • Next, we see that if \((x_0,y_0),(x_1,y_1) \in_{R \times R} R^\times\) then \((x_0 \cdot x_1,y_1 \cdot y_0) \in_{R \times R} R^\times\) too. Indeed, if \(x_0 \cdot y_0 =_R \mathsf{I}\) and \(x_1 \cdot y_1 =_R \mathsf{I}\) then \[x_0 \cdot x_1 \cdot y_1 \cdot y_0 =_R x_0 \cdot \mathsf{I}\cdot y_0 =_R x_0 \cdot y_0 =_R \mathsf{I}.\] From this, we directly extract our group multiplication \(\square\cdot\square:R^\times \times R^\times \to R^\times\).

  • Finally, we see that if \((x,y) \in_{R \times R} R^\times\) then \((y,x) \in_{R \times R} R^\times\) since \(x \cdot y =_R \mathsf{I}\) implies \(y \cdot x =_R \mathsf{I}\) by commutativity. (A slightly longer argument shows that this works even for non-commutative rings.) As a result, we get the group inverse \(\square^{-1}:R^\times \to R^\times\).

To conclude the argument, it suffices to verify that \({\small\fbox{$\phantom{b}$group$\phantom{q}$}}(R^\times,\square^{-1},\square\cdot\square)\) is inhabited. Associativity and identity are immediate consequences of \({\small\fbox{$\phantom{b}$monoid$\phantom{q}$}}(R,\mathsf{I},\square\cdot\square)\). The path \((x,y)\cdot(y,x) =_{R^\times} (\mathsf{I},\mathsf{I})\) is composed of \(x \cdot y =_R \mathsf{I}\) and \(y \cdot x =_R \mathsf{I}\), which hold because \((x,y), (y,x) \in_{R \times R} R^\times\).


Version 3. Let’s outline the verification that \(R^\times\) is a group under multiplication. Per definition of \(R^\times\), we will write \(x \in_R R^\times\) to mean \(\mathsf{unit}(x)\). To begin, we need to isolate the group operations:

  • First, we see that \(\mathsf{I}\in_{R} R^\times\) since \(\mathsf{I}\cdot\mathsf{I}=_R \mathsf{I}\) by the identity axiom; this will be our group identity.

  • Next, we see that if \(x_0,x_1 \in_{R} R^\times\) then \(x_0 \cdot x_1 \in_{R} R^\times\) too. Indeed, since \(\mathsf{unit}(x_0)\) and \(\mathsf{unit}(x_1)\) there are \(y_0,y_1:R\) such that \(x_0 \cdot y_0 =_R \mathsf{I}\) and \(x_1 \cdot y_1 =_R \mathsf{I}.\) Then \[x_0 \cdot x_1 \cdot y_1 \cdot y_0 =_R x_0 \cdot \mathsf{I}\cdot y_0 =_R x_0 \cdot y_0 =_R \mathsf{I}\] which shows that \(\mathsf{unit}(x_0 \cdot x_1).\) Thus our group multiplication is (the restriction of) the usual ring multiplication.

  • Finally, we see that if \(x \in_{R} R^\times\) then, by definition of the proposition \(\mathsf{unit}(u)\), there is a unique \(y:R\) such that \(x \cdot y =_R \mathsf{I}\). By commutativity, \(y \cdot x =_R \mathsf{I}\) which means that \(y \in_{R} R^\times\) and this \(y\) is the group inverse of \(x\).

To conclude the argument, it suffices to verify that \({\small\fbox{$\phantom{b}$group$\phantom{q}$}}(R^\times,\square^{-1},\square\cdot\square)\) is inhabited. Associativity and identity are immediate consequences of \({\small\fbox{$\phantom{b}$monoid$\phantom{q}$}}(R,\mathsf{I},\square\cdot\square)\). The remaining identity \(x \cdot x^{-1} =_{R^\times} \mathsf{I}\) follows immediately from the definition of group inverses.

 

HoTT Math 2: More on equational logic

Last time, I promised we would look at fields. I have to delay this by one or two installments of HoTT Math since there is so much to say and I am struggling to keep these short. This edition of HoTT Math is a continuation of the first. The main lesson of HoTT Math 1 was that equational logic still works, provided you’re careful enough. As I was working through some algebra, I realized that I should have been more precise and nuanced. To make things more precise, I will briefly recall some basic universal algebra and then I will talk a little about Birkhoff’s soundness and completeness theorem for equational logic.


A signature is a sequence \(\sigma\) of sets indexed by natural numbers. The elements of the set \(\sigma_n\) are intended to be symbols for \(n\)-ary operations (\(0\)-ary operations are simply constants). For example, the signature for rings can be thought as \[(\{0,1\},\{-\},\{+,\cdot\},\varnothing,\varnothing,\dots).\] It is generally assumed that the sets \(\sigma_n\) are mutually disjoint so that each symbol has a definite arity. In the end, only the cardinality of the sets \(\sigma_n\) matters since the particular symbols used for the operations is mostly a matter of taste.

The logic of universal algebra is equational logic. The symbols from the signature \(\sigma\) can be strung together to form terms. In addition to the symbols from \(\sigma\), we need an infinite set \(\{x_0,x_1,x_2,\dots\}\) of variable symbols. Formally, terms are defined inductively using the two rules:

  • Every variable symbol is a term.
  • If \(t_1,\dots,t_n\) are terms and \(f\) is an element of \(\sigma_n\), then \(ft_1\dots t_n\) is also a term.

Equational logic deals with identities of the form \(t \approx s\) where \(t\) and \(s\) are terms. The rules for equational logic are reflexivity, symmetry, transitivity \[\frac{}{t \approx t}, \quad \frac{t \approx s}{s \approx t}, \quad \frac{t \approx s, s \approx r}{t \approx r}\] together with substitution and replacement \[\frac{t \approx s}{t[x/r] \approx s[x/r]}, \quad \frac{s \approx r}{t[x/r] \approx t[x/s]},\] where \([x/r]\) denotes the act of replacing every occurrence of the variable symbol \(x\) with the term \(r\). We write \(\Gamma \vdash t \approx s\) to signify that the identity \(t \approx s\) follows using a combination of these rules from the identities in \(\Gamma\).

An algebra \(\mathfrak{A}\) (with signature \(\sigma\)) is a type \(A\) together with an interpretation \[I:\left({\textstyle\sum_{n:\mathbb{N}} \sigma_n \times A^n}\right) \to A.\] If \(f\) is a symbol for an \(n\)-ary operation in \(\sigma_n\), then the interpretation \(f^{\mathfrak{A}}:A^n \to A\) is \[f^{\mathfrak{A}}(a_1,\dots,a_n) := I(f,a_1,\dots,a_n).\] These interpretations can be used to evaluate any term starting from a variable assignment \(\mathcal{V}:\mathbb{N}\to A\) in the usual recursive manner: \[x_i^{\mathfrak{A},\mathcal{V}} := \mathcal{V}(i), \quad (f t_1 \dots t_k)^{\mathfrak{A},\mathcal{V}} := f^{\mathfrak{A}}(t_1^{\mathfrak{A},\mathcal{V}},\dots,t_k^{\mathfrak{A},\mathcal{V}}).\] The satisfaction relation for the algebra \(\mathfrak{A}\) and the identity \(t \approx s\) is then the type \[(\mathfrak{A} \vDash t \approx s) :\equiv \prod_{\mathcal{V}:\mathbb{N}\to A} (t^{\mathfrak{A},\mathcal{V}} =_A s^{\mathfrak{A},\mathcal{V}}).\] If the carrier \(A\) is a set then each identity type \(t^{\mathfrak{A},\mathcal{V}} =_A s^{\mathfrak{A},\mathcal{V}}\) has at most one element and hence \(\mathfrak{A} \vDash t \approx s\) is a proposition with the usual meaning. In the general case, \(\mathfrak{A} \vDash t \approx s\) can carry substantial path information.

The fact that equational logic is sound and complete for the semantics is called Birkhoff’s Theorem: \[\Gamma \vdash t \approx s \quad\text{if and only if}\quad \Gamma \vDash t \approx s.\] Clasically, the right hand side means that every algebra \(\mathfrak{A}\) with signature \(\sigma\), whose carrier \(A\) is a set, that satisfies all equations in \(\Gamma\) also satisfies \(t \approx s\). In HoTT, this translates to the type \[(\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s) :\equiv \prod_{\mathfrak{A}/\operatorname{\mathsf{set}}} \left(\prod_{u \approx v:\Gamma} (\mathfrak{A} \vDash u \approx v) \to (\mathfrak{A} \vDash t \approx s)\right)\] where the outer product is over all set-based algebras \(\mathfrak{A}\) with signature \(\sigma\). An alternate interpretation is \[(\Gamma \vDash_{\operatorname{\mathsf{any}}} t \approx s) :\equiv \prod_{\mathfrak{A}/\operatorname{\mathsf{any}}} \left(\prod_{u \approx v:\Gamma} (\mathfrak{A} \vDash u \approx v) \to (\mathfrak{A} \vDash t \approx s)\right)\] where the outer product is over algebras with signature \(\sigma\) based on any type. Consequently there are two possible ways to interpret Birkhoff’s Theorem in HoTT!

The first is closest to the classical version:

Weak Birkhoff Theorem. \(\left\Vert\Gamma \vdash t \approx s\right\Vert\) is equivalent to \(\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s\).

The vertical lines \(\Vert\square\Vert\) indicate that one should take the propositional truncation of \(\Gamma \vdash t \approx s\) (§3.7). Indeed, when translated into HoTT, \(\Gamma \vdash t \approx s\) is the type of all proofs of \(t \approx s\) from \(\Gamma\) in equational logic. The propositional truncation agglomerates all these proofs into one and simply asserts the unqualified existence of such a proof. This is necessary since the right hand side \(\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s\) is a proposition so the backward implication, without propositional truncation, would amount to a canonical choice of proof with basically no information.

On the other hand, \(\Gamma \vDash_{\text{any}} t \approx s\) is much more expressive because of all the possible identity paths. With all that information on hand, it is not so unlikely that we could reconstruct a proof. This suggests another interpretation of Birkhoff’s Theorem:

Strong Birkhoff Theorem. \(\Gamma \vdash t \approx s\) is equivalent to \(\Gamma \vDash_{\operatorname{\mathsf{any}}} t \approx s\).

As I stated it just now, this is unlikely to be true. However, I suspect it becomes true with some tweaking of \(\Gamma \vdash t \approx s\) to only allow proofs that are normalized in some sense or to formally identify proofs that are trivial variations of each other. I haven’t had much time to think about this so I’ll leave it as an open conjecture for now and hope that I can come back to it later…

The soundness part (forward implication) of both forms are true. Last time, I emphasized the usefulness of the weak soundness theorem but I forgot to talk about substitution and replacement! We did talk about how paths could be reversed and composed to get symmetry and transitivity; details are in §2.1 of the book. Since I think it’s important to know what goes on behind the scenes, let’s talk about the remaining two rules. Unfortunately, and this explains the earlier omission, neither is particularly pleasant.

Substitution works exactly as you expect. An element \(p:(\mathfrak{A} \vDash t \approx s)\) is a function that maps each variable assignment \(\mathcal{V}:\mathbb{N}\to A\) to a path \(p(\mathcal{V}):t^{\mathfrak{A},\mathcal{V}} =_A s^{\mathfrak{A},\mathcal{V}}\). Given a term \(r\) and a variable \(x\), the corresponding element of \((\mathfrak{A} \vDash t[x/r] \approx s[x/r])\) is the composition of \(p\) with the function that sends each variable assignment \(\mathcal{V}:\mathbb{N}\to A\) to the assignment where all values stay the same except that the value of \(x\) is changed to \(r^{\mathfrak{A},\mathcal{V}}\). Verifying that the result really is an element of \(\mathfrak{A} \vDash t[x/r] \approx s[x/r]\) is a tedious computation which is best performed under a rug.

The trick behind replacement is revealed in §2.2: functions are functors. The identity types give each type a higher groupoid structure and functions act functorially in the sense that with any \(f:A \to B\) comes with more functions \(\mathsf{ap}_f\) which translate paths \(p:x =_A y\) into paths \(\mathsf{ap}_f(p):f(x) =_B f(y)\). The details for soundness of the replacement rule are gory but they are essentially the same as in the classical proof. There is some work needed to single out the variable \(x\) and interpret the term \(t\) as a function \(t^{\mathfrak{A},\mathcal{V}}_x:A \to A\) where all other variables are fixed. Then, \(\mathsf{ap}_{t^{\mathfrak{A},\mathcal{V}}_x}\) can be used to transform paths \(s^{\mathfrak{A},\mathcal{V}} =_A r^{\mathfrak{A},\mathcal{V}}\) into paths \(t^{\mathfrak{A},\mathcal{V}}_x(s^{\mathfrak{A},\mathcal{V}}) =_A t^{\mathfrak{A},\mathcal{V}}_x(r^{\mathfrak{A},\mathcal{V}})\). Thus, we obtain \[(\mathfrak{A} \vDash s \approx r) \to (\mathfrak{A} \vDash t[x/s] \approx t[x/r]).\]

Since each rule of equational logic is sound, we therefore have the strong soundness \((\Gamma \vdash t \approx s) \to (\Gamma \vDash_{\operatorname{\mathsf{any}}} t \approx s)\). To get weak soundness, we compose this with the obvious implication \((\Gamma \vDash_{\operatorname{\mathsf{any}}} t \approx s) \to (\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s)\) and we conclude that \(\left\Vert\Gamma \vdash t \approx s\right\Vert \to (\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s)\) from the fact that \(\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s\) is a proposition. The completenesss part of Birkhoff’s theorem uses syntactic algebras and HoTT has a really neat way to construct these that we will talk about in a later HoTT Math post.


In conclusion, equational logic works just fine in HoTT. In general, because of proof relevance, it is best to keep track of the equational proofs in order to keep track of the identity paths. The strong soundness map \[(\Gamma \vdash t \approx s) \to (\Gamma \vDash_{\operatorname{\mathsf{any}}} t \approx s)\] guarantees that equational proofs will lead to identities, but different proofs can lead to different identity paths! When the carrier is a set, less care is necessary, the weak soundness map \[\Vert\Gamma \vdash t \approx s\Vert \to (\Gamma \vDash_{\operatorname{\mathsf{set}}} t \approx s)\] shows that the mere existence of an equational proof leads to the desired identities. This is exactly how algebra and equational logic are used classically.

 

HoTT Math 1: Elementary group theory

This is the first in a series of posts on doing mathematics in Homotopy Type Theory (HoTT). Various conventions and notations are explained in the preamble; there are no additional prerequisites for this post.


A group is a set \(G\) equipped with a constant \(e:1 \to G\), a unary operation \(\square^{-1}:G \to G\) and a binary operation \({\square\cdot\square}:G \times G \to G\) that satisfy the usual axioms. In HoTT the group axioms must be motivated, so the group \(G\) comes with three more components: \[\begin{aligned}
{\small\fbox{$\phantom{Q}$associativity$\phantom{Q}$}} &: \prod_{x,y,z:G} (x\cdot(y\cdot z) =_G (x\cdot y)\cdot z) \\
{\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}} &: \prod_{x:G} (x \cdot e =_G x) \times \prod_{x:G}(e \cdot x = x)\\
{\small\fbox{$\phantom{Q}$inverse$\phantom{Q}$}} &: \prod_{x:G} (e = x^{-1} \cdot x) \times \prod_{x:G} (e =_G x \cdot x^{-1}) \\
\end{aligned}\]
The axioms are concrete objects in proof-relevant mathematics! The last two axioms are actually the conjunction of two axioms since conjunction in HoTT correspond to products. It’s conceptually convenient to package them together and use \({\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_1\) and \({\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_2\) for the two conjuncts obtained by applying the projections. In fact, it makes sense to pack all of these into one \[{\small\fbox{$\phantom{Q}$group$\phantom{Q}$}} :\equiv{\small\fbox{$\phantom{Q}$associativity$\phantom{Q}$}}\times{\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}\times{\small\fbox{$\phantom{Q}$inverse$\phantom{Q}$}}\] (but, for the sake of clarity, it is best to refrain from using things like \({\small\fbox{$\phantom{Q}$group$\phantom{Q}$}}_1\) for \({\small\fbox{$\phantom{Q}$associativity$\phantom{Q}$}}\)).

The axioms types above are parametrized by \(G:\mathsf{Set}\) and the three components \(e\), \(\square^{-1}\), \(\square\cdot\square\), so one can form the type of all groups: \[\mathsf{Grp}:\equiv\sum_{\substack{G:Set\\e:1\to G\\\square^{-1}:G\to G\\\square\cdot\square:G\times G \to G}} {\small\fbox{$\phantom{Q}$group$\phantom{Q}$}}(G,e,\square^{-1},\square\cdot\square).\] This level of formalism is cumbersome and since it is perfectly clear what the type \(\mathsf{Grp}\) is from the narrative description, it is best to avoid it. The only difference from classical mathematics is that the axioms are given as concrete objects rather than abstract statements.

Familiar identities, for example the fact that \(e\) is its own inverse, can be obtained by combining the group axioms. We have a path \({\small\fbox{$\phantom{Q}$inverse$\phantom{Q}$}}_1(e):e =_G e^{-1}\cdot e\) and another path \({\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_1(e^{-1}):e^{-1}\cdot e =_G e^{-1}\). Concatenating these two paths yields the desired path \[{\small\fbox{$\phantom{Q}$inverse$\phantom{Q}$}}_1(e)\cdot{\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_1(e^{-1}):e^{-1} =_G e.\]
There are other ways to see this, for example, symmetric reasoning yields \[{\small\fbox{$\phantom{Q}$inverse$\phantom{Q}$}}_2(e)\cdot{\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_2(e^{-1}):e^{-1} =_G e.\] In general, these two paths are need not be the same but since \(G\) is a set, i.e., a 0-type, there is at most one path in \(e^{-1} =_G e\) and therefore the two paths above must be the same.

Let’s try something a tad more complicated — an old favorite — the uniqueness of identity elements. To say that \(u:G\) is a left identity element means exhibiting an element of the type \[\prod_{x:G} (u \cdot x =_G x);\] so \({\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_1\) shows that \(e\) is an identity element. Similarly, \({\small\fbox{$\phantom{Q}$identity$\phantom{Q}$}}_2\) shows that \(e\) is a right identity element. Classically, we know that if \(u\) is a left identity and if \(v\) is a right identity then we must have \(u = v\). In HoTT, this corresponds to exhibiting an element of type \[\mathsf{lrid}:\equiv\prod_{u,v:G} \Big(\prod_{x:G} (u \cdot x =_G x) \times \prod_{y:G} (y \cdot v =_G y) \to (u =_G v)\Big).\]

To prove this, let’s fix \(u,v:G\). Given \(p:\prod_{x:G} (u \cdot x =_G x)\) and \(q:\prod_{y:G} (y \cdot v =_G y)\) we have paths \[\begin{aligned} p(v) &: u \cdot v =_G v, & q(u)&:u \cdot v =_G u,\end{aligned}\] and therefore a path \[q(u)^{-1} \cdot p(v): u =_G v.\] The classical proof would end here but to get the desired element of \(\mathsf{lrid}\), we need to wrap things up as follows. Since the final path was obtained uniformly from \(p,q\), we have an element \[\lambda p,q.q(u)^{-1} \cdot p(v):\prod_{x:G} (u \cdot x =_G x) \times \prod_{y:G} (y \cdot v =_G y) \to (u =_G v)\] and we can then unfix \(u,v:G\) to obtain the desired element \[\lambda u,v.\lambda p,q . q(u)^{-1} \cdot p(v):\mathsf{lrid}.\]
I slightly abused \(\lambda\)-notation in the argument above but the idea was only to give a hint how the argument could be formalized. In fact, we did not need to worry about this at all. Since \(G\) is a set, there is at most one path in \(u =_G v\) and thus there is also at most one element in \[\prod_{x:G} (u \cdot x =_G x) \times \prod_{y:G} (y \cdot v =_G y) \to (u =_G v)\] by function extensionality (§4.9). Therefore, the unique choice principle (§3.9) applies to give the desired element of \(\mathsf{lrid}\). In fact, \(\mathsf{lrid}\) also has exactly one element by function extensionality.

In the end, the classical proof of uniqueness of inverses was sufficient and unambiguous. In fact, the same is true for essentially all similar facts of elementary algebra. This is good for multiple reasons but most importantly this means that doing math in HoTT does not involve getting caught up in elementary stuff like this: you can invoke \(\mathsf{lrid}\) any time without referencing a particular proof since the proof is unique. There is one important caveat:

It is very important that the carrier type of a group is a set!

I fell into this trap when I asked this MathOverflow question. It is very tempting to think that the loop space \(\Omega(A,x) :\equiv(x =_A x)\) is a group for every \(x:A\). This is only true if \(A\) is a 1-type. Otherwise the carrier of \(\Omega(A,x)\) is not necessarily a 0-type, the uniqueness of paths is lost, \(\mathsf{lrid}\) can have many different proofs, which are all relevant and must not be forgotten!


In conclusion, elementary group theory works fine in HoTT. The general feel is a bit different but it’s more fun than cumbersome to see how the axioms work in relevant mathematics. The same is true for much of elementary algebra and, ultimately, proof relevance is never cumbersome since proofs are almost always unique. It is important to remember that the carrier of an algebraic structure must always be a set for this to work!

In the next installment of HoTT Math, we will continue with elementary field theory which presents an additional difficulty since we must handle the fact that multiplicative inverses do not always exist…

 

HoTT Math Series

I am planning to do a series of posts where I attempt to do math in Homotopy Type Theory (HoTT). The plan is to do some relatively simple proof-relevant mathematics at an informal level. The topics will all be undergraduate level so the mathematics won’t be hard to follow. I’m hoping to keep the series brief so each post will only be an appetizer and not a full course dinner. Enjoy!

This preamble will serve to accumulate a table of contents and various conventions and notations that come up along the way. The only prerequisites (or rather corequisites) are the first two chapters of the (free) Homotopy Type Theory book. Further prerequisites and reverences to later topics in the book will always be indicated where they occur.


  1. Elementary group theory
  2. More on equational logic
  3. Unit group of a ring
  4. Local rings and fields
 

Homotopy Type Theory

After a productive year at the Institute for Advanced Study, the Univalent Foundations Program has written a book on Homotopy Type Theory (HoTT). The foreword gives a succinct description of the purpose of this book:

We did not set out to write a book. The present work has its origins in our collective attempts to develop a new style of "informal type theory" that can be read and understood by a human being, as a complement to a formal proof that can be checked by a machine. Univalent foundations is closely tied to the idea of a foundation of mathematics that can be implemented in a computer proof assistant. Although such a formalization is not part of this book, much of the material presented here was actually done first in the fully formalized setting inside a proof assistant, and only later "unformalized" to arrive at the presentation you find before you — a remarkable inversion of the usual state of affairs in formalized mathematics.

The danger in writing such a book is to fall into the minefields of logic wars. The authors successfully avoided much of these traps, so logicians from other perspectives can read the book without too much cringing. To avoid unnecessary confusion, I recommend mentally substituting most instances of "set theory" by the more apropos "classical mathematics." Readers from strongly opposing points of view should be prepared for a certain amount of propaganda, which is to be expected in a book written to promote one point of view. Barring these caveats, you will find an enjoyable and well-written book on a very interesting subject. Readers should not be too concerned with the word "homotopy" in the title, homotopy theory is not required background for the book though some basic knowledge of the ideas of topology and homotopy theory helps to understand the motivation behind certain concepts.

Having addressed all the necessary caveats, let’s talk about why this book is interesting and why you should read it…


What is so hot about HoTT?

The most interesting aspect from my point of view is that HoTT fully supports proof-relevant mathematics, a way of doing mathematics where proofs are real objects that are manipulated on the same level as numbers, sets, functions and all the usual objects of classical mathematics. This is not a brand new idea, logicians have been playing with proofs in this way for a long while, but HoTT brings this idea into the realm of everyday mathematics and that is a major step forward in mathematics.

The key difference with first-order logic is that equality is not primitive. To define a type \(A\) one must also define what equality means for \(A\). Formally if \(x,y:A\) then \(x =_A y\) (or \(\mathsf{Id}_A(x,y)\)) is another type, an identity type, which can be thought of as made of reasons to identify \(x\) and \(y\). Elements \(p:x =_A y\) are often called "paths" by analogy with topolgy. Indeed, these paths can be inverted and concatenated to realize symmetry and transitivity of equality, respectively; reflexivity is realized by a path \(\mathsf{refl}_x:x =_A x\). Thus each type \(A\) is actually a groupoid rather than a plain set. In fact, since each \(x =_A y\) is itself a type with its own identity types and so forth, the type \(A\) is actually a kind of \(\infty\)-groupoid.

It is this rich structure associated with each type is what permits HoTT to support proof relevant mathematics. To get a basic feel of how this works, the statement "\(x =_A y\) and \(y =_A z\)" is interpreted via the product type \((x =_A y)\times(y =_A z)\), whose elements are pairs of paths that explain why \(x\) is to be identified with \(y\) and why \(y\) is to be identified with \(z\). Similarly, "\(x =_A y\) or \(y =_A z\)" is interpreted via the coproduct type \((x =_A y) + (y =_A z)\), whose elements are either paths that explain why \(x\) is to be identified with \(y\) or paths that explain why \(y\) is to be identified with \(z\). The catch, as you may have guessed from the last example, is that this form of constructive reasoning is intuitionistic and thus not as familiar to mathematicians.

Interestingly, the learning curve for constructive reasoning appears to be much less steep with HoTT than with other constructive frameworks. One of the reasons is that the topological interpretation of the key concepts is very intuitive but more significantly HoTT provides many tools to revert to more familiar territory. The analogue of a plain set in HoTT is a \(0\)-type: a type \(A\) where the identity types \(x =_A y\) always contain at most one path. In other words, these are types where the groupoid structure is trivial and contains no other information than how to handle equality of elements. It is consistent with HoTT that the \(0\)-types form a model of ETCS, a classical theory of sets and functions. Thus, by "truncating" thoughts to \(0\)-types, one can revert to a more familiar classical setting.


What is the big deal with univalence?

It is natural to identify things that are not significantly different. For example, the axiom of extensionality in set theory identifies sets that have the same elements since the elements of a set are all that matter in this context. Extensionality for functions identifies functions that agree on all inputs. Univalence is an indiscernibility axiom in the same spirit: it identifies types that are not significantly different.

To make sense of equality for types, we first need to put them in an ambient type, a universe, with its associated identity types. We can’t have a truly universal type since that directly leads to the usual paradoxes of self-reference. Instead, we have a bunch of universes such that each type belongs to some universe and each universe is closed under the basic type formation rules. Once we have a universe \(\mathcal{U}\) we can talk about equality of types in \(\mathcal{U}\), and because \(\mathcal{U}\) is a type we have a lot of freedom in defining what equality means for types in \(\mathcal{U}\).

This is exactly what the univalence axiom does. It can be stated elegantly: \[(A \simeq B) \simeq (A =_\mathcal{U} B).\] The equivalence relation \({\simeq}\) is similar to isomorphism but it is slightly more permissive. To say \(A \simeq B\) requires the existence of a function \(f:A \to B\) together with \(\ell,r:B \to A\) such that \(\ell \circ f\) is homotopic to \(\mathsf{id}_A\) and \(f \circ r\) is homotopic to \(\mathsf{id}_B\). Given two functions \(f\) and \(g\) of the same dependent product type \(\prod_{x:A} B(x)\), a homotopy from \(f\) to \(g\) is an element of \(\prod_{x:A} (f(x) =_{B(x)} g(x))\). So \(f\) and \(g\) are homotopic if they agree on all inputs, which does not mean that \(f = g\) in the absence of function extensionality.

In general type theory, \(A \simeq B\) is definitely a good way to say "\(A\) and \(B\) are not siginificantly different" and thus univalence arguably captures the right indiscernibility axiom for type theory. The surprising fact is that such a strong axiom does not appear to collapse more than it should. The benefits of univalence are interesting and need to be explored further.


I still need to digest the book so this is probably only the first of many posts on HoTT. The next posts will undoubtedly wander deeper into the technical levels of HoTT. There are a few interesting leads but nothing definite yet.

There is only one thing that bugs me right now, which is the way universes are handled in the book. However, since these assumptions do not appear to be crucial for the development of HoTT and there are plenty of alternatives out there, I’m not overly concerned about this at the moment.

I will eventually need to talk about higher inductive types. These are really interesting and I’m happy to see that the book devotes an entire chapter to them. This is a very interesting outgrowth of this project and which deserves study even independently of HoTT.