Towsner’s stable forcing

It is well known that a model $\newcommand{\MN}{\mathfrak{N}}\MN$ of $\newcommand{\RCA}[1]{\mathsf{RCA}_{#1}}\RCA0$ satisfies $\Sigma^0_n$-induction ($\newcommand{\Ind}[1]{\mathsf{I}{#1}}\Ind{\Sigma^0_n}$) if and only if it satisfies bounded $\Sigma^0_n$-comprehension: if $\phi(x)$ is a $\Sigma^0_n$-formula (with parameters) then the set $\set{x \lt b: \phi(x)}$ exists for every number $b$ in $\MN$. Thus, it follows that $\Pi^0_n$-induction also holds and indeed induction holds for all boolean combinations of $\Sigma^0_n$ formulas. However, $\Ind{\Sigma^0_n}$ offers no control over sets of higher complexity than that. In particular, $\Delta^0_{n+1}$-induction may fail very badly in a model of $\Ind{\Sigma^0_n}$. Indeed, by a result of Slaman [1] $\Delta^0_{n+1}$-induction is equivalent to $\Sigma^0_{n+1}$-bounding ($\newcommand{\Bnd}[1]{\mathsf{B}{#1}}\Bnd{\Sigma^0_{n+1}}$) and it is known that we have a strict hierarchy: $$\Ind{\Sigma^0_1} \quad\WHEN\quad \Bnd{\Sigma^0_2} \quad\WHEN\quad \Ind{\Sigma^0_2} \quad\WHEN\quad \Bnd{\Sigma^0_3} \quad\WHEN \cdots$$ The actual size of the gaps between $\Ind{\Sigma^0_n}$ and $\Bnd{\Sigma^0_{n+1}}$ has recently been quantified by Henry Towsner, who included a beautiful forcing argument in his preprint [2] that shows that $\Ind{\Sigma^0_n}$ gives absolutely no control whatsoever on $\Delta^0_{n+1}$-definable sets.

… 

 

Scheming schemes…

In everyday language, the word scheme often has a negative connotation: scheme is used as a synonym for devious plan. In mathematical language, the word has no negative aspect at all. In fact, I think that mathematical schemes of all kinds are wonderful things and it often takes me a moment to understand when someone uses scheme to imply wrongful intent. This can lead to awkward social situations, but what I want to talk about here is how axiom schemes can sometimes behave badly and may lead to awkward mathematical situations.

… 

 

Envelope forcing

In a recent paper [1], Jared Corduan and I considered various notions of combinatorial indecomposability for finite ordinal powers of \(\omega.\) In this process, we uncovered two weak forms of Ramsey’s theorem for pairs (\(\newcommand{\RT}[2]{\mathsf{RT}^{#1}_{#2}}\RT2k\)):

  • The Weak Ramsey Theorem (\(\newcommand{\Wk}{\mathsf{W}}\Wk\RT2k\)). For every coloring \(c:\N^2\to\set{0,\dots,k-1}\) there are a color \(d \lt k\) and an infinite set \(H\) such that \[\set{y \in \N : c(x,y) = d}\] is infinite for every \(x \in H.\)
  • The Hyper-Weak Ramsey Theorem (\(\newcommand{\HWk}{\mathsf{HW}}\HWk\RT2k\)). For every coloring \(c:\N^2\to\set{0,\dots,k-1}\) there are a color \(d \lt k\) and an increasing function \(h:\N\to\N\) such that \[\bigcup_{x=h(i-1)}^{h(i)-1} \set{y \in \N : c(x,y) = d}\] is infinite for every \(i \in \N.\)

The first is equivalent to what we called the game indecomposability of the ordinal \(\omega^2,\) and the second is related to the lexicographic indecomposability of \(\omega^2.\) You can find out more about these notions of indecomposability in our paper, what I want to write about here is the forcing technique that we used to show that \(\HWk\RT22\) is \(\Pi^1_1\)-conservative over \(\mathsf{RCA}_0\) with \(\Sigma^0_2\)-induction and to separate the two principles \(\Wk\RT22\) and \(\HWk\RT22.\) I’ve been calling the method envelope forcing since the basic idea is to force over a larger model that envelops the original one rather than forcing over the ground model itself. In the following, I will give a brief tour of this method, skipping all the gory details (that can be found in our paper).

… 

 

Generalized separation principles

It is well-known that computability theory and reverse mathematics have very strong ties. Indeed, the base theory \(\newcommand{\RCA}{\mathsf{RCA}_0}\RCA\) used in reverse mathematics was designed as the minimal theory that can adequately formulate and prove the basic results of computability theory. Based on the Church–Turing thesis, this is a very reasonable way to capture the proof method known as direct computation in everyday mathematics. Although the ties run very deep, there are always frictions when translating results between computability theory and reverse mathematics. The aim of this post is to talk about different ways of doing this translation but I will do so by talking about a family of generalized separation principles that I recently realized were (non-trivially) equivalent, a fact that had been well known in computability theory for several decades.

… 

 

Subposets of small dimension

Behind the scenes at Boole’s Rings, there was some talk on asking more open questions related to our research. I have three related problems that I would like to share. They are strikingly similar in nature, but they properly belong in three different branches of mathematics: the first problem is about finite combinatorics, the second problem is about reverse mathematics, and the third problem is about infinite combinatorics. To me they are three faces of the same question and I currently don’t know how to answer any of them.

Three Problems. Let $d \geq 2$:

  1. Let $F_d(n)$ denote the largest number such that every poset of size $n$ has a $d$-dimensional subposet of size $F_d(n)$. What is the asymptotic growth of $F_d(n)$?
  2. Every (countably) infinite poset has an infinite subposet of dimension at most $d$. What is the logical strength of this statement over $\mathsf{RCA}_0$?
  3. Is there an uncountable poset that has no uncountable $d$-dimensional subposet?

… 

 

Classically valid theorems of intuitionistic analysis

It is well-known that proofs in intuitionistic logic are more constructive than proofs in classical logic. Indeed, this is what the (informal) Brouwer–Heyting–Kolmogorov (BHK) interpretation leads to believe. Thus, intuitionistic proofs tend to have more information content than classical proofs, a fact that is well exploited in proof mining.

This suggests that looking at intuitionistic proofs of classical results from classical analysis may yield some new insights on the constructive nature of these classical results. However, there is one major hurdle with this program: theorems and axioms of intuitionistic analysis are not necessarily classically valid. Thus, inuitionistic proofs must be inspected very closely before drawing any kind of classical conclusions from them. Often, this is done by only looking at intuitionistic proofs over weak systems of intuitionistic analysis whose axioms are all classically valid. This amounts to doing classical analysis with a very limited set of axioms and deduction rules and it basically prohibits the use of the most important methods and results from intuitionistic analysis. This is not very satisfactory, but there is another way. There is a rich class of statements for which intuitionistic analysis is conservative over some weak subtheory whose axioms are all classically valid. So long as we restrict our attention to statements from this class, we can use methods and results of intuitionistic analysis with impunity and draw classical conclusions from the mere existence of a proof in the weaker subsystem, without going through the pain of finding such a proof.

One important result of intuitionistic analysis that is not classically valid is Brouwer’s Continuity Theorem, which simply says that every function defined on the unit interval is (uniformly) continuous [1].  The Continuity Theorem becomes especially powerful when combined with the Axiom of Choice, together they imply that if the statement $\forall\xi\exists\zeta A(\xi,\zeta)$ is true, then there must be a continuous function $F$ such that $\forall\xi A(\xi,F(\xi))$. On the one hand, this combination captures the constructive strength of intuitionistic analysis, as it provides a uniform and effective way to obtain witnesses to existential statements. On the other hand, this combination has some disastrous consequences for classically minded mathematicians. For example, the only subsets of $\R$ that have complements are $\emptyset$ and $\R$. Indeed, if $A \subseteq \R$ has a complement then it has a characteristic function by the Axiom of Choice, which must then be continuous by the Continuity Theorem. Since $\R$ is connected, the only continuous $\set{0,1}$-valued functions on $\R$ are the two constant functions. Therefore, $A$ must either be empty or all of $\R$. In particular, it follows that equality for real numbers is undecidable: $\forall\xi(\xi = 0 \lor \xi \neq 0)$ cannot be true since we cannot partition $\R$ into the singleton zero and the set of nonzero real numbers.

There are many ways to formalize all of this. In my paper [2], I look at two related systems intuitionistic analysis and I exploit conservation results by Anne Troelstra [3] and Jaap van Oosten [4] to draw some classical consequences in subsystems of second-order arithmetic commonly used in reverse mathematics. The main results are of the following form: if $A$ and $B$ satisfy certain syntactic constraints and $$\forall\xi(B(\xi) \lthen \exists\zeta A(\xi,\zeta))$$ is provable in a certain formal system of intuitionistic analysis, then the sequential form $$\forall\xi(\forall n B(\xi_n) \lthen \exists\zeta\forall n A(\xi_n,\zeta_n))$$ is provable in a corresponding classical subsystem of second-order arithmetic. Results of this kind had already been obtained by Jeff Hirst and Carl Mummert for a different class of intuitionistic systems [5]. However, their systems only include classically valid axioms, so the fact that both systems I used prove Brouwer’s Continuity Theorem and some form of the Axiom of Choice adds an interesting new twist to this picture.1

$\newcommand{\EL}{\mathsf{EL}}\newcommand{\GC}{\mathsf{GC}}\newcommand{\RCA}{\mathsf{RCA}}\newcommand{\krf}{\mathrel{\mathtt{rf}}}\newcommand{\cat}{\mathop{{}^\frown}}$The basic intuitionistic system I will use is called $\EL$. The system $\EL$ has two sorts: natural numbers ($\N$) and functions from numbers to numbers ($\N^\N$). I will use roman letters for numbers and greek letters for functions. The details of $\EL$ can be found in [3] and [2]. The axioms of $\EL$ are classically valid. In fact, by adding the law of excluded middle to $\EL$, you obtain a classical system equivalent to the subsystem $\RCA$ of second-order arithmetic. To make this into a full-blown system for intuitionistic analysis, we can add Troelstra’s Generalized Continuity ($\GC$) axiom scheme: $$\forall\xi(B(\xi) \lthen \exists\zeta A(\xi,\zeta)) \lthen \exists\alpha\forall\xi(B(\xi) \lthen \alpha|\xi{\downarrow} \land A(\xi,\alpha|\xi)),$$ where $B(\xi)$ is restricted to the syntactic class $\mathrm{N}_K$ (defined below) but $A(\xi,\zeta)$ is arbitrary. The notation $\alpha|\xi$ is a clever way of coding partial continuous functions $\N^\N\to\N^\N$ due to Kleene that I will describe shortly. Since the class $\mathrm{N}_K$ includes the tautology $0 = 0$, $\EL + \GC$ directly implies the Axiom of Choice: if the set $A_\xi = \set{\zeta \in \N^\N : A(\xi,\zeta)}$ is nonempty for every $\xi \in \N^\N$, then there is an $\alpha$ such that $\alpha|\xi$ picks out a single element of  $A_\xi$ for every $\xi \in \N^\N$. $\EL + \GC$ also implies Brouwer’s Continuity Theorem: if $A(\xi,\zeta)$ describes the graph of a function $F:\N^\N\to\N^\N$, then there must be an $\alpha$ such that for every $\xi$, $\alpha|\xi$ is the unique $\zeta$ such that $A(\xi,\zeta)$. Since $\xi \mapsto \alpha|\xi$ is necessarily continuous, the function $F$ must be continuous.

In order to talk about the relevant conservation result for $\EL + \GC$, I need to talk about Kleene’s realizability with functions. The basic idea of realizability with functions is both natural and beautiful. Unfortunately, the beauty is hidden behind a thick wall of coding tricks. Three standard coding tricks are the following:

  • Pairs of functions can be coded by alternately meshing their values: given functions $\xi_0,\xi_1$ there is a unique function $\seq{\xi_0,\xi_1}$ such that $\seq{\xi_0,\xi_1}\pi_0 = \xi_0$ and $\seq{\xi_0,\xi_1}\pi_1 = \xi_1$, where $\pi_0 = \lambda n.2n$ and $\pi_1 = \lambda n.2n+1$.
  • Number-function pairs can be coded by prefixing the number: given a number $x$ and a function $\xi$ there is a unique function $x\cat\xi$ such that $(x\cat\xi)(0) = x$ and $(x\cat\xi)\sigma = \xi$, where $\sigma = \lambda n.n+1$.
  • Every function $\xi$ can be thought of as encoding an infinite list of functions $\xi_n = \lambda m.\xi(2^n(2m+1)-1)$.

The next trick is a very clever way of encoding partial continuous functions $\N^\N\to\N^\N$ due to Kleene that I alluded to above. Given a function $\xi$ and a number $\ell$, let $\bar{\xi}\ell$ denote the finite initial segment $\seq{\xi(0),\dots,\xi(\ell-1)}$ coded as a number in some primitive recursive fashion. Kleene’s encoding trick is best thought of as a two-step process.

  1. A function $\alpha \in \N^\N$ encodes a partial continuous function $\N^\N\to\N$ as follows: to compute $\alpha(\xi)$ for some $\xi \in \N^\N$, look for the first number $\ell$ (if any) such that $\alpha(\bar{\xi}\ell) \neq 0$ and then set $\alpha(\xi) = \alpha(\bar{\xi}\ell)-1$. Write $\alpha(\xi){\uparrow}$ when no such $\ell$ can be found, and $\alpha(\xi){\downarrow}$ when $\alpha(\xi)$ is defined.
  2. Thinking of functions in $\N^\N$ as encoding number-function pairs $n\cat\xi$, the above scheme lets $\alpha \in \N^\N$ encode a partial continuous function $\N\times\N^\N\to\N$ via $\alpha(n\cat\xi)$. By currying, we obtain a partial continuous function $\N^\N\to\N^\N$. Thus, for each $\xi \in \N^\N$, the value $\alpha|\xi$ of the partial continuous function $\N^\N\to\N^\N$ coded by $\alpha$ is the function $\lambda n.\alpha(n\cat\xi)$ provided that $\alpha(n\cat\xi){\downarrow}$ for every $n$. Write $\alpha|\xi{\uparrow}$ when $\alpha(n\cat\xi){\uparrow}$ for some $n$, and $\alpha|\xi{\downarrow}$ when $\alpha|\xi$ is defined.

This encoding of partial continuous functions is rather dense at first sight, but it is actually quite natural after getting used to it. The $\alpha|\xi$ notation may seem strange, but it is actually well motivated. Thinking of $|$ as a partial binary operation on the set $\N^\N$, we obtain a partial combinatory algebra known as Kleene’s second algebra. The post-modern approach to realizability goes through realizability topos, which can be constructed on top of any partial combinatory algebra.

Kleene’s realizability with functions is a syntactic transformation of formulas which uses all the tricks above to pack most existential quantifiers into an auxiliary function parameter. The transformation is built by induction on complexity as follows2:

  • $\alpha \krf A$ is $A$ for atomic $A$.
  • $\alpha \krf (A \land B)$ is $\alpha\pi_0 \krf A \land \alpha\pi_1 \krf B$.
  • $\alpha \krf (A \lthen B)$ is $\forall\beta(\beta \krf A \lthen \alpha|\beta{\downarrow} \land \alpha|\beta \krf B)$.
  • $\alpha \krf \forall x A$ is $\forall x(\alpha_x \krf A)$.3
  • $\alpha \krf \forall \xi A$ is $\forall \xi(\alpha|\xi{\downarrow} \land \alpha|\xi \krf A)$.
  • $\alpha \krf \exists x A$ is $\alpha\sigma \krf A[x/\alpha(0)]$.
  • $\alpha \krf \exists \xi A$ is $\alpha\pi_1 \krf A[\xi/\alpha\pi_0]$.

Note that the translated formula $\alpha \krf A$ never involves existential quantifiers, except to say that $\alpha|\xi{\downarrow}$ and the scope of this implicit existential quantifiers is quantifier-free. Therefore, $\alpha \krf A$ always belongs to the class of formulas $\mathrm{N}_K$, which is defined as follows:

  • If $A$ is quantifier-free, then $A$, $\exists x A$, $\exists\xi A$ are in $\mathrm{N}_K$.
  • If $A, B$ are in $\mathrm{N}_K$, then so are $A \land B$, $A \lthen B$, $\forall x A$, $\forall \xi A$.

It seems odd to include existential function quantifiers in the first clause but this is harmless since a quantifier-free formula $A$ can only use a finite initial segment of $\xi$ and thus $\exists\xi A$ is easily simulated by an existential number quantifier.

Now that we have described the system $\EL + \GC$ and Kleene’s realizability with functions, we can start tying everything together. The first step is to relate provability in $\EL + \GC$ with Kleene’s realizability with functions.

Troelstra’s Characterization Theorem. For every formula $A$:

  1. $\EL + \GC \vdash A \liff \exists\alpha(\alpha \krf A)$.
  2. $\EL + \GC \vdash A \IFF \EL \vdash \exists\alpha(\alpha \krf A)$.

An important consequence of this characterization of $\krf$ is that $\EL + \GC$ is relatively consistent with $\EL$. In particular, Brouwer’s Continuity Theorem is relatively consistent with the Axiom of Choice so that the above discussion was not meaningless.

The next step is to formulate the relevant conservation result. The class of formulas $\Gamma_K$ is defined as follows:

  • Quantifier-free formulas are in $\Gamma_K$.
  • If $A, B$ are in $\Gamma_K$ then so are $A \land B$, $\forall x A$, $\forall \xi A$, $\exists x A$, $\exists \xi A$.
  • If $A$ is in $\mathrm{N}_K$ and $B$ is in $\Gamma_K$ then $A \lthen B$ is in $\Gamma_K$.

Although $\Gamma_K$ is far from including all formulas, it is actually a rather rich fragment which plays well with Kleene’s realizability with functions.

Troelstra’s Conservation Theorem. If $A \in \Gamma_K$ then $\EL \vdash \exists\alpha(\alpha \krf A) \lthen A$.

Combining this with the second part of Troelstra’s Characterization Theorem, we see that if $A \in \Gamma_K$, then $$\EL + \GC \vdash A \quad\IFF\quad \EL \vdash A.$$ In other words, $\EL + \GC$ is conservative over $\EL$ for formulas in $\Gamma_K$.

What is this all good for? Well, suppose your friend Bertus (a hard-core intuitionist working in $\EL + \GC$) tells you (a hard-core classicist working in $\mathsf{ZFC}$) about his most recent result:

Every $n \times n$ real matrix with nonzero determinant is invertible.

A priory, you have no reason to believe your friend since he also believes in false statements like every function is continuous. After some thought, you realize that $\det(A) \neq 0$ can be formalized as a formula in $\mathrm{N}_K$ and that $$(\forall A \in M_{n\times n}(\R))(\det(A) \neq 0 \lthen (\exists B \in M_{n\times n}(\R))(AB = BA = I))$$ can be formalized as a formula in $\Gamma_K$. You can then conclude that Bertus’s result is classically true and, because of $\GC$, that there is a continuous function to compute the inverse of a $n \times n$ real matrix $A$ with nonzero determinant. Not only were you able to meaningfully communicate with your friend Bertus, but you also gained a new classical result that you can call your own!4

The system $\EL + \GC$ is not the only system for which the above scheme goes through. In [4], Jaap van Oosten proves a similar characterization result for an intuitionistic system that incorporates the Weak König Lemma. This system also proves Brouwer’s Continuity Theorem. However, since the Weak König Lemma and Brouwer’s Continuity Theorem are incompatible with the Axiom of Choice, van Oosten has to settle for a weaker form of choice that continuously selects a nonempty compact set of witnesses to existential statements instead of a single witness. These results of van Oosten are also exploited in my paper [2].

Notes

  1. I think that it is probably safe to add certain intuitionistic principles like Brouwer’s Continuity Theorem to the systems considered by Hirst and Mummert, but I have not checked that.
  2. There is no clause for disjunction since because I prefer to think of $A \lor B$ as an abbreviation for $\exists x((x = 0 \lthen A) \land (x \neq 0 \lthen B))$. Since equality of numbers is decidable, this interpretation is intuitionistically correct.
  3. Troelstra [3] actually uses $\forall x(\alpha|\lambda n.x{\downarrow} \land \alpha|\lambda n.x \krf A)$ for this case. This amounts to the same idea except that $\alpha_x$ is always defined whereas $\alpha|\lambda n.x$ might not. There does not appear to be any need for the added complexity except to make this case more parallel to the other universal quantification case.
  4. Don’t start writing that paper right away: this is not a new result… However, you can now captivate your friends for hours explaining how you discovered the existence of Cramer’s Rule without opening a linear algebra textbook!

References

[1] [doi] L. E. J. Brouwer, “Über Definitionsbereiche von- Funktionen,” Math. ann., vol. 97, iss. 1, pp. 60-75, 1927.
[Bibtex]
@article {Brouwer27,
AUTHOR = {Brouwer, Luitzen Egbertus Jan},
TITLE = {\"{U}ber {D}efinitionsbereiche von- {F}unktionen},
JOURNAL = {Math. Ann.},
FJOURNAL = {Mathematische Annalen},
VOLUME = {97},
YEAR = {1927},
NUMBER = {1},
PAGES = {60--75},
ISSN = {0025-5831},
CODEN = {MAANA},
MRCLASS = {Contributed Item},
MRNUMBER = {1512354},
DOI = {10.1007/BF01447860},
URL = {http://dx.doi.org/10.1007/BF01447860},
NOTE = {Translated in \cite{VanHeijenoort67}, pp.~446--463},
}
[2] F. G. Dorais, “Classical consequences of continuous choice principles from intuitionistic analysis,” Preprint, 2011.
[Bibtex]
@article {Dorais11,
AUTHOR = {Dorais, Fran{\c{c}}ois G.},
TITLE = {Classical consequences of continuous choice principles from intuitionistic analysis},
JOURNAL = {preprint},
YEAR = {2011},
URL = {http://boolesrings.org/dorais/files/2011/10/gc.pdf},
}
[3] [doi] Metamathematical investigation of intuitionistic arithmetic and analysis, A. S. Troelstra, Ed., Berlin: Springer-Verlag, 1973.
[Bibtex]
@book {Troelstra73,
TITLE = {Metamathematical investigation of intuitionistic arithmetic and analysis},
SERIES = {Lecture Notes in Mathematics, Vol. 344},
EDITOR = {Troelstra, Anne Sjerp},
PUBLISHER = {Springer-Verlag},
ADDRESS = {Berlin},
YEAR = {1973},
PAGES = {xvii+485},
MRCLASS = {02CXX (02DXX 02HXX)},
MRNUMBER = {0325352 (48 \#3699)},
DOI = {10.1007/BFb0066739},
URL = {http://dx.doi.org/10.1007/BFb0066739},
}
[4] [doi] J. van Oosten, “Lifschitz’ realizability,” J. symbolic logic, vol. 55, iss. 2, pp. 805-821, 1990.
[Bibtex]
@article {VanOosten90,
AUTHOR = {van Oosten, Jaap},
TITLE = {Lifschitz' realizability},
JOURNAL = {J. Symbolic Logic},
FJOURNAL = {The Journal of Symbolic Logic},
VOLUME = {55},
YEAR = {1990},
NUMBER = {2},
PAGES = {805--821},
ISSN = {0022-4812},
CODEN = {JSYLA6},
MRCLASS = {03F50 (03F30 03F55)},
MRNUMBER = {1056390 (92a:03076)},
MRREVIEWER = {Wim Veldman},
DOI = {10.2307/2274666},
URL = {http://dx.doi.org/10.2307/2274666},
}
[5] [doi] J. L. Hirst and C. Mummert, “Reverse mathematics and uniformity in proofs without excluded middle,” Notre dame j. form. log., vol. 52, iss. 2, pp. 149-162, 2011.
[Bibtex]
@article{HirstMummert11,
AUTHOR = {Hirst, Jeffry L. and Mummert, Carl},
TITLE = {Reverse mathematics and uniformity in proofs without excluded middle},
JOURNAL = {Notre Dame J. Form. Log.},
FJOURNAL = {Notre Dame Journal of Formal Logic},
VOLUME = {52},
YEAR = {2011},
NUMBER = {2},
PAGES = {149--162},
DOI = {10.1215/00294527-1306163},
URL = {http://dx.doi.org/10.1215/00294527-1306163},
EPRINT = {1010.5165},
}
 

On computing complex square roots

What could possibly be interesting about complex square roots? That’s what I thought until very recently when I realized that complex square roots are significantly more difficult to compute than real square roots. Let me explain why by putting you in a context where you need to teach students how to compute complex square roots…

There is not much to computing complex square roots; it just takes a little practice to get used to it. So you just need to assign a few homework problems to make sure that students practice enough. How many problems should you assign: one, two, five, twelve, or seventeen? There is no upper bound on how many practice problems are enough, so perhaps you should assign infinitely many problems?

Now, you may find the very idea of an infinite problem set objectionable, but they are in fact very common. Here is one:

For every positive integer $n$, find a square root of $n+i$.

Some might find the following problem infinite too:

Find a square root of $\pi + i\sqrt{17}$.

Here, $\pi + i\sqrt{17}$ is a finite description of a complex number, the complex number being described is actually an infinite object. Hopefully this is enough to convince you that infinite problem sets are not completely unreasonable. Of course, you do need to set some ground rules so that students can at least understand the what the problem set is asking and that they know what a solution entails. Here are the rules:

  • Real numbers are represented by rapidly convergent sequences of rational numbers. That is, a real number is a function $x:\N\to\Q$ such that $|x(s) – x(t)| \leq 2^{-s}$ for all natural numbers $s \lt t$.
  • Complex numbers are pairs $z = x + iy$, where $x$ and $y$ are real numbers represented by rapidly convergent sequences of rational numbers. The real and imaginary parts of $z$ will also be denoted $\Re(z)$ and $\Im(z)$ when convenient.
  • An infinite problem set is an uniformly computable sequence $\seq{z_n}_{n=0}^\infty$ of complex numbers. That is, there is a Turing machine which, on input $n, s$, will give me the rational numbers $x_n(s)$ and $y_n(s)$, where $z_n = x_n + iy_n$ as above.
  • A solution to the infinite problem set $\seq{z_n}_{n=0}^\infty$ is an uniformly computable sequence $\seq{w_n}_{n=0}^\infty$ of complex numbers such that $z_n = w_n^2$ for each $n$.

Grading such a problem set can be a daunting task, but let’s not worry about the aftermath. A more interesting question is:

Does every infinite problem set have a solution?

Indeed, it would be unreasonable to assign a problem set that students can’t possibly solve!

If the problem set were specifically asking for the principal square roots then you could easily devise an infinite problem set that has no solution. Recall that the principal square root of $z = re^{i\theta}$ is $\sqrt{r}e^{i\theta/2}$ where $-\pi \lt \theta \leq \pi$ and $\sqrt{r}$ is the nonnegative square root of the nonnegative real number $r$. In other words, the principal branch agrees with the positive square root on the positive real axis, and it has a branch cut along the negative real axis.

Theorem D. Given any computably enumerable set $A$, there is a uniformly computable sequence of nonzero complex numbers $\seq{z_n}_{n=0}^\infty$ such that the corresponding sequence $\seq{w_n}_{n=0}^\infty$ of principal square roots computes $A$.

Proof. Let’s assume that $A$ is infinite and let $\seq{a_m}_{m=0}^\infty$ be a computable enumeration of $A$, without repetitions. Define the sequence $\seq{z_n}_{n=0}^\infty$ by $$z_n = \begin{cases} -1 – 2^{-m}i & \text{when $n = a_m$,} \cr -1 & \text{when $n \notin A$.} \end{cases}$$ To see that this is uniformly computable, recall that $z_n = x_n + iy_n$ where $x_n$ and $y_n$ are two rapidly convergent sequences of rationals. The first sequence $x_n$ is constant with value $-1$, which is definitely computable. The second sequence $y_n$ can be defined by $$y_n(s) = \begin{cases} 2^{-m} & \text{when $m \leq s$ and $a_m = n$,} \cr 0 & \text{otherwise.} \end{cases}$$ Because we require $m \leq s$, we only have finitely many possible $m$ to check in order to determine which case to use. So $y_n$ is computable too. Since this process is uniform in $n$, the sequence $\seq{z_n}_{n=0}^\infty$ is in fact uniformly computable.

Now suppose $\seq{w_n}_{n=0}^\infty$ is the corresponding sequence of principal square roots. Then we must have $\Im(w_n) \lt 0$ when $n \in A$, and $\Im(w_n) \gt 0$ when $n \notin A$. So we can use the sign of $\Im(w_n)$ to determine whether or not $n$ is in the range of $f$. More precisely, if $w_n = u_n + iv_n$ then we can look at the terms of the rapidly convergent sequence $v_n$ until we find $s$ such that $|v_n(s)| \gt 2^{-s}$; such an $s$ will eventually be found since we know that $v_n \neq 0$. Then the sign of $v_n(s)$ will tell us whether $n \in A$ or not. Therefore, $A$ is computable from $\seq{w_n}_{n=0}^\infty$. QED

Choosing $A$ to be the halting set will trick many students, but solutions to infinite problem sets do not require principal square roots. Some students might use a slightly different convention and define the principal square root of $-1$ to be $-i$ instead of $i$. In that case, the above trick doesn’t work since such a student will give me a sequence of square roots $\seq{w_n}_{n=0}^\infty$ where $\Im(w_n) \lt 0$ for every $n$. However, it turns out that you can design an infinite problem set that will simultaneously deal with all such variations. Let us say that $w$ is a weakly principal square root of $z$ if $w$ is the principal square root, or $z$ is a negative real number in which case $w$ can be any of the two square roots.

Theorem C. For every pair $A$, $B$ of disjoint computably enumerable sets, there is a uniformly computable sequence $\seq{z_n}_{n=0}^\infty$ of nonzero complex numbers such that any corresponding sequence of weakly principal square roots computes a separating set for $A$ and $B$.

Proof. Let $\seq{a_m}_{m=0}^\infty$ and $\seq{b_m}_{m=0}^\infty$ be computable enumerations of $A$ and $B$, respectively. Similar to the proof of Theorem A, define $$z_n = \begin{cases} -1 + 2^{-m}i & \text{when $a_m = n$,} \cr -1 – 2^{-m}i & \text{when $b_m = n$,} \cr -1 & \text{when $n \notin A \cup B$.} \end{cases}$$ Suppose we have a corresponding sequence $\seq{w_n}_{n=0}^\infty$ of weakly principal square roots. It must be that $\Im(w_n) \gt 0$ (in fact, $\Im(w_n) \approx 1$) when $n \in A$, and $\Im(w_n) \lt 0$ (in fact, $\Im(w_n) \approx -1$) when $n \in B$. We could have $w_n = \pm i$ when $n \notin A \cup B$, but we still know that $\Im(w_n) \neq 0$. We can still use the sign of $\Im(w_n)$ to separate $A$ and $B$ since $$A \subseteq \set{n \in \N : \Im(w_n) \gt 0}, \quad B \subseteq \set{n \in \N : \Im(w_n) \lt 0},$$ and the two supersets are complementary $\Sigma^0_1$ sets. QED

Choosing $A$ and $B$ to be an inseparable pair of computably enumerable sets, many of your students will be unable to solve this problem set. That is, you will trick all your students who don’t realize that the choice of branch cut for square roots is completely arbitrary. Any student that chooses the branch cut at an angle of $\pi/2$ instead of $\pi$ will have no trouble solving this infinite problem set.

It is not difficult to adapt the above to simultaneously deal with a bunch of plausible branch cuts $\pi, \pi/2, -\pi/2, \pi/3, \ldots$ However, you can’t deal with all possible branch cuts. While there are only countably many computable branch cuts that students could choose from, but there is no computable enumeration of all of these choices. In fact, a clever student can always pick a branch cut that you haven’t considered.

Theorem B. If $\seq{z_n}_{n=0}^\infty$ is a uniformly computable sequence of nonzero complex numbers, then there is a corresponding uniformly computable sequence $\seq{w_n}_{n=0}^\infty$ such that $w_n^2 = z_n$ for every $n$.

Proof. The trick is to select a branch cut angle $\theta$ that avoids all the numbers $z_n$. This can be done by a trisection argument. Initially set $\theta_0^{-} = -1$ and $\theta_0^{+} = 1$, say. Once $\theta_n^{\pm}$ have been chosen, set $\delta_n = (\theta_n^{+}-\theta_n^{-})/3$ and compute $z_n$ to enough precision that you can determine that $z_n/|z_n|$ does not belong to one of the three sets $$\set{e^{i\theta} : \theta_n^{-} \leq \theta \leq \theta_n^{-} + \delta_n},$$ $$\set{e^{i\theta} : \theta_n^{-} + \delta_n \leq \theta \leq \theta_n^{+} – \delta_n},$$ $$\set{e^{i\theta} : \theta_n^{+} – \delta_n \leq \theta \leq \theta_n^{+}}.$$ In the first case set $\theta_{n+1}^{-} = \theta_n^{-}, \theta_{n+1}^{+} = \theta_n^{-} + \delta_n$; in the second case set $\theta_{n+1}^{-} = \theta_n^{-} + \delta_n, \theta_{n+1}^{+} = \theta_n^{+} – \delta_n$; in the third case set $\theta_{n+1}^{-} = \theta_n^{+} – \delta_n, \theta_{n+1}^{+} = \theta_n^{+}$.

This process defines two rapidly convergent sequences of rational numbers $\seq{\theta_n^{\pm}}_{n=0}^\infty$ that represent the same real number $\theta$. The branch angle $\theta$ so defined is computable and avoids all the numbers $z_n$. So there is no problem computing the principal square roots $w’_n$ of the numbers $z’_n = z_n/e^{i\theta}$. The solution to the infinite problem set $\seq{z_n}_{n=0}^\infty$ is then $\seq{w’_ne^{i\theta/2}}_{n=0}^\infty$. QED

Now the fact that the complex numbers $z_n$ were nonzero was very important in this argument. A student using this strategy will stall at any stage $n$ where $z_n = 0$. However, some very clever students will anticipate this possibility and avoid this problem.

Theorem A. If $\seq{z_n}_{n=0}^\infty$ is a uniformly computable sequence of complex numbers, then there is a corresponding uniformly computable sequence $\seq{w_n}_{n=0}^\infty$ such that $w_n^2 = z_n$ for every $n$.

Proof. Although there may be no computable way to determine which $z_n$ are zero, the set $A = \set{n \in \N : z_n \neq 0}$ is computably enumerable since $z_n \neq 0$ is a $\Sigma^0_1$ statement. For simplicity, let’s assume that this set is infinite and let $\seq{a_k}_{k=0}^\infty$ be a computable enumeration of this set (without repetitions). We can then use the method from Theorem B to compute a sequence of square roots $\seq{w’_k}_{k=0}^\infty$ for the subsequence $\seq{z_{a_k}}_{k=0}^\infty$. The desired sequence of square roots is then $$w_n = \begin{cases} w_k & \text{when $a_k = n$,} \cr 0 & \text{when $n \notin A$.} \end{cases}$$

To see that the sequence $\seq{w_n}_{n=0}^\infty$ is uniformly computable, recall that $w’_k = u’_k + iv’_k$ where $u’_k$ and $v’_k$ are rapidly convergent sequences of rational numbers. So we can define $w_n = u_n + iv_n$ where $$(u_n(s),v_n(s)) = \begin{cases} (u’_k(s),v’_k(s)) & \text{when $k \leq p(n,s)$ and $a_k = n$,} \cr (0,0) & \text{otherwise.} \end{cases}$$ These will be a rapidly convergent sequences of rationals provided that the computable function $p(n,s)$ is suitably chosen.

To define $p(n,s)$ we need to make sure that the first time $s$ where $(u_n(s),v_n(s)) \neq (0,0)$, we don’t have $|u_n(s)| \gt 2^{-s}$ or $|v_n(s)| \gt 2^{-s}$. Should this happen, we necessarily have $|z_n| \gt 4^{-s}$. So we can look at $z_n(2s+2) = x_n(2s+2) + iy_n(2s+2)$. Since $|z_n – z_n(2s+2)| \leq 4^{-s}/2$, if $|z_n(2s+2)| \leq 4^{-s}/2$ then $|z_n| \leq 4^{-s}$ and we’re safe to define $(u_n(s),v_n(s)) = (0,0)$. Otherwise, we know that $z_n \neq 0$ so that $n = a_k$ for some $k$. Therefore, we can define $p(n,s) = 0$ when $|z_n(2s+2)| \leq 4^{-s}/2$, and $p(n,s) = k$ such that $a_k = n$ when $|z_n(2s+2)| \gt 4^{-s}/2$. QED

So every infinite problem set does have a solution! Moreover, any student that reaches the solution of Theorem A clearly understands complex square roots (and basic computability theory!), so infinite problem sets might not be such a bad idea after all…

So what does this have to do with reverse mathematics? Well, the four theorems above can be used to show that:

  • It is provable in $\mathsf{RCA}_0$ that every infinite sequence of complex numbers has a corresponding sequence of complex square roots.
  • It is not provable in $\mathsf{RCA}_0$ that every infinite sequence of complex numbers has a corresponding sequence of principal square roots. In fact, this $\Pi^1_2$ statement is equivalent to $\Sigma^0_1$-comprehension (i.e. $\mathsf{ACA}_0$).
  • It is also not provable in $\mathsf{RCA}_0$ that every infinite sequence of complex numbers has a corresponding sequence of weakly principal square roots. In fact, this $\Pi^1_2$ statement is equivalent to $\Sigma^0_1$-separation (i.e. $\mathsf{WKL}_0$).

In fact, there is nothing very special about square roots. The same applies to the inverse of any nonlinear polynomial over $\C$. Therefore, you must be very careful when using the Fundamental Theorem of Algebra in $\mathsf{RCA}_0$…