## Interpreting the Size of the Cantor Set

The Cantor set provides some of the most pathological examples in real analysis. Introduced by G. Cantor in 1883, the Cantor set (or Cantor dust) can be thought of as the remainder of the unit interval after removing open middle thirds ad infinitum. In the following, we discuss the pathology relating to the “size” of the Cantor set where, depending on how you define it, the Cantor set has the size of a point, the entire real line, or somewhere in-between.

#### The Cantor Set

The Cantor set is the set obtained from the unit interval $[0,1]$ by successively removing open middle thirds. Formally, it is the intersection of a countable number of closed sets. Start by letting $C_{0} = [0,1]$. The first middle third we are tasked with removing is $(\frac{1}{3},\frac{2}{3})$, so let $C_{1} = [0,1]-(\frac{1}{3},\frac{2}{3}) = (0,\frac{1}{3}) \cup (\frac{2}{3},1)$. In general, the $n$-th step in this procedure is

$\displaystyle C_{n} = \frac{C_{n-1}}{3}+\left(\frac{2}{3}+\frac{C_{n-1}}{3}\right)$

so that we have a nested sequence $C_{0} \supset C_{1} \supset C_{2} \supset \cdots$ of closed sets. Also observe that $C_{n}$ is the union of $2^{n}$ closed intervals each of length $3^{-n}$. We then define the Cantor set $C$ as

$\displaystyle C = \bigcap_{n = 1}^{\infty}C_{n}$.

The image below is a pictorial description of the first few steps in forming the Cantor set:

It’s not uncommon to ask if we may have removed all of $[0,1]$ during this process so that $C$ is empty. It is not too hard to see that $C$ contains the endpoints of all the intervals making up the $C_{n}$. Indeed, if $x \in C_{n}$ is an endpoint, then $x \in C_{i}$ for $i \le n$ because the $C_{n}$ are nested. But $x \in C_{i}$ for $i > n$ as well because we are only removing open middle thirds and an endpoint of an interval can never be in an open middle third. Hence $x \in C$. There are two natural questions we might now want to answer:

• Are there any more points in $C$ besides the endpoints of intervals?
• As all of the endpoints are rationals, is $C \subset \mathbb{Q}$?

Interesting as they are, we will postone answering these questions until the next section since they naturally lead to discussing the size of $C$ in terms of cardinality.

A more natural question to ask is if $C$ is a closed set. Indeed it is. Each $C_{n}$ is closed because $C_{0}$ is. Since an arbitrary intersection of closed sets is closed, $C$ is closed. It is also bounded as $C \subset [0,1]$. So in particular, $C$ is compact by Heine-Borel. However, unlike more familar compact sets (such as closed intervals), $C$ is totally disconnected. That is, every pair of points $x$ and $y$ are separated by disjoint sets. The idea is that $C_{n}$ consists of finitely many disjoint intervals of length $3^{-n}$ so that if $x,y \in C_{n}$ and $3^{-n} < |x-y|$, then $x$ and $y$ belong to different intervals. For $x,y \in C \subset C_{n}$, choose $n$ large enough such that $3^{-n} < |x-y|$. From this $n$ you can produce a pair of disjoint sets containing $x$ and $y$ respectively. Temptation might then lead you to suspect that $C$ consists of only isolated points since it is totally disconnected. This temptation will lead you astray; $C$ has no isolated points. The idea is that we can use the endpoints of the intervals in the $C_{n}$ to construct a sequence $(x_{n})$ convering to any $x \in C$. As $x \in C$, $x \in C_{n}$ for each $n \ge 0$ and as each $C_{n}$ is a union of closed intervals of length $3^{-n}$ we can always find an $x_{n} \in C$ (its an endpoint of an interval in $C_{n}$) such that $x_{n} \neq x$ and $|x_{n}-x| \le 3^{-n}$. The sequence $(x_{n})$ in $C$ will converge to $x$. Observe that since $C$ is closed and contains no isolated points, $C$ is a perfect set. As a conclusion to this section, it is a theorem that these properties uniquely characterize the Cantor set:

Theorem: The Cantor set is the only compact, totally disconnected, perfect metric space up to homeomorphism.

#### Size by Cardinality

The size of the Cantor set is large in the sense of cardinality. In fact it is as large as the real numbers, or in other words, it is uncountable. Loosely speaking, this means our procedure of successively removing open middle thirds does not see the cardinality of $[0,1]$. The idea behind showing the Cantor set is uncountable, does deserve some comment however. We first recall a simple fact that can be proved using contradiction and a standard diagonal argument:

Theorem: The set $B$ of infinite binary sequences is uncountable.

The idea behind proving $C$ is uncountable is to construct a bijection between $B$ and $C$ by exploiting the binary tree structure of $C$ (see image above). Given $x \in C$, it either lies in the first third $[0,\frac{1}{3}]$ or the last third $[\frac{2}{3},1]$ in $C_{1}$ (see image above). Let $b_{1}$ be $0$ or $1$ according to $x$ lying in the first or last third respectively. Given $b_{1}$, $x$ lies in one of two intervals in $C_{2}$: either the first third or last third of the interval corresponding to $b_{1}$. Let $b_{2}$ be $0$ or $1$ according to $x$ lying in the first or last third respectively. Continuing in this manner ad infinitum defines a binary sequence $(b_{n})$ for every $x \in C$. This mapping turns out to be a bijection between $C$ and $B$ showing that $C$ is uncountable.

#### Size by Lebesgue Measure

The Lebesgue measure $\lambda$ is the most natural way of assigning “size” to most subsets of $\mathbb{R}$ in a way that most naturally generalizes the notion of length. We say most subsets here because, strangely enough, for $\lambda$ to naturally generalize the notion of length it must be the case that there exists subsets of $\mathbb{R}$ which cannot me measured (i.e. given a size). Fortunately, these sets are somewhat difficult to construct and don’t pose many isues (see here if you are interested in learning more about such sets). Formally, $\lambda:\mathcal{M} \to \mathbb{R}_{\ge 0}$ is a map assigning every measurable set $E \in \mathcal{M}$ a size $\lambda(E)$. We won’t concern ourselves with what $\mathcal{M}$ is apart from that it contains all open and closed sets. More importantly, $\lambda$ is natural in the following sense:

• It assigns the proper length value to intervals: $\lambda([a,b]) = \lambda((a,b)) = b-a$.
• It is countably additive: If $\{E_{i}\}_{i \in I}$ is an at most countable collection of measurable sets that are all mutually disjoint, then $\lambda(\bigcup_{i \in I}E_{i}) = \sum_{i \in I}\lambda(E_{i})$.

The size of the Cantor set is as small as possible in the sense of Lebesgue measure. In fact, the Lebesgue measure of the cantor set is that of a point, namely zero. We call sets of this nature measure zero sets. So in the sense of measure theory, the Cantor set and a singleton have the same size. The idea is to show that at each stage of the process, we are removing sets whose measure is approaching $\lambda([0,1]) = 1$. When $n = 1$ we have removed a single interval of measure $3^{-1}$. When $n = 2$, we have removed an additional two intervals, pairwise disjoint from themselves and from the first interval, of measure $3^{-2}$. In general, at the $n$th stage we remove $2^{n-1}$ intervals each of measure $3^{-n}$ all of which are pairwise disjoint. Therefore the total measure we remove ad infinitum is

$\displaystyle \sum_{n = 1}^{\infty}\left(\frac{2^{n-1}}{3^{n}}\right) = \frac{1}{3}\sum_{n = 1}^{\infty}\left(\frac{2}{3}\right)^{n} = \frac{1}{3}\frac{1}{1-\frac{2}{3}} = 1$

where we have used countable additivity. Since $\lambda([0,1]) = 1$, this means that the Cantor set has measure zero.

There is an important point here to be made. It is not too hard to prove that the Lebesgue measure doesn’t see finite sets or even countable sets in the sense that they are measure zero. The Cantor set gives an example that the Lebesgue measure doesn’t even see uncountable sets. This suggests that the Lebesgue measure knows information about the topology of a measurable set $E \subset \mathbb{R}$. It gets even more strange: there exists compact totally disconnected sets with positive Lebesgue measure! See here for an example of such a set.

#### Size by Dimension

Let’s detour to talk about self-similar sets in $\mathbb{R}^{n}$ for a moment. Let $E$ be any bounded subset of $\mathbb{R}^{n}$. For any positive real number $c$ and any vector $\mathbf{s}$ we can define the scaled and shifted sets by

$cE = \{cx \mid x \in E\}, \qquad E+\mathbf{s} = \{x+\mathbf{s} \mid x \in E\}$.

In other words, $cE$ is $E$ scaled by a factor of $c$ and $E+\mathbf{s}$ is $E$ shifted by $\mathbf{s}$. We say $E$ is self similar if there exists a finite set $\{\mathbf{s}_{n}\}_{n = 1}^{N}$ of vectors and a positive real number $c < 1$ such that

$\displaystyle E = \bigcup_{n = 1}^{N}cE+\mathbf{s}_{n}$

and the sets $E+\mathbf{s}_{n}$ are all pairwise disjoint for $1 \le n \le N$. What this definition is saying is that $E$ consists of $N$ $c$-scaled copies of $E$. If $E$ is self-similar, then we define the similarity dimension of $E$ as

$\mathrm{dim}(E) = -\frac{\log(N)}{\log(c)} = -\log_{c}(N)$.

What does this dimension mean from an intuitive standpoint? First consider a line segment. When we scale it to $c$ its size we need $N = c^{-1}$ copies of the scaled segment to cover the original segment. For a square, scaling it to $c$ its size requires $N = c^{-2}$ copies of the scaled square to cover the original square. For a cube, scaling it to $c$ its size requires $N = c^{-3}$ copies of the scaled square to cover the original square. In the relations between $N$ and $c$, the usual topological dimension is the playing the role of the exponent. The definition of similarity dimension is equivalent to

$N = c^{-\mathrm{dim}(E)}$

so the similarity dimension is generalizing the usual topological dimension in the sense of scaling.

The size of the Cantor set is in the middle in this sense of size. With the definition, the Cantor set doesn’t have integer dimension. We first claim $C$ is a self-similar set with $N = 2$ and $c = \frac{1}{3}$. This means

$C = \left(\frac{1}{3}C+0\right) \bigcup \left(\frac{1}{3}C+\frac{2}{3}\right)$

and these two sets are disjoint. The second of these two statements is clear since the largest element of $\frac{1}{3}C+0$ is $\frac{1}{3}$ and the smallest element of $\frac{1}{3}C+\frac{2}{3}$ is $\frac{2}{3}$. To show equality in the above equation, recall that there is a bijection between $C$ and the set of infinite binary sequences $B$. If we identify these two sets we can express any element $x \in C$ as an infinite sequence $(b_{n})$. It’s easy to see that $(b_{n})$ is in the first set if and only if $b_{1} = 0$ and $(b_{n})$ is in the second set if and only if $b_{1} = 1$. From this fact, the equality above can be deduced. We can then compute the similarity dimension of $C$ as

$\mathrm{dim}(C) = -\log_{\frac{1}{3}}(2) = \log_{3}(2) \sim 0.63$.

So the Cantor set has non-integer dimension!

#### Ending Remarks

Hopefully by this point I have convinced you that the Cantor set is quite an odd beast. At the same time, it is in one sense large, another small, and yet another somewhere in the middle. To pay homage to a classic fairy tale, the Cantor set is like the three bears in one and we are but Goldilocks in its home.