The Cantor set provides some of the most pathological examples in real analysis. Introduced by G. Cantor in 1883, the Cantor set (or Cantor dust) can be thought of as the remainder of the unit interval after removing open middle thirds *ad infinitum*. In the following, we discuss the pathology relating to the “size” of the Cantor set where, depending on how you define it, the Cantor set has the size of a point, the entire real line, or somewhere in-between.

#### The Cantor Set

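*The Cantor set is the set obtained from the unit interval by successively removing open middle thirds*. Formally, it is the intersection of countably many closed sets. Start by letting $C_0 = [0,1]$. The first middle third we are tasked with removing is $(1/3, 2/3)$, so let $C_1 = C_0 \setminus (1/3, 2/3) = [0, 1/3] \cup [2/3, 1]$. In general, the $n$-th step in this procedure is

$$C_n = C_{n-1} \setminus \bigcup_{k=0}^{3^{n-1}-1} \left( \frac{3k+1}{3^n}, \frac{3k+2}{3^n} \right)$$

so that we have a nested sequence of closed sets $C_0 \supset C_1 \supset C_2 \supset \cdots$. Also observe that *$C_n$ is the union of $2^n$ closed intervals each of length $3^{-n}$*. We then define the Cantor set as

$$C = \bigcap_{n=0}^{\infty} C_n.$$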
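The construction is easy to experiment with. The following is a minimal Python sketch, not part of the original argument; the function name `cantor_stage` is purely illustrative. It lists the closed intervals that make up $C_n$, using exact fractions so that the endpoints are not blurred by floating-point rounding.

```python
from fractions import Fraction

def cantor_stage(n):
    """Return the closed intervals making up C_n as (left, right) pairs."""
    intervals = [(Fraction(0), Fraction(1))]  # C_0 = [0, 1]
    for _ in range(n):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            # Drop the open middle third, keep the two closed outer thirds.
            next_intervals.append((a, a + third))
            next_intervals.append((b - third, b))
        intervals = next_intervals
    return intervals

for n in range(3):
    print(n, [(str(a), str(b)) for a, b in cantor_stage(n)])
```

Each pass doubles the number of intervals and divides their length by three, which matches the count $2^n$ and the length $3^{-n}$.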

The image below is a pictorial description of the first few steps in forming the Cantor set:

It’s not uncommon to ask if we may have removed all of $[0,1]$ during this process so that $C$ is empty. It is not too hard to see that $C$ contains the endpoints of all the intervals making up the $C_n$. Indeed, if $x$ is an endpoint of an interval in $C_n$, then $x \in C_m$ for $m \le n$ because the $C_m$ are nested. But $x \in C_m$ for $m > n$ as well because we are only removing open middle thirds and an endpoint of an interval can never be in an open middle third. Hence $x \in C$. There are two natural questions we might now want to answer:

- Are there any more points in $C$ besides the endpoints of intervals?
- As all of the endpoints are rationals, is $C \subset \mathbb{Q}$?

Interesting as they are, we will postpone answering these questions until the next section since they naturally lead to discussing the size of $C$ in terms of cardinality.

A more natural question to ask is if $C$ is a closed set. Indeed it is. Each $C_n$ is closed because $C_{n-1}$ is and we only remove an open set from it. Since an arbitrary intersection of closed sets is closed, $C$ is closed. It is also bounded as $C \subset [0,1]$. So in particular, $C$ is compact by Heine-Borel.

However, unlike more familiar compact sets (such as closed intervals), $C$ is totally disconnected. That is, every pair of distinct points $x, y \in C$ is separated by disjoint open sets. *The idea is that $C_n$ consists of finitely many disjoint intervals of length $3^{-n}$ so that if $x, y \in C_n$ and $|x - y| > 3^{-n}$, then $x$ and $y$ belong to different intervals*. For $x \neq y$, choose $n$ large enough such that $3^{-n} < |x - y|$. From this you can produce a pair of disjoint sets containing $x$ and $y$ respectively.

Temptation might then lead you to suspect that $C$ consists of only isolated points since it is totally disconnected. This temptation will lead you astray; $C$ has no isolated points. *The idea is that we can use the endpoints of the intervals in the $C_n$ to construct a sequence converging to any $x \in C$.* As $x \in C$, we have $x \in C_n$ for each $n$, and as each $C_n$ is a union of closed intervals of length $3^{-n}$ we can always find an $x_n \in C$ (it's an endpoint of an interval in $C_n$) such that $x_n \neq x$ and $|x - x_n| \le 3^{-n}$. The sequence $(x_n)$ in $C \setminus \{x\}$ will converge to $x$. Observe that since $C$ is closed and contains no isolated points, $C$ is a perfect set. As a conclusion to this section, it is a theorem that these properties uniquely characterize the Cantor set:

Theorem: The Cantor set is the only nonempty compact, totally disconnected, perfect metric space up to homeomorphism.

#### Size by Cardinality

The size of the Cantor set is *large* in the sense of cardinality. In fact it has the same cardinality as the real numbers; in particular, it is uncountable. Loosely speaking, this means our procedure of successively removing open middle thirds does not see the cardinality of $[0,1]$. The idea behind showing the Cantor set is uncountable does deserve some comment, however. We first recall a simple fact that can be proved using contradiction and a standard diagonal argument:

Theorem: The set of infinite binary sequences is uncountable.

*The idea behind proving $C$ is uncountable is to construct a bijection between $C$ and the set $\{0,1\}^{\mathbb{N}}$ of infinite binary sequences by exploiting the binary tree structure of the $C_n$* (see image above). Given $x \in C$, it either lies in the first third or the last third in $C_1$ (see image above). Let $a_1$ be $0$ or $1$ according to $x$ lying in the first or last third respectively. Given $a_1$, $x$ lies in one of two intervals in $C_2$: either the first third or last third of the interval corresponding to $a_1$. Let $a_2$ be $0$ or $1$ according to $x$ lying in the first or last third respectively. Continuing in this manner *ad infinitum* defines a binary sequence $(a_1, a_2, a_3, \dots)$ for every $x \in C$. This mapping turns out to be a bijection between $C$ and $\{0,1\}^{\mathbb{N}}$, showing that $C$ is uncountable.
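The address map can be sketched in Python for finite prefixes. This is only an illustration under stated assumptions: the names `address_of` and `point_from_address` are hypothetical, and `address_of` assumes its input really lies in the Cantor set (it raises an error as soon as the point falls into a removed middle third).

```python
from fractions import Fraction

def point_from_address(bits):
    """Left endpoint of the interval of C_n selected by a finite 0/1 address."""
    # Choosing the last third at step k shifts the point right by 2 / 3^k.
    return sum(Fraction(2 * b, 3 ** k) for k, b in enumerate(bits, start=1))

def address_of(x, n):
    """First n address bits of a point x assumed to lie in the Cantor set."""
    bits = []
    a, b = Fraction(0), Fraction(1)  # current interval of C_k containing x
    for _ in range(n):
        third = (b - a) / 3
        if x <= a + third:        # x is in the first third
            bits.append(0)
            b = a + third
        elif x >= b - third:      # x is in the last third
            bits.append(1)
            a = b - third
        else:
            raise ValueError("x lies in a removed middle third")
    return bits

print(address_of(Fraction(1, 4), 6))
print(point_from_address([1, 0, 1]))
```

The point $1/4$ is a useful test case: its address alternates between $0$ and $1$ forever, so it never becomes an endpoint of any interval in a $C_n$, yet it is never removed.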

#### Size by Lebesgue Measure

The Lebesgue measure $\mu$ is the most natural way of assigning “size” to most subsets of $\mathbb{R}$ in a way that generalizes the notion of length. We say most subsets here because, strangely enough, *for $\mu$ to naturally generalize the notion of length it must be the case that there exist subsets of $\mathbb{R}$ which cannot be measured* (i.e. given a size). Fortunately, these sets are somewhat difficult to construct and don’t pose many issues (see here if you are interested in learning more about such sets). Formally, $\mu$ is a map $\mu : \mathcal{M} \to [0, \infty]$ assigning every **measurable set** $E \in \mathcal{M}$ a size $\mu(E)$. We won’t concern ourselves with what $\mathcal{M}$ is apart from that it contains all open and closed sets. More importantly, $\mu$ is natural in the following sense:

- It assigns the proper length value to intervals: $\mu([a,b]) = b - a$.
- It is **countably additive**: If $\{E_i\}$ is an at most countable collection of measurable sets that are all mutually disjoint, then $\mu\left(\bigcup_i E_i\right) = \sum_i \mu(E_i)$.

The size of the Cantor set is as *small as possible* in the sense of Lebesgue measure. In fact, the Lebesgue measure of the Cantor set is that of a point, namely zero. We call sets of this nature **measure zero** sets. So in the sense of measure theory, the Cantor set and a singleton have the same size. The idea is to show that over the stages of the process, the total measure of the sets we remove approaches $1$. When $n = 1$ we have removed a single interval of measure $1/3$. When $n = 2$, we have removed an additional two intervals, disjoint from each other and from the first interval, each of measure $1/9$. In general, at the $n$th stage we remove $2^{n-1}$ intervals each of measure $3^{-n}$, all of which are pairwise disjoint. Therefore the total measure we remove *ad infinitum* is

$$\sum_{n=1}^{\infty} \frac{2^{n-1}}{3^n} = \frac{1}{3} \sum_{n=0}^{\infty} \left( \frac{2}{3} \right)^n = \frac{1}{3} \cdot \frac{1}{1 - 2/3} = 1$$

where we have used countable additivity. Since $\mu([0,1]) = 1$, this means that $\mu(C) = \mu([0,1]) - 1 = 0$, so the Cantor set has measure zero.
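The partial sums of this series can be checked numerically. The short Python sketch below is an illustration only (`removed_measure` is a name chosen here, not from the original text); it adds up the length removed through stage $n$ with exact fractions and prints what is left of the unit interval.

```python
from fractions import Fraction

def removed_measure(n):
    """Total length removed during stages 1..n: 2^(k-1) intervals of length 3^(-k)."""
    return sum(Fraction(2 ** (k - 1), 3 ** k) for k in range(1, n + 1))

for n in (1, 2, 5, 20):
    removed = removed_measure(n)
    # The remaining length is the measure of C_n, namely (2/3)^n.
    print(n, removed, float(1 - removed))
```

The remaining length after $n$ stages is exactly $(2/3)^n$, which tends to zero, in agreement with the geometric series.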

There is an important point here to be made. It is not too hard to prove that the Lebesgue measure doesn’t *see* finite sets or even countable sets in the sense that they are measure zero. The Cantor set gives an example that the Lebesgue measure doesn’t even see some uncountable sets. This suggests that the Lebesgue measure knows information about the topology of a measurable set $E$. It gets even more strange: there exist compact totally disconnected sets with positive Lebesgue measure! See here for an example of such a set.

#### Size by Dimension

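Let’s detour to talk about self-similar sets in $\mathbb{R}^d$ for a moment. Let $S$ be any bounded subset of $\mathbb{R}^d$. For any positive real number $r$ and any vector $v \in \mathbb{R}^d$ we can define the scaled and shifted sets by

$$rS = \{ rx : x \in S \}, \qquad S + v = \{ x + v : x \in S \}.$$

In other words, $rS$ is $S$ scaled by a factor of $r$ and $S + v$ is $S$ shifted by $v$. We say $S$ is **self similar** if there exists a finite set of vectors $v_1, \dots, v_N$ and a positive real number $r < 1$ such that

$$S = \bigcup_{i=1}^{N} (rS + v_i)$$

and the sets $rS + v_i$ are all pairwise disjoint for $i = 1, \dots, N$. What this definition is saying is that *$S$ consists of $N$ $r$-scaled copies of $S$.* If $S$ is self-similar, then we define the **similarity dimension** of $S$ as

$$\dim S = \frac{\log N}{\log (1/r)}.$$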

What does this dimension mean from an intuitive standpoint? First consider a line segment. When we scale it to $1/2$ its size we need $2 = 2^1$ copies of the scaled segment to cover the original segment. For a square, scaling it to $1/2$ its size requires $4 = 2^2$ copies of the scaled square to cover the original square. For a cube, scaling it to $1/2$ its size requires $8 = 2^3$ copies of the scaled cube to cover the original cube. *In the relations between $N$ and $1/r$, the usual topological dimension is playing the role of the exponent*. The definition of similarity dimension is equivalent to

$$N = \left( \frac{1}{r} \right)^{\dim S}$$

so *the similarity dimension is generalizing the usual topological dimension in the sense of scaling*.

The size of the Cantor set is *in the middle* in this sense of size. With this definition, the Cantor set doesn’t have integer dimension. We first claim $C$ is a self-similar set with $r = 1/3$, $v_1 = 0$ and $v_2 = 2/3$. This means

$$C = \tfrac{1}{3} C \cup \left( \tfrac{1}{3} C + \tfrac{2}{3} \right)$$

and these two sets are disjoint. The second of these two statements is clear since the largest element of $\tfrac{1}{3} C$ is $1/3$ and the smallest element of $\tfrac{1}{3} C + \tfrac{2}{3}$ is $2/3$. To show equality in the above equation, recall that there is a bijection between $C$ and the set of infinite binary sequences $\{0,1\}^{\mathbb{N}}$. If we identify these two sets we can express any element $x \in C$ as an infinite sequence $(a_1, a_2, a_3, \dots)$. It’s easy to see that $x$ is in the first set if and only if $a_1 = 0$ and $x$ is in the second set if and only if $a_1 = 1$. From this fact, the equality above can be deduced. We can then compute the similarity dimension of $C$ as

$$\dim C = \frac{\log 2}{\log 3} \approx 0.6309.$$

So the Cantor set has non-integer dimension!
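To compare this value with the familiar integer cases, here is a small Python sketch of the formula $\log N / \log(1/r)$. The helper name `similarity_dimension` is an illustrative choice, and the segment, square and cube are each halved in size, which is an assumption made for the example rather than something fixed by the definition.

```python
import math

def similarity_dimension(copies, ratio):
    """log(N) / log(1/r) for N disjoint copies scaled by r, with 0 < r < 1."""
    return math.log(copies) / math.log(1 / ratio)

# Familiar shapes halved in size: 2, 4 and 8 copies are needed.
print("segment:", similarity_dimension(2, 1 / 2))
print("square: ", similarity_dimension(4, 1 / 2))
print("cube:   ", similarity_dimension(8, 1 / 2))

# Cantor set: 2 copies, each scaled by 1/3.
print("Cantor: ", similarity_dimension(2, 1 / 3))
```

Up to floating-point rounding, the first three values are the integers one expects, while the Cantor set lands strictly between a point (dimension $0$) and a line (dimension $1$).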

#### Ending Remarks

Hopefully by this point I have convinced you that the Cantor set is quite an odd beast. At the same time, it is in one sense large, another small, and yet another somewhere in the middle. To pay homage to a classic fairy tale, the Cantor set is like the three bears in one and we are but Goldilocks in its home.