The Cantor set provides some of the most pathological examples in real analysis. Introduced by G. Cantor in 1883, the Cantor set (or Cantor dust) can be thought of as the remainder of the unit interval after removing open middle thirds ad infinitum. In the following, we discuss the pathology relating to the “size” of the Cantor set where, depending on how you define it, the Cantor set has the size of a point, the entire real line, or somewhere in-between.
The Cantor Set
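The Cantor set is the set obtained from the unit interval by successively removing open middle thirds. Formally, it is the intersection of countably many closed sets. Start by letting $C_0 = [0,1]$. The first middle third we are tasked with removing is $(1/3, 2/3)$, so let $C_1 = C_0 \setminus (1/3, 2/3) = [0, 1/3] \cup [2/3, 1]$. In general, the $n$-th step in this procedure is
$$C_n = \frac{C_{n-1}}{3} \cup \left( \frac{2}{3} + \frac{C_{n-1}}{3} \right)$$
so that we have a nested sequence of closed sets $C_0 \supset C_1 \supset C_2 \supset \cdots$. Also observe that $C_n$ is the union of $2^n$ closed intervals each of length $3^{-n}$. We then define the Cantor set $C$ as
$$C = \bigcap_{n=0}^{\infty} C_n.$$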
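As a rough illustration of the construction, here is a minimal Python sketch that builds the set left after $n$ removal steps as a list of closed intervals. The names `cantor_step` and `cantor_level` are made up for this example, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

def cantor_step(intervals):
    """One removal step: scale the current set by 1/3, then add a copy shifted by 2/3."""
    third = Fraction(1, 3)
    left = [(a * third, b * third) for a, b in intervals]
    right = [(a + 2 * third, b + 2 * third) for a, b in left]
    return left + right

def cantor_level(n):
    """Closed intervals (left, right) remaining after n removal steps."""
    intervals = [(Fraction(0), Fraction(1))]  # start from the unit interval
    for _ in range(n):
        intervals = cantor_step(intervals)
    return intervals

for n in range(4):
    level = cantor_level(n)
    # number of intervals and the common interval length at step n
    print(n, len(level), level[0][1] - level[0][0])
```

Each step doubles the number of intervals and divides their length by three, which is where the counts $2^n$ and $3^{-n}$ come from.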
The image below is a pictorial description of the first few steps in forming the Cantor set:

It’s not uncommon to ask if we may have removed all of $[0,1]$ during this process so that $C$ is empty. It is not too hard to see that $C$ contains the endpoints of all the intervals making up the $C_n$. Indeed, if $x \in C_m$ is an endpoint, then $x \in C_n$ for $n \le m$ because the $C_n$ are nested. But $x \in C_n$ for $n > m$ as well because we are only removing open middle thirds and an endpoint of an interval can never be in an open middle third. Hence $x \in C$. There are two natural questions we might now want to answer:
- Are there any more points in $C$ besides the endpoints of intervals?
- As all of the endpoints are rationals, is $C \subset \mathbb{Q}$?
Interesting as they are, we will postpone answering these questions until the next section since they naturally lead to discussing the size of $C$ in terms of cardinality.
A more natural question to ask is if $C$ is a closed set. Indeed it is. Each $C_n$ is closed because $C_{n-1}$ is, starting from the closed interval $C_0$. Since an arbitrary intersection of closed sets is closed, $C$ is closed. It is also bounded as $C \subset [0,1]$. So in particular, $C$ is compact by Heine-Borel.
However, unlike more familiar compact sets (such as closed intervals), $C$ is totally disconnected. That is, every pair of distinct points $x$ and $y$ in $C$ is separated by disjoint open sets that together cover $C$. The idea is that $C_n$ consists of finitely many disjoint intervals of length $3^{-n}$ so that if $x, y \in C_n$ and $|x - y| > 3^{-n}$, then $x$ and $y$ belong to different intervals. For $x \neq y$ in $C$, choose $n$ large enough such that $3^{-n} < |x - y|$. From this $C_n$ you can produce a pair of disjoint sets containing $x$ and $y$ respectively.
Temptation might then lead you to suspect that $C$ consists of only isolated points since it is totally disconnected. This temptation will lead you astray; $C$ has no isolated points. The idea is that we can use the endpoints of the intervals in the $C_n$ to construct a sequence $(x_n)$ converging to any $x \in C$. As $x \in C$, $x \in C_n$ for each $n$, and as each $C_n$ is a union of closed intervals of length $3^{-n}$ we can always find an $x_n \in C$ (it's an endpoint of an interval in $C_n$) such that $x_n \neq x$ and $|x_n - x| \le 3^{-n}$. The sequence $(x_n)$ in $C \setminus \{x\}$ will converge to $x$. Observe that since $C$ is closed and contains no isolated points, $C$ is a perfect set. As a conclusion to this section, it is a theorem that these properties uniquely characterize the Cantor set:
Theorem: The Cantor set is the only nonempty, compact, totally disconnected, perfect metric space up to homeomorphism.
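To make the "no isolated points" argument concrete, the sketch below locates, at each stage, the surviving interval that contains a given point and picks one of its endpoints different from that point. The helper names `interval_containing` and `nearby_endpoint` are hypothetical, and the sample point $1/4$ is chosen because its ternary expansion $0.020202\ldots$ uses only the digits 0 and 2, so it survives every removal step.

```python
from fractions import Fraction

def interval_containing(x, n):
    """Interval (a, b) of the n-th stage that contains x, or None if x was removed."""
    a, b = Fraction(0), Fraction(1)
    if not (a <= x <= b):
        return None
    for _ in range(n):
        third = (b - a) / 3
        if x <= a + third:
            b = a + third          # x is in the first third
        elif x >= b - third:
            a = b - third          # x is in the last third
        else:
            return None            # x fell into a removed open middle third
    return a, b

def nearby_endpoint(x, n):
    """An endpoint of x's stage-n interval that is different from x."""
    a, b = interval_containing(x, n)
    return a if a != x else b

x = Fraction(1, 4)
for n in range(1, 6):
    x_n = nearby_endpoint(x, n)
    # the endpoint differs from x and is within 3^-n of it
    print(n, x_n, x_n != x, abs(x_n - x) <= Fraction(1, 3 ** n))
```

The endpoints produced this way all belong to the Cantor set and get arbitrarily close to the chosen point, which is exactly what rules out isolated points.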
Size by Cardinality
The size of the Cantor set is large in the sense of cardinality. In fact it is as large as the real numbers, or in other words, it is uncountable. Loosely speaking, this means our procedure of successively removing open middle thirds does not see the cardinality of $[0,1]$. The idea behind showing the Cantor set is uncountable does deserve some comment, however. We first recall a simple fact that can be proved by contradiction using a standard diagonal argument:
Theorem: The set $\{0,1\}^{\mathbb{N}}$ of infinite binary sequences is uncountable.
The idea behind proving $C$ is uncountable is to construct a bijection between $C$ and $\{0,1\}^{\mathbb{N}}$ by exploiting the binary tree structure of $C$ (see image above). Given $x \in C$, it either lies in the first third $[0, 1/3]$ or the last third $[2/3, 1]$ in $C_1$ (see image above). Let $a_1$ be $0$ or $1$ according to $x$ lying in the first or last third respectively. Given $a_1$, $x$ lies in one of two intervals in $C_2$: either the first third or last third of the interval corresponding to $a_1$. Let $a_2$ be $0$ or $1$ according to $x$ lying in the first or last third respectively. Continuing in this manner ad infinitum defines a binary sequence $(a_n)$ for every $x \in C$. This mapping turns out to be a bijection between $C$ and $\{0,1\}^{\mathbb{N}}$ showing that $C$ is uncountable.
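The first few terms of this binary sequence can be computed directly. The sketch below is only an illustration: `address` and `point_from_bits` are invented names, and only finitely many terms are handled, so it shows the first levels of the binary tree and not the full bijection.

```python
from fractions import Fraction

def address(x, depth):
    """First `depth` terms of the binary sequence attached to a point x of the Cantor set."""
    a, b = Fraction(0), Fraction(1)
    bits = []
    for _ in range(depth):
        third = (b - a) / 3
        if x <= a + third:
            bits.append(0)         # first third of the current interval
            b = a + third
        elif x >= b - third:
            bits.append(1)         # last third of the current interval
            a = b - third
        else:
            raise ValueError("x is not in the Cantor set")
    return bits

def point_from_bits(bits):
    """Left endpoint of the interval selected by a finite address: sum of 2*a_k / 3^k."""
    return sum((Fraction(2 * bit, 3 ** k) for k, bit in enumerate(bits, start=1)), Fraction(0))

print(address(Fraction(1, 4), 6))        # [0, 1, 0, 1, 0, 1]
print(point_from_bits([0, 1, 0, 1]))     # 20/81
```

Going from a binary sequence back to a point amounts to writing the point in base 3 with each binary digit doubled, so a sequence and the point it encodes determine each other.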
Size by Lebesgue Measure
The Lebesgue measure is the standard way of assigning “size” to most subsets of $\mathbb{R}$ in a way that generalizes the notion of length. We say most subsets here because, strangely enough, for $\lambda$ to naturally generalize the notion of length it must be the case that there exist subsets of $\mathbb{R}$ which cannot be measured (i.e. given a size). Fortunately, these sets are somewhat difficult to construct and don’t pose many issues (the Vitali set is the classic example if you are interested in learning more about such sets). Formally, $\lambda$ is a map assigning every measurable set $E \in \mathcal{M}$ a size $\lambda(E) \in [0, \infty]$. We won’t concern ourselves with what $\mathcal{M}$ is apart from that it contains all open and closed sets. More importantly, $\lambda$ is natural in the following sense:
- It assigns the proper length value to intervals: $\lambda([a,b]) = b - a$.
- It is countably additive: If $\{E_n\}$ is an at most countable collection of measurable sets that are all mutually disjoint, then $\lambda\left(\bigcup_n E_n\right) = \sum_n \lambda(E_n)$.
The size of the Cantor set is as small as possible in the sense of Lebesgue measure. In fact, the Lebesgue measure of the Cantor set is that of a point, namely zero. We call sets of this nature measure zero sets. So in the sense of measure theory, the Cantor set and a singleton have the same size. The idea is to show that as the process continues, the total measure of the sets we have removed approaches $1$. When $n = 1$ we have removed a single interval of measure $1/3$. When $n = 2$, we have removed an additional two intervals, disjoint from each other and from the first interval, each of measure $1/9$. In general, at the $n$th stage we remove $2^{n-1}$ intervals each of measure $3^{-n}$, all of which are pairwise disjoint. Therefore the total measure we remove ad infinitum is
$$\sum_{n=1}^{\infty} \frac{2^{n-1}}{3^n} = \frac{1}{3} \sum_{n=0}^{\infty} \left( \frac{2}{3} \right)^n = \frac{1}{3} \cdot \frac{1}{1 - 2/3} = 1$$
where we have used countable additivity. Since $\lambda([0,1]) = 1$, this means that the Cantor set has measure $1 - 1 = 0$.
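The partial sums of this series are easy to check with exact arithmetic. The following is a small sketch (the function name `removed_measure` is made up here) that adds up the lengths removed through stage $n$ and compares what is left with $(2/3)^n$.

```python
from fractions import Fraction

def removed_measure(n):
    """Total length removed during stages 1..n: 2^(k-1) intervals of length 3^-k at stage k."""
    return sum((Fraction(2 ** (k - 1), 3 ** k) for k in range(1, n + 1)), Fraction(0))

for n in (1, 2, 5, 20):
    removed = removed_measure(n)
    remaining = 1 - removed  # total length still present after n stages
    print(n, float(removed), remaining == Fraction(2, 3) ** n)
```

The remaining length after $n$ stages is $(2/3)^n$, which tends to zero, in line with the removed lengths summing to one.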
There is an important point to be made here. It is not too hard to prove that the Lebesgue measure doesn’t see finite sets or even countable sets, in the sense that they have measure zero. The Cantor set gives an example showing that the Lebesgue measure doesn’t even see some uncountable sets. This might suggest that the Lebesgue measure knows information about the topology of a measurable set. It gets even more strange: there exist compact totally disconnected sets with positive Lebesgue measure! The Smith-Volterra-Cantor set (also called the fat Cantor set) is an example of such a set.
Size by Dimension
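Let’s detour to talk about self-similar sets in $\mathbb{R}^d$ for a moment. Let $S$ be any bounded subset of $\mathbb{R}^d$. For any positive real number $r$ and any vector $v \in \mathbb{R}^d$ we can define the scaled and shifted sets by
$$rS = \{ rx : x \in S \}, \qquad S + v = \{ x + v : x \in S \}.$$
In other words, $rS$ is $S$ scaled by a factor of $r$ and $S + v$ is $S$ shifted by $v$. We say $S$ is self-similar if there exists a finite set $\{v_1, \ldots, v_N\}$ of vectors and a positive real number $r < 1$ such that
$$S = \bigcup_{i=1}^{N} (rS + v_i)$$
and the sets $rS + v_i$ are all pairwise disjoint for $i = 1, \ldots, N$. What this definition is saying is that $S$ consists of $N$ $r$-scaled copies of $S$. If $S$ is self-similar, then we define the similarity dimension of $S$ as
$$\dim(S) = \frac{\log N}{\log(1/r)}.$$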
What does this dimension mean from an intuitive standpoint? First consider a line segment. When we scale it to $r$ times its size we need $N = 1/r$ copies of the scaled segment to cover the original segment. For a square, scaling it to $r$ times its size requires $N = 1/r^2$ copies of the scaled square to cover the original square. For a cube, scaling it to $r$ times its size requires $N = 1/r^3$ copies of the scaled cube to cover the original cube. In the relations between $N$ and $r$, the usual topological dimension is playing the role of the exponent. The definition of similarity dimension is equivalent to
$$N = \left( \frac{1}{r} \right)^{\dim(S)}$$
so the similarity dimension is generalizing the usual topological dimension in the sense of scaling.
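The segment, square and cube examples can be checked numerically. In this sketch the function name `similarity_dimension` and the scale `k = 4` are arbitrary choices made for illustration.

```python
import math

def similarity_dimension(copies, ratio):
    """Solve copies = (1 / ratio) ** d for d."""
    return math.log(copies) / math.log(1 / ratio)

k = 4  # hypothetical choice: each scaled copy is 1/k the size of the original
for name, copies in [("segment", k), ("square", k ** 2), ("cube", k ** 3)]:
    print(name, round(similarity_dimension(copies, 1 / k), 10))
```

For these three shapes the computed value agrees with the usual dimensions 1, 2 and 3, whichever scale is chosen.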
The size of the Cantor set is in the middle in this sense of size. With the definition, the Cantor set doesn’t have integer dimension. We first claim $C$ is a self-similar set with $r = 1/3$ and $\{v_1, v_2\} = \{0, 2/3\}$. This means
$$C = \frac{C}{3} \cup \left( \frac{C}{3} + \frac{2}{3} \right)$$
and these two sets are disjoint. The second of these two statements is clear since the largest element of $C/3$ is $1/3$ and the smallest element of $C/3 + 2/3$ is $2/3$. To show equality in the above equation, recall that there is a bijection between $C$ and the set of infinite binary sequences $\{0,1\}^{\mathbb{N}}$. If we identify these two sets we can express any element $x \in C$ as an infinite sequence $(a_1, a_2, a_3, \ldots)$. It’s easy to see that $x$ is in the first set if and only if $a_1 = 0$ and $x$ is in the second set if and only if $a_1 = 1$. From this fact, the equality above can be deduced. We can then compute the similarity dimension of $C$ as
$$\dim(C) = \frac{\log 2}{\log 3} \approx 0.6309.$$
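A finite-level version of this self-similarity can be checked by machine. The sketch below is only a sanity check under a simplifying assumption: it works with the left endpoints of the intervals at a fixed stage (points whose binary sequence is eventually zero) and does not handle the whole Cantor set. The name `left_endpoints` is invented for the example.

```python
from fractions import Fraction
from itertools import product
import math

def left_endpoints(n):
    """Left endpoints of the 2^n stage-n intervals, written as sums of 2*a_k / 3^k."""
    return {
        sum((Fraction(2 * a, 3 ** k) for k, a in enumerate(bits, start=1)), Fraction(0))
        for bits in product((0, 1), repeat=n)
    }

n = 6
whole = left_endpoints(n)
small = left_endpoints(n - 1)
first = {x / 3 for x in small}                    # sequences starting with 0
second = {x / 3 + Fraction(2, 3) for x in small}  # sequences starting with 1

# the two scaled copies are disjoint and together give back the original set
print(whole == first | second, first.isdisjoint(second))
print(math.log(2) / math.log(3))  # similarity dimension, roughly 0.6309
```

Two copies at scale one third give a dimension of $\log 2 / \log 3$, strictly between that of a point and that of a line segment.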
So the Cantor set has non-integer dimension!
Ending Remarks
Hopefully by this point I have convinced you that the Cantor set is quite an odd beast. At the same time, it is in one sense large, another small, and yet another somewhere in the middle. To pay homage to a classic fairy tale, the Cantor set is like the three bears in one and we are but Goldilocks in its home.