We’ve spent some time contemplating and discussing the intricacies of the infinite. We started off with a very natural abstraction, and quickly got led into a mire of technicality and complexity. With a little work we came through unscathed and ended up with a new appreciation for the unfolding beauty and complexity that we were led to. All of that, however, was a matter of “head in the clouds” contemplation as to what it truly meant to be infinite, and what the continuum actually was. Now that we have that understanding it is time to wander off the path and explore the countryside that such understanding begins to open up for us.

We are going to take a rather circuitous route through all of this, so please bear with me as we wander in odd directions. I want to start by picking up from Paradoxes of the Continuum, Part I and Zeno’s paradoxes — in particular it is finally time to have a look at the previously undiscussed third paradox of Zeno, known as *The Arrow*. The paradox generally runs as follows: picture an arrow speeding toward its target. Now imagine freezing time to just a single moment as the arrow travels; the arrow will be stuck, stationary and not moving. And yet this will be true at each and every moment of the arrow’s flight — the arrow is always stationary! Thus, would conclude Zeno, the concept of movement and change is false; merely an illusion.

As with Zeno’s other paradoxes, also meant to demonstrate that movement is an illusion, most people aren’t convinced. It sounds all well and good, but I think we all tend to suspect there’s a trick somewhere in that reasoning. And indeed there is, but it is a subtle one, and teasing it out will require our new understanding of the continuum. To start, however, we’re going to ask some questions about *speed*.

We tend to have intuitions about speed (or more correctly velocity, but I’m going to avoid vectors for now), and it is these inexact intuitions that Zeno preys upon. We think of speed as the rate of movement of an object. An object that has zero speed is stationary — not moving. But how do we know the speed of an object? We observe its change in position over time, and then normalize that value based on the span of time we were observing the object for. This works well enough, giving an average speed over reasonable spans of time, but it fools us when we have to deal with Zeno’s thought experiment: If we freeze time then surely the change in position will be zero, and doesn’t that mean that the speed will be zero? Well no, contrary to intuition the *instantaneous* speed isn’t zero; to see that, however, we need to think about time as a continuum.

This is the heart of the trick in Zeno’s third paradox: he effectively manages a straddle, viewing time as either discrete or continuous at different times to suit his needs. If we think carefully, and view time as truly continuous, the problem will evaporate.

Given a span of time it is easy enough to calculate a change in position by taking a difference (that is, subtracting the first position from the second). What are we to make, however, of a change in position when there is no span of time involved? What we need is a *continuous difference* between positions — the difference as positions change smoothly and continuously. So let’s think about a moment in time: it is a point on the continuum. We have, however, learned about continuums, and found that a point on a continuum is really an infinite sequence of ever more accurate approximations — a Cauchy sequence. That is, a point on the continuum is an infinite sequence of rational points that “home in” around the desired point; for many points on the continuum (indeed, for most of them!) we can only specify them as an infinite sequence of approximations that get ever closer to, but never quite reach, the point. More importantly, however, we learned that there are many different Cauchy sequences that all refer to the same point (in much the same way as there are many fractions that refer to the same ratio): we simply pick a representative sequence. We could, of course, choose two different representative sequences for the same point. If we’re careful we can choose sequences that have no terms in common — perhaps one sequence is consistently increasing, each term larger than the one before, but by a smaller and smaller amount, toward our desired “limit point”, while the other is consistently decreasing, each term slightly smaller than the one before, down toward the same desired “limit point”.
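The idea of two disjoint sequences closing in on the same point from either side can be made concrete with a small numerical sketch. Here I’ve picked the square root of 2 as the point (a choice of mine, purely for illustration), since it is one of those points we can only ever specify by approximation:

```python
from fractions import Fraction
from math import isqrt

# Two disjoint Cauchy sequences for the same point of the continuum:
# the square root of 2.  One climbs toward it from below, the other
# descends from above; the gap shrinks tenfold at each step.
def from_below(n):
    # sqrt(2) truncated to n decimal places, as an exact rational
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

def from_above(n):
    # sqrt(2) rounded up at the n-th decimal place
    return Fraction(isqrt(2 * 10 ** (2 * n)) + 1, 10 ** n)

for n in range(1, 6):
    print(n, from_below(n), from_above(n))
```

Since the square root of 2 is irrational, no term of either sequence ever equals the point itself, and the two sequences share no terms, yet both specify the same point.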

To make this a little easier to talk about we’ll provide some labels. Let

t_{1}, t_{2}, t_{3}, t_{4}, …

be the moment in time expressed as the sequence tending from below, and let

T_{1}, T_{2}, T_{3}, T_{4}, …

be the same moment, expressed as the sequence tending from above. Now, for each term in each sequence, we’ll have an associated position of the arrow at that time (the term is, in a sense, shorthand for a constant sequence). Thus for the first sequence we’ll have a sequence of positions:

p_{1}, p_{2}, p_{3}, p_{4}, …

and for the second sequence we’ll have a different sequence of positions:

P_{1}, P_{2}, P_{3}, P_{4}, …

With all of that in mind we can set about calculating the speed of the arrow at the chosen moment in time. How do we do that? Well, we can certainly find the difference between positions p_{1} and P_{1}, and since t_{1} and T_{1} are different we can normalize against the span of time between those moments to get *a* speed — call it s_{1}. That is, we can calculate

s_{1}= (P_{1}– p_{1})/(T_{1}– t_{1})

with solid assurance that neither the numerator nor denominator are zero (we specifically chose our sequences t_{n} and T_{n} so this would be the case). Indeed, we can do the same calculation for each set of terms, and find

s_{2}= (P_{2}– p_{2})/(T_{2}– t_{2})

s_{3}= (P_{3}– p_{3})/(T_{3}– t_{3})

and so on, giving us an infinite sequence s_{1}, s_{2}, s_{3}, …, which shouldn’t be that daunting since, if we were being realistic, we were expecting speed to be a continuum as well, and thus we expect speeds to be, ultimately, Cauchy sequences too. So is the sequence Cauchy — is it an ever closer approximation of some single value? In the case of our arrow, the answer is yes. Does the sequence get closer and closer to zero? In the case of our arrow, the answer is no. And so we have a non-zero speed that we calculated by finding the change in position between a moment and itself (normalized, of course, against the span of time over which we observed the change). In other words, at any given moment the arrow isn’t actually stationary — it has some non-zero speed.
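We can watch the sequence of speeds home in on a value by running the calculation numerically. The position function below is a hypothetical stand-in of my own choosing (the argument above doesn’t depend on any particular one); any smoothly moving arrow would do:

```python
# Straddle a chosen moment t* with t_n (from below) and T_n (from above),
# and compute s_n = (P_n - p_n) / (T_n - t_n) for each n.
def p(t):
    # Hypothetical position of the arrow at time t, for illustration only.
    return t ** 3

t_star = 1.0
speeds = []
for n in range(1, 7):
    t_n = t_star - 10 ** -n     # n-th term of the sequence from below
    T_n = t_star + 10 ** -n     # n-th term of the sequence from above
    speeds.append((p(T_n) - p(t_n)) / (T_n - t_n))

print(speeds)
# The values home in on 3: a non-zero instantaneous speed at t* = 1,
# even though the span of time we straddle shrinks toward nothing.
```

Each s_n is a perfectly ordinary discrete calculation; it is the whole sequence of them, taken together, that captures the instantaneous speed.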

There is a technicality that it is worth being aware of: for all of this to make sense we want to know that the speed we get is independent of our choice of Cauchy sequence to describe the chosen moment of time. That is, we don’t want to find that, if we choose two new and different Cauchy sequences that express our chosen moment and proceed with the calculation as above, we end up with a different speed as the result. We can, of course, end up with a different sequence, as long as the sequence tends to the same limit (recall that the same point on the continuum can be expressed as many different Cauchy sequences). Clearing this particular hurdle will be easier later when we have better mathematical machinery and notation, so for now it will suffice to say that, presuming our arrow travels as any normal arrow, we won’t have an issue here.

The real question, however, is: what just happened? With a wave of my hand and a dance of symbols we’ve apparently made the problem disappear, but I can certainly forgive you if you don’t feel any more enlightened as to where Zeno’s paradox breaks down. Let’s go over it all again, but at a higher level, to try and get a feel for what’s really going on here.

Zeno wants us to conclude that the arrow is frozen in a moment in time, that it is stationary. We intuitively think this is the case because we presume that, in a moment of time, the change in position of the arrow is zero, and from this we think of it as having zero speed. The catch is that, for speed to make sense, we require the moment of time in which the arrow is frozen to be non-zero — that is, a moment should capture some span of time. We can think of the moment that way because we are used to thinking of things in terms of discrete units, and we intuitively associate a moment with a discrete unit of time. We think of it as the smallest possible span of time; the fundamental tick of the universe that takes us from one moment to the next. The problem is that time is a continuum, and continuums don’t work that way: there isn’t any clear “next moment”; more specifically, as noted in Paradoxes of the Continuum, Part II, the continuum contains incommensurable points — there simply is no fundamental unit for continuums; it cannot exist.

Now, in Paradoxes of the Continuum, Part II we made sense of the continuum, and these baffling points, by working in terms of Cauchy sequences. The points of the continuum are, in some sense, evanescent; we cannot pin them down, but instead must specify them as an infinite sequence of progressively more accurate approximations. Or, to put it another way, a point on the continuum, a moment in time, is a completed infinite, and as we saw in A Transfinite Landscape we must be careful of our intuitions when dealing with completed infinities.

Now, just as a moment is expressible as an infinite increasingly accurate approximation, the speed of the arrow in that moment (which exists in the continuum of speeds — since we expect speeds to change smoothly just as time does) will also be expressible as an infinite increasingly accurate approximation. How do we find increasingly good approximations of the speed at a given moment? By building each approximation from approximations of the moment in time! This is, in essence, exactly what we have done above. The evanescent nature of points on the continuum — that they can only be apprehended as ever better approximations rather than clear discrete points — is the catch here. Zeno’s paradox relies on us failing to fully grasp this bizarre aspect of the continuum.

To defeat the paradox we have shifted from thinking in terms of discrete differences in position, to continuous (normalized) differences in position — differences that don’t require a discrete “next moment”. A natural question begins to arise: if we can take continuous normalized differences instead of the usual discrete differences, can we evaluate continuous normalized sums instead of the usual discrete sums? And what do we mean by that anyway? It is time to follow our nose, and attempt to resolve these questions, for in doing so we will eventually come to a better understanding of continuous differences as well!

Before trying to work out what we mean by a continuous normalized sum, let’s sort out what a discrete normalized sum looks like, and what would have to change for it to be a continuous sum. A discrete normalized sum is, in practice, an average. If we have a list of populations for each country in the world we can find the average population for a country. To do that we sum up all the populations, and divide by the number of countries.

That covers discrete normalized sums — so what does it mean to be continuous? All we need to do is shift to values that are spread over a continuous rather than discrete domain. Suppose we have a metal bar with a temperature that varies (continuously!) along its length. We can reasonably ask for the average temperature of the metal bar as a whole. Now, however, we don’t have discrete values as we did with the populations of countries; the temperatures are spread over a continuum. Our usual methods of summation won’t work — we need a continuous normalized sum!

As with continuous differences we should expect to find our answer takes the form of an infinite sequence of ever better approximations since we expect our answer to, itself, be a value on a continuum. How do we arrive at these approximations however?

If we look back at how the continuous differences case worked, we see that we arrived at the desired continuous difference via a series of ever better approximations using discrete differences. Following that line of thought, we should look to create discrete sums that approximate our desired continuous sum. To create a discrete sum we can simply pick a finite number of points along the bar, sum up the temperature values at those points, and normalize by dividing by the number of points on the bar we picked — that gives us a discrete normalized sum that we can calculate. Now, obviously in picking merely a finite number of points to sample the temperature at we are getting only an approximation; if we add more points to our existing sample, however, the approximation will improve. Thus if we choose to sample the temperature at more and more points for each successive approximation we will have a sequence of approximations that will get more and more accurate. It’s not hard to see that such a sequence will be a Cauchy sequence, and thus we have specified a point on the continuum of temperatures that will be the average temperature of the bar.
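The sampling scheme can be seen in miniature below. The temperature profile is invented for the sake of the example (nothing above fixes one); what matters is that successive, denser samples settle down toward a single value:

```python
# Average temperature of a 1-metre bar, approximated by sampling the
# (continuously varying) temperature at more and more evenly spaced
# points and taking a discrete normalized sum (i.e. an average).
def temperature(x):
    # Hypothetical profile: warm in the middle, cooler at the ends.
    return 20 + 10 * x * (1 - x)

def sampled_average(num_points):
    xs = [(k + 0.5) / num_points for k in range(num_points)]
    return sum(temperature(x) for x in xs) / num_points

for n in [1, 2, 4, 8, 16, 1024]:
    print(n, sampled_average(n))
# The successive approximations form a Cauchy sequence, settling toward
# the true average (20 + 10/6, or about 21.67, for this profile).
```

As with the speeds earlier, each individual approximation is an ordinary finite calculation; the continuous sum is the point that the whole sequence of approximations specifies.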

As with continuous differences, we again have a technical issue: in this case, ensuring that the finite sets of points we choose for each approximation don’t affect the final result; that is, at each stage we have a choice of which (and how many!) new points to add to our sample, and we want to be sure that our choice of points doesn’t actually change the end result: the Cauchy sequences of approximations are allowed to differ, but their limit, the point of the continuum that they specify, must be the same. For now that’s a little too technical to get into, but suffice to say that, with sufficient care, we can surmount this issue.

Another issue is that of why we are using *normalized* sums; why not just use ordinary sums instead? This comes down to the nature of continuous sums, and can be elucidated by our examples of populations of countries and the temperature of a metal bar. In the discrete example, summing populations of countries, there is a clear way of finding the ordinary sum, the total population of all countries. We can do that because we have a clear sense of units that we are summing over: a country is a unit, each country has a population value, and all of those population values are of equal weight with regard to one another. Indeed any discrete sum, by default, defines the units that are being summed over, since we can regard each discrete value as a discrete unit. In contrast, when we have a continuous domain to sum over, such as the metal bar, there are *no basic units* since such a thing simply does not exist for a continuum! Given that we have no units to sum over, and thus cannot ensure that the values we are summing are of comparable weight to one another, a standard sum no longer quite makes sense. Instead we simply need to pick an arbitrary unit and convert (i.e. *normalize*) our values into that unit system.

In the case of the calculation we made for the metal bar our unit was the bar itself, and thus the continuous normalized sum was the average temperature for the length of bar (the average, normalized per our unit of choice). We could equally well have worked in meters and normalized according to that unit; what we would get is the sum of the temperatures along the bar averaged over one meter of bar. What exactly does that mean? Well if the bar was, for example, three meters long, then the resulting sum over the length of the bar normalized to meters would be three times that of the average for the bar — we’re summing over three meters of bar, but normalizing (averaging) that total over only one meter of bar. Of course if we wanted the average temperature of just the first meter of the bar, and summed only over that first meter things would again make sense as a standard average. The key, however, is having a standardized unit to normalize each individual contribution to the sum against.

This same sort of thinking applies to the continuous differences example: we choose an arbitrary unit of speed (be it meters per second, feet per hour, or what have you) and, for each discrete speed calculation we do, normalize to those units so each successive approximation is comparable to the last. Thus we see that our continuous differences and sums are necessarily *normalized* differences and sums because we simply don’t have a default set of units in the continuous case, and thus have to pick an arbitrary unit and consistently normalize to those units.

So, we now have a method for finding continuous sums, as well as continuous differences. The natural question is: how do they relate to one another? If they are to behave at all similar to discrete sums and differences, we would expect them to be inverses of one another. Does this work? Let’s go back to our example of the arrow, and recall that we can calculate, for each point in time, the speed of the arrow at that moment — the continuous (normalized) difference in position. It should be possible, using continuous sums, to sum up all those speeds. And indeed we can, but if we look a little closer at exactly what is going on, we’ll find something very interesting is happening.

Both the continuous differences and continuous sums need to be normalized; the continuous differences in position are normalized with respect to time, giving us speed, while a continuous sum of speeds is also normalized with respect to time. Intriguingly, because of the way these normalizations work, they will cancel out. Recall that, if we normalized our metal bar against meters, we got an average temperature scaled by the length of the bar that we summed over in meters. If we were to sum up the continuous differences, the speeds, then we would get an average speed scaled by the amount of time we were summing over — but what is an average speed multiplied by the amount of time over which the average was maintained? Why, it is the total change in position over that time! Thus if we take a continuous sum of the continuous differences in position, we arrive back at the total change in position. Likewise if we were to sum up instantaneous acceleration values (continuous differences in speed) we expect to get the total change in speed, and so on. This relationship between continuous sums and continuous differences (as relatively clear as it is when phrased in terms of sums and differences) is *The Fundamental Theorem of Calculus*!
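The cancellation of the two normalizations can be checked numerically. As before, the position function is a hypothetical stand-in of mine; the point is that summing speed weighted by small spans of time recovers the direct change in position:

```python
# A miniature Fundamental Theorem of Calculus: summing instantaneous
# speeds, each weighted by the small span of time it covers, reproduces
# the total change in position over the interval.
def position(t):
    return t ** 3          # hypothetical arrow, for illustration only

def speed(t):
    return 3 * t ** 2      # its instantaneous speed (continuous difference)

def summed_speeds(t0, t1, steps):
    dt = (t1 - t0) / steps
    return sum(speed(t0 + (k + 0.5) * dt) * dt for k in range(steps))

direct = position(2.0) - position(0.0)   # change in position, computed directly
for steps in [10, 100, 1000]:
    print(steps, summed_speeds(0.0, 2.0, steps), "vs", direct)
```

Summing the continuous differences of position, over ever finer subdivisions of the interval, closes in on the plain difference of positions at the two endpoints.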

Indeed, what we have been doing here is the heart of calculus. Ultimately this is what calculus is: the arithmetic of the continuous. And this also begins to explain why it is that calculus is so important: the continuous is all around us; time and space are continuous, and so are many things that inhabit it such as electromagnetic fields. And even things that aren’t strictly continuous, such as the flow of water, can be modelled or approximated as being continuous (calculating the movement of every molecule of water is intractable, but if we approximate water as a continuous medium the problem becomes quite manageable). When so much is best handled as continuous, it is clear that an arithmetic of the continuous, rather than the basic discrete arithmetic we are used to, is vital: it opens up vast new areas of our universe for mathematical exploration that were inaccessible using only discrete arithmetic. And so it was that, with the development of an organised calculus, an arithmetic of the continuous, in the 17th century, physics and our understanding of the universe underwent a stunning revolution that is still ongoing today. And at the heart of this remarkable revolution was a deeper understanding of the age-old abstractions of the infinite and the continuous.


In the previous entries Shifting Patterns and Permutations and Applications we have looked at how patterns and symmetry can be abstracted into algebras. Each different pattern provided its own unique algebra, with different algebraic rules. These are, in a sense, our islands; each different and unique and beautiful. What we need to explore now is the entire bay. To do this we need to take a step back, or more accurately up, and try to examine the whole. The aim is to abstract across all the different algebras that different patterns generate. We could, perhaps, liken this to the process of developing algebra from numbers by abstraction; here, however, we have different pattern-algebras standing in place of different numbers. In abstracting up from numbers to basic algebra we looked for those properties that held true regardless of the particular numbers under consideration — each different number has its own unique properties (indeed, there is a vast richness of material here that is studied in number theory) — but there are certain properties that are common to all. What we seek is a set of basic algebraic rules that are true for all pattern-algebras; each different pattern algebra may have its own idiosyncratic rules, but hopefully we can find a basic set of underlying rules. In practice we’re going to hope for even more than that. Not only do we want the algebraic rules we determine to be common to all pattern-algebras, we would very much like it if *any algebra* that satisfied the basic rules turned out to be the pattern-algebra for some pattern. That is, we want a two way correspondence: every pattern-algebra should satisfy these rules, and everything that satisfies these rules should be a pattern-algebra.

So where to start? First and foremost we know that every pattern always has the null symmetry — the do nothing symmetry: doing nothing to a pattern will always preserve the pattern. Thus it follows that our rules should somehow express this fact. The question then is how to express the existence of a null symmetry in terms of algebraic rules. That, in turn, leads to the question of what algebraic rules make a given symmetry element a null symmetry. Now, the algebraic rules we can have for pattern-algebras are all expressed in terms of combining together different symmetries, so a rule for the null symmetry will be one that expresses how it interacts with other symmetries in combination. How does the null symmetry interact with other symmetries? Since it is the “do nothing” symmetry, combining it with other symmetries gives the same result as if we hadn’t done it at all. That is, we know that a symmetry e is a null symmetry if, for any symmetry a we have

ae = ea = a

Thus our first requirement, or rule, for general pattern-algebras is that there must be some element e of the algebra such that ae = ea = a for any (i.e. every) element a of the algebra.

Next it would be nice to start to characterise the *symmetry* aspect of the pattern-algebras. Ultimately this can be reduced to just a couple of properties, though such a reductionist approach does diminish the transparency of the relationship to symmetry. The first, and easier to grasp, property boils down to the reversibility of symmetries. That is, if a certain action is a symmetry, then doing its opposite and taking everything back to how you started is also a symmetry. This is perhaps a little non-obvious since the “opposite” action doesn’t appear to necessarily be a pattern preserving symmetry in its own right; however, since the result of our initial action is a state that preserves the pattern, and the initial state to which we go back via the “opposite” action also necessarily preserves the pattern (by definition, essentially), it follows that the opposite action necessarily takes pattern preserved states to pattern preserved states (that is, as long as we start in a pattern preserved state, we’ll always end up at another one) and is thus a symmetry. Looking at it from a broader perspective this is really a relatively deep statement about the nature of what we are calling symmetries: they are actions that move from one state to another preserving some pattern that we have deemed it relevant to conserve; it is in the nature of such actions that they be reversible and that the inverse or opposite action is also a pattern preserving action. How do we write that in terms of algebraic rules? The opposite action is going to be one that combines with the initial action to result in effectively a null action. Thus our second requirement is that for any symmetry a there exists some symmetry b such that

ab = ba = e

where e is the null symmetry whose existence we were guaranteed by the first requirement.

The final property we require is the hardest to explain in clear practical terms. It will be easier to just state it, and then try and discuss exactly what it means, and why it might be relevant. The property we require is that for any three elements of the pattern-algebra, call them a, b, and c, we have

a(bc) = (ab)c

This is the associative law which you should recall from A Fraction of Algebra. What we are essentially saying by applying it here is that how we group together composition of symmetries is unimportant to the end result. That this is true of symmetries is relatively clear: given a sequence of symmetry actions the order of the actions matters, but how we group them does not; we can think of some pair, or group of actions in the sequence, as a single atomic action and the end result will be the same. For a more explicit example of this we can think of our example of the symmetries of a square from Shifting Patterns. There we expressed things in terms of two basic actions: a rotation by 90 degrees, r; and a flip about the vertical axis, f. These combined to provide other symmetries, for example fr was the symmetry action of flipping the square about its trailing diagonal. Now, given a sequence of such actions, it didn’t matter whether we thought of it as a diagonal flip about the trailing diagonal axis followed by a rotation by 90 degrees, or a flip about the vertical axis followed by a rotation by 180 degrees, or simply as a flip about the horizontal axis; all amount to the same result, the same rearrangement of corners of the square. Stretching your mind to abstract this to general symmetries will let you see that they too will have the same property. Seeing that this property is the last piece we need to characterise symmetries is a little harder, and perhaps beyond the scope of this entry. Suffice to say this third requirement is the last one we need.

There are, of course, a few unstated assumptions that we’ve been getting away with here. For the purposes of informal discussion that’s fine, but when we get into hammering out the specifics for mathematical purposes we can’t afford to let such ambiguity stand. So, let’s spell out these final details…

In assuming that there are symmetries of pattern we are assuming that there are a set of actions, that is, rearrangements, and that those actions are composable — that we can combine two actions together one after the other. In terms of our pattern algebra that amounts to assuming the existence of some algebraic objects, which we can denote by letters (or other characters if we prefer) and some sort of *binary operation* for those algebraic objects. A binary operation is essentially just a rule that allows us to combine together any two elements and arrive at a third. Addition is a binary operation on numbers; you add together two numbers to get a third. Likewise multiplication is a binary operation for numbers; you multiply two numbers together and get a new number. So what is the binary operation for pattern-algebras? There isn’t one in particular. Indeed, the binary operation is essentially defined by the algebraic rules unique to that pattern-algebra, so different pattern-algebras have different binary operations. There are a set of rules that help narrow down binary operations for pattern algebras in general, and those are, of course, precisely the rules we’ve been discussing previously. What matters, however, is that there exist *some* binary operation that can act on any pair of algebraic objects.

Putting all of this together we can arrive at a formal description of what it takes to be a pattern-algebra. Using abstraction to pare it down to the minimum set of requirements we have: a pattern-algebra is a set of objects, with a binary operation on those objects, such that the following hold:

- There is an object e such that, for any other object a in the set, ea = ae = a.
- For each object a in the set, there is some object b, also in the set, such that ab = ba = e (where e is the special object mentioned in requirement 1).
- For any three objects a, b, and c in the set, we have (ab)c = a(bc).

and that’s it. This sort of very abstract and minimalist definition is exactly what you’ll find at the very beginning of most books on *Group Theory*, since this is the formal definition of what mathematicians call a *group*. Indeed a *group* is really just a pattern-algebra, though in the more rarefied areas of group theory the patterns they relate to can be so hideously complex as to be effectively unimaginable. Since using the same terms as mathematicians will help keep us on the straight and narrow when referring to any outside sources I’ll henceforth be using “group” to mean the sort of pattern-algebra we’ve been discussing throughout Shifting Patterns, Permutations and Applications and this entry. You can, of course, mentally translate “group” to mean “pattern-algebra” to help keep the mental connections to patterns and symmetry clearer.
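The three requirements can also be checked mechanically for a concrete pattern-algebra. Here is a sketch in Python for the symmetries of the square from Shifting Patterns; the corner labelling (and hence the exact tuples chosen for r and f) is my own choice, not something fixed by the text:

```python
from itertools import product

# The symmetries of a square as permutations of its corners 0..3,
# where perm[i] is the corner that corner i is sent to.
def compose(a, b):
    """Apply a first, then b -- the binary operation of this pattern-algebra."""
    return tuple(b[a[i]] for i in range(4))

e = (0, 1, 2, 3)   # the null symmetry: do nothing
r = (1, 2, 3, 0)   # rotation by 90 degrees
f = (1, 0, 3, 2)   # a flip (one possible labelling of the vertical flip)

# Generate the whole pattern-algebra from r and f by repeated composition.
group = {e}
frontier = {r, f}
while frontier:
    group |= frontier
    frontier = {compose(a, b) for a in group for b in group} - group

# Rule 1: a null element.  Rule 2: inverses.  Rule 3: associativity.
assert all(compose(e, a) == a == compose(a, e) for a in group)
assert all(any(compose(a, b) == e == compose(b, a) for b in group) for a in group)
assert all(compose(a, compose(b, c)) == compose(compose(a, b), c)
           for a, b, c in product(group, repeat=3))
print("all three rules hold for the", len(group), "symmetries of the square")
```

Generating from just r and f yields the eight symmetries of the square, and the three assertions verify each of the three requirements in turn.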

So what was all this abstraction for? What exactly have we gained by reducing things to this very abstruse definition in terms of sets and binary operations and algebraic rules? One may as well ask what the point of abstracting over different collections of objects to arrive at the abstruse notion of numbers and arithmetic is. We’ve dropped away all the fine grained particularity, and reduced things to a simple matter of clearly defined rules. Any set of objects with a binary operation that meets these three simple rules can be thought of as a group relating to some pattern. We don’t have to know about the pattern however, we can simply work within the specific rules of the group. More importantly, however, we now have an overview of the entire bay of islands rather than finding ourselves inspecting each unique island. These three simple rules provide the minimal basis for any group. Any group must have at least these three rules, and we can add any extra rules we like to these three base rules (as long as the resulting set of rules is self-consistent) and arrive at a new group. We aren’t bound by patterns any longer, we can explore the unique complexity of any island in the bay by simply building the group of our choosing; by imposing the extra rules, the extra structure, we wish. We need only choose a set of some size, and rules regarding how elements interact with one another.

This, in turn, begins to draw us full circle. The perceptive reader may have noticed that each of the three rules defining a group is listed amongst the algebraic rules for addition of numbers back in A Fraction of Algebra; we had the existence of an additive identity (the number zero) which covers requirement 1, the existence of additive inverses (negative numbers) which covers requirement 2, and the associativity of addition which covers requirement 3. Furthermore, you will hopefully recall that, as discussed in The Slow Road, higher order operations such as subtraction, multiplication, and division, could be built from addition. Numbers form a group; they are a pattern-algebra!

What pattern does the group of numbers under addition relate to? If we are thinking of the integers then you can think of an infinite line of marbles (infinite in both directions), with the various symmetries being horizontal shifts of varying sizes. The fact that such a shift is indeed a symmetry, that it results in exactly the same pattern of marbles, is something we discussed and resolved in our parallel road considering the nature of the infinite. And so now, having scaled the heights, we find we can look back, out across the vast bay with an infinite number and diversity of unique and interesting islands, and see that what we had thought of as the vast plain of numbers with all its intricacies, is, in fact, just a single island amidst a sea of many many more. Certainly there is much to explore on that original island, but that is just the beginning; we now have the perspective to see how much wider the world is; to put our narrow beginnings in their proper place. And still we have only just begun! There is a vast bay of islands yet to explore, and once we have begun to comprehend some of their mysteries we can strike out for higher ground again from which we can look back and see even how narrow our current vista is.


The difficulty is in being sure of your answer*. We require a way to think consistently and coherently about such matters. So what is the answer? Bin A has *no tennis balls* in it, while bin B has an infinite number! Does that sound wrong? It certainly seems confusing: we are consistently putting two balls into bin A and only taking one out at each step, so how can we end up with no balls in bin A? The key is to think in terms of a finished state, when the infinite process is somehow “complete”. Every ball is eventually moved to bin B, thus after an infinite number of steps all the balls must have been moved to bin B. The counterintuitive aspect is that we don’t expect moving one ball at a time to ever catch up with adding two balls at a time, yet, viewed from the completed process, it does.

Another tale that highlights this point is that of the hotel with an infinite number of rooms**. The story usually begins with the hotel finding itself full one evening. A lone traveller then arrives, very weary, and asks the hotel manager if there is any chance at all that he can get a room. The hotel manager ponders this for a moment, and then has an idea. He asks each guest to move to the room numbered one higher than their current room. Since every number has a number one greater, and there are an infinite number of rooms, everyone is housed; and yet room number 1 is now empty, and the traveller has somewhere to stay. It doesn’t end there, though. After the lone traveller, an infinite tour bus arrives, carrying an *infinite* number of passengers all looking for rooms. Having already solved the first problem, however, the hotel manager isn’t fazed. He asks each guest to move into the room with number twice that of their current room. Again, each number has a number twice as large, and there are an infinite number of rooms, so again everyone is housed; this time, however, everyone is housed in even numbered rooms, which leaves infinitely many odd numbered rooms in which the bus-load of tourists can be put up for the night.

This brings us a little closer to the sticky point where our intuitions start to go astray. For any finite number n we expect there to be (roughly, depending on whether n is odd or even) half as many even numbers less than n as there are natural numbers less than n; when we have infinitely many numbers, however, there seem to be exactly as many even numbers as natural numbers. It’s this sort of unexpected equality, with a set we intuitively think should be half as big, that allows the tennis ball problem to fool us. In a sense, looking back from a completed infinity, 1 and 2 look pretty much the same. What it really comes down to, however, is the very simple question of what we mean by “how many”. As we’ve often seen before, the devil is usually in the details, and even simple things that we think we know and understand bear some thinking about if we want to be sure we actually know what we mean.

What happens when we count things? Because counting is a fairly innate skill for most adults, it is helpful to consider what children, for whom counting is still somewhat new, do. Usually they count on their fingers (or other similar things), making a correlation between objects counted and fingers held up. At the more advanced adult level we do much the same thing, but we correlate with abstract objects (numbers, which by that time we’ve solidly beaten into instinctual memory). The point I’m trying to get at here is that counting is a matter of correlation; more importantly, it is a very particular kind of correlation, known in mathematics as a one-to-one correspondence. This means that each object corresponds with exactly one other object, and vice versa: in practice each object corresponds to exactly one number in our count, and each number in the count corresponds to exactly one object. If you can accept that, at the heart of it, it is that one-to-one correspondence that matters in counting, that it is the correspondence that ultimately determines what we mean by quantity, then we can pull out the mathematician’s handy tool of abstraction, forget the other unimportant trivialities we might associate with counting, and use the idea of one-to-one correspondence to “count” infinite quantities. It might not be counting exactly as you would normally do it, but it will have the same core properties that matter about counting and quantity, and in the end, as long as we can agree as to what important parts need to be preserved, we can happily abstract all the rest away.

So how do we count infinite sets with one-to-one correspondences? Rather than actually counting infinitely many things, we provide an explicit process by which the count could (in theory) be done. Thus, we simply try to set up a one-to-one correspondence just as before, the difference being that it will be given as a rule we can apply element by element as needed, rather than having every single element to element correspondence laid out ahead of time. If such a correspondence exists then the sets have the same infinite quantity. And that is exactly what we are doing, for example, with the infinite hotel story. First we are comparing the sizes of the sets {1,2,3,…} and {2,3,4,…} by noting that we have a correspondence

1   2   3   4   ⋯   n    ⋯
↕   ↕   ↕   ↕       ↕
2   3   4   5   ⋯   n+1  ⋯

and that since both sets are infinite we will have exactly one element in the second set for every element in the first set, and vice versa; a one-to-one correspondence. The sets have the same quantity — thus we can shuffle everyone down one room and still house them all. When the tour bus shows up we end up comparing the sizes of the sets {1,2,3,…} and {2,4,6,…} by making the correspondence

1   2   3   4   ⋯   n    ⋯
↕   ↕   ↕   ↕       ↕
2   4   6   8   ⋯   2n   ⋯

where again the infinite sets ensure that each element corresponds to exactly one element in either direction; another one-to-one correspondence, demonstrating the sets have the same quantity — we can move everyone to even numbered rooms and still house them all.
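Both rearrangements are nothing more than one-line rules, and a rule is all an infinite correspondence needs to be. Here is a small sketch (Python; the names are mine) that prints a finite prefix of each correspondence and checks the one-to-one property on that prefix:

```python
shift = lambda n: n + 1     # lone traveller: everyone moves up one room
double = lambda n: 2 * n    # tour bus: everyone moves to an even room

rooms = range(1, 7)
print([(n, shift(n)) for n in rooms])    # (1, 2), (2, 3), ...
print([(n, double(n)) for n in rooms])   # (1, 2), (2, 4), ...

# One-to-one in both directions: no two rooms collide, and the old
# room can always be recovered from the new one.
assert len({shift(n) for n in rooms}) == len(rooms)
assert all(double(n) // 2 == n for n in rooms)
```

The rule applies to every room number, not just the six we printed, which is exactly how a finite description manages to pin down an infinite correspondence.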

At this point most people are happy enough to accept that there are the same number of even numbers as there are natural numbers. Their argument runs roughly “well there is an infinite number of both, and infinity is as big a number as there can be, so of course they’re the same”. There is, possibly, a little squirming under the fact that what is clearly only a part apparently has the same size as the whole, but that tends to get swept under the rug of “infinity is as big as you can go, so we don’t have a choice”. The real problems start, however, when we come to the realisation that, using this same idea of quantity (determined in terms of one-to-one correspondences), we can find sets with sizes larger than the set of natural numbers: there may be infinitely many natural numbers, but that is not as large as you can go!

The classic example of something infinite and larger in size than the natural numbers is the continuum (at least as classically conceived; the constructivist/intuitionist continuum is a little more tricky on this front) as discussed in Paradoxes of the Continuum, Part II. In that post we determined that points on the continuum could be identified with Cauchy sequences, which are akin to (though a little more technical than) infinite decimal expansions. We’ll stick with infinite decimal expansions here, as most people have a better intuitive grasp of decimals than they do of Cauchy sequences or Dedekind cuts. To make things simple we’ll consider the continuum ranging between 0 and 1; that is, all the possible infinite decimals between 0 and 1, such as 0.123123123… We do have to be a little bit careful here since, as you should recall from Paradoxes of the Continuum, Part II, in the same way that there are many fractions that represent the same ratio, there were many Cauchy sequences that represent the same point in the continuum; in particular there are different decimal expansions that represent the same point, such as 0.49999999… and 0.50000000… ***. To be careful we have to make sure we always pick and deal with just one representative in all such cases; to do this we can consider only representations that have infinitely many non-zero places. Showing that this is sufficient, and that it still covers all real numbers between 0 and 1, isn’t that hard, but it amounts to some technical hoop-jumping that is necessary for formal proofs yet not terribly elucidating for discussions such as this. Suffice to say that it all works out.

The catch now is that we need to show not just that we are incapable of finding a one-to-one correspondence between these points on the continuum and the natural numbers, but that no such correspondence can exist. We do this by the somewhat backwards approach of assuming there is such a correspondence, and then showing that a logical contradiction would result. From that we can conclude that any such correspondence would be contradictory, and thus can’t actually exist (at least not in any system that doesn’t have contradictions). So, to begin, let’s presume we have a correspondence like so****

1 ↔ 0.a_{1}a_{2}a_{3}…
2 ↔ 0.b_{1}b_{2}b_{3}…
3 ↔ 0.c_{1}c_{2}c_{3}…
4 ↔ 0.d_{1}d_{2}d_{3}…
⋯

where the a_{1} etc. are just digits in the decimal expansion. The trick is to show that despite our best efforts to set up a one-to-one correspondence, the list of points in the continuum given by this correspondence (and, since we haven’t specified what the correspondence actually is, *any* such one-to-one correspondence) is actually incomplete: we’ve missed some. We do this by constructing a decimal as follows: for the first decimal place, choose a digit different from a_{1}; for the second decimal place choose a digit different from b_{2}; for the third choose a digit different from c_{3}; and so on (to respect our earlier choice of representatives we can also agree to avoid the digits 0 and 9, which sidesteps any issues with dual representations such as 0.4999… and 0.5000…). Now clearly this decimal is between 0 and 1, and hence ought to be in our list somewhere, but by its very manner of construction it isn’t! How can we be sure of that? Consider any decimal on the list, say the nth one; since in constructing our new decimal we specifically chose the digit in the nth decimal place to be different from the nth decimal place of the nth decimal on the list, even if our decimal agrees at every other decimal place, we know it differs at at least one; and if it differs at one decimal place, then it is a different decimal. Since that argument applies to every single decimal on the list, we are guaranteed that our constructed decimal is different from every decimal we’ve listed! Thus, despite our assumption that we had a one-to-one correspondence, we don’t have one: we’ve found a point in the continuum for which there is no corresponding natural number. Given such a contradiction, the only conclusion we can draw is that we cannot create a one-to-one correspondence between natural numbers and points in the continuum. No matter how we try, we’ll always end up with extra points in the continuum for which there is no corresponding natural number; that is, there are more points in the continuum than there are natural numbers.
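The diagonal construction is mechanical enough to express as a procedure. Here is an illustrative Python sketch (the representation is my own): an infinite decimal is modelled lazily as a function from place number to digit, and the new decimal picks 5, or 6 where necessary, so that it differs from the nth listed decimal at the nth place while avoiding the troublesome digits 0 and 9:

```python
def diagonal(listing):
    """Given listing(n) -> nth decimal (itself a function from place
    to digit), return a decimal that differs from every listed one."""
    def new_digit(k):
        d = listing(k)(k)           # the kth digit of the kth decimal
        return 5 if d != 5 else 6   # anything different (and not 0 or 9)
    return new_digit

# An example listing: the nth decimal is 0.nnnn..., digits all n mod 10.
listing = lambda n: (lambda k, n=n: n % 10)

missing = diagonal(listing)
print([missing(k) for k in range(1, 8)])    # [5, 5, 5, 5, 6, 5, 5]

# The construction guarantees a mismatch at the nth place for every n:
assert all(missing(n) != listing(n)(n) for n in range(1, 1000))
```

Nothing about `diagonal` depends on which listing it is handed, which is the whole point: whatever correspondence you propose, the same procedure manufactures a decimal your list has missed.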

What all of this means is that we have to come to terms with the fact that some infinities are bigger than others. In fact, it gets even worse: some infinities are entirely incompatible with others. This particular catch hides in a slightly over-zealous abstraction of numbers. For the most part we do not differentiate between numbers describing order (2nd position, as opposed to 5th position, etc.) and numbers describing quantity (which is generally the notion of number with which we’ve been dealing). There is a perfectly good reason for this: when it comes to using and manipulating numbers using standard arithmetic, the numbers describing order (so called *ordinal numbers*) behave completely identically to those describing quantity (which we call *cardinal numbers*). As so often happens with abstraction (indeed, it essentially is the core idea of abstraction) if there are no practical differences (at least as far as the practical purposes we care about are concerned) between objects, we simply forget that there are any differences at all. And, indeed, for finite numbers this is a perfectly reasonable thing to do. The catch is that, once we start dealing with infinities, ordinals and cardinals start behaving rather differently — it is no longer safe to consider them the same, or even, for that matter, comparable to one another.

I won’t go into the rather technical theory of transfinite ordinals here, and will instead just give you a précis of where the difficulties lie. To start, let’s introduce some standard mathematical notation, and let ℵ_{0} denote the first infinite *cardinal* (that is, the quantity of natural numbers), and let ω denote the first infinite *ordinal* (that is, the first position reached after we’ve exhausted all finite positions). Now, as we’ve already seen, if we have infinitely many rooms, we can house an extra guest even if they’re all full; that is, ℵ_{0} + 1 = ℵ_{0}. On the other hand, if we tack on an extra position after ω and all the finite ones, i.e. we have 1st, 2nd, 3rd, …, ωth, and then one more, that extra position turns out to be appreciably different from all the positions before it. In other words ω + 1 ≠ ω. So, whereas with finite numbers adding one produces the same result for both ordinals and cardinals, for infinite numbers it makes a huge difference (in one case we simply end up with what we started with, and in the other we end up with something entirely new). It shouldn’t be too hard to see that, given that simple difference, whether you are dealing with a cardinal or an ordinal transfinite number is going to matter for arithmetic operations; we can no longer ignore the difference; cardinals and ordinals have to be considered as quite separate and distinct kinds of objects!

At this point you might be trying to reconcile the fact that ℵ_{0} + 1 = ℵ_{0} with the previously observed fact that there are bigger cardinal infinities. How can we get to a bigger infinity if adding to ℵ_{0} ends up going nowhere? To answer that I’m going to need to discuss power sets. Given a set A (and for now we’ll keep things informal; the finer technicalities of what actually is and is not a set will come further along our road), the *power set* of A is the set of all possible subsets of A. An example will help clarify. If we have a set A = {a, b, c}, then the power set of A is ℙ(A) = {{}, {a}, {b}, {c}, {a,b}, {a,c}, {b,c}, {a,b,c}}. Thus each element of the set ℙ(A) is itself a set, and in particular a subset of A; note that we consider both the empty set and A itself to be subsets of A. With a little combinatorics you can see that if a set has n elements (where n is finite), then its power set will have 2^{n} elements (to make a subset we have to decide whether each element is in, or out, of the subset; that gives 2 choices, multiplied together n times, or 2^{n}). The trick is that, using an argument that closely parallels the previous argument showing that there is not a one-to-one correspondence between the natural numbers and points in a continuum, we can show that even if a set has infinite cardinality its power set will have a larger cardinality. Thus, borrowing notation from the finite case, ℵ_{0} < 2^{ℵ_{0}}. Using this trick, which applies to any infinite set, we can develop an entire hierarchy of different orders of infinity:

ℵ_{0} < ℵ_{1} = 2^{ℵ_{0}} < ℵ_{2} = 2^{ℵ_{1}} < ℵ_{3} = 2^{ℵ_{2}} < ⋯
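The 2^{n} count for finite power sets, which this whole hierarchy generalises, is easy to confirm directly. A brief illustrative sketch in Python, using the standard library:

```python
from itertools import combinations

def power_set(s):
    """Every subset of s, from the empty set up to s itself."""
    items = list(s)
    return [set(c) for r in range(len(items) + 1)
                   for c in combinations(items, r)]

A = {'a', 'b', 'c'}
P = power_set(A)
print(len(P))                       # 8, i.e. 2^3
assert set() in P and A in P        # both extremes count as subsets
assert len(P) == 2 ** len(A)

# The in/out choice argument in miniature: each subset is a string of
# n independent yes/no decisions, hence 2 * 2 * ... * 2 = 2^n subsets.
assert len(power_set({1, 2, 3, 4})) == 2 ** 4
```

Of course no program can enumerate the power set of an infinite set; for that we have only the diagonal-style argument, which is exactly why it matters.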

A similar, but different, hierarchy of infinite ordinals also exists (above and beyond the obvious option of simply adding one to an existing infinite ordinal to get a larger one), spiralling ever higher, this time using the concept of tetration***** rather than exponentiation. Contrary to initial expectation, infinities exist in infinite variety. How many infinities are there? We cannot say, on pain of paradox, since such a statement would only reflect back on itself in a vicious circle of contradiction.

While we began with just a hazy view of the infinite, the mists have cleared to reveal a strange and remarkable valley; a transfinite landscape with a veritable zoo of infinities of different kinds and sizes; it is, indeed, a whole new landscape of numbers and possibilities to explore; a hidden valley of the infinite. Running through the middle of the valley is a large river, and if we wade in we will find very deep waters. It is all a matter of asking the right questions. The question we can start with seems innocently simple:

Is the cardinality of points on the continuum bigger than, smaller than, or the same as ℵ_{1}?

The answer is deceptively complex. First, a subtlety hidden in the notation above: strictly speaking, ℵ_{1} is defined as the first cardinal larger than ℵ_{0}, and writing ℵ_{1} = 2^{ℵ_{0}} was itself a casual assumption. It can be established, with a little work, that the cardinality of the continuum is exactly 2^{ℵ_{0}}, and that this is at least ℵ_{1}; the real question is whether it is exactly ℵ_{1} or strictly bigger. From there things get complicated quickly, and mired in a certain degree of technicality, but essentially the result is that the answer “doesn’t matter”. What I mean by that is that we may assume that the number of points in the continuum is exactly ℵ_{1} (this assumption is the famous *continuum hypothesis*) and no problems or contradictions will arise, yet at the same time we can equally well assume that the number of points in the continuum is *strictly greater than* ℵ_{1} and still no problems or contradictions arise. In other words, whether there is any cardinal number between ℵ_{0} and 2^{ℵ_{0}} is a matter we are free to decide either way. There is, in a sense, no “truth” here, merely preference.

This highlights deep facts about mathematics. When our journey began we considered numbers, and fractions, and algebra. Relatively speaking these are fairly simple abstractions, and, more importantly, they are abstractions that we tend to use each and every day (particularly in the case of numbers and fractions). Through a mix of immediate concrete associations due to the relatively low level of abstraction, and the sense of reality imbued by constant use and exposure, we tend to think of numbers, fractions, algebra, and even mathematics itself, as something real, fixed, and concrete. That is, we think of mathematics as describing some platonic reality, that the objects it describes, while abstract, have some real existence. It is natural, then, to think that a cardinal number between ℵ_{0} and 2^{ℵ_{0}} either exists, or doesn’t exist, in some real and concrete way; yet that is not how things have worked out. Imagine being told that whether the number five existed or not was quite optional, that arithmetic would work just fine either way! We have, in essence, been told that existence is merely a preference, not a reality; “truth” is up for grabs, an option rather than a cold hard absolute.

Going all the way back to the first entry, On Abstraction, things start to get a little clearer however. As long as we view mathematics as a matter of making effective and powerful abstractions from the real world, rather than describing some platonic universe, having a choice of abstraction doesn’t seem so bad. We can choose how to interpret the continuum to suit our needs; indeed, we can even reject transfinite arithmetic and opt for the intuitionist conception of the continuum if we wish: we choose the abstraction that best suits our purposes for the moment. You could view it as little different from choosing to work at the genetic level as a molecular biologist would, instead of considering subatomic particles as a physicist would: the level and manner of abstraction matters only with regard to the level and manner of detail you wish to obtain in the way of results. The more layers of abstraction we apply, the greater the chances of running into quandaries and choices; by abstracting away more and more detail, and by piling abstractions upon abstractions, we push further and further into the realm of pure possibility. This has the potential to lead us down strange and confusing trails, but it also gives us the power to see beyond our own limited horizons. In broadening our minds to embrace worlds of possibility we conceive of realities that transcend our conceptions, and probe our own reality in ways far beyond the limits evolution has shackled our perceptions with.

It has been a steep climb, but we have left the plains of ordinary finite numbers far behind us. We’ve crested the peak, and found a world that is strange and new. It gives us a chance to stretch our minds and our conceptions, and to begin to change how we look at the world. Equally importantly, it gives us a foundation for the further climb to come, providing a glimpse of the dance between logic and mathematics that will follow. We passed by the crossroads of unreality some time ago, yet there is still a very long way to go.

* If you are sure of your answer then either you already know a decent amount of transfinite theory, or you’re most likely wrong — don’t be disappointed about being wrong though; the very reason we need clear logical guidelines for reasoning about the infinite is *precisely because* our intuitions are woefully inadequate and misleading.

** The story is usually attributed to David Hilbert, but this is my own spin on it; errors and lack of clarity that may have crept in via the retelling are, of course, mine.

*** People have a tendency to object to this, and other similar claims, such as that 0.99999… is equal to 1. The easiest way to see this simple but slightly unintuitive fact is to note that we should be able to take 0.99999…, move the decimal place right one place, subtract 9, and arrive back at the same value (this is akin to shifting the hotel guests down one room to make a spare room at the front, but in reverse). That is, if x=0.99999… we can say that 10x – 9 = x. A little simple algebraic manipulation quickly yields x=1. This sort of sleight of hand with infinite expansions will also play a role if and when we come to p-adic numbers, and discover that, in that case, negative numbers are really just very big positive numbers!
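For the sceptical, the footnote’s claim can also be checked in exact arithmetic: the partial sums 0.9, 0.99, 0.999, … fall short of 1 by exactly 1/10^{n}, a gap that vanishes in the limit. A small Python sketch (the helper name is mine):

```python
from fractions import Fraction

def partial_sum(n):
    """0.99...9 with n nines, as an exact fraction."""
    return sum(Fraction(9, 10 ** k) for k in range(1, n + 1))

for n in (1, 3, 6):
    s = partial_sum(n)
    print(s, 1 - s)    # the shortfall is exactly 1/10^n each time

# The gap 1/10^n shrinks below any positive bound, which is precisely
# what it means for the infinite expansion 0.999... to equal 1.
assert all(1 - partial_sum(n) == Fraction(1, 10 ** n) for n in range(1, 20))
```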

**** The wary should be asking how we can even have a first element among points in the continuum (keeping in mind, of course, that there are cunning ways of ordering fractions such that there is a first element). This question quickly wades into very deep waters indeed — we can appeal to the Well Ordering Principle, which essentially just asserts that this can be done, or its equivalents such as the Axiom of Choice, or Zorn’s Lemma; all of these are somewhat contentious and tricky. If you’re interested it is well worth doing a little reading about them. We may come to discuss these issues ourselves later when we start to cross back and forth between mathematics and logic further down the road with Topos theory.

***** Tetration is kind of like exponentiation on steroids; or, more accurately, it’s the next layer of abstraction up: whereas the number in exponentiation counts the multiplications to be performed, the number in tetration counts the exponentiations to be performed. Thus the 4th tetration of 3, written ^{4}3, is equal to 3^{(3^{(3^{3})})} = 3^{(3^{27})}, which is 3^{7625597484987}, or really rather remarkably large.
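Tetration is simple to sketch in code, at least for towers small enough to evaluate (the function name is mine); note that the tower collapses from the top down:

```python
def tetrate(base, height):
    """A power tower of `height` copies of `base`, evaluated
    right-associatively: tetrate(3, 3) = 3 ** (3 ** 3)."""
    result = 1
    for _ in range(height):
        result = base ** result
    return result

print(tetrate(3, 2))    # 27
print(tetrate(3, 3))    # 7625597484987, i.e. 3 ** 27
# tetrate(3, 4) is 3 raised to that 13-digit number: far too large to print.
assert tetrate(2, 4) == 65536
```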

In terms of examples I would like to take a step into the more abstract — rather than dealing with a physical example and determining an abstraction from it, we’ll start with a slightly abstract example and explore from there. The example I have in mind is that of permutations. By a permutation I simply mean a rearrangement of unspecified objects, mapping one position to another. We can view a permutation as a kind of wiring diagram, with the permutation:

meaning that we shift whatever is in position 1 to position 3, whatever is in position 2 to position 1, and whatever is in position 3 to position 2. Hopefully you can see how such rearrangements are essentially what we were doing in Shifting Patterns, but here we aren’t starting out with a specific pattern in mind, but considering all such rearrangements in general.

As before we can combine two rearrangements together to get another. In this case we simply connect one wiring diagram to the next and follow the paths from the top all the way to the bottom. We can then simplify that down to a direct wiring diagram as before.
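Connecting wiring diagrams end to end is just composition of mappings, which we can sketch directly. An illustrative Python fragment (the representation is my own): a permutation is a tuple p in which p[i] is the destination of the item in position i, counting positions from 0 as is usual in code:

```python
def compose(p, q):
    """Apply p first, then q: stack one wiring diagram beneath the
    other and follow each wire all the way down."""
    return tuple(q[p[i]] for i in range(len(p)))

# The permutation from the text: position 1 -> 3, 2 -> 1, 3 -> 2
# (here 0 -> 2, 1 -> 0, 2 -> 1).
cycle = (2, 0, 1)

print(compose(cycle, cycle))    # (1, 2, 0): the opposite rotation
# Applying it three times restores the original arrangement.
assert compose(compose(cycle, cycle), cycle) == (0, 1, 2)
```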

The first thing to note is that the number of objects we are permuting, or wiring together, matters. If we take, as our first and simplest case, permutations of two items, then we find there are only two permutations: the null permutation where we do nothing (the first item is connected to the first item, and the second item is connected to the second item), and a simple swap where we reverse the items (connect the first item to the second, and the second item to the first). Using the algebraic terms we established previously we end up with a two element algebra: let s be the permutation where we swap, and then we have the rule:

ss = ⋅

(swapping the two items, then swapping them again, is the same as doing nothing)

which completely describes our algebra*.

On the other hand, as soon as we consider permutations of three objects we find things get more complicated. There are a total of 6 (that’s 3×2×1) permutations of three objects. If we select two basic permutations appropriately we can generate them all as various combinations of the two. There are, in fact, several different pairs we could select (though the resulting algebra will turn out to be the same, no matter how we do it — the names might change, but the underlying rules will be the same), and I’ve opted for these two

where a swaps the first two elements and leaves the third alone, and b swaps the second two elements, and leaves the first alone. Now as with the permutations of two elements, if we swap a pair, and then swap them again, we end up back where we started, so we can see that we have the following two rules:

aa = ⋅

bb = ⋅

Now, however, we have the possibility of combining together a and b. We already saw that ab results in the first item going to the third place, the second item to the first position, and the third item to the second position:

but if we swap things around to find ba we get

which reverses the situation, with the third item moving to the first place, while the first and second items get bumped to second and third place respectively. So at the very least we know that we don’t have commutativity; that is, ab ≠ ba. Instead we find the rule that defines this algebra, namely aba = bab, as follows:

We can see that this gives us all the permutations by counting up the combinations of a and b that haven’t been ruled out as being reducible to something simpler. We have

- The null permutation: ⋅
- a
- b
- ab
- ba
- aba

and anything with four or more as and bs will be reducible. Why is that? Since aa = ⋅ and bb = ⋅ any sequence will have to alternate as and bs, otherwise we can just cancel down consecutive pairs. On the other hand, if we have a sequence of more than three alternating as and bs then we’ll have a sequence aba or bab that we can convert using the fact that aba = bab, and end up with a pair of consecutive as or bs that we can then cancel down. For example, if we tried to have a sequence of four as and bs like abab, then we can say

abab = a(bab) = a(aba) = (aa)ba = ⋅ba = ba

With a little thought you can see that this sort of procedure can reduce any sequence of four or more as and bs down to one of three or less. So for permutations of three objects we get an algebra that is described by three rules:

aa = ⋅

bb = ⋅

aba = bab
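We can verify all three rules against actual permutations of three objects. A short Python sketch (the representation is mine: a permutation is a tuple p with p[i] the destination of position i, positions counted from 0, and ab means apply a first, then b):

```python
def compose(p, q):
    """ab as in the text: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

e = (0, 1, 2)    # the null permutation
a = (1, 0, 2)    # swap the first two items, leave the third alone
b = (0, 2, 1)    # swap the last two items, leave the first alone

assert compose(a, a) == e                                      # aa = null
assert compose(b, b) == e                                      # bb = null
assert compose(compose(a, b), a) == compose(compose(b, a), b)  # aba = bab
assert compose(a, b) != compose(b, a)                          # no commutativity

# The six irreducible words really are all 3x2x1 = 6 permutations:
words = {e, a, b, compose(a, b), compose(b, a), compose(compose(a, b), a)}
print(len(words))    # 6
```

The same check scales to four or five objects, though, as noted below, the resulting algebras rapidly become more intricate.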

If we were to consider permutations of four items we would have 24 permutations (that’s 4×3×2×1) to deal with, and things would be more complicated yet again. Permutations of five items provide a total of 120 (5×4×3×2×1) permutations, and an even more complicated algebra with yet more subtle and interesting properties.

There are two things that you should take notice of here. The first is that even simple changes to a pattern — as simple as changing the number of items involved — can give rise to very different dynamics. The character of the algebra that arises from permutations of two objects is very different from that of the three object permutation algebra, and four objects is different again. To reiterate the point: different patterns can have surprisingly different and remarkable dynamics. The second thing that you should be noticing is that while we can work with permutations as wiring diagrams and connect them up to see what combinations will result, ultimately everything about the dynamics of the permutations is contained within the algebra we get from it; and the algebra can be described and manipulated using very simple rules. While the pattern provides the algebra, the algebra in turn tells us everything we need to know about the dynamics of the pattern. The advantage of the algebra is that we can reduce the whole problem of patterns to the simple task of manipulating algebraic expressions according to particular rules. By abstracting up to the algebra, we’ve made the problem much easier to think about and manipulate.

Hopefully by this stage you’re developing a feel for how this abstraction process works. With numbers we start with a collection and abstract away all the details save a single property: the quantity. Here we have something a little more complex; we start with a pattern and abstract away as much of the detail as possible, while still retaining some information about the nature of the pattern. That information can be efficiently encoded into a sort of algebra, in the same way that we encode information about quantity into symbols (numbers). The exact nature and rules of the algebra we generate are the information about the pattern that we have kept. Now, numbers allow us to reason about quantity in general via arithmetic, which we can reduce to a game of manipulating symbols. Our abstraction of pattern allows us to reason about patterns via their associated algebra, which we can also reduce to a game of manipulating symbols. We have turned thinking about patterns into a kind of arithmetic; and doing so allows us to be systematic in studying and analysing such patterns.

This, of course, raises the question of why we should be interested in studying and analysing patterns at all. The same question can be asked as to why we should be interested in studying and analysing quantity. The difference is that our culture is steeped in analysis of and use of quantity; we take its usefulness for granted. So let’s step back, and ask why using numbers is useful. As was pointed out in The Slow Road, numbers and quantity are useful because they are everywhere — we can apply quantitative analysis to almost everything (and often do, sometimes even where it isn’t appropriate). It is worth pointing out that patterns and symmetry are every bit as prevalent in the world. All around us things can be described in terms of their patterns. Pick any collection of objects you care to set your eyes upon, and they will form some manner of pattern; perhaps they will only have a trivial symmetry, or perhaps they will have more complex symmetries. The point is that, just like numbers, symmetries are all around us. The study of pattern and symmetry in the manner we’ve been describing is very new however, and this means it hasn’t entered the mainstream consciousness, nor the language, in the same way that numbers have. We don’t describe the world around us in the language of mathematical symmetry, at least not in the same way that we describe the world around us in terms of numbers. Slowly that will change, but it will take a very long time indeed (centuries probably). That means that, in the meantime, the areas to which the language of mathematical symmetry will be applied, and the people who will apply it, will be restricted to those already using advanced mathematical methods. Right now that tends to mean fields such as physics and chemistry.

To give examples of applying our abstraction of pattern to physics and/or chemistry runs the risk of delving into the technical details of those subjects, as well as requiring math that is currently beyond the scope of our discussion. For that reason, you’ll have to forgive me if I gloss over things quite liberally in what follows.

Everything has a pattern, and symmetries associated with that pattern, even if it is just the trivial symmetry. In the case of chemistry the obvious thing to start looking at is molecules. Unsurprisingly, the structure of a molecule has a pattern that depends, to a large extent, on its constituent elements. More to the point, molecules often have interesting symmetries. We can, using a naive view, picture a molecule as a pattern of coloured balls, not dissimilar to our patterns of coloured marbles discussed in Shifting Patterns. Of course the patterns are now in 3 dimensions rather than 2, and connections between the balls/marbles are important, but the fundamental idea is there. Consider, for instance, the following picture of the ammonium ion (NH_{4}^{+}):

We can, with little trouble, consider the various ways in which we can rearrange the 4 indistinguishable hydrogen atoms and yet keep the underlying structure that makes it an ammonium ion. That is, using our abstraction of pattern, we can describe an algebra that captures the features that make the pattern of four hydrogen atoms and one nitrogen atom an ammonium ion. Furthermore, we can do the same for any other molecule we care to consider. The exact algebra that results will differ from molecule to molecule, with the individual idiosyncrasies of the different algebras describing the individual idiosyncrasies of the different molecules. To understand the particular nature of the associated algebra is to understand a great deal about the particular nature of the molecule.
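A sketch of that counting, under the simplifying assumption that any permutation of the four indistinguishable hydrogens counts as pattern-preserving (the labels 1 through 4 are just bookkeeping, as with the coloured marbles):

```python
from itertools import permutations

# Label the four indistinguishable hydrogen atoms 1..4. Since any
# permutation of them leaves the "four hydrogens around one nitrogen"
# pattern intact, we can enumerate the rearrangements directly.
hydrogens = (1, 2, 3, 4)
rearrangements = list(permutations(hydrogens))

print(len(rearrangements))  # 4! = 24 rearrangements (including "do nothing")
```

In three dimensions only half of these 24 can be realised by physically rotating the ion; the rest require a mirror reflection, and distinctions of exactly that kind are what the associated algebra makes precise.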

It is, of course, possible to do this sort of analysis just by staring at the patterns and never resorting to the sort of abstraction we’ve been discussing. This approach runs the risk of being both haphazard and superficial. By contrast, working in terms of pattern abstraction algebras affords us the ability to be both comprehensive and systematic in our analysis. Rather than trying to divine properties out of thin air via visual inspection, we take the resulting algebra and, by merely pushing symbols around on a piece of paper, pick apart every last nuance of its behaviour. Indeed, this sort of analysis (which extends to a level of detail in characterising the algebra that we won’t touch on for some time) is now fundamental to understanding much in chemistry, from spectroscopy to crystallography. Similar approaches to patterns and symmetry of particles lead to a variety of important results in quantum physics.

Our world is filled with patterns that are worth analysing with a systematic approach: understanding the peculiarities of the algebras associated to those patterns can tell us a great deal about our world. New applications of this theory to new fields are still regularly occurring. There is a quiet revolution underway that is changing how we see and describe the world, and the abstraction of pattern is at its heart.

* Throughout this post I will continue to use algebra in a generic sense to describe the symbolic calculus that we can associate to a pattern. Be aware that, in mathematics, *algebra* also has a more precise technical definition that does not apply to the objects under consideration here; I *do not mean* algebra in that more specific sense when using the word.

We begin with a mere glimpse of what is to come along this road. Still, even this glimpse has been enough to frighten some. Indeed the (potentially apocryphal) tale of the first man to tread this road, a member of the Pythagorean Brotherhood, makes this very clear. The story goes that the insight came to him while travelling by ship on the Aegean. Excited, he explained his cunning proof to the fellow members of the Brotherhood aboard the boat. They were so horrified by the implications that they immediately pitched him overboard, and he drowned. For the secretive Pythagorean Brotherhood, who believed that reality was simply numbers, mathematics was worth killing over.

So what was this truth that the Brotherhood was willing to kill to keep secret? The fact that the square root of 2 is not expressible as a fraction. The proof of this is surprisingly simple, and runs roughly as follows. Let’s presume that √2 can be expressed as a fraction, and so we have numbers n and m such that √2 = n/m. As you may recall from A Fraction of Algebra, a particular fraction is really just a chosen representative of an infinite number of ways of expressing the same idea — we can choose whichever representative we wish. For the purposes of the proof we will assume that we have chosen n/m to be as simple as possible (i.e. there is no common factor that divides both n and m); you may want to verify for yourself that such a thing is always possible (it’s not too hard). Now, using the allowable manipulations of algebra we have:

√2 = n/m

⇒ 2 = n^{2}/m^{2}

⇒ 2m^{2}= n^{2}

Now 2m^{2} is an even number no matter what number m is, so n^{2} must be an even number as well. However, an odd number squared is always odd (again, this is worth verifying yourself if you’re uncertain; it isn’t hard). That means the only way n^{2} can be even is if n itself is even, and so there must be some number x such that 2x = n. But then
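The odd-squared claim that the proof leans on takes only a line to verify: any odd number can be written as 2k + 1 for some number k, and then

```latex
(2k+1)^2 = 4k^2 + 4k + 1 = 2\,(2k^2 + 2k) + 1
```

which is twice a number plus one, and hence odd. So, taking the contrapositive, if n^{2} is even then n must be even.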

2m^{2} = n^{2}

⇒ 2m^{2}= (2x)^{2}

⇒ 2m^{2}= 4x^{2}

⇒ m^{2}= 2x^{2}

and so m^{2}, and hence m, must also be even. If both n and m are even then they have a common factor: 2; yet we specifically chose n and m so that wouldn’t be the case. Clearly, then, no such n and m exist, and we simply can’t express √2 as a fraction!*
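As a sanity check (a finite search, which merely illustrates the conclusion; only the proof above covers *all* n and m), we can hunt for small counterexamples and find none:

```python
# Search for whole numbers n, m with n^2 == 2 * m^2, for m up to 1000.
# Since n = sqrt(2) * m < 2 * m, checking n up to 2 * m suffices.
limit = 1000
solutions = [
    (n, m)
    for m in range(1, limit + 1)
    for n in range(1, 2 * limit + 1)
    if n * n == 2 * m * m
]

print(solutions)  # [] -- no solutions found
```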

This result (if not necessarily the proof) is well known these days; sufficiently so that many people take it for granted. It is therefore worth probing a little deeper to see what it actually means, and perhaps gain a better understanding of why it so incensed the Pythagorean Brotherhood. The first point to note is that √2 does crop up in geometry: if you draw a square with sides of unit length (and we can always choose our units such that this is so) then, by Pythagoras’ Theorem, the diagonal of the square has length √2. That, by itself, is not necessarily troubling; but consider that we’ve just seen that √2 is not expressible as a fraction. Recall that a fraction can be considered a re-interpretation of the basic unit, and you see that what we’re really saying is that there simply doesn’t exist a unit of length such that the diagonal of the square can be measured with respect to it. If you were measuring a length in feet and found that it was between 2 and 3 feet then you could simply change your units and work in inches — the distance is hopefully an integer number of inches. If inches aren’t accurate enough we can just use a smaller unit again (eighths of an inch for example). What we are saying when we say that √2 cannot be expressed as a fraction is that, no matter how small a unit we choose, we can still *never* accurately measure the diagonal of the square. Although we can keep dividing indefinitely to get smaller and smaller units, no such unit will ever suffice: we would need *infinitely* small units. And note the difference here: unlike in Part I, arbitrarily small is not good enough, we need to go past arbitrarily small to actually infinitely small. For the Pythagoreans infinity was unreachable — something that could never be completed or achieved — and thus an infinitely small unit could never be realised. Therefore, in their world-view, the diagonal of a square couldn’t exist since its length was an unreachable, unattainable, distance**.
That, as you can imagine, caused quite a bit of cognitive dissonance! Hence their desire to pretend such a thing never happened.

As you can see, it turns out (even though it may not have looked that way at first) that we are really butting our heads up against infinity again, just from a different direction this time. Things get worse however: if we had a line of length 2 then there surely exists a point somewhere along that line that is a distance of √2 away from the origin. We have just seen, however, that such a distance is not one we can deal with in terms of fractions. If we were to put points at every possible fractional distance between 0 and 2 we would have a hole at √2, and continuous lines don’t have holes in them. A new problem starts to raise its head.

If we wish to have a continuum we have to fill in all the holes. The question is how we can do that — where exactly are the holes? And, for that matter, how many holes are there? The first of these questions turns out to be rather easier than the second (which we will address next time we venture down this fork of the road). The trick to finding holes is to note that, since fractions allow us the arbitrarily (if not infinitely) small, we can get arbitrarily close to any point in the continuum, holes included. That is, while we can’t actually express a hole in terms of fractions, we can sidle up as close beside it as we like using only fractions. And that means we must reach again for the useful tools of *distance* and *convergence* to determine that we are getting closer and closer to, and hence converging to, a hole.

For our current purposes the definition of distance between numbers defined in Part I will be sufficient. What we want to do is figure out a way to ensure that a sequence of fractions converges — that is, that it gets closer and closer to *something*, without necessarily knowing what the something is. The trick to this is to require that the distance between different terms in the sequence gets smaller and smaller. In this way we can slowly but surely squeeze tighter and tighter about a limit point, without necessarily knowing what it is that we are netting. More formally, if we have an infinite sequence S_{1}, S_{2}, S_{3}, … then we require that for any ε > 0 there exists an integer N ≥ 1 such that, for all m, n > N,

|S_{m} − S_{n}| < ε

(recalling that |x − y| gives the distance between numbers x and y). Such a sequence is called a *Cauchy sequence*. Now, a Cauchy sequence must be closing in on *something* (even if that something is one of our holes rather than a fraction), so we can identify (consider equivalent) the sequence and the point it closes in on. Furthermore, since we know that using fractions we can get arbitrarily close to any point on the continuum, there must be some sequence of fractions that converges to that point, and so if we consider all the possible infinite Cauchy sequences of fractions, we can cover all the points on the continuum — we are assured that no holes or gaps can slip in this time. We’ve caught all the holes — without even having to find them!

It is worth looking at an example: can we find a sequence of fractions that converges to √2? Consider the decimal expansion of √2 which starts out 1.41421… and continues on without any discernible pattern; clearly the sequence 1, 1.4, 1.41, 1.414, 1.4142, 1.41421, … (where the nth term agrees with √2 for the first n−1 decimal places) converges to √2. More importantly each term can be rewritten as a fraction since each term has only finitely many non-zero decimal places; for example 1.4 = 14/10 and 1.4142 = 14142/10000 etc. Finally it is not hard to see that this sequence is a Cauchy sequence. We can do the same trick for any other decimal expansion, arriving at a Cauchy sequence that converges to the point in question. Of course there are many other Cauchy sequences of fractions that will converge to these values: we are dealing with something similar to our dilemma with fractions when we found that there were an infinite number of different pairs of natural numbers that all described the same fraction. In that case we simply selected a particular representative pair that was convenient (and could change between different pairs that represented the same fraction if it was later convenient to think of the fraction that way). We can do the same here: noting that a point is described by an infinite number of Cauchy sequences, we can simply select a convenient representative sequence to describe the point. For our purposes the sequence constructed via the decimal expansion will do nicely — in some sense you can think of the Cauchy sequence as an infinite decimal expansion.
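This decimal-truncation sequence is easy to generate exactly. A minimal sketch using Python's `Fraction` type (the function name is my own):

```python
from fractions import Fraction
from math import isqrt

def sqrt2_truncation(n):
    """The fraction agreeing with sqrt(2) to n decimal places:
    1, 14/10, 141/100, ... (decimal truncation, always a fraction)."""
    scale = 10 ** n
    return Fraction(isqrt(2 * scale * scale), scale)

seq = [sqrt2_truncation(n) for n in range(10)]

# Consecutive terms draw closer together -- |S_{n+1} - S_n| < 1/10^n --
# which is the hallmark of a Cauchy sequence.
for n in range(9):
    assert abs(seq[n + 1] - seq[n]) < Fraction(1, 10 ** n)

print(seq[4])  # 7071/5000 (= 1.4142)
```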

Now that we at least have some idea of what these sequences might look like, it is time to take a step back and consider what is actually going on here. Back in The Slow Road we constructed natural numbers as a property of collections of objects. Then, in A Fraction of Algebra, we created fractions to allow us to re-interpret an object within a collection. This was another layer of abstraction — fractions were not really numbers in the same way that natural numbers were — fractions were a way of re-interpreting collections, and we could describe those re-interpretations by pairs of natural numbers. Perhaps rather providentially it turned out that the rules of algebra, the rules of arithmetic that were true no matter what natural numbers we chose, also happened to be true no matter what fractions we chose. It is this stroke of good fortune, combined with the fact that certain fractions can take the role of the natural numbers, that allows us to treat what are really quite different things in principle (fractions and natural numbers) as the same thing in practice: for practical purposes we usually simply consider natural numbers and fractions as “numbers” and don’t notice that, at heart, they are fundamentally different concepts. Now we are about to add a new layer of abstraction, built atop fractions, to allow us to describe points in a continuum. While all that was required to describe the re-interpretation of objects that constituted a fraction was a pair of numbers, points in a continuum can only*** be described by an infinite Cauchy sequence of fractions. Thus, in the same way that natural numbers and fractions are actually very different objects, so fractions and points in a continuum are quite different. Again, however, we find that when we define arithmetic on sequences (which occurs in the obvious natural way) they all behave appropriately under our algebraic rules.
When we consider that it is easy enough to find sequences that behave as fractions (any constant sequence for instance) it is clear that, again, for practical purposes, we can call these things numbers and assume we’re talking about the same thing regardless of whether we are actually dealing with natural numbers, fractions, or points in a continuum.
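The “obvious natural way” to do arithmetic here is simply termwise, and a small sketch (the helper names are my own) also shows how two visibly different sequences can represent the same point:

```python
from fractions import Fraction

def seq_add(s, t):
    """Add two points of the continuum by adding their
    representative Cauchy sequences term by term."""
    return [a + b for a, b in zip(s, t)]

# Decimal-truncation sequences for 1/3 (0.3, 0.33, ...) and 2/3.
third = [Fraction((10 ** n - 1) // 3, 10 ** n) for n in range(1, 11)]
two_thirds = [2 * t for t in third]

total = seq_add(third, two_thirds)  # 9/10, 99/100, 999/1000, ...
ones = [Fraction(1)] * 10           # the constant sequence for 1

# The two sequences differ term by term, yet the gap between them
# shrinks to nothing -- they are representatives of the same point: 1.
gaps = [abs(a - b) for a, b in zip(total, ones)]
print(gaps[:3])  # [Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)]
```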

It should be pointed out that sometimes these distinctions are actually important. A simple example is computer programming, which does bother to distinguish floating point numbers (ultimately fractions) from integers. You can usually convert or cast from one to the other via a function (and at times that function can be implicit), but the distinction is important. Later we will start getting into mathematics where the distinction becomes important.
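To make the distinction concrete, here is how it surfaces in Python (statically typed languages make it even more visible):

```python
# Integers and floating point numbers can compare equal numerically,
# yet remain values of distinct types.
x = 7
y = 7.0
print(x == y, type(x) is type(y))  # True False

# Explicit casts move between the two...
print(float(3), int(2.75))  # 3.0 2 (int() truncates toward zero)

# ...and arithmetic mixing the two converts implicitly.
print(1 + 2.5)  # 3.5
```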

So, now that we have this construction, several layers of abstraction deep, that allows us to describe the continuum, does it resolve the problem the Pythagorean Brotherhood struggled with? Certainly within the continuum there is a point corresponding to √2, but even with our construction it is the *limit* of an infinite sequence — we still require a completed infinity. Of course accepting the idea of a completed infinite would get us out of this conundrum; what we require is a coherent theory of the completed infinite — were we to have that, then we needn’t fear the idea as the Pythagorean Brotherhood did. The next time we venture along this particular road we will discuss just such a theory, and explore the remarkable transfinite landscape that it leads to. We would be remiss to conclude here, however, without noting that there is some dissent on this topic. While the theory of the continuum based on completed infinites we will cover is remarkably widely accepted and used, there are still those who do not wish to have to deal with the completed infinite. So what is the alternative?

The idea is to construct a continuum using infinitesimals: a number ε such that we have ε^{2}=0, yet ε≠0. Using such a value we can create a continuum without holes as desired. If adding a seemingly arbitrary new element to the number system seems like cheating, remember that both fractions, and the infinite decimals via Cauchy sequences, are just as much artificial additions to the number system — they just happen to be ones we’re familiar with and now take for granted. The real dilemma is that, assuming the required properties of infinitesimals, we can deduce contradictions. As we noted at the start of this post, when a mathematical argument leads you somewhere you don’t wish to go you are left having to challenge the very foundations of logic itself. Surprisingly, that turns out to be the resolution: smooth infinitesimal analysis rejects the law of the excluded middle. The logic used for this alternative conception of the continuum rejects the idea that, given a proposition, either the proposition is true, or its negation is true. That means that saying that x=y is not true, does *not* mean that x≠y. This sounds like nonsense at first, because we generally take the law of excluded middle for granted, and it is ingrained in our thinking. We have to remember, however, that this is a theory dealing in *potential*, but not completed infinites, and it is that key word “potential” that helps clarify things. Consider two numbers x and y that have infinite decimal expansions; are they equal, or are they unequal? We can check the first decimal place, and they might agree; that does not mean they are equal, they might disagree further down; nor does it mean they are unequal, since they might indeed agree.
We can check the first billion decimal places, and they might still agree; that does not mean they are equal, since it might be the billion and first decimal place at which they disagree; and yet we still can’t conclude they are unequal — they’ve agreed so far and could continue to do so. We can even check the first 10^{20} decimal places, and still we can’t conclude either way whether x and y are equal or unequal. Because we can never complete the infinity and check *all* the decimal places, unless we have more information (such as that both numbers are integers****), it is not possible to conclude either way — we have an in between state where the numbers are neither equal, nor unequal, and it is this in-between possibility that causes the law of excluded middle to fall apart. To say that x=y is not true simply means we have not yet concluded that x=y, but that does not require that we must have concluded x≠y since we may still be torn in between, unable to reach a conclusion. As strange as this sounds at first, it actually provides a surprisingly natural and intuitive model of the continuum — and a remarkably different one from the classical one we will be developing. Enough sidetracks, however; it is time to return to the path.

We have rounded the bend, and can make out the rough expanse of the landscape below, but the land itself remains unexplored, and potentially quite alien. The next time we return to this road we’ll try and understand the implications of a continuum of completed infinites, including a variety of initially unsettling results. In the meantime, however, we will return to the study of patterns and symmetry, and try and build a robust theory from our simple examples.

* This is, of course, a precis version of the proof. The devil is always in the details, and many details here have been glossed over as obvious, or left for the reader to verify. If you are interested in the nitty-gritty however, I recommend you try the Metamath proof of the irrationality of √2. Each and every step in the proof is referenced and linked to an earlier theorem previously proved. By following the links you can drill all the way down to fundamental axioms of logic and set theory. If you don’t care to follow the details yourself, you might note that, in this (extremely) explicit form, the proof can be machine verified.

** If you think you can get out of this by just starting with the diagonal as your unit of measure you will simply find that now the sides of the square are unmeasurable distances. The sides and diagonal of the square are *incommensurable* — we can’t measure both with the same units, no matter how fine a unit of measure we choose.

*** Technically other methods of describing such points exist. Indeed a very common formal approach is Dedekind cuts. Ultimately, however, Dedekind cuts represent more detail than we need right now, and will serve more as a distraction than anything. The interested reader is, however, encouraged to investigate them, and puzzle out why I chose to go with Cauchy sequences here.

**** Why does knowing that both numbers are integers help? Because integers have fixed decimal expansions – the first decimal place is necessarily the same as all the rest: 0. As long as they agree up to the first decimal place, we are done. An exercise for the interested reader: what other knowledge about the numbers might allow us to conclude equality or inequality?

akaaka to

hi wa tsurenaku mo

aki no kaze

How hot the sun glows,

Pretending not to notice

An autumn wind blows!*

— Matsuo Basho

What is a haiku? Or, more specifically, what makes a particular composition a haiku, as opposed to one of the many other poetic forms? The defining feature most people will be familiar with is the 5-7-5 syllable structure. Within that basic structure, of course, the possibilities are almost endless, and this is what makes haiku so tantalizing to write: you can shift the words and syllables around to craft your message, and as long as you retain the classic 5-7-5 syllable structure you can still call your work a haiku**.

This is not an isolated trait. We constantly define, and categorise, and classify, according to patterns. We determine a basic pattern, an underlying structure, and then classify anything consistent with that structure accordingly. This is our natural talent for abstraction at work again, seeking underlying patterns and structure, and mentally grouping together everything that possesses that structure. It is the means by which we partition and cope with the chaotic diversity of the world. And yet, despite our natural talent for this, it wasn’t until the last couple of centuries that we had any treatment of this sort of abstraction comparable to our use of numbers to formalise quantity.

Since, unlike with numbers, very few people have had the requisite abstractions drilled into them from a young age, we will have to go a little more slowly, and try and tease out the details. The first point to address is the fact that we have been very vague. It is certainly true that we find patterns, and classify things according to whether they preserve the pattern or not, but the very concept of a pattern is itself only very loosely sketched: we are hiding a lot of detail in words like “pattern” and “structure”. The best way to come to grips with this is to start with very simple examples for which we can agree on what we mean by pattern, and see if we can’t build up an abstraction from there.

Let’s consider an arrangement of coloured marbles (red, green, and blue) that looks like this:

and agree (hopefully) that by the “pattern” here, we mean the specific triangular layout with the colours arranged just so (two blue marbles in the top corners, a small triangle of green marbles, and a red marble at the bottom corner). We are interested in other arrangements of marbles that also have that pattern. That might sound like an impossible task since the only way to lay out marbles such that they are in that pattern is to lay out the marbles exactly as shown… there are no other ways, right? Not exactly, no. You see each green marble is different, so we could swap a couple of the green marbles; the marbles would then be laid out differently (we have put specific marbles in different places) but the pattern of colours has remained the same. It helps if we label the marbles like so:

and then we can see that this rearrangement of marbles still preserves the pattern of colours:

So what happened here? It may help to think in terms of the actions we need to take to go from the initial arrangement to the new rearranged version. We swapped the blue marbles, and rotated the green marbles around in a circle:

The trick now is to notice that, as long as we are thinking in terms of forming a rearrangement by interchanging marbles, any rearrangement that preserves the pattern works even more generally. That is, if we started with a different initial arrangement of the marbles like this:

then making the same interchanges of marbles as before (swapping the blue marbles, and cycling the three green marbles) will preserve this pattern as well. Of course if we were to add more marbles, or take some away (and thus alter our numbering scheme) things would once again get more complicated. Still, by thinking in terms of interchanging items we have managed to generalise across a wide variety of particular patterns. We should be taking that as a hint that this particular line of thinking is worth investigating further.

What we are seeing is that if we think in terms of rearrangements that preserve the *internal relationships* that make up a particular pattern, then those rearrangements will continue to preserve those same internal relationships for any other pattern that has them. In our case with the marbles the internal relationships were defined by which marbles we could tell apart from one another — that is, which marbles were the same colour. If we had swapped a red and green marble we would have broken the pattern; and that would have happened had we done so with the triangular arrangement, or the rectangular one. As long as we work in terms of rearrangements that refer to swapping marbles we can generalise over all the different particular spatial patterns at once. Don’t worry if that isn’t sinking in yet, there’s another example coming up. In the meantime, however, I want you to notice that the “rotation” of the green marbles can also be achieved by simply swapping marbles 2 and 5, and then swapping marbles 2 and 4 — try it out yourself. This sort of decomposition of rearrangements will prove important.
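That decomposition is easy to check mechanically. A sketch, assuming (since the labelled figure isn't reproduced here) that the green marbles carry the labels 2, 4, and 5:

```python
def swap(a, b):
    """The rearrangement interchanging marbles a and b, written as a
    mapping from each moved marble to where it goes."""
    return {a: b, b: a}

def compose(p, q):
    """Perform rearrangement p first, then q."""
    keys = set(p) | set(q)
    return {k: q.get(p.get(k, k), p.get(k, k)) for k in keys}

# Swapping marbles 2 and 5, then marbles 2 and 4, cycles the three
# green marbles around: 2 -> 5, 5 -> 4, 4 -> 2.
rotation = compose(swap(2, 5), swap(2, 4))
print(rotation == {2: 5, 5: 4, 4: 2})  # True: a three-cycle
```

Applying the three-cycle three times over sends every marble back home, just as three successive “rotations” of the green triangle should.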

Our next example, to try and get a feel for things, is a square. We’re interested in all the different things we can do to the square that will have it end up looking the same as when we started (symmetries of the square, if you want to think of it that way). If you’re still feeling a little lost with all of this it might help to cut out a square of paper to manipulate yourself as you follow along. First, just as we did with the marbles, we’re going to label the square so we can keep track of what we’re doing — in this case we’re going to number the corners (it will probably be helpful to do this on your square of paper if you have one):

What we want to do is find all the different manipulations of the square that result in a square in exactly the position we started with, and we’ll keep track of the different manipulations by how they move the labels in the corners. With a bit of experimentation you’ll quickly find that we have three rotations like so:

and we can flip the square across four different axes like so:

and that’s all we can do; for example, if we tried just swapping corners 1 and 2 we would end up with something that isn’t a square anymore:

How is this similar to our example with marbles? In the same way that we found new arrangements of marbles by swapping marbles around, we are finding new arrangements for the corners of the square. With the marbles we were concerned about the internal relationships formed by the different colours (and our ability to distinguish marbles of different colour, but not marbles of the same colour). With the square the internal relationships are formed by the adjacency relations of the corners; that is, we require, for instance, that corner 1 is always between corners 4 and 2 and opposite to corner 3; similarly corner 2 is always between corners 1 and 3, and opposite to corner 4. Thus swapping just corners 1 and 2, for example, results in corner 1 being between corners 2 and 3, hence breaking the internal relationships. What determines a pattern is how internal sub-objects relate to one another. What determines a different arrangement that preserves a pattern is whether that arrangement preserves those inter-relationships.
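We can also let a computer do the experimentation for us: encode the internal relationships as the four adjacencies around the square, then test every possible rearrangement of the corners against them.

```python
from itertools import permutations

corners = (1, 2, 3, 4)
# The internal relationships: each corner is adjacent to its two
# neighbours around the square (1-2, 2-3, 3-4, 4-1).
adjacent = [(1, 2), (2, 3), (3, 4), (4, 1)]
edges = {frozenset(e) for e in adjacent}

def preserves_pattern(perm):
    """True if sending corner i to perm[i - 1] keeps every adjacency intact."""
    move = dict(zip(corners, perm))
    return {frozenset((move[a], move[b])) for a, b in adjacent} == edges

symmetries = [p for p in permutations(corners) if preserves_pattern(p)]
print(len(symmetries))  # 8: the identity, three rotations, and four flips
```

Note that the swap of corners 1 and 2 alone, the rearrangement the text rules out, is indeed rejected: it would place corner 1 between corners 2 and 3.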

There is more that we can exploit with this example however. As with the marbles example, we can decompose complex rearrangements in terms of simpler ones. Let’s consider just two rearrangements of the square: a rotation by 90 degrees, and a flip through the vertical axis, which we’ll refer to by the letters r and f:

Through combinations of just these two rearrangements we can produce all seven possible pattern-preserving rearrangements; for example if we first flip through the vertical axis, and then rotate by 90 degrees (which we will shorthand to fr for a flip followed by a rotation) then the resulting arrangement is the same as flipping about the diagonal through corners 2 and 4.

Our seven rearrangements turn out to decompose as follows:

- Rotation by 90 degrees: r
- Rotation by 180 degrees: rr
- Rotation by 270 degrees: rrr
- Flip about vertical axis: f
- Flip about horizontal axis: frr
- Flip about leading diagonal axis: frrr
- Flip about trailing diagonal axis: fr

More importantly, *any* combination of flips and rotations will still result in a rearrangement that preserves the square, since each individual flip and rotation along the way will preserve the square. That means, for instance, that the sequence of flips and rotations frrfrfrrr should correspond to one of these seven possibilities (or simply do nothing at all), but which one? Equally, what happens when we rotate by 270 degrees, then flip about the leading diagonal and rotate by a further 90 degrees? This turns out to be surprisingly easy (no playing with paper squares is required).

A first point to notice is that two consecutive flips (ff) is the same as doing nothing — we end up with our original arrangement. The same happens with four consecutive rotations (rrrr). Letting the symbol ⋅ stand for the null rearrangement of doing nothing, we can write these rules as

ff = ⋅

rrrr = ⋅

The last observation we need is that a rotation followed by a flip (rf) results in the same rearrangement as a flip followed by three rotations (frrr); that is

rf = frrr

We can put these rules together to completely understand any possible combination of flips and rotations.

At this point you should be noticing that things are looking a lot less like geometry and a lot more like algebra. This is a different sort of algebra altogether however. Previously, we developed algebra by letting a letter stand in for any possible number; something we could do because we had determined which arithmetic rules were true regardless of which particular numbers were used. Here we have letters standing not for numbers, but for rearrangements. The result is that the arithmetic rules look very different. When we were abstracting numbers we had the commutative law that x×y = y×x; here we find that isn’t true at all: instead of rf = fr we have rf = frrr. We do have, however, exactly what algebra offered us for numbers: a set of rules for what operations we can perform. In this case we know that we can use the fact that rf = frrr to steadily move all the rs to the right of any fs. That means we can rearrange any sequence of flips and rotations so that all the fs are together on the left, and all the rs are together on the right. Then all we have to do is use the other two rules to cancel down the fs and rs. We can have either 0 or 1 consecutive fs followed by 0, 1, 2, or 3 consecutive rs. A quick scan of our decomposition of seven rearrangements will show these cover all such possibilities (except the null case of 0 fs and 0 rs).

This is perhaps best illustrated with an example, so let’s consider our complex sequence of flips and rotations given by frrfrfrrr. We have

frrf(rf)rrr = frrf(frrr)rrr = frr(ff)(rrrr)rr = frr⋅⋅rr = f(rrrr) = f⋅ = f

So the end result is identical to a simple flip about the horizontal axis. Similarly, our other question, what happens if we rotate by 270 degrees, then flip about the leading diagonal and rotate by a further 90 degrees, can be resolved easily by expressing those complex rearrangements in their decomposed form and simplifying according to the rules:

(rrr)(frrr)(r) = rrrf(rrrr) = rrrf⋅ = rr(rf) = rr(frrr) = r(rf)rrr = r(frrr)rrr = (rf)(rrrr)rr = (frrr)⋅rr = f(rrrr)r = f⋅r = fr

which is a flip about the trailing diagonal.
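The reduction we just carried out by hand is entirely mechanical, so it can be automated. Below is a minimal sketch in Python (the function name, and using “.” for the null rearrangement, are my choices for illustration). It reduces any word of fs and rs to the canonical form of at most one f followed by at most three rs, using a consequence of our rules: since rf = frrr, a run of b rotations followed by a flip equals a flip followed by 4−b rotations.

```python
def reduce_word(word):
    """Reduce a word of 'f's and 'r's to canonical form f^a r^b
    (a in {0, 1}, b in {0, 1, 2, 3}) using the rules
    ff = ., rrrr = . and rf = frrr."""
    a, b = 0, 0  # the word read so far is equivalent to f^a r^b
    for letter in word:
        if letter == "r":
            b = (b + 1) % 4   # f^a r^b r = f^a r^(b+1)
        elif letter == "f":
            a = (a + 1) % 2   # a second flip cancels, since ff = .
            b = (-b) % 4      # r^b f = f r^(4-b), from rf = frrr
    return "f" * a + "r" * b or "."

print(reduce_word("frrfrfrrr"))  # the first example from the text: "f"
print(reduce_word("rrrfrrrr"))   # 270°, flip, then 90°: "fr"
```

Both outputs agree with the hand calculations above: a flip about the horizontal axis, and a flip about the trailing diagonal.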

What we have here is an algebra for the symmetries of a square. In this algebra letters symbolise not numbers, but rearrangements of the corners of a square, and as a result the rules of this algebra are quite different. Were we to perform a similar analysis for the rearrangements of marbles in our earlier example, we would find 11 rearrangements (plus the null rearrangement that does nothing), with three base rearrangements, and a different set of rules again. I leave the determination of these rules as an exercise for the interested reader. Indeed each distinct pattern (that is, each distinct set of internal relationships between some set of sub-objects) will have its own set of rules, and its own associated algebra. Our world is filled with patterns, and each and every such pattern has its own algebra describing how objects within the pattern can be rearranged while preserving that pattern. A whole new world begins to open up before us: what are all the possible algebras***, and what patterns do they describe? Are there different sets of rules that produce the same algebras, and if so, how can we tell?

Those questions, and a fuller exploration of this rich world which we have only just glimpsed here, will have to wait however. Next time we will return to the continuum, and continue to try and unravel the many paradoxes that surround it.

* Translation by Dorothy Britton, Haiku Journey: Basho’s Narrow Road to a Far Province, Kodansha International, 1974.

** In practice Japanese haiku have rather more subtle demands, and are both more, and less flexible than this; this example is more for illustrative purposes.

*** Note that I am using algebra here in an informal sense — there is a strict mathematical sense which is quite different.

The ideas of succession and repetition are fairly fundamental, and are apparent in nature in myriad ways. For example, the cycle of day and night repeats, leading to a succession of different days. Every such series of successive events is, in our experience, bounded — it only extends so far; up to the present moment. Of course such a series of events can extend back to our earliest memories. Via the collective memory of a society, passed down through written or oral records, it can even extend back to well before we were born. Thus, looking back into the past, we come to be aware of series of successive events of vastly varying, though always bounded, length. We can then, at least by suitable juxtaposition of a negation, form the concept of a sequence of succession that does *not* have a bound. And thus arises the concept of infinity. Is the concept coherent? Does succession without bound make any sense? With this conception of infinity it is hard to say, for we have only really said it is a thing without a bound. We have said what property infinity does not have, but we have said little about what properties it does have.

Indeed, despite the basic concept of infinity extending back at least as far as ancient Greece, whether infinity is a coherent concept has been a point of bitter debate, with no significant progress made until as recently as the end of the 19th century. Even now, despite having a fairly well grounded definition and theory for transfinite numbers, there is room for contention and differing conceptions of infinity, and in particular of the continuum. Such modern debate divides over subtle issues which we will come to in due course. First, however, it will be educational to look at some of the more straightforward reasons that people have difficulty contemplating infinity: the apparent paradoxes and contradictions that arise.

Some of the earliest apparent paradoxes that involve the infinite are from ancient Greece. Among the more well known are the “paradoxes” proposed by Zeno of Elea. Interestingly Zeno’s paradoxes (of which there are three) were not originally intended to discredit the concept of infinity — on the contrary they assume the coherency of infinity as a concept to make their point. Zeno was a student of Parmenides, who held that the universe was actually a static unchanging unity. Zeno’s paradoxes were intended to demonstrate that motion, and change, are actually just illusions. The paradoxes have, however, come to be associated with the paradoxical nature of the infinite.

The first of Zeno’s paradoxes, the *Dichotomy*, essentially runs as follows: Before a moving body can reach a given point it must traverse half the distance to that point, and before it can reach that halfway point it must traverse half of *that* distance (or one quarter of the distance to the end point), and so on. Such division of distance can occur indefinitely, however, so to get from a starting point to anywhere else the body must traverse an infinite number of smaller distances — and surely an infinite number of tasks cannot be completed in a finite period of time?

The second paradox, the most well known of the three, is about a race between Achilles and a tortoise, in which the tortoise is granted a head start. Zeno points out that, by the time Achilles reaches the point where the tortoise started, the tortoise will have moved ahead a small distance. By the time Achilles catches up to that point, the tortoise will again have moved ahead. This process, with the tortoise moving ahead smaller and smaller distances, can obviously occur an infinite number of times. Again we are faced with the difficulty of completing an infinite number of tasks. Thus Achilles will never overtake the tortoise!

The third paradox, the *Arrow*, raises more subtle questions regarding the continuum, so I will delay discussion of it until later. Taken together the paradoxes were supposed to show that motion is paradoxical and impossible. Few people are actually convinced, however: everyday experience contradicts the results that the paradoxes claim. The common reaction is more along the lines of “Okay, sure. What’s the trick?”. The “trick” is actually relatively subtle, and while rough and ready explanations can be given by talking about *convergent series*, it is worth actually parsing out the fine details here (as we’ve seen in the past, the devil is often in the details), as it will go a long way toward informing our ideas about infinity and continuity.

Let us tackle the *Dichotomy* first. To ease the arithmetic, let us assume that the moving body in question is traversing an interval of unit length (which we can always do, since we are at liberty to choose what distance we consider to be our base unit), and that it is travelling at a constant speed. We can show that, contrary to Zeno’s claim, the object can traverse this distance in some unit length of time (again, a matter of simply choosing an appropriate base unit) despite having to traverse an infinite number of shorter distances along the way. To see this, consider that, since the body is travelling at a constant speed, it would have to cover a distance of 1/2 in a time of 1/2, and before that it would cover a distance of 1/4 in a time of only 1/4, and so on. The key to resolving this is that the infinite sum 1/2 + 1/4 + 1/8 + 1/16 + … is equal to 1, and thus the infinite tasks can, indeed, be completed in finite time. This tends to be the point where most explanations stop, possibly with a little hand-waving and vague geometric argument about progressively cutting up a unit length. It is at this point, however, that our discussion really begins. You *can* make intuitive arguments as to why the sum turns out to be 1, but, given that we weren’t even that clear about what 1 + 1 = 2 means, a little more caution may be in order — particularly given that infinity is something completely outside our practical experience, so our intuitions about it are hardly trustworthy.

Since we can’t trust our intuitions about infinite sums yet, it seems sensible that we should look at finite sums instead. Certainly we can calculate the sum 1/2 + 1/4 = 3/4, and 1/2 + 1/4 + 1/8 = 7/8, and so on. Each of these sums will, in turn, give a slightly better approximation of the infinite sum we wish to calculate; the more terms we add, the better the approximation. The obvious thing to do, then, is to consider this sequence of ever more accurate approximations and see if we can say anything sensible about it. To save myself some writing I will use S_{n} to denote the sum 1/2 + 1/4 + 1/8 + … + 1/2^{n} (thus S_{2} = 1/2 + 1/4 and S_{4} = 1/2 + 1/4 + 1/8 + 1/16, and so on), and talk about the *sequence of partial sums* S_{1}, S_{2}, S_{3}, …
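These partial sums are easy enough to compute directly. A quick sketch (Python, using exact rational arithmetic so that no floating-point rounding creeps in):

```python
from fractions import Fraction

def S(n):
    """The partial sum S_n = 1/2 + 1/4 + ... + 1/2^n, computed exactly."""
    return sum(Fraction(1, 2**k) for k in range(1, n + 1))

print(S(2))   # 3/4
print(S(3))   # 7/8
print(S(10))  # 1023/1024 -- each extra term halves the remaining gap to 1
```

Notice that S(n) always comes out as (2^n − 1)/2^n, a closed form that will prove useful shortly.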

It may not seem that we’ve made much improvement, having shifted from summing up an infinite number of terms to considering an infinite sequence of sums, but surprisingly infinite sequences are easier to deal with than infinite sums — and we at least only have finite sums to deal with now. The trick from here is to deal with the n^{th} term of the sequence for values of n that are finite, but arbitrarily large. That means we get to work with finite sums (since for any finite n, S_{n} is a finite sum) which we can understand, but at the same time have no bound on how large n can be, which brings us into contact with the infinite. In a sense we are building a bridge from the finite to the infinite: any given case is *finite*, but which term the case deals with is *without bound*. Before we can get to the arbitrarily large, however, we must first deal with the arbitrarily small.

In some ways it was the arbitrarily small that led to this problem — the paradox is founded on the presumption that the process of dividing in half can go on indefinitely, resulting in arbitrarily small distances to be traversed. It is precisely this property of infinite divisibility that is a necessary feature of the idea of a continuum: something without breaks or jumps. The opposite of the continuous is the discrete; a discrete set of objects can only be divided into the finest granularity provided by the discrete parts, since any further “division” would involve a reinterpretation of what constitutes an object. In presuming indefinite divisibility we have moved away from discrete collections of objects, and into the realm of continuous things. In the world of the continuous we may talk about the arbitrarily small (a result of arbitrarily *many* divisions — note the relationship between the infinite and the continuous). What we are really after is a concept of convergence; the idea that as we move further along the sequence we get closer and closer, and eventually converge to, some particular value. That is, we want to be able to say that, by looking far enough along the sequence we can end up *an arbitrarily small distance away* from some particular value that the sequence is converging to. This, in turn, leads us to the next concept: distance.

We need to be careful here because while the original problem was about a moving object covering a certain distance in the real world, we have abstracted away these details so as to have a problem solely about sequences of numbers. That means we are no longer dealing with practical physical distance, but an abstract concept of *distance between numbers*. So what does it mean for one number to be “close” to another? We need a concrete definition rather than vague intuition if we are to proceed. Since numbers are purely abstract objects we could, in theory, have “close” mean whatever we choose. There is a catch, however: when talking about numbers we generally assume that they are ordered in a particular way. For example, when arriving at rules for algebra we included rules for ordering numbers. This implicit ordering defines “closeness” in the sense that we would like to think that x < y < z means that y is “closer” to z than x is. Looking back at the rules regarding ordering we find that this means that the closer z − y is to 0, the closer y is to z. That’s really just saying that the smaller the difference between y and z, the smaller the distance between them, and so the definition of distance we need is the difference between y and z! The final catch is that we would like to be able to consider the distance from z to y to be the same as the distance from y to z, but z − y = −(y − z). The solution is simply to say that the direction of measurement, and hence the sign of the result, is irrelevant and take the absolute value to get:

The distance between y and z is |y − z|.

As a momentary aside, it is worth noting that we have defined a distance between numbers to be another number, but that the number that defines the distance is, in some sense, not the same type of number. The number defining the distance is a higher level of abstraction, since it is a number describing a property of abstract objects, while the numbers that we are measuring distance between are describing concrete reality. For the most part these differences don’t matter — numbers are numbers and all behave the same — but as we move deeper into the philosophy of mathematics teasing apart these subtleties will be important. Now, back to the problem at hand…

It is time to put the power of algebra — the ability to work with a number without having to specify exactly which number it is — to use. Let ε be some non-zero positive number, without specifying exactly what number (I’m using ε because it is the traditional choice among mathematicians to denote a number that we would like to presume is very small — that is, very close to zero). Then I can choose N to be a number large enough that 2^{N} is bigger than 1/ε, and hence 1/2^{N} is less than ε. Exactly how big N will have to be will depend on how small ε is, but since there is no bound on how big N can be, we can always find a big enough N no matter how small ε turns out to be. Now, if we note that, for any n, S_{n} = (2^{n}−1)/2^{n} (which you can verify for yourself fairly easily) then, if we assume that n is bigger than N, we find that the distance between 1 and S_{n} is:

|1 − S_{n}| = |2^{n}/2^{n} − (2^{n}−1)/2^{n}| = |1/2^{n}| < |1/2^{N}| < ε.

That may not look that profound because it is buried in a certain amount of algebra, but we are actually saying a lot. The main point here is that ε was any non-zero positive number — it can be as small as we like; arbitrarily small even. Therefore, what we’ve just said is that we can always find a number (which we denoted N) large enough that every term after the N^{th} term is *arbitrarily close to 1*. That is, by going far enough down the sequence of partial sums (and there are infinitely many terms, so we can go as far as we like), we can reach a point where all the subsequent terms are as close to 1 as we like. This is what we mean when we say that a sequence converges. We have shown that the further along the sequence you go, the closer and closer you get to 1. It follows then, due to the way the sequence was constructed by progressively adding more terms to the sum, that the more terms of the sum we add together, the closer the sum gets to 1. There is no limit on how close to 1 we can get, since there is no upper limit on the number of terms we can add. In this sense the infinite sum (which has no bound on the number of terms) is equal to 1 (since by this point we are closer to 1 than any positive distance).
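The ε–N argument can also be watched in action numerically. The sketch below (function names are mine; any positive ε works) finds an N with 1/2^N < ε, and then spot-checks that the partial sums beyond N all lie within ε of 1 (the proof, of course, covers every such term, not just the finitely many we can check). It uses the closed form S_{n} = (2^{n}−1)/2^{n} noted above.

```python
from fractions import Fraction

def S(n):
    """Partial sum via the closed form S_n = (2^n - 1)/2^n."""
    return Fraction(2**n - 1, 2**n)

def find_N(eps):
    """The smallest N with 1/2^N < eps."""
    N = 1
    while Fraction(1, 2**N) >= eps:
        N += 1
    return N

eps = Fraction(1, 1000)
N = find_N(eps)  # N = 10, since 2^10 = 1024 > 1000
assert all(abs(1 - S(n)) < eps for n in range(N, N + 100))
```

Shrink ε and N simply grows to match; that is the “no matter how small ε turns out to be” of the argument made tangible.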

The key points here were the ideas of distance between numbers, and of convergence, which lets us show in concrete terms that we can end up an arbitrarily small distance away from our intended target, just by looking far enough (and we can look arbitrarily far) along a sequence. These ideas — of defining abstract distance, and of convergence as defined in terms of that distance — will continue to be increasingly important as we progress down this road.

Zeno’s second paradox, about Achilles and the tortoise, can be tackled in a similar manner. Once we abstract away the details of the problem and arrive at the question of whether we can sum together all the times for each ever smaller distance that Achilles must run to catch the tortoise, we find that the same basic tools, involving sequences of partial sums, and convergence, will yield the same kind of result — Achilles will overtake the tortoise in a finite period of time. I leave the proof, and the determination of how long it will take Achilles, as an exercise to the reader. So we have resolved two of Zeno’s paradoxes; in so doing, however, we have developed a much richer theory. I would like to pause and ask you to contemplate what we’ve actually done here. It is easy to get mired in the details, but the bigger picture is truly remarkable. Through the concept of convergence we have built a bridge between the finite and the infinite, between the discrete and the continuous. Convergence provides a tool that allows us to extend our concrete reasoning about the finite and the discrete, step by inexorable step, into the realm of the infinite and the continuous. It is a tool that allows us to push out the boundaries of what we can reason about from the restricted and mundane confines of everyday experience to the very limits of possibility and beyond: we can reason about a lack of bounds!
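Without giving away the exercise entirely, the convergence is easy to watch numerically. The numbers below are illustrative assumptions of mine (Achilles running at 10 units per second, the tortoise at 1, with a 9 unit head start), not anything fixed by the paradox; summing the time for each catch-up stage homes in on the finite answer, which for constant speeds is the head start divided by the difference in speeds.

```python
def catch_up_time(head_start, v_achilles, v_tortoise, stages):
    """Sum the durations of the first `stages` of Zeno's catch-up process."""
    gap, total = head_start, 0.0
    for _ in range(stages):
        t = gap / v_achilles   # time for Achilles to cover the current gap
        total += t
        gap = v_tortoise * t   # the (smaller) gap the tortoise opens meanwhile
    return total

# With the assumed numbers the stage times form a geometric series
# converging to 9 / (10 - 1) = 1 second.
print(catch_up_time(9, 10, 1, 50))
```

Each stage shrinks the gap by the ratio of the speeds, so the stage times form exactly the kind of convergent sequence of partial sums we analysed for the *Dichotomy*.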

When we next deal with this stretch of road we will continue to develop our understanding of the continuum, and the infinite. Next, however, we will start down a different road, and consider other basic abstractions of a finite collection.

Alice came to a fork in the road. “Which road do I take?” she asked.

“Where do you want to go?” responded the Cheshire cat.

“I don’t know,” Alice answered.

“Then,” said the cat, “it doesn’t matter.”

— Lewis Carroll, Alice’s Adventures in Wonderland

In the later years of his life, after his journey to the interior, Basho lived in a small abandoned thatched hut near Lake Biwa that he described as being “at the crossroads of unreality”*. Now, still early in our journey, we have come to our own crossroads of unreality. We are caught between dichotomies of unreal, abstract, objects. One road leads to consideration of finite collections, and properties of composition (the algebraic properties 1 through 5 from the previous entry); the other road leads to the continuum and questions of ordering and inter-relationship (properties 7 through 10 from the previous entry). The first road will lead to a new fundamental abstraction from finite collections, different from, and yet as important as, the abstraction that we call numbers; this way lies group theory and the language of symmetry that has come to underlie so much of modern mathematics and physics. The second road will lead to deep questions about the nature of reality, and, brushing past calculus along the way, lead to a new and minimalist interpretation of a continuous space through the concept of topology.

Which road do we take? As the cat said to Alice, it doesn’t matter. We are at the crossroads of unreality, and the usual rules need not apply. Which road do we take? Both.

* From the translation of *Genjûan no fu* by Donald Keene, in Anthology of Japanese Literature.

There is a reason that these subjects give people pause when they first encounter them, and that is, quite simply, that they are difficult. They are difficult in that they represent another order of abstraction. Both fractions and elementary algebra must be built from, or abstracted from, the basic concept of numbers. Because of the sheer prevalence of numbers and counting in our lives from practically the moment we are born, people quickly develop a feel for this first, albeit dramatic, abstraction. It is when people encounter the next step, the next layer of abstraction, in the form of fractions and/or algebra, that they have to actively stretch their minds to embrace a significant abstraction for the first time. Most of us, having won this battle long ago, struggle to see the problem in hindsight — we might recall that we had trouble with the subject when we were younger, but would have a hard time saying why. We have developed the same sort of intuitive feel for fractions and algebra as we have for numbers and have forgotten that this is hard won knowledge.

I want to begin with fractions because, ultimately, it is by far the easier of the two — being only a semi-abstraction — and will provide an example of the process as background for stepping up to elementary algebra.

As was noted in the last entry, the complexity of mathematics begins to open up once we pass from considering numbers as referring to collections of objects and begin to think of them as objects in their own right. Once we have grasped that abstraction we can count numbers themselves, and operations on numbers, giving us the higher order construction of multiplication. Division operates in a similar way, providing an inverse to multiplication in the same way that subtraction provides an inverse to addition. That is, while addition asks “if I add a collection of size 3 to a collection of size 2, what size is the resulting collection?”, subtraction asks the inverse question “if I got a resulting collection of size 5 by adding some collection to a collection of size 3, how much must I have added?”; paralleling that we have multiplication asking “if I add together 5 collections of size 3, what size is the resulting collection?”, and division reversing the question: “if I have a resulting collection of size 15, how many collections of size 3 must I have added together?”. Everything seems fine so far, but there is some subtlety here that complicates the issue.

If we are still thinking in terms of collections then dividing a collection of 2 objects into 4 parts doesn’t make sense. If we are viewing numbers and operations on them as entities in their own right then we can at least form the construction 2/4, and ask if it might have a practical use. It turns out that it does, since it allows a change of units. What do I mean by this? We can say a given collection has the property of having 2 objects in it, but to do so is to make a decision about what constitutes a discrete object. Deciding what counts as an object, however, is not always clear — there are often several possible ways to do it, depending on what you wish to consider a whole object (that is, the base unit which you use to count objects in the collection). A simple example: in the World Cup soccer finals, do you count the number of teams, or the number of individual players? Both make sense depending on the kind of result you want to obtain, so considering a team, or each individual player, as a discrete object is a choice. The problem is even more common when dealing with measurement: a distance is measured as a certain number of basic lengths, but what you use as your basic length (the unit of measure) is quite arbitrary. We tend to measure highway distances in miles or kilometres and people’s heights in feet or metres, but we could just as easily switch to different units and measure highway distances in feet or metres and still be talking about the same distance. What matters is knowing what your base units are: we can count money in terms of dollars, or in terms of cents, but knowing which you are using makes a big difference.

Most importantly we can change our minds, or re-interpret, what constitutes a distinct object after the fact. Using this re-interpretation of what constitutes a discrete object, we can make sense of 2/4. If we re-interpret a distinct object such that what we had previously considered a single object is now considered two objects then we will have 4 objects in the collection, and we need 4 of these new objects to arrive at a collection that would be regarded as having 2 old objects. That is, 2/4 is expressible in terms of the re-interpreted objects, and in fact defines the relationship between old objects and new. But here’s the rub: we arrived at the new objects by considering each old object as 2 new objects, and so 1/2 expresses the same relationship between old and new objects: 1 old object reinterpreted as 2 results in the same new object as 2 old objects re-interpreted as 4.

Indeed, we can go on like this, with 3/6, 4/8, 5/10, and so on, all expressing the same relationship of new object to old – all different ways to arrive at the same “size” of new object. And so we have a catch – what are on inspection quite different expressions will, in practice, behave the same. Re-interpreting 1 object as 2, or 2 objects as 4 results in the “same” new objects, so counting, and hence addition, subtraction, and multiplication of these new objects will give the same result, whichever re-interpretation we use. Perhaps it doesn’t seem like much of a revelation that 2/4 is the same as 1/2, but that is simply because we have learned, through practice, to automatically associate them. The reality is that 2/4 and 1/2 are quite distinct, and it is only because they behave identically with regard to arithmetic that we regard them as the same. In identifying them as the same we are abstracting over such expressions, forgetting the particularities of what size of initial collection we were dividing, caring only about the common behaviour with regard to arithmetic. Making sense of fractions involves abstracting over numbers – they are another level of abstraction, and this, I suspect, is why people find them difficult when they first encounter them.
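This identification of distinct expressions that behave the same can be made concrete: a/b and c/d express the same relationship of new objects to old exactly when a×d = c×b. A tiny sketch (Python; representing a fraction as a plain pair of whole numbers is my choice for illustration):

```python
def equivalent(a, b, c, d):
    """Do a/b and c/d express the same relationship of new objects to old?"""
    return a * d == c * b

assert equivalent(2, 4, 1, 2)      # 2/4 and 1/2 behave identically
assert equivalent(3, 6, 4, 8)      # as do 3/6 and 4/8
assert not equivalent(1, 2, 2, 3)  # but 1/2 and 2/3 do not
```

Note that the cross-multiplication test never needs to divide anything; it compares the two re-interpretations purely through whole-number arithmetic, which is exactly why it can serve as the definition of sameness for fractions.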

There is an important idea in this particular abstraction that is worth paying attention to – it leads the way to algebra. We have an infinite number of different objects: 1/2, 2/4, 3/6, 4/8,… but because they all behave identically with respect to a given set of rules (in this case basic arithmetic) we pick a single symbol to denote the entire class of possible objects. Algebra can be thought of as extending that idea to its logical conclusion. The insight we need to make the step to algebra is that there is a subset of the rules of arithmetic for which *all numbers* behave identically. For example reversing the order of addition makes no difference to the result, no matter what numbers you are adding: 1+2=2+1, and 371+27=27+371. If you can identify which rules have the property that the specific numbers don’t matter, then you can pick a single symbol to denote the entire class of numbers for any manipulations within that set of rules. This is algebra.

This is important because it is a layer of abstraction over and above the abstraction of numbers. With numbers we considered many different collections and abstracted away everything about them except a certain property — the number of objects they contain. This proved to be useful because with regard to a certain set of rules, the rules of arithmetic, that was the only aspect of the collection that made a difference. Now we are regarding numbers as objects in their own right and, having identified a set of rules under which the particular number is unimportant, we are abstracting away what particular number we are dealing with. With numbers we could perform calculations and have the result be true regardless of the particular nature of the collections beyond the number of objects. Now, with algebra, we can perform calculations and have the result be true regardless of the particular numbers involved. This is an exceptionally powerful abstraction: it essentially does for numbers what numbers do for collections. This is why the rules of algebra, that subset of arithmetic rules under which all numbers behave identically, are so important.

In particular we can say that, no matter what numbers x, y and z are, the following are always true:

- x+y=y+x and x×y=y×x. These are referred to as *commutative* properties.
- x+(y+z)=(x+y)+z and x×(y×z)=(x×y)×z. These are referred to as *associative* properties.
- x×(y+z)=x×y+x×z. This is referred to as a *distributive* property.
- x+0=x and x×1=x. This property of 0 and 1 is referred to as being an *identity element* for addition and multiplication (respectively).
- There is a number, denoted −x, such that −x+x=0. This refers to the existence of *inverses* for addition.

We also have one odd one out — the existence of inverses for multiplication. The catch here is that it does matter what number *x* is; inverses exist for almost every number, but if x=0 there is no multiplicative inverse of x. Thus we have:

- If x is any number other than zero then there is a number, denoted 1/x, such that (1/x)×x=1.
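Since these rules are claimed to hold for all numbers, we can at least spot-check them mechanically on as large a random sample as we like. A sketch (Python, using exact fractions; the sampling scheme is an arbitrary choice of mine):

```python
import random
from fractions import Fraction

random.seed(0)

def rand_num():
    return Fraction(random.randint(-20, 20), random.randint(1, 20))

for _ in range(1000):
    x, y, z = rand_num(), rand_num(), rand_num()
    assert x + y == y + x and x * y == y * x  # commutative
    assert x + (y + z) == (x + y) + z         # associative (addition)
    assert x * (y * z) == (x * y) * z         # associative (multiplication)
    assert x * (y + z) == x * y + x * z       # distributive
    assert x + 0 == x and x * 1 == x          # identities
    assert -x + x == 0                        # additive inverse
    if x != 0:
        assert (1 / x) * x == 1               # multiplicative inverse...

# ...which fails for zero alone:
try:
    Fraction(1, 1) / 0
except ZeroDivisionError:
    print("zero has no multiplicative inverse")
```

No amount of sampling is a proof, of course; the point of algebra is precisely that these rules are established once, for all numbers, rather than case by case.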

If you have any curiosity you will be wondering why this special case occurred, breaking the pattern. Remember that we are talking about abstract properties common to all numbers, so the fact that this is a special case says something quite deep about multiplication, fractions, and the number zero. Indeed, because we are two layers of abstraction up, referring to all numbers, which in turn each refer to all collections with a given property, the fact that this is a special case has significance with regard to almost everything in the physical world. It is worth spending some time thinking about what it truly means.

We have some further properties with regard to how numbers can be ordered. I haven’t touched on this topic yet — we’ve only referred to numbers as a property of collections, and not as an ordering — but it is sufficiently intuitive (that is, most people have a firm enough grasp on numbers) that I won’t get into details here; just be forewarned that numbers as order and numbers as size are actually distinct concepts that, at some point, we will have to carefully tease apart.

- Either x<y, y<x, or x=y.
- If x<y and y<z then x<z.
- If x<y then x+z<y+z.
- If x<y and 0<z then x×z<y×z.
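These ordering rules admit the same kind of mechanical spot-check, and the check makes the special case vivid: the requirement 0<z in the last rule matters because multiplying an inequality by a negative number reverses it. A sketch (Python, exact fractions, arbitrary sampling scheme of mine):

```python
import random
from fractions import Fraction

random.seed(0)

def rand_num():
    return Fraction(random.randint(-20, 20), random.randint(1, 20))

for _ in range(1000):
    x, y, z = rand_num(), rand_num(), rand_num()
    assert (x < y) or (y < x) or (x == y)  # trichotomy
    if x < y and y < z:
        assert x < z                       # transitivity
    if x < y:
        assert x + z < y + z               # adding z preserves order
        if 0 < z:
            assert x * z < y * z           # multiplying by positive z preserves order
        elif z < 0:
            assert x * z > y * z           # ...while negative z reverses it
```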

Note that, again, 0 and multiplication have a significant interaction and provide another special case.

Note that I gave names to properties 1 through 5 because these properties will keep cropping up again and again later; some will prove to be important, others less so. Which ones are important and which are not may be somewhat of a surprise, but I’ll leave that surprise till later.

At this point it is worth taking stock of how far we’ve come. Not only have we built up two layers of abstraction, each of which can be used to great practical effect (just witness how much of modern technology and engineering is built upon arithmetic and elementary algebra!), in doing so we’ve begun to uncover an even deeper principle — the principle that will form the foundation for much of the modern mathematics that is to follow. What do I mean? There is a common thread to how these successive abstractions have been built: we discerned a set of rules for which an entire class of objects (potentially even completely abstract objects) behave identically, and this allowed us to abstract over the entire class. The broader the class the broader the results we can draw; the higher the abstraction (in terms of successive layers) the deeper the results we can draw. The approach now will be to seek out rules, and classes that they allow us to abstract over; the broader and more layered, the better. In so doing we will part ways with numbers entirely. Fractions, ordering, and the difficulties of 0 will lead us towards a kind of generalised geometry, while consideration of properties 1-6 will lead us to a language of symmetry.

We have come to the first truly significant incline on our road. Behind us lies a vast plain of numbers, fractions, and algebra. There is much more to explore there — we haven’t even touched on popular topics such as trigonometry — but in following the path we have, we have stumbled across a road that leads deep into the mountains. We have identified a common property to the abstractions we are making, and will now seek to generalise it. The importance of this cannot be overstated! We are abstracting over the process of abstraction itself! This is the path to high places from which, when we finally arrive, we can look out, over all the plains we now leave behind, with fresh eyes, and deeper understanding.
