
Notes on Permutations: Cycles and Transpositions
Tyler Neylon
332.2022


This is a collection of notes about permutations — all the different ways of ordering a set of distinct elements. In group theory, the group of permutations of size n is denoted Sn. For example:

S2:  12  21
S3:  123  132  213  231  312  321
S4:  1234  1243  1324  1342  1423  1432
     2134  2143  2314  2341  2413  2431
     3124  3142  3214  3241  3412  3421
     4123  4132  4213  4231  4312  4321

You can think of a single permutation as either a particular ordering of the integers 1 through n, or, equivalently, as a 1-1 and onto mapping f: {1, …, n} → {1, …, n}. There's a natural way to write such a mapping f, which is to list out f(1), f(2), …, f(n) as a string; you can consider the lists of elements of S2, S3, S4 above to be in this string notation. By default, this article will view permutations as mappings.
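If you'd like to reproduce listings like these yourself, here's a minimal Python sketch (the helper name S is my own, and Python with itertools is just an assumption; any language with a permutations generator works the same way):

    from itertools import permutations

    def S(n):
        # Every element of Sn, in string notation, in lexicographic order.
        return [''.join(str(x) for x in p) for p in permutations(range(1, n + 1))]

    print(S(2))       # ['12', '21']
    print(S(3))       # ['123', '132', '213', '231', '312', '321']
    print(len(S(4)))  # 24, matching the table above

(The string notation only works cleanly for n ≤ 9, since each element must be a single digit.)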

This article covers a few independent observations about permutations: a cut-merge operation that acts naturally on cycle notation (section 2), the magnitude and parity of a permutation (section 3), the shared cycle structure of στ and τσ (section 5), and the way the magnitude behaves like a norm (section 6).

When I say these are independent observations, I mean that these ideas are new to me. I have not tried much to check if they are new to the world altogether — in all likelihood, many of these ideas have been discovered by others before me. Still, I think they’re fun ideas worth sharing.

1 Cycle Notation

Any permutation is a specific ordering of a set of distinct elements. This article focuses on finite permutations, and I'll generally use the variable n to denote the size of the set being permuted. It's simple and traditional to think of a permutation as a bijection (a 1-1 and onto mapping) of the first n integers — for example, here's a permutation of the set {1, 2, …, 8} in string notation:

π = 15273648.   (1)

Despite the possible confusion with the constant π=3.141, it’s also traditional to use the variable π for a permutation, as (clearly) π stands for “πermutation.”

There are many fun ways to denote or think about a particular permutation. For example, given π as in (1), we could think of it as the sequence π_1 = 1, π_2 = 5, …, π_8 = 8. We could also think of it as a function π: [n] → [n], using [n] to denote the integers {1, 2, …, n}.

This article uses a particular notation for permutations called cycle notation: the idea for cycle notation is to write out a string of integers in [n] with the interpretation that any consecutive pair i j indicates that i is mapped to j, and that the element at the end of a parenthesized group maps to the element at the start of the same group. Our permutation from (1) looks like this in cycle notation:

π=(253)(47)

because π(2)=5, π(5)=3, and π(4)=7. Each sequence in parentheses wraps around, so π(3)=2 and π(7)=4. Thus each parenthesized sequence is a cycle. It is understood that any omitted elements are mapped to themselves; since π(1)=1, we don’t need to include 1 in the cycle notation for π.

Examples:

For n=5, (13) denotes 32145.
For n=5, (23)(45) denotes 13254.
For n=8, (152)(37)(486) denotes 51782436.
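Here's a small sketch that mechanically converts cycle notation into string notation and reproduces the three examples above (from_cycles and as_string are my own hypothetical helper names):

    def from_cycles(cycles, n):
        # Build the mapping i -> pi(i); any element omitted from the cycles maps to itself.
        pi = {i: i for i in range(1, n + 1)}
        for cyc in cycles:
            for a, b in zip(cyc, cyc[1:] + cyc[:1]):  # each element maps to the next, wrapping around
                pi[a] = b
        return pi

    def as_string(pi):
        return ''.join(str(pi[i]) for i in sorted(pi))

    print(as_string(from_cycles([(1, 3)], 5)))                        # 32145
    print(as_string(from_cycles([(2, 3), (4, 5)], 5)))                # 13254
    print(as_string(from_cycles([(1, 5, 2), (3, 7), (4, 8, 6)], 8)))  # 51782436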

You could write the same permutation many ways in cycle notation; for example:

π=(74)(532)=(325)(74)=(253)(47).


Among those choices, I consider the last one to be standard because it’s lexicographically first.

2 The Cut-Merge Operation

Now I’ll introduce an operation that acts naturally on the cycle notation of a permutation. Intuitively, a merge is an operation that combines two cycles by concatenating their cycle notation. When two elements — say 3 and 5 — are in different cycles, then I’ll write π(35) to mean “merge, in π, the cycles starting with 3 and 5.”

An example:

(12)(34)(56)(35)=(12)(3456).

What if 3 and 5 were already in the same cycle? Then π(35) takes on the meaning “cut apart the subcycles starting with 3 and 5:”

(12)(3456)(35)=(12)(34)(56)

or

(12345)(35)=(34)(512);

this last example might be easier to see when you keep in mind that

(12345) = (34512), where cutting at 3 and at 5 splits the rewritten cycle into the segments (34) and (512).

Now I'm ready to more carefully define the cut-merge operation (x y). In the definition below, I'll write (x…) to denote the cycle starting with x; and, analogously, I'll write (x… y…) to indicate a cycle containing both x and y, possibly with other elements indicated by the dots.

Definition The cut-merge operation (xy) operates on a permutation π via

π (x y) = (x… y…)    if π = (x…)(y…), or
π (x y) = (x…)(y…)   if π = (x… y…).

If there are any other cycles (ones without x or y), then they are unaffected by the (xy) operation.

The operation cuts a cycle if x and y are in the same cycle; otherwise it merges the cycles of x and y. A shorthand for the definition is:

(x y):  (x… y…)  ⟷  (x…)(y…).

Note that a merge can still operate on elements not explicitly written in cycle notation. If I write e to denote the identity permutation, with e(i) = i for all i, then

e(xy)=(xy).

Similarly,

(12)(23)=(213)

because, setting x=2 and y=3, we think of the left side as (x…)(y…) = (21)(3).

Observation 1 The cut-merge operation π(xy) is simply π composed with the permutation (xy).

To state this observation again, I'll introduce the notation ρ_ab(·) for the function that swaps input elements a and b, but otherwise acts as the identity function. Thus ρ_ab(·) is the function version of the transposition that we would write in cycle notation as (a b). With ρ_ab(·) defined, I can express observation 1 like so:

(π (x y))(i) = ρ_xy(π(i)),   (2)

where I'm thinking of permutations as functions, so the (i) sub-expression is evaluating the function on input i ∈ {1, 2, …, n}.

I could stop to carefully prove observation 1, but I think you’ll have more fun if you convince yourself it’s true on your own.

Why did I bother to define the cut-merge operation when I could have just started with (2) instead? Because the perspective of the definition I’m using will let us make some interesting further observations, as we’ll see below.
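If you'd like to check observation 1 computationally rather than by hand, here's a rough sketch (dict-based permutations and my own helper names, nothing standard assumed) that composes a permutation with a transposition and watches a pair of cycles merge and then get cut back apart:

    def from_cycles(cycles, n):
        pi = {i: i for i in range(1, n + 1)}
        for cyc in cycles:
            for a, b in zip(cyc, cyc[1:] + cyc[:1]):
                pi[a] = b
        return pi

    def compose(sigma, tau):
        # Left-to-right product: (sigma tau)(i) = tau(sigma(i)), matching equation (2).
        return {i: tau[sigma[i]] for i in sigma}

    def cycles_of(pi):
        # Cycle notation of pi, with singleton cycles omitted.
        seen, cycles = set(), []
        for start in sorted(pi):
            if start in seen or pi[start] == start:
                continue
            cyc, i = [], start
            while i not in seen:
                seen.add(i)
                cyc.append(i)
                i = pi[i]
            cycles.append(tuple(cyc))
        return cycles

    pi = from_cycles([(1, 2), (3, 4), (5, 6)], 6)
    t  = from_cycles([(3, 5)], 6)
    merged = compose(pi, t)
    print(cycles_of(merged))              # [(1, 2), (3, 4, 5, 6)]  -- a merge
    print(cycles_of(compose(merged, t)))  # [(1, 2), (3, 4), (5, 6)]  -- cut back apart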

3 The Magnitude and Parity of a Permutation

Definition The magnitude of a permutation π is given by

m(π) := #(cycle elements of π) − #(cycles of π),

where a cycle element of π is any element written in the cycle notation of π, and where we similarly count the cycles of π based on how many cycles are written in π’s cycle notation.

For example, the permutation π=(253)(47) has 5 cycle elements and 2 cycles, so m(π)=3.

Notice that m(π) remains the same if we include singleton cycles in our cycle notation. It would be nonstandard to include singletons, but it would still be a consistent notation. Continuing our example, π = (1)(253)(47)(6) gives m(π) = 7 − 4 = 3, as before. This leads to:

Observation 2 m(π) = n − #(orbits), where an orbit is either a cycle with multiple elements, or a singleton, meaning an element i with π(i) = i.

Let’s see why the definition of m(π) is interesting.

Observation 3

  1. Every merge increases m(π) by 1.
  2. Every cut decreases m(π) by 1.
  3. m(π) is the least number of cut-merge operations which can reach π from the identity permutation e.

Parts (i) and (ii) are easy to check.

Part (iii) is clear from the cut-merge perspective in that (a) we cannot build π in fewer operations, based on (i); and (b) we can indeed build π in m(π) merges by appending together the appropriate singletons one at a time. For example:

(253)(47) = e (25)(23)(47).   (3)

Starting now, I'll write simply στ or σ∘τ to denote the composition of permutations σ and τ. Using observation 1, we can see that the construction pattern used in (3) must work for any cycle that we want to write as a composition of transpositions:

(x_1 x_2 ⋯ x_k) = (x_1 x_2)(x_1 x_3)⋯(x_1 x_k).

In other words, every permutation is the product of m(π) transpositions, and cannot be the product of fewer. It could be the product of more, though:

(123)=(12)(13)(89)(89).
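To make equation (3) and the decomposition above concrete, here's a small sketch that rebuilds (253)(47) from the identity with exactly m(π) = 3 transpositions (same hypothetical dict-based helpers as before):

    def from_cycles(cycles, n):
        pi = {i: i for i in range(1, n + 1)}
        for cyc in cycles:
            for a, b in zip(cyc, cyc[1:] + cyc[:1]):
                pi[a] = b
        return pi

    def compose(sigma, tau):
        return {i: tau[sigma[i]] for i in sigma}  # apply sigma, then tau

    n = 8
    built = from_cycles([], n)                    # the identity e
    for t in [(2, 5), (2, 3), (4, 7)]:            # the transpositions from equation (3)
        built = compose(built, from_cycles([t], n))
    print(built == from_cycles([(2, 5, 3), (4, 7)], n))  # True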

Now we’re ready to define an interesting way to classify all permutations into two different types:

Definition A permutation π is odd if m(π) is odd, and even otherwise.

As soon as we call something even or odd, we expect some kind of behavior like this to happen:

even + even = even,   odd + even = odd,   odd + odd = even.   (4)

It turns out that these rules do indeed hold true in the sense that, for example, if we compose an odd permutation with another odd permutation, the result is an even permutation. Next I’ll explain why the intuitive identities of (4) are true for permutations.

The key to the proof is to see that, for any sequence of transpositions t_1, t_2, …, t_k:

m(t_1 t_2 ⋯ t_k) is even iff k is even.   (5)

This fact follows from observation 3. In more detail: Certainly m(e) = 0 (recall that e is the identity permutation); this corresponds to k = 0 in (5), an empty product. Now for larger values of k: If π = t_1 t_2 ⋯ t_{k−1} and we compose π with t_k (move from π to π t_k), then the parity (even/odd-ness) of the permutation switches because m(π t_k) = m(π) ± 1 by observation 3. So π t_k is even iff π is odd. This is enough to complete a proof of (5) by induction on k.

It will be useful to define the sign function of a permutation π like so: sign(π) := +1 if π is even, and −1 if π is odd. Notice that sign(π) = (−1)^m(π).

When we use this sign function, we're essentially moving from an additive view of even/odd-ness to a multiplicative view, but the two views are equivalent. For example, the equation even + odd = odd becomes (−1)^even · (−1)^odd = (−1)^odd. The general equation (−1)^a · (−1)^b = (−1)^{a+b} provides the equivalence.

We can now state the idea of (4) both more formally and more succinctly.

For any two permutations σ and τ,

sign(στ) = sign(σ) sign(τ).   (6)

Let's see why (6) is true. Let k = m(σ) and k′ = m(τ). Then, using observation 3(iii), we can find transpositions t_1, …, t_k and t′_1, …, t′_{k′} so that σ = t_1 ⋯ t_k and τ = t′_1 ⋯ t′_{k′}. Then

sign(στ) = (−1)^{m(στ)}                         (by def'n of sign)
         = (−1)^{m(t_1 ⋯ t_k t′_1 ⋯ t′_{k′})}   (by def'n of the t_i and t′_i)
         = (−1)^{k + k′}                         (by (5))
         = (−1)^k (−1)^{k′}
         = sign(σ) sign(τ).

This completes the proof of (6). And (6) expresses the same idea as (4), so that we have now justified all of those formulae.

Moving back to group theory: Notice that if σ and τ are both even permutations, then so is στ. In other words, the subset of even permutations is closed under composition, so that they form a subgroup of Sn. This subgroup is called the alternating group. Since odd permutations are not closed under composition, they don’t form a subgroup.
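Before moving on, here's a quick numerical sanity check of (6), computing sign(π) = (−1)^m(π) with m(π) = n − #(orbits) as in observation 2, and testing multiplicativity on random pairs (a sketch with my own helper names, not part of the original argument):

    import random

    def magnitude(pi):
        # m(pi) = n - #(orbits), per observation 2; pi is a dict on {1..n}.
        seen, orbits = set(), 0
        for start in pi:
            if start in seen:
                continue
            orbits += 1
            i = start
            while i not in seen:
                seen.add(i)
                i = pi[i]
        return len(pi) - orbits

    def sign(pi):
        return (-1) ** magnitude(pi)

    def compose(sigma, tau):
        return {i: tau[sigma[i]] for i in sigma}

    def random_perm(n):
        vals = list(range(1, n + 1))
        random.shuffle(vals)
        return dict(zip(range(1, n + 1), vals))

    for _ in range(1000):
        s, t = random_perm(8), random_perm(8)
        assert sign(compose(s, t)) == sign(s) * sign(t)   # equation (6)
    print('sign was multiplicative on 1000 random pairs')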

4 Previous Approaches to Permutation Parity

I like the cut-merge proof of (6) because it provides an intuition for why parity makes sense for permutations. In a nutshell, every permutation is a sequence of transpositions (cut-merges), and every transposition is a flip of the parity because the magnitude must either increase or decrease by 1.

Here’s a sketch of the intuitive paths we’ve crossed:

transpositions → cut-merges → changes to cycles → changes to m(π) → changes to parity

I’ll contrast the proof above of (6) with two other approaches.

4.1 Herstein’s Proof

In his book Topics in Algebra, I.N. Herstein uses the Vandermonde polynomial:

p(x_1, …, x_n) := ∏_{i<j} (x_i − x_j).

Given a permutation π, we can now define

π(p(x_1, …, x_n)) := p(x_{π_1}, …, x_{π_n}).

The polynomial π(p) is still a product of the factors (x_i − x_j), each possibly negated, over all pairs i < j, but the sign of the whole polynomial may change. For example, when n = 3, and π = (12):

p(x_1, x_2, x_3) = (x_1 − x_2)(x_1 − x_3)(x_2 − x_3)
π(p) = (x_2 − x_1)(x_2 − x_3)(x_1 − x_3) = −p.

Since π(p)=±p for any π, we can (re)define

sign(π) := π(p)/p ∈ {−1, +1}.   (7)

(Temporarily forget our original definition of sign(π). We’ll re-prove (6), and because our new and old sign() functions are the same on e and on transpositions, it will follow that they are in fact the same function.)

Notice that

π(−p) = −π(p)   (8)

based on the definition of π(p).

This means that, for any two permutations σ and τ,

sign(στ)·p = τ(σ(p))        (by (7))
           = τ(sign(σ)·p)   (by (7))
           = sign(σ)·τ(p)   (by (8))
           = sign(σ)·sign(τ)·p.

In summary, sign(στ) = sign(σ) sign(τ), which confirms (6) for this new definition of sign(). This proof provides less insight than the cut-merge approach as to why (6) is true — into what is going on behind the scenes to make things click.
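One convenient way to play with definition (7) is to substitute the concrete values x_i = i, so that π(p) and p become two integers of equal absolute value and their ratio is exactly the sign. Here's a sketch along those lines (the substitution trick and the helper names are my own), comparing the Vandermonde sign with the cycle-counting sign from section 3:

    from itertools import permutations
    from math import prod

    def vandermonde_sign(pi):
        # sign(pi) = pi(p)/p, evaluated with x_i = i, following definition (7).
        n = len(pi)
        num = prod(pi[i] - pi[j] for i in range(1, n + 1) for j in range(i + 1, n + 1))
        den = prod(i - j for i in range(1, n + 1) for j in range(i + 1, n + 1))
        return num // den

    def cycle_sign(pi):
        # (-1)**m(pi), with m(pi) = n - #(orbits) as in section 3.
        seen, orbits = set(), 0
        for start in pi:
            if start in seen:
                continue
            orbits += 1
            i = start
            while i not in seen:
                seen.add(i)
                i = pi[i]
        return (-1) ** (len(pi) - orbits)

    for vals in permutations(range(1, 6)):
        pi = dict(zip(range(1, 6), vals))
        assert vandermonde_sign(pi) == cycle_sign(pi)
    print('the two definitions of sign agree on all of S5')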

4.2 Artin’s Proof

The other approach I’ll consider is from the book Algebra by Michael Artin. This second alternative proof works by translating permutations into the language of linear algebra, and relies upon properties of the determinant of a matrix. On the off chance that you’re not fluent in properties of determinants, you can skim or skip this section and safely resume reading afterwards (the remainder of this note does not rely on linear algebra). I’m not a huge fan of this proof because it’s so easy to get lost in the details of tracking the relationship between vector coordinates and how permutations act on them. I’ll break the proof down into small steps to walk you through the dance of indices.

I’ll use the variable ei to denote the column vector of length n:

e_i = (0, …, 0, 1, 0, …, 0)^T,   with the single 1 in the ith place.

For example, if n=3, then

e_1 = (1, 0, 0)^T,   e_2 = (0, 1, 0)^T,   e_3 = (0, 0, 1)^T.

Now I can define, for permutation π, the n×n matrix Mπ as having e_{π_i} for its ith column; this is how Artin represents a permutation as a matrix. For example, if σ = (321), then

Mσ = [ 0 1 0
       0 0 1
       1 0 0 ].

Let's see why this matrix definition makes sense. Using the column-expansion perspective of matrix multiplication, with x = (x_1, …, x_n)^T,

Mπ x = x_1 e_{π_1} + x_2 e_{π_2} + ⋯ + x_n e_{π_n},   so each x_i lands in row π_i.   (9)

In other words, when we multiply Mπ on the left of a column vector, we take whatever value was in the ith row and move it to the π_i-th row. For example, Mσ (1, 2, 3)^T = (2, 3, 1)^T. Continuing with this intuition, left-multiplying by Mτ Mσ is like performing σ followed by τ in the sense that the ith row is moved to the τ(σ(i))-th. I personally prefer to have statements like this algebraically checked, so in the next couple of paragraphs, I'll spell out more carefully why this is true.

It's useful to slightly rewrite (9) as in the right-hand expression below, using the notation [v]_i to indicate the ith coordinate of column vector v:

[Mπ (x_1, …, x_n)^T]_{π_i} = x_i   ⟺   [Mπ x]_i = x_{π^{-1}(i)}.   (10)

So left-multiplying by both Mτ and Mσ works like so, using (10):

[Mτ Mσ x]_i = [Mσ x]_{τ^{-1}(i)} = x_{σ^{-1}(τ^{-1}(i))}.   (11)

Writing Mστ for the matrix of permutation στ, we have

[Mστ x]_i = x_{(στ)^{-1}(i)} = x_{σ^{-1}(τ^{-1}(i))}.   (12)

Together, (11) and (12) confirm that Mστ = Mτ Mσ in the sense that they act the same in matrix multiplications, from which we can conclude that they are in fact identical matrices.

At long last, I’m ready to (re)define the sign() function, following Artin: sign(π):=det(Mπ), where det() is the determinant function of a square matrix.

It turns out that this definition is also consistent with our earlier definitions: det(Me) = det(I) = 1, and for any transposition t, Mt is the identity matrix with two columns swapped, so det(Mt) = −1. In other words, the determinant-based sign() agrees with the earlier sign() on the identity and on transpositions, and, since both versions satisfy (6), they agree everywhere.

In using this approach, Artin is able to show that (6) holds simply by delegating the work to the properties of matrix determinants. In particular, using the fact that det(AB)=det(A)det(B):

sign(στ) = det(Mτ Mσ) = det(Mτ) det(Mσ) = sign(σ) sign(τ).

Artin’s proof is certainly valid, though I think it offers less insight into why equation (6) holds when compared to the cut-merge proof.
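Here's a small sketch of this matrix representation (assuming numpy is available; the helper names are mine), which checks both Mστ = Mτ Mσ and the determinant values on the σ and τ used above:

    import numpy as np

    def perm_matrix(pi):
        # M_pi has e_{pi(i)} as its ith column, following the convention above.
        n = len(pi)
        M = np.zeros((n, n), dtype=int)
        for i in range(1, n + 1):
            M[pi[i] - 1, i - 1] = 1   # column i has its single 1 in row pi(i)
        return M

    def compose(sigma, tau):
        return {i: tau[sigma[i]] for i in sigma}  # left-to-right product (sigma tau)

    sigma = {1: 3, 2: 1, 3: 2}   # sigma = (321)
    tau   = {1: 2, 2: 1, 3: 3}   # tau = (12)

    print(perm_matrix(sigma) @ np.array([1, 2, 3]))   # [2 3 1], as in the example above

    assert (perm_matrix(compose(sigma, tau)) == perm_matrix(tau) @ perm_matrix(sigma)).all()

    print(round(np.linalg.det(perm_matrix(sigma))))   # 1: sigma is even (m = 2)
    print(round(np.linalg.det(perm_matrix(tau))))     # -1: tau is odd (m = 1)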

5 Cycle Structures

Let σ = (13)(245) and τ = (14)(23). Then στ = (12)(345), and τσ = (152)(34).

Notice that these two permutations each have one cycle of length 2 and another of length 3. Is this a coincidence? It turns out that this similarity of structure will always hold between στ and τσ. I’ll have to define a new concept before I can precisely formulate this relationship.

Definition Given a permutation π, a cycle set S of π is a set of elements in the domain of π such that, for any a, b ∈ S, there is some k with π^k(a) = b. Notice that every element in the domain of π is in exactly one cycle set of π.

The cycle structure cs(π) of π is a function f from the positive integers to the nonnegative integers such that f(i) := the number of cycle sets of π with i elements.

As a shorthand, I'll write (f(1), f(2), f(3), …) to express a cycle structure. For example, when n = 5, the permutation (12)(345) has cycle structure (0, 1, 1); so does the permutation (152)(34). Now I can state:

Observation 4 For any permutations σ and τ, cs(στ)=cs(τσ).

I’ll provide two proofs because they each shed their own light on why this is true.

Proof 1 Given permutation π, I'll write order_π(i) = k to mean that π^k(i) = i, and there is no ℓ < k with π^ℓ(i) = i. To prove the observation, it's sufficient to show that there is a 1-1 and onto mapping f: [n] → [n] such that order_{στ}(i) = k ⟺ order_{τσ}(f(i)) = k.

Suppose order_{στ}(i) = k. Let j = τ^{-1}(i). In the following equations, I'll use a left-to-right, input-to-output notation, where iπ means π(i) and iστ means τ(σ(i)). We have: i = i(στ)^k = jτ(στ)^k = j(τσ)^k τ. Now apply τ^{-1} to the left-most and right-most expressions above to arrive at:

j = iτ^{-1} = j(τσ)^k   ⟹   order_{τσ}(j) ≤ k = order_{στ}(i).   (13)

On the other hand, if ℓ = order_{τσ}(j), then

i = (j)τ = (j)(τσ)^ℓ τ = (j)τ(στ)^ℓ = i(στ)^ℓ   ⟹   order_{στ}(i) ≤ ℓ = order_{τσ}(j).   (14)

Combining (13) and (14), order_{στ}(i) = order_{τσ}(j), where j = τ^{-1}(i), and it turns out that τ^{-1} is the mapping f: [n] → [n] we needed to complete the proof.

Proof 2 Given any permutation product στ, we can decompose τ into transpositions as in τ = t_1 ⋯ t_k. Now we can think in terms of a series of smaller steps that carry us from στ over to τσ:

π_0 = στ = σ t_1 ⋯ t_k
π_1 = t_k στ t_k^{-1} = t_k σ t_1 ⋯ t_{k−1}
⋮
π_k = t_1 ⋯ t_k στ t_k^{-1} ⋯ t_1^{-1} = τσ.

Each small step here can be characterized as π_{i+1} = t_{k−i} π_i t_{k−i}, using the fact that t_j^{-1} = t_j, which is true for any transposition. So the problem is reduced to showing that, for any permutation π and transposition t,

cs(π) = cs(tπt).   (15)

It turns out that applying t = (x y) in this manner simply swaps the places of x and y in the cycle notation of π. For example:

(13)(12537)(13) = (32517),

where the 1 and the 3 have traded places. This confirms (15), which completes the proof.
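Here's a quick randomized check of observation 4 (a sketch with my own helper names; cycle sets here include singletons, matching the definition above):

    import random
    from collections import Counter

    def cycle_structure(pi):
        # cs(pi) as a Counter: cycle-set size -> how many cycle sets have that size.
        seen, sizes = set(), []
        for start in pi:
            if start in seen:
                continue
            size, i = 0, start
            while i not in seen:
                seen.add(i)
                size += 1
                i = pi[i]
            sizes.append(size)
        return Counter(sizes)

    def compose(sigma, tau):
        return {i: tau[sigma[i]] for i in sigma}

    def random_perm(n):
        vals = list(range(1, n + 1))
        random.shuffle(vals)
        return dict(zip(range(1, n + 1), vals))

    for _ in range(1000):
        s, t = random_perm(9), random_perm(9)
        assert cycle_structure(compose(s, t)) == cycle_structure(compose(t, s))
    print('cs(st) == cs(ts) held on 1000 random pairs')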

Notice that observation 4 can be generalized for any rotation of any finite product of permutations. For example,

cs(abcd) = cs(bcda) = cs(cdab) = cs(dabc).

The equation cs(στ)=cs(τσ) makes it tempting to suspect that cycle structure is preserved no matter what order we multiply a given list of permutations. For example, we could ask if the following identities would hold for any three permutations a,b,c:

cs(abc) ≟ cs(acb) ≟ cs(bac) ≟ cs(bca),   etc.

for all the orderings of a,b,c; and similarly for any finite list of permutations.

But this is not true in general. For example, if a = (123), b = (12), c = (13), then

abc = (123)(12)(13) = (132),   while   cba = (13)(12)(123) = e.

There is a natural follow-up question to observation 4 which I can state clearly once I've provided a couple of new definitions.

Given any permutation π, let flips(π):={ba:π=ab}. And let

same_cycles(π) := {σ ∈ Sn : cs(σ) = cs(π)}.

Question 1 Observation 4 tells us that, for any π ∈ Sn, flips(π) ⊆ same_cycles(π). Are these sets actually equal?

This is similar to asking: Given π and the freedom to decompose it as π=ab, what are all the possible values of ba?

The answer arises from a simple permutation decomposition. Let’s start with:

Observation 5 The cycle notation of τ^{-1}στ is the same as the cycle notation of σ after each element i is replaced with (i)τ.

The observation is true because (iτ)(τ^{-1}στ) = iστ; that is, if σ′ := τ^{-1}στ and σ: i → j, then σ′: iτ → jτ.

For example, if τ=(253) and σ=(12)(345), then

τ^{-1}στ = (15)(243) = [(12)(345)].replace(τ).

Note that this replacement operation is specific to cycle notation. If we were to instead write out π as the string π_1 π_2 ⋯ π_n, then the corresponding string for τ^{-1}στ is no longer given by replacement with τ. I'll illustrate this distinction with an example:

τ = (253)
σ = 21453   (This is σ_1 σ_2 σ_3 σ_4 σ_5.)
τ^{-1}στ = 54231 ≠ [21453].replace(τ)
στ = 51432 = [21453].replace(τ).

To help build intuition for the set same_cycles(π), we can choose a canonical "first" element of same_cycles(π). A simple way to do this is to treat all singleton-free cycle notations in same_cycles(π) as strings, treating the ")" character as the last in the alphabet (we'll ignore the "(" character for the sake of ordering). Then we can define canon(π) to be the lexicographically first element of same_cycles(π). For example, canon((342)(15)) = (123)(45). If we had not specified the ordering of ")", then it would be unclear whether (12)(345) or (123)(45) came first.

Now we’re ready for:

Answer to Question 1 Given any σ ∈ same_cycles(π), we can decompose π = ab so that ba = σ. That is, flips(π) = same_cycles(π).

Proof We can use observation 5 to find τ so that

π = τ^{-1}στ.   (16)

Let a = τ^{-1} and b = τπ. Then ab = π, and ba = τπτ^{-1} = σ, by (16).

The main idea of this proof is to use observation 5 to see that cs(τ^{-1}στ) = cs(σ) for any τ. As the proof shows, by choosing the right τ, we can make τ^{-1}στ equal to anything in same_cycles(σ). And anything of the form τ^{-1}στ is in flips(σ), by considering a = τ and b = τ^{-1}σ.
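Here's a sketch of the construction in this proof: given π and σ with the same cycle structure, build τ by matching up cycles of equal length, then read off a and b (flip_decomposition and the other helpers are my own hypothetical names, and this is just one of many valid choices of τ):

    from collections import defaultdict

    def compose(sigma, tau):
        return {i: tau[sigma[i]] for i in sigma}   # left-to-right product (sigma tau)

    def inverse(f):
        return {f[i]: i for i in f}

    def cycles_of(pi):
        # All cycles of pi, singletons included.
        seen, cycles = set(), []
        for start in sorted(pi):
            if start in seen:
                continue
            cyc, i = [], start
            while i not in seen:
                seen.add(i)
                cyc.append(i)
                i = pi[i]
            cycles.append(cyc)
        return cycles

    def flip_decomposition(pi, sigma):
        # Assuming cs(pi) == cs(sigma), return (a, b) with ab == pi and ba == sigma.
        spare = defaultdict(list)
        for cyc in cycles_of(pi):
            spare[len(cyc)].append(cyc)
        tau = {}
        for s_cyc in cycles_of(sigma):
            p_cyc = spare[len(s_cyc)].pop()        # match a pi-cycle of the same length
            for s_el, p_el in zip(s_cyc, p_cyc):
                tau[s_el] = p_el                   # pi is sigma relabeled by tau (observation 5)
        return inverse(tau), compose(tau, pi)      # a = tau^{-1}, b = tau pi

    pi    = {1: 2, 2: 1, 3: 4, 4: 5, 5: 3}         # (12)(345)
    sigma = {1: 5, 5: 2, 2: 1, 3: 4, 4: 3}         # (152)(34)
    a, b = flip_decomposition(pi, sigma)
    print(compose(a, b) == pi, compose(b, a) == sigma)   # True True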

Note that the τ used in the proof may not be unique because there are different cycle notation strings for the same permutation. Consider π=(12) and σ=(34). We can convert π into σ in different ways:

(12)  --τ = (13)(24)-->  (34)
(12)  --τ′ = (14)(23)-->  (43)

The above diagram is an informal way of writing that τ^{-1}πτ = (34) = (43) = τ′^{-1}πτ′.

6 The Magnitude is Like a Norm

In this section I’ll point out some interesting properties of the magnitude function.

Observation 6 For any permutations π and σ:

  1. m(π) ≥ 0; and m(π) = 0 iff π = e.
  2. m(πσ) ≤ m(π) + m(σ).
  3. m(π^{-1}) = m(π).
  4. m(πσ) = m(σπ).

Proof Start by noticing that m(π) is the least number of transpositions by which we can get from e to π; i.e., m(π) is the least integer for which we can write π = t_1 t_2 ⋯ t_{m(π)} for some transpositions t_1, …, t_{m(π)}.

Considering each of π and σ as a composition of this minimal number of transpositions, parts (i) and (ii) are straightforward.

Part (iii) follows since the cycle notation for π^{-1} is simply the reverse of the cycle notation for π.

Part (iv) follows from observation 4.

In many ways, the magnitude of a permutation acts like a vector norm — the key difference is that we don't have a meaningful interpretation of kπ for k ∈ ℝ, so we don't have the additional rule m(kπ) = |k|·m(π).

This makes it interesting to define a notion of distance between permutations:

Definition Given permutations π and σ, dist(π, σ) := m(π^{-1}σ).

I'll justify why this definition makes sense. Let x = π^{-1}σ. Then πx = σ, so dist(π, σ) is measuring the fewest "hops" — transpositions — needed to get from π to σ.

We’d like a good distance function to be symmetric, and luckily for us, this one is:

dist(σ, π) = m(σ^{-1}π) = m(π^{-1}σ) = dist(π, σ),

the middle equality following since m(π) = m(π^{-1}) by observation 6. This symmetry addresses the apparent inelegance of the definition; indeed, the definition may appear to treat π and σ differently, but the end result does not depend on the order of these inputs to dist().
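To round this out, here's a tiny sketch of dist() built on the same hypothetical dict-based helpers as before:

    def compose(sigma, tau):
        return {i: tau[sigma[i]] for i in sigma}

    def inverse(f):
        return {f[i]: i for i in f}

    def magnitude(pi):
        # m(pi) = n - #(orbits).
        seen, orbits = set(), 0
        for start in pi:
            if start in seen:
                continue
            orbits += 1
            i = start
            while i not in seen:
                seen.add(i)
                i = pi[i]
        return len(pi) - orbits

    def dist(pi, sigma):
        return magnitude(compose(inverse(pi), sigma))

    pi    = {1: 2, 2: 1, 3: 3, 4: 4}   # (12)
    sigma = {1: 2, 2: 1, 3: 4, 4: 3}   # (12)(34)
    print(dist(pi, sigma), dist(sigma, pi))   # 1 1 -- one transposition apart, and symmetric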

To conclude this note, I’ll ask a fun and intriguing question whose answer I don’t yet know:

Question 2 Is there a way to generalize Sn and m() so that the product kπ makes sense for some values of k ∈ ℝ, and so that m(kπ) = |k|·m(π)?

References

Artin, M. 1991. Algebra. Prentice Hall.
Herstein, I. N. 1990. Abstract Algebra. Macmillan Pub.
