Notes on Permutations: Cycles and Transpositions
Tyler Neylon

[Formats: html | pdf]

This is a collection of notes about permutations — all the different ways of ordering a set of distinct elements. In group theory, the group of permutations of size \(n\) is denoted \(S_n\). For example:

\(S_2\) \(\tt 12 \quad 21\)
\(S_3\) \(\tt 123 \quad 132 \quad 213 \quad 231 \quad 312 \quad 321\)
\(S_4\) \(\tt 1234 \quad 1243 \quad 1324 \quad 1342 \quad 1423 \quad 1432\)
\(\tt 2134 \quad 2143 \quad 2314 \quad 2341 \quad 2413 \quad 2431\)
\(\tt 3124 \quad 3142 \quad 3214 \quad 3241 \quad 3412 \quad 3421\)
\(\tt 4123 \quad 4132 \quad 4213 \quad 4231 \quad 4312 \quad 4321\)

You can think of a single permutation as either a particular ordering of the integers 1 through \(n\), or, equivalently, as a 1-1 and onto mapping \(f:\{1,\ldots, n\}\to\{1,\ldots,n\}\). There’s a natural way to write such a mapping \(f\), which is to list out \(f(1), f(2), \ldots, f(n)\) as a string; you can consider the lists of elements of \(S_2, S_3, S_4\) above to be in this string notation. By default, this article will view permutations as mappings.

This article covers the following independent observations about permutations:

When I say these are independent observations, I mean that these ideas are new to me. I have not tried much to check if they are new to the world altogether — in all likelihood, many of these ideas have been discovered by others before me. Still, I think they’re fun ideas worth sharing.

1 Cycle Notation

Any permutation is a specific ordering of a set of distinct elements. This article focuses on finite permutations, and I’ll generally use the variable \(n\) to denote the size of the set being permuted. It’s simple and traditional to think of a permutation as a bijection (a 1-1 and onto mapping) of the first \(n\) integers — for example, here’s a permutation of the set \(\{1, 2, \ldots, 8\}\) in string notation:

\[\pi = 15273648.\qquad(1)\]

Despite the possible confusion with the constant \(\pi=3.141\ldots\), it’s also traditional to use the variable \(\pi\) for a permutation, as (clearly) \(\pi\) stands for “\(\pi\)ermutation.”

There are many fun ways to denote or think about a particular permutation. For example, given \(\pi\) as in (1), we could think of it as the sequence \(\pi_1=1\), \(\pi_2=5\), \(\ldots\), \(\pi_8=8\). We could also think of it as a function \(\pi:[n]\to[n]\), using \([n]\) to denote the integers \(\{1, 2,\ldots, n\}\).

This article uses a particular notation for permutations called cycle notation: the idea for cycle notation is to write out a string of integers in \([n]\) with the interpretation that any consecutive pair \(ij\) indicates that \(i\) is mapped to \(j\), and any \(i\) at the end of a parenthesized group maps to \(j\) at the start of the same group. Our permutation from (1) looks like this in cycle notation:

\[ \pi = (2\;5\;3)(4\;7) \]

because \(\pi(2) = 5\), \(\pi(5) = 3\), and \(\pi(4) = 7\). Each sequence in parentheses wraps around, so \(\pi(3)=2\) and \(\pi(7)=4\). Thus each parenthesized sequence is a cycle. It is understood that any omitted elements are mapped to themselves; since \(\pi(1)=1\), we don’t need to include \(1\) in the cycle notation for \(\pi\).


For \(n=5\), \((1\;3)\) denotes \(\tt 32145\).
For \(n=5\), \((2\;3)(4\;5)\) denotes \(\tt 13254\).
For \(n=8\), \((1\;5\;2)(3\;7)(4\;8\;6)\) denotes \(\tt 51782436\).

You could write the same permutation many ways in cycle notation; for example:

\[ \pi = (7\;4)(5\;3\;2) = (3\;2\;5)(7\;4) = (2\;5\;3)(4\;7). \]

\[\begin{align*} \pi = (7\;4)(5\;3\;2) &= (3\;2\;5)(7\;4) \\[0.2cm] &= (2\;5\;3)(4\;7). \end{align*}\]

Among those choices, I consider the last one to be standard because it’s lexicographically first.

2 The Cut-Merge Operation

Now I’ll introduce an operation that acts naturally on the cycle notation of a permutation. Intuitively, a merge is an operation that combines two cycles by concatenating their cycle notation. When two elements — say 3 and 5 — are in different cycles, then I’ll write \(\pi * (3\;5)\) to mean “merge, in \(\pi\), the cycles starting with 3 and 5.”

An example:

\[(1\;2)(3\;4)(5\;6) * (3\;5) = (1\;2)(3\;4\;5\;6).\]

What if 3 and 5 were already in the same cycle? Then \(\pi * (3\;5)\) takes on the meaning “cut apart the subcycles starting with 3 and 5:”

\[(1\;2)(3\;4\;5\;6) * (3\;5) = (1\;2)(3\;4)(5\;6)\]


\[(1\;2\;3\;4\;5) * (3\;5) = (3\;4)(5\;1\;2);\]

this last example might be easier to see when you keep in mind that

\[(1\;2\;3\;4\;5) = (\underline{3}\;4\;\underline{5}\;1\;2).\]

Now I’m ready to more carefully define the cut-merge operation \(*(x\;y)\). In the definition below, I’ll write \((x{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot})\) to denote the cycle starting with \(x\); and, analogously, I’ll write \((x{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot}\;y{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot})\) to indicate a cycle containing both \(x\) and \(y\), possibly with other elements indicated by the dots.

Definition \(\quad\) The cut-merge operation \(*(x\;y)\) operates on a permutation \(\pi\) via

\[ \;\pi * (x\;y) = \begin{cases} (x{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot}\;y{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot}) & \text{if } \pi = (x{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot})(y{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot}) \text{ or,} \\ (x{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot})(y{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot}) & \text{if } \pi = (x{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot}\;y{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot}). \\ \end{cases}\]

\[ \;\pi * (x\;y) = \begin{cases} (x{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot}\;y{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot}) \\ \; \text{if } \pi = (x{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot})(y{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot}); \\[0.3cm] (x{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot})(y{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot}) \\ \; \text{if } \pi = (x{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot}\;y{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot}). \\ \end{cases}\]

If there are any other cycles (ones without \(x\) or \(y\)), then they are unaffected by the \((x\;y)\) operation.

The operation cuts a cycle if \(x\) and \(y\) are in the same cycle; otherwise it merges the cycles of \(x\) and \(y\). A shorthand for the definition is:

\[ (x{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot}\;y{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot}) \; \underset{*(x\;y)}{\longleftrightarrow} \; (x{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot})(y{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot}). \]

Note that a merge can still operate on elements not explicitly written in cycle notation. If I write \(e\) to denote the identity permutation, with \(\pi_i=i\;\forall i\), then

\[e * (x\;y) = (x\;y). \]


\[(1\;2) * (2\;3) = (2\;1\;3)\]

because, setting \(x=2\) and \(y=3\), we think of the left side as \((x{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot})(y{\cdot}\kern -0.1pt{\cdot}\kern -0.1pt{\cdot})=(2 1)(3)\).

Observation 1 \(\quad\) The cut-merge operation \(\pi * (x\;y)\) is simply \(\pi\) composed with the permutation \((x\;y)\).

To state this observation again, I’ll introduce the notation \(\rho_{ab}()\) for the function that swaps input elements \(a\) and \(b\), but otherwise acts as the identity function. Thus \(\rho_{ab}()\) is the function version of the transposition that we would write in cycle notation as \((a\;b)\). With \(\rho_{ab}()\) defined, I can express observation 1 like so:

\[ \left(\pi * (x\;y)\right)(i) = \rho_{xy}(\pi(i)), \qquad(2)\]

where I’m thinking of permutations as functions, so the \((i)\) sub-expression is evaluating the function on input \(i \in \{1, 2, \ldots, n\}\).

I could stop to carefully prove observation 1, but I think you’ll have more fun if you convince yourself it’s true on your own.

Why did I bother to define the cut-merge operation when I could have just started with (2) instead? Because the perspective of the definition I’m using will let us make some interesting further observations, as we’ll see below.

3 The Magnitude and Parity of a Permutation

Definition \(\quad\) The magnitude of a permutation \(\pi\) is given by

\[ m(\pi) := \#(\text{cycle elements of }\pi) - \#(\text{cycles of }\pi), \]

\[\begin{align*} m(\pi) & := \#(\text{cycle elements of }\pi) \\ & \;- \#(\text{cycles of }\pi), \end{align*}\]

where a cycle element of \(\pi\) is any element written in the cycle notation of \(\pi\), and where we similarly count the cycles of \(\pi\) based on how many cycles are written in \(\pi\)’s cycle notation.

For example, the permutation \(\pi = (2\;5\;3)(4\;7)\) has 5 cycle elements and 2 cycles, so \(m(\pi) = 3\).

Notice that \(m(\pi)\) remains the same if we include singleton cycles in our cycle notation. It would be nonstandard to include singletons, but it would still be a consistent notation. Continuing our example, \(\pi = (1)(2\;5\;3)(4\;7)(6)\) \(\Rightarrow\) \(m(\pi) = 7 - 4 = 3\), as before. This leads to:

Observation 2 \(\quad\) \[ m(\pi) = n - \#(\text{orbits}), \] where an orbit is either a cycle with multiple elements, or a singleton, which is an element \(i : \pi(i) = i\).

Let’s see why the definition of \(m(\pi)\) is interesting.

Observation 3 \(\quad\)

  1. Every merge increases \(m(\pi)\) by 1.
  2. Every cut decreases \(m(\pi)\) by 1.
  3. \(m(\pi)\) is the least number of cut-merge operations which can reach \(\pi\) from the identity permutation e.

Parts (i) and (ii) are easy to check.

Part (iii) is clear from the cut-merge perspective in that (a) we cannot build \(\pi\) in fewer operations, based on (i); and (b) we can indeed build \(\pi\) in \(m(\pi)\) merges by appending together the appropriate singletons one at a time. For example:

\[(2\;5\;3)(4\;7) = e * (2\;5) * (2\;3) * (4\;7).\qquad(3)\]

\[\begin{alignat*}{1} & (2\;5\;3)(4\;7) \\ & \quad = e * (2\;5) * (2\;3) * (4\;7).\quad (3) \end{alignat*}\]

Starting now, I’ll write simply \(\sigma \tau\) or \(\sigma\cdot\tau\) to denote the composition of permutations \(\sigma\) and \(\tau\). Using observation 1, we can see that the construction pattern used in (3) must work for any cycle that we want to write as a composition of transpositions:

\[(x_1\;x_2\cdots x_k) = (x_1\;x_2)(x_1\;x_3)\cdots(x_1\;x_k).\]

In other words, every permutation is the product of \(m(\pi)\) transpositions, and cannot be the product of fewer. It could be the product of more, though:

\[ (1\;2\;3) = (1\;2)(1\;3)(8\;9)(8\;9). \]

Now we’re ready to define an interesting way to classify all permutations into two different types:

Definition \(\quad\) A permutation \(\pi\) is odd if \(m(\pi)\) is odd, and even otherwise.

As soon as we call something even or odd, we expect some kind of behavior like this to happen:

\[ \begin{alignedat}{2} \text{even} &+ \text{even} &&= \text{even} \\ \text{odd} &+ \text{even} &&= \text{odd} \\ \text{odd} &+ \text{odd} &&= \text{even}. \\ \end{alignedat} \qquad(4)\]

It turns out that these rules do indeed hold true in the sense that, for example, if we compose an odd permutation with another odd permutation, the result is an even permutation. Next I’ll explain why the intuitive identities of (4) are true for permutations.

The key to the proof is to see that, for any sequence of transpositions \(t_1, t_2, \ldots, t_k\):

\[m(t_1t_2\cdots t_k)\text{ is even iff } k \text{ is even}.\qquad(5)\]

This fact follows from observation 3. In more detail: Certainly \(m(e)=0\) (recall that \(e\) is the identity permutation); this corresponds to \(k=0\) in (5), an empty product. Now for larger values of \(k\): If \(\pi=t_1t_2\cdots t_{k-1}\) and we compose \(\pi\) with \(t_k\) (move from \(\pi\) to \(\pi\cdot t_k\)), then the parity (even/odd-ness) of the permutation switches because \(m(\pi\cdot t_k) = m(\pi) \pm 1\) by observation 3. So \(\pi\cdot t_k\) is even iff \(\pi\) is odd. This is enough to complete a proof of (5) by induction on \(k\).

It will be useful to define the sign function of a permutation \(\pi\) like so: \[\textsf{sign}(\pi) := \begin{cases} \phantom{-}1 & \text{if $\pi$ is even, and} \\ -1 & \text{if $\pi$ is odd.} \end{cases}\] Notice that \(\textsf{sign}(\pi) = (-1)^{m(\pi)}\).

When we use this sign function, we’re essentially moving from an additive view of even/odd-ness and into a multiplicative view — but the two views are equivalent. For example, the equation \[\text{even} + \text{odd} = \text{odd}\] becomes \[(-1)^\text{even}(-1)^\text{odd} = (-1)^\text{odd}.\] The general equation \((-1)^a(-1)^b = (-1)^{a+b}\) provides the equivalence.

We can now state the idea of (4) both more formally and more succinctly.

For any two permutations \(\sigma\) and \(\tau\),

\[\;\textsf{sign}(\sigma\cdot\tau) = \textsf{sign}(\sigma) \cdot \textsf{sign}(\tau).\!\!\!\!\!\!\!\qquad(6)\]

Let’s see why (6) is true. Let \(k = m(\sigma)\) and \(k' = m(\tau)\). Then, using observation 3(iii), we can find transpositions \((t_i)_1^k\) and \((t'_i)_1^{k'}\) so that \[\begin{align*} \sigma &= t_1 \cdots t_k \\ \tau &= t'_1 \cdots t'_{k'}. \end{align*}\] Then

\[ \begin{alignedat}{2} \textsf{sign}(\sigma\cdot\tau) &= (-1)^{m(\sigma\tau)} & \text{by def'n of }\textsf{sign}\\ &= (-1)^{m(t_1\cdots t_k \cdot t'_1 \cdots t'_{k'})} & \quad\text{ by def'n of $t_i$ and $t'_i$} \\ &= (-1)^{k + k'} & \text{ by (5)} \\ &= (-1)^k (-1)^{k'} \\ &= \textsf{sign}(\sigma)\textsf{sign}(\tau). \end{alignedat} \]

\[ \begin{alignedat}{2} \textsf{sign}(\sigma\cdot\tau) &= (-1)^{m(\sigma\tau)} \\[0.2cm] && \quad\llap{(\textit{by def'n of }\textsf{sign})} \\[0.2cm] &= (-1)^{m(t_1\cdots t_k \cdot t'_1 \cdots t'_{k'})} \\[0.2cm] && \quad\llap{(\textit{by def'n of $t_i$ and $t'_i$})} \\[0.2cm] &= (-1)^{k + k'} \\[0.2cm] && \quad\llap{(\textit{by (5)})} \\[0.2cm] &= (-1)^k (-1)^{k'} \\[0.2cm] &= \textsf{sign}(\sigma)\textsf{sign}(\tau). \end{alignedat} \]

This completes the proof of (6). And (6) expresses the same idea as (4), so that we have now justified all of those formulae.

Moving back to group theory: Notice that if \(\sigma\) and \(\tau\) are both even permutations, then so is \(\sigma\tau\). In other words, the subset of even permutations is closed under composition, so that they form a subgroup of \(S_n\). This subgroup is called the alternating group. Since odd permutations are not closed under composition, they don’t form a subgroup.

4 Previous Approaches to Permutation Parity

I like the cut-merge proof of (6) because it provides an intuition for why parity makes sense for permutations. In a nutshell, every permutation is a sequence of transpositions (cut-merges), and every transposition is a flip of the parity because the magnitude must either increase or decrease by 1.

Here’s a sketch of the intuitive paths we’ve crossed:

\(\fbox{transpositions}\) \(\longleftrightarrow\) \(\fbox{cut-merges}\) \(\longleftrightarrow\) \(\fbox{changes to cycles}\)
\(\fbox{changes to parity}\) \(\longleftrightarrow\) \(\fbox{changes to \(m(\pi)\)}\)

\[ \begin{array}{c} \fbox{transpositions} \\ \updownarrow \\ \fbox{cut-merges} \\ \updownarrow \\ \fbox{changes to cycles} \\ \updownarrow \\ \fbox{changes to $m(\pi)$} \\ \updownarrow \\ \fbox{changes to parity} \\ \end{array} \]

I’ll contrast the proof above of (6) with two other approaches.

4.1 Herstein’s Proof

In his book Topics in Algebra, I.N. Herstein uses the Vandermonde polynomial:

\[ p(x_1,\ldots, x_n) := \prod_{i<j}(x_i-x_j). \]

Given a permutation \(\pi\), we can now define

\[ \pi(p(x_1,\ldots, x_n)) := p(x_{\pi_1},\ldots, x_{\pi_n}). \]

The polynomial \(\pi(p)\) is still a product of \((x_i - x_j)\) for all \(i\ne j\), but the sign of the polynomial may change. For example, when \(n=3\), and \(\pi=(1\;2)\):

\[\begin{align*} p(x_1, x_2, x_3) &= (x_1 - x_2)(x_1 - x_3)(x_2 - x_3) \\[0.2cm] \pi(p) &= (x_2 - x_1)(x_2 - x_3)(x_1 - x_3) \\[0.2cm] &= -p. \end{align*}\]

Since \(\pi(p) = \pm p\) for any \(\pi\), we can (re)define

\[ \textsf{sign}(\pi) := \pi(p)/p \in \{-1, +1\}. \qquad(7)\]

(Temporarily forget our original definition of \(\textsf{sign}(\pi)\). We’ll re-prove (6), and because our new and old \(\textsf{sign}()\) functions are the same on \(e\) and on transpositions, it will follow that they are in fact the same function.)

Notice that

\[\pi(-p) = -\pi(p)\qquad(8)\]

based on the definition of \(\pi(p)\).

This means that, for any two permutations \(\sigma\) and \(\tau\),

\[ \begin{alignedat}{2} \textsf{sign}(\sigma\tau)\cdot p &= \tau(\sigma(p)) & \text{by } (7) \\ &= \tau(\textsf{sign}(\sigma)\cdot p) & \text{by } (7) \\ &= \textsf{sign}(\sigma)\cdot\tau(p) & \text{by } (8) \\ &= \textsf{sign}(\sigma)\cdot\textsf{sign}(\tau)\cdot p. \end{alignedat} \]

In summary, \(\textsf{sign}(\sigma\cdot\tau) = \textsf{sign}(\sigma)\cdot\sigma(\tau)\), which confirms (6) for this new definition of \(\textsf{sign}()\). This proof provides less insight than the cut-merge approach as to why (6) is true — into what is going on behind the scenes to make things click.

4.2 Artin’s Proof

The other approach I’ll consider is from the book Algebra by Michael Artin. This second alternative proof works by translating permutations into the language of linear algebra, and relies upon properties of the determinant of a matrix. On the off chance that you’re not fluent in properties of determinants, you can skim or skip this section and safely resume reading afterwards (the remainder of this note does not rely on linear algebra). I’m not a huge fan of this proof because it’s so easy to get lost in the details of tracking the relationship between vector coordinates and how permutations act on them. I’ll break the proof down into small steps to walk you through the dance of indices.

I’ll use the variable \(e_i\) to denote the column vector of length \(n\):

\[e_i = \begin{array}{l} 0 \\ 0 \\ \vdots \\ 1 \longleftarrow i^\text{th}\text{ place}. \\ \vdots \\ 0 \\ 0 \\ \end{array}\]

For example, if \(n=3\), then

\[ e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \quad e_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}. \]

Now I can define, for permutation \(\pi\), the \(n\times n\) matrix \(M_\pi\) as having \(e_{\pi_i}\) for its \(i^\text{th}\) column; this is how Artin represents a permutation as a matrix. For example, if \(\sigma = (3\;2\;1)\), then \(M_\sigma = \left(\begin{smallmatrix}&1\\ &&1\\ 1 \end{smallmatrix}\right)\).

Let’s see why this matrix definition makes sense. Using the column-expansion perspective of matrix multiplication, with \(x=\begin{pmatrix}x_1 \\ \vdots \\ x_n\end{pmatrix}\),

\[M_\pi x = x_1\cdot e_{\pi_1} + x_2\cdot e_{\pi_2} + \ldots + x_n\cdot e_{\pi_n} = \left( \begin{array}{l} \vdots \\ x_i \\ \vdots \\ \end{array} \right) \longleftarrow\text{ in row }\pi_i. \qquad(9)\]

\[\begin{align*} M_\pi x &= x_1\cdot e_{\pi_1} + x_2\cdot e_{\pi_2} + \ldots + x_n\cdot e_{\pi_n} \\[0.3cm] &= \left( \begin{array}{l} \vdots \\ x_i \\ \vdots \\ \end{array} \right) \longleftarrow\text{ in row }\pi_i. \;\; (9) \end{align*}\]

In other words, when we multiply \(M_\pi\) on the left of a column vector, we take whatever value was in the \(i^\text{th}\) row and move it to the \(\pi_i^\text{th}\) row. For example, \(M_\sigma\left(\begin{smallmatrix}1 \\ 2 \\ 3 \end{smallmatrix}\right) = \left(\begin{smallmatrix}&1\\ &&1\\ 1 \end{smallmatrix}\right) \left(\begin{smallmatrix}1 \\ 2 \\ 3 \\ \end{smallmatrix}\right) = \left(\begin{smallmatrix}2 \\ 3 \\ 1 \\ \end{smallmatrix}\right)\). Continuing with this intuition, left-multiplying by \(M_\tau M_\sigma\) is like performing \(\sigma\) followed by \(\tau\) in the sense that the \(i^\text{th}\) row is moved to the \(\tau(\sigma(i))^\text{th}\). I personally prefer to have statements like this algebraically checked, so in the next couple of paragraphs, I’ll spell out more carefully why this is true.

It’s useful to slightly rewrite (9) as in the right-hand expression below, using the notation \([v]_i\) to indicate the \(i^\text{th}\) coordinate of column vector \(v\):

\[\left[M_\pi\begin{pmatrix} x_1 \\ \vdots \\ x_n\end{pmatrix} \right]_{\displaystyle\, \pi_i} \!\! = x_i \quad\Rightarrow\quad \left[M_\pi x \right]_{i} = x_{\pi^{-1}i}. \qquad(10)\]

\[\begin{align*} \left[M_\pi\begin{pmatrix} x_1 \\ \vdots \\ x_n\end{pmatrix} \right]_{\displaystyle\, \pi_i} \!\! = x_i \quad\Rightarrow\\[0.2cm] \left[M_\pi x \right]_{i} = x_{\pi^{-1}i}. \quad (10) \end{align*}\]

So left-multiplying by both \(M_\tau\) and \(M_\sigma\) works like so, using (10):

\[ \left[M_\tau M_\sigma x\right]_{i} = \left[M_\sigma x\right]_{\tau^{-1}(i)} = x_{\sigma^{-1}(\tau^{-1}(i))}. \qquad(11)\]

\[\begin{align*} \left[M_\tau M_\sigma x\right]_{i} &= \left[M_\sigma x\right]_{\tau^{-1}(i)} \\[0.2cm] &= x_{\sigma^{-1}(\tau^{-1}(i))}. \quad (11) \end{align*}\]

Writing \(M_{\sigma\tau}\) for the matrix of permutation \(\sigma\tau\), we have

\[ \left[M_{\sigma\tau} x\right]_{i} = x_{(\sigma\tau)^{-1}(i)} = x_{\sigma^{-1}(\tau^{-1}(i))}. \qquad(12)\]

Together, (11) and (12) confirm that: \[ M_{\sigma\tau} = M_\tau M_\sigma \] in the sense that they act the same in matrix multiplications, from which we can conclude that they are in fact identical matrices.

At long last, I’m ready to (re)define the \(\textsf{sign}()\) function, following Artin: \[\textsf{sign}(\pi) := \det(M_\pi),\] where \(\det()\) is the determinant function of a square matrix.

It turns out that this definition is also consistent with our earlier definitions because:

In using this approach, Artin is able to show that (6) holds simply by delegating the work to the properties of matrix determinants. In particular, using the fact that \(\det(AB) = \det(A)\det(B)\):

\[\textsf{sign}(\sigma\cdot\tau) = \det(M_\tau M_\sigma) = \det(M_\tau)\det(M_\sigma) = \textsf{sign}(\sigma)\textsf{sign}(\tau). \]

\[\begin{align*} \textsf{sign}(\sigma\cdot\tau) &= \det(M_\tau M_\sigma) \\[0.2cm] &= \det(M_\tau)\det(M_\sigma) \\[0.2cm] &= \textsf{sign}(\sigma)\textsf{sign}(\tau). \end{align*}\]

Artin’s proof is certainly valid, though I think it offers less insight into why equation (6) holds when compared to the cut-merge proof.

5 Cycle Structures

Let \(\sigma = (1\;3)(2\;4\;5)\) and \(\tau = (1\;4)(2\;3)\). Then \[\begin{align*} \sigma \tau &= (1\;2)(3\;4\;5), \text{ and} \\ \tau \sigma &= (1\;5\;2)(3\;4). \end{align*}\]

Notice that these two permutations each have one cycle of length 2 and another of length 3. Is this a coincidence? It turns out that this similarity of structure will always hold between \(\sigma \tau\) and \(\tau \sigma\). I’ll have to define a new concept before I can precisely formulate this relationship.

Definition \(\quad\) Given a permtuation \(\pi\), a cycle set \(S\) of \(\pi\) is a set of elements in the domain of \(\pi\) such that, for any \(a,b\in S\), \(\exists k:\pi^k(a)=b\). Notice that every element in the domain of \(\pi\) is in exactly one cycle set of \(\pi\).

The cycle structure \(\mathsf{cs}(\pi)\) of \(\pi\) is a function \(f:\mathbb{N}_{\ge 1}\to\mathbb{N}_{\ge 0}\) such that \(f(i) :=\) the number of cycle sets of \(\pi\) with \(i\) elements.

As a shorthand, I’ll write \((f(1), f(2), f(3), \ldots)\) to express a cycle structure. For example, when \(n=5\), the permutation \((1\;2)(3\;4\;5)\) has cycle structure \((0, 1, 1)\); so does the permutation \((1\;5\;2)(3\;4)\). Now I can state:

Observation 4 \(\quad\) For any permutations \(\sigma\) and \(\tau\), \[ \mathsf{cs}(\sigma\tau) = \mathsf{cs}(\tau\sigma). \]

I’ll provide two proofs because they each shed their own light on why this is true.

Proof 1 \(\quad\) Given permutation \(\pi\), I’ll write \(\textsf{order}_\pi(i)=k\) to mean that \(\pi^k(i)=i\), and there is no \(\ell < k : \pi^\ell(i)=i\). To prove the observation, it’s sufficient to show that there is a 1-1 and onto mapping \(f:[n]\to[n]\) such that \(\textsf{order}_{\sigma\tau}(i)=k \Leftrightarrow \textsf{order}_{\tau\sigma}(f(i))=k\).

Suppose \(\textsf{order}_{\sigma\tau}(i)=k\). Let \(j=\tau^{-1}(i)\). In the following equations, I’ll use a left-to-write, input-to-output notation, where \(i\pi\) means \(\pi(i)\) and \(i\sigma\tau\) means \(\tau(\sigma(i))\). We have: \[ i = i(\sigma\tau)^k = j\tau(\sigma\tau)^k = j(\tau\sigma)^k\tau. \] Now apply \(\tau^{-1}\) to the left-most and right-most expressions above to arrive at:

\[ j= i\tau^{-1} = j(\tau\sigma)^k \;\Rightarrow\; \textsf{order}_{\tau\sigma}(j) \le k = \textsf{order}_{\sigma\tau}(i). \qquad(13)\]

\[\begin{align*} j= i\tau^{-1} &= j(\tau\sigma)^k \\[0.2cm] \;\Rightarrow\; \textsf{order}_{\tau\sigma}(j) \le k &= \textsf{order}_{\sigma\tau}(i). \;(13) \end{align*}\]

On the other hand, if \(\ell=\textsf{order}_{\tau\sigma}(j)\), then

\[ i = (j)\tau = (j)(\tau\sigma)^\ell\tau = (j)\tau(\sigma\tau)^\ell = i(\sigma\tau)^\ell \] \[ \Rightarrow \textsf{order}_{\sigma\tau}(i) \le \ell = \textsf{order}_{\tau\sigma}(j). \qquad(14)\]

Combining (13) and (14), \[ \textsf{order}_{\sigma\tau}(i) = \textsf{order}_{\tau\sigma}(j=\tau^{-1}(i)), \] and it turns out that \(\tau^{-1}\) is the mapping \(f:[n]\to [n]\) we needed to complete the proof. \(\square\)

Proof 2 \(\quad\) Given any permutation product \(\sigma\tau\), we can decompose \(\tau\) into transpositions as in \(\tau = t_1\cdots t_k\). Now we can think in terms of a series of smaller steps that carry us from \(\sigma\tau\) over to \(\tau\sigma\): \[\begin{align*} \pi_0 &= \sigma\tau = \sigma\cdot t_1 \cdots t_k \\ \pi_1 &= t_k \cdot \sigma \cdot \tau \cdot t_k^{-1} = t_k\sigma t_1\cdots t_{k-1} \\ \vdots \\ \pi_k &= t_1\cdots t_k \sigma \tau t_k^{-1}\cdots t_1^{-1} = \tau\sigma. \end{align*}\] Each small step here can be characterized as \(\pi_{i+1} = t_{k-i}\pi_i t_{k-i}\), using the fact that \(t_j^{-1}=t_j\), which is true for any transposition. So the problem is reduced to showing that, for any permutation \(\pi\) and transposition \(t\), \[ \mathsf{cs}(\pi) = \mathsf{cs}(t \pi t). \qquad(15)\] It turns out that applying \(t=(x\;y)\) in this manner simply swaps the places of \(x\) and \(y\) in the cycle notation of \(\pi\). For example: \[ (1\;3)(\underline{1}\;2\;5\;\underline{3}\;7)(1\;3) = (\underline{3}\;2\;5\;\underline{1}\;7). \] This confirms (15), which completes the proof. \(\square\)

Notice that observation 4 can be generalized for any rotation of any finite product of permutations. For example,

\[ \mathsf{cs}(abcd) = \mathsf{cs}(bcda) = \mathsf{cs}(cdab) = \mathsf{cs}(dabc). \]

\[\begin{align*} \mathsf{cs}(abcd) &= \mathsf{cs}(bcda) \\[0.2cm] = \mathsf{cs}(cdab) &= \mathsf{cs}(dabc). \end{align*}\]

The equation \(\mathsf{cs}(\sigma\tau) = \mathsf{cs}(\tau\sigma)\) makes it tempting to suspect that cycle structure is preserved no matter what order we multiply a given list of permutations. For example, we could ask if the following identities would hold for any three permutations \(a,b,c\):

\[ \mathsf{cs}(abc) \stackrel{?}{=} \mathsf{cs}(acb) \stackrel{?}{=} \mathsf{cs}(bac) \stackrel{?}{=} \mathsf{cs}(bca), \text{etc.} \]

\[\begin{align*} \mathsf{cs}(abc) &\stackrel{?}{=} \mathsf{cs}(acb) \\[0.2cm] \stackrel{?}{=} \mathsf{cs}(bac) &\stackrel{?}{=} \mathsf{cs}(bca), \text{etc.} \end{align*}\]

for all the orderings of \(a,b,c\); and similarly for any finite list of permutations.

But this is not true in general. For example, if \(a = (1\;2\;3)\), \(b = (1\;2)\), \(c=(1\;3)\), then \[ \begin{alignedat}{2} a\;b\;c &= (1\;2\;3)(1\;2)(1\;3) &&= (1\;3\;2) \\ c\;b\;a &= (1\;3)(1\;2)(1\;2\;3) &&= e. \end{alignedat} \] There is a natural follow-up question to observation 4 which I can state clearly once I’ve provided a couple new definitions.

Given any permutation \(\pi\), let \[\textsf{flips}(\pi) := \{b\cdot a: \pi = a\cdot b\}. \] And let

\[ \textsf{same$\\\_$cycles}(\pi) := \{\sigma\in S_n : \mathsf{cs}(\sigma)=\mathsf{cs}(\pi)\}. \]

\[\begin{alignat*}{2} & \textsf{same$\\\_$cycles}(\pi) \\[0.2cm] & \qquad := \{\sigma\in S_n : \mathsf{cs}(\sigma)=\mathsf{cs}(\pi)\}. \end{alignat*}\]

Question 1 \(\quad\) Observation 4 tells us that, for any \(\pi \in S_n\), \[ \textsf{flips}(\pi) \subset \textsf{same$\\\_$cycles}(\pi). \] Are these sets actually equal?

This is similar to asking: Given \(\pi\) and the freedom to decompose it as \(\pi=a\cdot b\), what are all the possible values of \(b\cdot a\)?

The answer arises from a simple permutation decomposition. Let’s start with:

Observation 5 \(\quad\) The cycle notation of \(\tau^{-1}\sigma\tau\) is the same as the cycle notation of \(\sigma\) after each element \(i\) is replaced with \((i)\tau\).

The observation is true because \[ (i\tau)(\tau^{-1}\sigma\tau) = i\sigma\tau; \] that is, if \(\sigma' := \tau^{-1}\sigma\tau\) and \(\sigma : i \mapsto j\), then \(\sigma' : i\tau \mapsto j\tau\).

For example, if \(\tau = (2\;5\;3)\) and \(\sigma = (1\;2)(3\;4\;5)\), then

\[ \tau^{-1}\sigma\tau = (1\;5)(2\;4\;3) = [(1\;2)(3\;4\;5)].\text{replace}(\tau).\]

\[\begin{align*} \tau^{-1}\sigma\tau &= (1\;5)(2\;4\;3) \\ &= [(1\;2)(3\;4\;5)].\text{replace}(\tau). \end{align*}\]

Note that this replacement operation is specific to cycle notation. If we were to instead write out \(\pi\) as the string \(\pi_1 \pi_2 \ldots \pi_n\), then the corresponding string for \(\tau^{-1}\sigma\tau\) is no longer replacement with \(\tau\). I’ll illustrate this distinction with an example:

\[ \begin{alignedat}{2} \tau &= (2\;5\;3) \\ \sigma &= 2\;1\;4\;5\;3 && \phantom{ = }\text{ (This is $\sigma_1\sigma_2\sigma_3\sigma_4\sigma_5$.)} \\ \tau^{-1}\sigma\tau &= 5\;4\;2\;3\;1 && \ne [2\;1\;4\;5\;3].\text{replace}(\tau) \\ \sigma\tau &= 5\;1\;4\;3\;2 && = [2\;1\;4\;5\;3].\text{replace}(\tau). \end{alignedat} \]

\[ \begin{alignedat}{2} \tau &= (2\;5\;3) \\ \sigma &= 2\;1\;4\;5\;3 \\ & \phantom{ = }\text{ (This is $\sigma_1\sigma_2\sigma_3\sigma_4\sigma_5$.)} \\[0.5cm] \tau^{-1}\sigma\tau &= 5\;4\;2\;3\;1 \\ &\ne [2\;1\;4\;5\;3].\text{replace}(\tau) \\[0.5cm] \sigma\tau &= 5\;1\;4\;3\;2 \\ &= [2\;1\;4\;5\;3].\text{replace}(\tau). \end{alignedat} \]

To help build intuition for the set \(\textsf{same$\\\_$cycles}(\pi)\), we can choose a canonical “first” element of \(\textsf{same$\\\_$cycles}(\pi)\). A simple way to do this it to treat all singleton-free cycle notations in \(\textsf{same$\\\_$cycles}(\pi)\) as strings, treating the “)” character as the last in the alphabet (we’ll ignore the “(” character for the sake of ordering). Then we can define \(\textsf{canon}(\pi)\) to be the lexicographically first element of \(\textsf{same$\\\_$cycles}(\pi)\). For example, \(\textsf{canon}(3\;4\;2)(1\;5) = (1\;2\;3)(4\;5)\). If we had not specified the ordering of “)”, then it would be unclear if \((1\;2)(3\;4\;5)\) or \((1\;2\;3)(4\;5)\) were first.

Now we’re ready for:

Answer to Question 1 \(\quad\) Given any \(\sigma\in\textsf{same$\\\_$cycles}(\pi)\), we can decompose \(\pi=a\cdot b\) so that \(b\cdot a=\sigma\). That is, \(\textsf{flips}(\pi) = \textsf{same$\\\_$cycles}(\pi)\).

Proof \(\quad\) We can use observation 5 to find \(\tau\) so that \[ \pi = \tau^{-1}\sigma\tau. \qquad(16)\] Let \(a = \tau^{-1}\) and \(b=\tau\pi\). Then \[ \begin{alignedat}{2} ab &= \pi \\ ba &= \tau\pi\tau^{-1} = \sigma\quad\text{by (16).} \end{alignedat} \] \(\square\)

The main idea of this proof is to use observation 5 to see that \(\mathsf{cs}(\tau^{-1}\sigma\tau) = \mathsf{cs}(\sigma)\) for any \(\tau\). As the proof shows, by choosing the right \(\tau\), we can make \(\tau^{-1}\sigma\tau\) equal to anything in \(\textsf{flips}(\sigma)\). And anything of the form \(\tau^{-1}\sigma\tau\) is in \(\textsf{flips}(\sigma)\) by considering \(a=\tau\) and \(b=\tau^{-1}\sigma\).

Note that the \(\tau\) used in the proof may not be unique because there are different cycle notation strings for the same permutation. Consider \(\pi=(1\;2)\) and \(\sigma=(3\;4)\). We can convert \(\pi\) into \(\sigma\) in different ways:

\[ \begin{alignedat}{2} & (1\;\;2) \quad\quad && (1\;\;2) \\ & \kern 1pt\downarrow\;\;\downarrow \;\;\tau=(1\;3)(2\;4) \quad\quad && \kern 1pt\downarrow\;\;\downarrow \;\;\tau'=(1\;4)(2\;3) \\ & (3\;\;4) && (4\;\;3) \end{alignedat} \]

\[ \begin{alignedat}{2} & (1\;\;2) \quad\quad \\ & \kern 1pt\downarrow\;\;\downarrow \;\;\tau=(1\;3)(2\;4) \\ & (3\;\;4) \\ \phantom{.} \end{alignedat} \] \[ \begin{alignedat}{2} & (1\;\;2) \\ & \kern 1pt\downarrow\;\;\downarrow \;\;\tau'=(1\;4)(2\;3) \\ & (4\;\;3) \end{alignedat} \]

The above diagram is an informal way of writing that \[ \tau^{-1}\pi\tau = (3\;4) = (4\;3) = \tau'^{-1}\pi\tau'. \]

6 The Magnitude is Like a Norm

In this section I’ll point out some interesting properties of the magnitude function.

Observation 6 \(\quad\) For any permutations \(\pi\) and \(\sigma\):

  1. \(m(\pi) \ge 0\); and \(m(\pi) = 0\) iff \(\pi=e\).
  2. \(m(\pi\sigma) \le m(\pi) + m(\sigma).\)
  3. \(m(\pi^{-1}) = m(\pi).\)
  4. \(m(\pi\sigma) = m(\sigma\pi).\)

Proof \(\quad\) Start by noticing that \(m(\pi)\) is the least number of transpositions by which we can get from \(e\) to \(\pi\); ie, \(m(\pi)\) is the least integer for which we can write \(\pi=\prod_{i=1}^{m(\pi)}t_i\) for some transpositions \((t_i)_1^{m(\pi)}\).

Considering each of \(\pi\) and \(\sigma\) as a composition of this minimal number of transpositions, parts (i) and (ii) are straightforward.

Part (iii) follows since the cycle notation for \(\pi^{-1}\) is simply the reverse of the cycle notation for \(\pi\).

Part (iv) follows from observation 4. \(\square\)

In many ways, the magnitude of a permutation acts like a vector norm — the key difference is that we don’t have a meaningful interpretation of \(k\pi\) for any \(k\in\mathbb{R}\), so we don’t have the additional rule \(m(k\pi) = |k|\cdot m(\pi)\).

This makes it interesting to define a notion of distance between permutations:

Definition \(\quad\) Given permutations \(\pi\) and \(\sigma\), \[\mathsf{dist}(\pi, \sigma) := m(\pi^{-1}\sigma).\]

I’ll justify why this definition makes sense. Let \(x=\pi^{-1}\sigma\). Then \(\pi x=\sigma\), so \(\mathsf{dist}(\pi, \sigma)\) is measuring the fewest “hops” — transpositions — needed to get from \(\pi\) to \(\sigma\).

We’d like a good distance function to be symmetric, and luckily for us, this one is:

\[\mathsf{dist}(\sigma,\pi) = m(\sigma^{-1}\pi) = m(\pi^{-1}\sigma) = \mathsf{dist}(\pi, \sigma),\]

\[\begin{align*} \mathsf{dist}(\sigma,\pi) &= m(\sigma^{-1}\pi) = m(\pi^{-1}\sigma) \\ &= \mathsf{dist}(\pi, \sigma), \end{align*}\]

the middle equality following since \(m(\pi) = m(\pi^{-1})\) by observation 6. This symmetry addresses the apparent inelegance of the definition; indeed, the definition may appear to treat \(\pi\) and \(\sigma\) differently, but the end result does not depend on the order of these inputs to \(\mathsf{dist}()\).

To conclude this note, I’ll ask a fun and intriguing question whose answer I don’t yet know:

Question 2 \(\quad\) Is there a way to generalize \(S_n\) and \(m(\cdot)\) so that the product \(k\cdot\pi\) makes sense for some values of \(k\in\mathbb{R}\), and so that \(m(k\cdot\pi) = |k|\cdot m(\pi)\)?


Artin, M. 1991. Algebra. Prentice Hall.
Herstein, I. N. 1990. Abstract Algebra. Macmillan Pub.

You can get emails about new posts if you'd like: