### there is no probability theory - only zuul!

> The curve described by a simple molecule of air or vapor is regulated in a manner just as certain as the planetary orbits; the only difference between them is that which comes from our ignorance.
>
> Probability is relative, in part to this ignorance, in part to our knowledge.

- Pierre-Simon de Laplace, in *A Philosophical Essay on Probabilities*

I've always been fascinated by the question of determinism. Is everything about the future already contained in the state of the world today?

To address this question, one has to come to terms with what it could mean for the world to be non-deterministic. Intuitively, it means that certain events are not predictable - we can think of them as random. And here is where we have to confront the fact that any useful model of the world, even a probabilistic one, has to be formally deterministic.

Why? Because, in math, there is no random. When a mathematician or statistician studies random behavior, they consider functions whose input is thought of as random. And by thinking of an input as random, the intuition and practical applications follow.

For example, suppose you ask what the standard deviation is for choosing the number 5 a third of the time, and the number 17 the rest of the time. You can model this with a function f:[0,1]->{5,17} which maps [0,1/3) to 5 and [1/3,1] to 17. From here you can study the random behavior of this experiment to your heart's content. The point is that the function f is completely deterministic, and there is nothing here to give us a single example of an actually random number.

So when it comes to a probabilistic model of the universe, we would still have to use a deterministic function. To include the idea of randomness, we could add an extra input, which we think of as unknown or external to the world. For a moment, pretend that the world moves forward in discrete time units. Let x be the state of the world, y an unknown input, and the formula

x' = f(x,y)

gives the state of the world a moment later. The presence of y is the non-determinism.
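A toy sketch of this update rule, with an arbitrary made-up dynamics function (the `2 * x + y` is purely illustrative): the point is only that once the sequence of external inputs y is fixed, the trajectory of the world state is fully determined.

```python
# A minimal sketch of x' = f(x, y): deterministic dynamics, with all
# "randomness" entering through the external input y.
def step(x, y):
    # Hypothetical toy dynamics; any deterministic function would do here.
    return 2 * x + y

x = 1
for y in [0, 1, 0]:  # a sequence of unknown external inputs
    x = step(x, y)

print(x)  # 10 - fully determined once the y's are fixed
```

Rerunning this with the same inputs always gives the same trajectory; the "non-determinism" lives entirely in our not knowing the y's in advance.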

The thing that's easy to forget is that probability theory does not dictate the way the world will behave. Rather, it is nothing more than a way to do something with a balance of partial knowledge, and an awareness of our own ignorance.

Here's a quick thought experiment to help illustrate the difference:

Choose a number between 1 and 1000. I'll also choose a number in the same range. What's the probability that they're the same?

It's a trick question. They either are the same, or they're not. Of course, if you assume that we're both equally likely to choose any of the 1000 numbers, then you can say the odds are 1 in 1000. But the point is that this model is entirely in your mind. Since it's a one-off experiment, and we don't really even know that these numbers are chosen in any manner we could call random, there's no real basis for that probability model. Maybe I always choose 7 and you always choose 23, and the numbers were bound to be mismatched from the start. The point is that, while probability is incredibly useful, it is wise to keep in mind - as Laplace did - that it is a mitigation of the unknown.
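A small simulation makes the gap between model and reality concrete. The 1-in-1000 figure is a consequence of the uniform-choice assumption, which is baked into the code below as an assumption, not a fact about how anyone actually picks numbers:

```python
import random

# Under the uniform model, estimate the match probability empirically.
trials = 100_000
matches = sum(
    random.randrange(1, 1001) == random.randrange(1, 1001)
    for _ in range(trials)
)
print(matches / trials)  # close to 0.001, because we assumed uniformity

# But if I always choose 7 and you always choose 23, there was never
# any probability in play at all:
print(7 == 23)  # False, deterministically
```

The simulation confirms the model's own prediction, nothing more; it says nothing about the one-off experiment between two real people.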

One last thought experiment toward comprehending what a probabilistic model might mean:

Some physicists play with the idea of parallel universes, so let's do something like that. Suppose I toss a coin (heads/tails = H/T) several times in a row, and write down the outcomes in order. If each outcome is random, then we can model that in at least two different ways. One way is to assume that there is an external source of randomness, and that every result pulls in information from that random source - this would be analogous to the extra input to the function f(x,y) as above. Another way is to assume that every time a choice is to be made, the universe forks into two parallel versions: one in which the coin lands H, and another where it lands T. There is never a decision to be made, so no extra information is needed. Which model is better? Is there a way to choose between these two models, if we were in that world?

There is basically no difference. Why not? Try this out: write down every H/T possibility for any number of coin tosses. For 3 tosses, you would get HHH, HHT, HTH, HTT, THH, THT, TTH, TTT. Most of those lists look random. In fact, you wouldn't be amazed at any of them. Do the same thing for 50 tosses in a row. 50 heads in a row would be surprising, but the number of lists besides that one is orders of magnitude larger than the total number of humans who have ever lived so far.

In other words, no matter what, most of the outcomes will look random. In fact, they have to, by any reasonable definition of looking random. This is simply because a probability is nothing more than the number of outcomes including a certain event divided by the total number of outcomes (implicitly assuming they are equally likely). So any very likely event, such as there being about as many tails as heads, will by definition include most of the H/T sequences you write down.
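The counting argument can be checked directly. In the sketch below, "about as many tails as heads" is interpreted, just for illustration, as between 20 and 30 heads out of 50; the exact cutoff is an arbitrary choice, but any reasonable one gives the same picture:

```python
from itertools import product
from math import comb

# All 2^3 outcomes for 3 tosses, in the order listed in the text.
outcomes = [''.join(seq) for seq in product('HT', repeat=3)]
print(outcomes)  # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']

# For 50 tosses, count the sequences with "about half" heads (20..30 here),
# by summing binomial coefficients rather than enumerating 2^50 lists.
n = 50
near_balanced = sum(comb(n, k) for k in range(20, 31))
total = 2 ** n
print(near_balanced / total)  # the large majority of all sequences
```

The ratio comes out to roughly 0.88: the near-balanced sequences are not special, they are simply most of the list.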

A random model looks the same as a parallel model.

So when I say there is no probability theory (only zuul), what I really mean is: probability theory is just the study of deterministic functions where you really don't know the input, but you pretend to know its distribution.
