thinking


thinking := [life, games, movies, philosophy, math, coding, pizza, &c.]

Thursday, April 30, 2009

there is no probability theory - only zuul!

The curve described by a simple molecule of air or vapor is regulated in a manner just as certain as the planetary orbits; the only difference between them is that which comes from our ignorance.

Probability is relative, in part to this ignorance, in part to our knowledge.


-Pierre-Simon de Laplace, in A Philosophical Essay on Probabilities

I've always been fascinated by the question of determinism. Is everything about the future already contained in the state of the world today?

Addressing this question, one has to come to terms with what it could mean for the world to be non-deterministic. Intuitively, it means that certain events are unpredictable - we can think of them as random. And here is where we have to confront the fact that any useful model of the world, even a probabilistic one, has to be formally deterministic.

Why? Because, in math, there is no random. When a mathematician or statistician studies random behavior, they consider functions whose input is thought of as random. And by thinking of an input as random, the intuition and practical applications follow.

For example, suppose you ask what the standard deviation is for choosing the number 5 a third of the time, and the number 17 the rest of the time. You can model this with a function f:[0,1]->{5,17} which maps [0,1/3) to 5 and [1/3,1] to 17. From here you can study the random behavior of this experiment to your heart's content. The point is that the function f is completely deterministic, and there is nothing here to give us a single example of an actually random number.
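
To make that concrete, here is a minimal sketch in Python (the function f is from the example above; the sampling check and sample size are my own illustration):

    import random

    def f(u):
        # Completely deterministic map from [0,1] to {5, 17}.
        return 5 if u < 1/3 else 17

    # All of the "randomness" lives in how we choose the input u. Fittingly,
    # random.random() is itself a deterministic pseudorandom generator.
    samples = [f(random.random()) for _ in range(100_000)]
    mean = sum(samples) / len(samples)
    variance = sum((x - mean) ** 2 for x in samples) / len(samples)
    print(mean, variance ** 0.5)  # about 13 and sqrt(32) ~ 5.66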

So when it comes to a probabilistic model of the universe, we would still have to use a deterministic function. To include the idea of randomness, we could add an extra input, which we think of as unknown or external to the world. For a moment, pretend that the world moves forward in discrete time units. Let x be the state of the world, y an unknown input, and the formula

x' = f(x,y)

gives the state of the world a moment later. The presence of y is the non-determinism.
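
As a toy sketch (the transition rule here is invented purely for illustration):

    def f(x, y):
        # Deterministic update: the same x and y always give the same x'.
        return (x + y) % 10  # hypothetical transition rule

    x = 0                      # the state of the world
    for y in [3, 1, 4, 1, 5]:  # unknown, external inputs
        x = f(x, y)            # x' = f(x, y)
    print(x)                   # all apparent randomness entered via y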

The thing that's easy to forget is that probability theory does not dictate the way the world will behave. Rather, it is nothing more than a way to do something with a balance of partial knowledge, and an awareness of our own ignorance.

Here's a quick thought experiment to help illustrate the difference:

Choose a number between 1 and 1000. I'll also choose a number in the same range. What's the probability that they're the same?

It's a trick question. They either are the same, or they're not. Of course, if you assume that we're both equally likely to choose any of the 1000 numbers, then you can say the odds are 1 in 1000. But the point is that this model is entirely in your mind. Since it's a one-off experiment, and we don't really even know that these numbers are chosen in any manner we could call random, there's no real basis for that probability model. Maybe I always choose 7 and you always choose 23, and the numbers were bound to be mismatched from the start. The point is that, while probability is incredibly useful, it is wise to keep in mind - as Laplace did - that it is a mitigation of the unknown.

One last thought experiment toward comprehending what a probabilistic model might mean:

Some physicists play with the idea of parallel universes, so let's do something like that. Suppose I toss a coin (heads/tails = H/T) several times in a row, and write down the outcomes in order. If each outcome is random, then we can model that in at least two different ways. One way is to assume that there is an external source of randomness, and that every result pulls in information from that random source - this would be analogous to the extra input y to the function f(x,y) above. Another way is to assume that every time a choice is to be made, the universe forks into two parallel versions: one in which the coin lands H, and another where it lands T. There is never a decision to be made, so no extra information is needed. Which model is better? Would there be a way to choose between these two models if we lived in that world?

There is basically no difference. Why not? Try this out: write down every H/T possibility for any number of coin tosses. For 3 tosses, you would get HHH, HHT, HTH, HTT, THH, THT, TTH, TTT. Most of those lists look random. In fact, you wouldn't be amazed at any of them. Do the same thing for 50 tosses in a row. 50 heads in a row would be surprising, but the number of lists besides that one is orders of magnitude larger than the total number of humans who have ever lived so far.
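
You can check the arithmetic directly; here's a quick Python computation (the range I use for "roughly balanced" is an arbitrary choice of mine):

    from math import comb

    n = 50
    total = 2 ** n  # about 1.1e15 lists, vs. roughly 1e11 humans ever born
    balanced = sum(comb(n, k) for k in range(20, 31))  # 20 to 30 heads
    print(total)
    print(balanced / total)  # about 0.88: most lists look unremarkable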

In other words, no matter what, most of the outcomes will look random. In fact, they have to, by any reasonable definition of looking random. This is simply because a probability is nothing more than the number of outcomes including a certain event divided by the total number of outcomes (implicitly assuming they are equally likely). So any very likely event, such as there being about as many tails as heads, will by definition include most of the H/T sequences you write down.

A random model looks the same as a parallel model.

So when I say there is no probability theory (only zuul), what I really mean is: probability theory is just the study of deterministic functions where you really don't know the input, but you pretend to know its distribution.

Tuesday, March 24, 2009

experimental integrity and the search for causality

The phrase "the scientific method" implies that there is some universal, automated process that investigators blindly follow in order to do science. In truth, a great deal of improvisation and creativity is required to do good science. Great leaps forward, such as general relativity or the complex (as in complex numbers) proof of the prime number theorem, often rely on bold, inspired insights into the nature of an unsolved problem.

However, there are a few common principles that unite the rational attitudes of modern research. I want to highlight a few that I feel are somewhat neglected. They are:
  • experimental candor,
  • easily reproducible experiments, and
  • induced correlation.

Experimental candor

Here's a nice way to get great results: suppose you think that drug A will help people lose weight. Conduct a thousand studies on small groups of test subjects. Suppose one of those studies shows good results - publish those good results, and throw away the rest.

This may sound a bit unrealistic, but something like this can happen much more easily in computer science. There, a growing class of algorithms is both probabilistic and approximate - very similar to experimental drugs in medicine. If they do pretty well most of the time, that's good enough. Yet with an algorithm, it's incredibly easy to run a million trials of your code and publish only the best subset of them. Even if the quality of your results is completely random, it's just a matter of time before some small subset of the test results looks good.
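
A quick simulation makes the danger vivid. Everything below is invented: a "drug" with zero real effect, and a publishable-looking cutoff I chose arbitrarily - yet a handful of impressive results still show up by chance:

    import random

    def null_study(n=30):
        # Average weight change in one small study of a useless drug:
        # each subject's change is pure noise.
        changes = [random.gauss(0, 1) for _ in range(n)]
        return sum(changes) / n

    results = [null_study() for _ in range(1000)]
    successes = [r for r in results if r < -0.5]  # "impressive" weight loss
    print(len(successes), "publishable-looking studies out of 1000")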

Hence the need for experimental candor. It's important to reveal all the relevant experiments performed, including the negative or inconclusive ones. The web is the perfect platform for this kind of data disclosure - you can pre-publish your intended experiments and hypotheses before you actually run the experiments. This way, good results look better, and other researchers won't waste time on previously failed experiments. Of course, it's always possible that an experiment failed because of unaccounted-for parameters (including human error), which is why experimental reproducibility is also crucial to good research.

Easily reproducible experiments

This scientific tenet is well agreed upon, but poorly executed. In practice, I know of very few experiments which can be easily reproduced at the research level. In some cases, one may wish to build upon the work of another, such as by augmenting a biochemical procedure with a new step. Articles involving experimental lab work do indeed contain careful procedural explanations meant just for this purpose, which is great. But in many cases, even this is not enough for other researchers - in my days as a grad student, I would see fellow students emailing or calling other investigators (often ones who were considered serious competitors) to ask for critical clarifications in procedure.

We can do better than that.

I'm going to pick on computer scientists for a moment, because they're the worst offenders. An algorithmic experiment has the most potential to be easily reproducible. Ironically, many papers leave out the very parameters needed to perform their experiments. In order to reproduce a certain graph of time complexity versus input size on a certain real-world dataset, for example, a reader will often have to code up the algorithm from vague pseudocode and hand-wavy explanations, guess at parameter values, and separately download the dataset. I've even seen code used that was nowhere available in either pseudocode or executable form - the reference given was a personal communication with another researcher (who won't answer my emails).

There is no excuse for this. Any good algorithmic experiment can be made reproducible at the click of a button. The experimenters have already written the code - it is simply a matter of linking to that code from a website. It would be friendly to add a little documentation; or better yet, to follow a standard pattern of operation for the field, in much the same way that some software installation procedures have become standardized.
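
Concretely, the published artifact could be as small as one script. Here's a hypothetical skeleton (the names, seed, and input sizes are illustrative, not from any particular paper):

    # reproduce.py - rerun the paper's timing experiment with one command
    import random
    import time

    SEED = 42                               # published seed, not a guess
    INPUT_SIZES = [1_000, 10_000, 100_000]  # exact values, not hand-waving

    def the_algorithm(data):
        # Stand-in for the paper's actual algorithm.
        return sorted(data)

    if __name__ == "__main__":
        random.seed(SEED)
        for n in INPUT_SIZES:
            data = [random.random() for _ in range(n)]
            start = time.perf_counter()
            the_algorithm(data)
            print(n, time.perf_counter() - start)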

Induced correlation

This point is a call for the conscious recognition of an idea that's been implicitly used for some time.

Certain experiments have the goal of looking for something like a causal relationship. If a drug company is testing a weight-loss drug, they want to know that their drug causes the weight loss, as opposed to it causing something else, or something else causing the weight loss.

Unfortunately, there's no foolproof way to experimentally test causality. This is a well-known problem. It's also interesting to note that, philosophically, causality itself is subjective in nature, although that is a matter for another post.

Here's the trouble: let's hypothesize that chemical X causes weight gain. As an experiment, we get a large group of people together. We randomly select some folks as the control - they won't change their diets - and randomly select others to change their diets to no longer include chemical X. We see the desired results: the control group gains a little weight on average, but the experimental group (no chemical X) actually loses some.

Does that mean anyone can prevent weight gain by avoiding chemical X? Absolutely not. Here is one possible explanation: Suppose that the vast majority of foods contain both chemicals X and Y together, or not at all. So when the experimental group avoided X, they were also avoiding Y without knowing it. Now you unleash your study on the world, and everyone starts avoiding X. But there are some foods with chemical Y in them, without X. It could happen that those foods become more popular, or that certain people subconsciously crave Y. In either case, we have people consuming Y, not X, and gaining weight.
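
Here is that failure mode as a simulation (the numbers and the X/Y coupling are invented for illustration):

    import random

    def weight_change(eats_y):
        # In this made-up world, chemical Y alone drives weight gain.
        return random.gauss(0.5 if eats_y else -0.3, 0.1)

    def avg(xs):
        return sum(xs) / len(xs)

    # The trial: avoiding X also (unknowingly) removes Y from the diet.
    control = [weight_change(eats_y=True) for _ in range(100)]
    no_x = [weight_change(eats_y=False) for _ in range(100)]
    print(avg(control), avg(no_x))  # X looks guilty

    # After the study: people avoid X but still eat Y-only foods.
    avoiders = [weight_change(eats_y=True) for _ in range(100)]
    print(avg(avoiders))            # they gain weight anyway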

Is there anything we can do to experimentally show something stronger than mere correlation? A little bit, yes - we can show induced correlation. This is a correlation between parameters observed specifically by turning the cause on or off in each trial, while purposefully leaving all other known parameters the same. Let's use the term natural correlation for experiments where the cause was present or absent without any control by the experimenters. Induced correlation gives stronger evidence of causality than natural correlation, since it suggests that we can control the effect by controlling the cause.

I think this general idea has been understood already, but I'm not sure that it has been explicitly recognized. My goal throughout this post has been to encourage the codification and emulation of a few good core principles of scientific investigation. There are definitely more key principles, although I've been reminded many times that at least these three could use a little more awareness and observation.

Friday, March 06, 2009

thoughts on junk DNA

It's interesting to think of DNA as the source code for life. A lot of ideas fall into place nicely with this analogy.

You need some sort of compiler or interpreter; this role is given to RNA. You need a basic set of atomic instructions, and something like labels to certain parts of the code base - pointers into memory. Codons are the instruction set, with start codons helping to act as labels. A central processing unit executes the commands - ribosomes turn the codon sequences into proteins, and the proteins interact to achieve various goals. Chemistry itself is the ultimate processor, but it takes more focused form in the complex interaction of the enzymes produced by the DNA. Some of the proteins act as inhibitors, decreasing the activity of enzymes; others are activators, doing the opposite. These constructed molecules are capable of effecting or halting the production of still other amino acid complexes. The end result is a logically sophisticated dance worthy of the millennia of evolution which produced it.

As I write code on my own, in an experimental fashion, I sometimes don't worry about the readability of the code. It is in this scenario that the evolution of source code best matches that of DNA. There is a small cost to having extra/old code, yes, but it is far outweighed by the raw functionality created.

Looking at some source which has grown up just a little bit, mostly unsupervised, offers a few suggestions about bits of information that may, at first glance, appear non-functional (a.k.a. junk DNA); a toy code sketch follows the list:
  • Old functions which are never or rarely ever called

    As code evolves, some functions become less useful, or are replaced by newer ones. It would make sense that some codon sequences would become obsolete, and that the encoding would remain in the DNA.

  • Literal strings and other initialization data

    There might be a bit of initialization data in DNA - information not obviously functional, yet still used. For example, some DNA may only be active for a very short time when an embryo is first developing, or triggered temporarily at certain key development stages. An even more interesting hypothesis is the possibility that some instincts, or primal knowledge, are somehow encoded in DNA, in a manner somewhat different than traditional protein transcription.

  • Debug code

    Debug code is useful for figuring out what part of a process has failed. Although there may not be a conscious debugger to check the output, we could still hypothesize that a little extra information about each step in a procedure could give enough information to locate and react to a failure or attack in the system. In this case, the usually non-functional code would be rarely and temporarily activated as a defensive mechanism.
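
Putting the three patterns together, a toy "genome" might look like this (purely illustrative):

    DEBUG_MODE = False                     # dormant defensive machinery

    def make_protein_v1(seq):
        # Obsolete: superseded by v2, but never deleted from the source.
        return seq.lower()

    EMBRYO_CONFIG = {"growth_rate": 2.0}   # initialization data, read once

    def make_protein_v2(seq):
        if DEBUG_MODE:
            # Rarely activated: extra output for locating failures.
            print("transcribing", seq)
        return seq.upper()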

Tuesday, February 24, 2009

top movies of 2008

There are two things that can make a movie worth seeing: it moves you or makes you think by reflecting on reality, or it entertains you by helping you escape reality.

If we are pained by a sympathetic situation that was experienced by someone in history, or alive today; or if we feel vicarious joy for a simple act of triumph (say, winning a spelling bee), the reality of the situation, symbolic or literal, is a key factor in our empathy. We are moved because this is the way life really is. On some level, we can relate to the plights and victories of these characters.

On the other hand, it's nice to tickle your imagination from time to time with an escape. We don't really believe Indiana Jones could easily be real, or that Spider-Man might one day exist. Nor do we expect a monster like Godzilla to ever attack a nearby city (after all, Tokyo is far away for most people). The entertainment here lies in a contrast to reality. Everyday lives are kind of boring. Monsters don't attack, nobody wins the lottery, most days you don't fall in true love for the first time, or find an alternate dimension, or save the world. But it can be cool to daydream.

The best movies work with these principles - they choose a side. Sometimes you can mix these two aspects, but you have to be careful about it. If a piece of a film is just-for-fun, there's no harm in bending - or even reversing - reality. But if you're trying to move your audience, trying to comment on a state of the human condition, you have to be more careful. Symbolism and admitted exaggeration can work, because we understand the reality being represented. But to toy with reality to suit the message of the film is to defeat your own purpose. For example, Syriana presents a terribly bleak and pessimistic view of political and industrial intrigue. We are led to believe that this situation could be real, but it felt like they were stretching a little too far; as a viewer I felt bereft of both entertainment and reality.

This is some background for my top movie picks of 2008. I think in ten years, these movies will still be worth watching, while a lot of other highly anticipated films from this year will be forgotten.

  1. Gran Torino

    Clint Eastwood knows film. He's been involved in about a half century of movie evolution, and I think he's kept the good parts of more traditional film alive here. The film says a lot without being overly symbolic, and the characters are visceral and quotable without feeling clichéd. This is a contemporary, somewhat realistic (even if parabolically so) film about redemption and opportunity. It's good because we leave the theatre feeling for the story, not worrying about its plausibility. There are no U-turns or magic revelations. Everyone is flawed or troubled, and nobody wins everything. Yet there are pieces of fun, of power, of thought, of sacrifice, and of compassion.

  2. Wall•e

    Only Pixar could seriously attempt this: Let's make a dialogue-free, post-apocalyptic love story between two robots caught in a conspiracy that might crush the threadbare hopes of the space-stranded remnants of humanity. For kids. And somehow it works. Wall•e is visually rewarding, touching, whimsical, nostalgic, and engrossing. Its sci-fi speculation is escapist entertainment first, and social commentary a distant second. It gets away with allegorical statements on the irresponsibility of humanity because the reality in it is not presented as the truth, but rather as a kind of cautionary fable.

  3. Iron Man

    There are two common superhero movie mistakes: they don't know how seriously to take themselves, and the heroes are often portrayed as everyday people who happen to have a heart of pure gold. This film tackles that second mistake - Tony Stark is neither your everyday guy, nor endowed with such heart. We like him because, unlike our super/spider/batmen, when we become iron man, we don't have to shoulder the great responsibility of great power, and we don't have to cower under a shroud of modesty. We can just do our thing and enjoy the moment. Somehow I find Tony Stark more realistic and more entertaining at the same time. Of course, this film is not about the human condition - to spell out the obvious, this is just for fun. And it succeeds.

  4. Be Kind Rewind

    See The Science of Sleep before you see Be Kind Rewind. Michel Gondry is a child with the ability to turn his daydreams into movies, and to really appreciate the world you've entered, it helps to speak the language. This one got a number of poor reviews because it's outside the realm of normal moviedom for casual viewers. Its unusual Gondryan style is cubism in crayon. And this is the subtle genius of it. When a critic is confused, they have to decide whether the movie is above them or below them, to avoid looking dumb. With Be Kind Rewind, the confusion is simply a different narrative medium - the film is just for fun, but seriously so. If you try to use Duchamp's fountain the way you're used to, you'll be missing the point.

    Enough defense. Be Kind Rewind is good because it's fun. The characters and the plight - the foundations - are tangible. Beyond this - the devices and exposition - there is not much pretension of reality. The key components are in place - what's real is what moves us, what's art is what makes us laugh.

  5. Wanted

    Like Iron Man, Wanted breaks the chains of the stereotypical hero movie. In this case, it really doesn't take itself too seriously. On top of this, the dramatic tension is very personal - Wesley (our protagonist) desperately wants to avenge the death of his father. Saving the world takes the backseat. It works because it doesn't really bother with the less entertaining aspects of the world - things like the rules of physics applied to bullet trajectories, or oracles more traditional than giant looms.

Honorable mentions: Cloverfield, Kung Fu Panda, Pineapple Express

Monday, January 26, 2009

Enter The Tangent Space (dot com)

Here's a new site for exploring cool math ideas: thetangentspace.com

My blog posts here have often alternated between technically detailed mathy or algorithmic thoughts, and more informal musings on life, the world, and interesting things in it. In my mind, there are really two audiences - one for the mathy stuff, and one for everything else. So it probably makes sense to have two places to put these different thoughts. From now on, my posts here will be less mathy, and I'll feel free to go math-crazy (or algorithm-crazy, as the case may be) on thetangentspace.

I'm trying a new type of blog with thetangentspace. It's about math research, and research is about communicating and collaborating. Even if it's a slow channel, it's an interactive process. So thetangentspace is both a blog and a wiki. The blog is meant as an easy stream of intuitive ideas - something you can keep up with, without investing too much thought. The wiki is where the details go - the full proofs and formal definitions. It's also a place for other mathematicians to make significant additions - beyond what you can leave in the comments of a blog - using the same software as Wikipedia. My hope is that some of the ideas and questions I post will inspire others to build on these initial offerings.

Check it out!
thetangentspace.com

Thursday, January 01, 2009

mathskool.com

I just launched the alpha version of mathskool.com.

This is a website I've been working on for the past month, meant to help connect great math teachers with motivated middle and high school students. The idea is to provide a centralized library that many math teachers can contribute to, and which gives students free access to short, focused videos. I imagine teachers recommending them as supplementary material to classes, or students searching for a single particular topic while stuck on their homework or studying for a test, or even curious people learning new things on their own.

I plan to continue adding features and videos to this site gradually over time. YouTube and various math-oriented sites already offer videos, but I think mathskool is unique in focusing on math education, being free, and encouraging a more interactive community with a nice question/answer system. For now I've included a few videos of my own, and several from other people.

Let me know if you know any math teachers who might be interested in using or contributing to the site. The next step is to start building a community of users - teachers and students.

Check it out! mathskool.com

Tuesday, December 30, 2008

the scaled interest principle

Here's an idea that I've seen in action throughout my life, although I've never seen it explicitly put into words:
Events of interest tend to happen more quickly at smaller scales, and more slowly at larger scales.
Interpreting relativity as putting a speed limit on the flow of information gives a natural justification of the principle in the physical world. The idea jumps out when you consider the (admittedly imperfect) analogy between atoms and solar systems.

We can also see it in other ways. Small companies usually react more quickly than big ones. Flies move more quickly, and die more quickly, than elephants or whales. Smaller computer programs often run faster than large ones. Things happen faster in dense cities than in a sparse countryside. An idea of little interest fades faster than a popular meme. A simple system is easier to work with than a complex one.