Chaotic Extrapolation

by Tom Temple

31 May 2005

I was sitting in on a lecture by the esteemed professor “Phan”: who was talking about non-linear function approximation. In particular he was covering Nueral Nets and their applications. He made a point which, while silly, I found very exciting. I’ll try to explain why.

Nueral Nets

First let me quickly describe the math version of the nueron. A math nueron takes a linear function of its inputs and then passes that through a non-linear function. The non-linear function is predefined, e.g. tanh.

The exciting thing about a (math) nueron is that it is to some extent modeled after a brain neuron. The brain neuron recieves inputs from other neurons on dendrites. These inputs are “gated” by electrical and chemical regulators in a way that could conceivably be represented as a scaling. Then something happens in the middle of the neuron. The first bit, the linear function, summing the weighted inputs is probably not terribly accurate. But then we talk about neurons “firing.” I can do “firing” that with predefined non-linear basis functions no problem. So that part could conceivably be well modeled as well. Then it sends a signal down the axon to some other neurons.

I say that it is at least a workable model for the brain, considering that it has been proven that 1) a (math) neural net can be arbitrarily descriptive 2) by putting loops into the net, we can have memory.

Then also think about the fact that if we were to simulate the brain, it would take some 10^14 (yes thats one hundred trillion) parameters to describe it. Thats a large number but I might get to see computers with memory that big before I die; maybe even computers that could do computations with such a network.

Learning Chaos

The interesting thing about a neural network is that it can be “trained” to approximate any non-linear function. “Training” merely consists of a mathematically systematic way of tuning all of the parameters so there is an optimal match (or perhaps just locally).

Now we can train the network to model something unpredictably non-linear or chaotic. Then the network would be able to reproduce the signal, but only over the input range. If we were to try to extrapolate beyond the input, there would be unpredictable results.

At this point, Phan (jokingly) brought up the question of creativity. What if I trained the network to reproduce the first half of a song by Bach and then asked it how the rest of the song went? I would certainly get something. Under the right circumstances, I might even get something that sounds like music. I might even get something that sounds like Bach! Of course, I am the first person to dismiss the phenomenon of “emergent behavior” (it is just another way of calling a “bug” a “feature”). But if I were to design a program that could compose “Variations on Bach,” that would certainly pull some more people into Turing’s computability camp.

Then I remembered how Brayton would often start whistling a bar of some classic rock balad, and then go off into a noodle-land that still remained vaguely coherent. Maybe that’s chaotic extrapolation. I doubt it though. He does the same thing in Karaoki.

Comments:

  • Michael
    Jun 1, 07:52 AM

    Tom wrote: ”[E]mergent behavior” [...] is just another way of calling a “bug” a “feature”

    Tell that to the Tacoma Narrows bridge (and all the other suspension bridges that tore themselves apart in similar fashion in the half-century prior to that particular disaster). I’ll grant you that to an engineer, emergent behaviour is usually something to be suppressed (or at least quantized out of the problem), but there’s no reason whatsoever you can’t take advantage of it, if you can find a reasonable way to model it.

    And as far as extrapolation as a model for creativity, even a Markov chain can do a pretty good job of simulating musical invention, as long as you don’t want it to go too far afield. I wrote a computer program to improvise Jazz solos based on this idea, maybe seventeen years ago. There wasn’t much genius in it, but it could do a pretty decent eight-bar blues, if you let it run overnight.

  • Michael
    Jun 1, 07:56 AM

    Oh, and speaking of Turing’s “camp,” you might find this article written by my advisor to be interesting.

  • Tom
    Jun 1, 09:24 AM

    No shit! 17 years ago? How old were you? 17 years ago all I could do was write Madlibs in GwBASIC on a 186. I tried writing a sidescroller (a la Chopper Command on the 2600) but I didn’t get much further than drawing the helicopter.

    100 A$ = “l8r16l8d2…”
    110 REM JUST LIKE MY ETCHASKETCH
    ...
    1000 DRAW A$

    Would that jazz code be proprietary? Is it in FORTRAN or something incomprehensible? Can I try it?

    In class I suggested that a Markov model could do the same thing and was hoping Phan would illuminate the differences. Phan didn’t seem very interested in MMs. It doesn’t make sense to me how disjoint machine-learning and AI are.

  • Tom
    Jun 1, 09:39 AM

    I had a great musical intro also

    10 A$ = “e1d1c1d1e1e1e1p1…”
    20 PLAY A$

    Man that Kemeny was great, huh! How long until Wright’s term is over?

  • joran
    Jun 1, 11:19 AM

    How will we know when computers can think? I’m not even sure when the humans I see everyday are thinking.

  • Jon Shea
    Jun 1, 12:29 PM

    When there’s doubt as to whether humans are thinking or not, I usually guess that they are not.

    Whatever method you use to make your determination on humans, I suggest that you use the same on computers.

  • Michael
    Jun 1, 12:51 PM

    Junior year in high school, that would have been 1987-1988, so that’s about right. It was written in BASIC for the Apple //e, which was the only computer I had regular access to at that point (actually, the school had an Apple //gs, but I could only use that for a couple hours a day).

    I’m pretty sure I still have the disk containing the code, if we can find a way to read the code off there, I wrote a BASIC to C compiler that we can probably use to make it runnable on something more modern. Anybody got a 5.25” floppy drive compatible with late-80’s vintage double-sided ProDOS floppies?

    Mind you, at the time I didn’t realize I was using a Markov model; I just had a table of probabilities painstakingly constructed from the pitch and duration of a dozen or so blues solos I transcribed off of tapes. But, in essence, I was using a third-order model, with some tweaks. Back then, I relied a lot on brute force, and not so much on knowing what the hell I was doing. Actually, now that I mention it, not much has changed.

    Jon wrote: “Whatever method you use to make your determination on humans, I suggest that you use the same on computers.”

    Yup, that’s pretty much Turing’s idea, too. His method is weird, though.

  • joran
    Jun 1, 01:13 PM

    Having immersed myself once in the “Is the brain a Turing machine?” issue (with philosophy students, unfortunately) I’ve decided to avoid dividing the world into only two camps on the question.

    I don’t think the question has an answer. I suspect it is one of those non-provable/disprovable statements that Godel made famous.

    I think the number of things that computers can do that previously only humans could do will continue to increase.

    I also think there will always be something humans can do that computers can’t.

    This does not preclude computers at some point “thinking”. I just think they can’t do everything (halting problem, anyone?) and some of these things will be (relatively) easy for humans.

    In my phil class, this position put me squarely in the Wittgenstein camp (so I was told) and also in a rather significant minority.

  • Jon Shea
    Jun 1, 02:55 PM

    Halting Problem? I’m tempted to say I don’t believe humans can, in general, figure that one out either.

    Michael, yeah, I think Turing wiffed a little in the parameters for his eponymous test. We could have left it a little under defined. If the computer is doing well enough to be a gray-area case, well, then I think it won.

  • joran
    Jun 1, 03:25 PM

    Jon –

    My example of the Halting Problem was meant to illustrate the fact that there are theoretical limits to Turing machines, and that in at least some cases, these limitations don’t exist for (all) humans.

    The fact that there isn’t a single human that can correctly identify every nonterminating function doesn’t mean that “humans can’t solve the halting problem”.

    The issue is what we mean by what something can “do”. If computers can “do” something, we mean one computer could do it.

    Human activity is, I think, not so simple. Let’s say some guy runs a 3 minute mile. Only he does it, and he only does it once. And then he dies.

    Is running a 3 minute mile now something that “humans can do”? I say yes.

    I guess the problem is that there is no Human Capability Theory, analogous to Computability Theory for computers.

  • Jon Shea
    Jun 1, 03:53 PM

    I mean, check me if I’m wrong, but by that criteria Joran, can’t I write a computer program that is able to determine if a particular subset of programs terminates or not? Is solving the halting problem (for those special cases) no something that “computers can do”?

  • Tom Temple
    Jun 1, 06:51 PM

    The halting problem is that you can’t write an algorithm that will tell whether any program will halt. I have actually seen a program that tell whether or not a program will halt or not. It works pretty much all the time on real programs. It works better than a human at telling. That’s why it costs like $50000.

    On the philosophical note, has anyone else watched “Ghost in the Shell”?

  • joran
    Jun 2, 07:49 AM

    Third time is the charm.

    I haven’t been expressing myself very well.

    I didn’t mean to get into the nitty gritty details of the halting problem.

    What I mean is this: we have a pretty good theory on what computers can and can not do. And this theory implies at least some limits on computers.

    We don’t have a very good theory for what humans can do. We know what they have done, but we have no (good) method for predicting theoretical limits to human thought. I’m not even sure if such a theory would really be possible.

    Given this state of things, I would bet (and I am a betting man) there will always be something that humans can do that computers can’t.

    That’s all.

  • Tom
    Jun 2, 08:48 AM

    On a slightly different note, quantum computers potentially change the question here. There are certain problems (among them NPC) that people, nor current computers, can’t solve tractably. But people are talking about computers that can solve these buggers. That implies that quantum computers will be able to do something that people can’t. It isn’t can’t in a quantitative sense, like I can’t count to a billion in under three seconds. It is a more fundamental difference. Unless I’m mistaken, a quantum computer is not a turing machine.

    Data on TNG—totaly going to happen. Like I said, I wouldn’t be surprised if my grandchildren have computers that rival the brain. By then there could also be a very solid model for what goes on in the brain. I see no reason why a near-perfect brain emulator couldn’t be built. After a few tries I’m sure that emulator would be able to “think” every bit as well as a human after a near-identical fashion. Now your interesting question is whether that emulator will require more capability than a Turing machine. Frankly, I doubt it will. Otherwise, I think people would be able to solve NPC problems faster.

    What will be far more exciting will be the brain implants. Too bad I’m probably going to miss out on that. Even if I don’t, I will be too old to learn to use one.

  • Michael
    Jun 4, 05:24 AM

    When talking about the Halting Problem and other undecidable computational problems, it’s important to avoid falling into the following subtle trap. The argument that there is no algorithm to solve (say) the Halting Problem is based upon an unprovable assertion of the Church-Turing thesis, namely, that any algorithm a human can come up with could be encoded as a Turing Machine program.*

    The key here is the word “algorithm,” for which there is no hard and fast definition. In brief, an algorithm is a finite recipe composed of well-defined discrete operations you could build a machine to do for you (you know this already, but I want to be clear about my definitions).

    We like to say “algorithm = Turing Machine” because so far, nobody has ever come up with something we’d agree is an algorithm, but which couldn’t be translated into a TM.

    All we can prove logically about the Halting Problem is that no Turing Machine can answer it. Only if you accept that every algorithm is expressible as a Turing Machine may you extend this result to imply that the Halting Problem has no algorithm at all. The “subtle trap” is to take the Church-Turing thesis as an implicit truth about the world; there’s no reason it is necessarily true, and, like the consistency of ZFC, unprovable.

    But the Church-Turing thesis is not arbitrary, either, given the following facts:

    1. Nobody has ever come up with an algorithm we couldn’t translate into a TM.

    2. Nobody has ever come up with a model of computation more powerful than a TM, in that there was some algorithm it could execute that a TM couldn’t.

    Humans can often solve problems using operations that are not currently seen as “effectively computable,” like intuition. Maybe someday we will figure out how to model intuition computationally, and they’ll add an INTUIT instruction to the CPU, but you and I and everyone we know will be safely and comfortably dead long before that happens.

    It’s quite true that there are algorithms that can tell whether a large and interesting class of programs will halt. In fact, in ML, a non-terminating program won’t even compile (because it won’t typecheck). But, that comes at the cost of some loss of power; there are algorithms a TM can do that an ML program cannot. For most purposes, we don’t care, however.

    *Or some other computationally equivalent formalism, such as a URM program or a lambda expression.

  • Tom Temple
    Jun 4, 06:31 AM

    Thanks Michael. That was very clear. Now that you are a “candidate”, might you get to teach 49? or 68?

    You want INTUIT to be a CPU instruction?! Is that just hedging your bet on the “long dead” estimate?

    About the ML typechecker, did you hang out with Chris Hawblitzel? He loved that thing. He wrote one for C and maybe even for a slightly restricted C++ also.

    That lamba expressions are equivalent to TMs is really impressive to me. I am pretty comfortable with the idea of building a computer out of a TM. Doing the same in lambda calc is a terrifying proposition.

  • Michael
    Jun 4, 08:56 PM

    Alas, no, I will only get to teach the same stuff I’ve been doing all along: 18, 4, and 23. I would love to teach 49 (now called 39) or 68, but that will be for after I graduate and go to some other school, where they call them something else. :-)

    Yes, I know Chris, and I’m quite familiar with his love of the Hindley-Milner type system, as well as his work on Clay—a nice idea, but one which turned into a crawling horror, from a syntactic perspective. Somewhat amazingly, Lea Wittie was able to write device drivers in it, though. You have to be at least a little impressed by type-safe device drivers. Yeah.

    Re: INTUIT—am I just hedging my bets? Naw. We’re not even close to understanding human language well enough to build a computer that can simulate it effectively, even if we did have the raw power. Back in the 50’s and 60’s, they thought they had the machine translation problem wrapped up. Back then, they just didn’t have enough computes or memory to do it. Today, we nearly do—a few more iterations of Moore’s Law, and we’re there. But we have no damned clue how.

    I used to feel as you do about the lambda calculus; but nowadays, I find it much more natural than a Turing Machine. If you really need a formalism that feels more like a computer, use a URM.

  • Tom Temple
    Jun 5, 05:26 AM

    Reading English is a problem that really sticks out. We had a couple papers in my last class on it. It was kinda shocking how little headway has been made. What people are doing now (they are still calling them HMMs but they have kinda evolved into Dynamic Bayes Nets or maybe what I would call Probabalistic Finite State Machines.) isn’t very good. They think it is because they can only use a second or third order model. I think it is because they are doing it completely different than people do.

    I think more should be done using parts of speech less should be done with word adjacency. Remember when you learned to parse a sentence? There was a somebody and he or she did something (potentially to someone else). But then consider the sentence “Then he did it.” Who did what? Clearly we need to come into that sentence with a prior(s) over subjects/actions. We also need some notion of time and space for an event. But then Jon would try to break it with something like “Unrestraint begets calamity” just like when we play Telephone Pictionary.

    There is a lot of faculty and not a lot of courses. How do they decide who gets to teach what? Do the most senior pirates get to pick first, or is there a panel that tries to be smart about it?

    In either case, it seems odd that you would get to play either 4 or 18 (while 23 could definitely be the dregs from the pirate scheme). I imagine they would be some of the most fun. I also would think that a committee would put one of their starters (Drysdale, Jayanti, Cormen etc.) in those clutch positions. I bet you can teach every bit as well as they can, but I’m not the one you would need to heckle with “Put me in, coach.”

  • Jon Shea
    Jun 5, 09:37 AM

    If someone proved that it would be impossible for a Turing machine to prove P vs NP, then should we stop trying? (Joran?)

    Tom, when you read Latin all you get is part of speech. There’s almost no information to be had in word adjacency. Either humans who read Latin use entirely different parsing schemes than humans who read English, or “they” are wrong.

  • Michael
    Jun 12, 12:41 PM

    Catching up on a few old ones here…

    Tom asked: “There is a lot of faculty and not a lot of courses. How do they decide who gets to teach what? Do the most senior pirates get to pick first, or is there a panel that tries to be smart about it?”

    There are a lot of factors—seniority, sabbatical, and areas of interest seem to play the biggest roles in the decision. I believe the faculty discuss it in a meeting, but the Chair has the ultimate say over what the teaching schedule will be. They seem to make some effort to keep the schedule the same from year to year, when possible, once it’s been found to work—but there are nearly always some additional factors at work.

    Jon wrote: “Tom, when you read Latin all you get is part of speech. There’s almost no information to be had in word adjacency.”

    That’s especially true of classical-period Latin poetry. On the other hand, much of the later Latin text we have—liturgical texts and other records relating to the Catholic Church in Europe—subsumes a great deal of syntax from the matrix language (e.g., French, German), and thus has more information than usual encoded in word order.

    On the other hand, if you look at inflection as being primarily a lexical issue, you can ignore a lot of the syntactic overhead you have to handle for highly positional languages like English, once you have figured out how to read the morphology. That isn’t trivial, though, unless you’re working with heavily normalized text.

Comment: