Halting Problem: Composability and Compositionality
Thursday, 18. June 2009, 23:50:46
The difference between these two concepts may be somewhat confusing. The primary difference is that composability deals with connecting language constructs together and compositionality deals with the relationships between data items (using the previously mentioned compositions).
I mentioned in the opening paragraph that it's possible to have a program that can be executed and be invalid at the same time. What does that mean? How is that possible? Consider a scenario where you can compose a program, but where the relationship between data items makes no sense at all. Usually, language constructs and formal systems are built to minimize this possibility. But it's almost impossible to avoid it completely. In the remainder of this article, we will look at exactly how it's possible to compose a program, yet have the relationship between data items be invalid.
To simplify a few compositional properties, we need to take a look at a simpler execution model. We will still use the common von Neumann architecture, but we'll only use three instructions (though two would suffice). For this machine, we will disregard addressing modes. To most people that have dealt with digital components and circuits, they know that you only need one kind of gate to be able to compute anything computable (at least until proven otherwise). Either the NAND gate or the NOR gate is sufficient. The rest involves how you connect the output of one NAND gate to the input of another. In a von Neumann architecture, control statements are the only other way to connect outputs to inputs aside from sequential execution.
So those are our two main instructions. One that deals with computations and one that deals with execution flow. The instruction that deals with computations is easy. We pick either NAND or NOR operation. We'll use NAND in this article. This operation AND's two inputs together and then negates the result. For the execution flow instruction, we simply need a conditional jump. We'll make up an instruction that jumps to a destination if a particular data items is true (non-zero). We'll call it JT. While JT requires two inputs, one is a data item and the other is a destination location to continue execution.
We said that our machine has three instructions. So far, we have two. The last instruction is an unconditional JUMP. It only has one input which is a destination location. We could have used a special version of JT, but I believe this keeps things clearer and more distinct.
- NAND
- JT
- JUMP
Those are our opcodes. This machine can compute anything that your PC can. Please keep in mind that a real machine would need many more constructs for moving data around to connect the outputs of one NAND operation to another (such as registers, memory, addressing modes, stack, etc.). That's why the simplest of von Neumann machines have at least 6 or 7 opcodes. Only one opcode is necessary for computations. The rest is used for accessing data or controlling execution flow.
Here, a few properties are starting to reveal themselves as far as compositionality is concerned. The NAND operation requires two inputs. So the relationship between the inputs and outputs is clear. What's more, we see that ANYTHING that requires an output MUST have a NAND or JT opcode unless it passes its input through unmodified (in which case the entity is superfluous and can be removed completely).
I cannot stress how important that property is. More specifically, if there is any computation to be done, then your program MUST have a NAND opcode. If it does not have a NAND opcode (or a JT opcode since that could be used to build a selector program), then your program may as well not exist. Unfortunately, programs that have no NAND and no JT opcodes can either halt or not halt. Luckily, these tend to be rather simple to analyze because without any data, the only opcode that can execute is the unconditional JUMP. Let's take a look at a few examples.
while(true);
The value of 'true' seems like there is raw data, but this is not so. An unconditional jump can do the same job by jumping to itself. So it's trivial to see that this will not halt.
return true;
Again, we have a value of 'true', but it is not used in any way. A calling function may use this value, but not here. The only operation here is again an unconditional jump to the calling function.
int f(Function g)
{
return g(f);
}
// execute the following
f(f);
Should not be too difficult to see that this is infinite recursion. But we can see more by using properties of compositionality. Without data, there is only execution.
Step 1: We invoke function f with parameter f.
Step 2: Function f invokes function f with parameter f.
By induction, we see that all future steps will involve invoking function f with parameter f. Invocations are unconditional jumps. And having proven that there will be an infinite number of them executed, we know this will not halt.
There is a variation on the last example we need to look at.
int f(Function g)
{
return g(f)+1;
}
// execute the following
f(f);
This example is different in that it has an addition. This is clearly something that would use a NAND opcode. Yet, that would be incorrect. The reason is that we've already proven that the invocations will never end. So no function will ever return to produce a value to be used with the addition. This means that all code after the invocation is dead code. It will never be executed. It's as if it doesn't exist. We could create an equivalent function without it and the execution would be identical to the previous example. Heck, we could even replace the 'return' statement with a goto.
The important point to remember is that even though there may be computations in the source code, they may be irrelevant if it can be shown that the execution path will never go there.
Just now, we have given meaning to source code (as it relates to those that only have unconditional jumps) toward their halt and not halt properties. We may now expand this meaning to programs that have other opcodes, especially when it comes to their relationship with their inputs.
Before going too far, we should first see if it's possible for the Oracle to only have unconditional jumps. The reason as to why we want to know this will be explained later, but it will have to do with the inputs to the Oracle.
If we only have unconditional jumps, then two and only two possibilities may occur. This applies to ALL programs that only have unconditional jumps.
1. Your program goes into an infinite loop (or recursion).
2. Your program returns a static value.
Because unconditional jumps have no data inputs, there is nothing to decide. Your program will either halt or not. And it's simple to make that determination since all you have to do is follow each jump until you return or jump to a location already visited. So unless you have a program of infinite length, this is always decidable.
Back to Oracle... is it possible that it only has unconditional jumps? The first possibility (infinite loop) is not allowed since the Oracle must halt according to its definition. Going into an infinite loop 100% of the time is a ridiculous concept for the Oracle. As to the other possibility (returning static value), it means that the Oracle would have to return the same result every time. We know that some programs halt and others don't, so that's no good either. Of course, the Oracle could simply pass through an input as is, but again, that is ridiculous because the Oracle would be equivalent to having nothing at all.
What this means is that it's impossible for the Oracle to be built completely with JUMP opcodes. It's not allowed. That leave two opcodes that both require raw data as input. In other words, the Oracle MUST act on its input and must have raw data available to it. Note that it can use unconditional jumps, but not exclusively. It must have at least one of the other two instructions. Please understand that this is a requirement based on the definition of the Oracle as stated by the Halting Problem.
We'll come back to the other properties of the Oracle later and see how this all fits together. For now, what would happen if your program used conditional jumps or NAND instructions but wasn't supplied with raw data? In our machine, whatever is in memory will be used. But when using abstract concepts such as functions, you can indeed compose programs where no actual value is given.
The difference between a function and a value is very real and has drastic consequences. A function is a relationship between inputs and outputs. As such, connecting outputs of one function to inputs of another, you achieve composability. But only when we have the data can we determine the true meaning of the overall program. It's why the Oracle needs a second input. This second input is the data given to the program P [as in Oracle(Program P, Input I)]. Without data, determining the meaning of all programs is impossible, especially as it relates to the halting property.
This is the deceiver program as commonly described in halting proofs.
Bool Deceiver(Function f)
{
if (f(Deceiver,f)==HALT)
while(true);
return true;
}
// Execute this
Oracle(Deceiver, Oracle);
This is an amazing piece of code that takes advantage of a compositional flaw while retaining composability. In doing so, the halting proof tries to wrongfully claim that there is a contradiction. There is no contraction. It's a simple compositional flaw in the program.
Here, we see that we only have two functions. We have the Deceiver and the Oracle. We've already determined that the Oracle has conditional jumps or NAND's opcodes. This means we know that the Oracle needs raw data from its inputs (since it can be easily established that not using its inputs would always give the same answer and this would contradict the definition of the Oracle).
Now let's look if we can determine the behaviour of the Deceiver without looking at its input. With our theoretical machine, we know that we simply have to determine whether or not the function in question uses unconditional jumps and nothing else. We see that the 'if' statement is a conditional jump requiring the JT instruction and the equality comparison would require at least a few NAND operations. Because we don't exclusively have unconditional jumps (and we can't reduce or eliminate any of the existing code), we know that we must look at the input to determine the execution property of the Deceiver.
What specifically are we looking for? The 'if' statement (JT opcode) takes as input the result of the comparison operation. And the comparison operation requires two inputs. One is given as HALT. The other is given by Oracle(Deceiver, Oracle) after parameter substitution. But this isn't a value. It's a function. What we're looking for is the value itself so that we can determine the relationship between the data items upon execution.
The problem we have here is that neither the Oracle nor the Deceiver program produces this value. We know that the Deceiver doesn't produce it since it invokes the Oracle. But the Oracle has already determined that the Deceiver needs raw data for its equality comparison to execute. As such, the input is required. But the input is the Oracle itself. We've already determined that the Oracle doesn't have the necessary raw data input. It's why we looked at the Deceiver's input in the first place. Since there is no raw data to operate on, it's impossible to ascertain the compositional properties of the program.
Let's go back to what I said at the top of the article.
[...] compositionality deals with the relationships between data items.
We've established that neither the Oracle nor the Deceiver function is ever given the RAW DATA required to produce a relationship between them. In other words, the Deceiver program uses a side effect of composability to produce a compositional flaw. The authors of the halting proofs have then taken this to be a contradiction when it is no such thing. It is simply what happens when you're trying to determine the relationship between two data items and you're only given one of them.
The real issue here is that functions can take as inputs either a raw value or another function. If you never supply the required raw values to computational operations by instead supplying functions, then there is no way to ascertain any meaning to the overall program.
What the halting proof is actually saying is that determining the meaning of a relationship between a data value and NOTHING is impossible. I hope I don't have to explain how ridiculous this is.
The real unfortunate side of all this is that far too many people get caught up with the fact that the Deceiver is not executed. At no point in this article did we ever come close to attempting to execute the Deceiver. There are many compositional properties that can be determined without ever executing anything. And it doesn't matter if you believe the source code to be raw data. It's been shown exactly what compositional properties can be ascertained from the source code alone. So it's not enough to say that the source is raw data. That's simply hand waving and avoiding the true nature of what the Halting Problem is asking.
The diagonalization proof is likewise faulty in that it uses the source code as raw data without regard to the compositional properties of that function. To be more precise, the proof itself is defining the Oracle in a way that is contrary to the definition stated in the Halting Problem. Since these definitions obviously conflict, it is said that there is a contradiction. Unfortunately, this must fail since the definition stated in the Halting Problem should be the only one allowed. The diagonalization proof (and the Deceiver one) are doing nothing other than presenting their own definition of the Oracle and showing how it differs from the one stated in the Halting Problem.
The Halting Problem is asking about a very specific compositional property between a program and its input. As such, if you use a composability trick to avoid giving the the program its required raw data to complete any internal relationships, then you are not fulfilling the Halting Problem's requirement that the program be given its input. There's a reason it's needed. It's called compositionality. Without data, the program becomes meaningless. You could delete the program and you would have an equivalent program. Therefore, the halting proofs (by deceiver program and by diagonalization) are likewise meaningless. This is also further reason why the diagonalization proof fails. We've shown how raw data is required by the Oracle (used by program P) in some cases. If both inputs are functions, then there is no raw data. And we know that the diagonalization proof works by saying that there is an entry where both inputs end up being a custom function. This ironically invalidates the proof. Also, whoever thought up the diagonalization proof doesn't understand that the representation of a function is not the function itself. It's called a REPRESENTATION for a reason. Think of it as an agent or an ambassador. What is true of the agent is not true of the principal. So any conclusions on the representation (source) are not necessarily valid for the principal (program).
In conclusion, the Halting Problem requires an input for a reason. The Halting Problem is asking about a compositional property between a program and its input. Does the relationship between a program and its input cause it to halt or no? And because we know that computations require RAW values, they must be provided in this case. If not, then any and all conclusions are meaningless. Finally, it's impossible to use an algorithm on a function. That is an absurd concept. You can only operate on its representation. But that is not what the Halting Problem is interested in.
(note: In case anyone is wondering, I'm not posting often because I'm EXTREMELY busy on finishing up other projects I'd started a while back. They are not computer related, but this will be VERY good for everyone involved and will allow me to be much more productive in the future, especially with Project V. I'm going to be busy for at least another month. Things are moving rather rapidly and am really enjoying the process.)


How to use Quote function: