Home/Blog/Programming/Quines: Self-replicating programs

Quines: Self-replicating programs

11 min read

May 12, 2023

content

Can a machine create a copy of itself?

Programs that can print their own code

First attempts

A better attempt

Countering the infinite regress

The final attempt

The complete code

An implementation in JavaScript

Quines

Computing with self-replicating programs

Kleene’s recursion theorem

Consequences

Unsolvability of the halting problem

Proof of the unsolvability of the halting problem

Fixed-point theorem

Proof of our fixed-point theorem

Other fixed-point theorems

Become a Software Engineer in Months, Not Years

From your first line of code, to your first day on the job — Educative has you covered. Join 2M+ developers learning in-demand programming skills.

Surely, one may think, that a machine that can create another machine has to be more complicated than the created machine. A part can’t be of the same complexity as the whole, right?

Not necessarily. All that implies is that a machine cannot contain a copy of itself. It says nothing about a machine “creating” or “generating” a copy of itself. Instead of worrying about the general problem, we’ll demonstrate this point by focussing on computer programs, for which it’s certainly possible!

Programs that can print their own code#

Let’s first try to understand the task. We’re not asking you to read the source file of the code and print it. The program should be able to print its own code using just the print statements (or its equivalents in your favorite language).

Now the problem looks more interesting!

First attempts#

When we start to look at this problem, it becomes apparent that we can’t just put the entire code of the program inside a print statement. This is because the code that needs to be printed would then include this print statement as well. If we try to include it in the print statement, then that print statement would also have to be printed. This pattern will continue endlessly. Clearly, this approach cannot work.

A better attempt#

Let’s try to circumvent this by trying out different code organizations, say, by dividing the program into two functions: function A and function B. The function A will be responsible to print the code of the function B, and function B will be responsible to print the code of the function A. We can then run the function B first to print the function A, and then run the function A to print the function B.

Writing A is simple enough because we can just copy the code of B—whatever it may be—and simply paste it into A.

function A() {
    print <code of function B here>
}

As long as we have the code of function B available, this can be done.

Now, how should we go about writing the function B? One might be tempted to write it similarly, as follows:

function B() {
    print <code of function A here>
}

But this won’t work. The reason is the same infinite regress we encountered earlier.

Since function A contains the code of function B, function B can’t contain the code of function A because that’d imply function B contains its own code. And this—as we’ve already realized—is impossible. So what should we do?

Countering the infinite regress#

If it’s possible at all for a program to print its own code, it’d have to compute or generate at least some part of its code. Any other attempt will run into some sort of infinite regress. There’s no way around it.

We’ll now try doing just that.

The final attempt#

As our final attempt, let’s write function A in the same way but let’s write function B so that, instead of containing a copy of function A, it computes the code of function A.

To obtain the code of function A, all we need is the code of function B. If we have access to it, we know how to generate the code of A. We’d simply add function A() { and a print statement before this code, and append a closing } at the end.

function B(code_of_B) {
    print <function A() {
               print <code_of_B>
           }>
}

As long as we have the code of function B, we can generate the code of A computationally. But how can we obtain the code of function B without running into the infinite regress we encountered in all of our earlier attempts?

We can obtain the code of function B by running function A, of course!

Function A is responsible for just one task: printing the code of function B. So if we run A first and pass its output as input to function B, we can generate the code of function A!

The function A is assisting in generating its own code. How cool is that!

The complete code#

To give the complete structure of the code, we’d have to worry about exactly how and where to run the two functions to obtain the code. But those are just details that can be easily taken care of. We’ve made some minor adjustments to the two functions. Function A is now responsible for providing the code of everything except itself, and function B is responsible for providing the code of function A.

With these adjustments, the program would have the following structure:

function A() {
    print <function B(code_of_B_and_the_rest) {
                code_of_A =  <function A() {
                                  return <code_of_B_and_the_rest>
                              }>
                return <code_of_A>
           }
           
           code_of_B_and_the_rest = A()
           code_of_A = B(code_of_B_and_the_rest)
           print code_of_A
           print code_of_B_and_the_rest>
}

function B(code_of_B_and_the_rest) {
    code_of_A =  <function A() {
                      return <code_of_B_and_the_rest>
                  }>
    return <code_of_A>
}

code_of_B_and_the_rest = A()
code_of_A = B(code_of_B_and_the_rest)
print code_of_A
print code_of_B_and_the_rest

Note that, as we observed earlier, all code starting with function B needs to be duplicated inside function A.

An implementation in JavaScript#

Now, in an actual implementation in some language, we’d need to carefully print string delimiters as well, even when they appear inside other string delimiters.

For example, see the following implementation in JavaScript in which we’ve taken care of this problem:

Take care to copy any additional code you may write and paste it inside function A as well. This is because function A is responsible for generating the entire file (except itself).

In lines 26–32, we define the factorial function in the usual way, except we don’t use the built-in recursion of the language. Instead, we evaluate the entire code of the file appended with the factorial(n-1) call. This would non-recursively evaluate the factorial function from the obtained program code on n-1!

This can come in handy in languages that do not support recursion natively.

Kleene’s recursion theorem#

Though we haven’t done the formal proof of this theorem here, we’ve looked at some examples to see that this can be done. The proof is a generalization of the techniques that are apparent in our examples.

Consequences#

It’s time to explore some of the consequences of the recursion theorem.

In addition to its obvious applications in enabling a computer virus to self-replicate, the recursion theorem helps in obtaining—often constructive—proofs of some important theorems. Let’s look at two of these below.

Unsolvability of the halting problem#

It’s well-known that the halting problem (i.e., the problem of determining whether a given program will halt on a given input) is unsolvable algorithmically. The proof makes use of a tricky diagonalization technique initially developed by Georg Cantor and has been used influentially since then in mathematics and computer science. We can come up with a simpler proof of the unsolvability of the halting problem using the recursion theorem.

Proof of the unsolvability of the halting problem#

We’ll show that if we assumed that the halting problem is solvable by a program H, we’d arrive at a contradiction (something that we know to be false or impossible). We can do so simply by creating an impossible machine P as follows:

function P(w) {
    1. Obtain the code of P via the recursion theorem
    2. Run H on the program P and and input w. 
    3. If H says P will halt on w, go into an infinite loop. 
    4. If H says P will not halt on w, simply halt.
}

Now it’s clear that P will halt on w if and only if P will not halt on w, which is clearly impossible. Hence, H cannot exist! QED.

Fixed-point theorem#

We’ll end this blog by stating and proving a very implausible result. Suppose you have a transformer $T$ of computer programs. That is, you have a computer program $T$ that takes in a program $P$ and outputs another program $T(P)$ after performing some arbitrary changes in the code of $P$ . In case the transformation $T$ introduces syntax errors, the convention is to regard $T(P)$ as a program that just returns false on all inputs. The fixed-point theorem states that no matter how arbitrary such a transformer is, there’ll always be a program whose behavior remains unchanged by this transformer. Sure, the code will change, but both $P$ and $T(P)$ will be equivalent, in the sense that they’d both give the same outputs on the same inputs.

Proof of our fixed-point theorem#

The proof is constructive. That is, we’ll describe a program $F$ whose behavior remains unchanged by the transformation $T$ .

function F(w) {
    1. Obtain the code of F using the techniques discussed above.
    2. Apply the transformation T on the obtained code of F to obtain a program T(F).
    3. Simulate T(F) on the input w, and do what T(F) does.
}

Clearly, both $F$ and $T(F)$ have the same behavior since both compute the same outputs on the same inputs. This is because $F$ is just simulating $T(F)$ . Hence the program $F$ is our fixed-point. QED.