Learning to Hypercompute? An Analysis of Siegelmann Networks
Keith Douglas[1]

Abstract. This paper consists of a further analysis (continuing that of [11]) of the hypercomputing neural network model of Hava Siegelmann ([21]).

1  INTRODUCTION

This paper consists of a further analysis (continuing that of Douglas [11][2]) of the hypercomputing neural network model of Hava Siegelmann ([21]). It consists of three large sections. In the first, a brief description of Siegelmann’s model is presented. This is followed by a discussion of the merits of taking this model as a factual model (pace the “abstract” approach of Sieg [20]). Third, one of Siegelmann’s key, heretofore poorly explored, assumptions (the “linear precision suffices” claim) is addressed; this discussion is the primary focus of the paper. It proceeds along three subsections of analysis: (1) a suggestion of Arlò-Costa ([1]) and Douglas ([11])[3], not fully developed there, that the Siegelmann network model actually requires a supertask to perform; (2) the merits of treating Siegelmann’s proposal as one involving an idealization in the sense of Norton ([15]); and (3) the claim that, pace Davis and Scott ([9]), “non-recursive black boxes” are not exactly untestable, making use of the work of Kelly and Schulte ([16]). The latter two also allow a brief discussion of a factual interpretation of the arithmetic, analytic and similar familiar hierarchies. Subsections (2) and (3) are not independent and yield similar findings.

I end with summary conclusions. These are largely more negative and “skeptical” about the merits of the Siegelmann network than the assessments of Siegelmann herself or of some of those (e.g. Copeland) who have defended it, but I hope to provide more detail on the areas of the model where interested parties could work on improving its plausibility and (nomological) possibility.

2 SIEGELMANN NEURAL NETWORKS

Hava Siegelmann’s monograph ([21]) is the principal source of detailed discussion of her model (hereafter SN); it includes much information about the computational strengths of various forms of neural networks, their properties from the perspective of computational complexity, and much that is off topic for our present purpose. Here we need only focus on the aspects of her presentation that are unique or unusual with regard to the question of computability. I assume the reader is familiar enough with the usual artificial neural network models (hereafter, ANN) to follow a discussion involving such matters as nodes and weights (see, e.g., [8]).
These unique/unusual aspects are: (1) the necessity of using an “output protocol”, (2) her claims about the (real-valued) weights in the network, and (3) the “sensitivity” of the network, i.e., a matter of interpreting the activation function.
This section simply reminds or informs the reader of these features, since they play a role later on; it does not discuss them critically in full at this point. I raise them to show that there are additional problems with SNs, discussed less than the already familiar problem of the weights, and because they will play a crucial role in the discussion of idealizations later on.
In the case of the “output protocol”, what is meant is the convention adopted by Siegelmann (see in particular [21], pp. 23-24) to indicate when a SN has finished its computation and is returning the value so calculated to its users. A state flag called the “output validation line” is set to 1 and held at this value for the duration of the significant output, and is set and held at 0 at all other times. One can then, during the time this line is 1, read off the value of the output from another, more conventional, output line. The “hypercomputing” part of this proposal is quite significant and yet hidden in her presentation.
In particular, how does this flag get set? Take the case of a recursively undecidable problem that these networks are supposedly capable of solving, like the halting problem for Turing machines (hereafter, TMs). Here the output is a single encoded value, so the flag will be set to 1 at one “tick”[4] sometime in the future, while (say) 1 comes out of the output line if the encoded TM halts and 0 otherwise. How does one know how to set this flag as a “programmer” of one of these networks? This presumably depends on how the function is calculated. One has to know that the calculation is finished, that whatever state the network is in is the correct one. But this itself is a hypercomputational task, and so a regress seems to threaten.
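To make the convention concrete, here is a minimal sketch (my own, in Python; the function and variable names are illustrative, not Siegelmann’s) of how a user would read output under this protocol: the data-line bits count only while the validation line is held at 1. Nothing in the sketch, of course, says how the network itself knows when to raise the line, which is the regress just noted.

    def read_output(ticks):
        # Collect the data-line bits emitted while the validation line is 1.
        # 'ticks' is a sequence of (validation_bit, data_bit) pairs, one per time step.
        result = []
        for validation, data in ticks:
            if validation == 1:
                result.append(data)
        return result

    # Hypothetical run: the validation line stays at 0, then is held at 1 for a
    # single tick while the answer bit (1 = "the encoded TM halts") appears on
    # the data line.
    print(read_output([(0, 0), (0, 1), (0, 0), (1, 1), (0, 0)]))   # prints [1]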

Let us move on, then, to the case of the real-valued weights of the network. This feature is the root of the hypercomputational power of the SN. Siegelmann does not tell us how these weights are to be obtained; she merely calculates, approximately, how many digits of precision are needed after a given amount of run time. Since (by hypothesis) the network does not use registers, it is unclear what gaining digits of precision could refer to. An unspecified learning procedure is appealed to as the source of this extra precision, but without details this too is simply an unknown. Notice that there are two concerns here - both the learning procedure and how its use gets “recorded”, “stored”, etc. are at stake. As for the activation functions, their “embodiment” or “implementation” also raises unanswered questions. For example, a threshold function (as the name suggests) is typically understood to be some representation of a node’s sensitivity to its inputs. In the case of SNs, these must be infinitely sensitive. Take a threshold function of the form (all of those discussed have the same problem, but since Siegelmann places special emphasis on the truncated linear one I use it here):

f(x) = 0 if x < 0
     = x if 0 <= x <= 1
     = 1 if x > 1

To see the potential concern, consider a value x = 0 + e, where e is some small value approximately (but not exactly) equal to zero. Represented in the usual notation, this is then some value 0.00000000000000000...1, say. The network has to be able to “recognize” that value, no matter how small its difference is from 0, because the value of the output depends on it[5]. Siegelmann emphasizes that truncation or rounding reduces the value represented at a node to a rational value and hence renders the computational properties of the network nonhypercomputational. I call the property of the nodes in question “sensitivity”, and as we have now seen, this is infinite in a real-valued network (which allows literally any real value as a weight). Previous critics have pointed out the implausibility of finding (or knowing) that one had a hypercomputable weight in a SN (e.g., [9]); it is hopefully now clear that the problem is at least twice that, since one also needs a way for the network to make use of it, and that requires a “hypersensitive sensor” or something of the kind - subsystems that respond in infinitely precise ways to embody the activation functions. I might add in passing that this mistake or oversight is nothing new. Bunge ([5]) argues that a human brain is not suitably modeled by a TM because even a single electron can be in a continuum of states. Ignoring that this might prove too much, Bunge, like Siegelmann, has to argue that there can be an infinite number of (hyper)computationally (or, in Bunge’s case[6], cognitively) relevant states and events (state transitions: [10]).
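The point can be illustrated numerically. The following sketch (my own, in Python; any fixed-precision arithmetic would do) implements the truncated linear function above and shows that under IEEE double precision a sufficiently small e is rounded to 0 before the node can respond to it; the infinite sensitivity the SN needs is exactly what any finite-precision realization lacks.

    def sigma(x):
        # Truncated (saturated) linear activation: 0 below 0, identity on [0, 1], 1 above 1.
        if x < 0.0:
            return 0.0
        if x > 1.0:
            return 1.0
        return x

    e = 1e-300            # tiny but still representable as a double
    print(sigma(e) > 0)   # True: the node can still "see" this difference from 0

    e = 1e-400            # underflows to exactly 0.0 in double precision
    print(e == 0.0)       # True: the difference has been rounded away
    print(sigma(e) > 0)   # False: the node cannot respond to it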


3 SIEG (INDIRECTLY) ON SIEGELMANN
Sieg ([20]) has argued (in the context of a discussion of the Church-Turing thesis) that one can dispense with said thesis and instead:
“My strategy, when arguing for the adequacy of a notion, is to bypass theses altogether, and avoid the fruitless discussion of their (un-)provability. This can be done by conceptual analysis, i.e., by sharpening the informal notion, formulating its general features axiomatically, and investigating the axiomatic framework.”
Were it correct, this viewpoint would dispense with the need to analyze the Siegelmann network in detail, at least for the present purposes. It would make it clear that hypercomputation is doomed to failure as a subject, since the axiomatic framework in question makes plain that broadly computational devices (including potential hypercomputers[7]) do not include anything like the SN[8].
However, as has been pointed out by Stewart Shapiro ([19]), it does not appear that Sieg successfully dispenses with theses here. In other words, there is the question of whether or not the axioms are correct[9] of the real (or non-abstract, non-Platonic, etc.: replace as necessary according to your favourite philosophy of mathematics) systems under consideration. How do we (hopefully) resolve this impasse? For if Sieg is right, there is nothing to investigate; we simply see that the SNs do not fall under the axioms of computing machines he has usefully provided and that would be the end of it. This seems too hasty for the present purpose, so the concern is pressing.
Here is where Shapiro is mistaken; he thinks that (following Quine [18] and others) one is dealing with some matter which is both mathematical and empirical. For some (perhaps for Sieg) this is impossible or unlikely; instead it is like investigating axioms for (say) groups[10]. If Sieg were right it would be a matter of getting (as he borrows a phrase from Hilbert in saying) the “Tieferlegung der Fundamente” right; Shapiro claims instead one has to look to the world too. I claim both are mistaken because they have overlooked the possibility that the matter is not about mathematics at all.
I argue that the debate should be construed as one about doing mathematics (or at least doing calculations or computations). Turing, as Sieg has rightly emphasized, analyzed a human “computor” (in the sense of Gandy [13]). Similarly, Gandy, Sieg, and others have analyzed calculations by machines as well. Using Bunge’s ([3]) theory of reference and Sieg’s admirable presentation ([20]) of “Gandy machines”, one sees that the theory of Gandy machines is, indeed, about computing machines. This makes the subject matter a factual[11] one in Bunge’s sense[12]; see also Douglas [12]. In other words, it is not a matter of mathematics - one can (and should) use mathematics as a tool to describe the characteristics of the computers and computors, but this does not make the field mathematics any more than using differential equations in the theory of reaction mechanisms makes chemistry a branch of mathematics.
Hence Shapiro is right in his claim: Sieg does not dispense with theses - or, if preferred, Church’s thesis is in need of “empirical” confirmation, and hence the “usefulness” of SNs as a model of computing cannot be dismissed so hastily. Hence also, in particular, we must address the question of whether SNs are empirically plausible. It is here that we quickly run into previous criticisms of her proposals from the likes of Martin Davis and Dana Scott.
Davis’ ([9]) paper quotes Scott concerning how we would recognize a “nonrecursive black box”. I feel this quotation is also slightly mistaken: it proves too much. I agree that no finite amount of interaction with a black box could show that it performs hypercomputational processes. However, no finite amount of observation could tell you that a black box contains a Turing machine either. Any finite experimentation with input and output is consistent with the black box being a (perhaps very large) finite state automaton[13]. This is not to say that Scott and Davis are mistaken concerning the difficulty of determining that one has a hypercomputer of some kind, but rather that it is important not to overstate this difficulty. Davis emphasizes how hard it would be to tell that one had a nonrecursive “transparent box” (i.e. a black box with much of its workings well known). It seems to me that Scott and Davis have adopted an almost instrumentalist attitude towards (what Bunge would call factual) scientific theories here. Since instrumentalism is controversial amongst philosophers of science, we should be wary of this approach[14]. After all, how does (say) Newtonian dynamics (ND) get verified? This presumably factual theory uses continuous functions and the like, whereas any measurement is only of finite precision, which renders direct confirmation impossible. Davis and Scott thus “prove too much” with this approach. They might rejoin that one could state ND in terms of computable analysis. However, assuming this could be done in this case does not show it could be done in general. Also, since the theory is then different, how does one decide between the computable version and its (usual) noncomputable counterpart? It would seem one would have to apply more general principles about the nature of theories in (factual) science. Since these are arguably under debate, we are now back where we started.
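To illustrate the finite-state-automaton point (the sketch is my own; the names are illustrative), any finite log of a black box’s input-output behaviour can be reproduced exactly by a finite lookup table, so the log by itself cannot certify hypercomputation, or even Turing-machine power:

    def table_machine(observations):
        # Return a function that mimics the observed behaviour exactly and
        # answers arbitrarily (here: 0) on anything not yet observed.
        return lambda query: observations.get(query, 0)

    # Hypothetical finite log of interactions with the box (input -> observed output):
    log = {"program_1": 1, "program_2": 0, "program_3": 1}
    mimic = table_machine(log)
    print(all(mimic(q) == out for q, out in log.items()))   # True: agrees with the box on every recorded query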
Nevertheless, Davis and Scott have, in my view, correctly identified what sort of proposal SNs are - namely, a family of factual hypotheses. I mentioned earlier (section 1) that there are what one might call “nomological” areas of discussion (problems, counterproposals, etc.) with the SN approach. I now turn to three of these.
4 NOMOLOGICAL CONSIDERATIONS ABOUT “LINEAR PRECISION SUFFICES”

The first of these stems from Arlò-Costa ([1]) and adapts prior remarks of Douglas ([11]) to that end. He asks whether or not the SN requires a supertask to implement and hence “inherits” the implausibility of the accelerated Turing machine (see, e.g., Boolos and Jeffrey [2]), which most would agree is a purely “notional” device. In particular, note the difficulty even in computing a constant function with a SN. Since the weights of each node in a SN are of infinite precision, outputting their value directly is impossible by the protocol described. This arises because such a constant is still an infinite-precision number, and so outputting its value requires an infinite amount of time[15], followed by a signal to indicate that the output is finished. At best this would require a supertask. A suitable re-encoding[16] would have to be found, and that is not suggested anywhere by Siegelmann. Moreover, such an encoding would have to handle rational values as well as surds, transcendental values, and even non-Turing-computable numbers, like Chaitin’s constant. Of course, giving a finite representation of the latter sort of value cannot in general be done. My earlier remarks about the “output protocol” loom large here.
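A toy illustration of the point (my own construction, not Siegelmann’s): if a “constant function” is to output an arbitrary real weight bit by bit under the protocol above, there is never a tick at which the validation line could honestly signal that the output is finished. Even for the rational stand-in weight below the stream is endless; for a non-computable real such as Chaitin’s constant we could not even write the generator down.

    from fractions import Fraction
    from itertools import islice

    def binary_expansion(w):
        # Yield the bits after the binary point of a weight w in [0, 1), forever.
        while True:
            w *= 2
            bit = int(w)       # next bit, 0 or 1
            yield bit
            w -= bit

    w = Fraction(1, 3)         # stand-in weight; any real in [0, 1) raises the same issue
    print(list(islice(binary_expansion(w), 12)))   # [0, 1, 0, 1, ...] - no last bit ever arrives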