Fault-Tolerance without Security?

A key text for understanding Erlang and the Erlang community’s “world view” is Joe Armstrong’s thesis, titled “Making reliable distributed systems in the presence of software errors” (final version with corrections, updated 20 November 2003).

This is a brilliant and historic text, not only for Erlang but for the field of programming languages as a whole, as it confirms a very strong turn towards performance, concurrency and fault tolerance. Armstrong’s thesis is also a necessary text for understanding the “philosophy” and “world view” reflected in the language.

Joe Armstrong’s thesis can be downloaded from his web page at SICS: http://www.sics.se/~joe/

It is clear that Armstrong and other contributors to Erlang have always emphasised performance-related aspects, but it is important to see that the primary target of Erlang engineering is fault tolerance. Armstrong underlines this characteristic:

“The central problem addressed by this thesis is the problem of constructing reliable systems from programs which may themselves contain errors. Constructing such systems imposes a number of requirements on any programming language that is to be used for the construction. I discuss these language requirements, and show how they are satisfied by Erlang.” (Page V)

And also writes:

“In this thesis I identify the essential characteristics which I believe are necessary to build fault-tolerant software systems.” (Page 1)

Given that the Erlang language and its applications are clearly lacking in Security characteristics, how does this match the notion of fault tolerance that is sought? There must be, implicitly, I believe, a notion of software and/or hardware error that has no Security implications. Or else there is a peculiar level of incompleteness in the foundational thinking around Erlang.

To put it differently: if the main causes of failure and unreliability assumed by Armstrong and his collaborators were hardware and software errors, why were Security considerations excluded from the design?

This happened despite the fact that the “problem domain” of Erlang is actually quite wide. Armstrong lists ten requirements for a fault-tolerant system (in the context of telecommunication systems; see page 13, and also the reference quoted by Armstrong: Bjarne Däcker, “Concurrent functional programming for telecommunications: A case study of technology introduction,” November 2000):

“1. The system must be able to handle very large numbers of concurrent activities.

“2. Actions must be performed at a certain point in time or within a certain time.

“3. Systems may be distributed over several computers.

“4. The system is used to control hardware.

“5. The software systems are very large.

“6. The system exhibits complex functionality such as, feature interaction.

“7. The systems should be in continuous operation for many years.

“8. Software maintenance (reconfiguration, etc.) should be performed without stopping the system.

“9. There are stringent quality, and reliability requirements.

“10. Fault tolerance both to hardware failures, and software errors, must be provided” (Pages 13 and 14)

Without going into a detailed discussion of these ten requirements, it is difficult, if not impossible, to think of any failure at these levels that does not have Security consequences. I am inclined to think, therefore, that the lack of Security requirements and principles can be characterised as a problem of “incompleteness” in the approach. Here “incompleteness” means an unwillingness or inability to address all the moments or articulations of a computing environment. The fault-tolerance objective, then, is “achieved”, but only through an effective reduction and a large simplification of the problem domain.

Section 2.4.4 of Armstrong’s thesis (“Names of processes,” page 24) is the only place where the author addresses Security explicitly, but in doing so he reveals a peculiar understanding of Security concerns, or, better said, a reduced, trivialised understanding of the Security domain:

“We require that the names of processes are unforgeable. This means that it should be impossible to guess the name of a process, and thereby interact with that process. We will assume that processes know their own names, and that processes which create other processes know the names of the processes which they have created. In other words, a parent process knows the names of its children.”

This is problematic, because an attacker does not need to “guess” the name of a process, but may instead come to “know” it by other means, for instance by taking control of the environment that hosts it. If the attacker gains control of the Erlang server, he or she “knows” the names of all the processes it spawns and/or communicates with. Armstrong continues:

“In order to write COPLs [concurrency-oriented programming languages] we will need mechanisms for finding out the names of the processes involved.”

Exactly. And Erlang environments provide those mechanisms for obtaining the names of processes, both programmatically and through the command line.
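As a minimal sketch of what such discovery looks like in practice, here is an Erlang shell session using only standard built-in functions (`registered/0`, `whereis/1`, `processes/0`); the exact Pid values and list contents will of course differ from node to node:

```erlang
%% Any code (or person) with access to the shell can enumerate process names:
1> registered().          %% names of all locally registered processes
[init,erl_prim_loader,error_logger,user|...]
2> whereis(error_logger). %% resolve a registered name to its Pid
<0.32.0>
3> processes().           %% every Pid on the node, registered or not
[<0.0.0>,<0.1.0>,<0.2.0>|...]
```

In other words, within a running node the “names” are not secrets at all; they are deliberately easy to enumerate, because introspection is part of how Erlang systems are operated and debugged.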

Then Armstrong writes:

“Remember, if we know the name of a process, we can send a message to that process. System security is intimately connected with the idea of knowing the name of a process. If we do not know the name of a process we cannot interact with it in any way, thus the system is secure. Once the names of processes become widely known the system becomes less secure. We call the process of revealing names to other processes in a controlled manner the name distribution problem—the key to security lies in the name distribution problem. When we reveal a Pid to another process we will say that we have published the name of the process.”

Here is the fundamental trivialisation made by Armstrong: the reduction of Security to the “distribution” of the “name” of a process. This in turn leads to the fiction that if process names (Pids) are not “revealed” then the “system” is secure. But in fact, in any system, the Pids *are* published more or less widely, or can be accessed directly or indirectly if an attacker takes control of the environment where Erlang is running. Hence security cannot be reduced to “controlled publishing”; in fact, Security requirements must be applied to the forms and mechanisms of “publishing” itself. That is, Security needs to be applied not to the names of processes but to the processes that distribute names.
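Even the premise of “unforgeable” names is weaker than it sounds once an attacker has any foothold on the node. A sketch of an Erlang shell session (using the standard `pid_to_list/1` and `list_to_pid/1` BIFs; the particular Pid value shown is illustrative):

```erlang
%% Within a node, a Pid is trivially reconstructible from its printed form:
1> Pid = self().
<0.85.0>
2> Forged = list_to_pid(pid_to_list(Pid)).  %% rebuild the Pid from its string
<0.85.0>
3> Forged =:= Pid.                          %% it is the same process identifier
true
4> Forged ! hello.                          %% and can be messaged like any Pid
hello
```

So “unforgeability” is really a claim about remote guessing, not a security property that survives local compromise, which is precisely the scenario the name-distribution framing leaves out.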

The conclusion is inevitable within this trivialised scope:

“If a name is never published there are no security problems. Thus knowing the name of a process is the key element of security.”

These statements border on the absurd, because if the names are never published then there is also no “system” to speak of. So “knowing” the name of a process is not “the key element” of security. We should instead assume that the names can and will be known, and we should be able to plan and implement Security mechanisms that distribute these names in an acceptable fashion and under an agreed trust model.

I regret to leave things here, as there is much to like in Armstrong’s thesis, a text which synthesises much of the debate around programming languages and opens new ways to address the tasks of application development.