
The Epistemology of Programming

Computer programming is an activity that lies at the intersection of epistemology and engineering.  The process of reducing a desired set of actions by a computer to the rigid set of instructions that can be directly executed by physical hardware requires a clarity of expression that can only be attained in a properly structured, logical language.  The history of programming languages has consisted largely of the advancement of these methods of expression toward an accurate model of human epistemology, with only a handful of tributary paths leading to apparent dead ends when they strayed too far from how human beings actually think.  However, in the latest round of language development, I believe we are seeing the entry of an alternative philosophy of thought that can only lead to a weakening of a programmer’s ability to attain a clear vision of what they are trying to create.

The early procedural programming languages – COBOL, FORTRAN, BASIC, C, and many others – were capable of expressing the four fundamental operations of information flow: assignment of data to named memory locations (variables), iteration (loops), conditional branching (if…then…else), and, in their more advanced forms, function definitions (named subsequences of instructions).  In these primitive languages, each statement was a representation of concrete knowledge, with the only level of abstraction consisting of parameterized subsequences, or functions.  In a very loose sense, this might be analogous to a representation of an animal’s epistemology.  This isn’t to say these languages were not an advancement of technology – to the contrary, they brought the capability of translating internal states of the electronic computer into a language that could be read and manipulated by humans, allowing a dramatic improvement in programming efficiency.

However, as the desired capabilities of software systems increased, the inherent inefficiency of concrete-bound languages often led developers into increasingly complex code structures.  Without rigidly following self-imposed development guidelines, the resulting code could easily become nearly impossible to understand, even to the original author of the software.  Although such a situation can arise in any programming language without proper design discipline, documentation, and careful planning, the inability to express a proper level of abstraction in the concrete-bound programming languages made any approach to controlling the development of large-scale projects extremely difficult to maintain.

Starting in the 1980s, a fundamentally new approach to large-scale software engineering entered the mainstream: object-oriented programming.  When used properly, the object-oriented framework for expressing the organization of information, its internal structures, interrelationships, and mechanisms of flow can very accurately represent a model of human organization of thought.  Human thought is fundamentally based upon the identification and manipulation of concepts.  In object-oriented programming, concepts are represented by classes, which combine the abstract definition of data contained in a group of similar concrete objects, or instances of the class, with the operations the software can perform on these data members of the class.

This closely parallels an epistemological model of human thought in which concrete objects are represented as instances of concepts.  Concrete objects in this sense mean any form of existent or potentially existent object, which includes not only physical objects, but things which are abstractions in themselves as well.  A concept contains all of the essential attributes shared by a set of concrete objects, with the values, or measurements, of these attributes removed.  An instance of the concept is then considered in the human mind with the values set to match those of the object being recalled.  The use of conceptual reasoning dramatically reduces the complexity of our mental interaction with the world by avoiding the need to retain all of the specific attributes of each object encountered and treat each of them as unrelated to all other similar objects.  We are thereby enabled to consider the interrelationships between concepts as a whole, instead of working directly with interrelationships of all of the objects these concepts represent. Simply replacing the word “concept” with “class” in the above description accurately describes the base structure of an object-oriented programming methodology, and indicates the power of expression possible in such a language. 
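As a rough illustration of this parallel, the sketch below treats a class as the concept and its instances as the concretes. The class name Wheel and its two attributes are chosen only for illustration; the point is that the class names the attributes while omitting their measurements, and each instance supplies the measurements.

```python
from dataclasses import dataclass


@dataclass
class Wheel:
    # The "concept": the attributes are named, but no measurements are fixed.
    radius_cm: float
    width_cm: float


# Two "concretes": the same attributes, each with its own measurements.
front = Wheel(radius_cm=33.0, width_cm=20.5)
spare = Wheel(radius_cm=30.0, width_cm=12.5)

# Both objects are units of the same concept despite differing in every value.
same_concept = type(front) is type(spare)
```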

Two specific forms of relationship between concepts are directly represented in the object-oriented languages.  The attributes of a class can themselves be objects of other classes – hence the concept wheel is an attribute of the concept car, as a car has wheels, and a wheel itself is a bundle of additional attributes – radius, width, hub, tire and so on.  In the object-oriented terminology, this is referred to as a “has-a” relationship (a car has a wheel). 
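A minimal sketch of the “has-a” relationship follows. The attribute names (radius_cm, width_cm, model) are assumptions made for the example, not part of any real system; what matters is that an attribute of Car is itself a collection of objects of another class.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Wheel:
    radius_cm: float
    width_cm: float


@dataclass
class Car:
    model: str
    # "has-a": the car's wheels are themselves objects of the Wheel class.
    wheels: List[Wheel] = field(default_factory=list)


car = Car("sedan", [Wheel(33.0, 20.5) for _ in range(4)])
wheel_count = len(car.wheels)
```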

A more powerful form of relationship is the abstraction of groups of concepts and classes into even more general concepts in parent-child structures.  Thus cars, trucks, and buses are all ground vehicles, and ground vehicles can be considered as their parent concept.  Separately, airplanes and helicopters are children of the aircraft concept, and supertankers and yachts children of the boat concept.  Individual attributes that are common among the child concepts – such as wheel in the case of ground vehicles – exist in the parent concept and are inherited into the child concepts.  Going further with our example, ground vehicles, aircraft and boats are all children of the transportation vehicle concept.  These are “is-a” relationships in the terminology of object-oriented software design – a car is a ground vehicle, which is a transportation vehicle.
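The “is-a” hierarchy described above might be sketched as follows. The specific attributes (passengers, wheel_count) and their default values are invented for the example; the structure of parent and child classes is the point.

```python
class TransportationVehicle:
    def __init__(self, passengers):
        self.passengers = passengers


class GroundVehicle(TransportationVehicle):
    def __init__(self, passengers, wheel_count):
        super().__init__(passengers)
        # Common to every ground vehicle, so it lives in the parent concept.
        self.wheel_count = wheel_count


class Car(GroundVehicle):
    def __init__(self, passengers=5):
        super().__init__(passengers, wheel_count=4)


class Bus(GroundVehicle):
    def __init__(self, passengers=40):
        super().__init__(passengers, wheel_count=6)


car = Car()
bus = Bus()

# A car is a ground vehicle, which is a transportation vehicle.
car_is_ground = isinstance(car, GroundVehicle)
car_is_transport = isinstance(car, TransportationVehicle)
```

Neither Car nor Bus declares wheel_count itself; both inherit it from GroundVehicle.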

A final key component in the object-oriented approach to modeling knowledge is control over access to the defining aspects of classes.  Access to attributes of a class can be open to all other classes, available only to child classes of a class, or completely hidden and manipulated only through the use of various functions defined in the class.  This again is a good representation of how information may be exchanged between conceptually separable elements of the real world, where the internal state of an object may only be ascertained through specific interfaces or appearances.
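The three levels of access can be sketched as below, with the caveat that Python expresses them largely through naming conventions rather than enforced declarations: a single leading underscore marks an attribute as intended for the class and its children, and a double leading underscore causes the name to be rewritten internally. The Engine class and its attributes are hypothetical.

```python
class Engine:
    def __init__(self):
        self.label = "V6"        # open to all other code
        self._rpm_limit = 6500   # by convention, for this class and its children
        self.__rpm = 0           # stored as _Engine__rpm; hidden from casual access

    def accelerate(self, delta):
        # The internal state changes only through this function.
        self.__rpm = min(self.__rpm + delta, self._rpm_limit)

    @property
    def rpm(self):
        # Read-only view of the hidden state.
        return self.__rpm


engine = Engine()
engine.accelerate(7000)
current_rpm = engine.rpm   # capped at the limit

try:
    engine.rpm = 1         # no setter is defined, so this is rejected
    write_rejected = False
except AttributeError:
    write_rejected = True
```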

Despite the great success of fully object-oriented languages, they do require a certain discipline to be followed in order to be properly expressive and best match human epistemology.  Unfortunately, the most recently developed languages have begun to turn away from these disciplined principles in the name of freedom of expression and ease of learning.  I believe reducing the structured discipline may allow short-term gains, but can only result in an illusion of simplicity that will lead to a disastrous increase in complexity and return us to systems of software that cannot be understood by either other developers or their original authors.

The Python language is now one of the most widely used languages in academia and scientific research.  Although it is nominally an object-oriented language, it includes some major simplifications, the most serious of which is the “freedom” from needing to declare the class of variables before using them; their type is instead determined by the values assigned to them.  Declaring the class of an object before assigning to it is at most an optional annotation in Python, and the interpreter does not enforce it.  Furthermore, a single variable can be reused in the same section of code to represent objects of different classes, as long as it holds an object of the expected class at the moment it is used in a function call or assignment statement.  Further eroding the object-oriented formalism, even the attributes of a class need not be defined when the class is defined, but can be created at any time by code either inside or outside of the class.
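The behaviors in question can be seen in a few lines. The names used here (value, count, Point, label) are invented for the example; the sketch shows a variable rebound to objects of different classes, an annotation that the interpreter does not enforce, and an attribute attached to an object from outside its class.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


# One name, two unrelated classes in the same stretch of code.
value = 42
first_type = type(value).__name__
value = "forty-two"
second_type = type(value).__name__

# An annotation may be written, but it is not checked when the code runs.
count: int = 0
count = "zero"

# An attribute that the class definition never mentions.
p = Point(1, 2)
p.label = "added later"
attribute_names = sorted(vars(p))
```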

Although Python is depicted as an object-oriented language, and allows the use of classes and inheritance to organize its structure, a close look at how this is done in the Python language reveals that the object-oriented framework is only an illusion in this language.  When working inside a class, the attributes of an object cannot be directly accessed, but rather must be designated by stating “self” as the object containing the attribute to be accessed.  Stranger still, each function defined in the class must have a first parameter, conventionally called “self”, with which to pass the object itself to the function, even though when the function is actually called on an object, that parameter need not be explicitly listed.  These oddities strongly suggest that the language is only “faking” its object-oriented expression and is, at its base, a completely unstructured morass of data storage and instruction sequences.
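The explicit instance parameter, which Python code conventionally spells self, looks like this in practice. The Counter class is a hypothetical example; the last two calls show that invoking a function on an object and passing the object explicitly to the function defined in the class are the same operation.

```python
class Counter:
    def __init__(self):
        # Attributes are always reached through the instance parameter.
        self.count = 0

    def increment(self, step=1):
        self.count += step
        return self.count


c = Counter()
implicit = c.increment()         # the object is passed automatically
explicit = Counter.increment(c)  # the same function, object passed by hand
```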

Each of these simplifications or freedoms severely undermines the relationship between knowledge expression in the programming language and the proper functioning of human epistemology.  That nothing needs definition before its use, or equivalently, that its use completely defines its meaning, is an empiricist and ultimately existentialist view of knowledge.  Existentialism in philosophy leads to a conclusion that the world cannot be understood because there is no meaning to be found.  This is the source of the “nausea” that Sartre accurately names as he attempts to face the world through this interpretation.  Is this really how we want to be designing software?
