The Processing Unit as a Summoning Circle

When I began learning about machine learning, and the much-hyped “deep learning” subbranch in particular, I necessarily had to learn about the abstract realm of “Vector Space”, where all the work of deep learning actually takes place. At around the same time, I was reading most of the backlog of Charles Stross’ “The Laundry Files” books, which deal with a version of cosmic horror that's synthesised with computer science to create a geeky, compelling, cyberpunky take on the genre.

These two sources of inspiration left me wondering about the accidental prescience of Stross’ work. You see, when one is training many forms of machine-learning algorithm, and in particular when one is training a deep-learning neural network (or related paradigms like Boltzmann machines, deep belief networks, or self-organising maps), what is actually happening is eerily similar to the mathematical occultism of the Laundry Files. And yet, it's unimpeachably mathematical as well. This sort of viewpoint is, uh, thought-provoking.

The basic form of neural network training, and the one covered by most tutorials for beginners, looks like this. To begin with, you imagine that you have a dataset of something worth learning about, whether it's numerical data or a corpus of literature, and that you have sufficient computational power to do something with it.

You begin by “Vectorising” your data: that is, you find a way to turn the data into arrays of numbers, usually of constant length, which retain some of the essential information about the data, so that a number-crunching system can make sense of it. For example, for numerical data one could just normalise the data and remember how to reverse the operation for the outputs. For text data, one can just directly use the word-frequencies, or one can transform the corpus word-by-word into special mappings of vectors (that can convey an occult worldly knowledge of their own, including dark urges that you might not share).
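As a rough sketch of those two simplest kinds of vectorising, here is a normalisation that remembers how to undo itself, and a word-frequency vector of constant length. The function names, the tiny vocabulary, and the sample sentence are all my own inventions for illustration, not anything standard.

```python
from collections import Counter

def normalise(values):
    """Scale numbers into [0, 1] and return what is needed to undo it."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero on constant data
    return [(v - lo) / span for v in values], (lo, span)

def denormalise(scaled, params):
    """Reverse normalise() so outputs can be read in the original units."""
    lo, span = params
    return [s * span + lo for s in scaled]

def word_frequencies(text, vocabulary):
    """Fixed-length vector: one relative frequency per vocabulary word."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in vocabulary]

# Hypothetical toy data, purely for illustration.
scaled, params = normalise([10.0, 15.0, 20.0])
restored = denormalise(scaled, params)
vocab = ["the", "circle", "daemon"]
vector = word_frequencies("the daemon breaks the circle", vocab)
```

Whatever the length of the text, the vector is always as long as the vocabulary, which is the "constant length" property the network needs.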

Then, a neural network with an architecture of your choosing is provided the data as input. At each “layer” of the network, the data is combined with the weights stored on each neuron and passed through an “activation function”, transforming that layer's input into its output. This is repeated layer by layer until the final output data is obtained.
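A minimal sketch of that layer-by-layer pass, in plain Python, might look like the following. I am assuming a sigmoid activation and fully connected layers; the weights shown are arbitrary numbers I made up, and real libraries do the same thing with matrices rather than nested lists.

```python
import math

def sigmoid(x):
    """A common activation function that squashes any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of the inputs, then activates."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Feed the data through every layer in turn and return the final output."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.0]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                     # output layer
]
output = forward([1.0, 1.0], layers)
```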

Here is where things get a bit freaky. For a given network, the weights stored in the neurons will begin as random values, chosen from a random distribution that facilitates learning… but with no actual information content of their own, yet. And, as the data passes through, no information is conveyed from the data into the neurons, either!

Instead, for the purposes of training you will have an “expected value”, a “target” or “correct” answer, that you expect the neurons to provide as final output. This is the output at the other end: you don't know what values the network should provide between its layers.

So... how do the network's neurons ‘learn’? Well, they don't exactly. Rather, we determine how ‘wrong’ the network is compared to the expected answer, and using some calculus we are able to assign a ‘direction’ to the wrongness, and say ‘in abstract mathematical terms, the true answer is that way, so adjust your weights so that you might give an answer closer to the true one next time’.
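To make the ‘direction of wrongness’ concrete, here is a sketch with the smallest network I can think of: a single neuron with one weight, one bias, and no activation function. I am assuming squared error as the measure of wrongness; the starting values and the learning rate are arbitrary choices of mine.

```python
def predict(weight, bias, x):
    return weight * x + bias

def squared_error(weight, bias, x, target):
    """How 'wrong' the neuron is for one input."""
    return (predict(weight, bias, x) - target) ** 2

def gradient(weight, bias, x, target):
    """Derivatives of the squared error with respect to weight and bias."""
    diff = predict(weight, bias, x) - target
    return 2 * diff * x, 2 * diff

weight, bias = 0.0, 0.0
x, target = 2.0, 4.0
learning_rate = 0.05  # how far to move in the indicated direction

before = squared_error(weight, bias, x, target)
grad_w, grad_b = gradient(weight, bias, x, target)
# The gradient points 'uphill', so step the opposite way.
weight -= learning_rate * grad_w
bias -= learning_rate * grad_b
after = squared_error(weight, bias, x, target)
```

After the single step, `after` is smaller than `before`: the neuron has not been told the answer, only which way to shuffle.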

Each time a different input is provided, the direction of wrongness will be unique and idiosyncratic to that input, so these training exercises are often done in batches, and the directions towards the various true answers are averaged over. At the end of each batch, the neural network's weights are all shifted in the direction that will make them more ‘correct’ next time.
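The batching idea can be sketched with the same kind of one-neuron model: average the per-example gradients over a batch, then shift the weights once. The dataset here is a made-up straight line (y = 3x + 1), and the batch size, learning rate, and epoch count are assumptions I picked so that this toy converges.

```python
def batch_gradient(weight, bias, batch):
    """Average the per-example gradients of squared error over a batch."""
    grad_w = grad_b = 0.0
    for x, target in batch:
        diff = (weight * x + bias) - target
        grad_w += 2 * diff * x
        grad_b += 2 * diff
    return grad_w / len(batch), grad_b / len(batch)

def train(data, batch_size=2, learning_rate=0.05, epochs=500):
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad_w, grad_b = batch_gradient(weight, bias, batch)
            # One shift of the weights per batch, not per example.
            weight -= learning_rate * grad_w
            bias -= learning_rate * grad_b
    return weight, bias

# Hypothetical data drawn from y = 3x + 1.
data = [(x, 3.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0]]
weight, bias = train(data)
```

With these settings the weight and bias should end up very close to 3 and 1, the values hidden in the data.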

To me, this is a very strange process. If one visualises this process, as many videos have done, one imagines the network popping into an abstract ‘Vector Space’ and getting a chance to see a bit of its surroundings and move closer to the correct answers each time. Common failure modes include the network falling into ‘local minima’ of its error, meaning that all directions appear to give worse answers even though a better answer exists elsewhere, and also runaway feedback loops that send the network ‘learning’ at wildly wrong velocities.
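Both failure modes show up even in one dimension. In this sketch, `bumpy_loss` is a curve I invented with two dips, one deeper than the other, and the bowl is simply x squared; the starting points, learning rates, and step counts are likewise my own choices for illustration.

```python
def descend(gradient, x, learning_rate, steps):
    """Plain gradient descent on a single number."""
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x

def bumpy_loss(x):
    """Hypothetical loss with a shallow dip near x = 1.13 and a deeper one near x = -1.30."""
    return x**4 - 3 * x**2 + x

def bumpy_gradient(x):
    return 4 * x**3 - 6 * x + 1

def bowl_gradient(x):
    return 2 * x  # gradient of the simple bowl x**2

# Starting on the right, descent settles in the shallow dip and stays there,
# because every small move from that point makes the loss worse.
stuck = descend(bumpy_gradient, 2.0, 0.01, 1000)

# Starting on the left, the same procedure finds the deeper dip.
found = descend(bumpy_gradient, -2.0, 0.01, 1000)

# With too large a learning rate, each step overshoots further than the last,
# even on a bowl with one obvious bottom at zero.
runaway = descend(bowl_gradient, 1.0, 1.1, 50)
```

Same rule, same curve, different starting point: one run is trapped and the other is not. The third run never settles at all, which is the runaway feedback loop in miniature.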

But when it works, it feels Strossesque, like a summoning. You are not generating but discovering the solution. Somewhere out there, in vector space, there is a correct answer for every machine learning question, and the processes of machine learning are directing our constructed minds, our golems of numbers and abstract dimensionality, to seek them out. They can only find their solution if they have the right information, but also if they have the right structure, the right forms and geometries. And when we find the geometry, the inputs, and the solutions we need, we break the circle, we bind them, and we operate them as daemons on our computational substrate. Which, appropriately, may not even consist of a discrete computer anymore but an abstract aether of virtualised computational substrate.

A Solomon-style summoning circle

True to Stross’ work, these summonings take something from us every time. In exchange for this power, to seek out the perfect servitor and command it to do our bidding, we have to subordinate our model of the world to that of our daemons, because any mathematical abstraction of our world that can capture the true complexity of the world as an educated and intelligent human sees it would be no easier to create than designing humanity itself and then rearing a human child to adulthood. If the daemon can only work with a data model that excludes our common weal, then its summoners will discount, excuse, or disrupt that common weal in order to feed the daemon. If the daemon can best categorise those people, or those artworks, or those hot-takes which are familiar to it, then its substrate will become optimised to exclude novelty which might confuse the daemon. If the daemon can only spin straw into gold when the straw is stolen, then the daemon's keepers will steal rapaciously.

These daemons can optimise, but only if we optimise first. We must pay the toll to open the mathematical gates by first reducing ourselves into digestible vectors, and only then can we profit by what arrives through those gates. I am excited and scared by it. I summon these models, these daemons, and I feel that there is so much good that we can do if we use them well. But I can't get it out of my head, that I'm not so much building as summoning these mathematical servants, and that perhaps I've let the wrong one in.
