What the heart of AI language models can teach us about our cognitive processes
Effects occur through material, energetic, and informational means. All three types of influence can result in a change of state. It is instructive to compare and contrast material-energetic versus data-driven ways of inducing change. Material-energetic effects are deterministic, dictated by the laws of nature. Informational effects, on the other hand, bring about their changes through probability distributions. Is the concept of probability at the heart of cognition, whether that of a human or a machine?
Imagine someone walking across the street while staring at her phone. If there is a traffic light right in front of her, she will hit it, and if she is running, she will end up falling to the ground. She has no choice about it; that is the consequence. From a given material-energetic state, another necessarily follows. However, if person A tells person B "lie on the floor", there is a certain probability x that person B will respond "yes sir" and perform the prescribed action, and another probability y that person B will say "no way" and refuse. Moreover, if person A tells person B "teleport to Mars in 1 second", no amount of wishful thinking will make it possible. Therefore, and this is an important point, we can say that data influences bring about changes upon material-energetic states via probability distributions that are constrained by the laws governing material-energetic changes. The curvature of space-time will cause the apple to fall from Newton's hand, but the data load from within and from without Newton's mind will induce him to open or close his hand. Here lies a subtle difference between the nature of information and everything that is not information. Information has two aspects: an affecting aspect, constrained by the laws of nature, and a cognitive aspect, the understanding of the cybernetic entity that processes the information, which effectively has no limits.
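The asymmetry above can be sketched in a few lines of code: an instruction induces a response drawn from a probability distribution, while physically impossible outcomes simply receive probability zero. The instructions, outcomes, and probability values below are invented purely for illustration.

```python
import random

# Hypothetical distributions over how person B might respond to person A's
# instructions. The numbers are made up; the point is the shape of the idea.
RESPONSE_DISTS = {
    "lie on the floor": {"yes sir": 0.6, "no way": 0.4},
    "teleport to Mars in 1 second": {},  # no physically possible outcome
}

def respond(instruction, rng=random):
    """Sample a response to an instruction, or None if nature forbids all outcomes."""
    dist = RESPONSE_DISTS.get(instruction, {})
    if not dist:
        return None  # constrained away by the laws of nature
    outcomes, weights = zip(*dist.items())
    return rng.choices(outcomes, weights=weights)[0]
```

The key design point is that the constraint of natural law is not one more outcome in the distribution; it determines which outcomes can appear in the distribution at all.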
We can extend this analysis to language and, in the process, hopefully understand a little more about how the human mind works. Partially inspired by how linguistic AI models work, I believe that the human mind and its thought processes are essentially language models acting as autoregressive likelihood machines, much as information brings about energetic-material state changes via probability distributions. While the paragraph above considered the relation between data and the "real world from without", the same applies to "other data from within".
In this respect, the concept of a local milieu is really important. In the blockbuster movie Skyfall, James Bond goes through a word-association test in which he is asked to say the first word that pops into his mind after hearing the preceding prompt word.
The initial prompt provides the context for Bond's mind's language model, which produces a probability distribution over a set of words. The word with the highest probability gets chosen. Hence, just as information brings about changes to material-energetic states limited by the laws of nature, the next-word prediction of a language model is constrained by what the model deems probable; this is the heart of language models. Because the next word is predicted based on past words, this kind of model is called autoregressive.
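A minimal sketch of this next-word selection might look as follows. The tiny conditional probability table is invented for illustration; real language models learn such distributions from data and condition on much longer contexts.

```python
# Toy autoregressive "mind model": given the context (the words so far),
# assign a probability to each candidate next word and pick the most likely.
# The probabilities below are hypothetical, loosely echoing Bond's answers.
NEXT_WORD_PROBS = {
    ("gun",): {"shoot": 0.7, "weapon": 0.2, "holster": 0.1},
    ("day",): {"wasted": 0.5, "sunrise": 0.3, "night": 0.2},
}

def predict_next(context):
    """Return the highest-probability next word for the given context."""
    dist = NEXT_WORD_PROBS.get(tuple(context), {})
    if not dist:
        return None  # the model deems nothing probable: a hard constraint
    return max(dist, key=dist.get)

print(predict_next(["gun"]))  # prints "shoot"
```

Swapping the probability table swaps the "state of mind": the same mechanism that yields Bond's "gun … shoot" could just as well yield ChatGPT's "gun … weapon".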
The idea of local context is fundamental in language models. In the Skyfall example, we see an application of the mind model. When a computer language model is trained, we are essentially developing contextual word associations based on which words "live next to each other"; the various AI model architectures differ in what defines a context and in how far back and/or forward we look when considering a given word. For James Bond in the movie, the associations are "day … wasted", "gun … shoot", "agent … provocateur", "woman … provocatrice", "heart … target", "bird … sky", "M … bi***", "sunlight … swim", "moonlight … dance", "murder … employment", "country … England", "skyfall … done". As a fun experiment, I asked ChatGPT to come up with its own associations using the same prompts James Bond received: "day … sunrise", "gun … weapon", "agent … undercover", "woman … empowerment", "heart … beat", "bird … flight", "M … message", "sunlight … radiance", "moonlight … glow", "murder … tragedy", "country … nation", "skyfall … apocalypse". What does that tell us about the state of mind of ChatGPT versus that of James Bond?
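To make the "words that live next to each other" idea concrete, here is a minimal sketch of learning associations from co-occurrence counts within a small context window. The corpus and window size are invented for illustration; real models use vastly larger corpora and learned, soft notions of context rather than raw counts.

```python
from collections import Counter, defaultdict

# A toy corpus, invented for this sketch.
corpus = "gun shoot gun fire agent run agent hide".split()

WINDOW = 1  # how far back and forward we look around each word

# Count which words appear within the window of each word.
neighbors = defaultdict(Counter)
for i, word in enumerate(corpus):
    for j in range(max(0, i - WINDOW), min(len(corpus), i + WINDOW + 1)):
        if j != i:
            neighbors[word][corpus[j]] += 1

def associate(word):
    """Return the word most often seen near `word`, or None if unseen."""
    counts = neighbors.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(associate("gun"))    # prints "shoot"
print(associate("agent"))  # prints "run"
```

Changing WINDOW changes what counts as "context", which is essentially the knob that distinguishes one family of model architectures from another.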
Inspiration reading list:
- Stanisław Lem, Summa Technologiae, Chapter 7: “Creating Worlds”