Thoughts on the Chinese Room Problem

An interesting talking point in contrast to the Turing test is the Chinese Room Problem. It postulates a sealed room in which a person with no ability to read Chinese receives handwritten messages in Chinese. The room is filled with books that contain possible responses to sentences in Chinese. The room's occupant is given enough instruction to match the characters of the input handed to them and look up the corresponding output. They then copy the characters onto a sheet of paper and pass that message out of the room. Assume the number of books of responses is large enough to contain a response for any possible input. The language was selected (by an English speaker/reader) since Chinese is entirely foreign to them. The Spanish Room or German Room Problem wouldn't be quite as hard, since those languages have similar enough origins to English that people can often get a sense of the meaning. With no background in Chinese writing, it seems unlikely an English speaker would interpret much, if anything at all, from inspecting the messages.

In this way, the occupant of the room is an intelligent person mechanically carrying out a conversation between the person passing messages into the room and the author of the books. Their contribution adds no intelligence to the conversation. They are a simple liaison.

Now suppose a person on the outside of the room is told they have the opportunity to engage in a handwritten conversation. However, they are also told there is a 50% chance the room contains the occupant described above, and a 50% chance the room contains a Chinese person who will answer as they see fit.

Does this remind you of the Turing Test? It should, as it's considered by some to be a novel critique of the Turing test. If a computer could contain a large enough database of worthwhile responses, would we consider it intelligent? In theory, if a room with a large enough library of response books existed and could reliably respond in a way that its conversation partners could not distinguish from a real person, we would say it passes the Turing test. But then how can we consider shelves of dead trees with ink on them to be intelligent?

First off, to have a useful discussion of the Chinese Room Problem, we need to separate intelligence from consciousness. If such a room could be created and fool the Turing tester, I still wouldn't be convinced the room has attained the property called consciousness. Would I call it intelligent? That's a matter of one's formal definition of intelligence.

The Chinese Room Problem highlights the seemingly gray area between truly understanding something, rote implementation of a novel solution, and simulation. Deep down, I think that should people of the future come to deeper understandings of consciousness than we have today, they may look back on this idea the way we look back on Zeno's paradox. On its surface, it seems like a reasonable and perplexing idea. But after one or two calculus lessons, it becomes immediately clear exactly what information older civilizations lacked.

Conversations have context. No chat bot that treats conversations as a process in which an input can be discretely mapped to an output with no reference to the past is going to win the Turing Test. Granted, a really well designed memory-less responder might make some very good and seemingly intelligent responses to certain inputs. But that system's inability to incorporate information from earlier points in the conversation will quickly reveal it for the fraud it is.

Thus, for the Chinese Room Problem to be interesting to me, we have to slightly alter the definition. We need to assume that the occupant doesn't just take an input message and find its corresponding output. In order to be convincing, we need a room in which the occupant must assemble a history of all the correspondence with one individual up until the most recent utterance. That concatenated string must be looked up in the reference books, and an appropriate response to the conversation should be returned.
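The difference between the two versions of the room can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the tiny `MEMORYLESS_BOOK` and `HISTORY_BOOK` tables, the `SEP` separator used to concatenate utterances, and the English strings standing in for Chinese are all hypothetical.

```python
# Hypothetical stand-ins for the response books; English replaces Chinese for readability.
MEMORYLESS_BOOK = {
    "My name is Ada.": "Nice to meet you.",
    "What is my name?": "I don't know your name.",
}

# In the altered room, each key is the whole conversation so far, joined by SEP.
SEP = " | "
HISTORY_BOOK = {
    "My name is Ada.": "Nice to meet you.",
    "What is my name?": "I don't know your name.",
    "My name is Ada. | Nice to meet you. | What is my name?": "Your name is Ada.",
}


def memoryless_reply(utterance):
    """Look up only the latest message; everything said earlier is ignored."""
    return MEMORYLESS_BOOK[utterance]


def history_reply(history, utterance):
    """Append the new message to the history, then look up the whole transcript."""
    history.append(utterance)
    reply = HISTORY_BOOK[SEP.join(history)]
    history.append(reply)  # the room's own reply becomes part of the record
    return reply
```

Asked "What is my name?" after an introduction, the memoryless version can only give the same answer it would give a stranger, while the history-keyed version finds an entry written for that exact conversation. The occupant's job is still purely mechanical in both cases.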

Interestingly enough, my slight change does not make the problem more difficult to solve! In either case, the room needs to have some response to every possible input, so any finite string of any length is a valid input. A concatenated conversation history is itself just a longer finite string. Thus, both sets of inputs are countably infinite, and both have the same cardinality: \aleph_0.
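One way to see why \aleph_0 is the right cardinality is to list the inputs in order: shortest strings first, and alphabetically within each length. Every finite string then shows up at some finite position, which pairs the strings off with the natural numbers. Below is a minimal sketch of that enumeration; the function name `all_strings` and the two-letter alphabet are assumptions chosen for illustration.

```python
from itertools import count, islice, product


def all_strings(alphabet):
    """Yield every finite string over `alphabet`, shortest first."""
    for length in count(0):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


# The generator never ends, so take only a finite prefix.
first_seven = list(islice(all_strings("ab"), 7))
# first_seven == ["", "a", "b", "aa", "ab", "ba", "bb"]
```

A conversation history is just another string (over an alphabet that includes a separator), so it appears somewhere in the same list.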

One might argue that a space constraint exists since every possible string needs a response. However, when considering the set of all possible strings, most of them would be regarded as gibberish (e.g. "Tze abdiefose f39fke"). For these, the Chinese response books could be allowed the convention of having some entries map to "Respond with stock answer #394", which might commonly translate to "I don't understand what you're saying."
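As a rough sketch of that convention, the books could hold explicit entries only for meaningful inputs and send everything else to a numbered stock answer. Note that this swaps in a default fallback rather than a literal entry for each gibberish string, and the names `STOCK_ANSWERS`, `RESPONSE_BOOK`, and `lookup` are hypothetical.

```python
# Hypothetical numbered stock answers and a tiny book of meaningful entries.
STOCK_ANSWERS = {394: "I don't understand what you're saying."}
RESPONSE_BOOK = {"Hello": "Hello! How are you?"}


def lookup(message):
    """Return the written response, or stock answer #394 when no entry exists."""
    return RESPONSE_BOOK.get(message, STOCK_ANSWERS[394])
```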

The important observation about the switch to a conversation-based response system is that now some notion of memory can be simulated. A key property of intelligence is the ability to quickly alter one's internal representation of the world. If a friend tells me a story about their neighbor Richard (whom I've never met), I presume that for the duration of that conversation, the name Richard applies only to the neighbor unless otherwise noted. I accept "Dick" as an implicit replacement for "Richard". I infer a great deal of implicit information from the statement "With his divorce getting ugly, we figure he won't be around much longer." Did I infer his estranged spouse might be plotting a kidnapping? No, I infer there are some financial constraints, most likely causing Richard to need to sell the house next door to my friend. In the universe of all possible responses, one could be crafted by a presumably intelligent Chinese author which takes all implicit and explicit information into account. Once put to paper, the seemingly intelligent response need only be mechanically retrieved and returned.

Even attempts to break the system, such as saying "Let's end the current conversation and role play as if I was Dale Cooper from Twin Peaks and you're Fox Mulder and we're meeting to compare case files", could reasonably be scripted by an intelligent author, again, in the space of all possible strings.

So while the theoretical limit mentioned earlier might not seem like a constraint, there are some physical constraints at work. The number of possible conversations grows exponentially with the length of the input conversation. While the responses could be indexed efficiently and thus only require the use of binary search to retrieve, they have to be retrieved from some persistent storage. That storage has to take up space. So at least by modern standards, a fully enumerated set of Chinese Room response books would need to be exponentially large. Perhaps there's some physical limit on the maximum conversation length, given that we want a Room that doesn't require a significant amount of the space in existence and energy to power it. Perhaps intelligence has something to do with the manner in which we compress this information efficiently while still being able to rapidly adopt new information.
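To put rough numbers on this, here is a minimal sketch that counts the possible inputs up to a given length and retrieves a response from a sorted index by binary search. The helper names `count_inputs` and `find_response` and the parallel-list layout are assumptions for illustration, not a claim about how such a library would actually be organized.

```python
from bisect import bisect_left


def count_inputs(alphabet_size, max_length):
    """Number of distinct strings of length 0..max_length over the alphabet."""
    return sum(alphabet_size ** n for n in range(max_length + 1))


def find_response(sorted_keys, responses, conversation):
    """Binary-search the sorted index; return None when no entry exists."""
    i = bisect_left(sorted_keys, conversation)
    if i < len(sorted_keys) and sorted_keys[i] == conversation:
        return responses[i]
    return None
```

Assuming something like 3,000 commonly used characters, `count_inputs(3000, 20)` is already on the order of 10^69 entries for conversations of at most 20 characters. Binary search over that many sorted keys would take only a couple of hundred comparisons, so lookup time is not the bottleneck. The storage is.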

In order to rapidly adopt new information, the fully verbose version of the room would need an intelligent agent to have pondered and pre-written every possible response. Exactly when did that enumeration process take place? Could the intelligent response writer have completed a full library of responses in a practical amount of time? Although these are relatively hand-wavy arguments, I do feel that the very existence of the postulated room is highly dubious.

But if we take it as an axiom that this Room exists, is the Room intelligent? It certainly embeds intelligence. Have you ever mastered an activity to the point that it's essentially unconscious for you to reproduce it? Do we regard guitarists as any less skilled when they've mastered a particular composition to the point that they daydream while their muscle memory plucks away on auto-pilot? Do public speakers who read their pre-composed speeches off a teleprompter mean their words any less because they were drafted yesterday and read aloud today? Will my YouTube videos be any less informative when I'm no longer alive to check their view counts?

I think the Chinese Room Problem illuminates virtually nothing about intelligence itself, aside from providing yet another example of how a behavior can be regarded as intelligence but be the result of thinking that took place long before the execution of what those thoughts decided should be done or said. What the Chinese Room Problem does do is highlight an apparent paradox when we consider whether or not something is conscious. I'm far from satisfied with my understanding of what consciousness really means. Although I believe there are many human beings significantly more knowledgeable on the subject than I am, I remain unconvinced any of them have a full depth of understanding of the phenomenon either.

Consciousness does seem to be a gradient, not a binary property. I once heard the comedian Ari Shaffir eloquently describe excessively drunk people by saying "oh, they're not pressing record right now." Surely what those individuals are experiencing is somewhere between my current experience and that of some dead trees with ink on them.

So clearly the intelligence anyone ascribes to the conversation they have with the Room occupant (whether human or book form) really comes down to the content of the responses. I'm imagining a situation in which I want to converse with a person who has absolutely no background in computer science, and I want to teach them the concept of PSPACE. I believe every human being has the potential to understand this idea - not just the academic elites. However, the state of the literature is a bit jargon-y and hard to traverse for the average traveler. Additionally, explaining this concept really needs to be built upon more fundamental concepts. I don't think someone could gain an understanding of PSPACE without first knowing about Turing Machines and Big O analysis. By transitivity from my previous statements, these are concepts I think everyone can learn.

Many years ago, long before Linhda and I met, I thought an excellent way to break the ice on a first date was to open with a discussion of PSPACE-Completeness. Strangely, I often got responses from women saying things like "Yeah, I'm not interested in this topic", "I prefer to leave the academic stuff at school and unwind on the weekends", or "Oh, something has come up, I'm going to need to leave". For those willing to play along, the discussion was a two-way street in which my conversation partners started to adopt the domain-specific terminology I used as they came to understand it. They asked questions which gave insight into what they had learned or not learned so far. Through our back and forth utterances, I was able to update my mental state of the world to include my assessment of the knowledge they had gained.

Does that invalidate the intelligence of the dismissive agents? Of course not. But in that specific instance, it did mean I couldn't measure their ability to learn. Perhaps the conversation would turn to music, and an exchange of preferences would lead to recommendations or commentary on genres and groups. Even this more common topic bears the same key property - as conversation agents exchange information, they start to reply with utterances that demonstrate their ability to adopt new information into their mental state and leverage the common knowledge that both agents understand the same concept.

While the idea of "observing first hand that an agent has the capacity to learn and retain information" is not an explicit requirement of the Turing Test, I conjecture that most judges in the Turing Test are implicitly or explicitly considering this idea central to their evaluation.

What, for me, the Chinese Room Problem provides is a novel thought experiment stressing that consciousness is a phenomenon related to but independent of intelligence. It's something we either don't have a good definition/understanding of today or perhaps something impossible to define with precision. The measure of intelligence need not be synchronized in time to be demonstrable. If a dialog with pre-computed responses found in books is able to engage in interesting conversation, the mechanical nature of the process is a distraction. We should really be investigating the source which generated all the volumes and found a way to encode intelligence into the Room.