The Theory of Formal Languages


listen on castbox.fmlisten on google podcastslisten on player.fmlisten on pocketcastslisten on podcast addictlisten on tuninlisten on Amazon Musiclisten on Stitcher

--:--
--:--


2018-04-06

The Theory of Formal Languages

In this episode, Kyle and Linhda discuss the theory of formal languages. Any language can (theoretically) be a formal language. The requirement is that the language can be rigorously described as a set of strings which are considered part of the language. Those strings are any combination of alphabet characters in the given language.

These concepts are relevant to artificial intelligence in that we’d like to consider what sort of machine is required to be able to recognize each language. Recognizing a language involves being able to determine whether or not a given string is a part of that language.

For example, if we consider a subset of the English language that is only lower case, then we could enumerate all the strings, which might look something like this:

aaaaaaaaaaaa
aaaaaaaaaaab
aaaaaaaaaaac
...
data skeptic
data skeptid
...
zzzzzzzzzzzz

We discuss “sheep” language, which takes the form (described here as a regular expression): ba+, meaning that all valid utterances start with b and must be followed by at least one a. This trivial language requires only a trivial finite state machine with four states. Kyle says 2 states in the episode, but it’s really 4.

stack for a memory. Clearly, that’s limiting, but it’s just enough to meet our needs for this other language.

I hope that it’s quite obvious such a machine is hopeless for recognizing English sentences. This is sometimes a point of confusion, because a PDA is totally capable of recognizing the general grammar of an English sentence. The process of diagraming a sentence into noun phrases, verb phrases, and further nestings down to the individual parts of speech can be accomplished by a PDA. But recognizing a true English sentence from a gibberish sentence that fits the general form is not possible with a PDA. For that, you need a Turing machine which is capable of computing anything which is intuitively computable.

There are other levels in the hierarchy of formal languages, but FSA < PDA < TM covers the (arguably) most important there.