Simple NLP – Language Invention

Simple NLP – 2

Author: Jerry Li

Previous Blog: A Brief History – the Beginning

Linguistics – Language Invention

If I tell you that language can be invented, you probably won’t be surprised. You see examples all the time of parts of a language changing, like new phrases being coined on social media. You and your friends may have some secret phrases referring to a memorable moment. J. R. R. Tolkien, the author of The Lord of the Rings, constructed languages that exist only in his fictional world, like Elvish, Dwarvish, and others. It’s really cool that a single person could develop such a rich system of languages just for his novels, but what’s even more amazing is that Tolkien is not the only one with the ability to invent a language. In fact, everyone can, if they are placed in the right environment at the right age.

Elvish on the One Ring from The Lord of the Rings


What is language?

Before delving into the story of how a language can be created, let’s first talk about what exactly language is. If language is just a system for communication, then do other animals have language? Dogs bark at each other. Birds sing. Bees use dances as well as chemicals to tell others about food sources or enemies that could be miles away. So, what’s the difference? It is widely believed that only humans, with one of the largest brain-to-body-weight ratios among animals, have the ability to learn and invent such a complex communication system. Even though some apes picked up parts of human language after training, like Kanzi the bonobo (who learned to communicate with lexigram symbols) and Chantek the orangutan (who learned signs from American Sign Language), their language ability is usually compared to that of a young child. Sentience is a prerequisite for using a language: a language is a system expressive enough to communicate our sophisticated thoughts.

Chantek the orangutan who learned American Sign Language


Is Language Innate?

Humans have lots of things other animals do not enjoy. We can cook, write, do math, farm, etc. Those are mostly considered technologies instead of innate human abilities. Is language one of those technologies, discovered by someone by accident, or is it more like the ability to walk – innate to all humans without formal teaching? So far, the innateness of language ability is still under debate among scientists, but a few pieces of evidence suggest that at least some knowledge about language is built into our brains and not acquired through learning.

Universal Traits in All Human Languages

Isn’t it kind of amazing that no matter where we are born, we all walk more or less the same way? This fact indicates that there is something universal about how muscles and the brain work together when we walk. Similarly, if we can find some universal traits that are shared by all languages, it is an indication that at least something about the language we use is written in our genes.

The Ethnologue catalogue of world languages, one of the best linguistic resources, says that there are around 6909 living languages in the world (from Number of languages). All of them differ in one way or another. For instance, the grammatical structure of a sentence is not necessarily the same across languages. In Japanese, the verb goes at the end of the sentence, but in English, the verb goes between the subject and the object. Here’s an example:

I ate an apple.

私 は リンゴ を 食べた。

I (Topic marker) apple (object marker) ate.

As another example, according to Linguistics Society, in Welsh, the usual order is for the verb to come first, followed by the subject, followed in turn by the object:

The student bought the book.

prynodd y myfyriwr y llyfr

bought the student the book
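The three examples above boil down to the same three pieces (subject, verb, object) arranged in different orders. Here is a minimal Python sketch of that idea. The `WORD_ORDER` table and the `arrange` helper are made up for this illustration and work on English glosses only; they say nothing about particles, articles, or any other real grammar.

```python
# Toy illustration only: the same three constituents arranged in the
# basic word orders of the examples above. Works on English glosses.
WORD_ORDER = {
    "English": "SVO",   # subject, verb, object
    "Japanese": "SOV",  # verb comes last
    "Welsh": "VSO",     # verb comes first
}


def arrange(subject, verb, obj, order):
    """Return the three constituents joined in the given order, e.g. 'SOV'."""
    parts = {"S": subject, "V": verb, "O": obj}
    return " ".join(parts[slot] for slot in order)


for language, order in WORD_ORDER.items():
    print(language, "->", arrange("the student", "bought", "the book", order))
```

For Welsh this prints the gloss "bought the student the book", matching the word-by-word translation above.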

The pronunciation of words seems quite arbitrary as well, as shown by the fact that different languages usually call the same animal by different names. In addition, words were probably pronounced differently centuries ago than they are now. Grammar is no exception. If you have ever learned grammar or tried to explain grammar to a non-native speaker, you will soon find out that some of it seems arbitrary too. “That’s just how English works,” I was often told when I learned English.

So out of the 6909 living languages, and even more dead ones, are there any traits shared by all of them?

There are. In fact, scientists have found quite a few. Here are four of them:

  • All languages have nouns, verbs, objects, and pronouns (like I, we, they).
  • All languages have at least two vowels (vowels are like a, i, u, e, o etc.).
  • All languages have at least three sizes of grammatical units: word, phrase, and clause.
  • “If a language distinguishes dual number (a grammatical category indicating “two”) in pronouns, it also distinguishes plural number.” (From
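The last trait in the list is an implication: having a dual in pronouns implies having a plural, but not the other way around. A small Python sketch can make the logic concrete. The feature table below is hand-filled for illustration, and the entry named "Made-up language" is invented purely to show what a violation would look like; it is not a real language.

```python
# Hand-filled pronoun features, for illustration only.
# "Made-up language" is invented to show what a violation would look like.
PRONOUN_NUMBER = {
    "English": {"dual": False, "plural": True},
    "Slovene": {"dual": True, "plural": True},
    "Made-up language": {"dual": True, "plural": False},
}


def obeys_dual_implies_plural(features):
    """'dual implies plural' is the same as '(not dual) or plural'."""
    return (not features["dual"]) or features["plural"]


for name, features in PRONOUN_NUMBER.items():
    verdict = "consistent" if obeys_dual_implies_plural(features) else "violates the trait"
    print(name, "->", verdict)
```

A language with no dual at all, like English, passes the check automatically, because the trait only makes a claim about languages that do have a dual.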

Those facts seem to suggest that the basic structure of a human language is written in our genes. This is the central idea of universal grammar, proposed by Noam Chomsky, one of the most famous linguists in history. Even more, when brand new languages are invented, they share the same commonalities.

Inventing a Language

In the 16th century, not long after the (re-)discovery of the New World by Columbus, the brutal slave trade began, and it would go on to abduct more than 10 million Africans. Those unfortunate African slaves, who would spend the rest of their lives on American plantations, came from all over Africa and did not share a common language. Indeed, slaves from different regions or tribes were placed together exactly because they could not communicate with one another – no communication, no rebellion. To work with other slaves and receive orders from their masters, most first-generation African slaves picked up pieces of language from the slave masters. These were short phrases, words, and sentence fragments, with a limited vocabulary and no unified grammatical structure.

This kind of broken language is called a pidgin. Pidgins can arise anywhere in the world where two or more groups of people have to interact without a common language. Words are borrowed from other languages and adapted to serve new purposes.

In Hawaiian Pidgin, for example, the word “brah” (which is also used in contemporary English slang) means “brother” and the word “cockaroach” means “to steal”.

Back to the American plantations. When slaves married, usually two people with different mother tongues, the couple communicated with each other and with their children using a pidgin. The children who grew up listening to pidgin then did something that fascinated linguists.

When the children heard those fragmented words and phrases, they spontaneously tried to fill in the missing grammatical parts. For example, if their parents’ pidgin sentence seemed to be missing an object that was implied, the kids would fill it in. If the parents did not know the word for some object, the kids reused and combined other words. According to The Language Instinct, simple pidgin verbs such as “go”, “stay”, and “came” are used systematically in Hawaiian Creole grammar as auxiliaries, prepositions, case markers, and relative pronouns. Moreover, “The English past tense ending -ed may have evolved from the verb do: He hammered was originally something like He hammer-did.”

Furthermore, when the slaves’ children got together, their languages started to merge and form a new language. What if the kids didn’t like how a word sounded to them? They just came up with a new one and started using it. If they found a grammatical rule counterintuitive, they spoke in whatever way felt right to them. This kind of language is called a creole – a language that emerges when children grow up with a pidgin as their mother tongue. Those children invented their own brand new language within a generation, with its own set of words, a new grammar, and a new group of people to speak it.

Slave family. Those children of slaves are the inventors of new languages.

Creole languages have made a deep impact on how we use language today. Some linguists believe that the Black English widely spoken today among African Americans, also known as Black Vernacular English (BVE), probably began as an English-based creole language. Sentences like “Don’t nobody know the answer” and “Ain’t nothing going on” are grammatically correct in BVE, but not in standard English. Creole languages are excellent examples of how languages are borrowed, created, and adapted constantly. And guess what, all creole languages also adhere to the same set of traits common to all other human languages, even though most of their inventors did not even go to elementary school.

In the next blog of the Linguistics series, I plan to show some interesting facts about how children learn language, which will provide some insights into how humans perceive language and how we can use that information for NLP research. If you have topics that you’d like to read about, just let me know. Thanks for reading!


The Language Instinct

History of African American English in the U.S.

Pidgin language example

Nigerian Pidgin Wiki

Image source:

The Ring


Slave family


