FlowBasedProgramming | RecentChanges | Preferences

The following is from TomLaudeman, except some comments by PaulMorrison (marked --pm):

I've got a working, very small Definitionary in English and Spanish, as well as a crude HaikuTalk encoder and decoder. Incidentally, it is all written in Deft and uses Postgres as the backend database.

HaikuTalk depends on people's willingness to mutate their typical semantic concepts to a documented, fundamental system. The Definitionary helps in this process because "writing" in HaikuTalk is essentially stringing together definitions. Aside from sometimes forcing very similar definitions together, a definition is capable of encapsulating cultural bias. Also, HaikuTalk will allow several definitions to be selected for each term. The encoding is numeric (I'm currently using the primary key of the definitions). When the document is decoded, the words chosen are the highest ranked for the given primary definition. In my current demo you put the mouse over a word, and the definitions pop up. It works amazingly well, but I haven't started encoding grammar.
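To make the encode/decode idea concrete, here is a minimal Python sketch. It is not the Deft/Postgres implementation (which isn't shown on this page); the ids, ranks, the Spanish words, and the names `Word`, `DEFINITIONS`, `WORDS`, `encode` and `decode` are all assumptions made up for illustration. The two definition texts are the ones used for "list" and "for sale" further down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    lang: str  # language code, e.g. "en" or "es"
    rank: int  # higher rank = preferred word for this definition


# Hypothetical in-memory stand-in for the Definitionary tables.
# The dict keys play the role of the definitions' primary keys.
DEFINITIONS = {
    101: "create a line by line annotation",
    102: "available for purchase, usually by exchange of money",
}

# Hypothetical word lists per definition (illustrative data only).
WORDS = {
    101: [Word("list", "en", 2), Word("enumerate", "en", 1), Word("listar", "es", 1)],
    102: [Word("for sale", "en", 1), Word("en venta", "es", 1)],
}


def encode(chosen_ids):
    """The writer picks one definition per term; the document is just the ids."""
    for key in chosen_ids:
        if key not in DEFINITIONS:
            raise KeyError(f"unknown definition id: {key}")
    return list(chosen_ids)


def decode(document, lang):
    """Render each definition id as its highest-ranked word in `lang`."""
    rendered = []
    for key in document:
        candidates = [w for w in WORDS[key] if w.lang == lang]
        if candidates:
            rendered.append(max(candidates, key=lambda w: w.rank).text)
        else:
            rendered.append(f"[{key}]")  # no word for this language yet
    return " ".join(rendered)


doc = encode([101, 102])
print(decode(doc, "en"))  # list for sale
print(decode(doc, "es"))  # listar en venta
```

The same numeric document decodes into whichever language has words attached to those definitions; a language with no entry falls back to showing the raw id in brackets.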

A minimal grammar results in sentences that are simple and haiku-like. The nagging question, given all cultures, is: "Is there a minimal grammar that everyone will understand?" I use "understand" broadly. If everyone naturally understands with little or no learning curve, that's ideal. If a few unusual cultures have to read the HaikuTalk grammar documentation to understand, that's fine as long as they can come to this understanding quickly (e.g. within hours). HaikuTalk will almost certainly require readers and writers to do some training. If nothing else, certain cultural hints are going to be required.

Do the following sentences work in all languages? Remember that the writer chooses the proper definition for each term. "list" is "create a line by line annotation" and "for sale" is "available for purchase, usually by exchange of money".

You desire sell art? Please list art for sale. Please list price of each item.


I desire visit your city. I enjoy walking in forests and parks. Can you guide several tours of forests? What forests are near your city?

I'm hoping that there are only a dozen or so things the grammar is required to encapsulate. Special cultural references may or may not be practical. I suppose the Definitionary could capture some of these.

Incidentally, I'm not limiting HaikuTalk to one-to-one language translation. Once the document is encoded, it could be decoded in any language for which a Definitionary exists. A web page could be rendered into any of several languages, dynamically.

This section from PaulMorrison

I followed the Whorf link in Wikipedia (which I hadn't known about), and this led me to Dan Moonhawk Alford - this stuff may very well rekindle my passion for linguistics - I am thinking that we couldn't have done this kind of research before the Internet came of age! Actually Moonhawk's stuff seems to address your question "Is there a minimal grammar that everyone will understand?". In one of his papers, he talks about how the S --> NP + PredP pattern breaks down in some of the N. American Indian languages: http://www.enformy.com/moonhawk-nurturing01.htm. However, if we restrict "everyone" to Indo-European speakers, we may have a more tractable problem. If we try to include, say, Chinese, that will be a bigger stretch, but maybe a special dialect of haiku-talk could be developed. Somewhere in my collection I have a translation of a story by Frederik Pohl from English to Chinese and back to English. Suffice it to say it's nearly incomprehensible, but that may be because of their habit of putting two (simple) words together to make, say, abstract concepts. However, the good news is that Chinese tends to be SVO (subject - verb - object), like most Indo-European languages. It also has no grammar to speak of - it's all syntax - so actually your sample sentences might translate quite well into Chinese.

The examples you give make me think of the phenomenon of pidgins - it used to be thought that pidgins would magically evolve into complete languages, usually referred to as creoles. It is now thought that the best known creoles (e.g. Haitian) are basically West African syntax with French-derived vocabulary - see about the second half of http://www.c2.com/cgi/wiki?CreoleLanguage. So the analogy with pidgins is more apt - the classical pidgin evolved in China, combining English words with Chinese syntax, so it's quite a good match with what you are trying to do, I think. There is a good source for stuff about pidgins at http://en.wikipedia.org/wiki/Pidgin. Interestingly, I did a search for "pidgin" in the c2 wiki, and found some interesting discussions: http://www.c2.com/cgi/wiki?CulturallyNeutralLanguage, http://www.c2.com/cgi/wiki?ModifiersBeforeOrAfterNouns, and http://www.c2.com/cgi/wiki?SignLanguage, among others, all of which seem to nibble around the area you are interested in!

All this reinforces my experience that humans have a hard time switching syntaxes but an easy time learning new vocabularies. I have been speaking French at a certain level since I was 8, and I still can't remember which adjectives go in front of the noun and which go after - which is what http://www.c2.com/cgi/wiki?ModifiersBeforeOrAfterNouns is about! It might be interesting if you can figure out a way to avoid syntax altogether! E.g.

Can you guide several tours of forests?

becomes something like

[Interrogative] able you [plural] tour [more than one] [place relation] forest [more than one] [interrogative].

Please list art for sale.

[Polite request to ...] {present for human consumption} artwork [more than one] [using attribute] {available for purchase}.

Just wool-gathering :-)

You might be interested in looking at my genealogical tree - it's on http://www.jpaulmorrison.com/cgi-bin/geneweb?b=rodkers - the software supports 26 languages - just click on one of the flags to change languages. This software was written by a chap in France and supports a (small) set of words and phrases - mostly he is substituting whole sentences, on the one hand, or individual words on the other, but within this limitation I think he has done an amazing job! Or maybe this is the trick that lets him avoid the pitfalls of syntax!

Which also reminds me of various attempts to come up with an international language - there is a good source of these on http://www2.cmp.uea.ac.uk/~jrk/conlang.html. I feel that, if we restrict the universe of discourse, don't try to support discussions of philosophy or poetry, and maybe avoid some of the "wilder" languages, we can probably do a pretty good job! BTW I notice that there have been some attempts to restrict English for special uses - see http://www2.cmp.uea.ac.uk/~jrk/conlang.html#Reforms. I'd avoid Basic English, though - that turned out to be even harder than regular English for non-English speakers!

Have you looked at the work being done on the Semantic Web - SemanticWeb? Does your work tie in with this? --pm

I strongly prefer not to limit HaikuTalk (HT) to European languages. I'd also like it to be flexible and rich enough for poetry and philosophical discourse. It's interesting that you bring these up, since Noah and I use them as the more difficult examples.

My current solution is to impose some kind of standard structure on HT. Perhaps it would better be called 'mark up' and probably amounts to syntax more than grammar. If this seems workable, then a more sophisticated approach would parameterize decoding of HT on a per-language basis so that the decoded prose looks more natural.

Your rewritten versions of my examples are somewhat along the lines I was thinking. There is another constraint in all this. The sentences have to be easy to type, and that leads me to a very terse markup-command vocabulary. Some things like phrases are easy. Just surround them with double quotes.

So far, I've thought about prepositions that describe location.

man behind-door.

In English, the man is behind the door. The tricky part is that even with a linkage "behind-door", it is possible to understand the positions as reversed. The only definitive solution is to look it up in the HT standards reference, and memorize it. I'm hoping that there are fewer than a dozen of these ambiguities. (Incidentally, I'll probably get the decoder to handle rudiments like capitalizing the first letter of the first word of European languages.)

Using the double quotes and hyphenation, we can write a couple of slightly more complex examples, both of which mean the same thing:

man-"on-floor" behind-door.

behind-door man-"on-floor".

So perhaps this is what you mean by syntax without grammar.

So far, the examples don't have verbs. I think English speakers who know about the word-linking can understand all the following. It may also be easy for HT to rearrange the word order to sound more natural for each language.

man is behind-door.

is man behind-door.

is behind-door man.

man door-behind is.

The second sentence sounds like an interrogative, but there's no question mark, so we know it is declarative. I guess I need to start writing down the HT rules, since we're now up to 3 or 4.

- Hyphen links a preposition/object, adjective/noun, adverb/verb.
- Double quotes surround phrases.
- ? signifies a question.
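The three rules so far are simple enough to sketch as a tiny parser. This is only an illustration in Python, not part of any existing HT code; the function name `parse`, the returned dictionary layout, and the choice to leave a quoted phrase as one unparsed term are assumptions.

```python
import re

# A term is either a double-quoted phrase or a run of characters
# containing no whitespace, double quote, or hyphen.
TERM = r'"[^"]*"|[^\s"-]+'

# A unit is one term, optionally hyphen-linked to further terms.
UNIT = re.compile(rf'(?:{TERM})(?:-(?:{TERM}))*')


def parse(sentence):
    """Split an HT sentence into linked units and note whether it is a question."""
    text = sentence.strip()
    is_question = text.endswith("?")  # rule: ? signifies a question
    text = text.rstrip(".?").strip()
    units = []
    for match in UNIT.finditer(text):
        # rule: hyphens link terms; rule: double quotes surround a phrase
        terms = re.findall(TERM, match.group(0))
        units.append(tuple(term.strip('"') for term in terms))
    return {"question": is_question, "units": units}


print(parse('man-"on-floor" behind-door.'))
# {'question': False, 'units': [('man', 'on-floor'), ('behind', 'door')]}
print(parse("is man behind-door?"))
# {'question': True, 'units': [('is',), ('man',), ('behind', 'door')]}
```

Note that the quoted phrase comes back as the single term `on-floor`; a fuller parser would presumably recurse into it, but that goes beyond the rules written down here.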

man runs-fast. tall-man walks-fast.

I'm pretty sure HT can easily rearrange linked pairs as appropriate for each language, as well as modify S/V/O order, and render the above (roughly) in Spanish as:

hombre-alto camina-rapidamente.

Perhaps this could be called "grammar rendering" or "decode grammar" where HT makes an attempt to create a pleasant looking sentence in a given language.
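A rough Python sketch of what such "grammar rendering" of linked pairs might look like follows. Everything in it is an assumption for illustration: the tiny lexicon, the part-of-speech tags, the `HEAD_FIRST` table (which says whether the noun or verb leads its modifier in each language), and the function names. It only handles single terms and two-term links, and it does not touch S/V/O order.

```python
# Hypothetical part-of-speech tags for the HT terms used in the example.
POS = {"tall": "adj", "man": "noun", "walks": "verb", "fast": "adv"}

# Hypothetical per-language word choices (illustrative data only).
WORDS = {
    "en": {"tall": "tall", "man": "man", "walks": "walks", "fast": "fast"},
    "es": {"tall": "alto", "man": "hombre", "walks": "camina", "fast": "rapidamente"},
}

# Per-language decode parameter: does the head (noun or verb) come
# before its modifier in a linked pair?
HEAD_FIRST = {
    "en": {"noun": False, "verb": True},  # tall-man, walks-fast
    "es": {"noun": True, "verb": True},   # hombre-alto, camina-rapidamente
}


def render_unit(unit, lang):
    """Translate one unit, reordering a linked pair as the language prefers."""
    terms = unit.split("-")
    if len(terms) == 1:
        return WORDS[lang][terms[0]]
    a, b = terms  # this sketch only supports two-term links
    head, modifier = (a, b) if POS[a] in ("noun", "verb") else (b, a)
    if HEAD_FIRST[lang][POS[head]]:
        first, second = head, modifier
    else:
        first, second = modifier, head
    return f"{WORDS[lang][first]}-{WORDS[lang][second]}"


def render(sentence, lang):
    """Render a whole HT sentence, keeping the unit order unchanged."""
    units = sentence.rstrip(".").split()
    return " ".join(render_unit(u, lang) for u in units) + "."


print(render("tall-man walks-fast.", "es"))  # hombre-alto camina-rapidamente.
print(render("tall-man walks-fast.", "en"))  # tall-man walks-fast.
```

Because the order inside a pair is decided by the table rather than by how the writer typed it, `man-tall` and `tall-man` render identically, which fits the idea that a link carries the relationship and not the word order.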

Do you think these solutions are workable? What is the next roadblock? I don't have a good answer for handling a, an, the. I'm kinda thinking disallow them, and encourage people to be more specific.

Bad: twice a week

Good: twice during each week

Ugh. There's a problem. "Jane, clean twice during each week". I'm not a linguistics person. In fact, I have a tenuous grasp of grammar, so this stuff is difficult for me.

I am reassured. I came to pretty much these same conclusions a while back. Top contenders numerically are Mandarin Chinese, Hindi, and Spanish (based on my recollection). English, European languages, Russian, and Japanese are economically or politically important.

The demo will be English and Spanish, and the first few hundred words will be predominantly about cars since that will allow me to rewrite all my car-based content in HaikuTalk. I speak some Spanish, and there's a fair size Spanish speaking audience for my web sites in the Americas.

I'd like to form a big picture of the requirements for HaikuTalk, but it is true that the easy stuff will be implemented first.

Last edited February 18, 2005 1:21 pm by PaulMorrison