Wednesday, February 29, 2012

Review of top 1000 words. Espo.

D - I wrote out and analysed the top 1000 most common English words. Then translated all of them into Espo. Never say "Esper***o". It attracts the spamming zealots.
The top 100 words were very illuminating. They pretty much summed up everyday human concerns.
Words - verbs - say, think, know, like, make, take.
I - he - it. Which - when - what. Well - good - great. Time - man - thought - day - house - life. Hand and eye. Water and oil.

I thought word frequency followed Zipf's Law, but it follows another one even more closely.

"... in the Brown Corpus, the word "the" is the most frequently occurring word, and by itself accounts for nearly 7% of all word occurrences (69,971 out of slightly over 1 million). True to Zipf's Law, the second-place word "of" accounts for slightly over 3.5% of words... Zipf himself proposed that neither speakers nor hearers using a given language want to work any harder than necessary to reach understanding, and the process that results in approximately equal distribution of effort leads to the observed Zipf distribution."

D - there are many elements of natural language, both spoken and written (orthography), which are already optimized after a fashion.

D - but Benford's Law seems to come closer.

"Benford's law, also called the first-digit law, states that in lists of numbers from many (but not all) real-life sources of data, the leading digit is distributed in a specific, non-uniform way. According to this law, the first digit is 1 about 30% of the time, and larger digits occur as the leading digit with lower and lower frequency, to the point where 9 as a first digit occurs less than 5% of the time. (see image)"

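For anyone who wants to put the two laws side by side, here is a rough Python sketch. It assumes the textbook form of each: Zipf with frequency proportional to 1/rank, and Benford with P(d) = log10(1 + 1/d). The function names are made up for illustration, and the only real data in it is the Brown Corpus count for "the" from the quote above. Lining up word rank against leading digit is only an informal comparison, since the two laws describe different kinds of data.

```python
import math

# Count of "the" in the Brown Corpus, taken from the quote above.
BROWN_THE_COUNT = 69_971


def benford_probability(digit):
    """Probability that `digit` (1-9) is the leading digit under Benford's law."""
    return math.log10(1 + 1 / digit)


def zipf_expected_count(top_count, rank):
    """Expected count at `rank`, assuming frequency is proportional to 1/rank."""
    return top_count / rank


if __name__ == "__main__":
    # One row per position 1..9: Benford share, then idealized Zipf count.
    for n in range(1, 10):
        benford = benford_probability(n)
        zipf = zipf_expected_count(BROWN_THE_COUNT, n)
        print(f"{n}: Benford {benford:.3f}, Zipf count {zipf:.0f}")
```

To test the fit against a real word list, the idealized Zipf counts would be swapped for actual counts and each column normalized to shares of its own total.
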
D - translating the 1000 English words to Espo took roughly 12 hours. Early on, closed-class function words are heavily represented, particularly prepositions, articles and conjunctions. Later on, most words are standard lexical entries.

D - observations: the evaluative words, particularly for comparative and superlative, are assigned special short forms. For example, good - better - best. Bad - worse - worst. These are closely followed by similar words for spatial size and their metaphorical equivalents for time.

D - here are a few pet peeves about Espo.
- This = tiu c*i (and these)
- Any = ajna, iu, tiu aux alia
- Better - pli bona (and best)
- Under - mal-supre de
- Except - escepte de (C is a tongue-twister)
- Worse is, well, worse (pli mal-bona)
- Otherwise - se ne
- Beside - flanke de
- Although - malgraux de.
Big Z was on to a good thing with modular construction. But a word used often also needs to be short. And only one word. It needed more effort.

D - thoughts. I think a very limited and basic Somali-style preposition system could benefit from optional detail from the MELTS acronym system - Math, Space-Time, Logic-Ethics. This core approach also affords us the brevity to prevent excessive wordiness in basic concepts.

D - English is full of misleading cues that lead the unwary astray. Take "stranger", for example. One might initially assume that it is the comparative version of "strange" and that there may exist a verb "to strange". Nope.

I have much to think about. The only vocabulary I have any interest in developing would involve the concepts underlying these top 1000 English words. And to do so with more clarity and brevity than Espo has. Brevity for Espo was impossible, even with their systemic / derivational approach, once they settled on familiar Euro-derived roots. For example, "iras" (to go) never involves the IR part devoid of some vowel-cored suffix. Because of this, simply listing a single consonant in a taxonomic fashion was possible. Iras could have been the root 'r, so "to go" could be 'ri. Instead, we immediately have the onerous burden of using at least 2 syllables for even the most rudimentary of verbs. This is particularly disappointing regarding modal or primary verbs.
There are also multiple examples of largely redundant homonyms in Espo where there was no need of them. It was simply not designed with economy of lexical entries in mind.

S'ok. I'll do better.

1 comment:

dino snider said...

Also, to this day, I still cannot reliably recall Espo interrogatives. I hope to beat that issue with my MELTS core acronym - Math, Space-Time, Logic-Ethics. It'll get incorporated into many closed class word categories.