Monday, August 3, 2009

study of Zipf's law

http://www.sciencedaily.com/releases/2009/07/090730073915.htm

" The law of brevity, proposed by the American philologist George K. Zipf, along with others, shows that the most frequently-used words are the shortest ones.

...when dolphins move on the surface of the water they tend to perform the most simple movements, in the same way that humans tend to use words made up of less letters when they are speaking or writing, in so-called "linguistic economy".

The research study includes the case of Oscar Wilde's novel The Picture of Dorian Gray. The most-used word is the three-letter article "the", while other larger ones, such as "responsibilities" are hardly found at all."

----------------------

http://en.wikipedia.org/wiki/Most_common_words_in_English

D: "A" and "I" are in the top 10 most common words and are only one letter.
We do not see a two-syllable word until #61 for "people".
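
The brevity claim is easy to spot-check on any text. Here is a minimal sketch that compares the average length of the most frequent words with the average length of everything else. The function name, the sample sentence and the `top_n` cutoff are my own assumptions for illustration, not anything taken from the study.

```python
import re
from collections import Counter

def mean_length(words):
    """Average number of letters per word (0.0 for an empty list)."""
    return sum(len(w) for w in words) / len(words) if words else 0.0

def length_by_rank(text, top_n=10):
    """Return (mean length of the top_n most frequent words,
    mean length of all remaining distinct words)."""
    words = re.findall(r"[a-z']+", text.lower())
    ranked = [w for w, _ in Counter(words).most_common()]
    return mean_length(ranked[:top_n]), mean_length(ranked[top_n:])

# Made-up sample sentence, only to exercise the function.
sample = "the cat and the dog saw the responsibilities of the extraordinary household"
top_mean, rest_mean = length_by_rank(sample, top_n=1)
print(top_mean, rest_mean)
```

On a real corpus one would raise `top_n` and look at the whole rank-versus-length curve rather than two averages.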

This pressure for brevity is problematic for Decimese.
My approach to core vocabulary closed-class function words is pretty wordy - at least until certain consonant cluster data-compression approaches are used.

Take, for example, the pronoun "I".
Single, first person, pronoun.
There are 3 pieces of information there.
When we contrast he, she and it, we also find masculine/feminine and human/not categories.
Essentially, we now have FIVE discrete facts.
Add a pair-plural category, and we now have SIX.
Being able to include human/not and masc./fem. aspects to all the pronouns might be nice - particularly if we ever see sentient AIs, LOL.

How to compress such a complex approach into one syllable?
Easy.
Basic word form: CV (consonant then vowel) + ending.
Consonant clusters L/R, W/Y. Plus limited vowel diphthongs.

e.g. http://en.wikipedia.org/wiki/Diphthong#English

Initially, I had hoped to have ten vowel sounds.
Then I realized that was hopeless for an international language.
I was forced to reduce this to five - pretty much as per Esperanto.
I still like the idea that 'long vowels' may be optional for advanced speakers, while word particles or additional syllables can stand in within the same role.

Here is a proposed basic design for a Decimese pronoun.
1) single/ plural (math concept) - plus 'pair' dual plural concept.
2) masc./fem. (one can use this with animals, or even objects if desired)
3) in/out (space concept, or math object manifold concept)
4) near/far (space concept)
5) human/not (one can humanize something if desired)
Some interesting options result from this approach.
One could say I BUT
a) not human
b) masculine (I, man! <:)
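
To make the five-part design concrete, here is a rough sketch of a pronoun as a bundle of features, with gender and human/not left optional. The class and field names are hypothetical, and treating first person as "in, near" is my assumption for this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Number(Enum):
    SINGLE = "single"
    PAIR = "pair"      # the dual plural
    PLURAL = "plural"

@dataclass(frozen=True)
class Pronoun:
    number: Number
    inside: bool                      # in/out
    near: bool                        # near/far
    feminine: Optional[bool] = None   # None means gender is left unmarked
    human: Optional[bool] = None      # None means human/not is left unmarked

    def describe(self):
        """Spell out only the features that are actually marked."""
        parts = [self.number.value,
                 "in" if self.inside else "out",
                 "near" if self.near else "far"]
        if self.feminine is not None:
            parts.append("feminine" if self.feminine else "masculine")
        if self.human is not None:
            parts.append("human" if self.human else "not human")
        return ", ".join(parts)

# "I", BUT not human; and "I", BUT masculine.
i_not_human = Pronoun(Number.SINGLE, inside=True, near=True, human=False)
i_masculine = Pronoun(Number.SINGLE, inside=True, near=True, feminine=False)
print(i_not_human.describe())
print(i_masculine.describe())
```

The optional fields are the point: once the subject has been described, a speaker can drop them and fall back to the bare form.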

Core syllable: CV (plus ending consonant or 'cap syllable')
C plus LRWY plus V plus vowel diphthong plus ending.

There would be pressure to minimize detail, once the subject has been described.
You / plural, or they, would likely become you.
She becomes it.
We becomes I.

The H-sound serves all manner of special functions, creating special-duty syllables in the form H plus vowel.
This was borrowed from Ygyde.

Closed class function words could use reserved syllable and word forms.
CV is the most common word form.
For vocabulary needs, I stick to CV plus (nasal consonant ending).
This means that any CV word is by definition a function word.
The objection would initially seem to be a loss of clear word boundaries.

I meant.
Me meant.
Memeant
Meme...

But in Decimese this is not true.
Nasal consonants always imply a word-final consonant position.
Yes, this is a shameless attempt to cater to the Chinese.
The word-initial consonant is clearly differentiated from word-middle position by the voiced/voiceless distinction.
(Chinese could use heavy aspiration in lieu of this.)
E.g. P/B pair. Bam. Not pam. Bapam. But not pabam.
Note: this IS culturally biased.
English speakers, I think, tend to devoice word-middle-position consonants.
Whereas the French do the opposite.
So yup, once again we have a cultural bias.
I know.
English is king, and Chinese is the rising star.
French was fighting a losing battle against Esperanto to maintain its pre-eminence in diplomacy a century ago.
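
The boundary rule described above (voiced consonant at the start of a word, voiceless in the middle, nasal only at the end) can be sketched as a tiny validator and segmenter. The consonant inventories below are placeholders I picked for the example, not a settled Decimese sound system.

```python
# Hypothetical inventories; only the P/B pair comes from the text above.
VOICED = set("bdgvz")      # allowed word-initially (voiced obstruents)
VOICELESS = set("ptkfs")   # their word-medial counterparts
NASALS = set("mn")         # allowed only word-finally

def is_valid_word(word):
    """Check the three positional rules for a single word."""
    if not word or word[0] not in VOICED:
        return False                       # must start with a voiced consonant
    if any(ch in VOICED for ch in word[1:]):
        return False                       # no voiced obstruent after the start
    if any(ch in NASALS for ch in word[:-1]):
        return False                       # a nasal can only close the word
    return True

def segment(stream):
    """Split an unbroken stream of sounds: each voiced obstruent opens a new word."""
    words, current = [], ""
    for ch in stream:
        if ch in VOICED and current:
            words.append(current)
            current = ""
        current += ch
    if current:
        words.append(current)
    return words

print([w for w in ["bam", "pam", "bapam", "pabam"] if is_valid_word(w)])
print(segment("bapambam"))
```

Under these assumptions the "memeant" problem does not come up, because the listener always knows where the next word starts.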

Back to pronoun construction.
CV.
5 possible word-initial consonants. 5 vowels.
25 possible combinations.
4 possible consonant clusters.
??? possibly 5 possible vowel diphthongs.
OK, let's narrow this down.
Only ONE word-initial consonant designated for pronouns.

1 x 5 x 5... 25.
Because LRWY consonant clusters are mutually exclusive without additional syllables, we would want various mutually exclusive states described.
At first blush, I think single/plural, plural/dual and collective noun might be a good default.
With about 5 additional vowel diphthongs, once again, we want mutually exclusive.
Something we *could* do, though it sounds complex, is allow some vowel diphthongs and/or consonant clusters to denote compound conditions.
E.g. single AND masculine. Plural and feminine.
HE. And THEY (plural of she).
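
The 1 x 5 x 5 count can be checked by brute force. This sketch takes one reading of it: one reserved initial consonant, five cluster slots (none, L, R, W, Y) and five vowels. The letter "d" is purely a stand-in, and reading the second 5 as the cluster slot rather than the diphthongs is my assumption.

```python
from itertools import product

INITIAL = ["d"]                       # hypothetical reserved pronoun consonant
CLUSTERS = ["", "l", "r", "w", "y"]   # no cluster, or one of L/R/W/Y
VOWELS = ["a", "e", "i", "o", "u"]

# Every consonant + cluster + vowel combination, in a fixed order.
forms = ["".join(parts) for parts in product(INITIAL, CLUSTERS, VOWELS)]
print(len(forms), forms[:6])
```

Because only one cluster can sit in the slot at a time, each cluster is best spent on states that exclude each other, such as single, pair and plural.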

Ceqli is willing to sacrifice some brevity for clarity.
Go. I.
Zi. You.
Gozi. ... We.

Some languages have we-but-not-you or we-but-not-he designations.
Again, a variant prime number system could work.
(see much earlier entry).

If I just wanted to map English pronouns then my job becomes much easier.
The he/she/it difference does a sloppy job of identifying the subject.
Some dogs get called she.
Some cars get called she.
Only third-person gets a gender identifier.
"It" makes this optional.
And plural hides it again.
Sticking to spatial/math concepts, we then have strictly optional gender and human indicators.
Suddenly, calling a bitch (female dog) 'she' or 'it' ceases to be sloppy.
A category for living/not and adult/not is also useful.
Dog. Bitch. Cur. Puppy/ dog. Plural/not.

At public speeches, the speaker will often use the term 'ladies and gentlemen'.
Analyze.
Plural-human-adult-honorific, males same.
Six syllables.
Plural-human-adult-honorific, second-person (out, near).

Note the implications for vocabulary building of this word-particle approach.
Dog, bitch, cur. One word, plus some endlessly recycled core concepts.
A pair of jeans. A whole bunch of jeans. Dual plural, plural.
A murder of crows, a herd of cattle (which is not clearly derived from cow).
Plural crow. Plural cow.
Steer, stallion, et al.
Just masculine, adult.
Hmm. Adult / not denotes living inherently.
Dog, puppy. Cat, kitten.
Person, people. Human indicator.
She/ dog. A bitch, living indicator.
She/ car. Female, no living or human indicator.
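
Here is a small sketch of the word-particle idea: one root plus recycled markers, with a lookup back to the separate English words they replace. The marker names and the hyphenated gloss format are hypothetical, just a way to show how few roots are needed.

```python
def compose(root, *markers):
    """Build a gloss like 'feminine-living-dog' from a root and its markers.
    Markers are sorted so the order they are given in does not matter."""
    return "-".join(sorted(markers) + [root])

# English needs a separate word for each of these; the particle approach does not.
ENGLISH_EQUIVALENT = {
    compose("dog", "feminine", "living"): "bitch",
    compose("dog", "not-adult"): "puppy",
    compose("cat", "not-adult"): "kitten",
    compose("crow", "plural"): "murder of crows",
    compose("cow", "plural"): "herd of cattle",
    compose("horse", "masculine", "adult"): "stallion",
    compose("person", "plural"): "people",
}

for gloss, english in ENGLISH_EQUIVALENT.items():
    print(f"{gloss:24} ~ {english}")
```

Seven English vocabulary items collapse into five roots and a handful of markers that get reused everywhere else.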

Well that's enough for now.
