Tuesday, February 3, 2009

computer speech translation, a computer interlingua


Word perfect

"Around the turn of this century it really was appalling – IBM's ViaVoice at that time seemed to have a 90% error rate. But nine years later Dragon Naturally Speaking 10 is for me more than 95% accurate. And when it's wrong, it's usually my fault.

AIST's idea could make such software even more powerful, by increasing the speed and accuracy with which you can dictate long and difficult words and common phrases."

D: the idea that we need to make auditorily distinct sounds for the computer now seems antiquated.

However, the idea of a grammatical structure that parses easily remains valid.

I read that VOS (verb-object-subject) word order is the clearest for computers. Note this is also what human pantomime suggests is the most intuitive for humans. (If you want references to websites and studies, I suggest you peruse my older blog posts.)

I found a very thoughtful computer interlingua proposal. It is called "lexical semantics of a machine translation interlingua".


Note how closely it resembles a modern Creole-esque IAL.

D: section 25 has useful design principles.

With the above in mind, we can state several general guidelines for word design:

    "1. Start with simple, common verbs and adjectives.  Isolate their
root concepts and apply it to every classifier. Appropriate suffixes
should be used when related verbs have different argument structures
(e.g. "to say" vs. "to tell"). In the process, a very large number
of less common concepts will be automatically derived. This
principle also applies to numeric, deictic, tense-aspect, and modal

2. Keep in mind the inherent difference between basic state concepts
and modal concepts. When in doubt, always test new concepts to
determine if they are modal.

3. If there's difficulty defining a basic state or modality, or if
it has limited usefulness when combined with most classifiers, it is
very likely that the state is not very basic. When this occurs,
postpone derivation until later. You may be able to "accidentally"
derive it from a different root.

4. Always be suspicious of roots that represent energetic states.
Many of these concepts can actually be derived from non-energetic
states that end up being much more productive."

D: see their proposal for kin relationships.
It is well thought out (Section 25.4).

You will note a distinct taxonomic trend in vocabulary design, but without
the excess associated with Wilkins's philosophical language or with Ro.

D: this site has tremendous potential for designing the core vocabulary for a
human interlang.
Being arbitrarily computer optimized, it is necessarily culturally neutral.
It is much more methodical, however.
The main problem with a taxonomic language design has always been that related
words form minimal pairs: only one phoneme differs to prevent misunderstanding.
Context is NOT an aid, since the two similar-sounding words sit in the same category.
Essentially, the trade-off becomes
1) easier to learn, can guess general meaning but
2) less clear once in use at colloquial speeds.
Again, my HIOXian letter system should point out phoneme combinations that will
cause particular problems.
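
D: a rough sketch of how one might flag the risky words automatically. The tiny lexicon is made up for illustration (it is not from the interlingua proposal or from Decimese), and I am assuming each word is already split into phonemes. It only catches same-length words that differ in exactly one phoneme.

```python
from itertools import combinations


def is_minimal_pair(a, b):
    """True when two equal-length phoneme sequences differ in exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def find_minimal_pairs(lexicon):
    """Return (word, word) tuples whose phoneme sequences differ by one phoneme."""
    pairs = []
    # Sorting the items keeps the output order stable.
    for (w1, p1), (w2, p2) in combinations(sorted(lexicon.items()), 2):
        if is_minimal_pair(p1, p2):
            pairs.append((w1, w2))
    return pairs


# Hypothetical taxonomic vocabulary: the shared "ba" prefix stands for one
# invented category, so its members differ only in the third phoneme.
lexicon = {
    "bapa": ("b", "a", "p", "a"),
    "bata": ("b", "a", "t", "a"),
    "baka": ("b", "a", "k", "a"),
    "mulo": ("m", "u", "l", "o"),
}

if __name__ == "__main__":
    for w1, w2 in find_minimal_pairs(lexicon):
        print(w1, w2)
```

Every word in the "ba" category pairs up with every other one, which is exactly the clarity problem: the more members a category has, the more near-collisions it produces.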

D: the emphasis on reducing the number of primitives (basic morphemes) needed
inspired me. I applied that tactic with closed class "function words" in English.
It will form the basis for the "function words" of Decimese (why am I calling it
that still when I have more than 5 consonant pairs now?).
For example, English has the pronouns I, we, you (plural implied), he/she/it and they.
Esperanto touches upon the idea of modular pronouns with the pair li and ili.
If we parse English pronouns, we end up with the following core concepts:
1) distance: inside, close and far (a distinction made in some languages)
2) quantity: single and plural.
3) gender: neuter, masculine and feminine.
Well, why not build the pronouns in modular fashion from these concepts?
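
D: as a minimal sketch of what "modular" could mean here, the three concepts can simply be multiplied out. The CV syllables below are invented placeholders, not actual Decimese, and leaving "single" and "neuter" unmarked is my own assumption to keep the common forms short.

```python
from itertools import product

# Hypothetical CV syllables, invented for illustration only.
DISTANCE = {"inside": "mi", "close": "tu", "far": "so"}
QUANTITY = {"single": "", "plural": "ne"}  # assumption: singular is unmarked
GENDER = {"neuter": "", "masculine": "ko", "feminine": "la"}  # assumption: neuter is unmarked


def build_pronoun(distance, quantity, gender):
    """Concatenate one syllable per concept: distance + quantity + gender."""
    return DISTANCE[distance] + QUANTITY[quantity] + GENDER[gender]


def all_pronouns():
    """Map every (distance, quantity, gender) combination to its pronoun form."""
    return {
        (d, q, g): build_pronoun(d, q, g)
        for d, q, g in product(DISTANCE, QUANTITY, GENDER)
    }


if __name__ == "__main__":
    for features, form in all_pronouns().items():
        print(features, form)
```

Three distances, two quantities and three genders give 18 forms from only six marked syllables, which is far fewer primitives than memorizing each pronoun separately.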
Decimese attempts to mitigate the one-minimal-pair clarity issue of taxonomic design
by using the CV syllable, not a single C or V letter, as the core unit.
This, in turn, requires a shorthand version to address the issue of lost brevity.
Get something, lose something. The challenge is to finesse the concepts so the overall
language is more than a zero-sum game of design elements, where all options seem to offer a
comparable set of pros and cons.
In the case of pronouns, a taxonomic system, or even a compound-concept approach,
would require the brevity of the one-grapheme-to-one-phoneme taxonomy.

I think I'd be willing to confess that Esperanto, in its spotty, hazy and sporadic
fashion, has touched upon most if not all clever language innovations.

If I stand taller one day, it will only be because I stand on the shoulders of that
giant Zamenhof.

(Out of time, not proofread. Apologies.)
