Wednesday, December 23, 2009

blog entry 99 5/8. how to get computer to 'meet us in middle' for translating

The Dutch language has many combinations of words whose features cannot be explained by simply looking at the qualities of the individual words. The meaning of 'missing the boat', for instance, isn't always the same as 'being too late to catch the boat'. This type of word combination doesn't pose problems to people, but linguistic computer systems, such as speech recognition software or programmes preparing automatic summaries, just don't recognise these expressions.

Grégoire prepared a list of about 5000 unpredictable word combinations. She divided them up into different classes on the basis of their structure. She looked at the rules of singular and plural; for example you can't 'take to your heel', just 'take to your heels', and 'take to those heels' doesn't work either. Grouping together various classes of word combinations can minimise the amount of manual work to incorporate the list into a computer system and it means that the list can be used for many different systems.
D: notably, this kind of idiom is also difficult for new immigrants and other ESL students.

So any strategy to aid the computer will also aid humans.
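To make the quoted idea a bit more concrete, here is a minimal sketch of what a machine-readable list of fixed expressions might look like. It is not Grégoire's actual system. The class labels, the `POSS` slot marker, and the names `FixedExpression`, `LEXICON`, and `lookup` are all made up for illustration. It only checks exact word forms, so it knows nothing about verb inflection ('missing the boat').

```python
from dataclasses import dataclass

# Hypothetical slot marker: this position accepts any possessive word.
POSS = "POSS"
POSSESSIVES = {"my", "your", "his", "her", "its", "our", "their"}


@dataclass(frozen=True)
class FixedExpression:
    """One entry in an illustrative list of unpredictable word combinations."""
    words: tuple          # required surface forms, with POSS marking a slot
    meaning: str          # the non-literal meaning
    structure_class: str  # made-up label for the structural class


LEXICON = [
    # The noun is stored as the plural form only, so 'heel' never matches.
    FixedExpression(("take", "to", POSS, "heels"), "run away",
                    "verb + to + possessive + plural noun"),
    FixedExpression(("miss", "the", "boat"), "lose an opportunity",
                    "verb + the + singular noun"),
]


def matches(expr, tokens):
    """True if the tokens fit the expression word for word."""
    if len(tokens) != len(expr.words):
        return False
    for wanted, got in zip(expr.words, tokens):
        if wanted == POSS:
            if got not in POSSESSIVES:
                return False
        elif wanted != got:
            return False
    return True


def lookup(phrase):
    """Return the idiomatic meaning of a phrase, or None if it is not listed."""
    tokens = phrase.lower().split()
    for expr in LEXICON:
        if matches(expr, tokens):
            return expr.meaning
    return None


print(lookup("take to your heels"))   # listed form
print(lookup("take to your heel"))    # singular: rejected
print(lookup("take to those heels"))  # 'those' is not a possessive: rejected
```

Because the restrictions live in the entry's structure rather than in special-case code, entries of the same class could share one matching rule, which is roughly the labour saving the quote describes.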

A most difficult thing to do is to refrain from using idiom.
I find this nearly impossible, in practice.
English has already been etched into my mind, and this playful and cultural phrasing is part of it.
I think speaking less English than I know may be as difficult as learning more.

Idiom: an expression, word, or phrase that has figurative meaning — its implication comprehended only through common use; whereas the literal definition of the idiom, itself, does not communicate its meaning as a figurative usage.

ace in the hole (CAN, UK, USA): a hidden advantage or resource kept in reserve until needed
Achilles' heel (Global): a person's weak spot
across the board (Global): applies to everyone or everything
against the grain...

Many idiomatic expressions are based upon conceptual metaphors such as "time as a substance", "time as a path", "love as war", and "up is more"; the metaphor is essential, not the idioms.

D: thus my interest in Decimese having overt optional explicit indicators of context.

English: spatial: forward.
Time: forward.
Wouldn't a time/space indicator be nice.

Out of interest, I developed a geometrical representation of pronouns yesterday.
Pretty simple. It just uses a circle. Inside it, we have first person. One dot in the centre and we have 'I'.
More off-centre but still inside, and we have 'we'. And so on.
Laying bare such concepts sans picture but using word-phoneme-lexemes highlights this aspect hidden in English.
D: sweet - many chapters from a book on idiom called "Metaphors We Live By".
I'll hafta read that.

I've been reading over Toki Pona recently. The name means 'the good language'.
It is by translator Sonja Kisa in Toronto. It received some media coverage.
She's quite the character!
Anyway, TP only has um 130 words.
Then it relies heavily on compounding to express nuance.
Because these compound nouns are defined in detail as standard, that means learning a whole lotta multiple-word lexemes after the initial 130 words.
In some respects this resembles Ogden's Basic English. It had a basic vocabulary of 850 words.
The key concept here is that of the hypernym.
A word that captures a whole class of words would qualify. "Thing" or "item", e.g.

So in many respects, we are simply delaying the need to memorize vocabulary.
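A small sketch of that point, under my own assumptions: the glosses and the three compounds below are simplified entries as I understand them from common Toki Pona word lists, not an official dictionary, and the names `WORDS`, `COMPOUNDS`, `literal_gloss`, and `conventional_meaning` are just for illustration. The literal reading falls out of the basic words, but the conventional meaning needs its own table entry.

```python
# Simplified one-word glosses; each real word covers several senses.
WORDS = {
    "telo": "water",
    "nasa": "crazy",
    "jan": "person",
    "pona": "good",
    "tomo": "room",
    "tawa": "moving",
}

# Conventional meanings that still have to be memorized one by one.
COMPOUNDS = {
    ("telo", "nasa"): "alcohol",
    ("jan", "pona"): "friend",
    ("tomo", "tawa"): "car",
}


def literal_gloss(phrase):
    """Build a word-for-word English reading of a Toki Pona phrase."""
    head, *modifiers = phrase.split()
    # Modifiers follow the head word, so reverse them for English order.
    parts = [WORDS[m] for m in reversed(modifiers)] + [WORDS[head]]
    return " ".join(parts)


def conventional_meaning(phrase):
    """Look up the memorized meaning, or None if the compound is not listed."""
    return COMPOUNDS.get(tuple(phrase.split()))


for phrase in ("telo nasa", "jan pona", "tomo tawa"):
    print(phrase, "|", literal_gloss(phrase), "|", conventional_meaning(phrase))
```

The second table is the hidden vocabulary: it grows with every standard compound, no matter how short the first table stays.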

I'm really enjoying TP.
It explores just how minimal a language can be, and still function.
It shops around in natural language for 'simplifications' in grammar, then uses ALL of them.

The phoneme inventory is extremely well thought out, being nearly universal.
The only thing left that she could do is reduce the vowel sounds from 5 to 3: A, U, I.

Now I am sooo bad at languages that I am still finding learning TP hard.
My roomie is learning it too. He is a natural-language polyglot, and has guffawed at some of the simple parts.
But we are gonna practice speaking it in our household.
I am using 'cheats' to learn the vocabulary.
For example, NASA means, among other things, 'crazy'.
How did I remember it? You'd hafta be CRAZY to wanna go into space. NASA goes into space.
NA - NAry
SA - SAne.
Many other words, I recall with naughty memory aids. Sex is always more memorable, since it is taboo.
Sex should be used as often as possible. <:

PIPI and LILI suggest use of pidgin reduplication, in these cases to indicate small.
PIPI - insect. LILI - little.

There are alotta tongue-in-cheek in-jokes in the word names, I think.
Ike - Eisenhower's nickname, which makes me think of his VP, Nixon - I think J said it means to lie haha (the word lists I've seen gloss it as 'bad').
"I am NOT a crook!"

I've read some Tao. I hadda keep rereading Lao Tsu to understand him, so I haven't gotten that far yet.
Toki Pona is supposed to express Tao philosophy.

Maybe it is having a good impact on me.
I forgave a coworker some minor, old, and ultimately un-memorable slight from years ago.
Doesn't seem much, but it is a start.
I have found recently that I have only been hurting myself with my grudges. Maybe it's time to let them go.
My roomie told me yesterday that "forgiveness is a selfish act" ... I think he's right.

Happy Holidays!
