Wednesday, February 11, 2009

viseme based language revisited.

(Grr. Lousy blog interface keeps cutting off text...)

Many years ago, I posted my first attempt at an aux-lang.
I did not know what else to call it. It IS based on lip-reading after all.
At the time, I decided to kill 2 birds with 1 stone and attempted to
express math notation in a written format with brevity.
I only completed 1/10th of the language.
Since then I have decided Visemese would be better as a universal
deaf interlang.
After all, ASL is not compatible with various other forms of sign language.

Yesterday I chipped away for a coupla hours at the cafe on revisions.
Amazing how the accumulated knowledge and insight of a few years
will make any earlier project feel positively adolescent!
D: I stuck to the core 13 visemes used by Disney animation.
These are the visually clearest. There are actually more, about 20
in fact; the SAPI list below has 21 plus silence.

enum SPVISEMES {      // English examples
    SP_VISEME_0 = 0,  // silence
    SP_VISEME_1,      // ae, ax, ah
    SP_VISEME_2,      // aa
    SP_VISEME_3,      // ao
    SP_VISEME_4,      // ey, eh, uh
    SP_VISEME_5,      // er
    SP_VISEME_6,      // y, iy, ih, ix
    SP_VISEME_7,      // w, uw
    SP_VISEME_8,      // ow
    SP_VISEME_9,      // aw
    SP_VISEME_10,     // oy
    SP_VISEME_11,     // ay
    SP_VISEME_12,     // h
    SP_VISEME_13,     // r
    SP_VISEME_14,     // l
    SP_VISEME_15,     // s, z
    SP_VISEME_16,     // sh, ch, jh, zh
    SP_VISEME_17,     // th, dh
    SP_VISEME_18,     // f, v
    SP_VISEME_19,     // d, t, n
    SP_VISEME_20,     // k, g, ng
    SP_VISEME_21      // p, b, m
};

D: I used the 13 Disney visemes.
Ignoring vowels for now, I used these consonant groups:
hkg,ng - fv - l - pbm - r - szdtn - sh,ch - th - w.
As you can see, many common letters and sounds overlap.
Initially I picked the following consonant order:
B, D, Ch, L, R, Th, V, W.
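
The overlap between letters is easy to check mechanically. Below is a minimal C sketch that stores the consonant groups listed above in a lookup table. The group numbers, the `viseme_entry` struct, and the function names `viseme_group` and `looks_alike` are all made up for illustration; they are not SAPI values or anybody's official numbering.

```c
#include <stddef.h>
#include <string.h>

/* One entry per consonant spelling. The group ids are arbitrary labels
   chosen for this sketch; only "same id" vs "different id" matters. */
struct viseme_entry {
    const char *phoneme;
    int group;
};

static const struct viseme_entry TABLE[] = {
    {"h", 0}, {"k", 0}, {"g", 0}, {"ng", 0},
    {"f", 1}, {"v", 1},
    {"l", 2},
    {"p", 3}, {"b", 3}, {"m", 3},
    {"r", 4},
    {"s", 5}, {"z", 5}, {"d", 5}, {"t", 5}, {"n", 5},
    {"sh", 6}, {"ch", 6},
    {"th", 7},
    {"w", 8},
};

/* Returns the group id for a consonant spelling, or -1 if it is not listed. */
int viseme_group(const char *phoneme)
{
    size_t i;
    for (i = 0; i < sizeof TABLE / sizeof TABLE[0]; i++) {
        if (strcmp(TABLE[i].phoneme, phoneme) == 0)
            return TABLE[i].group;
    }
    return -1;
}

/* Returns 1 when both consonants are listed and share a group,
   i.e. a lip reader could not tell them apart; otherwise 0. */
int looks_alike(const char *a, const char *b)
{
    int ga = viseme_group(a);
    return ga >= 0 && ga == viseme_group(b);
}
```

With this table, `looks_alike("p", "m")` and `looks_alike("s", "n")` both come out true, while `looks_alike("p", "f")` does not, which is the overlap problem in a nutshell.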

D: This time, I looked at how common various phonemes
are internationally.
I looked at lists by UPSID studies, Morneau and Sapir.
I looked at age of onset for English-speaking children.
I looked at phoneme frequency in spoken English.
Using a checklist chart, with English frequency and age of
onset for tie-breaking, I now have the following revised list:
P/B and M - K/G and NG (*H), F/V, T/D and N, SH/CH - L, R,
W, TH/TH and a coupla dual vowel sounds ending in Y or W.

OK, why did I list them like that?
Because of my Decimese Chinese-based syllable rules.
If we allow syllables to end in nasal consonants, then
N, NG, and M are distinct.
I included voiced/voiceless pairs since Decimese again
differentiates between them based on syllable and word
position (start, mid, final).

A few observations:
1) TH is rare internationally. Having said that, being a
viseme, it is easily taught.
2) The loss of the S/Z pair is very unfortunate.
3) The F/V pair is somewhat rarer.
4) In the basic version, no Y or H.

D: There is some disagreement about what constitutes a viseme.

For the deaf community, which does not hear phonemes,
spoken language recognition relies entirely on lip reading.
samples base speech recognition on 18 speech postures.
Some of these mouth postures show very subtle differences that a
hearing person may not see.

So, the Disney 12 and the lip reading 18 are a good place to start.

D: so the ceiling for visemes is 18-20.

D: thoughts on the SAPI list of 20 (top).

- the choice between a vowel and W and/or Y is
- Visemese will not contain many vowels.

But notice something?
The phonemes almost perfectly overlap with my selections for
Decimese, based largely on Mandarin (2 nasal consonants) or
Cantonese (3) rules.
If we allow the option to swap out S/Z for Th/TH (voiced,
voiceless), then we are back in business!

Nary an aux-lang designer (caveat: that I am aware of) has considered
the needs of the deaf.

2. If you’re at a restaurant, can you lip read the people at the next
table and understand their conversation?

Probably not. Lipreading from the side (i.e. lipreading half of the face)
is usually only effective in close proximity, and with full knowledge of
the subject matter. In live situations, even a little distance and facial
obscurity, coupled with ignorance of the subject matter, will most likely
render lipreading ineffective.

D: Natural languages are not designed with visemes in mind, though
studies suggest seeing the mouth is part of language comprehension.
The more phonemes a language has, the more sounds will be either
'silent' (invisible?) or identical to another one as a viseme.
A subtly modified Decimese phonology is likely about as diverse a
phonology as lip-reading allows.

A universal aux-lang for lipreading would be the "Esperanto of the
deaf world". Or could be.

Note: I don't mean "deaf" to be pejorative. If you ask a deaf person
how they wish to be addressed, in my experience they wish to be
called deaf. Not hard of hearing, or hearing impaired. I'm sure some
folks with some hearing left would prefer those terms.

I personally have a bit of trouble filtering out background noise, some
low-level auditory processing issue. My hearing is normal, however.

To summarize, we once again see a benefit of the basic Decimese
syllable construction rules:
1) CV(nasal ending).
2) Taking care when trying to increase the phonemes available via
voiced/voiceless pairs.
3) Taking care with consonant clusters, keeping to the minimal
LRWY options.
4) The strength of the M, N, NG final position in a syllable/word.
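
To make rules 1, 3 and 4 concrete, here is a rough C sketch of a syllable checker. It is only my reading of the rules, under loud assumptions: uppercase ASCII spelling, a single-letter onset consonant, the five vowel letters AEIOU, and an optional L/R/W/Y right after the onset as the only cluster. The function name `is_valid_syllable` and its helpers are hypothetical, and onset digraphs such as SH or TH are left out to keep it short.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumption for this sketch: the vowels are just the letters AEIOU. */
static int is_vowel(char c)
{
    return c != '\0' && strchr("AEIOU", c) != NULL;
}

/* The only letters allowed as the second member of an onset cluster. */
static int is_glide(char c)
{
    return c != '\0' && strchr("LRWY", c) != NULL;
}

/* Returns 1 if s is a single syllable shaped C (L|R|W|Y)? V (M|N|NG)?,
   otherwise 0. */
int is_valid_syllable(const char *s)
{
    size_t i = 0;

    /* Onset: exactly one uppercase consonant letter. */
    if (!isupper((unsigned char)s[i]) || is_vowel(s[i]))
        return 0;
    i++;

    /* Optional cluster member, taken only if a vowel follows it. */
    if (is_glide(s[i]) && is_vowel(s[i + 1]))
        i++;

    /* Nucleus: exactly one vowel. */
    if (!is_vowel(s[i]))
        return 0;
    i++;

    /* Coda: nothing, or one of the three nasals. */
    if (s[i] == '\0')
        return 1;
    return strcmp(s + i, "M") == 0
        || strcmp(s + i, "N") == 0
        || strcmp(s + i, "NG") == 0;
}
```

Under these assumptions PA, PAM, KANG and PLA pass, while PAT (non-nasal ending) and STA (a cluster outside LRWY) are rejected.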

D: Let us consider what PABAM would look like to a lip reader.
P, B and M are all identical to a lip reader.
It becomes a generic CVCVC, with the same viseme in all three
consonant slots.
Allowing for time lag between words, it would parse as a single CVCVC word.
As such, it can only be C1 (1st of pair) V C1 (2nd of pair) V (related
C1 nasal consonant).
There would be some chance with rapid speech and no time lag that
the preceding and following word would cause confusion. Still, this is a
pretty robust system!
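
The positional rule above can be sketched in a few lines of C. The idea is that the lip reader only sees which viseme family the consonants belong to, and position fills in the rest. The family table (P/B/M, T/D/N, K/G/NG), the struct, and the function name `decode_cvcvc` are assumptions for illustration, taken from my revised list rather than from any finished Visemese spec.

```c
#include <stddef.h>
#include <stdio.h>

/* A viseme family: two members of a pair plus the related nasal.
   All three look the same on the lips. */
struct viseme_family {
    const char *first;   /* 1st of pair, word-initial slot */
    const char *second;  /* 2nd of pair, word-medial slot  */
    const char *nasal;   /* related nasal, word-final slot */
};

static const struct viseme_family FAMILIES[] = {
    {"P", "B", "M"},
    {"T", "D", "N"},
    {"K", "G", "NG"},
};

/* Writes the only word the positional rule allows for a CVCVC shape whose
   three consonants all belong to the given family. v1 and v2 are the two
   vowels as seen. Returns 0 on success, -1 on a bad family index or a
   buffer that is too small. */
int decode_cvcvc(int family, char v1, char v2, char *out, size_t out_size)
{
    const struct viseme_family *f;
    int n;

    if (family < 0 || family >= (int)(sizeof FAMILIES / sizeof FAMILIES[0]))
        return -1;
    f = &FAMILIES[family];

    n = snprintf(out, out_size, "%s%c%s%c%s",
                 f->first, v1, f->second, v2, f->nasal);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Calling `decode_cvcvc(0, 'A', 'A', buf, sizeof buf)` with a 16-byte `buf` fills it with PABAM: three identical-looking consonants, one possible reading.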

