There is no inherent reason why the vocal output of an English speaker should differ from that of a person speaking Chinese. However, the SII (and the AI) is based on which sounds are linguistically distinctive or important in a given language. The band importance function for Chinese places greater weight on the lower frequencies than the one for English because Chinese relies more on pitch changes carried by the lower frequency vowels. This article explores possible programming considerations related to different languages.

Marshall Chasin, AuD, MSc, Reg. CASLPO, Doctor of Audiology, Musicians’ Clinics of Canada, Toronto.

The intelligibility of any speech sound depends on a highly complex series of processes that includes a potential synthesis of auditory information, visual information, context, room acoustics, and central cortical perception. Understandably, as the listening environment becomes more adverse, there is increased reliance on visual and contextual information.

There have been several ingenious approaches used in the field to quantify the degree of intelligibility, such as measures of the percentage of audible speech cues. These include the Articulation Index (AI)1 and, more recently, the Speech Intelligibility Index (SII). Many implementations of the SII are based on the calculations of ANSI2 and of Studebaker and Sherbecoe,3 where various frequency bands are assigned a speech importance value (Figure 1). The figure shows that the greatest values for intelligibility are found in the mid- and high-frequency ranges. This is well known clinically, and client complaints of reduced speech intelligibility are addressed by increasing the gain in the higher frequency regions. The SII standard includes band importance functions for nonsense syllables; words from the CID W-22, NU-6, and Diagnostic Rhyme Test lists; short passages of easy reading material; and the SPIN monosyllables (personal communication, William A. Cole, 2008).

The Limits of the SII

The SII is widely used as a tool, both in research and in the clinical verification of hearing aids (eg, Audioscan Verifit). Simply stated, the SII can be thought of as the percentage of speech cues that are audible: the greater the audibility, the greater the chance of understanding the speech signal. The constraints of a damaged cochlea, of course, will limit the amount of amplification that can be used, such that 100% audibility can rarely be achieved.
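For readers who like to see the arithmetic, the following minimal sketch (in Python) shows the core of the calculation: the SII is essentially a sum of band audibilities weighted by the band importance values. Only the 1000 Hz and 2000 Hz importance values are taken from Figure 1; the remaining weights and all of the audibility values are illustrative placeholders, not values from the standard.

    # Minimal sketch of the SII idea: audibility in each band, weighted by
    # that band's importance. Only the 1000 Hz and 2000 Hz weights come from
    # Figure 1; the other weights are placeholders chosen so the set sums to 1.
    band_importance = {
        250: 0.0618, 500: 0.1700, 1000: 0.2135,
        2000: 0.2827, 4000: 0.2100, 8000: 0.0620,
    }

    # Hypothetical proportion of speech cues audible in each octave band for a
    # particular listener and fitting (0 = inaudible, 1 = fully audible).
    audibility = {250: 1.0, 500: 0.9, 1000: 0.7, 2000: 0.4, 4000: 0.2, 8000: 0.1}

    sii = sum(band_importance[f] * audibility[f] for f in band_importance)
    print(f"SII = {sii:.2f}")  # higher values mean more speech cues are audible

In this hypothetical sloping loss, the high-importance 2000 Hz band is only partly audible, which is exactly the situation that is addressed clinically by adding high frequency gain.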

The SII tells much of the story, but it is far from the entire picture. In spoken sentences, which have a contextual basis (unlike individual syllables or words), more intelligibility importance is found in the lower frequency vowels than in the higher frequency consonants.4 The SII as implemented in some clinical real-ear measurement devices is typically based on English rather than other languages. Finally, the SII provides no direct information on important syntactic items in a sentence that may have inherently low speaking intensity. That is, whereas the SII can provide information on the various frequency importance bands (and thereby on which phonemes or speech sounds are important), it does not provide information on word-level and sentence-level cues that may be just as important.

Word and Sentence Level Differences in Various Languages

Beyond the phoneme (sound) level differences that can be manifested as different SIIs for different languages of the world, there are both word-level and sentence-level differences that may affect the specification of gain and output in a hearing aid. An example of a word-level issue comes from Japanese, where a typical word may have a consonant-vowel-consonant (CVC) structure. In order for the quieter consonant following the more intense vowel to be audible, the compression release time should be sufficiently rapid; a clinical suggestion is therefore to implement a quicker release time for a speaker of Japanese than for someone with a similar audiometric loss who speaks English. If a person is bilingual, for example, one program can be set for English and another for Japanese (with a shorter release time on the compression system).
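The sketch below (in Python, with hypothetical dB levels, durations, and time constants) illustrates why the release time matters: a compressor's level detector with a shorter release recovers sooner after an intense vowel, so the gain is restored in time to amplify the quiet final consonant.

    import math

    def smoothed_level(levels_db, attack_ms, release_ms, frame_ms=1.0):
        """One-pole level detector: it rises quickly (attack) and falls
        slowly (release). The WDRC gain is driven by this estimate, so a
        shorter release lets the gain recover sooner after an intense vowel."""
        a_att = math.exp(-frame_ms / attack_ms)
        a_rel = math.exp(-frame_ms / release_ms)
        out, est = [], levels_db[0]
        for x in levels_db:
            coeff = a_att if x > est else a_rel
            est = coeff * est + (1.0 - coeff) * x
            out.append(est)
        return out

    # Hypothetical word: a 200 ms vowel at 75 dB SPL followed by a 100 ms
    # word-final consonant at 55 dB SPL, one level value per millisecond.
    word = [75.0] * 200 + [55.0] * 100

    slow = smoothed_level(word, attack_ms=5.0, release_ms=150.0)
    fast = smoothed_level(word, attack_ms=5.0, release_ms=30.0)

    # 50 ms into the consonant, the fast-release detector has dropped most of
    # the way back to 55 dB (so the gain has largely recovered), while the
    # slow-release detector still sits closer to the vowel level.
    print(f"slow release estimate at 250 ms: {slow[250]:.1f} dB")
    print(f"fast release estimate at 250 ms: {fast[250]:.1f} dB")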

FIGURE 1. The band importance function of the SII, based on nonsense syllables, words, and short passages. Note that about 50% of the speech cues derive from the 1000 Hz and 2000 Hz bands (0.2135 + 0.2827). Adapted from Table B.3 (Octave band importance functions for various speech tests) of ANSI S3.5 (1997).2

We can return to Japanese for an example of a sentence-level difference that would not be apparent from the SII. Japanese, like most languages with a subject-object-verb (SOV) word order, has postpositions rather than the prepositions found in SVO languages such as English. Postpositions (corresponding to English prepositions such as in, on, under, and behind) may be sentence final in Japanese and, as such, would be of low intensity.

There is nothing specific to Japanese about the last word in a sentence being of low intensity; this is a normal characteristic of all speech, since we simply run out of air at the end of a sentence and the last word or two is therefore of lower intensity. Linguistically, however, postpositions can be very important, and if they are less audible because of their position in the sentence, this could have ramifications for intelligibility as well. A clinical solution to this potential problem would be to set the wide dynamic range compression (WDRC) circuitry to generate more gain for quieter sounds than in an “English program.”
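The sketch below (in Python, with hypothetical gain values and input levels) shows the intended direction of that change on a static WDRC input/output curve: the gain for soft inputs is raised while the gain for loud inputs is left alone.

    def wdrc_output(input_db, gain_at_50_db, gain_at_80_db):
        """Static WDRC input/output curve defined by the gain applied to a
        soft (50 dB SPL) and a loud (80 dB SPL) input, interpolated linearly
        in dB between those two points."""
        slope = (gain_at_80_db - gain_at_50_db) / 30.0  # change in gain per input dB
        return input_db + gain_at_50_db + slope * (input_db - 50.0)

    # Hypothetical comparison: an "English program" versus a program with
    # 5 dB more soft-level gain for an SOV language with sentence-final
    # postpositions. Loud inputs are unaffected; only soft inputs get more gain.
    for level, label in [(50, "quiet postposition"), (80, "intense vowel")]:
        english  = wdrc_output(level, gain_at_50_db=25.0, gain_at_80_db=10.0)
        adjusted = wdrc_output(level, gain_at_50_db=30.0, gain_at_80_db=10.0)
        print(f"{label:18s}: {english:.0f} dB SPL vs {adjusted:.0f} dB SPL")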

Byrne and his colleagues5 examined the speech spectra of 12 different languages for both men and women and found that a universal long-term speech spectrum “is suggested as being applicable, across languages, for many purposes including use in hearing aid prescription procedures and in the Articulation Index” (page 2108). This is an understandable result, since speakers of any language are human, all with a 17-18 cm long vocal tract, lips, teeth, nose, and tongue.

There is no inherent reason why the vocal output of an English speaker should differ from that of a person speaking Chinese. However, the SII (and the AI) is based on which sounds are linguistically distinctive or important to the speaker of a given language. The band importance function for Chinese would place greater weight on the lower frequencies than the one for English because Chinese is a tonal language, a language in which linguistically meaningful differences are heard as pitch changes on the lower frequency vowels.

This, in fact, is the case, as shown by Wong et al for Cantonese.6 It is important to note that the long-term speech spectrum of Chinese is the same as that of English; it is the band importance function (SII) of Chinese that suggests more hearing aid amplification would be required in the lower frequencies than for a speaker of English.

Phonemic Differences in Languages

“Phonemic differences” refer to what is linguistically distinctive or important from one language to another. In the example above, because Chinese is a tonal language, it is more important for a listener of Chinese than for a listener of English to be able to hear the tonal differences on the lower frequency vowels. A change in tone can result in a different meaning in Chinese but not in English; that is, tone is linguistically distinctive in Chinese but not in English.

TABLE 1. A summary of linguistically important features and how each may motivate a change from an “English program.”

In general, if nasals are linguistically more important (distinctive) in a language (eg, Portuguese), then more gain should be specified in the 125-2000 Hz region, where nasals have their greatest energy spectrographically. The same frequency region is important for tonal and mora-timed languages, since the tone or the timing (mora lengthening) is carried by the lower frequency vowels and nasals. In languages where palatalization is important (eg, Russian), the important frequency region is 3000-3500 Hz, and in languages where retroflexion is important (eg, Mandarin Chinese), the important frequency region is 2700-3000 Hz.

In SOV languages (eg, Japanese and Hindi), more WDRC gain at low input levels should be specified to ensure the audibility of sentence-final postpositions. In Arabic and other Semitic languages, there are many high-frequency consonants (velars, uvulars, and pharyngeal fricatives) that indicate a need for more high-frequency gain than would be specified for an “English program.” Finally, for CVC-restricted languages (eg, Japanese and Vietnamese), a more rapid release time should be specified for the compression system than for English. These suggested deviations from an “English program” for non-English listeners are summarized in Table 1.
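For readers who think of these adjustments as fitting-software data, the sketch below (in Python) encodes them as a simple lookup table in the spirit of Table 1. The field names and the data structure are hypothetical; only the frequency regions and the direction of each change come from the discussion above.

    # Hypothetical encoding of the suggested deviations from an "English
    # program"; only the frequency regions and the direction of each change
    # come from the text, everything else is an illustrative placeholder.
    LANGUAGE_FEATURE_ADJUSTMENTS = {
        "nasals_distinctive":  {"region_hz": (125, 2000),  "change": "more gain"},
        "tonal_or_mora_timed": {"region_hz": (125, 2000),  "change": "more gain"},
        "palatalization":      {"region_hz": (3000, 3500), "change": "more gain"},
        "retroflexion":        {"region_hz": (2700, 3000), "change": "more gain"},
        "sov_postpositions":   {"change": "more WDRC gain for soft inputs"},
        "semitic_consonants":  {"change": "more high-frequency gain"},
        "cvc_restricted":      {"change": "shorter compression release time"},
    }

    def suggested_changes(features):
        """Return the suggested deviations from an 'English program' for a
        listener whose language has the given distinctive features."""
        return {f: LANGUAGE_FEATURE_ADJUSTMENTS[f] for f in features}

    # Example: a speaker of a tonal language with retroflex consonants
    # (eg, Mandarin, per the discussion above).
    print(suggested_changes(["tonal_or_mora_timed", "retroflexion"]))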

A grammar is a statement of the rules of a language that includes information about the word order and inflections (syntax), the structure of words (morphology), the sound patterns (phonology), the actual listing of the sounds of that language (phonetics), and a dictionary (lexicon). The grammars and salient linguistically distinctive features of 10 languages are examined and shown in Table 2.

TABLE 2. “X” indicates a linguistically important feature for that language, and changes can be implemented as deviations from an “English program.” PREP indicates that, while Somali is an SOV language, it has prepositions rather than postpositions. SYLL indicates that Spanish is a syllable-timed language and may benefit from shorter release times than an “English program.” French has no “X” marks, implying that it is not significantly different from English for the purpose of setting hearing aids.

The suggested changes to a hearing aid fitting for non-English languages address only the “direction” of the change. The magnitude of the spectral changes will probably be on the order of 5-8 dB, but until more information is available, this should be regarded only as a first approximation. The statements made in this article are empirical in nature and may change as more information becomes available. The suggestions are based on linguistic arguments only, and this entire area of language differences may undergo significant “fine-tuning” as more empirical and linguistic research is performed.

The SII has great significance for the hearing aid field and many of the linguistic elements mentioned would be manifested as differences in the SII of other languages. However, not all of the linguistically important properties of a language would be observed in a language-specific SII.

Some Unanswered Questions

There are a number of unanswered questions in this relatively new area of study. Many of these questions are empirical, and some can be answered with careful calculation. Clearly, more research needs to be done. Here are several questions that are central to this issue:

  • Just because a language has more linguistically distinctive cues in a certain frequency region than English, does it follow that listeners of that language will benefit from more gain in this frequency region?
  • What are the calculated SIIs for different languages?
  • What is the magnitude of any change from an “equivalent” English program? For example, if the language is tonal in nature, does this imply a 5 dB low frequency increase or a different amount?
  • Would an across-the-board increase in the 2700-3500 Hz frequency region be beneficial for all languages? This would optimize the audibility of sounds that are palatalized and retroflexed, but would this be detrimental for English?
  • How much quicker should the release time be for WDRC for those languages that have a rigid CVC structure as compared with English?
  • Even though the important spectral region for nasals, mora-timed languages, and tonal languages is the same (125-2000 Hz), do all three of these types of languages require the same amount of additional gain in this region?
  • What is the role of nonauditory cues (eg, visual, contextual) in adverse listening situations?
  • How might an SII be altered for a language, given other information, such as context and visual information?

Acknowledgement

Parts of this paper were presented at the Bernafon-Canada annual meeting in Italy.

References

  1. French N, Steinberg J. Factors governing the intelligibility of speech sounds. J Acoust Soc Am. 1947;19:90-119.
  2. American National Standards Institute (ANSI). American National Standard Methods for Calculation of the Speech Intelligibility Index. ANSI S3.5-1997. New York: ANSI; 1997.
  3. Studebaker GA, Sherbecoe RL. Frequency-importance and transfer functions for recorded CID W-22 word lists. J Speech Hear Res. 1991;34:427-438.
  4. Kewley-Port D, Burkle TZ, Lee JH. Contribution of consonant versus vowel information to sentence intelligibility for young normal-hearing and elderly hearing-impaired listeners. J Acoust Soc Am. 2007;122(4):2365-2375.
  5. Byrne D, Dillon H, Tran K, et al. An international comparison of long-term average speech spectra. J Acoust Soc Am. 1994;96(4):2108-2120.
  6. Wong LA, Ho A, Chua E, Soli SD. Development of the Cantonese speech intelligibility index. J Acoust Soc Am. 2007;121(4):2350-2361.

Correspondence can be addressed to Marshall Chasin.

Citation for this article:

Chasin M. How hearing aids may be set for different languages. Hearing Review. 2008;15(11):16-20.