Combining probability models and web mining models: A framework for proper name transliteration

Yilu Zhou, Feng Huang, Hsinchun Chen

Research output: Contribution to journal (Article, peer-reviewed)

8 Scopus citations


The rapid growth of the Internet has created a tremendous number of multilingual resources. However, language boundaries prevent information sharing and discovery across countries. Proper names play an important role in search queries and knowledge discovery. When foreign names are involved, they are often translated phonetically, a process referred to as transliteration. In this research we propose a generic transliteration framework that incorporates an enhanced Hidden Markov Model (HMM) and a Web mining model. We improved traditional statistics-based transliteration in three areas: (1) we incorporated a simple phonetic transliteration knowledge base; (2) we incorporated bigram and trigram HMMs; and (3) we incorporated a Web mining model that uses word-frequency information from the Web. We evaluated the framework on an English-Arabic back-transliteration task. Experiments showed that, when the HMM was used alone, combining the bigram and trigram HMMs performed best for English-Arabic transliteration; the bigram model alone achieved fairly good performance, but the trigram model alone did not. Adding the Web mining model improved performance by 79.05%. Overall, the framework achieved a precision of 0.72 when the eight best transliterations were considered. These results show promise for using transliteration techniques to improve multilingual Web retrieval.
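To make the three components above concrete, here is a minimal toy sketch in Python. It is not the authors' model: the knowledge base `KB`, the Latin placeholder symbols standing in for Arabic letters (for example `H` for a pharyngeal h), the interpolation weight `lam`, the Web-score weight `beta`, the hypothetical hit counts in `WEB`, and the training names are all assumptions for illustration. Each English letter emits one target symbol or nothing (a simplified alignment). Candidates are scored by emission probability plus an interpolated bigram/trigram transition model, then optionally reranked by adding the log of a Web frequency count. Candidate generation is exhaustive, which is only feasible for short toy inputs.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical phonetic knowledge base: English letter -> {target symbol: probability}.
# Latin placeholders stand in for Arabic letters; "" means the letter emits nothing.
KB = {
    "m": {"m": 1.0},
    "o": {"w": 0.6, "": 0.4},
    "h": {"h": 0.5, "H": 0.5},
    "a": {"a": 0.7, "": 0.3},
    "d": {"d": 1.0},
}

class NgramModel:
    """Add-one-smoothed n-gram model over target symbols ('^' pads the start, '$' marks the end)."""
    def __init__(self, names, n):
        self.n = n
        self.grams, self.contexts, self.vocab = Counter(), Counter(), {"$"}
        for name in names:
            self.vocab.update(name)
            padded = "^" * (n - 1) + name + "$"
            for i in range(n - 1, len(padded)):
                self.grams[padded[i - n + 1:i + 1]] += 1
                self.contexts[padded[i - n + 1:i]] += 1

    def prob(self, history, symbol):
        ctx = history[-(self.n - 1):]  # last n-1 symbols of the padded history
        return (self.grams[ctx + symbol] + 1) / (self.contexts[ctx] + len(self.vocab))

def transition_logp(target, bigram, trigram, lam):
    """Log probability of a target string under a linear mix of bigram and trigram models."""
    history, logp = "^^", 0.0
    for sym in list(target) + ["$"]:
        p = lam * bigram.prob(history, sym) + (1 - lam) * trigram.prob(history, sym)
        logp += math.log(p)
        history += sym
    return logp

def candidates(source, kb):
    """Yield (target string, log emission probability) for every choice of symbols."""
    for combo in product(*(kb[ch].items() for ch in source)):
        yield "".join(s for s, _ in combo), sum(math.log(p) for _, p in combo)

def transliterate(source, kb, bigram, trigram, web_counts=None, lam=0.5, beta=1.0, k=8):
    """Return the k best target strings; web_counts (if given) adds beta * log(1 + hits)."""
    best = {}
    for target, emit_lp in candidates(source, kb):
        score = emit_lp + transition_logp(target, bigram, trigram, lam)
        if web_counts is not None:
            score += beta * math.log1p(web_counts.get(target, 0))
        best[target] = max(score, best.get(target, -math.inf))
    return sorted(best, key=best.get, reverse=True)[:k]

def precision_at_k(results, gold):
    """Assumed reading of the metric: share of names whose reference is in the returned list."""
    return sum(gold[name] in ranked for name, ranked in results.items()) / len(results)

# Toy usage with hypothetical training names and Web hit counts.
TRAIN = ["mHmd", "aHmd", "mHmwd", "Hamd"]
bigram, trigram = NgramModel(TRAIN, 2), NgramModel(TRAIN, 3)
WEB = {"mHmd": 5000, "mhmd": 40}
top = transliterate("mohamad", KB, bigram, trigram, WEB)
print(top)
print(precision_at_k({"mohamad": top}, {"mohamad": "mHmd"}))
```

Adding the Web term as a weighted log count is one simple way to let frequent real-world spellings outrank candidates that the character models alone prefer; the actual combination rule used in the paper may differ.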

Original language: English (US)
Pages (from-to): 91-103
Number of pages: 13
Journal: Information Technology and Management
Issue number: 2
State: Published, Jun 2008


Keywords

  • Hidden Markov Model
  • Name transliteration
  • Web mining

ASJC Scopus subject areas

  • Information Systems
  • Communication
  • Business, Management and Accounting (miscellaneous)


