Much electronic text in the languages of South Asia has been published on the Internet. However, while Unicode has emerged as the favoured encoding system of corpus and computational linguists, most South Asian language data on the web uses one of a wide range of non-standard legacy encodings. This paper describes the difficulties inherent in converting text in these encodings to Unicode. Among the various legacy encodings for South Asian scripts, the most problematic are 8-bit fonts based on graphical principles (as opposed to the logical principles of Unicode). Graphical fonts typically encode several features in ways highly incompatible with Unicode. For instance, half-form glyphs used to construct conjunct consonants are typically separate code points in 8-bit fonts; in Unicode they are represented by the full consonant followed by virama.
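
The half-form case can be illustrated with a minimal sketch. It assumes a made-up 8-bit Devanagari font: the byte values in `LEGACY_TO_UNICODE` and the function name `legacy_to_unicode` are hypothetical and do not describe any real legacy font or the paper's own method. Only the Unicode code points (KA U+0915, SSA U+0937, virama U+094D) are real.

```python
# Hypothetical 8-bit graphical font: these byte assignments are invented
# for illustration and match no real legacy encoding.
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA

LEGACY_TO_UNICODE = {
    0xA1: "\u0915",           # full-form KA glyph
    0xA2: "\u0915" + VIRAMA,  # half-form KA glyph -> KA followed by virama
    0xA3: "\u0937",           # full-form SSA glyph
}


def legacy_to_unicode(data: bytes) -> str:
    """Map each legacy byte to its Unicode sequence (half-forms only)."""
    out = []
    for byte in data:
        if byte in LEGACY_TO_UNICODE:
            out.append(LEGACY_TO_UNICODE[byte])
        elif byte < 0x80:
            # Assume the lower half of the font is plain ASCII.
            out.append(chr(byte))
        else:
            raise ValueError(f"unmapped legacy byte: {byte:#04x}")
    return "".join(out)


# Half-form KA followed by full SSA becomes the logical sequence
# KA + virama + SSA, which a Unicode renderer shapes as a conjunct.
converted = legacy_to_unicode(bytes([0xA2, 0xA3]))
```

The sketch covers only this one-to-many mapping; it does not attempt the other incompatibilities between graphical fonts and Unicode that the abstract alludes to.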