Communication and Information Activities

Multilingualism in Cyberspace

Language constitutes the foundation of communication and is fundamental to cultural and historical heritage.

Projects

Report on Multilingualism on the Internet

As one of the inputs to the World Summit on the Information Society (Geneva, 2003), the UNESCO Institute for Statistics is preparing a report regarding the status of multilingualism on the Internet, under Initiative B@bel.
In this report, special attention will be devoted to:
  • the change in the balance of languages on the internet over time,

  • the potential dominance or regression of English on the internet,

  • the exploration of some methodological work for assessing linguistic diversity on-line.
While a number of sources contain information bearing on this issue, no organization, business or research effort currently underway appears to be addressing these questions systematically. The organizations that have generated the most data about internet multilingualism are marketing firms and translation service providers, which have primarily produced summaries of user demographics and estimates of the number of web pages in different languages. Such statistics are only a partial indicator of the patterns of internet multilingualism, and they do not directly address the questions of dominance that are at the heart of the inquiry.

As a communications medium, the internet is complex: it is large, decentralized in structure, offers varied communications modes, and is rapidly changing in all its dimensions. The size and decentralized structure of the internet complicate the sampling procedures needed to survey language use comprehensively. In addition, the structure of linkage among users, sites, countries, etc. becomes a central issue, since these linkages determine what an individual can and will access. Technical differences among communications modes require a different survey method for each mode investigated. While comprehensive data archiving efforts are underway for the World Wide Web, no such efforts appear to have been undertaken for interactive chat modes of communication, where multilingualism has a markedly different character.

Thus the analytical report will adopt a two-pronged approach:
  • First, it is necessary to survey what is known about the state of the world’s languages on the internet through existing sources, especially academic literature, marketing reports, news releases, and technical reports on the internet’s structure, organization and function. Since many of these sources do not directly address questions of multilingualism and linguistic dominance, a critical review must be conducted to identify the best possible current understanding of the distribution of the world’s languages on the internet. Particular attention needs to be paid to the inter-connections of sites and countries, and to their potential effects on users’ experience of and exposure to different languages.

  • Second, it is necessary to examine the technical challenges facing a truly comprehensive survey of multilingualism on the internet by automatic means. Technologies are available for automatic language identification, but these have never been used on the scale that a comprehensive survey of languages on the internet would require. In addition, since the languages of the world number several thousand, with hundreds of them in written form, we cannot know in advance of the survey just what languages we might find. This diversity also poses technical problems for automatic identification, as the degree of linguistic difference between any two languages varies, and any one language might be regularly used in several different electronic forms. Hence, without further study, it cannot be predicted how well the existing language identification technologies will perform, or in what direction they would need to be developed for this purpose. In addition, the sources of internet communication data for this analysis need to be located or developed.
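To make the idea of automatic language identification concrete, the following is a minimal sketch of one common approach, comparing character trigram frequencies against reference profiles. It is an illustration only and not the method used for the report: the function names, the three sample sentences, and the choice of cosine similarity are all assumptions introduced here, and a realistic system would need far larger reference texts and many more languages.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Return the relative frequency of each character trigram in text."""
    padded = f"  {text.lower()}  "  # pad so word-edge trigrams are counted
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(weight * q.get(gram, 0.0) for gram, weight in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


# Hypothetical reference texts, far too small for real use (assumption).
SAMPLES = {
    "en": "the internet is a large and decentralized medium of communication",
    "fr": "l'internet est un moyen de communication vaste et décentralisé",
    "es": "internet es un medio de comunicación grande y descentralizado",
}
PROFILES = {lang: trigram_profile(text) for lang, text in SAMPLES.items()}


def identify_language(text):
    """Return the code of the reference profile most similar to text."""
    profile = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine_similarity(profile, PROFILES[lang]))
```

A sketch like this always assigns a text to the nearest known profile, so a language that has no reference profile is silently misclassified. This is one face of the problem noted above: the languages present cannot be known in advance of the survey.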
In summary, this project will investigate these two avenues, and the resulting report will serve as an input both for the General Conference (October 2003) and for the World Summit on the Information Society (December 2003).

Contact: Diane Stukel, d.stukel@unesco.org