Released under CC-BY-NC-SA

HULLS 2011

Linguistics and the Open Web

Gordon P. Hemsley
Queens College

What is the Open Web?


The Open Web is not so much a concrete entity as it is a set of philosophies. As the name implies, the Open Web serves to promote openness and freedom on an international—and supranational—scale.

The Open Web is based on decentralization, transparency, and two-way communication. Every citizen of the Open Web has an equal right to participate in the development of both the content of the Open Web and the standards and protocols that power it.

What is the Open Web?


The Open Web comes down to three fundamental concepts:

  • the ability to publish content on the Web using open standards
  • the ability to code and implement the open standards that such content depends on
  • the ability to access and use content developed using those open standards

What is the Open Web?

Mozilla

  • The Mozilla project began in 1998, when Netscape released the source code to its web browser.
  • The non-profit Mozilla Foundation was founded in 2003 to promote openness, innovation, and opportunity on the Web.
  • The release of Firefox 1.0 in 2004 reinvigorated the browser market and gave users an alternative to the stagnant yet mammoth Internet Explorer.
  • Nowadays, Firefox claims about a quarter of the worldwide browser market share and competes with Chrome, Safari, Opera, and even a rejuvenated Internet Explorer to make the Web a better place.
  • Firefox 4 has been localized into over 80 languages and dialects by volunteers from around the world, covering over 95% of the world’s Internet-using population!

What is the Open Web?

Wikipedia

  • Wikipedia has its roots in Nupedia, a now-defunct online encyclopedia that was curated by expert volunteers using an extensive peer-review process.
  • Wikipedia was created in 2001 when the writing of articles for Nupedia proved too slow. An online encyclopedia that anyone could edit was expected to speed up the process of writing articles.
  • Originally created only in English, Wikipedia was quickly expanded to include articles written in other languages—by the end of 2001, Wikipedia was available in at least 18 languages.
  • Over the past 10 years, Wikipedia has collected over 18 million articles, written by over 1.3 million people in 268 languages (some of which are endangered, extinct, or constructed)!

What is the Open Web?

Open source movement

  • The open source movement, an integral part of the Open Web, advocates providing free access to the source materials used to create a product or service.
  • Open source software is an important part of the open source movement, as it allows individuals to participate in creating and improving the products they use, in an open and transparent fashion.
  • Much of the software you use every day is based on open source software.

How does the Open Web relate to linguistics?


  • LaTeX is a high-quality typesetting system designed for creating technical and scientific documents. Many packages are available for typesetting linguistics-related material, including IPA transcriptions, tree diagrams, and glosses.
  • R is a powerful statistical programming environment that is often used in linguistics to conduct statistical analyses and generate graphs.
  • phpSyntaxTree is a useful tool to create standalone tree diagrams on the fly. (Active development seems to have stopped, so I recently forked the code in the hopes of improving it.)
  • Ubiquity was a prototype add-on from Mozilla Labs that used natural language commands to instruct Firefox to perform various tasks, often interfacing with external services to do so.

How does the Open Web relate to linguistics?

Online data sources

  • The World Atlas of Language Structures (WALS) documents various grammatical features in languages used around the world. WALS currently has data about hundreds of features from about a third of the world’s languages—enough for a good number of studies, like the word order study that recently generated a lot of press.
  • Twitter is often used as a quick and easy corpus of modern language usage. While Twitter search saves only about a week’s worth of tweets, Twitter has allowed the Library of Congress to archive the entire database, dating back to 2006.
  • There are hundreds of open access journals in a variety of languages that provide the full text of quality peer-reviewed research articles.

How does the Open Web relate to linguistics?


  • Language Log is the foremost blog for linguists, discussing original research (often conducted as a Breakfast Experiment™) and offering commentary on topics ranging from published linguistic research to the media’s perception of language.
  • Separated by a Common Language is a blog, written by an American linguist living and teaching in the UK, that documents the differences between the Englishes spoken in each country.
  • Literal-Minded is a blog written by a linguist who thinks he takes things too literally. He often offers analyses of the language usage he hears in everyday life.
  • John Wells’s Phonetic Blog is a blog written by a phonetician about “everything to do with phonetics”.
  • Check out the blogrolls on any of these blogs (especially Language Log’s) to find an endless supply of interesting posts to read.

What can I do to participate in the Open Web?

Join the discussion

  • Join Twitter and start communicating with your fellow linguists.
  • Create a blog and start writing about your observations and analyses of language.
  • Create a website or join to host your articles and essays and track what other people are writing.
  • Comment on other people’s blogs to help further their thinking.

What can I do to participate in the Open Web?


  • Join an open source project like Mozilla and help localize it into your language.
  • Write new articles—or edit existing ones—on the Wikipedia of your choice.

What can I do to participate in the Open Web?


  • I met most of the linguists I know or associate with online, usually via Twitter or their blogs.
  • Many of them I have never met in real life (though some are here today).

Thank You

Any questions?


Slideshow code written by Google for HTML5 Rocks and released under the Apache License (2.0).

Mozilla logo used in accordance with Mozilla Media Library guidelines.

Wikipedia logo used in accordance with Wikimedia Foundation Trademark Policy.

All other content released by Gordon P. Hemsley under the Creative Commons BY-NC-SA License (3.0).