Resources on the web

There are many interesting pages for corpus linguists on the Internet. The following selection of links includes websites with information on corpora and corpus analysis tools, as well as some free, web-accessible corpora.

Helpful websites

Please note: You will not be able to access all corpora linked to in these databases. The extent to which corpus data can be accessed varies greatly: while you may be allowed to download the corpus in its entirety in some cases, other websites will offer access to the data via a web interface only (you also may need to register as a user). Many corpora are not available unless a license fee is paid (in such cases, please check whether the Linguistics section owns a license).

  • information on corpus analysis software
    • Laurence Anthony's AntConc: AntConc for download, video tutorials that teach you how to use the software, links to online help (including a discussion forum for questions), documentation/manual, books/papers related to AntConc
    • WordSmith Tools (Oxford University Press & Lexical Analysis Software; Mike Scott): the Support section includes 'get-started guides' in a number of languages, answers to FAQs, online help and a link to the online WordSmith discussion group
  • help with quantitative data/statistics
    • Log-likelihood calculator (Lancaster University)
    • Information on using statistics in "Einführung in die Korpuslinguistik: Praktische Grundlagen und Werkzeuge" [German]
    • Sample Size Calculator - Helps you determine how big your sample needs to be to represent the corresponding population with a given precision.
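
For readers who want to see what such calculators compute, here is a minimal Python sketch. It assumes the two-corpus log-likelihood (G2) statistic commonly used for keyword comparison and the standard sample-size formula with finite population correction; the linked tools do not state their exact formulas here, so they may differ in detail. The function names `log_likelihood` and `sample_size` are illustrative and do not belong to any of the tools listed above.

```python
import math


def log_likelihood(freq1, size1, freq2, size2):
    """Log-likelihood (G2) for one word's frequency in two corpora.

    freq1, freq2: observed frequencies of the word in corpus 1 and corpus 2.
    size1, size2: total number of words in each corpus.
    Assumption: two-cell form of the statistic; zero frequencies contribute 0.
    """
    total = freq1 + freq2
    expected1 = size1 * total / (size1 + size2)
    expected2 = size2 * total / (size1 + size2)
    ll = 0.0
    for observed, expected in ((freq1, expected1), (freq2, expected2)):
        if observed > 0:
            ll += observed * math.log(observed / expected)
    return 2 * ll


def sample_size(population, margin=0.05, z=1.96, proportion=0.5):
    """Sample size for estimating a proportion in a finite population.

    Assumptions: z=1.96 corresponds to a 95% confidence level, and
    proportion=0.5 is the most conservative choice.
    """
    n0 = z ** 2 * proportion * (1 - proportion) / margin ** 2
    # Finite population correction, rounded up to a whole item.
    return math.ceil(n0 / (1 + (n0 - 1) / population))


if __name__ == "__main__":
    # A word occurring 20 times in one 1,000-word corpus and 10 times in another.
    print(round(log_likelihood(20, 1000, 10, 1000), 2))
    # Items to sample from a population of 10,000 concordance lines.
    print(sample_size(10000))
```

A log-likelihood value above 3.84 corresponds to p < 0.05 at one degree of freedom, so the larger the value, the less likely the frequency difference is due to chance.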

Some free, web-accessible corpora

  • AusNC - The Australian National Corpus (also includes ICE-Australia). Requires an account. Access to some of the corpora (e.g. Monash, ICE-AUS) is limited.
  • BNC - The British National Corpus. Requires an account.
  • CMSW - Corpus of Modern Scottish Writing, and SCOTS - Scottish Corpus of Texts and Speech. Free download and full access. No registration required.
  • EEBO - Early English Books Online (British and American books published between 1475 and 1700). Full access granted from the JLU intranet.
  • FALKO - Fehlerannotiertes Lernerkorpus (can be searched using the ANNiS³ web interface). Free full access without registration.
  • OBC - The Old Bailey Corpus (spoken English in the 18th and 19th centuries). Full access with JLU login.
  • SBC - The Santa Barbara Corpus of Spoken American English. Free download, full access, no registration.
  • VOICE - The Vienna-Oxford International Corpus of English (1 million words of spoken English used as a lingua franca). Free download. The web interface requires registration.
  • Various corpora at corpus.byu.edu (site maintained by Mark Davies), e.g. the Corpus of Contemporary American English (COCA), the Corpus of Historical American English (COHA), the Corpus of Global Web-Based English (GloWbE) and the Corpus of American Soap Operas. Free users are limited to a certain number of queries per day and face other restrictions. JLU does not have a commercial licence for the BYU web interfaces.