The largest of its kind in the world, the Cambridge English Corpus (CEC) is a multi-billion-word collection of texts taken from a huge range of sources. It comprises several smaller corpora, each of which lets us examine a particular area of language research in more detail, so that we can gather the most useful and valuable insights.
Our language research is used to inform our course materials. In particular, we use it to:

  • ensure that the language taught in our courses is natural, authentic and up-to-date;
  • select the most useful, common words and phrases for a topic or level;
  • focus on certain groups of learners and find out what they find easy and what they find more difficult;
  • analyse spoken language so that we can teach effective speaking and listening strategies.

Cambridge Reference Corpus

The Cambridge Reference Corpus is made up of samples of language taken from expert or proficient speakers of different varieties of English, such as American English and British English, totalling over 1.6 billion words. These samples have been taken from a wide range of sources, including newspapers, online language, books, magazines, radio, business meetings, everyday conversations and more.
This corpus allows us to find out what English is really like and how it’s used in the real world, so that we can provide the most natural, authentic and up-to-date language in our course materials. This means that students are exposed to the language they will encounter when they use English to communicate, whatever the situation.

Cambridge Learner Corpus

The Cambridge Learner Corpus (CLC) currently contains more than 50 million words taken from Cambridge English exam scripts submitted by over 220,000 students from 173 countries – and these numbers keep growing each year. The corpus has been built jointly by Cambridge University Press and Cambridge Assessment English.
This corpus allows us to conduct internationally relevant and country-specific research into how learners use English differently from expert speakers, and to analyse both the types of mistakes that learners make and what they get right. Using this information, we can focus more on the areas of language that learners tend to find difficult, so that they can avoid making the same mistakes.

Cambridge Corpus of Academic English

The Cambridge Corpus of Academic English (CamCAE) has been built jointly by Cambridge University Press and Cambridge Assessment English. It currently contains around 400 million words, allowing us to research the language features that make up Academic English and how they differ from everyday English. These insights are used to inform our Academic English course materials so that students are better prepared for writing and speaking in academic contexts.

Cambridge University Press