Types of corpora

General vs. specialized corpora
Written vs. spoken corpora
Synchronic vs. diachronic corpora
Monolingual vs. multilingual corpora
Comparable vs. parallel corpora
Native vs. learner corpora
Sample vs. monitor corpora
Raw vs. annotated corpora

General vs. specialized corpora
General corpora (or reference corpora): offer wide coverage of different text categories or registers and represent language for general purposes. They are usually very large, containing millions of words. E.g. British National Corpus (BNC), Bank of English (BOE).
Specialized corpora: contain texts from a particular variety of a language, e.g. from a particular dialect or from a particular subject area.

Written vs. spoken corpora
Written corpora: contain only written materials. (more common)
Spoken corpora: contain transcribed texts of spoken language. (less common)

Synchronic vs. diachronic corpora
Synchronic corpora: contain materials from a specific period of time.
Diachronic corpora: contain materials collected over a longer period of time.

Monolingual vs. multilingual corpora
Monolingual corpora: contain texts in one language.
Multilingual corpora: contain texts in several different languages.

Comparable vs. parallel corpora
Comparable corpora: contain texts from two or more languages that are similar in genre, topic, register, etc., but do not share the same content.
Parallel corpora (translation corpora): consist of original texts in one language and their translations into another language (or several other languages). They are used to explore how the same content is expressed in two languages.

Native vs. learner corpora
Native speaker corpora: contain texts from native speakers.
Learner corpora: contain texts from language learners.

Sample vs. monitor corpora
Sample corpora: as opposed to a monitor corpus, a sample corpus is of finite size and consists of text segments selected to provide a static picture of language.
Monitor corpora: track language change. A monitor corpus is regularly updated and open-ended.

Raw vs. annotated corpora
Raw corpora: plain text in its raw state, without annotations.
Annotated corpora: some external information is added to the corpus, e.g.
information identifying the origin and nature of the text;
tagging to show the word class of each word;
parsing to show the sentence structure and the function of different elements in a sentence.
One specific example is "gives", a third person singular present tense verb. In an annotated corpus, the form gives may appear as gives_VVZ, where VVZ marks it as a third person singular present tense (Z) form of a lexical verb (VV). Such annotation makes it quicker and easier to retrieve and analyze information about the language contained in the corpus.
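
The retrieval benefit of annotation can be illustrated with a minimal Python sketch. It assumes the corpus stores each token as word_TAG, as in gives_VVZ above. The function names parse_tagged and find_by_tag, the sample sentence, and the tags other than VVZ are hypothetical and chosen only for illustration.

```python
def parse_tagged(text):
    """Split whitespace-separated 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # Split on the last underscore so that words containing '_' still work.
        word, sep, tag = token.rpartition("_")
        if sep:
            pairs.append((word, tag))
        else:
            # No underscore found: keep the token and mark it as untagged.
            pairs.append((token, None))
    return pairs


def find_by_tag(pairs, tag):
    """Return every word whose tag equals the requested tag."""
    return [word for word, t in pairs if t == tag]


# Hypothetical sample line; only the VVZ tag is taken from the notes above.
sample = "She_PNP gives_VVZ lectures_NN2 and_CJC writes_VVZ books_NN2"
pairs = parse_tagged(sample)
print(find_by_tag(pairs, "VVZ"))  # prints ['gives', 'writes']
```

In a raw corpus the same query would need a list of verb forms or a guess based on word endings, whereas here a single tag lookup retrieves every matching form.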