
Introduction to the American National Corpus (ANC)

Author: admin  Source: original article  Updated: 2011-11-16


■ANC = The American National Corpus

■http://www.anc.org/

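The American National Corpus (ANC) is one of the largest corpora of contemporary American English usage. It covers a wide range of written texts, as well as transcripts of spoken material, produced from 1990 onward. The ANC has been published in two releases: the first contains over 10 million words of spoken and written American English, and the second contains over 22 million words.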

■The First Release of the ANC

The First Release of the ANC is a beta version. It contains over 10,000,000 words of written and spoken American English, annotated for lemma and part of speech. It is available for research and educational use, for a nominal licensing fee, from the Linguistic Data Consortium (LDC). Commercial users can obtain the corpus and gain rights to use it in commercial products by joining the ANC Consortium.

The texts in the first 10 million words of the ANC are simply those that were received first, so the corpus is not balanced. Neither the XML tagging nor the part-of-speech annotation has been hand-validated. Headers are minimal, although they contain fairly complete information on domain, subdomain, subject, audience, and medium. Check the list of known bugs and caveats for a description of the limitations we are currently aware of.

One of the aims of releasing this first 10 million words is to get feedback from the community about its structure and annotation, so that modifications can be made, if necessary, for the final release of the full 100 million words. We therefore invite comments and bug reports from the community of ANC users. Please contact anc@cs.vassar.edu.

■The Second Release of the ANC

The Second Release of the American National Corpus contains over 22,000,000 words of written and spoken American English, annotated for lemma, part of speech, noun chunks, and verb chunks. Part-of-speech tags from the Penn Treebank tagset are included for all data in the Second Release, and many documents are also tagged with the Biber tagset.
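To make these annotation layers concrete, here is a minimal Python sketch of how a token annotated for lemma and Penn part-of-speech tag, plus noun and verb chunks, might be represented and queried. The `Token` class, the `(start, end)` span encoding for chunks, the helper functions, and the sample sentence are all hypothetical illustrations; they do not reproduce the ANC's actual XML file format.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    """One token with lemma and POS layers (hypothetical representation, not ANC XML)."""
    word: str   # surface form as it appears in the text
    lemma: str  # dictionary (base) form
    pos: str    # Penn Treebank part-of-speech tag

# Hypothetical hand-annotated sentence: "The children were reading books."
sentence = [
    Token("The", "the", "DT"),
    Token("children", "child", "NNS"),
    Token("were", "be", "VBD"),
    Token("reading", "read", "VBG"),
    Token("books", "book", "NNS"),
    Token(".", ".", "."),
]

# Chunks stored as half-open (start, end) token spans (hypothetical encoding).
noun_chunks = [(0, 2), (4, 5)]  # "The children", "books"
verb_chunks = [(2, 4)]          # "were reading"

def lemma_counts(tokens, tag_prefix=""):
    """Count lemmas, keeping only tokens whose tag starts with tag_prefix (e.g. "NN" for nouns)."""
    return Counter(t.lemma for t in tokens if t.pos.startswith(tag_prefix))

def chunk_text(tokens, span):
    """Join the surface words covered by a (start, end) span."""
    start, end = span
    return " ".join(t.word for t in tokens[start:end])

print(lemma_counts(sentence, "NN"))  # Counter({'child': 1, 'book': 1})
print([chunk_text(sentence, s) for s in noun_chunks])  # ['The children', 'books']
print([chunk_text(sentence, s) for s in verb_chunks])  # ['were reading']
```

Filtering on a tag prefix works because Penn noun tags all begin with `NN` and verb tags with `VB`, so one query covers singular, plural, and inflected forms.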

The ANC Second Release is available for research and education for a nominal licensing fee from the Linguistic Data Consortium. Commercial users can obtain the corpus and gain rights to use it in commercial products by joining the ANC Consortium. Please consult the LDC Catalog entry for the ANC Second Release.

The First and Second Releases of the ANC include only the materials acquired to date, so the current release of the ANC is not balanced. There has been no hand-validation of the XML tagging or the annotation. Headers are typically minimal, although most contain complete information on domain, subdomain, subject, audience, and medium. Check the list of known bugs and caveats for a description of the limitations we are currently aware of.
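Balance here means that each kind of text (by domain, medium, and so on) makes up a share of the corpus fixed in advance by a sampling design, rather than whatever share happened to arrive first. The following Python sketch shows how header fields like those mentioned above could be used to measure that. The records, word counts, field layout, and 50/50 target are invented for illustration; they are not ANC statistics or the ANC's own method.

```python
from collections import defaultdict

# Invented toy header records (NOT real ANC data). "domain" and "medium" echo
# two of the header categories described above; "words" is a hypothetical
# per-document word count.
docs = [
    {"domain": "fiction", "medium": "written", "words": 5000},
    {"domain": "news", "medium": "written", "words": 3000},
    {"domain": "conversation", "medium": "spoken", "words": 2000},
]

def share_by(field, records):
    """Map each value of `field` to its fraction of the total word count."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec[field]] += rec["words"]
    grand_total = sum(totals.values())
    return {value: count / grand_total for value, count in totals.items()}

def deviation_from_design(shares, targets):
    """Observed share minus target share for every category in either mapping."""
    keys = set(shares) | set(targets)
    return {k: shares.get(k, 0.0) - targets.get(k, 0.0) for k in keys}

observed = share_by("medium", docs)       # {'written': 0.8, 'spoken': 0.2}
design = {"written": 0.5, "spoken": 0.5}  # hypothetical balanced sampling frame
print(deviation_from_design(observed, design))
```

In this toy case written text is over-represented by about 30 percentage points; in an opportunistic collection such gaps simply reflect what was received, which is why the release is described as not balanced.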

One of the aims of the Second Release is to get feedback from the community about its structure and annotation, so that modifications can be made, if necessary, for the final release of the full 100 million words. We therefore invite comments and bug reports from the community of ANC users. Please contact anc@cs.vassar.edu.

■ANC website:

http://www.anc.org/

More corpus addresses:

/Article/201111/2702.html


Entered by: admin  Editor: admin


Copyright © 2007-2020 www.hz123456.com English Grammar Network. All Rights Reserved.