      Introduction to the American National Corpus (ANC)
      Author: admin  Source: original to this site  Updated: 2011-11-16  Entered by: admin  Editor: admin




       

      ANC = The American National Corpus

      http://www.anc.org/ 

       

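      The American National Corpus (ANC) is currently the largest corpus documenting contemporary American English usage. It includes written materials of all kinds, as well as transcripts of spoken material, from 1990 onward. Two releases of the ANC have been published: the first contains over 10 million words of spoken and written American English, and the second contains over 22 million words.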

      The First Release of the ANC

      The First Release of the ANC is a beta version. It contains over 10,000,000 words of written and spoken American English, annotated for lemma and part of speech. It is available for research and education for a nominal licensing fee from the Linguistic Data Consortium. Commercial users can obtain the corpus and gain rights to use it in commercial products by joining the ANC Consortium.

      The texts included in the first 10 million words of the ANC are those that were first received. Therefore the corpus is not balanced. There has been no hand-validation of the XML tagging or the part of speech annotation tags. Headers are minimal, although they contain fairly complete information concerning domain, subdomain, subject, audience, and medium. Check the list of known bugs and caveats for a description of the limitations we are currently aware of.

      One of the aims of releasing this first 10 million words is to get feedback from the community about its structure and annotation, so that modifications can be made, if necessary, for the final release of the full 100 million words. We therefore invite comments and bug reports from the community of ANC users. Please contact anc@cs.vassar.edu.

      The Second Release of the ANC

      The Second Release of the American National Corpus contains over 22,000,000 words of written and spoken American English, annotated for lemma, part of speech, noun chunks, and verb chunks. Part of speech tags using the Penn tagset are included for all data in the Second Release, and many documents are also PoS-tagged using the Biber tagset.

      The ANC Second Release is available for research and education for a nominal licensing fee from the Linguistic Data Consortium. Commercial users can obtain the corpus and gain rights to use it in commercial products by joining the ANC Consortium. Please consult the LDC Catalog entry for the ANC Second Release.

      The First and Second Releases of the ANC include materials which have been acquired to date, and therefore the current release of the ANC is not balanced. There has been no hand-validation of the XML tagging or the annotation. Headers are typically minimal, although most contain complete information concerning domain, subdomain, subject, audience, and medium. Check the list of known bugs and caveats for a description of the limitations we are currently aware of.

      One of the aims of the Second Release is to get feedback from the community about its structure and annotation, so that modifications can be made, if necessary, for the final release of the full 100 million words. We therefore invite comments and bug reports from the community of ANC users. Please contact anc@cs.vassar.edu.

      ANC address:

      http://www.anc.org/

      More corpus addresses:

      /Article/201111/2702.html 

       
