{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "BZPSH4VkK7J2" }, "source": [ "Welcome to the HanLP online interactive environment. This is a Jupyter notebook in which you can enter arbitrary Python code and run it online. Click [Run] in the upper-left corner to run this NLP tutorial.\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "XxPAiNwSK7J4" }, "source": [ "## Installation\n", "To fit different needs, HanLP offers two APIs, **RESTful** (cloud) and **native** (local), aimed at lightweight and large-scale scenarios respectively. Whichever API and programming language you use, the HanLP interface keeps the same semantics, so you can run this tutorial with **either** API.\n", "\n", "### Lightweight RESTful API\n", "\n", "Only a few KB in size, it suits agile development, mobile apps, and similar scenarios. It is easy to use and needs no GPU or environment setup. **Highly recommended**; it installs in seconds:\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "lgMa4kbfK7J5", "outputId": "5bb662d8-1665-4bcc-c517-70d1c4bc4837" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: hanlp_restful in /usr/local/lib/python3.7/dist-packages (0.0.7)\n", "Requirement already satisfied: hanlp-common in /usr/local/lib/python3.7/dist-packages (from hanlp_restful) (0.0.9)\n", "Requirement already satisfied: phrasetree in /usr/local/lib/python3.7/dist-packages (from hanlp-common->hanlp_restful) (0.0.8)\n" ] } ], "source": [ "!pip install hanlp_restful" ] }, { "cell_type": "markdown", "metadata": { "id": "N4G6GbNmK7J6" }, "source": [ "Create a client and fill in the server address:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "3XM9-3-oK7J6" }, "outputs": [], "source": [ "from hanlp_restful import HanLPClient\n", "HanLP = HanLPClient('https://www.hanlp.com/api', auth=None, language='zh') # leave auth as None for anonymous access; zh = Chinese, mul = multilingual" ] }, { "cell_type": "markdown", "metadata": { "id": "pbeFH9jmK7J7" }, "source": [ "Call the `parse` interface with a document to get HanLP's analysis results." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "mNJPvZ_3K7J7", "outputId": "4048d0d6-2dad-4582-e327-f99338f8f72b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"tok/fine\": [\n", " [\"2021年\", \"HanLPv2.1\", \"为\", \"生产\", \"环境\", \"带来\", \"次\", \"世代\", \"最\", 
\"先进\", \"的\", \"多\", \"语种\", \"NLP\", \"技术\", \"。\"],\n", " [\"阿婆主\", \"来到\", \"北京\", \"立方庭\", \"参观\", \"自然\", \"语义\", \"科技\", \"公司\", \"。\"]\n", " ],\n", " \"tok/coarse\": [\n", " [\"2021年\", \"HanLPv2.1\", \"为\", \"生产环境\", \"带来\", \"次世代\", \"最\", \"先进\", \"的\", \"多语种\", \"NLP\", \"技术\", \"。\"],\n", " [\"阿婆主\", \"来到\", \"北京立方庭\", \"参观\", \"自然语义科技公司\", \"。\"]\n", " ],\n", " \"pos/ctb\": [\n", " [\"NT\", \"NR\", \"P\", \"NN\", \"NN\", \"VV\", \"JJ\", \"NN\", \"AD\", \"JJ\", \"DEG\", \"CD\", \"NN\", \"NR\", \"NN\", \"PU\"],\n", " [\"NN\", \"VV\", \"NR\", \"NR\", \"VV\", \"NN\", \"NN\", \"NN\", \"NN\", \"PU\"]\n", " ],\n", " \"pos/pku\": [\n", " [\"t\", \"nx\", \"p\", \"vn\", \"n\", \"v\", \"b\", \"n\", \"d\", \"a\", \"u\", \"a\", \"n\", \"nx\", \"n\", \"w\"],\n", " [\"n\", \"v\", \"ns\", \"ns\", \"v\", \"n\", \"n\", \"n\", \"n\", \"w\"]\n", " ],\n", " \"pos/863\": [\n", " [\"nt\", \"w\", \"p\", \"v\", \"n\", \"v\", \"a\", \"nt\", \"d\", \"a\", \"u\", \"a\", \"n\", \"ws\", \"n\", \"w\"],\n", " [\"n\", \"v\", \"ns\", \"n\", \"v\", \"n\", \"n\", \"n\", \"n\", \"w\"]\n", " ],\n", " \"ner/msra\": [\n", " [[\"2021年\", \"DATE\", 0, 1], [\"HanLPv2.1\", \"ORGANIZATION\", 1, 2]],\n", " [[\"北京立方庭\", \"LOCATION\", 2, 4], [\"自然语义科技公司\", \"ORGANIZATION\", 5, 9]]\n", " ],\n", " \"ner/pku\": [\n", " [],\n", " [[\"北京立方庭\", \"ns\", 2, 4], [\"自然语义科技公司\", \"nt\", 5, 9]]\n", " ],\n", " \"ner/ontonotes\": [\n", " [[\"2021年\", \"DATE\", 0, 1], [\"HanLPv2.1\", \"ORG\", 1, 2]],\n", " [[\"北京立方庭\", \"FAC\", 2, 4], [\"自然语义科技公司\", \"ORG\", 5, 9]]\n", " ],\n", " \"srl\": [\n", " [[[\"2021年\", \"ARGM-TMP\", 0, 1], [\"HanLPv2.1\", \"ARG0\", 1, 2], [\"为生产环境\", \"ARG2\", 2, 5], [\"带来\", \"PRED\", 5, 6], [\"次世代最先进的多语种NLP技术\", \"ARG1\", 6, 15]], [[\"最\", \"ARGM-ADV\", 8, 9], [\"先进\", \"PRED\", 9, 10], [\"技术\", \"ARG0\", 14, 15]]],\n", " [[[\"阿婆主\", \"ARG0\", 0, 1], [\"来到\", \"PRED\", 1, 2], [\"北京立方庭\", \"ARG1\", 2, 4]], [[\"阿婆主\", \"ARG0\", 0, 1], [\"参观\", \"PRED\", 4, 5], [\"自然语义科技公司\", \"ARG1\", 5, 
9]]]\n", " ],\n", " \"dep\": [\n", " [[6, \"tmod\"], [6, \"nsubj\"], [6, \"prep\"], [5, \"nn\"], [3, \"pobj\"], [0, \"root\"], [8, \"amod\"], [15, \"nn\"], [10, \"advmod\"], [15, \"rcmod\"], [10, \"assm\"], [13, \"nummod\"], [15, \"nn\"], [15, \"nn\"], [6, \"dobj\"], [6, \"punct\"]],\n", " [[2, \"nsubj\"], [0, \"root\"], [4, \"nn\"], [2, \"dobj\"], [2, \"conj\"], [9, \"nn\"], [9, \"nn\"], [9, \"nn\"], [5, \"dobj\"], [2, \"punct\"]]\n", " ],\n", " \"sdp\": [\n", " [[[6, \"Time\"]], [[6, \"Exp\"]], [[5, \"mPrep\"]], [[5, \"Desc\"]], [[6, \"Datv\"]], [[13, \"dDesc\"]], [[0, \"Root\"], [8, \"Desc\"], [13, \"Desc\"]], [[15, \"Time\"]], [[10, \"mDegr\"]], [[15, \"Desc\"]], [[10, \"mAux\"]], [[8, \"Quan\"], [13, \"Quan\"]], [[15, \"Desc\"]], [[15, \"Nmod\"]], [[6, \"Pat\"]], [[6, \"mPunc\"]]],\n", " [[[2, \"Agt\"], [5, \"Agt\"]], [[0, \"Root\"]], [[4, \"Loc\"]], [[2, \"Lfin\"]], [[2, \"ePurp\"]], [[8, \"Nmod\"]], [[9, \"Nmod\"]], [[9, \"Nmod\"]], [[5, \"Datv\"]], [[5, \"mPunc\"]]]\n", " ],\n", " \"con\": [\n", " [\"TOP\", [[\"IP\", [[\"NP\", [[\"NT\", [\"2021年\"]]]], [\"NP\", [[\"NR\", [\"HanLPv2.1\"]]]], [\"VP\", [[\"PP\", [[\"P\", [\"为\"]], [\"NP\", [[\"NN\", [\"生产\"]], [\"NN\", [\"环境\"]]]]]], [\"VP\", [[\"VV\", [\"带来\"]], [\"NP\", [[\"ADJP\", [[\"NP\", [[\"ADJP\", [[\"JJ\", [\"次\"]]]], [\"NP\", [[\"NN\", [\"世代\"]]]]]], [\"ADVP\", [[\"AD\", [\"最\"]]]], [\"VP\", [[\"JJ\", [\"先进\"]]]]]], [\"DEG\", [\"的\"]], [\"NP\", [[\"QP\", [[\"CD\", [\"多\"]]]], [\"NP\", [[\"NN\", [\"语种\"]]]]]], [\"NP\", [[\"NR\", [\"NLP\"]], [\"NN\", [\"技术\"]]]]]]]]]], [\"PU\", [\"。\"]]]]]],\n", " [\"TOP\", [[\"IP\", [[\"NP\", [[\"NN\", [\"阿婆主\"]]]], [\"VP\", [[\"VP\", [[\"VV\", [\"来到\"]], [\"NP\", [[\"NR\", [\"北京\"]], [\"NR\", [\"立方庭\"]]]]]], [\"VP\", [[\"VV\", [\"参观\"]], [\"NP\", [[\"NN\", [\"自然\"]], [\"NN\", [\"语义\"]], [\"NN\", [\"科技\"]], [\"NN\", [\"公司\"]]]]]]]], [\"PU\", [\"。\"]]]]]]\n", " ]\n", "}\n" ] } ], "source": [ "doc = 
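HanLP.parse(\"2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。阿婆主来到北京立方庭参观自然语义科技公司。\")\n", "print(doc)" ] }, { "cell_type": "markdown", "metadata": { "id": "w4E8Kn_nK7J8" }, "source": [ "#### Visualization\n", "The output is a JSON-serializable `dict` whose keys are [NLP task names](https://hanlp.hankcs.com/docs/data_format.html#naming-convention) and whose values are the analysis results. For the meaning of the tag sets, see the [Linguistic Annotation Guidelines](https://hanlp.hankcs.com/docs/annotations/index.html) and the [Data Format Specification](https://hanlp.hankcs.com/docs/data_format.html). We purchased, annotated, or adopted the largest and most varied corpora in the world for joint multilingual multi-task learning, so HanLP's tag sets also have the broadest coverage. Calling `doc.pretty_print` produces a visualization in any monospaced-font environment; you need to disable line wrapping for the output to align. An HTML visualization has also been released, which aligns Chinese text automatically in Jupyter Notebook." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 575 }, "id": "GZ79la4LK7J8", "outputId": "b9bd5dc0-52f9-4b42-93fd-7c4e49214ace" }, "outputs": [ { "data": { "text/html": [ "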
Dep Tree     Token     Relati PoS Tok       NER Type         Tok       SRL PA1      Tok       SRL PA2      Tok       PoS    3       4       5       6       7       8       9 
──────────── ───────── ────── ─── ───────── ──────────────── ───────── ──────────── ───────── ──────────── ───────── ─────────────────────────────────────────────────────────
 ┌─────────► 2021年     tmod   NT  2021年     ───►DATE         2021年     ───►ARGM-TMP 2021年                  2021年     NT ───────────────────────────────────────────►NP ───┐   
 │┌────────► HanLPv2.1 nsubj  NR  HanLPv2.1 ───►ORGANIZATION HanLPv2.1 ───►ARG0     HanLPv2.1              HanLPv2.1 NR ───────────────────────────────────────────►NP────┤   
 ││┌─►┌───── 为         prep   P   为                          为         ◄─┐          为                      为         P ───────────┐                                       │   
 │││  │  ┌─► 生产        nn     NN  生产                         生产          ├►ARG2     生产                     生产        NN ──┐       ├────────────────────────►PP ───┐       │   
 │││  └─►└── 环境        pobj   NN  环境                         环境        ◄─┘          环境                     环境        NN ──┴►NP ───┘                               │       │   
┌┼┴┴──────── 带来        root   VV  带来                         带来        ╟──►PRED     带来                     带来        VV ──────────────────────────────────┐       │       │   
││       ┌─► 次         amod   JJ  次                          次         ◄─┐          次                      次         JJ ───►ADJP──┐                       │       ├►VP────┤   
││  ┌───►└── 世代        nn     NN  世代                         世代          │          世代                     世代        NN ───►NP ───┴►NP ───┐               │       │       │   
││  │    ┌─► 最         advmod AD  最                          最           │          最         ───►ARGM-ADV 最         AD ───────────►ADVP──┼►ADJP──┐       ├►VP ───┘       ├►IP
││  │┌──►├── 先进        rcmod  JJ  先进                         先进          │          先进        ╟──►PRED     先进        JJ ───────────►VP ───┘       │       │               │   
││  ││   └─► 的         assm   DEG 的                          的           ├►ARG1     的                      的         DEG──────────────────────────┤       │               │   
││  ││   ┌─► 多         nummod CD  多                          多           │          多                      多         CD ───►QP ───┐               ├►NP ───┘               │   
││  ││┌─►└── 语种        nn     NN  语种                         语种          │          语种                     语种        NN ───►NP ───┴────────►NP────┤                       │   
││  │││  ┌─► NLP       nn     NR  NLP                        NLP         │          NLP                    NLP       NR ──┐                       │                       │   
│└─►└┴┴──┴── 技术        dobj   NN  技术                         技术        ◄─┘          技术        ───►ARG0     技术        NN ──┴────────────────►NP ───┘                       │   
└──────────► 。         punct  PU  。                          。                      。                      。         PU ──────────────────────────────────────────────────┘   

Dep Tree     Tok Relat Po Tok NER Type         Tok SRL PA1  Tok SRL PA2  Tok Po    3       4       5       6 
──────────── ─── ───── ── ─── ──────────────── ─── ──────── ─── ──────── ─── ────────────────────────────────
         ┌─► 阿婆主 nsubj NN 阿婆主                  阿婆主 ───►ARG0 阿婆主 ───►ARG0 阿婆主 NN───────────────────►NP ───┐   
┌┬────┬──┴── 来到  root  VV 来到                   来到  ╟──►PRED 来到           来到  VV──────────┐               │   
││    │  ┌─► 北京  nn    NR 北京  ◄─┐              北京  ◄─┐      北京           北京  NR──┐       ├►VP ───┐       │   
││    └─►└── 立方庭 dobj  NR 立方庭 ◄─┴►LOCATION     立方庭 ◄─┴►ARG1 立方庭          立方庭 NR──┴►NP ───┘       │       │   
│└─►┌─────── 参观  conj  VV 参观                   参观           参观  ╟──►PRED 参观  VV──────────┐       ├►VP────┤   
│   │  ┌───► 自然  nn    NN 自然  ◄─┐              自然           自然  ◄─┐      自然  NN──┐       │       │       ├►IP
│   │  │┌──► 语义  nn    NN 语义    │              语义           语义    │      语义  NN  │       ├►VP ───┘       │   
│   │  ││┌─► 科技  nn    NN 科技    ├►ORGANIZATION 科技           科技    ├►ARG1 科技  NN  ├►NP ───┘               │   
│   └─►└┴┴── 公司  dobj  NN 公司  ◄─┘              公司           公司  ◄─┘      公司  NN──┘                       │   
└──────────► 。   punct PU 。                    。            。            。   PU──────────────────────────┘   
" ], "text/plain": [ "" ] }, "metadata": { "tags": [] }, "output_type": "display_data" } ], "source": [ "doc.pretty_print()" ] }, { "cell_type": "markdown", "metadata": { "id": "WIKyCLQJK7J9" }, "source": [ "#### Applying for an API key\n", "Because server capacity is limited, anonymous users are restricted to 2 calls per minute. If you need more calls, [we recommend applying for a free public-interest API key (`auth`)](https://bbs.hanlp.com/t/hanlp2-1-restful-api/53)." ] }, { "cell_type": "markdown", "metadata": { "id": "PcZAZopQK7J9" }, "source": [ "### Large-scale native API\n", "\n", "Built on deep learning frameworks such as PyTorch and TensorFlow, it is intended for **professional** NLP engineers, researchers, and large volumes of local data. It requires Python 3.6 or later and supports Windows, though *nix is recommended. It runs on CPU, but a GPU/TPU is recommended.\n", "\n", "Whether you are on Windows, Linux, or macOS, HanLP installs with a single command." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "bjRdHxl1K7J-", "outputId": "659d7920-c857-4eb8-f45f-dba84366688a" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: hanlp in /usr/local/lib/python3.7/dist-packages (2.1.0a54)\n", "Requirement already satisfied: sentencepiece>=0.1.91torch>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from hanlp) (0.1.96)\n", "Requirement already satisfied: toposort==1.5 in /usr/local/lib/python3.7/dist-packages (from hanlp) (1.5)\n", "Requirement already satisfied: alnlp in /usr/local/lib/python3.7/dist-packages (from hanlp) (1.0.0rc27)\n", "Requirement already satisfied: hanlp-common>=0.0.9 in /usr/local/lib/python3.7/dist-packages (from hanlp) (0.0.9)\n", "Requirement already satisfied: hanlp-downloader in /usr/local/lib/python3.7/dist-packages (from hanlp) (0.0.23)\n", "Requirement already satisfied: hanlp-trie>=0.0.2 in /usr/local/lib/python3.7/dist-packages (from hanlp) (0.0.2)\n", "Requirement already satisfied: transformers>=4.1.1 in /usr/local/lib/python3.7/dist-packages (from hanlp) (4.9.1)\n", "Requirement already satisfied: termcolor in /usr/local/lib/python3.7/dist-packages (from hanlp) (1.1.0)\n", "Requirement already satisfied: pynvml in /usr/local/lib/python3.7/dist-packages (from hanlp) 
(11.0.0)\n", "Requirement already satisfied: phrasetree in /usr/local/lib/python3.7/dist-packages (from hanlp-common>=0.0.9->hanlp) (0.0.8)\n", "Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from transformers>=4.1.1->hanlp) (3.0.12)\n", "Requirement already satisfied: sacremoses in /usr/local/lib/python3.7/dist-packages (from transformers>=4.1.1->hanlp) (0.0.45)\n", "Requirement already satisfied: tokenizers<0.11,>=0.10.1 in /usr/local/lib/python3.7/dist-packages (from transformers>=4.1.1->hanlp) (0.10.3)\n", "Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from transformers>=4.1.1->hanlp) (21.0)\n", "Requirement already satisfied: huggingface-hub==0.0.12 in /usr/local/lib/python3.7/dist-packages (from transformers>=4.1.1->hanlp) (0.0.12)\n", "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.7/dist-packages (from transformers>=4.1.1->hanlp) (5.4.1)\n", "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.7/dist-packages (from transformers>=4.1.1->hanlp) (2019.12.20)\n", "Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.7/dist-packages (from transformers>=4.1.1->hanlp) (4.41.1)\n", "Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from transformers>=4.1.1->hanlp) (2.23.0)\n", "Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/dist-packages (from transformers>=4.1.1->hanlp) (4.6.1)\n", "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.7/dist-packages (from transformers>=4.1.1->hanlp) (1.19.5)\n", "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from huggingface-hub==0.0.12->transformers>=4.1.1->hanlp) (3.7.4.3)\n", "Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging->transformers>=4.1.1->hanlp) (2.4.7)\n", "Requirement already satisfied: torch in 
/usr/local/lib/python3.7/dist-packages (from alnlp->hanlp) (1.9.0+cu102)\n", "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->transformers>=4.1.1->hanlp) (3.5.0)\n", "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->transformers>=4.1.1->hanlp) (1.24.3)\n", "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->transformers>=4.1.1->hanlp) (3.0.4)\n", "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->transformers>=4.1.1->hanlp) (2.10)\n", "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->transformers>=4.1.1->hanlp) (2021.5.30)\n", "Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers>=4.1.1->hanlp) (1.0.1)\n", "Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers>=4.1.1->hanlp) (7.1.2)\n", "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers>=4.1.1->hanlp) (1.15.0)\n" ] } ], "source": [ "!pip install hanlp -U" ] }, { "cell_type": "markdown", "metadata": { "id": "dHhIRwgqK7J-" }, "source": [ "#### Loading a model\n", "The HanLP workflow starts by loading a model. Model identifiers are stored in the `hanlp.pretrained` package, grouped by NLP task." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "KHY6bsG_K7J-", "outputId": "208c12b6-2702-4ee7-a03a-f053b7ad3479" }, "outputs": [ { "data": { "text/plain": [ "{'CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_BASE_ZH': 'https://file.hankcs.com/hanlp/mtl/close_tok_pos_ner_srl_dep_sdp_con_electra_base_20210111_124519.zip',\n", " 'CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_SMALL_ZH': 
'https://file.hankcs.com/hanlp/mtl/close_tok_pos_ner_srl_dep_sdp_con_electra_small_20210111_124159.zip',\n", " 'NPCMJ_UD_KYOTO_TOK_POS_CON_BERT_BASE_CHAR_JA': 'https://file.hankcs.com/hanlp/mtl/npcmj_ud_kyoto_tok_pos_ner_dep_con_srl_bert_base_char_ja_20210517_225654.zip',\n", " 'OPEN_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_BASE_ZH': 'https://file.hankcs.com/hanlp/mtl/open_tok_pos_ner_srl_dep_sdp_con_electra_base_20201223_201906.zip',\n", " 'OPEN_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_SMALL_ZH': 'https://file.hankcs.com/hanlp/mtl/open_tok_pos_ner_srl_dep_sdp_con_electra_small_20201223_035557.zip',\n", " 'UD_ONTONOTES_TOK_POS_LEM_FEA_NER_SRL_DEP_SDP_CON_MT5_SMALL': 'https://file.hankcs.com/hanlp/mtl/ud_ontonotes_tok_pos_lem_fea_ner_srl_dep_sdp_con_mt5_small_20210228_123458.zip',\n", " 'UD_ONTONOTES_TOK_POS_LEM_FEA_NER_SRL_DEP_SDP_CON_XLMR_BASE': 'https://file.hankcs.com/hanlp/mtl/ud_ontonotes_tok_pos_lem_fea_ner_srl_dep_sdp_con_xlm_base_20210602_211620.zip'}" ] }, "execution_count": 6, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "import hanlp\n", "hanlp.pretrained.mtl.ALL # MTL = multi-task learning; the tasks are listed in the model name, and the language is given by the last field of the name or by the corresponding corpus" ] }, { "cell_type": "markdown", "metadata": { "id": "WDT3Hks0K7J_" }, "source": [ "Call `hanlp.load` to load a model; it is downloaded automatically to a local cache. Natural language processing comprises many tasks, of which tokenization is only the most basic. Rather than creating a separate model for each task, you can use HanLP's joint model to perform several tasks in one pass:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "4Cj8a73rK7J_", "outputId": "a92ac736-6e61-4949-8d35-56c773faf950" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [] } ], "source": [ "HanLP = hanlp.load(hanlp.pretrained.mtl.CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_BASE_ZH)" ] }, { "cell_type": "markdown", "metadata": { "id": "pBqH_My8K7J_" }, "source": [ "## Multi-task batch analysis\n", "Once the client has been created or the model has been loaded, you can pass in one or more sentences for analysis:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "B58npfkHK7J_", "outputId": 
"69fed02d-39cb-4b4c-d2c8-d0edc25970ea" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"tok/fine\": [\n", " [\"2021年\", \"HanLPv2.1\", \"为\", \"生产\", \"环境\", \"带来\", \"次\", \"世代\", \"最\", \"先进\", \"的\", \"多\", \"语种\", \"NLP\", \"技术\", \"。\"],\n", " [\"阿婆主\", \"来到\", \"北京\", \"立方庭\", \"参观\", \"自然\", \"语义\", \"科技\", \"公司\", \"。\"]\n", " ],\n", " \"tok/coarse\": [\n", " [\"2021年\", \"HanLPv2.1\", \"为\", \"生产\", \"环境\", \"带来\", \"次世代\", \"最\", \"先进\", \"的\", \"多语种\", \"NLP\", \"技术\", \"。\"],\n", " [\"阿婆主\", \"来到\", \"北京立方庭\", \"参观\", \"自然语义科技公司\", \"。\"]\n", " ],\n", " \"pos/ctb\": [\n", " [\"NT\", \"NR\", \"P\", \"NN\", \"NN\", \"VV\", \"JJ\", \"NN\", \"AD\", \"JJ\", \"DEG\", \"CD\", \"NN\", \"NR\", \"NN\", \"PU\"],\n", " [\"NN\", \"VV\", \"NR\", \"NR\", \"VV\", \"NN\", \"NN\", \"NN\", \"NN\", \"PU\"]\n", " ],\n", " \"pos/pku\": [\n", " [\"t\", \"nx\", \"p\", \"vn\", \"n\", \"v\", \"b\", \"n\", \"d\", \"a\", \"u\", \"a\", \"n\", \"nx\", \"n\", \"w\"],\n", " [\"n\", \"v\", \"ns\", \"ns\", \"v\", \"n\", \"n\", \"n\", \"n\", \"w\"]\n", " ],\n", " \"pos/863\": [\n", " [\"nt\", \"w\", \"p\", \"v\", \"n\", \"v\", \"a\", \"nt\", \"d\", \"a\", \"u\", \"a\", \"n\", \"ws\", \"n\", \"w\"],\n", " [\"n\", \"v\", \"ns\", \"n\", \"v\", \"n\", \"n\", \"n\", \"n\", \"w\"]\n", " ],\n", " \"ner/msra\": [\n", " [[\"2021年\", \"DATE\", 0, 1], [\"HanLPv2.1\", \"WWW\", 1, 2]],\n", " [[\"北京\", \"LOCATION\", 2, 3], [\"立方庭\", \"LOCATION\", 3, 4], [\"自然语义科技公司\", \"ORGANIZATION\", 5, 9]]\n", " ],\n", " \"ner/pku\": [\n", " [],\n", " [[\"北京立方庭\", \"ns\", 2, 4], [\"自然语义科技公司\", \"nt\", 5, 9]]\n", " ],\n", " \"ner/ontonotes\": [\n", " [[\"2021年\", \"DATE\", 0, 1], [\"HanLPv2.1\", \"ORG\", 1, 2]],\n", " [[\"北京立方庭\", \"FAC\", 2, 4], [\"自然语义科技公司\", \"ORG\", 5, 9]]\n", " ],\n", " \"srl\": [\n", " [[[\"2021年\", \"ARGM-TMP\", 0, 1], [\"HanLPv2.1\", \"ARG0\", 1, 2], [\"为生产环境\", \"ARG2\", 2, 5], [\"带来\", \"PRED\", 5, 6], [\"次世代最先进的多语种NLP技术\", \"ARG1\", 6, 15]], [[\"最\", 
\"ARGM-ADV\", 8, 9], [\"先进\", \"PRED\", 9, 10], [\"技术\", \"ARG0\", 14, 15]]],\n", " [[[\"阿婆主\", \"ARG0\", 0, 1], [\"来到\", \"PRED\", 1, 2], [\"北京立方庭\", \"ARG1\", 2, 4]], [[\"阿婆主\", \"ARG0\", 0, 1], [\"参观\", \"PRED\", 4, 5], [\"自然语义科技公司\", \"ARG1\", 5, 9]]]\n", " ],\n", " \"dep\": [\n", " [[6, \"tmod\"], [6, \"nsubj\"], [6, \"prep\"], [5, \"nn\"], [3, \"pobj\"], [0, \"root\"], [8, \"amod\"], [15, \"nn\"], [10, \"advmod\"], [15, \"rcmod\"], [10, \"assm\"], [13, \"nummod\"], [15, \"nn\"], [15, \"nn\"], [6, \"dobj\"], [6, \"punct\"]],\n", " [[2, \"nsubj\"], [0, \"root\"], [4, \"nn\"], [2, \"dobj\"], [2, \"conj\"], [9, \"nn\"], [9, \"nn\"], [9, \"nn\"], [5, \"dobj\"], [2, \"punct\"]]\n", " ],\n", " \"sdp\": [\n", " [[[6, \"Time\"]], [[6, \"Exp\"]], [[5, \"mPrep\"]], [[5, \"Desc\"]], [[6, \"Datv\"]], [[13, \"dDesc\"]], [[0, \"Root\"], [8, \"Desc\"], [13, \"Desc\"]], [[15, \"Time\"]], [[10, \"mDegr\"]], [[15, \"Desc\"]], [[10, \"mAux\"]], [[8, \"Quan\"], [13, \"Quan\"]], [[15, \"Desc\"]], [[15, \"Nmod\"]], [[6, \"Pat\"]], [[6, \"mPunc\"]]],\n", " [[[2, \"Agt\"], [5, \"Agt\"]], [[0, \"Root\"]], [[4, \"Loc\"]], [[2, \"Lfin\"]], [[2, \"ePurp\"]], [[8, \"Nmod\"]], [[9, \"Nmod\"]], [[9, \"Nmod\"]], [[5, \"Datv\"]], [[5, \"mPunc\"]]]\n", " ],\n", " \"con\": [\n", " [\"TOP\", [[\"IP\", [[\"NP\", [[\"NT\", [\"2021年\"]]]], [\"NP\", [[\"NR\", [\"HanLPv2.1\"]]]], [\"VP\", [[\"PP\", [[\"P\", [\"为\"]], [\"NP\", [[\"NN\", [\"生产\"]], [\"NN\", [\"环境\"]]]]]], [\"VP\", [[\"VV\", [\"带来\"]], [\"NP\", [[\"ADJP\", [[\"NP\", [[\"ADJP\", [[\"JJ\", [\"次\"]]]], [\"NP\", [[\"NN\", [\"世代\"]]]]]], [\"ADVP\", [[\"AD\", [\"最\"]]]], [\"VP\", [[\"JJ\", [\"先进\"]]]]]], [\"DEG\", [\"的\"]], [\"NP\", [[\"QP\", [[\"CD\", [\"多\"]]]], [\"NP\", [[\"NN\", [\"语种\"]]]]]], [\"NP\", [[\"NR\", [\"NLP\"]], [\"NN\", [\"技术\"]]]]]]]]]], [\"PU\", [\"。\"]]]]]],\n", " [\"TOP\", [[\"IP\", [[\"NP\", [[\"NN\", [\"阿婆主\"]]]], [\"VP\", [[\"VP\", [[\"VV\", [\"来到\"]], [\"NP\", [[\"NR\", [\"北京\"]], [\"NR\", [\"立方庭\"]]]]]], [\"VP\", 
[[\"VV\", [\"参观\"]], [\"NP\", [[\"NN\", [\"自然\"]], [\"NN\", [\"语义\"]], [\"NN\", [\"科技\"]], [\"NN\", [\"公司\"]]]]]]]], [\"PU\", [\"。\"]]]]]]\n", " ]\n", "}\n" ] } ], "source": [ "doc = HanLP(['2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。', '阿婆主来到北京立方庭参观自然语义科技公司。'])\n", "print(doc)" ] }, { "cell_type": "markdown", "metadata": { "id": "tvuxfWPYK7J_" }, "source": [ "## Visualization\n", "The output is a JSON-serializable `dict` whose keys are [NLP task names](https://hanlp.hankcs.com/docs/data_format.html#naming-convention) and whose values are the analysis results. For the meaning of the tag sets, see the [Linguistic Annotation Guidelines](https://hanlp.hankcs.com/docs/annotations/index.html) and the [Data Format Specification](https://hanlp.hankcs.com/docs/data_format.html). We purchased, annotated, or adopted the largest and most varied corpora in the world for joint multilingual multi-task learning, so HanLP's tag sets also have the broadest coverage. Calling `doc.pretty_print` produces a visualization in any monospaced-font environment; you need to disable line wrapping for the output to align. An HTML visualization has also been released, which aligns Chinese text automatically in Jupyter Notebook." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 575 }, "id": "M8WxTdlAK7KA", "outputId": "a027a302-74d8-48c9-b30d-45ebf8741c1e" }, "outputs": [ { "data": { "text/html": [ "
Dep Tree     Token     Relati PoS Tok       NER Type Tok       SRL PA1      Tok       SRL PA2      Tok       PoS    3       4       5       6       7       8       9 
──────────── ───────── ────── ─── ───────── ──────── ───────── ──────────── ───────── ──────────── ───────── ─────────────────────────────────────────────────────────
 ┌─────────► 2021年     tmod   NT  2021年     ───►DATE 2021年     ───►ARGM-TMP 2021年                  2021年     NT ───────────────────────────────────────────►NP ───┐   
 │┌────────► HanLPv2.1 nsubj  NR  HanLPv2.1 ───►WWW  HanLPv2.1 ───►ARG0     HanLPv2.1              HanLPv2.1 NR ───────────────────────────────────────────►NP────┤   
 ││┌─►┌───── 为         prep   P   为                  为         ◄─┐          为                      为         P ───────────┐                                       │   
 │││  │  ┌─► 生产        nn     NN  生产                 生产          ├►ARG2     生产                     生产        NN ──┐       ├────────────────────────►PP ───┐       │   
 │││  └─►└── 环境        pobj   NN  环境                 环境        ◄─┘          环境                     环境        NN ──┴►NP ───┘                               │       │   
┌┼┴┴──────── 带来        root   VV  带来                 带来        ╟──►PRED     带来                     带来        VV ──────────────────────────────────┐       │       │   
││       ┌─► 次         amod   JJ  次                  次         ◄─┐          次                      次         JJ ───►ADJP──┐                       │       ├►VP────┤   
││  ┌───►└── 世代        nn     NN  世代                 世代          │          世代                     世代        NN ───►NP ───┴►NP ───┐               │       │       │   
││  │    ┌─► 最         advmod AD  最                  最           │          最         ───►ARGM-ADV 最         AD ───────────►ADVP──┼►ADJP──┐       ├►VP ───┘       ├►IP
││  │┌──►├── 先进        rcmod  JJ  先进                 先进          │          先进        ╟──►PRED     先进        JJ ───────────►VP ───┘       │       │               │   
││  ││   └─► 的         assm   DEG 的                  的           ├►ARG1     的                      的         DEG──────────────────────────┤       │               │   
││  ││   ┌─► 多         nummod CD  多                  多           │          多                      多         CD ───►QP ───┐               ├►NP ───┘               │   
││  ││┌─►└── 语种        nn     NN  语种                 语种          │          语种                     语种        NN ───►NP ───┴────────►NP────┤                       │   
││  │││  ┌─► NLP       nn     NR  NLP                NLP         │          NLP                    NLP       NR ──┐                       │                       │   
│└─►└┴┴──┴── 技术        dobj   NN  技术                 技术        ◄─┘          技术        ───►ARG0     技术        NN ──┴────────────────►NP ───┘                       │   
└──────────► 。         punct  PU  。                  。                      。                      。         PU ──────────────────────────────────────────────────┘   

Dep Tree     Tok Relat Po Tok NER Type         Tok SRL PA1  Tok SRL PA2  Tok Po    3       4       5       6 
──────────── ─── ───── ── ─── ──────────────── ─── ──────── ─── ──────── ─── ────────────────────────────────
         ┌─► 阿婆主 nsubj NN 阿婆主                  阿婆主 ───►ARG0 阿婆主 ───►ARG0 阿婆主 NN───────────────────►NP ───┐   
┌┬────┬──┴── 来到  root  VV 来到                   来到  ╟──►PRED 来到           来到  VV──────────┐               │   
││    │  ┌─► 北京  nn    NR 北京  ───►LOCATION     北京  ◄─┐      北京           北京  NR──┐       ├►VP ───┐       │   
││    └─►└── 立方庭 dobj  NR 立方庭 ───►LOCATION     立方庭 ◄─┴►ARG1 立方庭          立方庭 NR──┴►NP ───┘       │       │   
│└─►┌─────── 参观  conj  VV 参观                   参观           参观  ╟──►PRED 参观  VV──────────┐       ├►VP────┤   
│   │  ┌───► 自然  nn    NN 自然  ◄─┐              自然           自然  ◄─┐      自然  NN──┐       │       │       ├►IP
│   │  │┌──► 语义  nn    NN 语义    │              语义           语义    │      语义  NN  │       ├►VP ───┘       │   
│   │  ││┌─► 科技  nn    NN 科技    ├►ORGANIZATION 科技           科技    ├►ARG1 科技  NN  ├►NP ───┘               │   
│   └─►└┴┴── 公司  dobj  NN 公司  ◄─┘              公司           公司  ◄─┘      公司  NN──┘                       │   
└──────────► 。   punct PU 。                    。            。            。   PU──────────────────────────┘   
" ], "text/plain": [ "" ] }, "metadata": { "tags": [] }, "output_type": "display_data" } ], "source": [ "doc.pretty_print()" ] }, { "cell_type": "markdown", "metadata": { "id": "_B2HDiZgK7KA" }, "source": [ "## Specifying tasks\n", "The concise interface also accepts flexible parameters; the fewer the tasks, the faster the run. For example, to perform tokenization only:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "id": "9Mnys4t2K7KA", "outputId": "88d72a72-c095-4f6d-df0b-d881887087ce" }, "outputs": [ { "data": { "text/html": [ "
阿婆主 来到 北京 立方庭 参观 自然 语义 科技 公司 。
" ], "text/plain": [ "" ] }, "metadata": { "tags": [] }, "output_type": "display_data" } ], "source": [ "HanLP('阿婆主来到北京立方庭参观自然语义科技公司。', tasks='tok').pretty_print()" ] }, { "cell_type": "markdown", "metadata": { "id": "s5RkVkVkK7KA" }, "source": [ "### Coarse-grained tokenization" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "id": "5R_PwELlK7KA", "outputId": "5ce2c037-eb44-481f-9de2-dc0d4122e7c4" }, "outputs": [ { "data": { "text/html": [ "
阿婆主 来到 北京立方庭 参观 自然语义科技公司 。
" ], "text/plain": [ "" ] }, "metadata": { "tags": [] }, "output_type": "display_data" } ], "source": [ "HanLP('阿婆主来到北京立方庭参观自然语义科技公司。', tasks='tok/coarse').pretty_print()" ] }, { "cell_type": "markdown", "metadata": { "id": "pTrajkHEK7KB" }, "source": [ "### Tokenization and PKU part-of-speech tagging" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "id": "kkkgVKFqK7KB", "outputId": "e9f9879b-47ce-459a-e089-923de1c6436c" }, "outputs": [ { "data": { "text/html": [ "
阿婆主/n 来到/v 北京/ns 立方庭/ns 参观/v 自然/n 语义/n 科技/n 公司/n 。/w
" ], "text/plain": [ "" ] }, "metadata": { "tags": [] }, "output_type": "display_data" } ], "source": [ "HanLP('阿婆主来到北京立方庭参观自然语义科技公司。', tasks='pos/pku').pretty_print()" ] }, { "cell_type": "markdown", "metadata": { "id": "YLLTVY0RK7KB" }, "source": [ "### Coarse-grained tokenization and PKU part-of-speech tagging" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "id": "5qSlqbcfK7KB", "outputId": "66944459-bc22-4bd9-e4af-4d2aba9316f3" }, "outputs": [ { "data": { "text/html": [ "
阿婆主/n 来到/v 北京立方庭/ns 参观/v 自然语义科技公司/n 。/w
" ], "text/plain": [ "" ] }, "metadata": { "tags": [] }, "output_type": "display_data" } ], "source": [ "HanLP('阿婆主来到北京立方庭参观自然语义科技公司。', tasks=['tok/coarse', 'pos/pku'], skip_tasks='tok/fine').pretty_print()" ] }, { "cell_type": "markdown", "metadata": { "id": "3nNojvHiK7KB" }, "source": [ "### Tokenization and MSRA-standard NER" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 225 }, "id": "tTVoEPiAK7KB", "outputId": "b8dc8c24-3392-4712-d1b6-e2dc8b7710e8" }, "outputs": [ { "data": { "text/html": [ "
Tok 
─── 
阿婆主 
来到  
北京  
立方庭 
参观  
自然  
语义  
科技  
公司  
。   
NER Type        
────────────────
                
                
───►LOCATION    
───►LOCATION    
                
◄─┐             
  │             
  ├►ORGANIZATION
◄─┘             
                
" ], "text/plain": [ "" ] }, "metadata": { "tags": [] }, "output_type": "display_data" } ], "source": [ "HanLP('阿婆主来到北京立方庭参观自然语义科技公司。', tasks='ner/msra').pretty_print()" ] }, { "cell_type": "markdown", "metadata": { "id": "uG2wYTfmK7KB" }, "source": [ "### Tokenization, part-of-speech tagging, and dependency parsing" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 225 }, "id": "WXl6f7zyK7KC", "outputId": "8671e0e4-d0c3-40f4-a4db-ba9aaec225ab" }, "outputs": [ { "data": { "text/html": [ "
Dep Tree     
──────────── 
         ┌─► 
┌┬────┬──┴── 
││    │  ┌─► 
││    └─►└── 
│└─►┌─────── 
│   │  ┌───► 
│   │  │┌──► 
│   │  ││┌─► 
│   └─►└┴┴── 
└──────────► 
Tok 
─── 
阿婆主 
来到  
北京  
立方庭 
参观  
自然  
语义  
科技  
公司  
。   
Relat 
───── 
nsubj 
root  
nn    
dobj  
conj  
nn    
nn    
nn    
dobj  
punct 
Po
──
NN
VV
NR
NR
VV
NN
NN
NN
NN
PU
" ], "text/plain": [ "" ] }, "metadata": { "tags": [] }, "output_type": "display_data" } ], "source": [ "doc = HanLP('阿婆主来到北京立方庭参观自然语义科技公司。', tasks=['pos', 'dep'])\n", "doc.pretty_print()" ] }, { "cell_type": "markdown", "metadata": { "id": "ocxM3LsGK7KC" }, "source": [ "Convert to CoNLL format:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "NtKmSB_0K7KC", "outputId": "cc9245b3-32c2-4d35-88a8-a7d91127eca7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\t阿婆主\t_\tNN\t_\t_\t2\tnsubj\t_\t_\n", "2\t来到\t_\tVV\t_\t_\t0\troot\t_\t_\n", "3\t北京\t_\tNR\t_\t_\t4\tnn\t_\t_\n", "4\t立方庭\t_\tNR\t_\t_\t2\tdobj\t_\t_\n", "5\t参观\t_\tVV\t_\t_\t2\tconj\t_\t_\n", "6\t自然\t_\tNN\t_\t_\t9\tnn\t_\t_\n", "7\t语义\t_\tNN\t_\t_\t9\tnn\t_\t_\n", "8\t科技\t_\tNN\t_\t_\t9\tnn\t_\t_\n", "9\t公司\t_\tNN\t_\t_\t5\tdobj\t_\t_\n", "10\t。\t_\tPU\t_\t_\t2\tpunct\t_\t_\n" ] } ], "source": [ "print(doc.to_conll())" ] }, { "cell_type": "markdown", "metadata": { "id": "PNBo-kETK7KC" }, "source": [ "### Tokenization, part-of-speech tagging, and constituency parsing" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 225 }, "id": "Ja8dib6XK7KC", "outputId": "a972f5bb-ae23-47a9-cd9f-6070a5b39f50" }, "outputs": [ { "data": { "text/html": [ "
Tok 
─── 
阿婆主 
来到  
北京  
立方庭 
参观  
自然  
语义  
科技  
公司  
。   
Po    3       4       5       6 
────────────────────────────────
NN───────────────────►NP ───┐   
VV──────────┐               │   
NR──┐       ├►VP ───┐       │   
NR──┴►NP ───┘       │       │   
VV──────────┐       ├►VP────┤   
NN──┐       │       │       ├►IP
NN  │       ├►VP ───┘       │   
NN  ├►NP ───┘               │   
NN──┘                       │   
PU──────────────────────────┘   
" ], "text/plain": [ "" ] }, "metadata": { "tags": [] }, "output_type": "display_data" } ], "source": [ "doc = HanLP('阿婆主来到北京立方庭参观自然语义科技公司。', tasks=['pos', 'con'])\n", "doc.pretty_print()" ] }, { "cell_type": "markdown", "metadata": { "id": "Mg3DhvjhK7KC" }, "source": [ "#### Print the constituency tree in bracketed form" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "kE8iBZNUK7KC", "outputId": "79e2a72d-e473-41ca-c054-9595a4dd5971" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(TOP\n", "  (IP\n", "    (NP (NN 阿婆主))\n", "    (VP\n", "      (VP (VV 来到) (NP (NR 北京) (NR 立方庭)))\n", "      (VP (VV 参观) (NP (NN 自然) (NN 语义) (NN 科技) (NN 公司))))\n", "    (PU 。)))\n" ] } ], "source": [ "print(doc['con']) # str(doc['con']) converts the constituency structure into bracketed form" ] }, { "cell_type": "markdown", "metadata": { "id": "MfleaY_pK7KC" }, "source": [ "For the meaning of each tag set, see the [Linguistic Annotation Guidelines](https://hanlp.hankcs.com/docs/annotations/index.html) and the [Data Format Specification](https://hanlp.hankcs.com/docs/data_format.html). We purchased, annotated, or adopted the largest and most varied corpora in the world for joint multilingual multi-task learning, so HanLP's tag sets also have the broadest coverage.\n", "\n", "## Multilingual support\n", "In short, the `tasks` parameter lets you flexibly invoke any combination of NLP tasks. Beyond the Chinese joint model, the documentation lists models for many other languages, such as Japanese:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "oJP8dvfvK7KD", "outputId": "2262ccdb-7cf5-4859-8d6c-18300e54c22e" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [] } ], "source": [ "ja = hanlp.load(hanlp.pretrained.mtl.NPCMJ_UD_KYOTO_TOK_POS_CON_BERT_BASE_CHAR_JA)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 991 }, "id": "3WPvCbH2K7KD", "outputId": "46a9435d-ed5b-47ef-99c6-71d7ee0fc6e8" }, "outputs": [ { "data": { "text/html": [ "
Dep Tree       
────────────── 
           ┌─► 
┌─────────►├── 
│          └─► 
│   ┌────────► 
│   │┌───────► 
│   ││     ┌─► 
│   ││┌───►├── 
│   │││    └─► 
│   │││┌─────► 
│   ││││┌────► 
│   │││││┌───► 
│   ││││││┌──► 
│   │││││││┌─► 
│┌─►└┴┴┴┴┴┴┼── 
││         └─► 
││         ┌─► 
││      ┌─►├── 
││      │  └─► 
└┴──────┴┬┬┬── 
         ││└─► 
         │└──► 
         └───► 
Token     
───────── 
2021      
年         
、         
HanLPv2.1 
は         
次         
世代        
の         
最         
先端        
多         
言語        
NLP       
技術        
を         
本番        
環境        
に         
導入        
し         
ます        
。         
Relation 
──────── 
nummod   
obl      
punct    
compound 
case     
compound 
nmod     
case     
compound 
compound 
compound 
compound 
compound 
obj      
case     
compound 
obl      
case     
root     
aux      
aux      
punct    
PoS 
─── 
NUM 
CL  
PU  
NPR 
P   
N   
N   
P   
N   
N   
NUM 
N   
N   
N   
P   
N   
N   
P   
VB  
VB0 
AX  
PU  
Tok       
───────── 
2021      
年         
、         
HanLPv2.1 
は         
次         
世代        
の         
最         
先端        
多         
言語        
NLP       
技術        
を         
本番        
環境        
に         
導入        
し         
ます        
。         
NER Type     
──────────── 
◄─┐          
◄─┴►DATE     
             
───►ARTIFACT 
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
Tok       
───────── 
2021      
年         
、         
HanLPv2.1 
は         
次         
世代        
の         
最         
先端        
多         
言語        
NLP       
技術        
を         
本番        
環境        
に         
導入        
し         
ます        
。         
SRL PA1  
──────── 
         
         
         
         
         
───►修飾   
╟──►PRED 
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
Tok       
───────── 
2021      
年         
、         
HanLPv2.1 
は         
次         
世代        
の         
最         
先端        
多         
言語        
NLP       
技術        
を         
本番        
環境        
に         
導入        
し         
ます        
。         
SRL PA3  
──────── 
         
         
         
         
         
         
         
         
◄─┐      
◄─┴►修飾   
         
╟──►PRED 
         
         
         
         
         
         
         
         
         
         
Tok       
───────── 
2021      
年         
、         
HanLPv2.1 
は         
次         
世代        
の         
最         
先端        
多         
言語        
NLP       
技術        
を         
本番        
環境        
に         
導入        
し         
ます        
。         
SRL PA4  
──────── 
         
         
         
         
         
◄─┐      
  │      
  │      
  ├►修飾   
  │      
◄─┘      
◄─┐      
◄─┴►ノ    
╟──►PRED 
         
         
         
         
         
         
         
         
Tok       
───────── 
2021      
年         
、         
HanLPv2.1 
は         
次         
世代        
の         
最         
先端        
多         
言語        
NLP       
技術        
を         
本番        
環境        
に         
導入        
し         
ます        
。         
SRL PA5  
──────── 
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
───►修飾   
╟──►PRED 
         
         
         
         
         
Tok       
───────── 
2021      
年         
、         
HanLPv2.1 
は         
次         
世代        
の         
最         
先端        
多         
言語        
NLP       
技術        
を         
本番        
環境        
に         
導入        
し         
ます        
。         
SRL PA6  
──────── 
◄─┐      
  ├►時間   
◄─┘      
◄─┐      
◄─┴►ガ    
◄─┐      
  │      
  │      
  │      
  │      
  ├►ヲ    
  │      
  │      
  │      
◄─┘      
◄─┐      
  ├►ニ    
◄─┘      
╟──►PRED 
         
         
         
Tok       
───────── 
2021      
年         
、         
HanLPv2.1 
は         
次         
世代        
の         
最         
先端        
多         
言語        
NLP       
技術        
を         
本番        
環境        
に         
導入        
し         
ます        
。         
PoS    3         4        5       6       7       8 
────────────────────────────────────────────────────
NUM──┐                                              
CL ──┴►NUMCLP──────── ───────────────────►NP ───┐   
PU ──────── ───────── ──────────────────────────┤   
NPR───►NP ─────┐                                │   
P ───────── ───┴►──── ───────────────────►PP────┤   
N ───┐                                          │   
N ───┴►NP ─────┐                                │   
P ───────── ───┴►PP ────┐                       │   
N ───────── ─────────   │                       │   
N ────►NP ──────►CONJP──┤                       │   
NUM──────── ─────────   ├►NML ──┐               │   
N ───────── ─────────   │       │               ├►IP
N ───────── ───────── ──┘       ├►NP ───┐       │   
N ───────── ───────── ──────────┘       ├►PP────┤   
P ───────── ───────── ──────────────────┘       │   
N ───┐                                          │   
N ───┴►NP ─────┐                                │   
P ───────── ───┴►──── ───────────────────►PP────┤   
VB ──────── ───────── ──────────────────────────┤   
VB0──────── ───────── ──────────────────────────┤   
AX ──────── ───────── ──────────────────────────┤   
PU ──────── ───────── ──────────────────────────┘   

Dep Tree       
────────────── 
           ┌─► 
┌─────────►├── 
│          └─► 
│      ┌─────► 
│      │┌────► 
│      ││┌───► 
│      │││┌──► 
│      ││││┌─► 
│   ┌─►└┴┴┴┼── 
│   │      └─► 
│   │      ┌─► 
│   │   ┌─►└── 
│   │   │  ┌─► 
│   │┌─►└──┼── 
│   ││     └─► 
│┌─►└┴─────┬── 
││         └─► 
││        ┌──► 
││        │┌─► 
││   ┌─►┌┬┼┼── 
││   │  │││└─► 
││   │  ││└──► 
││   │  │└───► 
││   │  └────► 
││   │     ┌─► 
└┴───┴────┬┼── 
          │└─► 
          └──► 
Toke 
──── 
奈須   
きのこ  
は    
1973 
年    
11   
月    
28   
日    
に    
千葉   
県    
円空   
山    
で    
生まれ  
、    
ゲーム  
制作   
会社   
「    
ノーツ  
」    
の    
設立   
者    
だ    
。    
Relation 
──────── 
compound 
nsubj    
case     
compound 
compound 
compound 
compound 
nummod   
obl      
case     
compound 
nmod     
compound 
obl      
case     
acl      
punct    
compound 
compound 
nmod     
punct    
compound 
punct    
case     
compound 
root     
cop      
punct    
PoS 
─── 
NPR 
NPR 
P   
NUM 
CL  
NUM 
CL  
NUM 
CL  
P   
NPR 
NPR 
NPR 
NPR 
P   
VB  
PU  
N   
N   
N   
PUL 
NPR 
PUR 
P   
N   
N   
AX  
PU  
Tok  
──── 
奈須   
きのこ  
は    
1973 
年    
11   
月    
28   
日    
に    
千葉   
県    
円空   
山    
で    
生まれ  
、    
ゲーム  
制作   
会社   
「    
ノーツ  
」    
の    
設立   
者    
だ    
。    
NER Type         
──────────────── 
◄─┐              
◄─┴►PERSON       
                 
◄─┐              
  │              
  │              
  ├►DATE         
  │              
◄─┘              
                 
◄─┐              
  │              
  ├►LOCATION     
◄─┘              
                 
                 
                 
                 
                 
                 
                 
───►ORGANIZATION 
                 
                 
                 
                 
                 
                 
Tok  
──── 
奈須   
きのこ  
は    
1973 
年    
11   
月    
28   
日    
に    
千葉   
県    
円空   
山    
で    
生まれ  
、    
ゲーム  
制作   
会社   
「    
ノーツ  
」    
の    
設立   
者    
だ    
。    
SRL PA1  
──────── 
         
         
         
         
         
         
         
         
         
         
◄─┐      
◄─┴►ノ?   
         
╟──►PRED 
         
         
         
         
         
         
         
         
         
         
         
         
         
         
Tok  
──── 
奈須   
きのこ  
は    
1973 
年    
11   
月    
28   
日    
に    
千葉   
県    
円空   
山    
で    
生まれ  
、    
ゲーム  
制作   
会社   
「    
ノーツ  
」    
の    
設立   
者    
だ    
。    
SRL PA2  
──────── 
◄─┐      
  ├►ガ    
◄─┘      
◄─┐      
  │      
  │      
  ├►時間   
  │      
  │      
◄─┘      
◄─┐      
  │      
  ├►デ    
  │      
◄─┘      
╟──►PRED 
         
         
         
         
         
         
         
         
         
         
         
         
Tok  
──── 
奈須   
きのこ  
は    
1973 
年    
11   
月    
28   
日    
に    
千葉   
県    
円空   
山    
で    
生まれ  
、    
ゲーム  
制作   
会社   
「    
ノーツ  
」    
の    
設立   
者    
だ    
。    
SRL PA3  
──────── 
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
◄─┐      
◄─┴►ノ    
╟──►PRED 
         
         
         
         
         
         
         
         
Tok  
──── 
奈須   
きのこ  
は    
1973 
年    
11   
月    
28   
日    
に    
千葉   
県    
円空   
山    
で    
生まれ  
、    
ゲーム  
制作   
会社   
「    
ノーツ  
」    
の    
設立   
者    
だ    
。    
SRL PA4  
──────── 
◄─┐      
  ├►ガ    
◄─┘      
         
         
         
         
         
         
         
         
         
         
         
         
         
         
◄─┐      
  │      
  │      
  ├►ヲ    
  │      
  │      
◄─┘      
╟──►PRED 
         
         
         
Tok  
──── 
奈須   
きのこ  
は    
1973 
年    
11   
月    
28   
日    
に    
千葉   
県    
円空   
山    
で    
生まれ  
、    
ゲーム  
制作   
会社   
「    
ノーツ  
」    
の    
設立   
者    
だ    
。    
SRL PA5  
──────── 
◄─┐      
  ├►ガ    
◄─┘      
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
╟──►PRED 
         
         
Tok  
──── 
奈須   
きのこ  
は    
1973 
年    
11   
月    
28   
日    
に    
千葉   
県    
円空   
山    
で    
生まれ  
、    
ゲーム  
制作   
会社   
「    
ノーツ  
」    
の    
設立   
者    
だ    
。    
PoS    3         4       5       6       7       8       9       10      11
───────────────────────────────────────────────────────────────────────────
NPR──┐                                                                     
NPR──┴►NP ─────┐                                                           
P ───────── ───┴────────────────────────────────────────────────►PP ───┐   
NUM──┐                                                                 │   
CL ──┴►NUMCLP──┐                                                       │   
NUM──┐         │                                                       │   
CL ──┴►NUMCLP──┼►NP ───┐                                               │   
NUM──┐         │       │                                               │   
CL ──┴►NUMCLP──┘       ├►PP ───┐                                       │   
P ───────── ───────────┘       │                                       │   
NPR──┐                         │                                       │   
NPR──┴►PP ─────┐               │                                       │   
NPR────────    ├►NP ───┐       ├────────────────────────────────►IP────┤   
NPR──────── ───┘       ├►PP────┤                                       │   
P ───────── ───────────┘       │                                       │   
VB ──────── ───────────────────┘                                       ├►IP
PU ──────── ───────────────────────────────────────────────────────────┤   
N ───┐                                                                 │   
N ───┴►NP ──────►PRN ──┐                                               │   
N ───────── ───────────┴►NP ────►PRN ──┐                               │   
PUL──────── ───────────────────────────┤                               │   
NPR──────── ───────────────────────────┼►NP ───┐                       │   
PUR──────── ───────────────────────────┘       ├►PP ───┐               │   
P ───────── ───────────────────────────────────┘       ├►IP ───┐       │   
N ───────── ───────────────────────────────────────────┘       ├►NP────┤   
N ───────── ───────────────────────────────────────────────────┘       │   
AX ──────── ───────────────────────────────────────────────────────────┤   
PU ──────── ───────────────────────────────────────────────────────────┘   
" ], "text/plain": [ "" ] }, "metadata": { "tags": [] }, "output_type": "display_data" } ], "source": [ "ja(['2021年、HanLPv2.1は次世代の最先端多言語NLP技術を本番環境に導入します。',\n", " '奈須きのこは1973年11月28日に千葉県円空山で生まれ、ゲーム制作会社「ノーツ」の設立者だ。',]).pretty_print()" ] }, { "cell_type": "markdown", "metadata": { "id": "NifrOGlNK7KD" }, "source": [ "There is also a joint multilingual model that supports [130 languages](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/mtl.html#hanlp.pretrained.mtl.UD_ONTONOTES_TOK_POS_LEM_FEA_NER_SRL_DEP_SDP_CON_MMINILMV2L6):" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "ae-4j5sbK7KD", "outputId": "2777cc5d-c1c5-4091-b754-0c220dafea8a" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [] }, { "data": { "text/html": [ "
Dep Tree   
────────── 
       ┌─► 
    ┌─►├── 
    │  └─► 
    │  ┌─► 
┌┬┬─┴──┴── 
│││  ┌───► 
│││  │┌──► 
│││  ││┌─► 
││└─►└┴┴── 
││    ┌──► 
││    │┌─► 
│└───►└┴── 
└────────► 
Token            
──────────────── 
In               
2021             
,                
HanLPv2.1        
delivers         
state-of-the-art 
multilingual     
NLP              
techniques       
to               
production       
environments     
.                
Relation 
──────── 
case     
obl      
punct    
nsubj    
root     
amod     
amod     
compound 
obj      
case     
compound 
obl      
punct    
Lemma            
──────────────── 
in               
2021             
,                
HANlpv2.1        
deliver          
state-of-the-art 
multilingual     
NLP              
technique        
to               
production       
environment      
.                
PoS   
───── 
ADP   
NUM   
PUNCT 
PROPN 
VERB  
ADJ   
ADJ   
PROPN 
NOUN  
ADP   
NOUN  
NOUN  
PUNCT 
Tok              
──────────────── 
In               
2021             
,                
HanLPv2.1        
delivers         
state-of-the-art 
multilingual     
NLP              
techniques       
to               
production       
environments     
.                
NER Type        
─────────────── 
                
───►DATE        
                
───►WORK_OF_ART 
                
                
                
                
                
                
                
                
                
Tok              
──────────────── 
In               
2021             
,                
HanLPv2.1        
delivers         
state-of-the-art 
multilingual     
NLP              
techniques       
to               
production       
environments     
.                
SRL PA1      
──────────── 
◄─┐          
◄─┴►ARGM-TMP 
             
───►ARG0     
╟──►PRED     
             
             
             
             
◄─┐          
  ├►ARG2     
◄─┘          
             
Tok              
──────────────── 
In               
2021             
,                
HanLPv2.1        
delivers         
state-of-the-art 
multilingual     
NLP              
techniques       
to               
production       
environments     
.                
PoS      3       4       5       6
──────────────────────────────────
ADP ───────────┐                  
NUM ────►NP ───┴────────►PP ───┐  
PUNCT──────────────────────────┤  
PROPN───────────────────►NP────┤  
VERB ──────────────────┐       │  
ADJ ───┐               │       │  
ADJ    │               │       │  
PROPN  ├────────►NP────┼►VP────┼►S
NOUN ──┘               │       │  
ADP ───────────┐       │       │  
NOUN ──┐       ├►PP ───┘       │  
NOUN ──┴►NP ───┘               │  
PUNCT──────────────────────────┘  

Dep Tree      
───────────── 
          ┌─► 
┌────────►├── 
│         └─► 
│┌───────►┌── 
││        └─► 
││        ┌─► 
││   ┌───►├── 
││   │    └─► 
││   │┌─────► 
││   ││┌────► 
││   │││┌───► 
││   ││││┌──► 
││   │││││┌─► 
││┌─►└┴┴┴┴┼── 
│││       └─► 
│││       ┌─► 
│││    ┌─►├── 
│││    │  └─► 
└┴┴────┴─┬┬── 
         │└─► 
         └──► 
Token     
───────── 
2021      
年         
、         
HanLPv2.1 
は         
次         
世代        
の         
最         
先端        
多         
言語        
NLP       
技術        
を         
本番        
環境        
に         
導入        
します       
。         
Relation 
──────── 
nummod   
obl      
punct    
nsubj    
case     
compound 
nmod     
case     
compound 
compound 
compound 
compound 
compound 
obj      
case     
compound 
obl      
case     
root     
aux      
punct    
Lemma     
───────── 
2021      
年         
、         
HANLPV2.1 
は         
次         
世代        
の         
最         
先端        
多         
言語        
NLP       
技術        
を         
本番        
環境        
に         
導入        
します       
。         
PoS   
───── 
NUM   
NOUN  
PUNCT 
NOUN  
ADP   
NOUN  
NOUN  
ADP   
NOUN  
NOUN  
NOUN  
NOUN  
NOUN  
NOUN  
ADP   
NOUN  
NOUN  
ADP   
VERB  
AUX   
PUNCT 
Tok       
───────── 
2021      
年         
、         
HanLPv2.1 
は         
次         
世代        
の         
最         
先端        
多         
言語        
NLP       
技術        
を         
本番        
環境        
に         
導入        
します       
。         
NER Type 
──────── 
◄─┐      
◄─┴►DATE 
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
Tok       
───────── 
2021      
年         
、         
HanLPv2.1 
は         
次         
世代        
の         
最         
先端        
多         
言語        
NLP       
技術        
を         
本番        
環境        
に         
導入        
します       
。         
PoS      3       4       5       6       7       8       9 
───────────────────────────────────────────────────────────
NUM ───────────────────────────────────────────────────┐   
NOUN ──────────────────────────────────────────────────┤   
PUNCT──────────────────────────────────────────────────┤   
NOUN ──────────────────────────────────────────────────┤   
ADP ───────────────────────────┐                       │   
NOUN ──────────────────────────┤                       │   
NOUN ──────────────────────────┤                       │   
ADP ───────────────────────────┼►VP ────►VP ────►IP────┤   
NOUN ───►ADJP──┐               │                       │   
NOUN ───►ADJP──┴►ADJP──┐       │                       │   
NOUN ───────────►ADJP──┴►ADJP──┘                       ├►IP
NOUN ──┐                                               │   
NOUN   ├►NP ───┐                                       │   
NOUN ──┘       ├►NP ───┐                               │   
ADP ───────────┘       │                               │   
NOUN ──────────────────┼►NP ───┐                       │   
NOUN ──────────────────┘       ├►NP ───┐               │   
ADP ────────────────────►PP ───┘       │               │   
VERB ──┐                               ├────────►NP────┤   
AUX ───┴────────────────────────►VP ───┘               │   
PUNCT──────────────────────────────────────────────────┘   

Dep Tree     
──────────── 
         ┌─► 
   ┌────►└── 
   │┌──────► 
   ││   ┌──► 
   ││   │┌─► 
   ││┌─►└┴── 
┌┬─┴┴┴────── 
││  ┌──────► 
││  │    ┌─► 
││  │┌──►└── 
││  ││   ┌─► 
││  ││┌─►└── 
││  │││  ┌─► 
│└─►└┴┴──┴── 
└──────────► 
Token     
───────── 
2021      
年         
HanLPv2.1 
为         
生产        
环境        
带来        
次世代       
最         
先进的       
多         
语种        
NLP       
技术        
。         
Relation  
───────── 
nummod    
nmod:tmod 
nsubj     
case      
nmod      
obl       
root      
nmod      
advmod    
amod      
nummod    
nmod      
nmod      
obj       
punct     
Lemma     
───────── 
2021      
年         
HANlpv2.1 
为         
生产        
环境        
带来        
次世代       
最         
先进的       
多         
语种        
NLP       
技术        
。         
PoS   
───── 
NUM   
NOUN  
X     
ADP   
NOUN  
NOUN  
VERB  
NOUN  
ADV   
ADJ   
NUM   
NOUN  
X     
NOUN  
PUNCT 
Tok       
───────── 
2021      
年         
HanLPv2.1 
为         
生产        
环境        
带来        
次世代       
最         
先进的       
多         
语种        
NLP       
技术        
。         
NER Type   
────────── 
◄─┐        
◄─┴►DATE   
───►PERSON 
           
           
           
           
           
           
           
           
           
           
           
           
Tok       
───────── 
2021      
年         
HanLPv2.1 
为         
生产        
环境        
带来        
次世代       
最         
先进的       
多         
语种        
NLP       
技术        
。         
SRL PA1      
──────────── 
◄─┐          
◄─┴►ARGM-TMP 
             
             
             
             
╟──►PRED     
             
             
             
             
             
             
             
             
Tok       
───────── 
2021      
年         
HanLPv2.1 
为         
生产        
环境        
带来        
次世代       
最         
先进的       
多         
语种        
NLP       
技术        
。         
PoS      3       4       5       6       7       8 
───────────────────────────────────────────────────
NUM ───┐                                           
NOUN ──┴────────────────────────────────►NP ───┐   
X ──────────────────────────────────────►NP────┤   
ADP ───────────┐                               │   
NOUN ──┐       ├────────────────►PP ───┐       │   
NOUN ──┴►NP ───┘                       │       │   
VERB ──────────────────────────┐       ├►VP────┤   
NOUN ───────────►ADJP──┐       │       │       │   
ADV ────►ADVP──┐       │       ├►VP ───┘       ├►IP
ADJ ────►ADJP──┴►ADJP──┤       │               │   
NUM ────►QP ───┐       ├►NP ───┘               │   
NOUN ───►NP ───┴►NP────┤                       │   
X ─────┐               │                       │   
NOUN ──┴────────►NP ───┘                       │   
PUNCT──────────────────────────────────────────┘   
" ], "text/plain": [ "" ] }, "metadata": { "tags": [] }, "output_type": "display_data" } ], "source": [ "from hanlp.utils.torch_util import gpus_available\n", "if gpus_available(): # XLMR_BASE is best run on a GPU; otherwise fall back to the mini model\n", "    mul = hanlp.load(hanlp.pretrained.mtl.UD_ONTONOTES_TOK_POS_LEM_FEA_NER_SRL_DEP_SDP_CON_XLMR_BASE)\n", "else:\n", "    if 'ja' in globals(): # Binder has only 2 GB of memory, so release the already-loaded model\n", "        del ja\n", "    mul = hanlp.load(hanlp.pretrained.mtl.UD_ONTONOTES_TOK_POS_LEM_FEA_NER_SRL_DEP_SDP_CON_MMINILMV2L6)\n", "mul(['In 2021, HanLPv2.1 delivers state-of-the-art multilingual NLP techniques to production environments.',\n", "     '2021年、HanLPv2.1は次世代の最先端多言語NLP技術を本番環境に導入します。',\n", "     '2021年 HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。']).pretty_print()" ] }, { "cell_type": "markdown", "metadata": { "id": "0QV_93CjK7KD" }, "source": [ "You can enter any code you would like to run below." ] } ], "metadata": { "accelerator": "GPU", "colab": { "collapsed_sections": [], "name": "tutorial.ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" } }, "nbformat": 4, "nbformat_minor": 1 }