{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "

Table of Contents

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 原理" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$Score = \\sum_{n=1}^N \\{ \\sum_{all\\ w_1...w_n\\ that\\ co-occur} Info(w_1...w_n) / \\sum_{all\\ w_1...w_n\\ in\\ sys\\ output} (1) · \\exp [\\beta \\log^2(min\\{\\frac{L_{sys}}{\\overline{L_{ref}}}, 1\\})] \\} $$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$Info(w_1...w_n) = \\log_2 \\frac{the\\ \\#\\ of\\ occurrences\\ of\\ w_1...w_{n-1}}{the\\ \\#\\ of\\ occurrences\\ of\\ w_1...w_{n}}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "其中:\n", "- β 是一个常数(经验阈值),使得 Lhyp/Lref = 2/3 时,β 使得长度惩罚系数为 0.5\n", "- $\\overline{L_{ref}}$ 是参考译文的平均长度" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "关于 β 的计算如下:\n", "\n", "$e^{\\beta \\ln^2(2/3)} = 1/2$,两边同时取 ln 为底对数:\n", "\n", "$\\beta = \\frac{\\ln(1/2)}{\\ln^2(2/3)}$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 参考 Code" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "hypothese1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which','ensures', 'that', 'the', 'military', 'always','obeys', 'the', 'commands', 'of', 'the', 'party']\n", "hypothese2 = ['It', 'is', 'to', 'insure', 'the', 'troops','forever', 'hearing', 'the', 'activity', 'guidebook','that', 'party', 'direct']\n", "reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that','ensures', 'that', 'the', 'military', 'will', 'forever','heed', 'Party', 'commands']\n", "reference2 = ['It', 'is', 'the', 'guiding', 'principle', 'which','guarantees', 'the', 'military', 'forces', 'always','being', 'under', 'the', 'command', 'of', 'the','Party']\n", "reference3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the','army', 'always', 'to', 'heed', 'the', 'directions','of', 'the', 'party']" ] }, { "cell_type": "code", "execution_count": 199, "metadata": {}, "outputs": [], "source": [ "import math\n", "import fractions\n", "from 
collections import Counter\n", "\n", "import nltk\n", "from nltk.util import ngrams" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Compute the NIST score\n", "def sentence_nist(references, hypothesis, n=5):\n", "    \"\"\"\n", "    Calculate NIST score from\n", "    George Doddington. 2002. \"Automatic evaluation of machine translation quality\n", "    using n-gram co-occurrence statistics.\" Proceedings of HLT.\n", "    Morgan Kaufmann Publishers Inc. http://dl.acm.org/citation.cfm?id=1289189.1289273\n", "\n", "    DARPA commissioned NIST to develop an MT evaluation facility based on the BLEU\n", "    score. The official script used by NIST to compute BLEU and NIST score is\n", "    mteval-14.pl. The main differences are:\n", "\n", "     - BLEU uses geometric mean of the ngram overlaps, NIST uses arithmetic mean.\n", "     - NIST has a different brevity penalty\n", "     - NIST score from mteval-14.pl has a self-contained tokenizer\n", "\n", "    Note: The mteval-14.pl includes a smoothing function for BLEU score that is NOT\n", "          used in the NIST score computation.\n", "\n", "    >>> hypothesis1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',\n", "    ...               'ensures', 'that', 'the', 'military', 'always',\n", "    ...               'obeys', 'the', 'commands', 'of', 'the', 'party']\n", "\n", "    >>> hypothesis2 = ['It', 'is', 'to', 'insure', 'the', 'troops',\n", "    ...               'forever', 'hearing', 'the', 'activity', 'guidebook',\n", "    ...               'that', 'party', 'direct']\n", "\n", "    >>> reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',\n", "    ...               'ensures', 'that', 'the', 'military', 'will', 'forever',\n", "    ...               'heed', 'Party', 'commands']\n", "\n", "    >>> reference2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',\n", "    ...               'guarantees', 'the', 'military', 'forces', 'always',\n", "    ...               'being', 'under', 'the', 'command', 'of', 'the',\n", "    ...               'Party']\n", "\n", "    >>> reference3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',\n", "    ... 
'army', 'always', 'to', 'heed', 'the', 'directions',\n", "    ...               'of', 'the', 'party']\n", "\n", "    >>> sentence_nist([reference1, reference2, reference3], hypothesis1)  # doctest: +ELLIPSIS\n", "    3.3709...\n", "\n", "    >>> sentence_nist([reference1, reference2, reference3], hypothesis2)  # doctest: +ELLIPSIS\n", "    1.4619...\n", "\n", "    :param references: reference sentences\n", "    :type references: list(list(str))\n", "    :param hypothesis: a hypothesis sentence\n", "    :type hypothesis: list(str)\n", "    :param n: highest n-gram order\n", "    :type n: int\n", "    \"\"\"\n", "    return corpus_nist([references], [hypothesis], n)" ] }, { "cell_type": "code", "execution_count": 203, "metadata": {}, "outputs": [], "source": [ "# Compute the length penalty factor.\n", "# beta is an empirical value chosen so that the penalty is 0.5 when hyp_len/ref_len = 2/3.\n", "def nist_length_penalty(ref_len, hyp_len):\n", "    \"\"\"\n", "    Calculates the NIST length penalty, from Eq. 3 in Doddington (2002)\n", "\n", "        penalty = exp( beta * log( min( len(hyp)/len(ref) , 1.0 )) ** 2 )\n", "\n", "    where,\n", "\n", "        `beta` is chosen to make the brevity penalty factor = 0.5 when the\n", "        no. of words in the system output (hyp) is 2/3 of the average\n", "        no. of words in the reference translation (ref)\n", "\n", "    The NIST penalty differs from BLEU's in that it minimizes the impact\n", "    on the score of small variations in the length of a translation.\n", "    See Fig. 4 in Doddington (2002)\n", "    \"\"\"\n", "    ratio = hyp_len / ref_len\n", "    if 0 < ratio < 1:\n", "        # ratio_x was originally 1.5; because the log is squared, 2/3 gives the same beta.\n", "        ratio_x, score_x = 2/3, 1/2\n", "        beta = math.log(score_x) / (math.log(ratio_x) ** 2)\n", "        return math.exp(beta * math.log(ratio) ** 2)\n", "    else:  # ratio <= 0 or ratio >= 1\n", "        return max(min(ratio, 1.0), 0.0)" ] }, { "cell_type": "code", "execution_count": 189, "metadata": {}, "outputs": [], "source": [ "def corpus_nist(list_of_references, hypotheses, n=5):\n", "    \"\"\"\n", "    Calculate a single corpus-level NIST score (aka. 
system-level NIST) for all\n", "    the hypotheses and their respective references.\n", "\n", "    :param list_of_references: a corpus of lists of reference sentences, w.r.t. hypotheses\n", "    :type list_of_references: list(list(list(str))), [[[], []...]]\n", "    :param hypotheses: a list of hypothesis sentences\n", "    :type hypotheses: list(list(str)), [[]]\n", "    :param n: highest n-gram order\n", "    :type n: int\n", "    \"\"\"\n", "    # Before proceeding to compute NIST, perform sanity checks.\n", "    assert len(list_of_references) == len(\n", "        hypotheses\n", "    ), \"The number of hypotheses and their reference(s) should be the same\"\n", "\n", "    # Collect the ngram counts from the reference sentences.\n", "    ngram_freq = Counter()\n", "    total_reference_words = 0\n", "    for references in list_of_references:  # For each source sent, there's a list of reference sents.\n", "        for reference in references:\n", "            # For each order of ngram, count the ngram occurrences.\n", "            for i in range(1, n + 1):\n", "                ngram_freq.update(ngrams(reference, i))\n", "            total_reference_words += len(reference)\n", "\n", "    # Compute the information weights based on the reference sentences.\n", "    # Eqn 2 in Doddington (2002):\n", "    # Info(w_1 ... w_n) = log_2 [ (# of occurrences of w_1 ... w_n-1) / (# of occurrences of w_1 ... w_n) ]\n", "    information_weights = {}\n", "    for _ngram in ngram_freq:  # w_1 ... w_n\n", "        _mgram = _ngram[:-1]  # w_1 ... 
w_n-1\n", "        # From https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/mteval-v13a.pl#L546\n", "        # it's computed as such:\n", "        #     denominator = ngram_freq[_mgram] if _mgram and _mgram in ngram_freq\n", "        #     else denominator = total_reference_words\n", "        #     information_weights[_ngram] = -1 * math.log(ngram_freq[_ngram]/denominator) / math.log(2)\n", "        #\n", "        # Mathematically, it's equivalent to our implementation:\n", "        if _mgram and _mgram in ngram_freq:\n", "            numerator = ngram_freq[_mgram]\n", "        else:\n", "            # A unigram has no (n-1)-gram prefix, so use the total number of\n", "            # reference words; this gives rare single words a high weight.\n", "            numerator = total_reference_words\n", "        information_weights[_ngram] = math.log(numerator / ngram_freq[_ngram], 2)\n", "\n", "    # Micro-average.\n", "    # For each n in 1..5, accumulate the numerator and the denominator (5 values each).\n", "    nist_precision_numerator_per_ngram = Counter()\n", "    nist_precision_denominator_per_ngram = Counter()\n", "    l_ref, l_sys = 0, 0\n", "    # For each order of ngram.\n", "    for i in range(1, n + 1):\n", "        # Iterate through each hypothesis and their corresponding references.\n", "        for references, hypothesis in zip(list_of_references, hypotheses):\n", "            hyp_len = len(hypothesis)\n", "\n", "            # Find reference with the best NIST score.\n", "            nist_score_per_ref = []\n", "            for reference in references:\n", "                _ref_len = len(reference)\n", "                # Counter of ngrams in hypothesis.\n", "                hyp_ngrams = (\n", "                    Counter(ngrams(hypothesis, i)) if len(hypothesis) >= i else Counter()\n", "                )\n", "                ref_ngrams = (\n", "                    Counter(ngrams(reference, i)) if len(reference) >= i else Counter()\n", "                )\n", "                # Counter intersection keeps the smaller count of each shared ngram.\n", "                ngram_overlaps = hyp_ngrams & ref_ngrams\n", "                # Precision part of the score in Eqn 3\n", "                _numerator = sum(\n", "                    information_weights[_ngram] * count for _ngram, count in ngram_overlaps.items()\n", "                )\n", "                _denominator = sum(hyp_ngrams.values())\n", "                _precision = 0 if _denominator == 0 else _numerator / _denominator\n", "                nist_score_per_ref.append(\n", "                    (_precision, _numerator, _denominator, _ref_len)\n", "                )\n", "            # Best reference.\n", "            # 
max() on a list of tuples compares them element by element, starting from the first.\n", "            # Priority: precision > numerator > denominator > reference length\n", "            precision, numerator, denominator, ref_len = max(nist_score_per_ref)\n", "            nist_precision_numerator_per_ngram[i] += numerator\n", "            nist_precision_denominator_per_ngram[i] += denominator\n", "            l_ref += ref_len\n", "            l_sys += hyp_len\n", "\n", "    # Final NIST micro-average mean aggregation.\n", "    nist_precision = 0\n", "    for i in nist_precision_numerator_per_ngram:\n", "        precision = (\n", "            nist_precision_numerator_per_ngram[i]\n", "            / nist_precision_denominator_per_ngram[i]\n", "        )\n", "        nist_precision += precision\n", "    # Eqn 3 in Doddington(2002)\n", "    return nist_precision * nist_length_penalty(l_ref, l_sys)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Walkthrough" ] }, { "cell_type": "code", "execution_count": 190, "metadata": {}, "outputs": [], "source": [ "# Reference Ngram Freq\n", "ngram_freq = Counter()\n", "total_reference_words = 0\n", "for references in [[reference1, reference2, reference3]]:\n", "    for reference in references:\n", "        for i in range(1, 5 + 1):\n", "            ngram_freq.update(ngrams(reference, i))\n", "        total_reference_words += len(reference)" ] }, { "cell_type": "code", "execution_count": 191, "metadata": {}, "outputs": [], "source": [ "# Information Weights\n", "information_weights = {}\n", "for _ngram in ngram_freq:\n", "    _mgram = _ngram[:-1]\n", "    if _mgram and _mgram in ngram_freq:\n", "        numerator = ngram_freq[_mgram]\n", "    else:\n", "        numerator = total_reference_words\n", "    information_weights[_ngram] = math.log(numerator / ngram_freq[_ngram], 2)" ] }, { "cell_type": "code", "execution_count": 192, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "3.3709935957649324" ] }, "execution_count": 192, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corpus_nist([[reference1, reference2, reference3]], [hypothese1])" ] }, { "cell_type": "code", "execution_count": 195, "metadata": {}, "outputs": [ { "data": { "text/plain": [ 
"3.3685866013071895" ] }, "execution_count": 195, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 下面以 hypothese1 为例来说明:\n", "\n", "nist_score = (51.71/18 + 6.75/17 + 1.58/16 + 0/15 + 0/14) * nist_length_penalty(16*3+18+18, 18*5)\n", "nist_score" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1-Gram\n", "\n", "| gram | hyp | ref1 | overlaps | ref2 | overlaps | ref3 | overlaps |\n", "| -------- | ------ | ------- | --------- | ------- | --------- | ------- | --------- |\n", "| It | 1 | 1 | 4.06\\*1 | 1 | 4.06\\*1 | 1 | 4.06\\*1 |\n", "| is | 1 | 1 | 4.06\\*1 | 1 | 4.06\\*1 | 1 | 4.06\\*1 |\n", "| a | 1 | 1 | 5.64\\*1 | 0 | 0 | 0 | 0 |\n", "| guide | 1 | 1 | 4.64\\*1 | 0 | 0 | 1 | 4.64\\*1 |\n", "| to | 1 | 1 | 4.64\\*1 | 0 | 0 | 1 | 4.64\\*1 |\n", "| action | 1 | 1 | 5.64\\*1 | 0 | 0 | 0 | 0 |\n", "| which | 1 | 0 | 0 | 1 | 5.64\\*1 | 0 | 0 |\n", "| ensures | 1 | 1 | 5.64\\*1 | 0 | 0 | 0 | 0 |\n", "| that | 1 | 2 | 4.64\\*1 | 0 | 0 | 0 | 0 |\n", "| the | 3 | 1 | 2.47\\*1 | 4 | 2.47\\*3 | 4 | 2.47\\*3 |\n", "| military | 1 | 1 | 4.64\\*1 | 1 | 4.64\\*1 | 0 | 0 |\n", "| always | 1 | 0 | 0 | 1 | 4.64\\*1 | 1 | 4.64\\*1 |\n", "| obeys | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| commands | 1 | 1 | 5.64\\*1 | 0 | 0 | 0 | 0 |\n", "| of | 1 | 0 | 0 | 1 | 4.64\\*1 | 1 | 4.64\\*1 |\n", "| party | 1 | 0 | 0 | 0 | 0 | 1 | 5.64\\*1 |\n", "| **SUM** | **18** | **16**(len) | **51.71** | 18(len) | 35.09 | 16(len) | 39.73 |\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2-Gram\n", "\n", "| gram | hyp | ref1 | overlaps | ref2 | overlaps | ref3 | overlaps |\n", "| --------------- | ------ | ------- | -------- | ------- | -------- | ------- | -------- |\n", "| It is | 1 | 1 | 0\\*1 | 1 | 0\\*1 | 1 | 0\\*1 |\n", "| is a | 1 | 1 | 1.58\\*1 | 0 | 0 | 0 | 0 |\n", "| a guide | 1 | 1 | 0\\*1 | 0 | 0 | 0 | 0 |\n", "| guide to | 1 | 1 | 1\\*1 | 0 | 0 | 0 | 0 |\n", "| to action | 1 | 1 | 1\\*1 | 0 | 0 | 0 | 0 |\n", "| action which | 1 | 0 | 0 | 0 
| 0 | 0 | 0 |\n", "| which ensures | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| ensures that | 1 | 1 | 0\\*1 | 0 | 0 | 0 | 0 |\n", "| that the | 1 | 1 | 1\\*1 | 0 | 0 | 0 | 0 |\n", "| the military | 1 | 1 | 2.17\\*1 | 1 | 2.17\\*1 | 0 | 0 |\n", "| military always | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| always obeys | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| obeys the | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| the commands | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| commands of | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| of the | 1 | 0 | 0 | 1 | 0\\*1 | 1 | 0\\*1 |\n", "| the party | 1 | 0 | 0 | 0 | 0 | 1 | 3.17\\*1 |\n", "| **SUM** | **17** | **16**(len) | **6.75** | 18(len) | 2.17 | 16(len) | 3.17 |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3-Gram\n", "\n", "| gram | hyp | ref1 | overlaps | ref2 | overlaps | ref3 | overlaps |\n", "| --------------------- | ------ | ----------- | -------- | ------- | -------- | ------- | -------- |\n", "| It is a | 1 | 1 | 1.58\\*1 | 0 | 0 | 0 | 0 |\n", "| is a guide | 1 | 1 | 0\\*1 | 0 | 0 | 0 | 0 |\n", "| a guide to | 1 | 1 | 0\\*1 | 0 | 0 | 0 | 0 |\n", "| guide to action | 1 | 1 | 0\\*1 | 0 | 0 | 0 | 0 |\n", "| to action which | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| action which ensures | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| which ensures that | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| ensures that the | 1 | 1 | 0\\*1 | 0 | 0 | 0 | 0 |\n", "| that the military | 1 | 1 | 0\\*1 | 0 | 0 | 0 | 0 |\n", "| the military always | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| military always obeys | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| always obeys the | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| obeys the commands | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| the commands of | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| commands of the | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| of the party | 1 | 0 | 0 | 0 | 0 | 1 | 1\\*1 |\n", "| **SUM** | **16** | **16**(len) | **1.58** | 18(len) | 0 | 16(len) | 1 |\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4-Gram\n", "\n", "| gram | hyp | ref1 
| overlaps | ref2 | overlaps | ref3 | overlaps |\n", "| ------------------------- | ------ | ----------- | -------- | ------- | -------- | ------- | -------- |\n", "| It is a guide | 1 | 1 | 0\\*1 | 0 | 0 | 0 | 0 |\n", "| is a guide to | 1 | 1 | 0\\*1 | 0 | 0 | 0 | 0 |\n", "| a guide to action | 1 | 1 | 0\\*1 | 0 | 0 | 0 | 0 |\n", "| guide to action which | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| to action which ensures | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| action which ensures that | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| which ensures that the | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| ensures that the military | 1 | 1 | 0\\*1 | 0 | 0 | 0 | 0 |\n", "| that the military always | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| the military always obeys | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| military always obeys the | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| always obeys the commands | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| obeys the commands of | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| the commands of the | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| commands of the party | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| **SUM** | **15** | 16(len) | 0 | **18**(len) | **0** | 16(len) | 0 |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5-Gram\n", "\n", "| gram | hyp | ref1 | overlaps | ref2 | overlaps | ref3 | overlaps |\n", "| ---------------------------------- | ------ | ----------- | -------- | ------- | -------- | ------- | -------- |\n", "| It is a guide to | 1 | 1 | 0\\*1 | 0 | 0 | 0 | 0 |\n", "| is a guide to action | 1 | 1 | 0\\*1 | 0 | 0 | 0 | 0 |\n", "| a guide to action which | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| guide to action which ensures | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| to action which ensures that | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| action which ensures that the | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| which ensures that the military | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| ensures that the military always | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| that the military always obeys | 1 | 0 | 0 | 0 | 0 | 0 | 0 
|\n", "| the military always obeys the | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| military always obeys the commands | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| always obeys the commands of | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| obeys the commands of the | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| the commands of the party | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| **SUM** | **14** | 16(len) | 0 | **18**(len) | **0** | 16(len) | 0 |" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.3709935957649324" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sentence_nist([reference1, reference2, reference3], hypothese1)" ] }, { "cell_type": "code", "execution_count": 202, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5045666840058485" ] }, "execution_count": 202, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# BLEU, for comparison\n", "nltk.translate.bleu([reference1, reference2, reference3], hypothese1)" ] }, { "cell_type": "code", "execution_count": 141, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.4619035460750132" ] }, "execution_count": 141, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sentence_nist([reference1, reference2, reference3], hypothese2)" ] }, { "cell_type": "code", "execution_count": 201, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5.92086005993801e-155" ] }, "execution_count": 201, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# BLEU, for comparison\n", "nltk.translate.bleu([reference1, reference2, reference3], hypothese2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "The step-by-step walkthrough makes the differences from BLEU easy to see:\n", "\n", "- Counting: for each n-gram, BLEU takes the maximum count across the references and clips the hypothesis count to it; NIST instead multiplies the clipped count of each n-gram by an information weight, computed from all the references with the Info formula\n", "- Precision: for each n-gram order, BLEU sums the clipped counts from the previous step and divides by the total number of n-grams in the hypothesis; NIST sums the weighted counts, divides by the total number of n-grams in the hypothesis, and keeps the best value over the references\n", "- Aggregation: across n-gram orders, BLEU takes a weighted sum of the log precisions (a geometric mean), while NIST simply adds the per-order values\n", 
"- 惩罚因子除计算方法不同外,BLEU 只考虑与假设译文最接近的参考译文的长度,而 NIST 则考虑了每个 Ngram" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 参考\n", "\n", "- [NIST](http://localhost:8888/notebooks/Yam/All4NLP/BLEU/NIST.ipynb)\n", "- [Automatic evaluation of machine translation quality using n-gram co-occurrence statistics](http://www.mt-archive.info/HLT-2002-Doddington.pdf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">吐槽一下,公式真的很难看懂……" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }