We're going to replicate the benchmark in [A Named Entity Based Approach to Model Recipes](https://arxiv.org/abs/2004.12184), by Diwan, Batra, and Bagler using StanfordNLP, and check it using [seqeval](https://github.com/chakki-works/seqeval).

Evaluating NER is surprisingly tricky, as [David Batista explains](https://www.davidsbatista.net/blog/2018/05/09/Named_Entity_Evaluation/), and I want to check that the results in the paper are the same as what seqeval gives, so I can compare it to other models.

The authors share their data in an [associated git repository](https://github.com/cosylabiiit/recipe-knowledge-mining) and train a model using [Stanford NER](https://nlp.stanford.edu/software/CRF-NER.html), which is open source, so we have a chance of replicating the results.

# Installing Stanford NLP

We're going to install Stanford NLP, which is a Java library.
To make things easier we will use [stanza](https://stanfordnlp.github.io/stanza/) which includes tools for [installing and invoking Stanford NLP](https://stanfordnlp.github.io/stanza/corenlp_client.html).

In [1]:
    !pip install stanza

Collecting stanza
  Downloading stanza-1.3.0-py3-none-any.whl (432 kB)
     |████████████████████████████████| 432 kB 292 kB/s            
Installing collected packages: stanza
Successfully installed stanza-1.3.0


We can specify where to install Core NLP, but we will use the default, which is either `$CORENLP_HOME` or `$HOME/stanza_corenlp`. (Ideally we'd use stanza to get this, but I couldn't easily work out how.)

In [2]:
import stanza
stanza.install_corenlp()

Downloading https://huggingface.co/stanfordnlp/CoreNLP/resolve/main/stanford-corenlp-latest.zip:   0%|        …

We'll need to invoke the Stanford Core NLP JAR that we just installed, so let's find it.

In [3]:
import os
import re
from pathlib import Path


# Reimplement the logic to find the path where stanza_corenlp is installed.
core_nlp_path = os.getenv('CORENLP_HOME', str(Path.home() / 'stanza_corenlp'))

# A heuristic to find the right jar file
classpath = [str(p) for p in Path(core_nlp_path).iterdir() if re.match(r"stanford-corenlp-[0-9.]+\.jar", p.name)][0]
classpath

'/root/stanza_corenlp/stanford-corenlp-4.4.0.jar'

Let's test the [basic usage](https://stanfordnlp.github.io/stanza/client_usage.html).

There are currently models for 8 languages, and for some fairly complex tasks like coreference resolution.

In [4]:
from stanza.server import CoreNLPClient

text = "David Batista wrote a blog post on NER evaluation. " \
       "Hiroki Nakayama wrote seqeval to evaluate sequential labelling tasks, such as NER. " \
       "We will test his library against Stanford Core NLP. "

with CoreNLPClient(
     annotators=['tokenize','ssplit','pos','lemma','ner', 'parse', 'depparse','coref'],
        timeout=30000,
        memory='6G') as client:
    
    ann =  client.annotate(text)

[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called ---
[main] INFO CoreNLP - Server default properties:
			(Note: unspecified annotator properties are English defaults)
			annotators = tokenize,ssplit,pos,lemma,ner,parse,depparse,coref
			inputFormat = text
			outputFormat = serialized
			prettyPrint = false
			threads = 5
[main] INFO CoreNLP - Threads: 5
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[main] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger ... done [1.1 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Load

David Batista wrote a blog post on NER evaluation. Hiroki Nakayama wrote seqeval to evaluate sequential labelling tasks, such as NER. We will test his library against Stanford Core NLP. 


[Thread-0] INFO CoreNLP - CoreNLP Server is shutting down.


We get 3 sentences out.

In [5]:
for sentence in ann.sentence:
    print(" ".join([token.word for token in sentence.token]))

David Batista wrote a blog post on NER evaluation .
Hiroki Nakayama wrote seqeval to evaluate sequential labelling tasks , such as NER .
We will test his library against Stanford Core NLP .


It can even do clever things like coreference resolution, working out that "his library" refers to Hiroki Nakayama's library.

In [6]:
for chain in ann.corefChain:
    print([ann.mentionsForCoref[mention.mentionID].headString for mention in chain.mention])

['nakayama', 'his']


We can extract things such as lemmas, parts of speech and standard NER tags.

But we want to train our own NER model to detect ingredients. First we will need to collect the data.

In [7]:
import pandas as pd

tokens = ann.sentence[1].token

pd.DataFrame({'word': [s.word for s in tokens],
              'lemma': [s.lemma for s in tokens],
              'pos': [s.pos for s in tokens],
              'ner': [s.ner for s in tokens]}).T

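|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| word | Hiroki | Nakayama | wrote | seqeval | to | evaluate | sequential | labelling | tasks | , | such | as | NER | . |
| lemma | Hiroki | Nakayama | write | seqeval | to | evaluate | sequential | labelling | task | , | such | as | ner | . |
| pos | NNP | NNP | VBD | NN | TO | VB | JJ | NN | NNS | , | JJ | IN | NN | . |
| ner | PERSON | PERSON | O | O | O | O | O | O | O | O | O | O | O | O |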


# Get Data

Helpfully, the authors provide the annotated ingredient data in the format Stanford NER expects, and we can download it [from GitHub](https://github.com/cosylabiiit/recipe-knowledge-mining).

There are two sources of ingredients: `ar` is AllRecipes and `gk` is FOOD.com (formerly GeniusKitchen.com).

In [8]:
from urllib.request import urlretrieve

data_sources = ['ar', 'gk']
data_splits = ['train', 'test']

base_url = 'https://raw.githubusercontent.com/cosylabiiit/recipe-knowledge-mining/master/'

def data_filename(source, split):
    return f'{source}_{split}.tsv'

for source in data_sources:
    for split in data_splits:
        name = data_filename(source, split)
        urlretrieve(base_url + name, name)

Each line of the file is either a single tab (separating different texts), or a token followed by a tab and then the entity type.

So for example the first ingredient is `4 cloves garlic`, which is a quantity (4) followed by a unit (cloves) and a name (garlic).

In [9]:
!head {data_filename('ar', 'train')} | cat -t

^I
4^IQUANTITY
cloves^IUNIT
garlic^INAME
^I
2^IQUANTITY
tablespoons^IUNIT
vegetable^INAME
oil^INAME
,^IO


We can read this into Python, converting it to a list of annotated sentences, each of which is just a sequence of (token, label) pairs.

In [10]:
from typing import List, Tuple, Generator

Annotation = Tuple[str, str]
AnnotatedSentence = List[Annotation]

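def segment_texts(data: str) -> Generator[AnnotatedSentence, None, None]:
    output = []
    for line in data.split('\n'):
        if line.strip():
            # Each non-blank line is a token, a tab, then its label
            token, label = line.split('\t')
            output.append((token.strip(), label.strip()))
        elif output:
            # A blank (or tab-only) line ends the current text
            yield output
            output = []
    # Emit the final text if the data doesn't end with a separator
    if output:
        yield output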
            
def segment_file(filename: str) -> List[AnnotatedSentence]:
    with open(filename, 'rt') as f:
        return list(segment_texts(f.read()))

In [11]:
ar_train = segment_file(data_filename('ar', 'train'))

In [12]:
ar_train[:2]

[[('4', 'QUANTITY'), ('cloves', 'UNIT'), ('garlic', 'NAME')],
 [('2', 'QUANTITY'),
  ('tablespoons', 'UNIT'),
  ('vegetable', 'NAME'),
  ('oil', 'NAME'),
  (',', 'O'),
  ('divided', 'STATE')]]
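
Going the other way can be handy, for example to build a new training file. Here is a minimal sketch of the inverse, assuming the layout described above (one `token<TAB>label` pair per line, with a line holding a single tab before each text). `to_tsv` is a hypothetical helper introduced here, not something from the authors' repository.

```python
from typing import List, Tuple

AnnotatedSentence = List[Tuple[str, str]]

def to_tsv(sentences: List[AnnotatedSentence]) -> str:
    """Serialise annotated sentences as token<TAB>label lines.

    Assumes each text is preceded by a separator line containing a single tab,
    mirroring the `head` output shown earlier.
    """
    lines = []
    for sentence in sentences:
        lines.append('\t')  # separator line between texts
        lines.extend(f'{token}\t{label}' for token, label in sentence)
    return '\n'.join(lines) + '\n'

example = [[('4', 'QUANTITY'), ('cloves', 'UNIT'), ('garlic', 'NAME')],
           [('2', 'QUANTITY'), ('tablespoons', 'UNIT')]]
tsv_text = to_tsv(example)
```

Feeding `tsv_text` back through `segment_texts` should return `example` again.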

We can then calculate the number of sentences in the training set for a source.

In [13]:
len(ar_train)

1470

We can use this to check the types of entities annotated, as in the paper (DF is Dried/Fresh).

In [14]:
from collections import Counter

tag_counts = Counter([annotation[1] for sentence in ar_train for annotation in sentence])
tag_counts

Counter({'QUANTITY': 1583,
         'UNIT': 1338,
         'NAME': 2501,
         'O': 1662,
         'STATE': 879,
         'DF': 154,
         'SIZE': 64,
         'TEMP': 31})

# Train NER Model

Now we want to train a Stanford NER model on the new annotations.

First we have to configure it, but the paper gives no information on how the model was configured.
I've copied this template configuration out of the [FAQ](https://nlp.stanford.edu/software/crf-faq.html).
For more information on the parameters you can check the [NERFeatureFactory documentation](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ie/NERFeatureFactory.html) or the [source](https://github.com/stanfordnlp/CoreNLP/blob/main/src/edu/stanford/nlp/ie/NERFeatureFactory.java).

In [15]:
def ner_prop_str(train_files: List[str], test_files: List[str], output: str) -> str:
    """Returns configuration string to train NER model"""
    train_file_str = ','.join(train_files)
    test_file_str = ','.join(test_files)
    return f"""
trainFileList = {train_file_str}
testFiles = {test_file_str}
serializeTo = {output}
map = word=0,answer=1

useClassFeature=true
useWord=true
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
useDisjunctive=true
"""

This is expected to be a file, so let's write a helper that writes it to a file. (An alternative would be to pass these as arguments to the trainer).

In [16]:
def write_ner_prop_file(ner_prop_file: str, train_files: List[str], test_files: List[str], output_file: str) -> None:
    with open(ner_prop_file, 'wt') as f:
        props = ner_prop_str(train_files, test_files, output_file)
        f.write(props)

Stanza doesn't give an interface to train a CRF NER model using Stanford NLP, but we can invoke `edu.stanford.nlp.ie.crf.CRFClassifier` directly.

Let's write a properties file and invoke Java to run the classifier.
It prints a lot of training information, and importantly a summary report at the end which we want to see.

In [17]:
import subprocess
from typing import List

def train_model(model_name, train_files: List[str], test_files: List[str], print_report=True, classpath=classpath) -> str:
    """Trains CRF NER Model using StanfordNLP"""
    model_file = f'{model_name}.model.ser.gz'
    ner_prop_filename = f'{model_name}.model.props'
    write_ner_prop_file(ner_prop_filename, train_files, test_files, model_file)
        
    result = subprocess.run(
                ['java',
                 '-Xmx2g',
                 '-cp', classpath,
                 'edu.stanford.nlp.ie.crf.CRFClassifier',
                 '-prop', ner_prop_filename],
                capture_output=True)
    
    # If there's an error with invocation better log the stacktrace
    if result.returncode != 0:
        print(result.stderr.decode('utf-8'))
    result.check_returncode()
    
    if print_report:
        print(*result.stderr.decode('utf-8').split('\n')[-11:], sep='\n')
        
    return model_file

We can train models on each dataset separately, and all together.
For evaluation we'll use the corresponding test set.

This only takes a few minutes.

In [18]:
%%time

models = {}
for source in ['ar', 'gk', 'ar_gk']:
    print(source)
    train_files = [data_filename(s, 'train') for s in source.split('_')]
    test_files = [data_filename(s, 'test') for s in source.split('_')]
    models[source] = train_model(source, train_files, test_files)
    print()

ar
CRFClassifier tagged 2788 words in 483 documents at 7185.57 words per second.
         Entity	P	R	F1	TP	FP	FN
             DF	1.0000	0.9608	0.9800	49	0	2
           NAME	0.9297	0.9279	0.9288	463	35	36
       QUANTITY	1.0000	0.9962	0.9981	522	0	2
           SIZE	1.0000	1.0000	1.0000	20	0	0
          STATE	0.9601	0.9633	0.9617	289	12	11
           TEMP	0.8750	0.7000	0.7778	7	1	3
           UNIT	0.9819	0.9841	0.9830	434	8	7
         Totals	0.9696	0.9669	0.9682	1784	56	61


gk
CRFClassifier tagged 9886 words in 1705 documents at 11727.16 words per second.
         Entity	P	R	F1	TP	FP	FN
             DF	0.9718	0.9517	0.9617	138	4	7
           NAME	0.9132	0.9021	0.9076	1621	154	176
       QUANTITY	0.9882	0.9870	0.9876	1598	19	21
           SIZE	0.9750	0.9398	0.9571	78	2	5
          STATE	0.9255	0.9503	0.9377	708	57	37
           TEMP	0.8125	0.8125	0.8125	26	6	6
           UNIT	0.9810	0.9721	0.9766	1291	25	37
         Totals	0.9534	0.9497	0.9516	5460	267	289


ar_gk
CRFClassifier tagged 12

The summary report shows for each model and entity type:

* True Positives (TP): The number of times an entity of that type was predicted correctly
* False Positives (FP): The number of times an entity of that type was predicted but is not in the annotated text
* False Negatives (FN): The number of times an entity of that type is in the annotated text but was not predicted
* Precision (P): Probability a predicted entity is correct, TP/(TP+FP)
* Recall (R): Probability a correct entity is predicted, TP/(TP+FN)
* F1 Score (F1): Harmonic mean of precision and recall, 2/(1/P + 1/R).
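
As a worked example of these formulas, here is a small sketch that recomputes the `TEMP` row of the `ar` report from its counts (TP=7, FP=1, FN=3). The helper name `prf1` is introduced here purely for illustration.

```python
def prf1(tp: int, fp: int, fn: int):
    """Precision, recall and F1 from entity-level counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# TEMP row of the `ar` model's report: TP=7, FP=1, FN=3
p, r, f1 = prf1(7, 1, 3)
print(f'{p:.4f} {r:.4f} {f1:.4f}')  # 0.8750 0.7000 0.7778
```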

We can compare the F1 Totals to the diagonal of Table IV in the paper:

* AllRecipes.com (ar): We get 0.9682, they report 0.9682
* FOOD.com (gk): We get 0.9516, they report 0.9519
* Both (ar_gk): We get 0.9551, they report 0.9611

These are super close.
The furthest is `ar_gk` and in the repository they have a separate `ar_gk_train.tsv`; it would be interesting to check whether using it directly gives a closer result and why there is a difference.
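
To compare reports programmatically rather than by eye, one option is to parse the summary table out of the captured output. The sketch below assumes the layout printed above (a header row starting with `Entity` and tab-separated columns); `parse_report` is a hypothetical helper, and the format could differ between Stanford NER versions.

```python
from typing import Dict

def parse_report(report: str) -> Dict[str, Dict[str, float]]:
    """Parse a CRFClassifier summary table into {entity: {column: value}}.

    Assumes a header row starting with 'Entity' and tab-separated columns;
    lines that don't fit that shape are ignored.
    """
    rows = {}
    header = None
    for line in report.splitlines():
        cells = [cell.strip() for cell in line.split('\t')]
        if cells[0] == 'Entity':
            header = cells[1:]
        elif header and len(cells) == len(header) + 1:
            rows[cells[0]] = dict(zip(header, map(float, cells[1:])))
    return rows

sample = (
    "CRFClassifier tagged 2788 words in 483 documents at 7185.57 words per second.\n"
    "         Entity\tP\tR\tF1\tTP\tFP\tFN\n"
    "           TEMP\t0.8750\t0.7000\t0.7778\t7\t1\t3\n"
    "         Totals\t0.9696\t0.9669\t0.9682\t1784\t56\t61\n"
)
report = parse_report(sample)
```

With this, `report['Totals']['F1']` gives the number to compare against the paper.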

# Running the model in Python

We can now use these trained models in Python by invoking Stanford NLP with Stanza.

First we'll load in the test data.

In [19]:
test_data = {}

for source in data_sources:
    test_data[source] = segment_file(data_filename(source, 'test'))
    print(source, len(test_data[source]))

ar 483
gk 1705


We can call StanfordNLP with our custom model by passing the property `ner.model`.

Our test data is already tokenized in a different way to StanfordNLP, so we'll add an option to the [Tokenizer](https://stanfordnlp.github.io/CoreNLP/tokenize.html) to use whitespace tokenization which is easy to invert.

It takes a while to start up the server so we want to annotate a large number of texts at once.

In [20]:
from tqdm.notebook import tqdm
from stanza.server import CoreNLPClient

def annotate_ner(ner_model_file: str, texts: List[str], tokenize_whitespace: bool = True):
    properties = {"ner.model": ner_model_file, "tokenize.whitespace": tokenize_whitespace, "ner.applyNumericClassifiers": False}
    
    annotated = []
    with CoreNLPClient(
         annotators=['tokenize','ssplit','ner'],
         properties=properties,
         timeout=30000,
         be_quiet=True,
        memory='6G') as client:
    
        for text in tqdm(texts):
            annotated.append(client.annotate(text))
    return annotated
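
The whitespace option matters because we want the model to see exactly the tokens in the test data. This small sketch shows why joining on spaces is easy to invert, and when it isn't. It only mimics whitespace splitting in plain Python and doesn't call CoreNLP.

```python
tokens = ['12', 'slices', 'pancetta', '-LRB-', 'Italian', 'unsmoked', 'cured', 'bacon', '-RRB-']

# Join pre-tokenized input with single spaces before sending it to the server...
text = ' '.join(tokens)

# ...and splitting on whitespace recovers exactly the same tokens,
# provided no token is empty or itself contains whitespace.
recovered = text.split()

# A token containing a space breaks the round trip.
bad_tokens = ['1 1/2', 'cups']
bad_recovered = ' '.join(bad_tokens).split()
```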

We can then get the annotations

In [21]:
annotations = annotate_ner(models['ar'],
                           ['1 cup of frozen peas',
                            'A dash of salt . Or to taste',
                           '12 slices pancetta -LRB- Italian unsmoked cured bacon -RRB-',
                           'pumpkin sliced into 3 cm moons'])

  0%|          | 0/4 [00:00<?, ?it/s]

Note here that the word "Italian" has `ner` "NATIONALITY", which comes from another model (it wasn't in the training set!).

The prediction from our own model appears in `coarseNER`, so that's the field we want to use.

In [22]:
annotations[2].sentence[0].token[4]

word: "Italian"
pos: "JJ"
value: "Italian"
originalText: "Italian"
ner: "NATIONALITY"
lemma: "italian"
beginChar: 25
endChar: 32
tokenBeginIndex: 4
tokenEndIndex: 5
hasXmlContext: false
isNewline: false
coarseNER: "O"
fineGrainedNER: "NATIONALITY"
entityMentionIndex: 3
nerLabelProbs: "O=0.870902471545891"

When I didn't set `"ner.applyNumericClassifiers": False` this would come up as a `NUMBER`.

In [23]:
annotations[3].sentence[0].token[3]

word: "3"
pos: "CD"
value: "3"
originalText: "3"
ner: "O"
lemma: "3"
beginChar: 20
endChar: 21
tokenBeginIndex: 3
tokenEndIndex: 4
hasXmlContext: false
isNewline: false
coarseNER: "O"
fineGrainedNER: "O"
nerLabelProbs: "O=0.8599887537555505"

We can then flatten the sentences and extract the NER tokens

In [24]:
from dataclasses import dataclass, asdict

@dataclass
class NERData:
    ner: List[str]
    tokens: List[str]
        
    # Let's use Pandas to make it pretty in a notebook
    def _repr_html_(self):
        return pd.DataFrame(asdict(self)).T._repr_html_()

def extract_ner_data(annotation) -> NERData:
    tokens = [token for sentence in annotation.sentence for token in sentence.token]
    return NERData(tokens=[t.word for t in tokens], ner=[t.coarseNER for t in tokens])

A relatively simple ingredient works well

In [25]:
extract_ner_data(annotations[0])

|  | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| ner | QUANTITY | UNIT | O | TEMP | NAME |
| tokens | 1 | cup | of | frozen | peas |


A more complex sentence does quite badly, perhaps because nothing like it was seen in training.

In [26]:
extract_ner_data(annotations[1])

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| ner | QUANTITY | UNIT | NAME | NAME | NAME | NAME | O | O |
| tokens | A | dash | of | salt | . | Or | to | taste |


In [27]:
extract_ner_data(annotations[2])

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| ner | QUANTITY | UNIT | NAME | O | O | O | O | O | O |
| tokens | 12 | slices | pancetta | -LRB- | Italian | unsmoked | cured | bacon | -RRB- |


We can chain these functions together to get from text to NER

In [28]:
from typing import Dict

def ner_extract(ner_model_file: str, texts: List[str], tokenize_whitespace: bool = True) -> List[NERData]:
    annotations = annotate_ner(ner_model_file, texts, tokenize_whitespace)
    return [extract_ner_data(ann) for ann in annotations]

Then for each model and each test set we can calculate the predictions.

In [29]:
preds = {}
for model, modelfile in models.items():
    preds[model] = {}
    for test_source, token_data in test_data.items():
        texts = [' '.join([x[0] for x in text]) for text in token_data]
        preds[model][test_source] = ner_extract(modelfile, texts)

  0%|          | 0/483 [00:00<?, ?it/s]

  0%|          | 0/1705 [00:00<?, ?it/s]

  0%|          | 0/483 [00:00<?, ?it/s]

  0%|          | 0/1705 [00:00<?, ?it/s]

  0%|          | 0/483 [00:00<?, ?it/s]

  0%|          | 0/1705 [00:00<?, ?it/s]

## Sanity checks

Let's check the same tokens come through the model as were input

In [30]:
for test_source, token_data in test_data.items():
    tokens = [[x[0] for x in tokens] for tokens in token_data]
    
    for model in models:
        model_preds = preds[model][test_source]
        
        model_tokens = [p.tokens for p in model_preds]
        
        if tokens != model_tokens:
            raise ValueError("Tokenization issue in %s with model %s" % (test_source, model))

# Evaluating

Now that we have predictions we can evaluate with [seqeval](https://github.com/chakki-works/seqeval).

In [31]:
!pip install seqeval

Collecting seqeval
  Downloading seqeval-1.2.2.tar.gz (43 kB)
     |████████████████████████████████| 43 kB 102 kB/s            
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: seqeval
  Building wheel for seqeval (setup.py) ... done
  Created wheel for seqeval: filename=seqeval-1.2.2-py3-none-any.whl size=16181 sha256=117220ab957b2dfbf6fad8b7cf7fb429b409f1fb1b62fef7ea14d20e38b36203
  Stored in directory: /root/.cache/pip/wheels/05/96/ee/7cac4e74f3b19e3158dce26a20a1c86b3533c43ec72a549fd7
Successfully built seqeval
Installing collected packages: seqeval
Successfully installed seqeval-1.2.2


Seqeval expects the data to be in one of the following formats:

* IOB1
* IOB2
* IOE1
* IOE2
* IOBES (only in strict mode)
* BILOU (only in strict mode)

The differences between these formats matter when trying to distinguish distinct entities that are adjacent, which is quite rare in practice.
See Wikipedia for a detailed explanation of [IOB (inside-outside-beginning)](https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)).

In this case we assume that consecutive tokens with the same tag form a single entity (which can be wrong when multiple names are listed in a single ingredient).
We can easily convert it to IOB1 using this assumption by prefixing every tag other than 'O' with an 'I-'.

In [32]:
def convert_to_iob1(tokens):
    return ['I-' + label if label != 'O' else 'O' for label in tokens]

assert convert_to_iob1(['QUANTITY', 'SIZE', 'NAME', 'NAME', 'O', 'STATE']) == ['I-QUANTITY', 'I-SIZE', 'I-NAME', 'I-NAME', 'O', 'I-STATE']
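
To make the consequence of that assumption concrete, here is a rough sketch of how entity spans fall out of all-`I-` tags: each run of identical tags becomes one entity. This is only an illustration under that assumption, not seqeval's implementation, and `spans_from_i_tags` is a name made up for this example.

```python
from itertools import groupby
from typing import List, Tuple

def spans_from_i_tags(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group runs of identical I- tags into (type, start, end) spans, end inclusive.

    Only valid for the all-'I-' tagging produced above (no 'B-' tags).
    """
    spans = []
    position = 0
    for tag, run in groupby(tags):
        length = len(list(run))
        if tag != 'O':
            spans.append((tag[2:], position, position + length - 1))
        position += length
    return spans

# Two names separated by an 'O' token (e.g. "salt and pepper") stay separate
separate = spans_from_i_tags(['I-NAME', 'O', 'I-NAME'])
# Two adjacent names (a hypothetical "salt pepper") collapse into one entity
merged = spans_from_i_tags(['I-NAME', 'I-NAME'])
```

Here `separate` holds two `NAME` spans while `merged` holds a single span covering both tokens, which is the case where the assumption can go wrong.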

Let's check the classification report for a single example and compare it to the report from StanfordNER.

The classification report doesn't have the TP, FP and FN, but instead has the support: the number of true entities in the data.
The two sets of numbers are equivalent:

* support = TP + FN
* TP = R * support
* FP = TP (1/P - 1)
* FN = support - TP

The results are the same.

In [33]:
from seqeval.metrics import classification_report

test_source = 'ar'
model = 'ar'

actual_ner = [convert_to_iob1([x[1] for x in ann]) for ann in test_data[test_source]]
pred_ner = [convert_to_iob1(p.ner) for p in preds[model][test_source]]

print(classification_report(actual_ner, pred_ner, digits=4))

              precision    recall  f1-score   support

          DF     1.0000    0.9608    0.9800        51
        NAME     0.9297    0.9279    0.9288       499
    QUANTITY     1.0000    0.9962    0.9981       524
        SIZE     1.0000    1.0000    1.0000        20
       STATE     0.9601    0.9633    0.9617       300
        TEMP     0.8750    0.7000    0.7778        10
        UNIT     0.9819    0.9841    0.9830       441

   micro avg     0.9696    0.9669    0.9682      1845
   macro avg     0.9638    0.9332    0.9471      1845
weighted avg     0.9695    0.9669    0.9682      1845
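
The relations between support and the counts can be checked against this report. The sketch below recovers TP, FP and FN from the rounded precision, recall and support of two rows; `counts_from_report` is a helper introduced here for illustration, and it relies on rounding to undo the 4-digit precision of the report.

```python
def counts_from_report(precision: float, recall: float, support: int):
    """Recover (TP, FP, FN) from rounded precision/recall and the support.

    Requires precision > 0; with zero precision FP can't be recovered this way.
    """
    tp = round(recall * support)
    fp = round(tp * (1 / precision - 1))
    fn = support - tp
    return tp, fp, fn

# NAME row of the seqeval report: P=0.9297, R=0.9279, support=499
name_counts = counts_from_report(0.9297, 0.9279, 499)
# TEMP row: P=0.8750, R=0.7000, support=10
temp_counts = counts_from_report(0.8750, 0.7000, 10)
```

These give (463, 35, 36) for `NAME` and (7, 1, 3) for `TEMP`, matching the TP, FP and FN columns printed by Stanford NER for the `ar` model.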



We can get the micro f1-score directly.

In [34]:
from seqeval.metrics import f1_score
'%0.4f' % f1_score(actual_ner, pred_ner)

'0.9682'
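
To see where the micro and macro averages in the report come from, here is a sketch that recomputes both from the per-entity counts in the Stanford NER report for the `ar` model. The counts are copied from that output; `f1_from_counts` is a helper name introduced for this example.

```python
# (TP, FP, FN) per entity type, copied from the `ar` report printed by Stanford NER
counts = {
    'DF': (49, 0, 2),
    'NAME': (463, 35, 36),
    'QUANTITY': (522, 0, 2),
    'SIZE': (20, 0, 0),
    'STATE': (289, 12, 11),
    'TEMP': (7, 1, 3),
    'UNIT': (434, 8, 7),
}

def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean of P and R."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

# Micro average: pool the counts over all entity types, then compute F1 once.
total_tp, total_fp, total_fn = (sum(column) for column in zip(*counts.values()))
micro_f1 = f1_from_counts(total_tp, total_fp, total_fn)

# Macro average: compute F1 per entity type, then take the unweighted mean.
macro_f1 = sum(f1_from_counts(*c) for c in counts.values()) / len(counts)

print(f'micro={micro_f1:.4f} macro={macro_f1:.4f}')  # micro=0.9682 macro=0.9471
```

The micro average is dominated by the frequent types, while the macro average is pulled down by the rare `TEMP` type.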

We can then try to reproduce Table IV by computing the f1-score for each model and data.

In [35]:
scores = {model: {} for model in models}
for test_source, data in test_data.items():
    actual_ner = [convert_to_iob1([x[1] for x in ann]) for ann in data]
    for model in models:
        pred_ner = [convert_to_iob1(p.ner) for p in preds[model][test_source]]
        scores[model][test_source] = f1_score(actual_ner, pred_ner)

We also need to calculate the scores on the combined test set, by concatenating them.

In [36]:
actual_ner = [convert_to_iob1([x[1] for x in ann]) for data in test_data.values() for ann in data]
for model in models:
    pred_ner = [convert_to_iob1(p.ner) for test_source in test_data for p in preds[model][test_source]]
    scores[model]['combined'] = f1_score(actual_ner, pred_ner)

In [37]:
pd.DataFrame(scores).style.format('{:0.4f}')

|  | ar | gk | ar_gk |
|---|---|---|---|
| ar | 0.9682 | 0.9331 | 0.9704 |
| gk | 0.8666 | 0.9511 | 0.9499 |
| combined | 0.8911 | 0.9469 | 0.9549 |


The results are *slightly* different to those in the paper, but every entry agrees to within 0.01.

So we've successfully reproduced the results in the paper, and shown the evaluation from the Stanford NER toolkit is very close to that of seqeval (if you work around the entities hallucinated by other models).

In [38]:
reported_scores = pd.DataFrame([[0.9682, 0.9317, 0.9709],
              [0.8672, 0.9519, 0.9498],
              [0.8972, 0.9472, 0.9611]],
             columns = ['AllRecipes', 'FOOD.com', 'BOTH'],
             index = ['AllRecipes', 'FOOD.com', 'BOTH'])
reported_scores

|  | AllRecipes | FOOD.com | BOTH |
|---|---|---|---|
| AllRecipes | 0.9682 | 0.9317 | 0.9709 |
| FOOD.com | 0.8672 | 0.9519 | 0.9498 |
| BOTH | 0.8972 | 0.9472 | 0.9611 |
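
As a rough check of how close the two tables are, this sketch computes the largest absolute difference between our scores and the reported ones. The values are copied from the two tables above, and it assumes both share the same orientation and order (AllRecipes, FOOD.com, both).

```python
# Values copied from the two tables above, compared position by position.
# Assumption: both tables use the same row and column order.
ours = [[0.9682, 0.9331, 0.9704],
        [0.8666, 0.9511, 0.9499],
        [0.8911, 0.9469, 0.9549]]
reported = [[0.9682, 0.9317, 0.9709],
            [0.8672, 0.9519, 0.9498],
            [0.8972, 0.9472, 0.9611]]

differences = [[round(abs(o - r), 4) for o, r in zip(our_row, rep_row)]
               for our_row, rep_row in zip(ours, reported)]
largest = max(max(row) for row in differences)
print(largest)  # 0.0062
```

The largest gaps are both in the last row, for the combined test set.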
