{"reviews": [{"IMPACT": "3", "SUBSTANCE": "4", "APPROPRIATENESS": "5", "MEANINGFUL_COMPARISON": "2", "PRESENTATION_FORMAT": "Poster", "comments": "- Strengths:\n* Outperforms ALIGN in supervised entity linking task which suggests that the\nproposed framework improves representations of text and knowledge that are\nlearned jointly.\n* Direct comparison with closely related approach using very similar input\ndata.\n* Analysis of the smoothing parameter provides useful analysis since impact of\npopularity is a persistent issue in entity linking.\n\n- Weaknesses:\n* Comparison with ALIGN could be better. ALIGN used content window size 10 vs\nthis paper's 5, vector dimension of 500 vs this paper's 200. Also its not clear\nto me whether N(e_j) includes only entities that link to e_j. The graph is\ndirected and consists of wikipedia outlinks, but is adjacency defined as it\nwould be for an undirected graph? For ALIGN, the context of an entity is the\nset of entities that link to that entity. If N(e_j) is different, we cannot\ntell how much impact this change has on the learned vectors, and this could\ncontribute to the difference in scores on the entity similarity task. \n* It is sometimes difficult to follow whether \"mention\" means a string type, or\na particular mention in a particular document. The phrase \"mention embedding\"\nis used, but it appears that embeddings are only learned for mention senses.\n* It is difficult to determine the impact of sense disambiguation order without\ncomparison to other unsupervised entity linking methods. \n\n- General Discussion:", "SOUNDNESS_CORRECTNESS": "4", "ORIGINALITY": "3", "is_meta_review": null, "RECOMMENDATION": "3", "CLARITY": "3", "REVIEWER_CONFIDENCE": "3"}, {"IMPACT": "3", "SUBSTANCE": "4", "APPROPRIATENESS": "5", "MEANINGFUL_COMPARISON": "2", "PRESENTATION_FORMAT": "Poster", "comments": "This paper addresses the problem of disambiguating/linking textual entity\nmentions into a given background knowledge base (in this case, English\nWikipedia). (Its title and introduction are a little overblown/misleading,\nsince there is a lot more to bridging text and knowledge than the EDL task, but\nEDL is a core part of the overall task nonetheless.) The method is to perform\nthis bridging via an intermediate layer of representation, namely mention\nsenses, thus following two steps: (1) mention to mention sense, and (2) mention\nsense to entity. Various embedding representations are learned for the words,\nthe mention senses, and the entities, which are then jointly trained to\nmaximize a single overall objective function that maximizes all three types of\nembedding equally. \n\nTechnically the approach is fairly clear and conforms to the current deep\nprocessing fashion and known best practices regarding embeddings; while one can\nsuggest all kinds of alternatives, it\u2019s not clear they would make a material\ndifference. Rather, my comments focus on the basic approach. It is not\nexplained, however, exactly why a two-step process, involving the mention\nsenses, is better than a simple direct one-step mapping from word mentions to\ntheir entities. (This is the approach of Yamada et al., in what is called here\nthe ALIGN algorithm.) Table 2 shows that the two-step MPME (and even its\nsimplification SPME) do better. By why, exactly? What is the exact\ndifference, and additional information, that the mention senses have compare4ed\nto the entities? 
To understand, please check if the following is correct (and\nperhaps update the paper to make it exactly clear what is going on).\n\nFor entities: their profiles consist of neighboring entities in a relatedness\ngraph. This graph is built (I assume) by looking at word-level relatedness of\nthe entity definitions (pages in Wikipedia). The profiles are (extended\nskip-gram-based) embeddings.\n\nFor words: their profiles are the standard distributional semantics approach,\nwithout sense disambiguation.\n\nFor mention senses: their profiles are the standard distributional semantics\napproach, but WITH sense disambiguation. Sense disambiguation is performed\nusing a sense-based profile (\u2018language model\u2019) from local context words and\nneighboring mentions, as mentioned briefly just before Section 4, but without\ndetails. This is a problem point in the approach. How exactly are the senses\ncreated and differentiated? Who defines how many senses a mention string can\nhave? If this is done by looking at the knowledge base, then we get a\nbijective mapping between mention senses and entities \u2013 that is, there is\nexactly one entity for each mention sense (even if there may be more entities).\nIn that case, are the sense collection\u2019s definitional profiles built\nstarting with entity text as \u2018seed words\u2019? If so, what information is used\nat the mention sense level that is NOT used at the entity level? Exactly the\nwords in the texts that reliably associate with the mention sense,\nbut that do NOT occur in the equivalent entity webpage in Wikipedia? How many\nsuch words are there, on average, for a mention sense? That is, how\npowerful/necessary is it to keep this extra differentiation information in a\nseparate space (the mention sense space) as opposed to just loading these\nadditional words into the Entity space (by adding these words into the\nWikipedia entity pages)?\n\nIf the above understanding is essentially correct, please update Section 5 of\nthe paper to say so, for (to me) it is the main new information in the paper.\n\nIt is not true, as the paper says in Section 6, that \u201c\u2026this is the first\nwork to deal with mention ambiguity in the integration of text and knowledge\nrepresentations, so there is no exact baselines for comparison\u201d. The TAC KBP\nevaluations for the past two years have hosted EDL tasks, involving eight or\nnine systems, all performing exactly this task, albeit against Freebase, which\nis considerably larger and noisier than Wikipedia. Please see\nhttp://nlp.cs.rpi.edu/kbp/2016/ .\n\nOn a positive note: I really liked the idea of the smoothing parameter in\nSection 6.4.2.\n\nPost-response: I have read the authors' responses. I am not really satisfied\nwith their reply that the KBP evaluation is not relevant and that they are\ninterested in the goodness of the embeddings instead. In fact, the only way to\nevaluate such 'goodness' is through an application. 
No one really cares how\nconceptually elegant an embedding is; the question is: does it perform better?", "SOUNDNESS_CORRECTNESS": "4", "ORIGINALITY": "3", "is_meta_review": null, "RECOMMENDATION": "4", "CLARITY": "3", "REVIEWER_CONFIDENCE": "4"}, {"IMPACT": "3", "SUBSTANCE": "4", "APPROPRIATENESS": "4", "MEANINGFUL_COMPARISON": "2", "PRESENTATION_FORMAT": "Oral Presentation", "comments": "- Strengths:\nGood ideas, simple neural learning, interesting performance (although not\nstriking), and finally a large set of applications.\n\n- Weaknesses: limited amount of novel content. Clarity in some sections.\n\nThe paper presents a neural learning method for entity disambiguation and\nlinking. It introduces a good idea to integrate entity, mention and sense\nmodeling within the same neural language modeling technique. The simple\ntraining procedure connected with the modeling allows it to support a large set of\napplications.\n\nThe paper is formally clear, but the discussion is not always at the same level\nas the technical ideas.\n\nThe empirical evaluation is good, although the reported performance improvements\nare not striking. Although it seems an extension of (Yamada et al.,\nCoNLL 2016), it adds novel ideas and is of relevant interest.\n\nThe weaker points of the paper are:\n\n- The prose is not always clear. I found Section 3 especially unclear. Some details\nof Figure 2 are not explained, and the terminology is somewhat redundant: for\nexample, why do you refer to the dictionary of mentions, or the dictionary of\nentity-mention pairs? Are these different from text anchors and types for\nannotated text anchors?\n- The paper is quite close in nature to (Yamada et al., 2016), and the authors\nshould at least outline the differences.\n\nOne general observation on the current version is:\nThe paper tests the Multiple Embedding model against entity\nlinking/disambiguation tasks. However, word embeddings are not only used to\nmodel such tasks, but also some processes not directly depending on entities of\nthe KB, e.g. parsing, coreference or semantic role labeling.\nThe authors should show that the word embeddings provided by the proposed MPME\nmethod are not weaker than simpler word spaces in such other semantic tasks,\ni.e. those not directly involving entity mentions.\n\nI did read the authors' response.", "SOUNDNESS_CORRECTNESS": "4", "ORIGINALITY": "3", "is_meta_review": null, "RECOMMENDATION": "4", "CLARITY": "3", "REVIEWER_CONFIDENCE": "4"}], "abstract": "Integrating text and knowledge into a unified semantic space has attracted significant research interest recently. However, the ambiguity in the common space remains a challenge, namely that the same mention phrase usually refers to various entities. In this paper, to deal with the ambiguity of entity mentions, we propose a novel Multi-Prototype Mention Embedding model, which learns multiple sense embeddings for each mention by jointly modeling words from textual contexts and entities derived from a knowledge base. In addition, we further design an efficient language-model-based approach to disambiguate each mention to a specific sense. In experiments, both qualitative and quantitative analyses demonstrate the high quality of the word, entity and multi-prototype mention embeddings. 
Using entity linking as a case study, we apply our disambiguation method as well as the multi-prototype mention embeddings to a benchmark dataset, and achieve state-of-the-art performance.", "histories": [], "id": "104", "title": "Bridge Text and Knowledge by Learning Multi-Prototype Entity Mention Embedding"}