University of Mannheim University of Mannheim CrowdTruth Measures for Language Ambiguity: The Case of Medical Relation Extraction CrowdTruth Measures for Language Ambiguity: The Case of Medical Relation Extraction A widespread use of linked data for information extraction is distant supervision, in which relation tuples from a data source are found in sentences in a text corpus, and those sentences are treated as training data for relation extraction systems. Distant supervision is a cheap way to acquire training data, but that data can be quite noisy, which limits the performance of a system trained with it. Human annotators can be used to clean the data, but in some domains, such as medical NLP, it is widely believed that only medical experts can do this reliably. We have been investigating the use of crowdsourcing as an affordable alternative to using experts to clean noisy data, and have found that with the proper analysis, crowds can rival and even out-perform the precision and recall of experts, at a much lower cost. We have further found that the crowd, by virtue of its diversity, can help us find evidence of ambiguous sentences that are difficult to classify, and we have hypothesized that such sentences are likely just as difficult for machines to classify. In this paper we outline CrowdTruth, a previously presented method for scoring ambiguous sentences that suggests that existing modes of truth are inadequate, and we present for the first time a set of weighted metrics for evaluating the performance of experts, the crowd, and a trained classifier in light of ambiguity. We show that our theory of truth and our metrics are a more powerful way to evaluate NLP performance over traditional unweighted metrics like precision and recall, because they allow us to account for the rather obvious fact that some sentences express the target relations more clearly than others. 
2015 2015 2015-10-12T14:00:00Z ecv20nbc0jsq0uee9bnjb197mk@google.com 2015-10-12T14:30:00Z 2015-10-02T10:01:49Z Room RBC 271 OPAQUE CONFIRMED Lei Zhang, Cong Liu and Achim Rettinger. 0 2015-10-02T12:00:25Z Research | A Topic-Sensitive Model for Salient Entity Linking 2015-10-02T12:03:03Z -//Google Inc//Google Calendar 70.9054//EN GREGORIAN 2.0 PUBLISH Gentile Anna Lisa Anna Lisa Gentile Anna Lisa Gentile 2015 2015 Goodbye to True: Advancing Semantics beyond the Black and White Goodbye to True: Advancing Semantics beyond the Black and White The set-theoretic notion of truth proposed by Tarski is the basis of most work in machine semantics and probably has its roots in the work and influence of Aristotle. We take it for granted that the world can be described, not in shades of grey, but in terms of statements and propositions that are either true or false - and it seems most of western science stands on the same principle. This assumption at the core of our training as scientists should be questioned, because it stands in direct opposition to our human experience. Is there any statement that can be made that can actually be reduced to true or false? Only, it seems, in the artificial human-created realms of mathematics, games, and logic. We have been investigating a different mode of truth, inspired by results in Crowdsourcing, which allows for a high-dimensional notion of semantic interpretation that makes true and false look like a childish simplifying assumption. Dumitrache Anca Anca Dumitrache Anca Dumitrache Research | Semantic Relation Composition in Large Scale Knowledge Bases Room RBC 271 0 CONFIRMED 2015-10-02T12:00:41Z 2015-10-02T10:03:18Z 6qn80t9g85uvdoooftrhjjbbv4@google.com 2015-10-12T16:00:00Z 2015-10-12T15:30:00Z OPAQUE 2015-10-02T12:03:03Z Kristian Kolthoff and Arnab Dutta. 
Hose Katja Katja Hose Katja Hose Mulwad Varish Varish Mulwad Varish Mulwad Dutta Arnab Arnab Dutta Arnab Dutta Keynote speaker Mendes Pablo Pablo Mendes Pablo Mendes Finin Tim Tim Finin Tim Finin Semantic annotation techniques provide the basis for linking textual content with concepts in well grounded knowledge bases. In spite of their many application areas, current semantic annotation systems have some limitations. One of the prominent limitations of such systems is that none of the existing semantic annotator systems are able to identify and disambiguate quantitative (numerical) content. In textual documents such as Web pages, especially technical content, there is much quantitative information, such as product specifications, that needs to be semantically qualified. In this paper, we propose an approach for annotating quantitative values in short textual content. In our approach, we identify numeric values in the text and link them to an existing property in a knowledge base. Based on this mapping, we are then able to find the concept that the property is associated with, thereby identifying both the concept and the specific property of that concept that the numeric value belongs to. Our experiments show that our proposed approach is able to reach an accuracy of over 70% for semantically annotating quantitative content. 
2015 Semantic Annotation of Quantitative Textual Content Semantic Annotation of Quantitative Textual Content 2015 University of Bari University of Bari Nuzzolese Andrea Giovanni Andrea Giovanni Nuzzolese Andrea Giovanni Nuzzolese Meusel Robert Robert Meusel Robert Meusel Heiko Paulheim Heiko Paulheim Paulheim Heiko Uren Victoria Victoria Uren Victoria Uren Chair Brewster Christopher Christopher Brewster Christopher Brewster Cuzzola John John Cuzzola John Cuzzola Davis Brian Brian Davis Brian Davis 2015-06-03T10:30:00 2015-06-03T09:30:00 Goodbye to True: Advancing Semantics beyond the Black and White Invited Talk: Chris Welty Invited Talk: Chris Welty Qiu Disheng Disheng Qiu Disheng Qiu Radford Will Will Radford Will Radford Heiko Paulheim. 0 Plenary | Creating Large-scale Training and Test Corpora for Extracting Structured Data from the Web 2015-10-02T12:03:03Z OPAQUE 2015-10-02T10:04:01Z Room RBC 271 2015-10-12T16:00:00Z dq0d4i00ndh35hp6ssijocvqrc@google.com 2015-10-02T12:00:56Z 2015-10-12T16:30:00Z CONFIRMED Knoth Petr Petr Knoth Petr Knoth Paper author Rizzo Giuseppe Giuseppe Rizzo Giuseppe Rizzo Zhang Lei Lei Zhang Lei Zhang For making the web of linked data grow, information extraction methods are a good alternative to manual dataset curation, since there is an abundance of semi-structured and unstructured information which can be harvested that way. At the same time, existing structured data sets can be used for training and evaluating such information extraction systems. In this paper, we introduce a method for creating training and test corpora from websites annotated with structured data. Using different classes in schema.org and websites annotated with Microdata, we show how training and test data can be curated at large scale and across various domains. Furthermore, we discuss how negative examples can be generated as well as open challenges and future directions for this kind of training data curation. 
2015 Creating Large-scale Training and Test Corpora for Extracting Structured Data from the Web Creating Large-scale Training and Test Corpora for Extracting Structured Data from the Web 2015 Semantic relation composition is a generalized approach for finding conjunctive relation paths in a knowledge base (KB). In the Semantic Web, direct and inverse relationships between entities provide us with ample explicit knowledge. But there is a plethora of implicit knowledge beyond these direct paths. Given a knowledge graph, we can gain deeper insights about a particular entity if we consider the information shared by its neighboring entities via its adjacent relation paths of arbitrary lengths. In this paper, we devise a technique to automatically discover semantically enriched conjunctive relations in a KB. Our approach is generalized for any KB and requires no additional parameter tuning. Particularly, we employ classical rule mining techniques to perform relation composition on knowledge graphs to learn first order rules. We evaluate our proposed methodology on two state of the art information extraction systems, DBpedia and Yago, with promising results in terms of generating high precision rules. Furthermore, we make the rules publicly available for community usage. 2015 2015 Semantic Relation Composition in Large Scale Knowledge Bases Semantic Relation Composition in Large Scale Knowledge Bases Abbasi Rabeeh Ayaz Rabeeh Ayaz Abbasi Rabeeh Ayaz Abbasi 2015 2015 In recent years, the number of entities in large knowledge bases available on the Web has been increasing rapidly. Such entities can be used to bridge textual data with knowledge bases and thus help with many tasks, such as text understanding, word sense disambiguation and information retrieval. The key issue is to link the entity mentions in documents with the corresponding entities in knowledge bases, referred to as entity linking. 
In addition, for many entity-centric applications, entity salience for a document has become a very important factor. This raises an impending need to identify a set of salient entities that are central to the input document. In this paper, we introduce a new task of salient entity linking and propose a graph-based disambiguation solution, which integrates several features, especially a topic-sensitive model based on Wikipedia categories. Experimental results show that our method significantly outperforms the state-of-the-art entity linking methods in terms of precision, recall and F-measure. A Topic-Sensitive Model for Salient Entity Linking A Topic-Sensitive Model for Salient Entity Linking Ghorbani Ali Ali Ghorbani Ali Ghorbani 2015-10-02T12:03:03Z 2015-10-12T19:15:00Z Goodbye to True: Advancing semantics beyond the black and white. Chris Welty, Google Research 2l7a5duv5ina9b5a8fofftgjdk@google.com Room RBC 271 OPAQUE Plenary | Goodbye to True: Advancing Semantics beyond the Black and White 2015-10-02T12:01:17Z 0 2015-10-02T10:07:36Z 2015-10-12T18:00:00Z CONFIRMED Feilmayr Christina Christina Feilmayr Christina Feilmayr CONFIRMED Anca Dumitrache, Lora Aroyo and Chris Welty. 
2015-10-12T13:00:00Z OPAQUE 2015-10-02T09:46:55Z 2015-10-02T12:03:03Z Research | CrowdTruth Measures for Language Ambiguity: The Case of Medical Relation Extraction 817b4icma6mkao4j5jjf2hhu2o@google.com 2015-10-12T13:30:00Z 1 2015-10-02T11:59:57Z Room RBC 271 De Luca Ernesto William Ernesto William De Luca Ernesto William De Luca Aroyo Lora Lora Aroyo Lora Aroyo Joshi Anupam Anupam Joshi Anupam Joshi Sleeman Jennifer Jennifer Sleeman Jennifer Sleeman 2015-10-12T19:15:00Z CONFIRMED Closing remarks and best paper award announcement Room RBC 271 2015-10-02T10:08:17Z OPAQUE 96a4bh66arosfhfed2uu76kt08@google.com 2015-10-02T12:01:23Z 2015-10-02T12:03:03Z Plenary | Closing and Awards 1 2015-10-12T19:30:00Z Bagheri Ebrahim Ebrahim Bagheri Ebrahim Bagheri Zhang Ziqi Ziqi Zhang Ziqi Zhang Topic Modeling for RDF Graphs 2015 2015 Topic models are widely used to thematically describe a collection of text documents and have become an important technique for systems that measure document similarity for classification, clustering, segmentation, entity linking and more. While they have been applied to some non-text domains, their use for semi-structured graph data, such as RDF, has been less explored. We present a framework for applying topic modeling to RDF graph data and describe how it can be used in a number of linked data tasks. Since topic modeling builds abstract topics using the co-occurrence of document terms, sparse documents can be problematic, presenting challenges for RDF data. We outline techniques to overcome this problem and the results of experiments in using them. Finally, we show preliminary results of using Latent Dirichlet Allocation generative topic modeling for several linked data use cases. 
Topic Modeling for RDF Graphs Tresp Volker Volker Tresp Volker Tresp Basile Pierpaolo Pierpaolo Basile Pierpaolo Basile Program committee member Nickles Matthias Matthias Nickles Matthias Nickles Ghashghaei Mehrnaz Mehrnaz Ghashghaei Mehrnaz Ghashghaei Riedl Martin Martin Riedl Martin Riedl University of Sheffield University of Sheffield Rettinger Achim Achim Rettinger Achim Rettinger Chris Welty Chris Welty Chris Welty Ponzetto Simone Paolo Simone Paolo Ponzetto Simone Paolo Ponzetto Solanki Monika Monika Solanki Monika Solanki The 3rd International Workshop on Linked Data for Information Extraction LD4IE2015 The 3rd International Workshop on Linked Data for Information Extraction The 3rd International Workshop on Linked Data for Information Extraction 2015-10-12T15:30:00 Bethlehem, Pennsylvania - USA Bethlehem, Pennsylvania - USA 2015-10-12T09:00:00 2015-10-02T12:00:16Z OPAQUE Research | Semantic Annotation of Quantitative Textual Content Mehrnaz Ghashghaei, John Cuzzola, Ebrahim Bagheri and Ali Ghorbani. Room RBC 271 2015-10-12T14:00:00Z 2015-10-12T13:30:00Z 0 ri9oe6lcndmbii7j2mehclqu2o@google.com 2015-10-02T10:01:21Z CONFIRMED 2015-10-02T12:03:03Z Pujara Jay Jay Pujara Jay Pujara Kolthoff Kristian Kristian Kolthoff Kristian Kolthoff D'Amato Claudia Claudia D'Amato Claudia D'Amato Moro Andrea Andrea Moro Andrea Moro Fernandez Miriam Miriam Fernandez Miriam Fernandez CONFIRMED 2015-10-02T10:02:43Z 2015-10-02T12:00:31Z 2015-10-12T15:30:00Z 2015-10-12T15:00:00Z rlofdvrkt3tmr48533jbsgqt48@google.com 2015-10-02T12:03:03Z 0 Jennifer Sleeman, Tim Finin and Anupam Joshi. OPAQUE Room RBC 271 Research | Topic Modeling for RDF Graphs Barnaghi Payam Payam Barnaghi Payam Barnaghi Liu Cong Cong Liu Cong Liu Knoblock Craig Craig Knoblock Craig Knoblock Buitelaar Paul Paul Buitelaar Paul Buitelaar