University of Mannheim
University of Mannheim
CrowdTruth Measures for Language Ambiguity: The Case of Medical Relation Extraction
CrowdTruth Measures for Language Ambiguity: The Case of Medical Relation Extraction
A widespread use of linked data for information extraction is distant supervision, in which relation tuples from a data source are found in sentences in a text corpus, and those sentences are treated as training data for relation extraction systems. Distant supervision is a cheap way to acquire training data, but that data can be quite noisy, which limits the performance of a system trained with it. Human annotators can be used to clean the data, but in some domains, such as medical NLP, it is widely believed that only medical experts can do this reliably. We have been investigating the use of crowdsourcing as an affordable alternative to using experts to clean noisy data, and have found that with the proper analysis, crowds can rival and even out-perform the precision and recall of experts, at a much lower cost. We have further found that the crowd, by virtue of its diversity, can help us find evidence of ambiguous sentences that are difficult to classify, and we have hypothesized that such sentences are likely just as difficult for machines to classify. In this paper we outline CrowdTruth, a previously presented method for scoring ambiguous sentences that suggests that existing modes of truth are inadequate, and we present for the first time a set of weighted metrics for evaluating the performance of experts, the crowd, and a trained classifier in light of ambiguity. We show that our theory of truth and our metrics are a more powerful way to evaluate NLP performance than traditional unweighted metrics like precision and recall, because they allow us to account for the rather obvious fact that some sentences express the target relations more clearly than others.
2015
2015
2015-10-12T14:00:00Z
ecv20nbc0jsq0uee9bnjb197mk@google.com
2015-10-12T14:30:00Z
2015-10-02T10:01:49Z
Room RBC 271
OPAQUE
CONFIRMED
Lei Zhang, Cong Liu and Achim Rettinger.
0
2015-10-02T12:00:25Z
Research | A Topic-Sensitive Model for Salient Entity Linking
2015-10-02T12:03:03Z
-//Google Inc//Google Calendar 70.9054//EN
GREGORIAN
2.0
PUBLISH
Gentile
Anna Lisa
Anna Lisa Gentile
Anna Lisa Gentile
2015
2015
Goodbye to True: Advancing Semantics beyond the Black and White
Goodbye to True: Advancing Semantics beyond the Black and White
The set-theoretic notion of truth proposed by Tarski is the basis of most work in machine semantics and probably has its roots in the work and influence of Aristotle. We take it for granted that the world can be described, not in shades of grey, but in terms of statements and propositions that are either true or false - and it seems most of western science stands on the same principle. This assumption at the core of our training as scientists should be questioned, because it stands in direct opposition to our human experience. Is there any statement that can be made that can actually be reduced to true or false? Only, it seems, in the artificial human-created realms of mathematics, games, and logic. We have been investigating a different mode of truth, inspired by results in crowdsourcing, which allows for a highly dimensional notion of semantic interpretation that makes true and false look like a childish simplifying assumption.
Dumitrache
Anca
Anca Dumitrache
Anca Dumitrache
Research | Semantic Relation Composition in Large Scale Knowledge Bases
Room RBC 271
0
CONFIRMED
2015-10-02T12:00:41Z
2015-10-02T10:03:18Z
6qn80t9g85uvdoooftrhjjbbv4@google.com
2015-10-12T16:00:00Z
2015-10-12T15:30:00Z
OPAQUE
2015-10-02T12:03:03Z
Kristian Kolthoff and Arnab Dutta.
Hose
Katja
Katja Hose
Katja Hose
Mulwad
Varish
Varish Mulwad
Varish Mulwad
Dutta
Arnab
Arnab Dutta
Arnab Dutta
Keynote speaker
Mendes
Pablo
Pablo Mendes
Pablo Mendes
Finin
Tim
Tim Finin
Tim Finin
Semantic annotation techniques provide the basis for linking textual content with concepts in well-grounded knowledge bases. In spite of their many application areas, current semantic annotation systems have some limitations. One prominent limitation is that none of the existing semantic annotation systems is able to identify and disambiguate quantitative (numerical) content. Textual documents such as Web pages, especially those with technical content, contain much quantitative information, such as product specifications, that needs to be semantically qualified. In this paper, we propose an approach for annotating quantitative values in short textual content. In our approach, we identify numeric values in the text and link them to an existing property in a knowledge base. Based on this mapping, we are then able to find the concept that the property is associated with, thereby identifying both the concept and the specific property of that concept that the numeric value belongs to. Our experiments show that our proposed approach is able to reach an accuracy of over 70% for semantically annotating quantitative content.
2015
Semantic Annotation of Quantitative Textual Content
Semantic Annotation of Quantitative Textual Content
2015
University of Bari
University of Bari
Nuzzolese
Andrea Giovanni
Andrea Giovanni Nuzzolese
Andrea Giovanni Nuzzolese
Meusel
Robert
Robert Meusel
Robert Meusel
Heiko Paulheim
Heiko Paulheim
Paulheim
Heiko
Uren
Victoria
Victoria Uren
Victoria Uren
Chair
Brewster
Christopher
Christopher Brewster
Christopher Brewster
Cuzzola
John
John Cuzzola
John Cuzzola
Davis
Brian
Brian Davis
Brian Davis
2015-06-03T10:30:00
2015-06-03T09:30:00
Goodbye to True: Advancing Semantics beyond the Black and White
Invited Talk: Chris Welty
Invited Talk: Chris Welty
Qiu
Disheng
Disheng Qiu
Disheng Qiu
Radford
Will
Will Radford
Will Radford
Heiko Paulheim.
0
Plenary | Creating Large-scale Training and Test Corpora for Extracting Structured Data from the Web
2015-10-02T12:03:03Z
OPAQUE
2015-10-02T10:04:01Z
Room RBC 271
2015-10-12T16:00:00Z
dq0d4i00ndh35hp6ssijocvqrc@google.com
2015-10-02T12:00:56Z
2015-10-12T16:30:00Z
CONFIRMED
Knoth
Petr
Petr Knoth
Petr Knoth
Paper author
Rizzo
Giuseppe
Giuseppe Rizzo
Giuseppe Rizzo
Zhang
Lei
Lei Zhang
Lei Zhang
To make the web of linked data grow, information extraction methods are a good alternative to manual dataset curation, since there is an abundance of semi-structured and unstructured information which can be harvested that way. At the same time, existing structured data sets can be used for training and evaluating such information extraction systems. In this paper, we introduce a method for creating training and test corpora from websites annotated with structured data. Using different classes in schema.org and websites annotated with Microdata, we show how training and test data can be curated at large scale and across various domains. Furthermore, we discuss how negative examples can be generated, as well as open challenges and future directions for this kind of training data curation.
2015
Creating Large-scale Training and Test Corpora for Extracting Structured Data from the Web
Creating Large-scale Training and Test Corpora for Extracting Structured Data from the Web
2015
Semantic relation composition is a generalized approach for finding conjunctive relation paths in a knowledge base (KB). In the semantic web, direct and inverse relationships between entities provide us with ample explicit knowledge. But there is a plethora of implicit knowledge beyond these direct paths. Given a knowledge graph, we can gain deeper insights about a particular entity if we consider the information shared by its neighboring entities via its adjacent relation paths of arbitrary lengths. In this paper, we devise a technique to automatically discover semantically enriched conjunctive relations in a KB. Our approach is generalized for any KB and requires no additional parameter tuning. In particular, we employ classical rule mining techniques to perform relation composition on knowledge graphs to learn first-order rules. We evaluate our proposed methodology on two state-of-the-art information extraction systems, DBpedia and Yago, with promising results in terms of generating high-precision rules. Furthermore, we make the rules publicly available for community usage.
2015
2015
Semantic Relation Composition in Large Scale Knowledge Bases
Semantic Relation Composition in Large Scale Knowledge Bases
Abbasi
Rabeeh Ayaz
Rabeeh Ayaz Abbasi
Rabeeh Ayaz Abbasi
2015
2015
In recent years, the number of entities in large knowledge bases available on the Web has been increasing rapidly. Such entities can be used to bridge textual data with knowledge bases and thus help with many tasks, such as text understanding, word sense disambiguation and information retrieval. The key issue is to link the entity mentions in documents with the corresponding entities in knowledge bases, referred to as entity linking. In addition, for many entity-centric applications, entity salience for a document has become a very important factor. This raises a pressing need to identify a set of salient entities that are central to the input document. In this paper, we introduce a new task of salient entity linking and propose a graph-based disambiguation solution, which integrates several features, especially a topic-sensitive model based on Wikipedia categories. Experimental results show that our method significantly outperforms state-of-the-art entity linking methods in terms of precision, recall and F-measure.
A Topic-Sensitive Model for Salient Entity Linking
A Topic-Sensitive Model for Salient Entity Linking
Ghorbani
Ali
Ali Ghorbani
Ali Ghorbani
2015-10-02T12:03:03Z
2015-10-12T19:15:00Z
Goodbye to True: Advancing semantics beyond the black and white.
Chris Welty, Google Research
2l7a5duv5ina9b5a8fofftgjdk@google.com
Room RBC 271
OPAQUE
Plenary | Goodbye to True: Advancing Semantics beyond the Black and White
2015-10-02T12:01:17Z
0
2015-10-02T10:07:36Z
2015-10-12T18:00:00Z
CONFIRMED
Feilmayr
Christina
Christina Feilmayr
Christina Feilmayr
CONFIRMED
Anca Dumitrache, Lora Aroyo and Chris Welty.
2015-10-12T13:00:00Z
OPAQUE
2015-10-02T09:46:55Z
2015-10-02T12:03:03Z
Research | CrowdTruth Measures for Language Ambiguity: The Case of Medical Relation Extraction
817b4icma6mkao4j5jjf2hhu2o@google.com
2015-10-12T13:30:00Z
1
2015-10-02T11:59:57Z
Room RBC 271
De Luca
Ernesto William
Ernesto William De Luca
Ernesto William De Luca
Aroyo
Lora
Lora Aroyo
Lora Aroyo
Joshi
Anupam
Anupam Joshi
Anupam Joshi
Sleeman
Jennifer
Jennifer Sleeman
Jennifer Sleeman
2015-10-12T19:15:00Z
CONFIRMED
Closing remarks and best paper award announcement
Room RBC 271
2015-10-02T10:08:17Z
OPAQUE
96a4bh66arosfhfed2uu76kt08@google.com
2015-10-02T12:01:23Z
2015-10-02T12:03:03Z
Plenary | Closing and Awards
1
2015-10-12T19:30:00Z
Bagheri
Ebrahim
Ebrahim Bagheri
Ebrahim Bagheri
Zhang
Ziqi
Ziqi Zhang
Ziqi Zhang
Topic Modeling for RDF Graphs
2015
2015
Topic models are widely used to thematically describe a collection of text documents and have become an important technique for systems that measure document similarity for classification, clustering, segmentation, entity linking and more. While they have been applied to some non-text domains, their use for semi-structured graph data, such as RDF, has been less explored. We present a framework for applying topic modeling to RDF graph data and describe how it can be used in a number of linked data tasks. Since topic modeling builds abstract topics using the co-occurrence of document terms, sparse documents can be problematic, presenting challenges for RDF data. We outline techniques to overcome this problem and the results of experiments in using them. Finally, we show preliminary results of using Latent Dirichlet Allocation generative topic modeling for several linked data use cases.
Topic Modeling for RDF Graphs
Tresp
Volker
Volker Tresp
Volker Tresp
Basile
Pierpaolo
Pierpaolo Basile
Pierpaolo Basile
Program committee member
Nickles
Matthias
Matthias Nickles
Matthias Nickles
Ghashghaei
Mehrnaz
Mehrnaz Ghashghaei
Mehrnaz Ghashghaei
Riedl
Martin
Martin Riedl
Martin Riedl
University of Sheffield
University of Sheffield
Rettinger
Achim
Achim Rettinger
Achim Rettinger
Chris
Welty
Chris Welty
Chris Welty
Ponzetto
Simone Paolo
Simone Paolo Ponzetto
Simone Paolo Ponzetto
Solanki
Monika
Monika Solanki
Monika Solanki
The 3rd International Workshop on Linked Data for Information Extraction
LD4IE2015
The 3rd International Workshop on Linked Data for Information Extraction
The 3rd International Workshop on Linked Data for Information Extraction
2015-10-12T15:30:00
Bethlehem, Pennsylvania - USA
Bethlehem, Pennsylvania - USA
2015-10-12T09:00:00
2015-10-02T12:00:16Z
OPAQUE
Research | Semantic Annotation of Quantitative Textual Content
Mehrnaz Ghashghaei, John Cuzzola, Ebrahim Bagheri and Ali Ghorbani.
Room RBC 271
2015-10-12T14:00:00Z
2015-10-12T13:30:00Z
0
ri9oe6lcndmbii7j2mehclqu2o@google.com
2015-10-02T10:01:21Z
CONFIRMED
2015-10-02T12:03:03Z
Pujara
Jay
Jay Pujara
Jay Pujara
Kolthoff
Kristian
Kristian Kolthoff
Kristian Kolthoff
D'Amato
Claudia
Claudia D'Amato
Claudia D'Amato
Moro
Andrea
Andrea Moro
Andrea Moro
Fernandez
Miriam
Miriam Fernandez
Miriam Fernandez
CONFIRMED
2015-10-02T10:02:43Z
2015-10-02T12:00:31Z
2015-10-12T15:30:00Z
2015-10-12T15:00:00Z
rlofdvrkt3tmr48533jbsgqt48@google.com
2015-10-02T12:03:03Z
0
Jennifer Sleeman, Tim Finin and Anupam Joshi.
OPAQUE
Room RBC 271
Research | Topic Modeling for RDF Graphs
Barnaghi
Payam
Payam Barnaghi
Payam Barnaghi
Liu
Cong
Cong Liu
Cong Liu
Knoblock
Craig
Craig Knoblock
Craig Knoblock
Buitelaar
Paul
Paul Buitelaar
Paul Buitelaar