{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction\n", "\n", "This IPython notebook illustrates how to perform matching using the rule-based matcher.\n", "\n", "First, we need to import the py_entitymatching package and other libraries as follows:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Import py_entitymatching package\n", "import py_entitymatching as em\n", "import os\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, read the (sample) input tables for matching purposes." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Get the datasets directory\n", "datasets_dir = em.get_install_path() + os.sep + 'datasets'\n", "\n", "path_A = datasets_dir + os.sep + 'dblp_demo.csv'\n", "path_B = datasets_dir + os.sep + 'acm_demo.csv'\n", "path_labeled_data = datasets_dir + os.sep + 'labeled_data_demo.csv'" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>_id</th><th>ltable_id</th><th>rtable_id</th><th>ltable_title</th><th>ltable_authors</th><th>ltable_year</th><th>rtable_title</th><th>rtable_authors</th><th>rtable_year</th><th>label</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>0</td><td>l1223</td><td>r498</td><td>Dynamic Information Visualization</td><td>Yannis E. Ioannidis</td><td>1996</td><td>Dynamic information visualization</td><td>Yannis E. Ioannidis</td><td>1996</td><td>1</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>l1563</td><td>r1285</td><td>Dynamic Load Balancing in Hierarchical Parallel Database Systems</td><td>Luc Bouganim, Daniela Florescu, Patrick Valduriez</td><td>1996</td><td>Dynamic Load Balancing in Hierarchical Parallel Database Systems</td><td>Luc Bouganim, Daniela Florescu, Patrick Valduriez</td><td>1996</td><td>1</td></tr>\n",
"    <tr><th>2</th><td>2</td><td>l1514</td><td>r1348</td><td>Query Processing and Optimization in Oracle Rdb</td><td>Gennady Antoshenkov, Mohamed Ziauddin</td><td>1996</td><td>prospector: a content-based multimedia server for massively parallel architectures</td><td>S. Choo, W. O'Connell, G. Linerman, H. Chen, K. Ganapathy, A. Biliris, E. Panagos, D. Schrader</td><td>1996</td><td>0</td></tr>\n",
"    <tr><th>3</th><td>3</td><td>l206</td><td>r1641</td><td>An Asymptotically Optimal Multiversion B-Tree</td><td>Thomas Ohler, Peter Widmayer, Bruno Becker, Stephan Gschwind, Bernhard Seeger</td><td>1996</td><td>A complete temporal relational algebra</td><td>Debabrata Dey, Terence M. Barron, Veda C. Storey</td><td>1996</td><td>0</td></tr>\n",
"    <tr><th>4</th><td>4</td><td>l1589</td><td>r495</td><td>Evaluating Probabilistic Queries over Imprecise Data</td><td>Reynold Cheng, Dmitri V. Kalashnikov, Sunil Prabhakar</td><td>2003</td><td>Evaluating probabilistic queries over imprecise data</td><td>Reynold Cheng, Dmitri V. Kalashnikov, Sunil Prabhakar</td><td>2003</td><td>1</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " _id ltable_id rtable_id \\\n", "0 0 l1223 r498 \n", "1 1 l1563 r1285 \n", "2 2 l1514 r1348 \n", "3 3 l206 r1641 \n", "4 4 l1589 r495 \n", "\n", " ltable_title \\\n", "0 Dynamic Information Visualization \n", "1 Dynamic Load Balancing in Hierarchical Parallel Database Systems \n", "2 Query Processing and Optimization in Oracle Rdb \n", "3 An Asymptotically Optimal Multiversion B-Tree \n", "4 Evaluating Probabilistic Queries over Imprecise Data \n", "\n", " ltable_authors \\\n", "0 Yannis E. Ioannidis \n", "1 Luc Bouganim, Daniela Florescu, Patrick Valduriez \n", "2 Gennady Antoshenkov, Mohamed Ziauddin \n", "3 Thomas Ohler, Peter Widmayer, Bruno Becker, Stephan Gschwind, Bernhard Seeger \n", "4 Reynold Cheng, Dmitri V. Kalashnikov, Sunil Prabhakar \n", "\n", " ltable_year \\\n", "0 1996 \n", "1 1996 \n", "2 1996 \n", "3 1996 \n", "4 2003 \n", "\n", " rtable_title \\\n", "0 Dynamic information visualization \n", "1 Dynamic Load Balancing in Hierarchical Parallel Database Systems \n", "2 prospector: a content-based multimedia server for massively parallel architectures \n", "3 A complete temporal relational algebra \n", "4 Evaluating probabilistic queries over imprecise data \n", "\n", " rtable_authors \\\n", "0 Yannis E. Ioannidis \n", "1 Luc Bouganim, Daniela Florescu, Patrick Valduriez \n", "2 S. Choo, W. O'Connell, G. Linerman, H. Chen, K. Ganapathy, A. Biliris, E. Panagos, D. Schrader \n", "3 Debabrata Dey, Terence M. Barron, Veda C. Storey \n", "4 Reynold Cheng, Dmitri V. 
Kalashnikov, Sunil Prabhakar \n", "\n", " rtable_year label \n", "0 1996 1 \n", "1 1996 1 \n", "2 1996 0 \n", "3 1996 0 \n", "4 2003 1 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = em.read_csv_metadata(path_A, key='id')\n", "B = em.read_csv_metadata(path_B, key='id')\n", "\n", "# Load the pre-labeled data\n", "S = em.read_csv_metadata(path_labeled_data, \n", " key='_id',\n", " ltable=A, rtable=B, \n", " fk_ltable='ltable_id', fk_rtable='rtable_id')\n", "S.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, split the labeled data into a development set and an evaluation set. Use the development set to develop the rule-based matcher." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Split S into I and J\n", "IJ = em.split_train_test(S, train_proportion=0.5, random_state=0)\n", "I = IJ['train']\n", "J = IJ['test']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Creating and Using a Rule-Based Matcher\n", "\n", "This typically involves the following steps:\n", "1. Creating the Rule-Based Matcher\n", "2. Creating Features\n", "3. Adding Rules\n", "4. Using the Matcher to Predict Results\n", "\n", "## Creating the Rule-Based Matcher" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "brm = em.BooleanRuleMatcher()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating Features\n", "\n", "Next, we need to create a set of features for the development set. Magellan provides a way to automatically generate features based on the attributes in the input tables. For the purposes of this guide, we use the automatically generated features." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Generate a set of features\n", "F = em.get_features_for_matching(A, B, validate_inferred_attr_types=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Listing the feature names shows that 20 features were generated. As a first step, let's say that we decide to write rules that use only a few of them: the title, authors, and year features." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 id_id_lev_dist\n", "1 id_id_lev_sim\n", "2 id_id_jar\n", "3 id_id_jwn\n", "4 id_id_exm\n", "5 id_id_jac_qgm_3_qgm_3\n", "6 title_title_jac_qgm_3_qgm_3\n", "7 title_title_cos_dlm_dc0_dlm_dc0\n", "8 title_title_mel\n", "9 title_title_lev_dist\n", "10 title_title_lev_sim\n", "11 authors_authors_jac_qgm_3_qgm_3\n", "12 authors_authors_cos_dlm_dc0_dlm_dc0\n", "13 authors_authors_mel\n", "14 authors_authors_lev_dist\n", "15 authors_authors_lev_sim\n", "16 year_year_exm\n", "17 year_year_anm\n", "18 year_year_lev_dist\n", "19 year_year_lev_sim\n", "Name: feature_name, dtype: object" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.feature_name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding Rules\n", "\n", "Before we can use the rule-based matcher, we need to create rules to evaluate tuple pairs. Each rule is a list of strings. Each string specifies one predicate, and the rule is the conjunction of the predicates in its list. Each predicate has three parts: (1) an expression, (2) a comparison operator, and (3) a value. The expression is evaluated over a tuple pair, producing a numeric value, which is then compared against the given value using the operator." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['_rule_0', '_rule_1']" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Add two rules to the rule-based matcher\n", "\n", "# The first rule has two predicates, one comparing the titles and the other looking for an exact match of the years\n", "brm.add_rule(['title_title_lev_sim(ltuple, rtuple) > 0.4', 'year_year_exm(ltuple, rtuple) == 1'], F)\n", "# This second rule compares the authors\n", "brm.add_rule(['authors_authors_lev_sim(ltuple, rtuple) > 0.4'], F)\n", "brm.get_rule_names()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Rules can also be deleted from the rule-based matcher\n", "brm.delete_rule('_rule_1')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the Matcher to Predict Results\n", "\n", "Now that our rule-based matcher has some rules, we can use it to predict whether a tuple pair is actually a match. Each rule is a conjunction of predicates and returns True only if all of its predicates return True. The matcher is a disjunction of rules: if any one of the rules returns True, the tuple pair is predicted to be a match." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>_id</th><th>ltable_id</th><th>rtable_id</th><th>ltable_title</th><th>ltable_authors</th><th>ltable_year</th><th>rtable_title</th><th>rtable_authors</th><th>rtable_year</th><th>label</th><th>pred_label</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>0</td><td>l1223</td><td>r498</td><td>Dynamic Information Visualization</td><td>Yannis E. Ioannidis</td><td>1996</td><td>Dynamic information visualization</td><td>Yannis E. Ioannidis</td><td>1996</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>l1563</td><td>r1285</td><td>Dynamic Load Balancing in Hierarchical Parallel Database Systems</td><td>Luc Bouganim, Daniela Florescu, Patrick Valduriez</td><td>1996</td><td>Dynamic Load Balancing in Hierarchical Parallel Database Systems</td><td>Luc Bouganim, Daniela Florescu, Patrick Valduriez</td><td>1996</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>2</th><td>2</td><td>l1514</td><td>r1348</td><td>Query Processing and Optimization in Oracle Rdb</td><td>Gennady Antoshenkov, Mohamed Ziauddin</td><td>1996</td><td>prospector: a content-based multimedia server for massively parallel architectures</td><td>S. Choo, W. O'Connell, G. Linerman, H. Chen, K. Ganapathy, A. Biliris, E. Panagos, D. Schrader</td><td>1996</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>3</th><td>3</td><td>l206</td><td>r1641</td><td>An Asymptotically Optimal Multiversion B-Tree</td><td>Thomas Ohler, Peter Widmayer, Bruno Becker, Stephan Gschwind, Bernhard Seeger</td><td>1996</td><td>A complete temporal relational algebra</td><td>Debabrata Dey, Terence M. Barron, Veda C. Storey</td><td>1996</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>4</th><td>4</td><td>l1589</td><td>r495</td><td>Evaluating Probabilistic Queries over Imprecise Data</td><td>Reynold Cheng, Dmitri V. Kalashnikov, Sunil Prabhakar</td><td>2003</td><td>Evaluating probabilistic queries over imprecise data</td><td>Reynold Cheng, Dmitri V. Kalashnikov, Sunil Prabhakar</td><td>2003</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>5</th><td>5</td><td>l43</td><td>r1415</td><td>Optimization of Run-time Management of Data Intensive Web-sites</td><td>Khaled Yagoub, Dan Suciu, Alon Y. Levy, Daniela Florescu</td><td>1999</td><td>On random sampling over joins</td><td>Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya</td><td>1999</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>6</th><td>6</td><td>l1466</td><td>r1348</td><td>Access Path Support for Referential Integrity in SQL2</td><td>Joachim Reinert, Theo Hrder</td><td>1996</td><td>prospector: a content-based multimedia server for massively parallel architectures</td><td>S. Choo, W. O'Connell, G. Linerman, H. Chen, K. Ganapathy, A. Biliris, E. Panagos, D. Schrader</td><td>1996</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>7</th><td>7</td><td>l1535</td><td>r1800</td><td>Mariposa: A Wide-Area Distributed Database System</td><td>Carl Staelin, Paul M. Aoki, Witold Litwin, Michael Stonebraker, Adam Sah, Jeff Sidell, Andrew Yu...</td><td>1996</td><td>Further Improvements on Integrity Constraint Checking for Stratifiable Deductive Databases</td><td>Sin Yeung Lee, Tok Wang Ling</td><td>1996</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>8</th><td>8</td><td>l1317</td><td>r1676</td><td>QuickStore: A High Performance Mapped Object Store</td><td>David J. DeWitt, Seth J. White</td><td>1994</td><td>An Overview of Repository Technology</td><td>Philip A. Bernstein, Umeshwar Dayal</td><td>1994</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>9</th><td>9</td><td>l621</td><td>r175</td><td>Communication Efficient Distributed Mining of Association Rules</td><td>Ran Wolff, Assaf Schuster</td><td>2001</td><td>Editorial</td><td>Richard Snodgrass</td><td>2001</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>10</th><td>10</td><td>l668</td><td>r1694</td><td>Indexing Multimedia Databases (Tutorial)</td><td>Christos Faloutsos</td><td>1995</td><td>Information finding in a digital library: the Stanford perspective</td><td>Tak W. Yan, Héctor García-Molina</td><td>1995</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>11</th><td>11</td><td>l1189</td><td>r1674</td><td>Weimin Du, Xiangning Liu, Abdelsalam Helal</td><td>Multiview Access Protocols for Large-Scale Replication</td><td>1998</td><td>Multiview access protocols for large-scale replication</td><td>Xiangning Liu, Abdelsalam Helal, Weimin Du</td><td>1998</td><td>1</td><td>0</td></tr>\n",
"    <tr><th>12</th><td>12</td><td>l1657</td><td>r110</td><td>Semantic B2B Integration</td><td>Christoph Bussler</td><td>2001</td><td>Monitoring business processes through event correlation based on dependency model</td><td>Asaf Adii, David Botzer, Opher Etzion, Tali Yatzkar-Haham</td><td>2001</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>13</th><td>13</td><td>l1490</td><td>r599</td><td>Extracting Large Data Sets using DB2 Parallel Edition</td><td>Sriram Padmanabhan</td><td>1996</td><td>Extracting Large Data Sets using DB2 Parallel Edition</td><td>Sriram Padmanabhan</td><td>1996</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>14</th><td>14</td><td>l595</td><td>r87</td><td>Of Crawlers, Portals, Mice and Men: Is there more to Mining the Web? (Panel)</td><td>Kyuseok Shim, Rajeev Rastogi, Minos N. Garofalakis, Sridhar Ramaswamy</td><td>1999</td><td>Of crawlers, portals, mice, and men: is there more to mining the Web?</td><td>Minos N. Garofalakis, Sridhar Ramaswamy, Rajeev Rastogi, Kyuseok Shim</td><td>1999</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>15</th><td>15</td><td>l380</td><td>r1337</td><td>Outerjoin Simplification and Reordering for Query Optimization</td><td>Csar A. Galindo-Legaria, Arnon Rosenthal</td><td>1997</td><td>Outerjoin simplification and reordering for query optimization</td><td>César Galindo-Legaria, Arnon Rosenthal</td><td>1997</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>16</th><td>16</td><td>l165</td><td>r1118</td><td>Cache-and-Query for Wide Area Sensor Databases</td><td>Phillip B. Gibbons, Srinivasan Seshan, Suman Kumar Nath, Amol Deshpande</td><td>2003</td><td>Cache-and-query for wide area sensor databases</td><td>Amol Deshpande, Suman Nath, Phillip B. Gibbons, Srinivasan Seshan</td><td>2003</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>17</th><td>17</td><td>l796</td><td>r588</td><td>Generating Dynamic Content at Database-Backed Web Servers: cgi-bin vs. mod_perl</td><td>Alexandros Labrinidis, Nick Roussopoulos</td><td>2000</td><td>Novel Approaches in Query Processing for Moving Object Trajectories</td><td>Dieter Pfoser, Christian S. Jensen, Yannis Theodoridis</td><td>2000</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>18</th><td>18</td><td>l1160</td><td>r1733</td><td>Khaled Alsabti, Vineet Singh, Sanjay Ranka</td><td>A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data</td><td>1997</td><td>A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data</td><td>Khaled Alsabti, Sanjay Ranka, Vineet Singh</td><td>1997</td><td>1</td><td>0</td></tr>\n",
"    <tr><th>19</th><td>19</td><td>l1752</td><td>r3</td><td>SHORE: Combining the Best Features of OODBMS and File Systems</td><td>Shore Team</td><td>1995</td><td>The LyriC language: querying constraint objects</td><td>Alexander Brodsky, Yoram Kornatzky</td><td>1995</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>20</th><td>20</td><td>l1647</td><td>r945</td><td>Cost Based Query Scrambling for Initial Delays</td><td>Tolga Urhan, Michael J. Franklin, Laurent Amsaleg</td><td>1998</td><td>The Cubetree Storage Organization</td><td>Nick Roussopoulos, Yannis Kotidis</td><td>1998</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>21</th><td>21</td><td>l1135</td><td>r1127</td><td>Sampling-Based Estimation of the Number of Distinct Values of an Attribute</td><td>Peter J. Haas, Lynne Stokes, S. Seshadri, Jeffrey F. Naughton</td><td>1995</td><td>View maintenance in a warehousing environment</td><td>Yue Zhuge, Héctor García-Molina, Joachim Hammer, Jennifer Widom</td><td>1995</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>22</th><td>22</td><td>l1776</td><td>r987</td><td>Walking Through a Very Large Virtual Environment in Real-time</td><td>Yixin Ruan, Kian-Lee Tan, Jason Chionh, Lidan Shou, Zhiyong Huang</td><td>2001</td><td>Walking Through a Very Large Virtual Environment in Real-time</td><td>Lidan Shou, Jason Chionh, Zhiyong Huang, Yixin Ruan, Kian-Lee Tan</td><td>2001</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>23</th><td>23</td><td>l676</td><td>r1395</td><td>Datawarehousing Has More Colours Than Just Black &amp; White</td><td>Thomas Zurek, Markus Sinnwell</td><td>1999</td><td>Datawarehousing Has More Colours Than Just Black &amp;; White</td><td>Thomas Zurek, Markus Sinnwell</td><td>1999</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>24</th><td>24</td><td>l1087</td><td>r648</td><td>The Grid: An Application of the Semantic Web</td><td>Carole A. Goble, David De Roure</td><td>2002</td><td>An XML query engine for network-bound data</td><td>Zachary G. Ives, A. Y. Halevy, D. S. Weld</td><td>2002</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>25</th><td>25</td><td>l629</td><td>r1478</td><td>Engineering Federated Information Systems: Report of EFIS '99 Workshop</td><td>Flix Saltor, Uwe Hohenstein, Ralf-Detlef Kutsche, Wilhelm Hasselbring, Gunter Saake, Stefan Conr...</td><td>1999</td><td>Engineering federated information systems: report of EEFIS '99 workshop</td><td>S. Conrad, W. Hasselbring, U. Hohenstein, R.-D. Kutsche, M. Roantree, G. Saake, F. Saltor</td><td>1999</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>26</th><td>26</td><td>l649</td><td>r1366</td><td>Random Sampling for Histogram Construction: How much is enough?</td><td>Vivek R. Narasayya, Rajeev Motwani, Surajit Chaudhuri</td><td>1998</td><td>Random sampling for histogram construction: how much is enough?</td><td>Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya</td><td>1998</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>27</th><td>27</td><td>l211</td><td>r1490</td><td>BeSS: Storage Support for Interactive Visualization Systems</td><td>William O'Connell, Thomas A. Funkhouser, Alexandros Biliris, Euthimios Panagos</td><td>1996</td><td>BeSS: storage support for interactive visualization systems</td><td>A. Biliris, T. A. Funkhouser, W. O'Connell, E. Panagos</td><td>1996</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>28</th><td>28</td><td>l734</td><td>r384</td><td>Min-Max Compression Methods for Medical Image Databases</td><td>John M. Tyler, Kosmas Karadimitriou</td><td>1997</td><td>Min-max compression methods for medical image databases</td><td>Kosmas Karadimitriou, John M. Tyler</td><td>1997</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>29</th><td>29</td><td>l611</td><td>r141</td><td>Mining Generalized Association Rules</td><td>Ramakrishnan Srikant, Rakesh Agrawal</td><td>1995</td><td>Multi-table joins through bitmapped join indices</td><td>Patrick O'Neil, Goetz Graefe</td><td>1995</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>420</th><td>420</td><td>l834</td><td>r883</td><td>Estimating the Selectivity of XML Path Expressions for Internet Scale Applications</td><td>Ashraf Aboulnaga, Jeffrey F. Naughton, Alaa R. Alameldeen</td><td>2001</td><td>Estimating the Selectivity of XML Path Expressions for Internet Scale Applications</td><td>Ashraf Aboulnaga, Alaa R. Alameldeen, Jeffrey F. Naughton</td><td>2001</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>421</th><td>421</td><td>l746</td><td>r301</td><td>Providing Database Migration Tools - A Practicioner's Approach</td><td>Andreas Meier</td><td>1995</td><td>Providing Database Migration Tools - A Practicioner's Approach</td><td>Andreas Meier</td><td>1995</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>422</th><td>422</td><td>l1332</td><td>r619</td><td>Workshop on Workflow Management in Scientific and Engineering Applications - Report</td><td>Gottfried Vossen, Richard McClatchey</td><td>1997</td><td>Workshop on workflow management in scientific and engineering applications-report</td><td>R. McClatchey, G. Vossen</td><td>1997</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>423</th><td>423</td><td>l942</td><td>r1473</td><td>Research in Databases and Data-Intensive Applications - Computer Science Department and FZI, Uni...</td><td>Birgitta Knig-Ries, Peter C. Lockemann</td><td>1997</td><td>Research in databases and data-intensive applications: Computer Science Dept. and FIZ, Universit...</td><td>Brigitta König-Ries, Peter C. Lockermann</td><td>1997</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>424</th><td>424</td><td>l806</td><td>r356</td><td>Tribeca: A Stream Database Manager for Network Traffic Analysis</td><td>Mark Sullivan</td><td>1996</td><td>Type-safe relaxing of schema consistency rules for flexible modelling in OODBMS</td><td>Eric Amiel, Marie-Jo Bellosta, Eric Dujardin, Eric Simon</td><td>1996</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>425</th><td>425</td><td>l794</td><td>r784</td><td>Spatial Data Management for Computer Aided Design</td><td>Andreas Mller, Marco Ptke, Thomas Seidl, Hans-Peter Kriegel</td><td>2001</td><td>Dynamic content acceleration: a caching solution to enable scalable dynamic Web page generation</td><td>Anindya Datta, Kaushik Dutta, Krithi Ramamritham, Helen Thomas, Debra VanderMeer</td><td>2001</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>426</th><td>426</td><td>l28</td><td>r1618</td><td>Storage Technology: RAID and Beyond</td><td>Garth A. Gibson</td><td>1995</td><td>Tutorial on storage technology: RAID and beyond</td><td>Garth A. Gibson</td><td>1995</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>427</th><td>427</td><td>l1183</td><td>r1409</td><td>Stephen Blott, Roger Weber, Hans-Jrg Schek</td><td>A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional ...</td><td>1998</td><td>A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional ...</td><td>Roger Weber, Hans-Jörg Schek, Stephen Blott</td><td>1998</td><td>1</td><td>0</td></tr>\n",
"    <tr><th>428</th><td>428</td><td>l1122</td><td>r232</td><td>Interview with Jim Gray</td><td>Marianne Winslett</td><td>2003</td><td>In-context peer-to-peer information filtering on the Web</td><td>Aris M. Ouksel</td><td>2003</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>429</th><td>429</td><td>l1430</td><td>r1444</td><td>Condition Handling in SQL Persistent Stored Modules</td><td>Jeff Richey</td><td>1995</td><td>Condition handling in SQL persistent stored modules</td><td>Jeff Richey</td><td>1995</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>430</th><td>430</td><td>l1494</td><td>r1257</td><td>The Mariposa Distributed Database Management System</td><td>Jeff Sidell</td><td>1996</td><td>Open issues in parallel query optimization</td><td>Waqar Hasan, Daniela Florescu, Patrick Valduriez</td><td>1996</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>431</th><td>431</td><td>l1592</td><td>r439</td><td>Report on the 18th British National Conference on Databases (BNCOD)</td><td>Carole A. Goble, Brian J. Read</td><td>2002</td><td>Contracting in the days of eBusiness</td><td>W. Hümmer, W. Lehner, H. Wedekind</td><td>2002</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>432</th><td>432</td><td>l1015</td><td>r45</td><td>Database Systems - Breaking Out of the Box</td><td>Abraham Silberschatz, Stanley B. Zdonik</td><td>1997</td><td>Dynamic Memory Adjustment for External Mergesort</td><td>Weiye Zhang, Per-Åke Larson</td><td>1997</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>433</th><td>433</td><td>l1147</td><td>r1016</td><td>Xiaolei Qian</td><td>Scientist's Called Upon to Take Actions</td><td>1996</td><td>Scientists called upon to take actions</td><td>Xiaolei Qian</td><td>1996</td><td>1</td><td>0</td></tr>\n",
"    <tr><th>434</th><td>434</td><td>l1756</td><td>r310</td><td>ARIES/CSA: A Method for Database Recovery in Client-Server Architectures</td><td>C. Mohan, Inderpal Narang</td><td>1994</td><td>Enterprise information architectures-they're finally changing</td><td>Wesley P. Melling</td><td>1994</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>435</th><td>435</td><td>l1044</td><td>r67</td><td>Digital Library Services in Mobile Computing</td><td>Evaggelia Pitoura, Melliyal Annamalai, Bharat K. Bhargava</td><td>1995</td><td>Ordered shared locks for real-time databases</td><td>Divyakant Agrawal, Amr El Abbadi, Richard Jeffers, Lijing Lin</td><td>1995</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>436</th><td>436</td><td>l412</td><td>r651</td><td>Phoenix: Making Applications Robust</td><td>David B. Lomet, Roger S. Barga</td><td>1999</td><td>DataBlitz storage manager: main-memory database performance for critical applications</td><td>J. Baulier, P. Bohannon, S. Gogate, C. Gupta, S. Haldar</td><td>1999</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>437</th><td>437</td><td>l796</td><td>r1808</td><td>Generating Dynamic Content at Database-Backed Web Servers: cgi-bin vs. mod_perl</td><td>Alexandros Labrinidis, Nick Roussopoulos</td><td>2000</td><td>On wrapping query languages and efficient XML integration</td><td>Vassilis Christophides, Sophie Cluet, Jérǒme Simèon</td><td>2000</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>438</th><td>438</td><td>l1570</td><td>r1468</td><td>Instance-based attribute identification in database integration</td><td>Roger H. L. Chiang, Ee-Peng Lim, Chua Eng Huang Cecil</td><td>2003</td><td>Index-driven similarity search in metric spaces</td><td>Gisli R. Hjaltason, Hanan Samet</td><td>2003</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>439</th><td>439</td><td>l1577</td><td>r688</td><td>Data Mining Using Two-Dimensional Optimized Accociation Rules: Scheme, Algorithms, and Visualiza...</td><td>Shinichi Morishita, Yasuhiko Morimoto, Takeshi Tokuyama, Takeshi Fukuda</td><td>1996</td><td>Static detection of security flaws in object-oriented databases</td><td>Keishi Tajima</td><td>1996</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>440</th><td>440</td><td>l617</td><td>r310</td><td>Fine-Grained Sharing in a Page Server OODBMS</td><td>Michael J. Carey, Markos Zaharioudakis, Michael J. Franklin</td><td>1994</td><td>Enterprise information architectures-they're finally changing</td><td>Wesley P. Melling</td><td>1994</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>441</th><td>441</td><td>l1304</td><td>r1178</td><td>Query Rewriting for Semistructured Data</td><td>Vasilis Vassalos, Yannis Papakonstantinou</td><td>1999</td><td>The Aqua approximate query answering system</td><td>Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala, Sridhar Ramaswamy</td><td>1999</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>442</th><td>442</td><td>l727</td><td>r597</td><td>Design and Analysis of Parametric Query Optimization Algorithms</td><td>Sumit Ganguly</td><td>1998</td><td>Incremental distance join algorithms for spatial databases</td><td>Gísli R. Hjaltason, Hanan Samet</td><td>1998</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>443</th><td>443</td><td>l1205</td><td>r395</td><td>Proxy-Server Architectures for OLAP</td><td>Panos Kalnis, Dimitris Papadias</td><td>2001</td><td>Proxy-server architectures for OLAP</td><td>Panos Kalnis, Dimitris Papadias</td><td>2001</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>444</th><td>444</td><td>l915</td><td>r1532</td><td>Efficient k-NN search on vertically decomposed data</td><td>Niels Nes, Martin L. Kersten, Nikos Mamoulis, Arjen P. de Vries</td><td>2002</td><td>Efficient k-NN search on vertically decomposed data</td><td>Arjen P. de Vries, Nikos Mamoulis, Niels Nes, Martin Kersten</td><td>2002</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>445</th><td>445</td><td>l365</td><td>r53</td><td>50,000 Users on an Oracle8 Universal Server Database</td><td>Ashok Josji, Tirthankar Lahiri, Amit Jasuja, Sumanta Chatterjee</td><td>1998</td><td>A workflow-based electronic marketplace on the Web</td><td>Asuman Dogac, Ilker Durusoy, Sena Arpinar, Nesime Tatbul, Pinar Koksal, Ibrahim Cingil, Nazife D...</td><td>1998</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>446</th><td>446</td><td>l458</td><td>r767</td><td>Comparing Hierarchical Data in External Memory</td><td>Sudarshan S. Chawathe</td><td>1999</td><td>Context-Based Prefetch for Implementing Objects on Relations</td><td>Philip A. Bernstein, Shankar Pal, David Shutt</td><td>1999</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>447</th><td>447</td><td>l655</td><td>r412</td><td>The SDSS skyserver: public access to the sloan digital sky server data</td><td>Tanu Malik, Jordan Raddick, Alexander S. Szalay, Peter Z. Kunszt, Jim Gray, Christopher Stoughto...</td><td>2002</td><td>Report on the ACM fourth international workshop on data warehousing and OLAP (DOLAP 2001)</td><td>Joachim Hammer</td><td>2002</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>448</th><td>448</td><td>l123</td><td>r1493</td><td>Change-Centric Management of Versions in an XML Warehouse</td><td>Laurent Mignet, Amlie Marian, Gregory Cobena, Serge Abiteboul</td><td>2001</td><td>A Sequential Pattern Query Language for Supporting Instant Data Mining for e-Services</td><td>Reza Sadri, Carlo Zaniolo, Amir M. Zarkesh, Jafar Adibi</td><td>2001</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>449</th><td>449</td><td>l590</td><td>r295</td><td>Skew handling techniques in sort-merge join</td><td>Richard T. Snodgrass, Wei Li, Dengfeng Gao</td><td>2002</td><td>QURSED: querying and reporting semistructured data</td><td>Yannis Papakonstantinou, Michalis Petropoulos, Vasilis Vassalos</td><td>2002</td><td>0</td><td>0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>450 rows × 11 columns</p>\n",
"</div>
" ], "text/plain": [ " _id ltable_id rtable_id \\\n", "0 0 l1223 r498 \n", "1 1 l1563 r1285 \n", "2 2 l1514 r1348 \n", "3 3 l206 r1641 \n", "4 4 l1589 r495 \n", "5 5 l43 r1415 \n", "6 6 l1466 r1348 \n", "7 7 l1535 r1800 \n", "8 8 l1317 r1676 \n", "9 9 l621 r175 \n", "10 10 l668 r1694 \n", "11 11 l1189 r1674 \n", "12 12 l1657 r110 \n", "13 13 l1490 r599 \n", "14 14 l595 r87 \n", "15 15 l380 r1337 \n", "16 16 l165 r1118 \n", "17 17 l796 r588 \n", "18 18 l1160 r1733 \n", "19 19 l1752 r3 \n", "20 20 l1647 r945 \n", "21 21 l1135 r1127 \n", "22 22 l1776 r987 \n", "23 23 l676 r1395 \n", "24 24 l1087 r648 \n", "25 25 l629 r1478 \n", "26 26 l649 r1366 \n", "27 27 l211 r1490 \n", "28 28 l734 r384 \n", "29 29 l611 r141 \n", ".. ... ... ... \n", "420 420 l834 r883 \n", "421 421 l746 r301 \n", "422 422 l1332 r619 \n", "423 423 l942 r1473 \n", "424 424 l806 r356 \n", "425 425 l794 r784 \n", "426 426 l28 r1618 \n", "427 427 l1183 r1409 \n", "428 428 l1122 r232 \n", "429 429 l1430 r1444 \n", "430 430 l1494 r1257 \n", "431 431 l1592 r439 \n", "432 432 l1015 r45 \n", "433 433 l1147 r1016 \n", "434 434 l1756 r310 \n", "435 435 l1044 r67 \n", "436 436 l412 r651 \n", "437 437 l796 r1808 \n", "438 438 l1570 r1468 \n", "439 439 l1577 r688 \n", "440 440 l617 r310 \n", "441 441 l1304 r1178 \n", "442 442 l727 r597 \n", "443 443 l1205 r395 \n", "444 444 l915 r1532 \n", "445 445 l365 r53 \n", "446 446 l458 r767 \n", "447 447 l655 r412 \n", "448 448 l123 r1493 \n", "449 449 l590 r295 \n", "\n", " ltable_title \\\n", "0 Dynamic Information Visualization \n", "1 Dynamic Load Balancing in Hierarchical Parallel Database Systems \n", "2 Query Processing and Optimization in Oracle Rdb \n", "3 An Asymptotically Optimal Multiversion B-Tree \n", "4 Evaluating Probabilistic Queries over Imprecise Data \n", "5 Optimization of Run-time Management of Data Intensive Web-sites \n", "6 Access Path Support for Referential Integrity in SQL2 \n", "7 Mariposa: A Wide-Area Distributed Database System \n", "8 
QuickStore: A High Performance Mapped Object Store \n", "9 Communication Efficient Distributed Mining of Association Rules \n", "10 Indexing Multimedia Databases (Tutorial) \n", "11 Weimin Du, Xiangning Liu, Abdelsalam Helal \n", "12 Semantic B2B Integration \n", "13 Extracting Large Data Sets using DB2 Parallel Edition \n", "14 Of Crawlers, Portals, Mice and Men: Is there more to Mining the Web? (Panel) \n", "15 Outerjoin Simplification and Reordering for Query Optimization \n", "16 Cache-and-Query for Wide Area Sensor Databases \n", "17 Generating Dynamic Content at Database-Backed Web Servers: cgi-bin vs. mod_perl \n", "18 Khaled Alsabti, Vineet Singh, Sanjay Ranka \n", "19 SHORE: Combining the Best Features of OODBMS and File Systems \n", "20 Cost Based Query Scrambling for Initial Delays \n", "21 Sampling-Based Estimation of the Number of Distinct Values of an Attribute \n", "22 Walking Through a Very Large Virtual Environment in Real-time \n", "23 Datawarehousing Has More Colours Than Just Black & White \n", "24 The Grid: An Application of the Semantic Web \n", "25 Engineering Federated Information Systems: Report of EFIS '99 Workshop \n", "26 Random Sampling for Histogram Construction: How much is enough? \n", "27 BeSS: Storage Support for Interactive Visualization Systems \n", "28 Min-Max Compression Methods for Medical Image Databases \n", "29 Mining Generalized Association Rules \n", ".. ... \n", "420 Estimating the Selectivity of XML Path Expressions for Internet Scale Applications \n", "421 Providing Database Migration Tools - A Practicioner's Approach \n", "422 Workshop on Workflow Management in Scientific and Engineering Applications - Report \n", "423 Research in Databases and Data-Intensive Applications - Computer Science Department and FZI, Uni... 
\n", "424 Tribeca: A Stream Database Manager for Network Traffic Analysis \n", "425 Spatial Data Management for Computer Aided Design \n", "426 Storage Technology: RAID and Beyond \n", "427 Stephen Blott, Roger Weber, Hans-Jrg Schek \n", "428 Interview with Jim Gray \n", "429 Condition Handling in SQL Persistent Stored Modules \n", "430 The Mariposa Distributed Database Management System \n", "431 Report on the 18th British National Conference on Databases (BNCOD) \n", "432 Database Systems - Breaking Out of the Box \n", "433 Xiaolei Qian \n", "434 ARIES/CSA: A Method for Database Recovery in Client-Server Architectures \n", "435 Digital Library Services in Mobile Computing \n", "436 Phoenix: Making Applications Robust \n", "437 Generating Dynamic Content at Database-Backed Web Servers: cgi-bin vs. mod_perl \n", "438 Instance-based attribute identification in database integration \n", "439 Data Mining Using Two-Dimensional Optimized Accociation Rules: Scheme, Algorithms, and Visualiza... \n", "440 Fine-Grained Sharing in a Page Server OODBMS \n", "441 Query Rewriting for Semistructured Data \n", "442 Design and Analysis of Parametric Query Optimization Algorithms \n", "443 Proxy-Server Architectures for OLAP \n", "444 Efficient k-NN search on vertically decomposed data \n", "445 50,000 Users on an Oracle8 Universal Server Database \n", "446 Comparing Hierarchical Data in External Memory \n", "447 The SDSS skyserver: public access to the sloan digital sky server data \n", "448 Change-Centric Management of Versions in an XML Warehouse \n", "449 Skew handling techniques in sort-merge join \n", "\n", " ltable_authors \\\n", "0 Yannis E. Ioannidis \n", "1 Luc Bouganim, Daniela Florescu, Patrick Valduriez \n", "2 Gennady Antoshenkov, Mohamed Ziauddin \n", "3 Thomas Ohler, Peter Widmayer, Bruno Becker, Stephan Gschwind, Bernhard Seeger \n", "4 Reynold Cheng, Dmitri V. Kalashnikov, Sunil Prabhakar \n", "5 Khaled Yagoub, Dan Suciu, Alon Y. 
Levy, Daniela Florescu \n", "6 Joachim Reinert, Theo Hrder \n", "7 Carl Staelin, Paul M. Aoki, Witold Litwin, Michael Stonebraker, Adam Sah, Jeff Sidell, Andrew Yu... \n", "8 David J. DeWitt, Seth J. White \n", "9 Ran Wolff, Assaf Schuster \n", "10 Christos Faloutsos \n", "11 Multiview Access Protocols for Large-Scale Replication \n", "12 Christoph Bussler \n", "13 Sriram Padmanabhan \n", "14 Kyuseok Shim, Rajeev Rastogi, Minos N. Garofalakis, Sridhar Ramaswamy \n", "15 Csar A. Galindo-Legaria, Arnon Rosenthal \n", "16 Phillip B. Gibbons, Srinivasan Seshan, Suman Kumar Nath, Amol Deshpande \n", "17 Alexandros Labrinidis, Nick Roussopoulos \n", "18 A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data \n", "19 Shore Team \n", "20 Tolga Urhan, Michael J. Franklin, Laurent Amsaleg \n", "21 Peter J. Haas, Lynne Stokes, S. Seshadri, Jeffrey F. Naughton \n", "22 Yixin Ruan, Kian-Lee Tan, Jason Chionh, Lidan Shou, Zhiyong Huang \n", "23 Thomas Zurek, Markus Sinnwell \n", "24 Carole A. Goble, David De Roure \n", "25 Flix Saltor, Uwe Hohenstein, Ralf-Detlef Kutsche, Wilhelm Hasselbring, Gunter Saake, Stefan Conr... \n", "26 Vivek R. Narasayya, Rajeev Motwani, Surajit Chaudhuri \n", "27 William O'Connell, Thomas A. Funkhouser, Alexandros Biliris, Euthimios Panagos \n", "28 John M. Tyler, Kosmas Karadimitriou \n", "29 Ramakrishnan Srikant, Rakesh Agrawal \n", ".. ... \n", "420 Ashraf Aboulnaga, Jeffrey F. Naughton, Alaa R. Alameldeen \n", "421 Andreas Meier \n", "422 Gottfried Vossen, Richard McClatchey \n", "423 Birgitta Knig-Ries, Peter C. Lockemann \n", "424 Mark Sullivan \n", "425 Andreas Mller, Marco Ptke, Thomas Seidl, Hans-Peter Kriegel \n", "426 Garth A. Gibson \n", "427 A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional ... \n", "428 Marianne Winslett \n", "429 Jeff Richey \n", "430 Jeff Sidell \n", "431 Carole A. Goble, Brian J. Read \n", "432 Abraham Silberschatz, Stanley B. 
Zdonik \n", "433 Scientist's Called Upon to Take Actions \n", "434 C. Mohan, Inderpal Narang \n", "435 Evaggelia Pitoura, Melliyal Annamalai, Bharat K. Bhargava \n", "436 David B. Lomet, Roger S. Barga \n", "437 Alexandros Labrinidis, Nick Roussopoulos \n", "438 Roger H. L. Chiang, Ee-Peng Lim, Chua Eng Huang Cecil \n", "439 Shinichi Morishita, Yasuhiko Morimoto, Takeshi Tokuyama, Takeshi Fukuda \n", "440 Michael J. Carey, Markos Zaharioudakis, Michael J. Franklin \n", "441 Vasilis Vassalos, Yannis Papakonstantinou \n", "442 Sumit Ganguly \n", "443 Panos Kalnis, Dimitris Papadias \n", "444 Niels Nes, Martin L. Kersten, Nikos Mamoulis, Arjen P. de Vries \n", "445 Ashok Josji, Tirthankar Lahiri, Amit Jasuja, Sumanta Chatterjee \n", "446 Sudarshan S. Chawathe \n", "447 Tanu Malik, Jordan Raddick, Alexander S. Szalay, Peter Z. Kunszt, Jim Gray, Christopher Stoughto... \n", "448 Laurent Mignet, Amlie Marian, Gregory Cobena, Serge Abiteboul \n", "449 Richard T. Snodgrass, Wei Li, Dengfeng Gao \n", "\n", " ltable_year \\\n", "0 1996 \n", "1 1996 \n", "2 1996 \n", "3 1996 \n", "4 2003 \n", "5 1999 \n", "6 1996 \n", "7 1996 \n", "8 1994 \n", "9 2001 \n", "10 1995 \n", "11 1998 \n", "12 2001 \n", "13 1996 \n", "14 1999 \n", "15 1997 \n", "16 2003 \n", "17 2000 \n", "18 1997 \n", "19 1995 \n", "20 1998 \n", "21 1995 \n", "22 2001 \n", "23 1999 \n", "24 2002 \n", "25 1999 \n", "26 1998 \n", "27 1996 \n", "28 1997 \n", "29 1995 \n", ".. ... 
\n", "420 2001 \n", "421 1995 \n", "422 1997 \n", "423 1997 \n", "424 1996 \n", "425 2001 \n", "426 1995 \n", "427 1998 \n", "428 2003 \n", "429 1995 \n", "430 1996 \n", "431 2002 \n", "432 1997 \n", "433 1996 \n", "434 1994 \n", "435 1995 \n", "436 1999 \n", "437 2000 \n", "438 2003 \n", "439 1996 \n", "440 1994 \n", "441 1999 \n", "442 1998 \n", "443 2001 \n", "444 2002 \n", "445 1998 \n", "446 1999 \n", "447 2002 \n", "448 2001 \n", "449 2002 \n", "\n", " rtable_title \\\n", "0 Dynamic information visualization \n", "1 Dynamic Load Balancing in Hierarchical Parallel Database Systems \n", "2 prospector: a content-based multimedia server for massively parallel architectures \n", "3 A complete temporal relational algebra \n", "4 Evaluating probabilistic queries over imprecise data \n", "5 On random sampling over joins \n", "6 prospector: a content-based multimedia server for massively parallel architectures \n", "7 Further Improvements on Integrity Constraint Checking for Stratifiable Deductive Databases \n", "8 An Overview of Repository Technology \n", "9 Editorial \n", "10 Information finding in a digital library: the Stanford perspective \n", "11 Multiview access protocols for large-scale replication \n", "12 Monitoring business processes through event correlation based on dependency model \n", "13 Extracting Large Data Sets using DB2 Parallel Edition \n", "14 Of crawlers, portals, mice, and men: is there more to mining the Web? 
\n", "15 Outerjoin simplification and reordering for query optimization \n", "16 Cache-and-query for wide area sensor databases \n", "17 Novel Approaches in Query Processing for Moving Object Trajectories \n", "18 A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data \n", "19 The LyriC language: querying constraint objects \n", "20 The Cubetree Storage Organization \n", "21 View maintenance in a warehousing environment \n", "22 Walking Through a Very Large Virtual Environment in Real-time \n", "23 Datawarehousing Has More Colours Than Just Black &; White \n", "24 An XML query engine for network-bound data \n", "25 Engineering federated information systems: report of EEFIS '99 workshop \n", "26 Random sampling for histogram construction: how much is enough? \n", "27 BeSS: storage support for interactive visualization systems \n", "28 Min-max compression methods for medical image databases \n", "29 Multi-table joins through bitmapped join indices \n", ".. ... \n", "420 Estimating the Selectivity of XML Path Expressions for Internet Scale Applications \n", "421 Providing Database Migration Tools - A Practicioner's Approach \n", "422 Workshop on workflow management in scientific and engineering applications-report \n", "423 Research in databases and data-intensive applications: Computer Science Dept. and FIZ, Universit... \n", "424 Type-safe relaxing of schema consistency rules for flexible modelling in OODBMS \n", "425 Dynamic content acceleration: a caching solution to enable scalable dynamic Web page generation \n", "426 Tutorial on storage technology: RAID and beyond \n", "427 A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional ... 
\n", "428 In-context peer-to-peer information filtering on the Web \n", "429 Condition handling in SQL persistent stored modules \n", "430 Open issues in parallel query optimization \n", "431 Contracting in the days of eBusiness \n", "432 Dynamic Memory Adjustment for External Mergesort \n", "433 Scientists called upon to take actions \n", "434 Enterprise information architectures-they're finally changing \n", "435 Ordered shared locks for real-time databases \n", "436 DataBlitz storage manager: main-memory database performance for critical applications \n", "437 On wrapping query languages and efficient XML integration \n", "438 Index-driven similarity search in metric spaces \n", "439 Static detection of security flaws in object-oriented databases \n", "440 Enterprise information architectures-they're finally changing \n", "441 The Aqua approximate query answering system \n", "442 Incremental distance join algorithms for spatial databases \n", "443 Proxy-server architectures for OLAP \n", "444 Efficient k-NN search on vertically decomposed data \n", "445 A workflow-based electronic marketplace on the Web \n", "446 Context-Based Prefetch for Implementing Objects on Relations \n", "447 Report on the ACM fourth international workshop on data warehousing and OLAP (DOLAP 2001) \n", "448 A Sequential Pattern Query Language for Supporting Instant Data Mining for e-Services \n", "449 QURSED: querying and reporting semistructured data \n", "\n", " rtable_authors \\\n", "0 Yannis E. Ioannidis \n", "1 Luc Bouganim, Daniela Florescu, Patrick Valduriez \n", "2 S. Choo, W. O'Connell, G. Linerman, H. Chen, K. Ganapathy, A. Biliris, E. Panagos, D. Schrader \n", "3 Debabrata Dey, Terence M. Barron, Veda C. Storey \n", "4 Reynold Cheng, Dmitri V. Kalashnikov, Sunil Prabhakar \n", "5 Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya \n", "6 S. Choo, W. O'Connell, G. Linerman, H. Chen, K. Ganapathy, A. Biliris, E. Panagos, D. 
Schrader \n", "7 Sin Yeung Lee, Tok Wang Ling \n", "8 Philip A. Bernstein, Umeshwar Dayal \n", "9 Richard Snodgrass \n", "10 Tak W. Yan, Héctor García-Molina \n", "11 Xiangning Liu, Abdelsalam Helal, Weimin Du \n", "12 Asaf Adii, David Botzer, Opher Etzion, Tali Yatzkar-Haham \n", "13 Sriram Padmanabhan \n", "14 Minos N. Garofalakis, Sridhar Ramaswamy, Rajeev Rastogi, Kyuseok Shim \n", "15 César Galindo-Legaria, Arnon Rosenthal \n", "16 Amol Deshpande, Suman Nath, Phillip B. Gibbons, Srinivasan Seshan \n", "17 Dieter Pfoser, Christian S. Jensen, Yannis Theodoridis \n", "18 Khaled Alsabti, Sanjay Ranka, Vineet Singh \n", "19 Alexander Brodsky, Yoram Kornatzky \n", "20 Nick Roussopoulos, Yannis Kotidis \n", "21 Yue Zhuge, Héctor García-Molina, Joachim Hammer, Jennifer Widom \n", "22 Lidan Shou, Jason Chionh, Zhiyong Huang, Yixin Ruan, Kian-Lee Tan \n", "23 Thomas Zurek, Markus Sinnwell \n", "24 Zachary G. Ives, A. Y. Halevy, D. S. Weld \n", "25 S. Conrad, W. Hasselbring, U. Hohenstein, R.-D. Kutsche, M. Roantree, G. Saake, F. Saltor \n", "26 Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya \n", "27 A. Biliris, T. A. Funkhouser, W. O'Connell, E. Panagos \n", "28 Kosmas Karadimitriou, John M. Tyler \n", "29 Patrick O'Neil, Goetz Graefe \n", ".. ... \n", "420 Ashraf Aboulnaga, Alaa R. Alameldeen, Jeffrey F. Naughton \n", "421 Andreas Meier \n", "422 R. McClatchey, G. Vossen \n", "423 Brigitta König-Ries, Peter C. Lockermann \n", "424 Eric Amiel, Marie-Jo Bellosta, Eric Dujardin, Eric Simon \n", "425 Anindya Datta, Kaushik Dutta, Krithi Ramamritham, Helen Thomas, Debra VanderMeer \n", "426 Garth A. Gibson \n", "427 Roger Weber, Hans-Jörg Schek, Stephen Blott \n", "428 Aris M. Ouksel \n", "429 Jeff Richey \n", "430 Waqar Hasan, Daniela Florescu, Patrick Valduriez \n", "431 W. Hümmer, W. Lehner, H. Wedekind \n", "432 Weiye Zhang, Per-Åke Larson \n", "433 Xiaolei Qian \n", "434 Wesley P. 
Melling \n", "435 Divyakant Agrawal, Amr El Abbadi, Richard Jeffers, Lijing Lin \n", "436 J. Baulier, P. Bohannon, S. Gogate, C. Gupta, S. Haldar \n", "437 Vassilis Christophides, Sophie Cluet, Jérǒme Simèon \n", "438 Gisli R. Hjaltason, Hanan Samet \n", "439 Keishi Tajima \n", "440 Wesley P. Melling \n", "441 Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala, Sridhar Ramaswamy \n", "442 Gísli R. Hjaltason, Hanan Samet \n", "443 Panos Kalnis, Dimitris Papadias \n", "444 Arjen P. de Vries, Nikos Mamoulis, Niels Nes, Martin Kersten \n", "445 Asuman Dogac, Ilker Durusoy, Sena Arpinar, Nesime Tatbul, Pinar Koksal, Ibrahim Cingil, Nazife D... \n", "446 Philip A. Bernstein, Shankar Pal, David Shutt \n", "447 Joachim Hammer \n", "448 Reza Sadri, Carlo Zaniolo, Amir M. Zarkesh, Jafar Adibi \n", "449 Yannis Papakonstantinou, Michalis Petropoulos, Vasilis Vassalos \n", "\n", " rtable_year label pred_label \n", "0 1996 1 1 \n", "1 1996 1 1 \n", "2 1996 0 0 \n", "3 1996 0 0 \n", "4 2003 1 1 \n", "5 1999 0 0 \n", "6 1996 0 0 \n", "7 1996 0 0 \n", "8 1994 0 0 \n", "9 2001 0 0 \n", "10 1995 0 0 \n", "11 1998 1 0 \n", "12 2001 0 0 \n", "13 1996 1 1 \n", "14 1999 1 1 \n", "15 1997 1 1 \n", "16 2003 1 1 \n", "17 2000 0 0 \n", "18 1997 1 0 \n", "19 1995 0 0 \n", "20 1998 0 0 \n", "21 1995 0 0 \n", "22 2001 1 1 \n", "23 1999 1 1 \n", "24 2002 0 0 \n", "25 1999 1 1 \n", "26 1998 1 1 \n", "27 1996 1 1 \n", "28 1997 1 1 \n", "29 1995 0 0 \n", ".. ... ... ... 
\n", "420 2001 1 1 \n", "421 1995 1 1 \n", "422 1997 1 1 \n", "423 1997 1 1 \n", "424 1996 0 0 \n", "425 2001 0 0 \n", "426 1995 1 1 \n", "427 1998 1 0 \n", "428 2003 0 0 \n", "429 1995 1 1 \n", "430 1996 0 0 \n", "431 2002 0 0 \n", "432 1997 0 0 \n", "433 1996 1 0 \n", "434 1994 0 0 \n", "435 1995 0 0 \n", "436 1999 0 0 \n", "437 2000 0 0 \n", "438 2003 0 0 \n", "439 1996 0 0 \n", "440 1994 0 0 \n", "441 1999 0 0 \n", "442 1998 0 0 \n", "443 2001 1 1 \n", "444 2002 1 1 \n", "445 1998 0 0 \n", "446 1999 0 0 \n", "447 2002 0 0 \n", "448 2001 0 0 \n", "449 2002 0 0 \n", "\n", "[450 rows x 11 columns]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "brm.predict(S, target_attr='pred_label', append=True)\n", "S" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 2 }