{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Basic EM workflow 1 (Restaurants data set)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This IPython notebook explains a basic workflow to match two tables using *py_entitymatching*. Our goal is to match restaurants from Fodors and Zagat sites. The datasets contain information about the restaurants.\n", "\n", "First, we need to import *py_entitymatching* package and other libraries as follows:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import sys\n", "sys.path.append('/Users/pradap/Documents/Research/Python-Package/anhaid/py_entitymatching/')\n", "\n", "import py_entitymatching as em\n", "import pandas as pd\n", "import os" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "python version: 3.5.2 | packaged by conda-forge | (default, Sep 8 2016, 14:36:38) \n", "[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.54)]\n", "pandas version: 0.19.2\n", "magellan version: 0.1.0\n" ] } ], "source": [ "# Display the versions\n", "print('python version: ' + sys.version )\n", "print('pandas version: ' + pd.__version__ )\n", "print('magellan version: ' + em.__version__ )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Matching two tables typically consists of the following three steps:\n", "\n", "** 1. Reading the input tables **\n", "\n", "** 2. Blocking the input tables to get a candidate set **\n", "\n", "** 3. Matching the tuple pairs in the candidate set **" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Read input tables\n", "\n", "We begin by loading the input tables. For the purpose of this guide, we use the datasets that are included with the package." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Get the paths\n", "path_A = em.get_install_path() + os.sep + 'datasets' + os.sep + 'end-to-end' + os.sep + 'restaurants/fodors.csv'\n", "path_B = em.get_install_path() + os.sep + 'datasets' + os.sep + 'end-to-end' + os.sep + 'restaurants/zagats.csv'" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Metadata file is not present in the given path; proceeding to read the csv file.\n", "Metadata file is not present in the given path; proceeding to read the csv file.\n" ] } ], "source": [ "A = em.read_csv_metadata(path_A, key='id')\n", "B = em.read_csv_metadata(path_B, key='id')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of tuples in A: 533\n", "Number of tuples in B: 331\n", "Number of tuples in A X B (i.e the cartesian product): 176423\n" ] } ], "source": [ "print('Number of tuples in A: ' + str(len(A)))\n", "print('Number of tuples in B: ' + str(len(B)))\n", "print('Number of tuples in A X B (i.e the cartesian product): ' + str(len(A)*len(B)))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idnameaddrcityphonetype
0534arnie mortons of chicago435 s. la cienega blv.los angeles310/246-1501american
1535arts delicatessen12224 ventura blvd.studio city818/762-1221american
2536hotel bel-air701 stone canyon rd.bel air310/472-1211californian
3537cafe bizou14016 ventura blvd.sherman oaks818/788-3536french
4538campanile624 s. la brea ave.los angeles213/938-1447american
\n", "
" ], "text/plain": [ " id name addr city \\\n", "0 534 arnie mortons of chicago 435 s. la cienega blv. los angeles \n", "1 535 arts delicatessen 12224 ventura blvd. studio city \n", "2 536 hotel bel-air 701 stone canyon rd. bel air \n", "3 537 cafe bizou 14016 ventura blvd. sherman oaks \n", "4 538 campanile 624 s. la brea ave. los angeles \n", "\n", " phone type \n", "0 310/246-1501 american \n", "1 818/762-1221 american \n", "2 310/472-1211 californian \n", "3 818/788-3536 french \n", "4 213/938-1447 american " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.head()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idnameaddrcityphonetype
01apple pan the10801 w. pico blvd.west la310-475-3585american
12asahi ramen2027 sawtelle blvd.west la310-479-2231noodle shops
23baja fresh3345 kimber dr.westlake village805-498-4049mexican
34belvedere the9882 little santa monica blvd.beverly hills310-788-2306pacific new wave
45benitas frites1433 third st. promenadesanta monica310-458-2889fast food
\n", "
" ], "text/plain": [ " id name addr city \\\n", "0 1 apple pan the 10801 w. pico blvd. west la \n", "1 2 asahi ramen 2027 sawtelle blvd. west la \n", "2 3 baja fresh 3345 kimber dr. westlake village \n", "3 4 belvedere the 9882 little santa monica blvd. beverly hills \n", "4 5 benitas frites 1433 third st. promenade santa monica \n", "\n", " phone type \n", "0 310-475-3585 american \n", "1 310-479-2231 noodle shops \n", "2 805-498-4049 mexican \n", "3 310-788-2306 pacific new wave \n", "4 310-458-2889 fast food " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B.head()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "('id', 'id')" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the keys of the input tables\n", "em.get_key(A), em.get_key(B)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Block tables to get candidate set\n", "\n", "Before we do the matching, we would like to remove the obviously non-matching tuple pairs from the input tables. This would reduce the number of tuple pairs considered for matching.\n", "*py_entitymatching* provides four different blockers: (1) attribute equivalence, (2) overlap, (3) rule-based, and (4) black-box. The user can mix and match these blockers to form a blocking sequence applied to input tables.\n", "\n", "For the matching problem at hand, we know that two restaurants with no overlap between the names will not match. So we decide the apply blocking over names:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ob = em.OverlapBlocker()\n", "C = ob.block_tables(A, B, 'name', 'name', \n", " l_output_attrs=['name', 'addr', 'city', 'phone'], \n", " r_output_attrs=['name', 'addr', 'city', 'phone'],\n", " overlap_size=1, show_progress=False)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
_idltable_idrtable_idltable_nameltable_addrltable_cityltable_phonertable_namertable_addrrtable_cityrtable_phone
0010361pacific pan pacific hotel500 post st.san francisco415/929-2087apple pan the10801 w. pico blvd.west la310-475-3585
1110167hyde street bistro1521 hyde st.san francisco415/441-7778bistro 4545 s. mentor ave.pasadena818-795-2478
225527pinot bistro12969 ventura blvd.los angeles818/990-0500bistro 4545 s. mentor ave.pasadena818-795-2478
336527bistro garden176 n. canon dr.los angeles310/550-3900bistro 4545 s. mentor ave.pasadena818-795-2478
449917bistro roti155 steuart st.san francisco415/495-6500bistro 4545 s. mentor ave.pasadena818-795-2478
\n", "
" ], "text/plain": [ " _id ltable_id rtable_id ltable_name ltable_addr \\\n", "0 0 1036 1 pacific pan pacific hotel 500 post st. \n", "1 1 1016 7 hyde street bistro 1521 hyde st. \n", "2 2 552 7 pinot bistro 12969 ventura blvd. \n", "3 3 652 7 bistro garden 176 n. canon dr. \n", "4 4 991 7 bistro roti 155 steuart st. \n", "\n", " ltable_city ltable_phone rtable_name rtable_addr \\\n", "0 san francisco 415/929-2087 apple pan the 10801 w. pico blvd. \n", "1 san francisco 415/441-7778 bistro 45 45 s. mentor ave. \n", "2 los angeles 818/990-0500 bistro 45 45 s. mentor ave. \n", "3 los angeles 310/550-3900 bistro 45 45 s. mentor ave. \n", "4 san francisco 415/495-6500 bistro 45 45 s. mentor ave. \n", "\n", " rtable_city rtable_phone \n", "0 west la 310-475-3585 \n", "1 pasadena 818-795-2478 \n", "2 pasadena 818-795-2478 \n", "3 pasadena 818-795-2478 \n", "4 pasadena 818-795-2478 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Match tuple pairs in candidate set" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this step, we would want to match the tuple pairs in the candidate set. Specifically, we use learning-based method for matching purposes.\n", "This typically involves the following four steps:\n", "\n", "1. Sampling and labeling the candidate set\n", "2. Train matcher using labeled data\n", "3. Predict the matches in the candidate set using trained matcher" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sampling and labeling the candidate set" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we randomly sample 450 tuple pairs for labeling purposes." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Sample candidate set\n", "S = em.sample_table(C, 450)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we label the sampled candidate set. Specify we would enter 1 for a match and 0 for a non-match." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Label S\n", "G = em.label_table(S, 'gold')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the purposes of this guide, we will load in a pre-labeled dataset (of 450 tuple pairs) included in this package." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "450" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "path_G = em.get_install_path() + os.sep + 'datasets' + os.sep + 'end-to-end' + os.sep + 'restaurants/lbl_restnt_wf1.csv'\n", "G = em.read_csv_metadata(path_G, \n", " key='_id',\n", " ltable=A, rtable=B, \n", " fk_ltable='ltable_id', fk_rtable='rtable_id')\n", "len(G)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train matcher using labeled data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we need to create a set of features.*py_entitymatching* provides a way to automatically generate features based on the attributes in the input tables. For the purposes of this guide, we use the automatically generated features." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Generate features automatically \n", "feature_table = em.get_features_for_matching(A, B)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we convert the labeled data to feature vectors using the feature table" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Select the attrs. to be included in the feature vector table\n", "attrs_from_table = ['ltable_name', 'ltable_addr', 'ltable_city', 'ltable_phone',\n", " 'rtable_name', 'rtable_addr', 'rtable_city', 'rtable_phone']\n", "# Convert the labeled data to feature vectors using the feature table\n", "H = em.extract_feature_vecs(G, \n", " feature_table=feature_table, \n", " attrs_before = attrs_from_table,\n", " attrs_after='gold',\n", " show_progress=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we train the learning-based matcher using the feature vectors. For the purposes of the guide, we will use Random Forest matcher that is included in the *py_entitymatching* package." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Instantiate the RF Matcher\n", "rf = em.RFMatcher()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Get the attributes to be projected while training\n", "attrs_to_be_excluded = []\n", "attrs_to_be_excluded.extend(['_id', 'ltable_id', 'rtable_id', 'gold'])\n", "attrs_to_be_excluded.extend(attrs_from_table)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Train using feature vectors from the labeled data.\n", "rf.fit(table=H, exclude_attrs=attrs_to_be_excluded, target_attr='gold')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predict the matches in the candidate set using trained matcher" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we use the trained matcher to predict matches in the candidate set. To do that, first we need to convert the candidate set to feature vectors." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Select the attrs. to be included in the feature vector table\n", "attrs_from_table = ['ltable_name', 'ltable_addr', 'ltable_city', 'ltable_phone',\n", " 'rtable_name', 'rtable_addr', 'rtable_city', 'rtable_phone']\n", "# Convert the cancidate set to feature vectors using the feature table\n", "L = em.extract_feature_vecs(C, feature_table=feature_table,\n", " attrs_before= attrs_from_table,\n", " show_progress=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we predict the matches in the candidate set using the trained matcher and the feature vectors." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Get the attributes to be excluded while predicting \n", "attrs_to_be_excluded = []\n", "attrs_to_be_excluded.extend(['_id', 'ltable_id', 'rtable_id'])\n", "attrs_to_be_excluded.extend(attrs_from_table)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Predict the matches\n", "predictions = rf.predict(table=L, exclude_attrs=attrs_to_be_excluded, \n", " append=True, target_attr='predicted', inplace=False)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
_idltable_idrtable_idltable_nameltable_addrltable_cityltable_phonertable_namertable_addrrtable_city...city_city_swtype_type_jac_qgm_3_qgm_3type_type_cos_dlm_dc0_dlm_dc0type_type_jac_dlm_dc0_dlm_dc0type_type_meltype_type_lev_disttype_type_lev_simtype_type_nmwtype_type_swpredicted
0010361pacific pan pacific hotel500 post st.san francisco415/929-2087apple pan the10801 w. pico blvd.west la...1.00.0000000.00.00.5138896.00.2500000.01.00
1110167hyde street bistro1521 hyde st.san francisco415/441-7778bistro 4545 s. mentor ave.pasadena...2.00.2222220.00.00.6341997.00.3636360.03.00
225527pinot bistro12969 ventura blvd.los angeles818/990-0500bistro 4545 s. mentor ave.pasadena...1.00.0000000.00.00.4242429.00.181818-3.02.00
336527bistro garden176 n. canon dr.los angeles310/550-3900bistro 4545 s. mentor ave.pasadena...1.01.0000001.01.01.0000000.01.00000011.011.00
449917bistro roti155 steuart st.san francisco415/495-6500bistro 4545 s. mentor ave.pasadena...2.00.0000000.00.00.4242429.00.181818-3.02.00
\n", "

5 rows × 48 columns

\n", "
" ], "text/plain": [ " _id ltable_id rtable_id ltable_name ltable_addr \\\n", "0 0 1036 1 pacific pan pacific hotel 500 post st. \n", "1 1 1016 7 hyde street bistro 1521 hyde st. \n", "2 2 552 7 pinot bistro 12969 ventura blvd. \n", "3 3 652 7 bistro garden 176 n. canon dr. \n", "4 4 991 7 bistro roti 155 steuart st. \n", "\n", " ltable_city ltable_phone rtable_name rtable_addr \\\n", "0 san francisco 415/929-2087 apple pan the 10801 w. pico blvd. \n", "1 san francisco 415/441-7778 bistro 45 45 s. mentor ave. \n", "2 los angeles 818/990-0500 bistro 45 45 s. mentor ave. \n", "3 los angeles 310/550-3900 bistro 45 45 s. mentor ave. \n", "4 san francisco 415/495-6500 bistro 45 45 s. mentor ave. \n", "\n", " rtable_city ... city_city_sw type_type_jac_qgm_3_qgm_3 \\\n", "0 west la ... 1.0 0.000000 \n", "1 pasadena ... 2.0 0.222222 \n", "2 pasadena ... 1.0 0.000000 \n", "3 pasadena ... 1.0 1.000000 \n", "4 pasadena ... 2.0 0.000000 \n", "\n", " type_type_cos_dlm_dc0_dlm_dc0 type_type_jac_dlm_dc0_dlm_dc0 \\\n", "0 0.0 0.0 \n", "1 0.0 0.0 \n", "2 0.0 0.0 \n", "3 1.0 1.0 \n", "4 0.0 0.0 \n", "\n", " type_type_mel type_type_lev_dist type_type_lev_sim type_type_nmw \\\n", "0 0.513889 6.0 0.250000 0.0 \n", "1 0.634199 7.0 0.363636 0.0 \n", "2 0.424242 9.0 0.181818 -3.0 \n", "3 1.000000 0.0 1.000000 11.0 \n", "4 0.424242 9.0 0.181818 -3.0 \n", "\n", " type_type_sw predicted \n", "0 1.0 0 \n", "1 3.0 0 \n", "2 2.0 0 \n", "3 11.0 0 \n", "4 2.0 0 \n", "\n", "[5 rows x 48 columns]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "predictions.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, project the attributes and the predictions from the predicted table." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Get the attributes to be projected out\n", "attrs_proj = []\n", "attrs_proj.extend(['_id', 'ltable_id', 'rtable_id'])\n", "attrs_proj.extend(attrs_from_table)\n", "attrs_proj.append('predicted')\n", "\n", "# Project the attributes\n", "predictions = predictions[attrs_proj]" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
_idltable_idrtable_idltable_nameltable_addrltable_cityltable_phonertable_namertable_addrrtable_cityrtable_phonepredicted
0010361pacific pan pacific hotel500 post st.san francisco415/929-2087apple pan the10801 w. pico blvd.west la310-475-35850
1110167hyde street bistro1521 hyde st.san francisco415/441-7778bistro 4545 s. mentor ave.pasadena818-795-24780
225527pinot bistro12969 ventura blvd.los angeles818/990-0500bistro 4545 s. mentor ave.pasadena818-795-24780
336527bistro garden176 n. canon dr.los angeles310/550-3900bistro 4545 s. mentor ave.pasadena818-795-24780
449917bistro roti155 steuart st.san francisco415/495-6500bistro 4545 s. mentor ave.pasadena818-795-24780
\n", "
" ], "text/plain": [ " _id ltable_id rtable_id ltable_name ltable_addr \\\n", "0 0 1036 1 pacific pan pacific hotel 500 post st. \n", "1 1 1016 7 hyde street bistro 1521 hyde st. \n", "2 2 552 7 pinot bistro 12969 ventura blvd. \n", "3 3 652 7 bistro garden 176 n. canon dr. \n", "4 4 991 7 bistro roti 155 steuart st. \n", "\n", " ltable_city ltable_phone rtable_name rtable_addr \\\n", "0 san francisco 415/929-2087 apple pan the 10801 w. pico blvd. \n", "1 san francisco 415/441-7778 bistro 45 45 s. mentor ave. \n", "2 los angeles 818/990-0500 bistro 45 45 s. mentor ave. \n", "3 los angeles 310/550-3900 bistro 45 45 s. mentor ave. \n", "4 san francisco 415/495-6500 bistro 45 45 s. mentor ave. \n", "\n", " rtable_city rtable_phone predicted \n", "0 west la 310-475-3585 0 \n", "1 pasadena 818-795-2478 0 \n", "2 pasadena 818-795-2478 0 \n", "3 pasadena 818-795-2478 0 \n", "4 pasadena 818-795-2478 0 " ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "predictions.head()" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3.0 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }