---
title: Client side scraping for contacts
date: "2009-03-06T17:01:56Z"
lastmod: "2009-03-09T08:17:08Z"
categories:
  - coding
wp_id: 2292
description: I built a tool to automate contact management by scraping LinkedIn profiles. It uses the Google AJAX Search API to find URLs and YQL with XPath to extract public profile data using client-side JavaScript.
keywords: [client-side scraping, yql, google ajax search api, xpath, linkedin, javascript]
---

By curious coincidence, just a day after my post on [client side scraping](/blog/client-side-scraping/), I had a chance to demo this to a client. They were making a contacts database.

There are two big problems with managing contacts:

1. Getting complete information
2. Keeping it up to date

People are happy to fill out information about themselves in great detail. If you look at the public profiles on [LinkedIn](http://www.linkedin.com/), you’ll find more than enough detail about most people.

Normally, when getting contact details about someone, I search for their name on Google with a “site:linkedin.com” and look at that information. **Could this be automated?**

I spent a couple of hours and came up with a primitive [contacts scraper](/get-contacts.html). Click on the link, type in a name, and you should get the LinkedIn profile for that person.

(**Caveat**: It’s very primitive. It works only for specific public profile URLs. Try ‘Peter Peverelli’ as an example.)

It uses two technologies: the [Google AJAX Search API](http://code.google.com/apis/ajaxsearch/) and [YQL](http://developer.yahoo.com/yql/).

The `search()` function searches for a phrase…

```javascript
var gs; // shared WebSearch instance, created once the search API has loaded

google.load("search", "1");
google.setOnLoadCallback(function() {
  gs = new google.search.WebSearch();
  $("#getinfo").show(); // reveal the form only when searching is possible
});

// Run a web search for `phrase` and pass the array of results to `fn`.
function search(phrase, fn) {
  gs.setSearchCompleteCallback(gs, function() {
    fn(this.results);
  });
  gs.execute(phrase);
}
```

… and the `linkedin()` function takes a LinkedIn URL and extracts the relevant information from it, using XPath.

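
To make the XPath step concrete: `//li[@class][h3]` matches every `<li>` element that has a `class` attribute and an `<h3>` child. The sketch below shows the YQL statement such a request carries. `buildYql()` and `buildYqlUrl()` are hypothetical helpers, not part of the tool, and the profile URL is a placeholder. Like the tool's own code, this does no quote escaping, so it assumes neither argument contains a double quote.

```javascript
// Hypothetical helper: build the YQL statement for one page and one XPath.
// Assumes url and xpath contain no double quotes (nothing is escaped here).
function buildYql(url, xpath) {
  return 'select * from html where(url="' + url + '" and xpath="' + xpath + '")';
}

// Hypothetical helper: wrap the statement in a request URL for the public
// YQL endpoint, asking for JSON output.
function buildYqlUrl(url, xpath) {
  return "http://query.yahooapis.com/v1/public/yql" +
    "?format=json&q=" + encodeURIComponent(buildYql(url, xpath));
}

// Placeholder profile URL, for illustration only.
var statement = buildYql("http://www.linkedin.com/in/example", "//li[@class][h3]");
console.log(statement);
// select * from html where(url="http://www.linkedin.com/in/example" and xpath="//li[@class][h3]")
```

The tool's own `scrape()` sends essentially this statement through jQuery's `$.getJSON`, which adds a JSONP callback so the browser can read the cross-domain response.
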
```javascript
// Fetch `url` through YQL, keep only the nodes matching `xpath`,
// and pass the JSON response to `fn`.
function scrape(url, xpath, fn) {
  // "callback=?" makes jQuery issue a JSONP request
  $.getJSON("http://query.yahooapis.com/v1/public/yql?callback=?", {
    q: "select * from html where(url=\"" + url + "\" and xpath=\"" + xpath + "\")",
    format: "json"
  }, fn);
}

// Extract every <li> that has a class attribute and an <h3> child.
function linkedin(url, fn) {
  scrape(url, "//li[@class][h3]", fn);
}
```

So if you wanted to find Peter Peverelli, it searches on Google for “[Peter Peverelli site:linkedin.com](http://www.google.com/search?q=Peter+Peverelli+site%3Alinkedin.com)” and picks the first result. From this result, it displays all the `