--- title: Automating Internet Explorer with jQuery date: "2008-05-18T12:00:00Z" lastmod: "2009-03-17T13:04:50Z" categories: - coding wp_id: 52 ---

Most of my screen-scraping so far has been through Perl (typically WWW::Mechanize). The big problem is that it doesn't support Javascript, which can often be an issue:

There are many approaches to overcoming this. The easiest is to use Win32::IE::Mechanize, which uses Internet Explorer in the background to actually load the page and do the scraping. It's a bit slower than scraping just the HTML, but it'll get the job done.

Another is to use Rhino. John Resig has written env.js that mimics the browser environment, and on most simple pages, it handles the Javascript quite well.

I would rather have a hybrid of both approaches. I don't like the WWW::Mechanize interface. I've gotten used to jQuery's rather powerful selectors and chainability. So I'll tell you a way of using jQuery to screen-scrape offline using Python. (It doesn't have to be Python. Perl, Ruby, Javascript... any scripting language that can use COM on Windows will work.)

Let's take Google Video. Currently, it relies almost entirely on Javascript. The video marked in red below appears only if you have Javascript.

The left box showing the top video uses Javascript

I'd like an automated way of checking what video is on top on Google Video every hour, and save the details. Clearly a task for automation, and clearly not one for pure HTML-scraping.

I know the video's details are stored in elements with the following IDs (thanks to XPath checker):

IDWhat's there
hs_title_linkLink to the video
hs_duration_dateDuration and date
hs_ratingsRatings. The stars indicate the rating and the span.Votes element inside it has the number of people who rated it.
hs_siteThe site that hosts the video
hs_descriptionShort description

So I could do the following on Win32::IE::Mechanize.

```perl use Win32::IE::Mechanize; my $ie = Win32::IE::Mechanize->new( visible => 1 ); $ie->get("http://video.google.com/"); my @links = $ie->links # ... then what? ```

I could go through each link to extract the hs_title_link, but there's no way to get the other stuff.

Instead, we could take advantage of a couple of facts:

Let's take this step by step. This Python program opens IE, loads Google Video and prints the text.

```python # Start Internet Explorer import win32com.client ie = win32com.client.Dispatch("InternetExplorer.Application") # Display IE, so you'll know what's happening ie.visible = 1 # Go to Google Video ie.navigate("http://video.google.com/") # Wait till the page is loaded from time import sleep while ie.Busy: sleep(0.2) # Print the contents # Watch out for Unicode print ie.document.body.innertext.encode("utf-8") ```

The next step is to add jQuery to the Google Video page.

```python # Add the jQuery script to the browser def addJQuery(browser, url="http://jqueryjs.googlecode.com/files/jquery-1.2.4.js" document = browser.document window = document.parentWindow head = document.getElementsByTagName("head")[0] script = document.createElement("script") script.type = "text/javascript" script.src = url head.appendChild(script) while not window.jQuery: sleep(0.1) return window.jQuery jQuery = addJQuery(ie) ```

Now the variable jQuery contains the Javascript jQuery object. From here on, you can hardly tell if you're working in Javascript or Python. Below are the expressions (in Python!) to get the video's details.

```python # Video title: "McCain's YouTube Problem ..." jQuery("#hs_title_link").text() # Title link: '/videoplay?docid=1750591377151076231' jQuery("#hs_title_link").attr("href") # Duration and date: '3 min - May 18, 2008 - ' jQuery("#hs_duration_date").text() # Rating: 5.0 jQuery("#hs_ratings img").length # Number of ratings '(8,288 Ratings) ' jQuery("#hs_ratings span.Votes").text() # Site: 'Watch this video on youtube.com' jQuery("#hs_site").text() # Video description jQuery("#hs_description").text() ```

This wouldn't have worked out as neatly in Perl, simply because you'd need to use -> instead of . (dot). With Python (and with Ruby and Javascript on cscript), you can almost cut-and-paste jQuery code.

If you want to click on the top video link, use:

```python jQuery("#hs_title_link").get(0).click() ```

In addition, you can use the keyboard as well. If you want to type username TAB password, use this:

```python shell = win32com.client.Dispatch("WScript.Shell") shell.sendkeys("username{TAB}password") ```

You can use any of the arrow keys, control keys, etc. Refer to the SendKeys Method on MSDN.

--- ## Comments - **Franky** _18 May 2008 12:00 pm_: Holy crap. I am in way over my head here, but this is pretty sweet stuff if I'm reading it correctly. Thanks for the info. - **[Alessandro](http://www.mathema.com)** _5 Jul 2009 11:50 am_: The idea is really good. Unfortunately it doesn't seem to work under Perl. When a JS function/script is invoked (jQuery is a JS function) IE does not return a COM object and the control from Perl is lost. There are, of course, methods to overcome the obstacle but not as simple as $window->jQuery! - **Anonymous** _5 Mar 2010 11:25 pm_: Wow, kick ass! Exactly what I was looking for. Used AutoIt on a host machine to instantiate the IE object and thanks to you was able to use jQuery to do my scraping. Combined with the IE Developer Toolbar, this has probably saved me weeks of time. - **Manikandan** _19 Nov 2009 3:30 pm_: I am trying to do a screenscrape and the code executes jquery. Whenever I had to execute javascript i could execute it by ie.navigate(javascript:....) But when I tried the same thing with jquery it does not execute. The code behind is something like this I tried ie.navigate(jQuery1258644477483="2") It did not thing. Appreciate your help!!! - **[Powershell: Including jQuery in InternetExplorer.Application | DeveloperQuestion.com](http://developerquestion.com/powershell-including-jquery-in-internetexplorer-application/)** _30 Oct 2010 1:32 pm_ _(pingback)_: [...] script document I Followed this article, explaining how to accomplish in Python what I want to do in Powershell. So now I have this [...] - **Joe** _27 Nov 2010 2:42 pm_: Hello Mr. Anand, I am trying to do this in Ruby, but I am not sure how to start. Maybe I am lucky and you have you moved to ruby since May 2008? You start with import win32com.client, and I suppose that is to support an IE browser API. I have Firefox, Chrome, Safari and others available to me. The basic aim I guess is to inject JQuery into my page object. Mechanize provides me a page. I am that far. How do I it give it JQuery superpowers. Mechanize uses Nokogiri selectors under the hood. But I would love to standardize on JQuery because it's useful in so many other contexts...in the way that regex is. Joe - **[Powershell: Using jQuery through InternetExplorer.Application](http://fatal-errors.com/powershell-using-jquery-through-internetexplorer-application/86850)** _12 Feb 2012 2:48 am_ _(pingback)_: [...] followed this article, explaining how to spice up an Internet Explorer COM-Object with jQuery. While the author used [...]