The question in the subject usually crosses my mind when I read the web stats for this website. Sometimes many people arrive from a Google search query that I wouldn't have imagined leads to me (at least not within the first 5-6 pages of results, which is as far as people actually bother to read).
So, to satisfy my narcissistic curiosity, I wrote a Perl script that answers the question by running an actual Google search and going over the result pages one by one until it finds a link to a specified website.
Like any self-respecting lazy programmer, I first tried to find ready-made solutions. They exist, but almost all of them use the Google SOAP API. Using a SOAP API is indeed a better idea than hand-parsing the HTML, but unfortunately Google no longer supports it. A notice on their website says:
"As of December 5, 2006, we are no longer issuing new API keys for the SOAP Search API. Developers with existing SOAP Search API keys will not be affected."
Instead of SOAP, Google now provides an Ajax search API (which I actually use for this website). I think I understand why they do this. While the original intention of the SOAP API was to let people easily integrate Google search into their websites, it was used more by automatic scripts (like my own) to harvest results from Google searches for various purposes, including SEO. So once the JavaScript libraries became stable and well supported, Google ditched SOAP and now provides the Ajax API, which can't really be used for anything more than integrating a simple search box into a website.
So I decided to take the tried and tested path: executing HTTP queries, getting results back and parsing them. find_google_link issues a Google search query using WWW::Mechanize and gets the HTML of the results page back. This page is parsed to see how many results there are for the query. Then, in a loop, it issues queries for successive search result pages and stops once it finds a page on which a link to the website appears. The parsing is done with HTML::TreeBuilder (a very convenient interface to HTML::Parser).
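
The page-by-page loop described above can be sketched roughly as follows. This is only an illustration in Python using the standard library, not the Perl of the actual script. The names LinkCollector, page_links_to and find_result_page are made up for the example, and so is the fetch_page callback, which stands in for the HTTP step the real script does with WWW::Mechanize. As a simplification, a fixed max_pages cap replaces the step that parses the total number of results, and real Google result markup may well need more careful handling than a plain scan of link targets.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def page_links_to(html, site):
    """Return True if any link on the page points at host `site` or a subdomain."""
    parser = LinkCollector()
    parser.feed(html)
    for href in parser.hrefs:
        host = urlparse(href).netloc.lower()
        if host == site or host.endswith("." + site):
            return True
    return False


def find_result_page(fetch_page, query, site, max_pages=10):
    """Return the first result page (1-based) linking to `site`, or None.

    fetch_page(query, page_number) must return that page's HTML as a string.
    It is a hypothetical callback supplied by the caller, so the sketch
    itself never touches the network.
    """
    for page in range(1, max_pages + 1):
        if page_links_to(fetch_page(query, page), site):
            return page
    return None


# Usage with canned pages instead of a live search (all URLs are placeholders).
pages = {
    1: '<a href="http://other.example/">x</a>',
    2: '<a href="http://www.mysite.example/post">y</a>',
}


def fake_fetch(query, page):
    return pages.get(page, "")


print(find_result_page(fake_fetch, "some query", "mysite.example"))  # prints 2
```

Passing the fetch step in as a function keeps the parsing and looping logic separate from the HTTP details, which are the part most likely to break when the search page changes.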