Is there a way to programmatically access Google’s search engine results? [closed]

After finding this question I did some research, as the other answers seem out of date. The Google Search API would be the obvious choice, as other users have mentioned; however, it has now been deprecated in favour of the Custom Search API. Although not obvious at first, the Custom Search API does allow you to … Read more

Why does the Google homepage use deprecated HTML (i.e. is not valid HTML5)?

I attended a panel at SXSW a few years ago called “F*ck Standards” which was all about breaking from standards when it makes sense. There was a Google engineer on the panel who talked about the Google home page failing validation, using deprecated tags, etc. He said it was all about performance. He specifically mentioned … Read more

Looking for special characters in Google [closed]

Update: this answer is no longer applicable as of 2017. See https://blog.google/products/search/improvements-searching-special-characters-programming-languages/ Google strips most punctuation from queries, as described here, so it won’t help you with the bash syntax. It’s very easy to search for the string “##” in the bash documentation: Just run “info bash”, hit “s”, and enter “##” as the search … Read more

Designing a web crawler

If you want a detailed answer, take a look at section 3.8 of this paper, which describes the URL-seen test of a modern scraper: In the course of extracting links, any Web crawler will encounter multiple links to the same document. To avoid downloading and processing a document multiple times, a URL-seen test must … Read more
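
To make the idea of a URL-seen test concrete, here is a minimal in-memory sketch in Python. It is not the design from the paper (the excerpt above is cut off before the details); it only assumes the general idea that each extracted link is reduced to a canonical form and checked against the set of URLs already encountered. The names `normalize`, `UrlSeenTest` and `check_and_add` are hypothetical, and the 8-byte SHA-1 fingerprint is an illustrative choice to keep the set small.

```python
import hashlib
from urllib.parse import urldefrag, urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Canonicalize a URL so trivially different spellings compare equal.

    Assumption: URLs carry no userinfo (user:password@), so lowercasing
    the whole network location is safe.
    """
    url, _fragment = urldefrag(url)      # "#section" never changes the document
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()        # host names are case-insensitive
    path = parts.path or "/"             # treat "" and "/" as the same path
    return urlunsplit((scheme, netloc, path, parts.query, ""))


class UrlSeenTest:
    """Hypothetical in-memory URL-seen test backed by a set of fingerprints."""

    def __init__(self) -> None:
        self._seen: set[bytes] = set()

    def check_and_add(self, url: str) -> bool:
        """Return True if the URL was seen before; otherwise record it."""
        digest = hashlib.sha1(normalize(url).encode("utf-8")).digest()
        fingerprint = digest[:8]         # short fingerprint instead of full URL
        if fingerprint in self._seen:
            return True
        self._seen.add(fingerprint)
        return False


if __name__ == "__main__":
    seen = UrlSeenTest()
    for link in ["http://Example.com/a#top", "http://example.com/a"]:
        if not seen.check_and_add(link):
            print("would download:", link)
```

A crawler would call `check_and_add` on every extracted link and enqueue only those for which it returns `False`. A set held in memory stops being practical at web scale, where the fingerprints would need to live on disk or in a probabilistic structure, but the check itself stays the same.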