Searching the Web--and being found

Search sites are perhaps *the* most common starting point for Web exploration. Search sites include "robot powered" search engines and "human powered" subject guides to the 'net.

Readings

Castro, Chapter 24.

Google bombing (Wikipedia) -- George W. Bush's site was the top result in 2006 for Google searches for "miserable failure". Why, and how?

Robot powered search engines

Search engines approach the internet by indexing pages with a robot: a piece of software that automatically reads pages, files the words it finds into a database, and then follows links to more pages. Robots (also called crawlers) examine some portion of each web page and record every significant word on it, along with the address of the page.
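The read-file-follow loop described above can be sketched in a few lines of Python. This is only an illustration, not how any real search engine is built: the three-page "web" in PAGES, the STOP_WORDS list, and the names PageReader and crawl are all made up for the example, and the pages live in memory so the sketch runs without a network connection.

```python
from collections import defaultdict, deque
from html.parser import HTMLParser
import re

# Hypothetical three-page "web", kept in memory so the sketch runs offline.
PAGES = {
    "/index.html": '<p>Poems about grackles</p> <a href="/birds.html">more birds</a>',
    "/birds.html": '<p>Grackles and crows</p> <a href="/index.html">home</a> '
                   '<a href="/cats.html">cats</a>',
    "/cats.html": "<p>Cats chase birds</p>",
}

# Assumed list of "insignificant" words that the robot skips.
STOP_WORDS = {"a", "about", "and", "more", "the"}


class PageReader(HTMLParser):
    """Collects the significant words and the link targets of one page."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Remember where each <a href="..."> points.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Split the visible text into lowercase words, dropping stop words.
        for word in re.findall(r"[a-z]+", data.lower()):
            if word not in STOP_WORDS:
                self.words.append(word)


def crawl(start):
    """Read pages breadth-first from start; return {word: set of addresses}."""
    index = defaultdict(set)
    seen = {start}
    queue = deque([start])
    while queue:
        address = queue.popleft()
        reader = PageReader()
        reader.feed(PAGES[address])
        reader.close()
        for word in reader.words:
            index[word].add(address)      # file the word into the "database"
        for link in reader.links:
            if link in PAGES and link not in seen:
                seen.add(link)            # follow each link only once
                queue.append(link)
    return index


index = crawl("/index.html")
print(sorted(index["grackles"]))
```

The robot starts at one address and discovers the other two pages only because something links to them. A page with no incoming links would never be found this way.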

Big sites, like Google, must maintain a database nearly as large as all the pages they index combined, which is a large fraction of the web.

Search engines find out about new pages by three methods:

It typically takes several months for search engines to find a new page on the Web (without submission), let alone a new site. This can vary depending upon how high the profile of the site is (NY Times vs. your personal page).

Recent stats (see especially searchenginewatch.com's reports and Nielsen NetRatings results) find that Google is the most popular site (about 50% of all searches) followed by Yahoo, MSN and others.

In a 1999 research paper* Lawrence and Giles found:


Search engines differ from each other in many respects, including:

Human powered subject guides

Subject guides are compiled by humans, much as a librarian would compile a subject index for a traditional card catalog.

Subject guides typically cover far fewer pages on the web than search engines do, but cover them far better. They often link only to the home page of a site.

Subject guides are often good starting points if you don't know exactly what kind of resources you're looking for.

Some highlights

Searching tips

The main tip (thanks to Sally Jo Milne) is to choose one or two search engines, get familiar with them, and read their help files.

Boolean searching

You should know the difference between pages which mention:
cats or dogs (Hotbot lingo: "any words")

cats and dogs (Hotbot lingo: "all words")

cats and not dogs (on Google, you'd search for +cats -dogs)
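One way to see the three cases is as set operations on the lists of pages a search engine has filed under each word. The sketch below assumes a tiny made-up index (INDEX and the page names are hypothetical), and uses Python's built-in set operators for "or", "and", and "and not".

```python
# Hypothetical index: each word maps to the set of pages that mention it.
INDEX = {
    "cats": {"a.html", "b.html", "c.html"},
    "dogs": {"b.html", "c.html", "d.html"},
}


def pages(word):
    """Return the set of pages mentioning word (empty if the word is unknown)."""
    return INDEX.get(word, set())


any_words = pages("cats") | pages("dogs")   # cats OR dogs: union
all_words = pages("cats") & pages("dogs")   # cats AND dogs: intersection
cats_only = pages("cats") - pages("dogs")   # cats AND NOT dogs: difference

print(sorted(any_words))   # ['a.html', 'b.html', 'c.html', 'd.html']
print(sorted(all_words))   # ['b.html', 'c.html']
print(sorted(cats_only))   # ['a.html']
```

Note that "or" gives the most results and "and" the fewest, the opposite of what the everyday meaning of the words might suggest.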

Other search refinements:

Solve crimes with Google!

SEO - Search Engine Optimization: Writing for top billing


If someone types "grackles" into a search engine, how can you ensure that your page will come up on the first page of search results? Here are some of the things that search engines consider in deciding on their page rankings:

Google also includes in its index words in the alt attribute of images. See this chart comparing what tags search engines pay attention to.
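As a rough illustration of how a robot could pick up alt text, here is a minimal sketch using Python's standard html.parser module. The class name AltTextReader and the sample page are invented for the example; it says nothing about how Google itself does this.

```python
from html.parser import HTMLParser


class AltTextReader(HTMLParser):
    """Collects the words found in the alt attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.alt_words = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lowercase.
        if tag == "img":
            alt = dict(attrs).get("alt") or ""
            self.alt_words.extend(alt.lower().split())


# Hypothetical page: one image with alt text, one without.
sample = (
    '<IMG SRC="grackle.jpg" ALT="Common grackle on a fence">'
    '<img src="spacer.gif">'
)

reader = AltTextReader()
reader.feed(sample)
reader.close()
print(reader.alt_words)   # ['common', 'grackle', 'on', 'a', 'fence']
```

The image without an alt attribute contributes nothing, which is one practical reason to write descriptive alt text.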

<meta> tags

<HTML>
<HEAD>
<TITLE>Poems about grackles</TITLE>
<META NAME="Author" CONTENT="Paul Meyer Reimer">
<META NAME="Description" CONTENT="This page is an homage to grackles">
<META NAME="Keywords" CONTENT="grackle, birds, poetry, writing">
</HEAD>

Meta tags contain information about the page, but are not displayed on the page. Search engines now mostly ignore the meta keywords tag, because it was too heavily spammed. So currently meta tags are mainly useful for your own site's crawlers, except for...

Controlling the (nice) robots


Some useful and fairly self-explanatory tags are shown below. There is no guarantee that spam-bots pay any attention, of course.
<META NAME="ROBOTS" CONTENT="NOINDEX">
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
For more related information, search the web for robots.txt.
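To make the robot's side of this concrete, here is a sketch of how a well-behaved crawler might check both mechanisms, using only Python's standard library. The robot name "ExampleBot", the example.com addresses, the sample robots.txt rules, and the class RobotsMetaReader are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaReader(HTMLParser):
    """Collects the directives from <META NAME="ROBOTS" CONTENT="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Tag and attribute names arrive lowercased; values keep their case.
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


# 1. Per-page control: the ROBOTS meta tag.
page = '<HEAD><META NAME="ROBOTS" CONTENT="NOINDEX, NOARCHIVE"></HEAD>'
meta = RobotsMetaReader()
meta.feed(page)
meta.close()
may_index = "noindex" not in meta.directives      # False for this page
may_archive = "noarchive" not in meta.directives  # False for this page

# 2. Site-wide control: a hypothetical robots.txt file.
robots_txt = """\
User-agent: *
Disallow: /private/
"""
rules = RobotFileParser()
rules.parse(robots_txt.splitlines())
blocked = rules.can_fetch("ExampleBot", "http://www.example.com/private/diary.html")
allowed = rules.can_fetch("ExampleBot", "http://www.example.com/poems.html")

print(may_index, may_archive, blocked, allowed)
```

Both checks are voluntary: the page and the robots.txt file only state a request, and it is up to the robot's author to write code like this and honor the answer.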