How to use WebWatch
Internet in Belgium
Webwatch is a unique search engine that allows you to find Belgian Internet sites in two ways: by a subject index or a keyword search.
Index or crawler?
The Index groups approximately 7800 sites (see current status) according to subject categories and includes short descriptions of the sites. It is best used to find a variety of sites that treat the same subject, or to browse through different subjects to find new and interesting sites. New sites are added by their creators, who wish to alert the public to their site's existence, or by the Webwatch team. The Index is compiled by our staff at DAD, who check the URLs, the descriptions, and the subject headings for accuracy. The Index also contains sections devoted to 'New' sites, 'Cool' sites and 'Must' sites. New sites are sites added in the past ten days. Cool sites are sites of superior content or quality picked by the Webwatch team. Must sites are generally business-related sites of use to the Internet community.
For example, if you are interested in finding information about automobiles, use the index by clicking on the subject 'Business and Economy / Cars'. All of the sites that have been submitted to Webwatch that deal with automobiles will be listed with short descriptions.
Optimal use of the crawler
The Crawler is an automated robot that continuously searches the Internet for new Belgian sites, or sites of Belgian interest. It methodically indexes the contents of the sites, finding keywords that can be used for searches, and discovering new links to search. The crawler is best used to find combinations of words that occur in the same page, allowing very detailed searches for specific information. There are many more sites contained in the Crawler's database than in the Index.
The keyword search automatically looks for all of the words in the search field. If you type 'sports cars' into the search box, the crawler will look for all of the pages it has indexed that contain BOTH words. The crawler does not distinguish between capital and lowercase letters, or between accented and unaccented letters. E, é, è, ê, and e are all considered the same letter: e.
You can change the search options by clicking on the 'options' link. Here, you may:
- specify whether you would like to search for ALL of the words in the search field or ANY of the words in the search field.
- specify whether the crawler will return 10 hits or 25 hits.
- specify detailed results or summary results. Summary results display the titles of the sites found. Detailed results display the title, file size, first three lines of text, and URL, allowing a selection to be made from among the sites found without having to visit each address.
- specify that you are searching for the beginning of a word. If you type 'auto' and select this option, the crawler will find not only 'automobile' but also 'autodidactic' and 'automaton'.
- specify sites entered within the past day, week, month, or last update. This option can make repeat searches more effective.
- restrict the search to the URL, Title, or Headers only.
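The ALL/ANY, hit-count, and beginning-of-word options above can be illustrated with a small sketch. This is not Webwatch's actual code: the names `SearchOptions`, `matches`, and `search`, and the idea of representing pages as a dictionary of URL to text, are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class SearchOptions:
    """Hypothetical container for the options described above."""
    match_all: bool = True   # ALL words (the default) versus ANY word
    max_hits: int = 10       # 10 or 25 hits per results page
    prefix: bool = False     # treat each query word as the beginning of a word


def matches(page_words, query_words, opts):
    """Return True if a page satisfies the query under the given options."""
    def hit(q):
        if opts.prefix:
            return any(w.startswith(q) for w in page_words)
        return q in page_words

    results = [hit(q) for q in query_words]
    return all(results) if opts.match_all else any(results)


def search(pages, query, opts=None):
    """Return the URLs of matching pages, capped at opts.max_hits."""
    opts = opts or SearchOptions()
    query_words = query.lower().split()
    found = [url for url, text in pages.items()
             if matches(set(text.lower().split()), query_words, opts)]
    return found[:opts.max_hits]


pages = {
    "a": "fast sports cars",
    "b": "sports news",
    "c": "automobile club",
}
print(search(pages, "sports cars"))                                  # ['a']
print(search(pages, "sports cars", SearchOptions(match_all=False)))  # ['a', 'b']
print(search(pages, "auto", SearchOptions(prefix=True)))             # ['c']
```

With the default options only page "a" contains both words; switching to ANY also returns "b", and the beginning-of-word option lets 'auto' match 'automobile'.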
Relevance Ranking
When Webwatch parses a file during indexing, it notes the location and frequency of the keywords that it encounters. Words in particular HTML tags are weighted more heavily. An unweighted word, that is, a word that does not appear in any of the weighting tags, earns a value of 1 for each occurrence on the page. Words in the following tags get additional weighting:
- Words in the TITLE tag are weighted 10 times heavier than an unweighted word.
- Words included as META KEYWORDS are also weighted 10 times heavier than unweighted words.
- Words found in one of the header tags are weighted 5 times heavier than unweighted words.
- Words contained in a hyperlink are weighted 3 times heavier than unweighted words.
If the word "Gretzky" appears once in the title, twice in headers, and 6 times throughout the rest of the page it would receive a total weighting of 26 (1x10 + 2x5 + 6x1).
This weighting is used to determine relevance. If you search for "Gretzky", the above page would return a weighting of 26. If there are other pages with the word "Gretzky" they will be ranked according to the same scoring system. The higher the number, the higher the relevance ranking.
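The scoring described above can be reproduced in a few lines. The sketch below uses only the weights stated in the text; the location labels ("title", "header", and so on), the function names, and the page names are hypothetical, and this is not Webwatch's own implementation.

```python
# Weight per occurrence, by where the word appears on the page.
WEIGHTS = {
    "title": 10,
    "meta_keywords": 10,
    "header": 5,
    "link": 3,
    "body": 1,   # unweighted occurrences
}


def relevance(occurrences):
    """Sum weight * count over every location where the word occurs."""
    return sum(WEIGHTS[where] * count for where, count in occurrences.items())


def rank(pages):
    """Order page names from the highest relevance score to the lowest."""
    return sorted(pages, key=lambda name: relevance(pages[name]), reverse=True)


# The "Gretzky" example: once in the title, twice in headers, 6 times elsewhere.
gretzky_page = {"title": 1, "header": 2, "body": 6}
print(relevance(gretzky_page))  # 26

# A second, made-up page that mentions the word 4 times in links only.
pages = {"fan_page": gretzky_page, "link_list": {"link": 4}}
print(rank(pages))  # ['fan_page', 'link_list']
```

The second page scores 4 x 3 = 12, so it ranks below the page that scores 26.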
Search Tips
Numbers
Number indexing can be turned on or off in Webwatch by the robot administrator. Turning numbers off makes Webwatch indexes smaller. It may, therefore, not be possible to search for numbers appearing in a document.
Noise Words
Webwatch does not include common words such as "also", "been" and "there" in the index. These words are ignored when searching. If one or more is used in a search a message will be displayed on the results page indicating which words were noise words. The administrator can edit the list of noise words; see the Reference Guide for more information.
Punctuation
Punctuation is not indexed in Webwatch, so all punctuation is removed from a search string. This includes the "@" character, which affects e-mail addresses. The e-mail address "wayne@greatone.com" will be indexed in Webwatch as three separate words: "wayne", "greatone" and "com". A search for "wayne@greatone.com" will therefore become a search for the words "wayne", "greatone" and "com", and will find pages with The Great One's e-mail address on them. The same applies to hyphenated words such as "CD-ROM".
Word Size
Webwatch does not index words with 2 or fewer letters. Such words will be ignored in a search.
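Taken together, the number, noise-word, punctuation, and word-size rules above amount to a clean-up step applied to each search string. The following is a rough sketch under stated assumptions: `clean_query` is a hypothetical name, the noise-word set holds only the three examples given in the text (the real list is maintained by the administrator), and splitting at every punctuation character is a guess at how the removal behaves.

```python
import re

# Only the three examples named in the text; the real list is longer.
NOISE_WORDS = {"also", "been", "there"}
MIN_LENGTH = 3  # words with 2 or fewer letters are not indexed


def clean_query(query, index_numbers=True):
    """Split a query into searchable words and the noise words found in it."""
    # Punctuation (including "@", "." and "-") acts as a word separator.
    words = re.findall(r"[^\W_]+", query.lower())
    kept, noise = [], []
    for word in words:
        if word in NOISE_WORDS:
            noise.append(word)          # reported back to the user
        elif len(word) < MIN_LENGTH:
            continue                    # too short to be indexed
        elif word.isdigit() and not index_numbers:
            continue                    # numbers switched off by the administrator
        else:
            kept.append(word)
    return kept, noise


print(clean_query("wayne@greatone.com"))
# (['wayne', 'greatone', 'com'], [])
print(clean_query("There is also a CD-ROM"))
# (['rom'], ['there', 'also'])
```

In this sketch, "CD-ROM" splits into "cd" and "rom", and "cd" is then dropped by the word-size rule, so only "rom" remains searchable.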
Accented Characters
Webwatch supports special characters with ISO Latin encoding. However, it does not index characters with accent marks: accented characters are indexed without the accent. Webwatch also converts HTML entities into the base character for inclusion in the index.
It is important to note that for searching the following characters are all considered equal:
e, E, é (&eacute;), É (&Eacute;), ë (&euml;), etc.
So a search for "Café" will also find "Cafe", "CAFE", "Cafë", etc. Typing the HTML entity itself into the search field, as in "Caf&eacute;", will not work.
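One way to picture the folding of accented characters and HTML entities at indexing time is the sketch below. It relies on Python's standard `html` and `unicodedata` modules; the function name `index_form` is hypothetical, and Webwatch's own conversion may well differ in detail.

```python
import html
import unicodedata


def index_form(text):
    """Reduce page text to the accent-free, lowercase form assumed to be indexed."""
    # Turn HTML entities such as &eacute; into the characters they stand for.
    text = html.unescape(text)
    # Decompose each accented letter into base letter + combining mark,
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower()


for variant in ["Café", "Cafe", "CAFE", "Cafë", "Caf&eacute;"]:
    print(variant, "->", index_form(variant))
# Every variant is reduced to "cafe".
```

Because every spelling found on a page collapses to the same indexed word, a search for any accented or unaccented variant finds all of them.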
The Internet is an evolving, living network. Although we try to keep the system accurate and timely, URLs contained in the Crawler and Index may not work if the server is down or offline, if the site has been removed, or if the site has been moved to another location. If you have problems accessing a site found on Webwatch, please alert us by sending a note to lips@dad.be.