SEARCH ENGINE OPTIMIZATION | Part Two
Search Engine
Search Engine General Principle
To understand how a search engine performs, we first need to look at a few things, especially its architecture and its working mechanisms.
Spider
A spider is the program that downloads the pages it finds. It works much like a browser. The difference is that a browser presents the information (pictures, text, etc.) directly to the person who needs it at that moment, while a spider does not display anything, because its output is meant for machines rather than humans. The spider itself is run automatically by a machine. Its job is to fetch the visited pages so they can be stored in the search engine's database.
Crawler
A crawler is the search engine's program for tracing and finding the links on every page it encounters. Its task is to decide where the spider should go next and to evaluate links based on previously determined addresses. The crawler follows links and tries to find documents that the search engine does not yet know about.
Indexer
The indexer analyzes each page and examines its various elements, such as the text, headers, structure or writing-style features, special HTML tags, and so on.
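As a rough illustration of the kind of analysis an indexer performs, here is a minimal Python sketch built on the standard library's `html.parser`. The class name `PageElements`, the function `analyze_page`, and the sample page are made up for this example. A real indexer checks far more than this.

```python
from html.parser import HTMLParser

HEADING_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class PageElements(HTMLParser):
    """Collects headings, link targets, and words from one page (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.headings = []  # text found inside <h1>..<h6>
        self.links = []     # href values of <a> tags
        self.words = []     # lowercased words from all text nodes
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._in_heading = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_heading:
            self.headings.append(text)
        self.words.extend(text.lower().split())


def analyze_page(html):
    """Parse one HTML document and return the collected elements."""
    parser = PageElements()
    parser.feed(html)
    parser.close()
    return parser


# Hypothetical sample page, used only to exercise the sketch.
SAMPLE = (
    "<html><body><h1>Search Engines</h1>"
    "<p>A spider downloads pages.</p>"
    "<a href='/crawler.html'>Crawler</a></body></html>"
)
page = analyze_page(SAMPLE)
```

After this runs, `page.headings`, `page.links`, and `page.words` hold the pieces that could later be stored in the database.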
Database
The database is where the data from pages that have been visited, downloaded, and analyzed is stored. It is sometimes referred to as the search engine's index.
Result Engine
The result engine classifies and ranks the search results. It determines which pages best meet the criteria of the user's request and how the results will be presented.
This process is based on the search engine's ranking algorithm. The ranking methods are proprietary, and the search engines are within their rights to keep them that way, but researchers have been studying how these methods behave, mainly in order to improve results on those search engines.
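Real ranking algorithms are not public, so the following Python sketch is only a stand-in that shows the general idea of scoring pages against a request and ordering them. It ranks by a plain count of matching words, which is an assumption made for illustration and not how any particular search engine works. The function names and sample pages are hypothetical.

```python
def score(query, page_text):
    """Count how often the query words occur in the page text."""
    words = page_text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def rank(query, pages):
    """Return the URLs of matching pages, highest score first."""
    scored = [(score(query, text), url) for url, text in pages.items()]
    matching = [(s, url) for s, url in scored if s > 0]
    # Sort by descending score; ties are broken alphabetically by URL.
    matching.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in matching]


# Hypothetical stored pages: URL -> page text.
PAGES = {
    "a.example/spider": "spider programs download pages and a spider stores pages",
    "a.example/crawler": "a crawler follows links to tell the spider where to go",
    "a.example/server": "the web server returns the result page",
}

results = rank("spider", PAGES)
```

Here `results` lists the spider page first, because the word appears twice there, then the crawler page. The server page never mentions the word and is left out.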
Web Server
The web server is the computer that receives requests and sends back replies. These web servers usually deliver information or documents in HTML form. The page they serve provides a field where the user can enter search keywords. The web server is also responsible for sending the requested search results back to the computer that asked for them.
How the Search Engine Works
Search engines as we know them do not actually search the entire World Wide Web directly. The search is run against databases that store the text of pages out there on the web. The text of each page is kept on their database servers.
When you search the web with a search engine, it actually searches the copies of pages kept in its database, each one reflecting the page as it was the last time it was visited. When you click a link on the search results page, the search engine's server only gives you the address, and what you retrieve is the current version of the page rather than the copy in its database.
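To make the idea of searching stored copies concrete, here is a minimal Python sketch of a keyword lookup over a small in-memory "database". The structure used, a word-to-URLs mapping often called an inverted index, is a common technique but an assumption here, since the text does not say how the database is organized. The URLs and texts are invented.

```python
from collections import defaultdict


def build_index(stored_pages):
    """Map each word to the set of URLs whose stored copy contains it."""
    index = defaultdict(set)
    for url, text in stored_pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs whose stored copy contains every query word."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical stored copies, as they looked at the last visit.
STORED = {
    "http://example.com/a": "search engine basics",
    "http://example.com/b": "search tips for beginners",
}

index = build_index(STORED)
hits = search(index, "search engine")
```

Note that `search` never touches the live web. It only consults the stored copies, so a page that changed after the last visit is still matched against its old text.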
The search engine's database is selected and built by robot programs called spiders. Although they examine the pages to be fetched and stored in the database, in reality they do all of this from one place.
To find other potential pages, they follow the links on pages already stored in their database. These machines cannot type in a URL or decide on their own which pages to try visiting. Computers keep getting more sophisticated, but they still cannot think creatively the way humans do.
If a web page has never been linked from any other page, the search engine's spider will not find it, because it only works from what is in its database.
For sites that are brand new, with no other site linking to them, the spider will certainly not know about them. The way to get such a site listed in a search engine is to notify the search engine directly that a new site exists, which is done by a human. Most search engines provide a facility for this.
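The discovery behavior described above can be sketched in Python as a breadth-first walk over links. To keep the example self-contained, the "web" is a small hand-made dictionary of page names and their outgoing links. The names, the `crawl` function, and the idea of treating a manual submission as an extra starting point are all assumptions made for illustration.

```python
from collections import deque


def crawl(link_graph, seeds):
    """Visit every page reachable from the seeds by following links."""
    seen = set(seeds)
    queue = deque(seeds)
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for target in link_graph.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


# Hypothetical link graph: page -> pages it links to.
# "orphan" links out to "home", but no page links to "orphan".
LINKS = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post-1"],
    "post-1": [],
    "orphan": ["home"],
}

found = crawl(LINKS, ["home"])
found_after_submission = crawl(LINKS, ["home", "orphan"])
```

Starting from `home` alone, the walk never reaches `orphan`, because nothing links to it. Once a person submits it as an extra starting point, it is visited like any other page.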
After a spider finds a page, it passes the page on to another computer program to be indexed. This program identifies the text, links, and other content on the page and stores them in the database files, so that the database can be searched by keyword, or by whatever more advanced approach the engine offers, until you find a site that matches what you want.
Some types of pages and links are excluded from most search engines for various reasons. Others are left out because the spider cannot access them. Pages that are not included in the index are referred to as the “invisible web”, since you cannot see them directly in search results. There are a large number of such pages, more than the number of pages that can be shown.
Hidden Page
Pages that can be brought up in search engine results are referred to as the “visible web”, while pages that cannot be brought up or seen, whether because of certain rules, at the site owner's request, or for some other reason, are referred to as the “invisible web”. The invisible web first became a hot topic around 2000, when many pages were not being returned in search results, and special technical procedures were devised to handle the problem.
Today, several page types that used to be invisible are handled in a way that makes them available in search results. These types are:
- Pages in non-HTML formats, such as PDF, Word, Excel, Corel suite files, etc. These documents are converted to HTML, so most popular search engines can now present the converted result.
- Script-based pages, that is, pages whose links contain special symbols from a programming language. This has been dealt with, and almost all search engines now have this ability.
- Dynamically generated pages produced by database-driven programs (such as Active Server Pages, Cold Fusion, PHP, etc.), which can be indexed when they have a stable, fixed URL that the search engine can find.
Why Are Pages Hidden?
There are still some limitations that keep search engines from reaching invisible files, so they cannot return them in search results.
- Pages the search engine has no solution for yet. If getting from one page to another requires typing something in, that is a barrier for the search engine, because the machine cannot do it. Spiders cannot search a catalog, and they cannot fill in a login or password, etc.
- Pages that have asked not to be indexed by search engines. Here the search engine complies with rules that have been specified. There is no technical reason behind this: any site owner or search engine owner can choose not to have pages indexed if they wish.
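The text does not name the rules a site owner uses to opt out. A widely used convention is a `robots.txt` file, and the Python sketch below assumes that is what is meant. It uses the standard library's `urllib.robotparser` to check whether a spider may fetch a page. The spider name `ExampleSpider`, the rules, and the URLs are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content published by a site owner.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

allowed = parser.can_fetch("ExampleSpider", "http://www.example.com/index.html")
blocked = parser.can_fetch("ExampleSpider", "http://www.example.com/private/notes.html")
```

With these rules, `allowed` is `True` and `blocked` is `False`. A well-behaved spider skips the second page even though nothing technical stops it from downloading it.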



