Search Engine:
A search engine is a software system designed to carry out web searches. It searches the World Wide Web in a systematic way for particular information specified in a textual web search query. The results are generally presented as a list, often referred to as search engine results pages (SERPs). The information may be a mix of links to web pages, images, videos, infographics, articles, research papers, and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler. Internet content that a web search engine cannot search is generally described as the deep web.
A search engine maintains the following processes in near real time:
1. Web crawling
2. Indexing
3. Searching
Web search engines get their information by crawling the web from site to site. The "spider" checks for the standard filename robots.txt, addressed to it. The robots.txt file contains directives for search spiders, telling them which pages to crawl and which pages to skip. After checking for robots.txt, whether or not it finds one, the spider sends certain information back to be indexed, depending on many factors such as the titles, page content, JavaScript, Cascading Style Sheets (CSS), headings, or metadata in HTML meta tags. After a certain number of pages crawled, amount of data indexed, or time spent on the website, the spider stops crawling and moves on. "[N]o web crawler may actually crawl the entire reachable web. Due to infinite websites, spider traps, spam, and other exigencies of the real web, crawlers instead apply a crawl policy to determine when the crawling of a site should be deemed sufficient. Some websites are crawled exhaustively, while others are crawled only partially".
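The crawl steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the site is a hypothetical in-memory link graph rather than a real website, the agent name ExampleSpider and the max_pages budget are made up, and the crawl policy is reduced to a simple page count. The robots.txt handling uses the standard library's urllib.robotparser.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory site: path -> links found on that page.
SITE = {
    "/": ["/about", "/blog", "/private/admin"],
    "/about": ["/"],
    "/blog": ["/blog/1", "/blog/2"],
    "/blog/1": ["/blog"],
    "/blog/2": ["/blog"],
    "/private/admin": ["/"],
}

# Hypothetical robots.txt for that site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

BASE_URL = "https://example.com"  # placeholder domain


def crawl(site, robots_txt, agent="ExampleSpider", max_pages=4):
    """Breadth-first crawl that honors robots.txt and stops at a page budget."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    seen = {"/"}
    queue = deque(["/"])
    crawled = []
    # The page budget stands in for a real crawl policy.
    while queue and len(crawled) < max_pages:
        path = queue.popleft()
        if not rules.can_fetch(agent, BASE_URL + path):
            continue  # disallowed by robots.txt, so skip it
        crawled.append(path)
        for link in site.get(path, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled


print(crawl(SITE, ROBOTS_TXT))
```

With the budget of four pages, the spider stops before reaching /blog/2, and it never visits anything under /private/ no matter how large the budget is.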
Indexing means associating words and other definable tokens found on web pages with their domain names and HTML-based fields. The associations are made in a public database, made available for web search queries. A query from a user can be a single word, multiple words, or a sentence. The index helps find information relating to the query as quickly as possible.
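As a rough illustration of that association, the sketch below builds a tiny index that maps each token to the domain names and fields where it appears. The pages, the field names (title, body), and the tokenize helper are all assumptions made for the example, and a real index would live in a database rather than a dictionary.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical crawled pages: URL -> HTML-based fields.
PAGES = {
    "https://example.com/a": {
        "title": "Web crawling basics",
        "body": "A spider crawls the web.",
    },
    "https://example.org/b": {
        "title": "Indexing",
        "body": "An index maps words to pages.",
    },
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each token to the set of (domain, field) pairs where it occurs."""
    index = defaultdict(set)
    for url, fields in pages.items():
        domain = urlparse(url).netloc
        for field, text in fields.items():
            for token in tokenize(text):
                index[token].add((domain, field))
    return index


index = build_index(PAGES)
print(sorted(index["web"]))
```

Looking up a word is then a single dictionary access, which is what lets the index answer a query quickly.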
Between visits by the spider, the cached version of a page (some or all of the content needed to render it), stored in the search engine's working memory, is quickly sent to an inquirer. The cached page holds the appearance of the version whose words were previously indexed, so a cached version of a page can be useful to the website when the actual page has been lost, although this problem is also considered a mild form of linkrot.
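A page cache of this kind can be pictured as a store of snapshots keyed by URL. The sketch below is a simplified assumption, not how any particular engine does it: the PageCache class and its method names are hypothetical, and the snapshot is just a string with the time of the spider's visit.

```python
import time


class PageCache:
    """Keeps the last snapshot the spider saw for each URL."""

    def __init__(self):
        self._snapshots = {}

    def store(self, url, content, crawled_at=None):
        """Record the page as it looked when the spider visited it."""
        if crawled_at is None:
            crawled_at = time.time()
        self._snapshots[url] = (content, crawled_at)

    def get(self, url):
        """Return (content, crawled_at), or None if the URL was never crawled."""
        return self._snapshots.get(url)


cache = PageCache()
cache.store("https://example.com/a", "<h1>Hello</h1>", crawled_at=1000.0)

# Even if the live page later disappears, the snapshot can still be served.
content, crawled_at = cache.get("https://example.com/a")
print(content, crawled_at)
```

The snapshot only changes when the spider comes back, so between visits it can drift out of date with the live page.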
Typically, a user's query consists of a few keywords. The index already has the names of the sites containing the keywords, and these are instantly obtained from the index. The real processing load is in generating the web pages that make up the search results list: every page in the entire list must be weighted according to information in the indexes.
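The two stages described here, a fast lookup followed by weighting, might look roughly like the sketch below. The documents and site names are invented, and the weighting (a plain sum of how often each keyword occurs) is only a stand-in assumption; real engines use far more elaborate signals.

```python
from collections import Counter, defaultdict

# Hypothetical documents: site name -> page text.
DOCS = {
    "site-a": "search engines crawl the web and index the web",
    "site-b": "a web directory is edited by humans",
    "site-c": "engines rank search results",
}


def build_index(docs):
    """Map each term to {site: number of occurrences on that site}."""
    index = defaultdict(dict)
    for name, text in docs.items():
        for term, count in Counter(text.split()).items():
            index[term][name] = count
    return index


def search(index, query):
    """Return sites containing every keyword, best-weighted first."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    # Stage 1: the index yields the candidate sites directly.
    candidates = set(postings[0]).intersection(*postings[1:])
    # Stage 2: every candidate is weighted (here: summed term counts).
    scored = [(sum(p[name] for p in postings), name) for name in candidates]
    # Highest score first; ties are broken alphabetically.
    return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


index = build_index(DOCS)
print(search(index, "web"))
```

In this toy data, a query for "web" ranks site-a ahead of site-b because the word appears twice on site-a and once on site-b.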
The usefulness of a search engine depends on the relevance of the result set it gives back. While there may be millions of web pages that include a particular word or phrase, some pages may be more relevant, popular, or authoritative than others. Most search engines employ methods to rank the results to provide the "best" results first. How a search engine decides which pages are the best matches, and what order the results should be shown in, varies widely from one engine to another. The methods also change over time as Internet usage changes and new techniques evolve. Two main types of search engine have evolved: one is a system of predefined and hierarchically ordered keywords that humans have programmed extensively. The other is a system that generates an "inverted index" by analyzing the texts it locates. This second form relies much more heavily on the computer itself to do the bulk of the work.
Most web search engines are commercial ventures supported by advertising revenue, and thus some of them allow advertisers to have their listings ranked higher in search results for a fee. Search engines that do not accept money for their search results make money by running search-related ads alongside the regular search engine results. The search engines make money every time someone clicks on one of these ads.
Local search:
Local search is the process that optimizes the efforts of local businesses. These businesses keep track of changes to make sure their information is consistent across all searches. It is important because many people decide where to go and what to buy based on their searches.