Ask Me Help Desk

  • Aug 7, 2009, 05:47 AM
    lavanyaa
    What language is used in developing Google
    In what language has Google been written?

    Can anyone let me know as soon as possible?
    Google Chrome is open source; what about Google?
  • Aug 11, 2009, 03:06 PM
    Alacadabra
    According to the original 1998 paper by Sergey Brin and Lawrence Page, most of Google is implemented in C or C++ for efficiency and can run on either Solaris or Linux.

    In Google, the web crawling (downloading of web pages) is done by several distributed crawlers. There is a URLserver that sends lists of URLs to be fetched to the crawlers. The fetched web pages are then sent to the storeserver, which compresses them and stores them in a repository. Every web page has an associated ID number called a docID, which is assigned whenever a new URL is parsed out of a web page.

    The indexing function is performed by the indexer and the sorter. The indexer performs a number of functions. It reads the repository, uncompresses the documents, and parses them. Each document is converted into a set of word occurrences called hits. The hits record the word, its position in the document, an approximation of its font size, and its capitalization. The indexer distributes these hits into a set of "barrels", creating a partially sorted forward index.

    The indexer performs another important function: it parses out all the links in every web page and stores important information about them in an anchors file. This file contains enough information to determine where each link points from and to, and the text of the link.
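
    To make the storing and indexing steps above more concrete, here is a minimal Python sketch, assuming a toy in-memory repository keyed by docID. The names `store`, `index_document`, `Hit`, `lexicon` and `WORDS_PER_BARREL` are hypothetical, not Google's code. Compression is modeled with `zlib`, each barrel is assumed to hold the hits for one range of wordIDs, and the font size is a fixed placeholder bucket rather than something derived from the markup.

```python
import re
import zlib
from collections import defaultdict
from dataclasses import dataclass

WORDS_PER_BARREL = 2  # hypothetical: each barrel covers a small range of wordIDs


@dataclass(frozen=True)
class Hit:
    word_id: int
    position: int      # word offset within the document
    font_size: int     # approximate font size bucket (assumed encoding)
    capitalized: bool


lexicon: dict[str, int] = {}  # word -> wordID (hypothetical lexicon)

TAG_RE = re.compile(r"<[^>]+>")
LINK_RE = re.compile(r'<a\s+href="([^"]*)"\s*>(.*?)</a>', re.IGNORECASE | re.DOTALL)


def get_word_id(word: str) -> int:
    """Assign a wordID the first time a (lowercased) word is seen."""
    return lexicon.setdefault(word.lower(), len(lexicon))


def store(repository: dict[int, bytes], doc_id: int, html: str) -> None:
    """Storeserver step: compress a fetched page and keep it under its docID."""
    repository[doc_id] = zlib.compress(html.encode("utf-8"))


def index_document(repository: dict[int, bytes], doc_id: int,
                   barrels: defaultdict, anchors: list, font_size: int = 0) -> None:
    """Indexer step: uncompress, parse into hits, fill barrels, collect anchors."""
    html = zlib.decompress(repository[doc_id]).decode("utf-8")

    # Anchors file entry: (source docID, target URL as written, link text).
    for href, text in LINK_RE.findall(html):
        anchors.append((doc_id, href, TAG_RE.sub("", text).strip()))

    # Strip tags and turn each word occurrence into a hit.
    words = re.findall(r"\w+", TAG_RE.sub(" ", html))
    for position, word in enumerate(words):
        word_id = get_word_id(word)
        hit = Hit(word_id, position, font_size, word[0].isupper())
        # Forward index: each barrel holds (docID, hit) pairs for one wordID range.
        barrels[word_id // WORDS_PER_BARREL].append((doc_id, hit))
```

    In this sketch, restricting each barrel to a wordID range means a later sorting pass (the sorter's job) could reorder one barrel at a time by wordID without touching the others.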

    The URLresolver reads the anchors file and converts relative URLs into absolute URLs and in turn into docIDs. It puts the anchor text into the forward index, associated with the docID that the anchor points to. It also generates a database of links, which are pairs of docIDs. The links database is used to compute PageRanks for all the documents.
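
    The URLresolver and PageRank steps can be sketched the same way. This is a minimal Python sketch assuming anchors arrive as `(from_docID, href, text)` tuples; `get_doc_id`, `resolve_anchors` and `pagerank` are hypothetical names. The `pagerank` function uses the common normalized power-iteration form, PR(p) = (1 - d)/N + d * (sum of PR(q)/C(q) over pages q linking to p), where C(q) is the number of links out of q, with d = 0.85. Spreading the rank of pages that have no outgoing links evenly over all pages is an assumption of this sketch, not something the post describes.

```python
from collections import defaultdict
from urllib.parse import urljoin

doc_ids: dict[str, int] = {}  # absolute URL -> docID (hypothetical store)


def get_doc_id(url: str) -> int:
    """Assign a new docID the first time an absolute URL is seen."""
    return doc_ids.setdefault(url, len(doc_ids))


def resolve_anchors(anchors, doc_urls):
    """Turn (from_docID, href, text) anchors into docID links.

    doc_urls maps each source docID to its absolute URL. Returns the links
    database as (from_docID, to_docID) pairs, plus the anchor texts grouped
    by the docID each anchor points to (for the forward index).
    """
    links = []
    anchor_text = defaultdict(list)
    for from_id, href, text in anchors:
        absolute = urljoin(doc_urls[from_id], href)  # relative -> absolute URL
        to_id = get_doc_id(absolute)                 # absolute URL -> docID
        links.append((from_id, to_id))
        anchor_text[to_id].append(text)
    return links, anchor_text


def pagerank(links, num_docs, d=0.85, iterations=50):
    """Power-iteration PageRank over (from, to) docID pairs."""
    out_links = defaultdict(list)
    for src, dst in links:
        out_links[src].append(dst)

    rank = [1.0 / num_docs] * num_docs
    for _ in range(iterations):
        new_rank = [(1.0 - d) / num_docs] * num_docs
        for page in range(num_docs):
            targets = out_links.get(page)
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = d * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                share = d * rank[page] / num_docs
                for target in range(num_docs):
                    new_rank[target] += share
        rank = new_rank
    return rank
```

    For example, a three-page cycle, `pagerank([(0, 1), (1, 2), (2, 0)], 3)`, gives each page a rank of about 1/3, and in general the ranks sum to 1 up to rounding.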

    This information comes from the following paper:
    The Anatomy of a Search Engine
  • Nov 2, 2009, 03:49 AM
    lavanyaa
    Fine, thanks for the info.

  • All times are GMT -7.