Expert Search Engine Optimization

Search Engine Patents

Hundreds of patents have been applied for by, or awarded to, search engines and researchers in the search and information retrieval fields.

I do not believe, as some seem to, that the fact that a patent has been applied for or granted is in itself evidence that the search engine in question has applied any portion of it to a live search engine (though there are obvious cases where it has). I prefer to think that patents are taken out for many reasons.

Google Patents

Systems and methods for improving search quality (Patent Application) # 20050149499 (2005)

Systems and methods are disclosed for improving search quality. Search queries are expanded using a variety of linguistic techniques. For example, the words in a query can be supplemented with related words obtained from a database of compound words, inflectional forms, and/or orthographic variations. The expanded queries can be used to perform searches for responsive documents. A document index can be expanded using similar techniques.
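The query expansion idea in this abstract can be illustrated with a small sketch. It is only an illustration, not the method in the patent application: the `RELATED_WORDS` table and the `expand_query` function are hypothetical names, and the handful of entries stands in for the much larger database of compound words, inflectional forms and orthographic variations that the abstract mentions.

```python
# Hypothetical lookup table (an assumption for this sketch); a real system
# would draw on far larger linguistic resources.
RELATED_WORDS = {
    "run": ["runs", "running", "ran"],   # inflectional forms
    "color": ["colour"],                 # orthographic variation
    "database": ["data base"],           # compound word variant
}


def expand_query(query):
    """Return the query terms, each followed by any related words in the table."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + RELATED_WORDS.get(term, []):
            if candidate not in expanded:  # keep order, skip repeats
                expanded.append(candidate)
    return expanded


print(expand_query("Color database"))
```

The same lookup could in principle be applied to the words of a document at indexing time, which is the "expanded index" variant the abstract mentions.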

Systems and methods for direct navigation to specific portion of target document (Patent Application) #20050149576 (2005)

Systems and methods for direct navigation to and/or highlighting a specific portion of a target document such as query-relevant portion of the document are disclosed. The method may include generating a search result link to a search result document and generating an instruction to a client document browser to navigate directly to an intra-document portion related to the query within the search result document. The search result may include a snippet extracted from the search result document such that the instruction causes navigation directly to at least a portion of the snippet. The instruction may be an artificial anchor undefined in the search result document, e.g., designated by a preassigned artificial anchor designator. The client browser may have an artificial anchor module installed to execute the instruction to navigate directly to and optionally highlight the intra-document portion within the target document in response to the document link being selected.

Information retrieval based on historical data (Patent Application) # 20050071741 (2005)

A system identifies a document and obtains one or more types of history data associated with the document. The system may generate a score for the document based, at least in part, on the one or more types of history data.

Method for searching media (Patent Application) # 20040122811 (2004)

The present invention is directed to a computer-implemented method and apparatus for searching in response to Internet-based search queries using a search engine and an electronic database. According to one example embodiment of the present invention, data sets representing published items are input, for example, scanned-in or sent electronically, and stored in a searchable database. Each data set includes text from at least one published item. Responsive to the search query, a search engine searches for and identifies relevant web pages and data sets representing published items and, in a more specific embodiment, ranked characterizations are returned for the relevant web pages and published items. An electronic path can be provided with the published item for accessing further information about the published item. In one embodiment, the electronic path is a hyperlink from a characterization of a relevant published item to a more complete electronic representation of the relevant published item. Publishers provide authorization to display copyrighted materials through a permission protocol.

Methods and apparatus for employing usage statistics in document retrieval Patent Application 20020123988 (2002)

Methods and apparatus consistent with the invention provide improved organization of documents responsive to a search query. In one embodiment, a search query is received and a list of responsive documents is identified. The responsive documents are organized based in whole or in part on usage statistics.

Systems and methods for highlighting search results - USP 6,839,702 (2005)

A system highlights search terms in documents distributed over a network. The system generates a search query that includes a search term and, in response to the search query, receives a list of one or more references to documents in the network. The system receives selection of one of the references and retrieves a document that corresponds to the selected reference. The system then highlights the search term in the retrieved document.

Techniques for finding related hyperlinked documents using link-based analysis USP 6,754,873 (2004)

Techniques for finding related hyperlinked documents using link-based analysis are provided. Backlink and forwardlink sets can be utilized to find web pages that are related to a selected web page. The scores for links from web pages that are from the same host and links from web pages with numerous links can be reduced to achieve a better list of related web pages. The list of related web pages can be utilized as a feature to a word-based search engine or an addition to a web browser.

Ranking search results by reranking the results based on local inter-connectivity USP # 6,725,259 (2004)

A search engine for searching a corpus improves the relevancy of the results by refining a standard relevancy score based on the interconnectivity of the initially returned set of documents. The search engine obtains an initial set of relevant documents by matching a user's search terms to an index of a corpus. A re-ranking component in the search engine then refines the initially returned document rankings so that documents that are frequently cited in the initial set of relevant documents are preferred over documents that are less frequently cited within the initial set.
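A rough sketch of the re-ranking step follows. The abstract does not say how the citation count is combined with the original relevancy score, so the additive formula and the `weight` parameter below are assumptions made purely for illustration, as is the function name `rerank_by_local_links`.

```python
def rerank_by_local_links(initial_scores, links, weight=0.5):
    """Re-rank an initial result set by how often its members cite each other.

    initial_scores: dict of doc_id -> relevancy score from the first pass.
    links: dict of doc_id -> set of doc_ids that the document links to.
    weight: assumed (hypothetical) contribution of each local citation.
    """
    result_set = set(initial_scores)
    local_citations = {doc: 0 for doc in result_set}
    for source in result_set:
        for target in links.get(source, set()):
            # Only count citations that stay inside the initial result set.
            if target in result_set and target != source:
                local_citations[target] += 1
    refined = {
        doc: initial_scores[doc] + weight * local_citations[doc]
        for doc in result_set
    }
    # Highest refined score first; ties broken by document id.
    return sorted(refined, key=lambda doc: (-refined[doc], doc))


scores = {"a": 1.0, "b": 0.9, "c": 0.8}
links = {"a": {"c"}, "b": {"c", "x"}, "c": set()}
print(rerank_by_local_links(scores, links))
```

In this example document "c" starts with the lowest score but is cited by both other results, so it moves to the top; the link to "x" is ignored because "x" is not in the initial set.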

Information extraction from a database USP # 6,678,681 (2004)

Techniques for extracting information from a database are provided. A database such as the Web is searched for occurrences of tuples of information. The occurrences of the tuples of information that were found in the database are analyzed to identify a pattern in which the tuples of information were stored. Additional tuples of information can then be extracted from the database utilizing the pattern. This process can be repeated with the additional tuples of information, if desired.

Detecting duplicate and near-duplicate files USP 6,658,423 (2003)

Improved duplicate and near-duplicate detection techniques may assign a number of fingerprints to a given document by (i) extracting parts from the document, (ii) assigning the extracted parts to one or more of a predetermined number of lists, and (iii) generating a fingerprint from each of the populated lists. Two documents may be considered to be near-duplicates if any one of their fingerprints match.
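The three steps in this abstract can be sketched as follows. The choices here are my own assumptions rather than anything stated in the patent: the "parts" are simply words, a word is assigned to a list by an MD5 hash modulo the number of lists, and `NUM_LISTS`, `fingerprints` and `near_duplicates` are hypothetical names.

```python
import hashlib

NUM_LISTS = 4  # assumed value for the "predetermined number of lists"


def _stable_hash(text):
    """Deterministic hash of a string (MD5 is used only for bucketing)."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


def fingerprints(document, num_lists=NUM_LISTS):
    """Return a set of (list_index, fingerprint) pairs for a document."""
    parts = document.lower().split()           # (i) extract parts (words)
    lists = [[] for _ in range(num_lists)]
    for part in parts:                         # (ii) assign parts to lists
        lists[_stable_hash(part) % num_lists].append(part)
    return {                                   # (iii) fingerprint each populated list
        (index, _stable_hash(" ".join(bucket)))
        for index, bucket in enumerate(lists)
        if bucket
    }


def near_duplicates(doc_a, doc_b):
    """Two documents are near-duplicates if any one fingerprint matches."""
    return bool(fingerprints(doc_a) & fingerprints(doc_b))
```

Because each word always lands in the same list, a small edit to a document disturbs only the lists containing the changed words, and the remaining lists still produce matching fingerprints.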

Detecting query-specific duplicate documents USP 6,615,209 (2003)

An improved duplicate detection technique that uses query-relevant information to limit the portion(s) of documents to be compared for similarity is described. Before comparing two documents for similarity, the content of these documents may be condensed based on the query. In one embodiment, query-relevant information or text (also referred to as "snippets") is extracted from the documents and only the extracted snippets, rather than the entire documents, are compared for purposes of determining similarity.

Methods and apparatus for using a modified index to provide search results in response to an ambiguous search query USP 6,529,903 (2003)

Methods and apparatus allow a user to submit an ambiguous search query and to receive potentially disambiguated search results. In one implementation, a search engine's conventional alphanumeric index is translated into a second index that is ambiguated in the same manner as the user's input. The ambiguous search query is compared to this ambiguated index, and the corresponding documents are provided to the user as search results.

Note: The following patent is not strictly a Google patent, but Google currently holds an exclusive license to use the PageRank patent, so it is included here.

Method for scoring documents in a linked database USP # 6,799,176 (2004)

A method is presented for scoring documents stored in a network. The method includes identifying links from linking documents to linked documents in the network and determining an importance of the identified links. The method further includes weighting the identified links based on the determined importance and scoring the linked documents based on the weighted links.
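As a very simplified sketch of scoring by weighted links, assume that a link's importance is one divided by the number of links leaving its source document, so a link from a page with many outgoing links counts for less. That weighting and the function name `score_documents` are assumptions for illustration only; the abstract does not define how importance is determined, and this single pass is not the full iterative PageRank calculation.

```python
from collections import Counter


def score_documents(links):
    """Score documents by summing the weights of the links pointing at them.

    links: list of (source, target) pairs.
    Assumed weighting: each link is worth 1 / (outgoing links of its source).
    """
    out_degree = Counter(source for source, _ in links)
    scores = {}
    for source, target in links:
        scores.setdefault(source, 0.0)          # every document gets an entry
        weight = 1.0 / out_degree[source]       # importance of this link
        scores[target] = scores.get(target, 0.0) + weight
    return scores


print(score_documents([("a", "c"), ("b", "c"), ("b", "d")]))
```

Here "c" receives a full-weight link from "a" and a half-weight link from "b", so it outscores "d", which receives only the half-weight link.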

Yahoo Patents

Systems and methods for search query processing using trend analysis Patent Application # 20050102259 (2005)

Systems and methods for processing search requests include analyzing received queries in order to provide a more sophisticated understanding of the information being sought. In one embodiment, queries are parsed into units, which may comprise one or more words or tokens of the query, and the units are related in concept networks. Trend analysis is performed by sorting the queries into subsets along a dimension of interest and comparing concept networks for different subsets. Trend information is usable to enhance a response of an automated search agent to a subsequently received query.

Systems and methods for search processing using superunits Patent Application # 20050080795 (2005)

In a search processing system, a concept network is generated from a set of queries by parsing the queries into units and defining various relationships between the units based in part on patterns of units that appear together in queries. Units in the concept network that have some similar characteristic(s) are grouped into superunits. For each superunit, there is a corresponding signature that defines the similar characteristic of the group. A query is processed by identifying constituent units, determining the superunit membership of some or all of the constituent units, and using that information to formulate a response to the query.

System and method of placing a search listing in at least one search result list Patent Application # 20050004835 (2005)

A system and method is provided for qualifying search listings for placement in at least one search result list and ordering the search listings according to an algorithm. Specifically, a searching device is adapted to receive items of information, such as search listings (e.g., www.yahoo.com, etc.), search terms (e.g., "cars," "beauty supplies," etc.) and monetary amounts (e.g., $1.00, etc.), from a plurality of promoting devices, receive a search inquiry (i.e., a search term) from a reception device, and provide (in response thereto) at least one search result list including search listings (i) associated with the search inquiry and (ii) qualified for placement in the search result list. In other words, if the search term linked to the search listing is the same as (or substantially similar to) the search inquiry, then the first prong is met. Furthermore, if a predetermined number of monetary amounts (i.e., as linked to a predetermined number of search listings associated with the search inquiry) are not higher than the monetary amount linked to the search listing, then the second prong is met and the search listing is qualified for placement. Thus, only a predetermined number of search listings (e.g., three, five, etc.) that are both (i) associated with the search inquiry and (ii) linked to the highest monetary amounts are qualified for placement in the search result list. Once the search listings are qualified for placement in the search result list, the searching device is adapted to arrange the qualified search listings according to an algorithm (e.g., randomly, according to relevance, according to monetary amounts, etc.).
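The two-prong test described above can be sketched like this. The listing format (a dict with `url`, `term` and `bid` keys), the exact-match comparison of terms, the `max_slots` parameter and the final ordering by bid are all assumptions for the example; the abstract leaves the ordering algorithm open and also allows "substantially similar" terms.

```python
def qualified_listings(listings, inquiry, max_slots=3):
    """Return the listings that qualify for placement, highest bid first.

    listings: list of dicts with hypothetical keys "url", "term" and "bid".
    inquiry: the search term entered by the user.
    max_slots: assumed value for the "predetermined number" of placements.
    """
    # Prong 1: the listing's search term matches the search inquiry.
    associated = [
        item for item in listings
        if item["term"].lower() == inquiry.lower()
    ]
    # Prong 2: fewer than max_slots associated listings carry a higher bid.
    qualified = [
        item for item in associated
        if sum(1 for other in associated if other["bid"] > item["bid"]) < max_slots
    ]
    # Arrange the qualified listings; ordering by bid is one possible algorithm.
    return sorted(qualified, key=lambda item: -item["bid"])


ads = [
    {"url": "a.example", "term": "cars", "bid": 1.00},
    {"url": "b.example", "term": "cars", "bid": 0.50},
    {"url": "c.example", "term": "cars", "bid": 0.75},
    {"url": "d.example", "term": "boats", "bid": 2.00},
]
print([item["url"] for item in qualified_listings(ads, "cars", max_slots=2)])
```

Note that with this reading of the second prong, listings with tied bids can all qualify, so the result may occasionally hold more than `max_slots` entries.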

Universal search interface systems and methods Patent Application (2004)

Systems and methods for enhancing information retrieval and communication functionality through the use of a universal interface that is configurable to interface with multiple applications resident on a user computer, and which provides a persistent two-way communication channel for communicating with search intelligence on a remote system. Sharable, actionable labels and codebooks of labels may be defined by a user. Each label may be defined in a natural language format and may include a mapping to a specific application or set of applications executable on a user system. Transfer of labels and codebooks between user systems allows for enhanced information exchange and retrieval among users as well as information exchange tracking and analysis by a server system.

Systems and methods for generating concept units from search queries Patent Application # 20040199498 (2004)

Systems and methods for enhancing search functionality provided to a user. In certain aspects, a query processing engine automatically decomposes queries into constituent units that are related to concepts in which a user may be interested. The query processing engine decomposes queries into one or more constituent units per query using statistical methods. In certain aspects, no real world knowledge is used in determining units. In other aspects, aspects of world and content knowledge are introduced to enhance and optimize performance, for example, manually using a team of one or more information engineers.

Canonicalization of terms in a keyword-based presentation system Patent Application # 20040199496 (2004)

A presentation system accepts presentations or references to presentations from prospective presenters. Some or all of the presentations or references are stored in a database and referenced by keywords such that presentations to be presented in response to particular searches can be identified. A presentation manager handles accepting bids and settling terms between prospective presenters. The results of such processes might be stored in a presentation details database. A presentation server handles retrieving presentations from the presentation details database for presentation to users along with requests such as search results. Both the presentation manager and the presentation server can operate on a keywords-basis, wherein presentation terms specify keywords to be associated with particular presentations and the presentation server serves particular presentations based on keywords in a search query for which the presentations are to be returned. The association of keywords can be done using canonicalization so that, under certain conditions, different keywords are treated as the same keyword. Canonicalizations might include plural/singular forms, gender forms, stem word forms, suffix forms, prefix forms, typographical error forms, word order, pattern ignoring, acronyms, stop word elimination, etc. Conditions might include aspects of the search query state, such as the user's demographics, the page from which the search query was initiated, etc.

Method and apparatus for search ranking using human input and automated ranking Patent Application # 20040024752 (2004)

A search system provides search results to searchers in response to search queries and the search results are ranked. The ranking is determined by an automated ranking process in combination with human editorial input. A search system might comprise a query server for receiving a current query, a corpus of documents to which the current query is applied, ranking data storage for storing information from an editorial session involving a human editor and a reviewed query at least similar to the current query, and a rank adjuster for generating a ranking of documents returned from the corpus responsive to the current query taking into account at least the information from the editorial session.

Search engine using sales and revenue to weight search results Patent # 6,631,372 (2003)

A search engine selects one or more search hits from among a plurality of hits, wherein a hit is a reference to a page or a site, based on a user interest, comprising an input module for accepting a query from a user, the query representing an interest of the user; a tracking module for tracking the user's navigation through the plurality of pages, including at least a destination purchase page, the destination purchase page being a page from which the user makes a purchase; a sales module which records associations between purchases and queries where the associations are provided, at least in part by an output of the tracking module; and a search module, which takes as its inputs at least a query and sales associations of that query provided by the sales module, and which outputs one or more search hits based on at least the query and the sales associations of that query. In some systems, instead of using sales data to alter the weights of the search results, merchant bidding is used to alter the weights of the search results, or a combination of the two is used.

Information retrieval from hierarchical compound documents Patent # 6,553,364 (2003)

A search query is applied to documents in a document repository wherein the documents are organized into a hierarchy. A search engine searches the hierarchy to return documents which match a query term either directly or indirectly. A specific embodiment of the search engine organizes the query term into individual subterms and matches the subterms against documents, returning only those documents which indirectly match the entire search query term and directly match at least one of the query subterms.

MSN Patents

Web address converter for dynamic web pages Patent Application 20050081140 (2005)

Herein is described an implementation of a Web address converter, which helps dynamic Web sites get the attention of spiders of Internet search engines.

Search system using user behavior data Patent Application 20050125382 (2005)

Context-based user feedback is gathered regarding searches performed on a search mechanism. The search mechanism is monitored for user behavior data regarding an interaction of a user with the search mechanism. The response data provided by the search mechanism is also monitored. Context data (describing the search) and user feedback data (the user's feedback on the search--either explicit or implicit) are determined. This can be used, for example, to evaluate a search mechanism or to check a relevance model.

System and method for checking a content site for efficacy Patent Application 20050114319 (2005)

The present invention provides a system and method for automatically suggesting optimizations that can be made to content pages to increase the chances that the network site containing the content page will be indexed and returned high in the rank ordered list of results from a search engine. In one embodiment, the present invention also includes a keyword generation tool for use in generating effective keywords for which a content page can be optimized.

Expanded search keywords Patent Application 20050102278 (2005)

A method for providing additional terms to a searching process based on a string is provided. The method includes receiving a string that incorporates a plurality of characters separated by at least one space or hyphen. In one aspect, the plurality of characters is concatenated to form at least one additional term. In another aspect, a space is replaced with a hyphen. In yet another aspect, a hyphen is replaced with a space. The at least one additional term is provided to the search process.
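The three aspects in this abstract (concatenation, space to hyphen, hyphen to space) are simple enough to sketch directly. The function name `expand_keywords` is hypothetical, and treating any run of spaces or hyphens as a single separator is my own simplifying assumption.

```python
import re


def expand_keywords(text):
    """Return additional search terms derived from a spaced or hyphenated string."""
    original = text.strip()
    # Split on runs of spaces and/or hyphens (assumed separator handling).
    parts = [part for part in re.split(r"[ -]+", original) if part]
    if len(parts) < 2:
        return []  # nothing to concatenate or re-join
    variants = [
        "".join(parts),    # concatenated form
        "-".join(parts),   # separators replaced with hyphens
        " ".join(parts),   # separators replaced with spaces
    ]
    additional = []
    for variant in variants:
        if variant != original and variant not in additional:
            additional.append(variant)
    return additional


print(expand_keywords("e-mail"))
print(expand_keywords("data base"))
```

For "e-mail" this yields "email" and "e mail"; for "data base" it yields "database" and "data-base".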

System and process for presenting search results in a tree format Patent Application 20050080770 (2005)

A system and process for graphically displaying the results of a standard electronic search to a user on a display device via an interactive search results window in which the user views and refines search results items using a tree format. In general, the tree has a first level that indicates how the search results may be refined. The second level of the tree shows what subsets (what) are available for a particular refining method. The third level shows how the already refined (by the second level) results may be refined further. This is repeatedly applied with odd-numbered levels of the tree indicating how the results may be refined, and even-numbered levels indicating what subsets are available. In addition to the tree, the search results window also includes a listing of the search results items associated with a user-selected portion of the tree.

Systems and methods for ranking documents based upon structurally interrelated information Patent Application 20050060297 (2005)

Systems and methods for ranking Web pages based on hyperlink information in a manner that is resistant to nepotistic links are provided. In one embodiment, a Web search service is provided for returning quality query results. The vulnerability of existing ranking algorithms, such as PageRank, to Web pages that are artificially generated for the sole purpose of inflating the score of target page(s) is addressed. Intuitively, it is recognized that it is less likely to reach a particular page on a Web server having many pages via a random jump than it is to reach a particular page on a Web server having few pages, which implies that the influence of such a page upon another page by linking to, or endorsing, the other page is diminished. Thus, in various non-limiting embodiments, each Web server, not each Web page, is assigned a guaranteed minimum score. This minimum score assigned to a server can then be divided among all the pages on that Web server.
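The last two sentences of this abstract describe a simple piece of arithmetic, which can be sketched as below. Only the division of a per-server minimum among that server's pages is shown, not the rest of the ranking calculation. The function name, the `server_minimum` value of 1.0 and the example URLs are hypothetical, and the sketch assumes each URL appears once and that the host name identifies the Web server.

```python
from collections import Counter
from urllib.parse import urlparse


def page_minimum_scores(page_urls, server_minimum=1.0):
    """Divide each server's guaranteed minimum score among its pages.

    page_urls: iterable of unique page URLs.
    server_minimum: assumed minimum score guaranteed to every server.
    """
    hosts = {url: urlparse(url).netloc for url in page_urls}
    pages_per_host = Counter(hosts.values())
    return {
        url: server_minimum / pages_per_host[host]
        for url, host in hosts.items()
    }


urls = [
    "http://big.example/1",
    "http://big.example/2",
    "http://big.example/3",
    "http://big.example/4",
    "http://small.example/",
]
print(page_minimum_scores(urls))
```

Each of the four pages on the larger server receives a quarter of the minimum, while the single page on the smaller server keeps the whole amount, so generating many pages on one server does not multiply the score that server starts with.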

July 9, 2005