Tag: search

Search Engines are Hard

When you look at all these steps and all the complications, this process is rife with things that can go wrong. The hardest part about writing a search engine is that you’re going to process billions of URLs and serve millions, if not billions, of queries. This does not leave a lot of room for error. One super-linear algorithm applied over the wrong-sized list of items and you are sunk. One lock inside another lock and you are sunk. There will be no code paths not explored. All of those comments in your code, which print out errors like “This will never happen,” will happen. When you think that you are done, there is still the load balancing, the caching, the DNS servers, the ad service, the image servers, the update architecture, and (to take off on a familiar tune) a cartridge in a tape drive.

Secondary Search

Google started offering secondary search boxes for major sites. Sites were growing accustomed to the idea that users often did not find their company’s content through the site’s own search box or its front page. More often than not, users would find links to specific articles or products on blogs, search engines or other sites, and navigate to that page. “So publishers are building their sites to make sure the experience is the same, whether users are coming in through the front door or the side.”

if their own search didn’t suck so much the argument would make a lot more sense

News Analysis

TextMap is a search engine for entities: the important (and not so important) people, places, and things in the news. Our news analysis system automatically identifies and monitors these entities, and identifies meaningful relationships between them. TextMap analyzes both the temporal and geographical distribution of news entities. We literally monitor the state-of-the-world through our analysis of 1000 domestic and international news sources every day.

shows promise, but they have about a factor of 1000 fewer machines than they need.

Click Boost

The authors suggest that relevance rank should be informed by click data, but note that “such steps are likely to amplify the search bias toward already popular sites.” In the talk, an audience member also noted that such steps may be susceptible to click spam, which is even easier to do than link spam for those wanting to manipulate search results. Finally, they noted strong recency and 24-hour trends in traffic data, saying that “47% of the clicks at any given time are predicted by the clicks from the previous day at the same time” and that, though the clicks from the previous 3 hours are a strong predictor of clicks for the current hour, after 4 hours, “the requests from the previous day yield higher precision and recall.”
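To make the "predicted by the clicks from the previous day" idea concrete, here is a rough sketch of how one might score a set of previously clicked URLs against the current hour's clicks. The function name and the exact definitions of precision and recall are assumptions for illustration; the excerpt does not say how the authors computed theirs.

```python
from collections import Counter


def precision_recall(predicted_urls, current_clicks):
    """Score a predictor set of URLs against the current hour's clicks.

    predicted_urls: set of URLs clicked in some reference window
                    (e.g. the same hour on the previous day).
    current_clicks: list of URLs clicked in the current hour, with repeats.

    Assumed definitions (not taken from the source):
      recall    = share of current clicks that land on a predicted URL
      precision = share of predicted URLs that were clicked this hour
    """
    current = Counter(current_clicks)
    total_clicks = sum(current.values())

    # Clicks in the current hour that fall on a URL in the predictor set.
    covered = sum(n for url, n in current.items() if url in predicted_urls)
    recall = covered / total_clicks if total_clicks else 0.0

    # Predicted URLs that were actually clicked in the current hour.
    hits = sum(1 for url in predicted_urls if url in current)
    precision = hits / len(predicted_urls) if predicted_urls else 0.0

    return precision, recall
```

Running this once with the previous 3 hours as the reference window and once with the same hour of the previous day would allow the kind of comparison the quote describes.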

Street View OCR

In addition to street scenes, indexing can be applied to other image sets. In one implementation, a store (e.g., a grocery store or hardware store) is indexed. Images of items within the store are captured, for example, using a small motorized vehicle or robot. The aisles of the store are traversed and images of products are captured in a similar manner as discussed above. Additionally, as discussed above, location information is associated with each image. Text is extracted from the product images. In particular, extracted text can be filtered using a product name database in order to focus character recognition results on product names.

now you won’t lose your keys again, ever.
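The step where extracted text is filtered against a product name database could look something like the sketch below. It uses fuzzy string matching from Python's standard library; the tiny `PRODUCT_NAMES` list, the function name, and the 0.8 cutoff are all made up for illustration, and the patent excerpt does not say which matching method is actually used.

```python
from difflib import get_close_matches

# Hypothetical product-name database; a real one would be far larger.
PRODUCT_NAMES = ["cheerios", "tide", "duracell", "heinz ketchup"]


def filter_ocr_text(candidates, product_names=PRODUCT_NAMES, cutoff=0.8):
    """Keep only OCR strings that closely match a known product name.

    Returns (ocr_text, matched_product_name) pairs, in input order.
    """
    matches = []
    for text in candidates:
        # Normalize case and whitespace before comparing to the database.
        normalized = text.lower().strip()
        hit = get_close_matches(normalized, product_names, n=1, cutoff=cutoff)
        if hit:
            matches.append((text, hit[0]))
    return matches
```

A misread label such as "Cheerlos" would still map to "cheerios", while stray signage text like "EXIT" would be dropped because nothing in the database is close enough.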

Google Coupons

Google Coupons now offers the ability to return Coupons in your main search results via the Google Co-op. When you select the above button from Google Coupon search you will be asked to log in and taken to this page at Google Co-op beta for a subscription confirmation.

have to try this