Google Books Metadata

A fascinating smackdown for all the metadata whiners, from a member of the Google Books team. A lot of them have illusions about the quality of the metadata their own institutions produce.

In paragraph 3, Geoff describes some of the problems we have with dates, and in particular the prevalence of 1899 dates. This is because we recently began incorporating metadata from a Brazilian metadata provider that, unbeknownst to us, used 1899 as the default date when they had no other. Geoff responded by saying that only one of the books he cited was in Portuguese. However, that metadata provider supplies us with metadata for all the books they know about, regardless of language. To them, Stephen King’s Christine was published in 1899, as were 250K other books.

To which I hear you saying, “if you have all these metadata sources, why can’t the correct dates outvote the incorrect ones?” That is exactly what happens. We have tens of metadata records telling us that Stephen King’s Christine was published in 1983. That’s the correct date. So what should we do when we have a metadata record with an outlier date? Should we ignore it completely? That would be easy. It would also be wrong. If we put in simple common-sense checks, we’d occasionally bury uncommonly strange but genuine metadata. Sometimes there is a very old book with the same name as a modern book. We can either include metadata that is very possibly wrong, or we can prevent that metadata from ever being seen. The scholar in me — if he’s even still alive — prefers the former.
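To make the trade-off concrete, here is a minimal sketch of date reconciliation by majority vote that keeps outliers visible instead of discarding them. The record format and the `reconcile_dates` function are assumptions for illustration only, not Google Books’ actual pipeline.

```python
from collections import Counter

def reconcile_dates(records):
    """Pick the majority publication date across metadata records,
    keeping outlier dates as alternates rather than throwing them away.

    `records` is a hypothetical list of {"source": ..., "date": ...} dicts.
    """
    votes = Counter(r["date"] for r in records if r.get("date"))
    if not votes:
        return None, []
    best, _ = votes.most_common(1)[0]
    # Outliers stay visible so rare-but-genuine metadata (e.g. an old
    # book sharing a title with a modern one) is not silently buried.
    alternates = [d for d in votes if d != best]
    return best, alternates

# Example: twenty records say 1983, one default-dated record says 1899.
records = [{"source": f"lib{i}", "date": 1983} for i in range(20)]
records.append({"source": "default_1899_provider", "date": 1899})
print(reconcile_dates(records))  # (1983, [1899])
```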

Jeff Dean keynote

The attention to detail at Google is remarkable. Jeff gleefully described the various index compression techniques they created and used over the years. He talked about how they finally settled on a format that groups four position deltas together in order to minimize the number of shift operations needed during decompression. They paid attention to how their data was laid out on disk, keeping the data they needed to stream quickly on the faster outer tracks and leaving the inner tracks for cold data and short reads. They wrote their own error recovery for non-parity memory. They wrote their own disk scheduler. They repeatedly modified the Linux kernel to meet their needs. They designed their own servers with no cases, then switched to more standard off-the-rack servers, and are now back to custom servers with no cases again.
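The group-of-four format Jeff describes sounds like a group-varint style delta encoding: one tag byte records how many bytes each of four deltas occupies, so the decoder does a small, fixed amount of shifting per group instead of checking a continuation bit on every byte. The sketch below is an illustrative reconstruction under that assumption, not Google’s actual index format.

```python
def encode_group(deltas):
    """Encode exactly four position deltas (each < 2**32) after one tag byte.

    Each 2-bit field in the tag gives the byte length (1-4) of the
    corresponding delta, so decoding needs only a few shifts per group.
    """
    assert len(deltas) == 4 and all(0 <= d < 2**32 for d in deltas)
    tag = 0
    body = bytearray()
    for i, d in enumerate(deltas):
        nbytes = max(1, (d.bit_length() + 7) // 8)
        tag |= (nbytes - 1) << (2 * i)
        body += d.to_bytes(nbytes, "little")
    return bytes([tag]) + bytes(body)

def decode_group(buf, offset=0):
    """Decode one group of four deltas; return (deltas, next_offset)."""
    tag = buf[offset]
    offset += 1
    deltas = []
    for i in range(4):
        nbytes = ((tag >> (2 * i)) & 0x3) + 1
        deltas.append(int.from_bytes(buf[offset:offset + nbytes], "little"))
        offset += nbytes
    return deltas, offset

# Example: positions 5, 18, 19, 400 become deltas 5, 13, 1, 381,
# which pack into 1 tag byte + 5 data bytes.
positions = [5, 18, 19, 400]
deltas = [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]
print(decode_group(encode_group(deltas)))  # ([5, 13, 1, 381], 6)
```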

You think?