Tag: search

Search Satisfaction

Assuming the survey method is credible — which is a leap, considering that there is nothing about survey size, sample makeup, etc. in the press release — I still have trouble with the result. A 1-point difference is meaningless, mere noise. Last month Google was ahead, this month Yahoo is ahead, and who knows what will happen next month. It’s just noise among searchers who wouldn’t know better search if it spidered them, doubly so when it’s demonstrably uncorrelated with search engine usage.

noise, basically. sorry, jeremy. also, only a 4-point difference between the players on a 100-point scale? that calls the methodology, if there was any, into question.
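
to make the noise point concrete, here is a back-of-the-envelope sketch of how many respondents such a survey would need before a 1-point gap means anything. the standard deviation is a pure assumption, since the press release gives no such figure.

```python
import math

def required_n(gap, sd, z=1.96):
    """Respondents per engine for a two-sample z-test to resolve `gap`
    at ~95% confidence (statistical power ignored for brevity)."""
    return math.ceil(2 * (z * sd / gap) ** 2)

# 15 points is a guessed standard deviation for individual scores
# on a 100-point satisfaction scale; the press release gives none.
sd = 15.0
print(required_n(gap=1.0, sd=sd))  # ~1729 respondents per engine
print(required_n(gap=4.0, sd=sd))  # ~109 even for the full 4-point spread
```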

Powerset growth models

This helps a company that intends to index the web decide whether it is better to purchase servers, lease them, or create virtual servers on Amazon EC2. Assumptions about the size and refresh frequency of the index can be changed. Since the model is forward-looking, it also makes assumptions about future server power and cost reductions from Moore’s Law.

their time would have been better spent on some actual product
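
the post doesn’t include the model itself, but a toy version of that buy-versus-EC2 comparison might look like the sketch below. every constant is an illustrative assumption, not Powerset’s number; a lease column would work the same way.

```python
import math

PAGES_START = 5e9         # assumed initial index size, in pages
INDEX_GROWTH = 2.0        # assumed yearly index growth factor
PAGES_PER_SERVER = 50e6   # assumed pages one of today's servers can index
MOORE = 1.6               # assumed yearly per-server capacity gain
BUY_COST = 3000.0         # assumed purchase price per server (USD)
EC2_HOURLY = 0.10         # 2007-era EC2 small-instance rate (USD/hour)
HOURS_PER_YEAR = 24 * 365

def servers_needed(year):
    pages = PAGES_START * INDEX_GROWTH ** year
    capacity = PAGES_PER_SERVER * MOORE ** year  # Moore's Law shrinks the fleet
    return math.ceil(pages / capacity)

buy_total, ec2_total, owned = 0.0, 0.0, 0
for year in range(3):
    need = servers_needed(year)
    buy_total += max(0, need - owned) * BUY_COST     # buy only the shortfall
    owned = max(owned, need)
    ec2_total += need * EC2_HOURLY * HOURS_PER_YEAR  # rent exactly what's needed
    print(f"year {year}: {need} servers, buy ${buy_total:,.0f}, ec2 ${ec2_total:,.0f}")
```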

Wikipedia disambiguation

In addition to pages describing different entities where contextual clues can be extracted (example), Wikipedia contains redirects for different surface forms of the same entity, list pages that categorize names, and disambiguation pages that show many of the different entities for a surface form. Wikipedia contains much more than unstructured text. Exploiting the semi-structured data — the redirect, list, and disambiguation pages — gives this work its power.

wikipedia for entity extraction. awesome what crowdsourcing can do.
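
a toy sketch of how the redirect and disambiguation pages combine for entity resolution; the tables below are hand-made stand-ins, where a real system would harvest them from a Wikipedia dump.

```python
# Redirects map alternate surface forms to a canonical title.
REDIRECTS = {"Big Blue": "IBM", "JFK": "John F. Kennedy"}

# Disambiguation pages list candidate entities for an ambiguous surface
# form; here each candidate carries a few context words from its article.
DISAMBIG = {
    "Jaguar": {
        "Jaguar (animal)": {"cat", "predator", "rainforest"},
        "Jaguar Cars": {"car", "british", "luxury"},
    },
}

def resolve(surface, context_words):
    """Follow a redirect if one exists; otherwise score disambiguation
    candidates by overlap with the words around the mention."""
    if surface in REDIRECTS:
        return REDIRECTS[surface]
    candidates = DISAMBIG.get(surface)
    if not candidates:
        return surface  # already an unambiguous title
    return max(candidates, key=lambda e: len(candidates[e] & context_words))

print(resolve("Big Blue", set()))                        # IBM
print(resolve("Jaguar", {"the", "car", "was", "fast"}))  # Jaguar Cars
```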

Stochastic Encoding

The Minimum Description Length framework is powerful but often overlooked. I believe that one reason for this is that methods for attaining efficient encodings are subtle. In this paper, I discuss one of those techniques, stochastic encoding. When there are multiple nearly equally valuable choices of a parameter, it is better to choose stochastically, according to a probability distribution, rather than to select the single best choice. Why? Because information can be transmitted in which parameter is chosen. This is exactly the “bits-back” argument.

shows how text classifiers can be traced back to compression algorithms
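
a small worked example of the bits-back accounting (my own illustration, not code from the paper): with two nearly equal parameter choices, picking stochastically and getting the choice bits back beats committing to the single best one.

```python
import math

# Description lengths, in bits, of the data under two parameter choices.
L = {"theta1": 10.0, "theta2": 10.1}

best = min(L.values())  # deterministic: pay for the single best choice

# Stochastic: draw theta from the Boltzmann distribution Q ∝ 2^(-L).
Z = sum(2 ** -l for l in L.values())
Q = {t: 2 ** -l / Z for t, l in L.items()}

expected_cost = sum(Q[t] * L[t] for t in L)              # E_Q[L(theta)]
bits_back = -sum(q * math.log2(q) for q in Q.values())   # H(Q), recoverable
net = expected_cost - bits_back                          # equals -log2(Z)

print(f"best single choice: {best:.3f} bits")            # 10.000
print(f"stochastic, net:    {net:.3f} bits")             # ~9.049, strictly cheaper
assert math.isclose(net, -math.log2(Z))
```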