Tag: search

Search Satisfaction

Assuming the survey method is credible — which is a leap, considering that there is nothing about survey size, sample makeup, etc. in the press release — I still have trouble with the result. A 1-point difference is meaningless, mere noise. Last month Google was ahead, this month Yahoo is ahead, and who knows what will happen next month. It’s just noise among searchers who wouldn’t know better search if it spidered them, doubly so when it’s demonstrably uncorrelated with search engine usage.

noise, basically. sorry, jeremy. also, only a 4-point difference between the players on a 100-point scale? that calls the methodology, if there was any, into question.
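
to make the noise point concrete, here is a back-of-the-envelope sketch of how many respondents such a survey would need before a 1-point gap means anything. the standard deviation is a pure assumption, since the press release gives no such figure.

```python
import math

def required_n(gap, sd, z=1.96):
    """Respondents per engine for a two-sample z-test to resolve `gap`
    at ~95% confidence (statistical power ignored for brevity)."""
    return math.ceil(2 * (z * sd / gap) ** 2)

# 15 points is a guessed standard deviation for individual scores
# on a 100-point satisfaction scale; the press release gives none.
sd = 15.0
print(required_n(gap=1.0, sd=sd))  # ~1729 respondents per engine
print(required_n(gap=4.0, sd=sd))  # ~109 even for the full 4-point spread
```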

Powerset growth models

This helps a company that intends to index the web decide whether it is better to purchase servers, lease them, or create virtual servers on Amazon EC2. Assumptions about the size and refresh frequency of the index can be changed. Since the model is forward-looking, it also makes assumptions about future server power and cost reductions from Moore’s Law.

their time would have been better spent on some actual product
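
the post doesn’t include the model itself, but a toy version of that buy-versus-EC2 comparison might look like the sketch below. every constant is an illustrative assumption, not Powerset’s number; a lease column would work the same way.

```python
import math

PAGES_START = 5e9         # assumed initial index size, in pages
INDEX_GROWTH = 2.0        # assumed yearly index growth factor
PAGES_PER_SERVER = 50e6   # assumed pages one of today's servers can index
MOORE = 1.6               # assumed yearly per-server capacity gain
BUY_COST = 3000.0         # assumed purchase price per server (USD)
EC2_HOURLY = 0.10         # 2007-era EC2 small-instance rate (USD/hour)
HOURS_PER_YEAR = 24 * 365

def servers_needed(year):
    pages = PAGES_START * INDEX_GROWTH ** year
    capacity = PAGES_PER_SERVER * MOORE ** year  # Moore's Law shrinks the fleet
    return math.ceil(pages / capacity)

buy_total, ec2_total, owned = 0.0, 0.0, 0
for year in range(3):
    need = servers_needed(year)
    buy_total += max(0, need - owned) * BUY_COST     # buy only the shortfall
    owned = max(owned, need)
    ec2_total += need * EC2_HOURLY * HOURS_PER_YEAR  # rent exactly what's needed
    print(f"year {year}: {need} servers, buy ${buy_total:,.0f}, ec2 ${ec2_total:,.0f}")
```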

Wikipedia disambiguation

In addition to pages describing different entities where contextual clues can be extracted (example), Wikipedia contains redirects for different surface forms of the same entity, list pages that categorize names, and disambiguation pages that show many of the different entities for a surface form. Wikipedia contains much more than unstructured text. Exploiting the semi-structured data — the redirect, list, and disambiguation pages — gives this work its power.

wikipedia for entity extraction. awesome what crowdsourcing can do.
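
a toy sketch of how the redirect and disambiguation pages combine for entity resolution; the tables below are hand-made stand-ins, where a real system would harvest them from a Wikipedia dump.

```python
# Redirects map alternate surface forms to a canonical title.
REDIRECTS = {"Big Blue": "IBM", "JFK": "John F. Kennedy"}

# Disambiguation pages list candidate entities for an ambiguous surface
# form; here each candidate carries a few context words from its article.
DISAMBIG = {
    "Jaguar": {
        "Jaguar (animal)": {"cat", "predator", "rainforest"},
        "Jaguar Cars": {"car", "british", "luxury"},
    },
}

def resolve(surface, context_words):
    """Follow a redirect if one exists; otherwise score disambiguation
    candidates by overlap with the words around the mention."""
    if surface in REDIRECTS:
        return REDIRECTS[surface]
    candidates = DISAMBIG.get(surface)
    if not candidates:
        return surface  # already an unambiguous title
    return max(candidates, key=lambda e: len(candidates[e] & context_words))

print(resolve("Big Blue", set()))                        # IBM
print(resolve("Jaguar", {"the", "car", "was", "fast"}))  # Jaguar Cars
```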

Stochastic Encoding

The Minimum Description Length framework is powerful but often overlooked. I believe that one reason for this is that methods for attaining efficient encodings are subtle. In this paper, I discuss one of those techniques, stochastic encoding. When there are multiple nearly equally valuable choices of a parameter, it is better to choose stochastically, according to a probability distribution, rather than to select the single best choice. Why? Because information can be transmitted in which parameter is chosen. This is exactly the “bits-back” argument.

shows how text classifiers can be traced back to compression algorithms
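
a small worked example of the bits-back accounting (my own illustration, not code from the paper): with two nearly equal parameter choices, picking stochastically and getting the choice bits back beats committing to the single best one.

```python
import math

# Description lengths, in bits, of the data under two parameter choices.
L = {"theta1": 10.0, "theta2": 10.1}

best = min(L.values())  # deterministic: pay for the single best choice

# Stochastic: draw theta from the Boltzmann distribution Q ∝ 2^(-L).
Z = sum(2 ** -l for l in L.values())
Q = {t: 2 ** -l / Z for t, l in L.items()}

expected_cost = sum(Q[t] * L[t] for t in L)              # E_Q[L(theta)]
bits_back = -sum(q * math.log2(q) for q in Q.values())   # H(Q), recoverable
net = expected_cost - bits_back                          # equals -log2(Z)

print(f"best single choice: {best:.3f} bits")            # 10.000
print(f"stochastic, net:    {net:.3f} bits")             # ~9.049, strictly cheaper
assert math.isclose(net, -math.log2(Z))
```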