Tag: google

Real database

We finally decided to go with a commercial database over the objections of a number of engineers, including myself. To ease the transition it was decided to convert AdWords over to the new system first, and to do the main ads system later. It was a project on a par with the internationalization effort in terms of the tedious work required to comb over nearly all of the AdWords code and change all of the database queries. (Databases are supposed to all be compatible with one another, but in reality they pretty much aren’t.)

To make a long story short, it was an unmitigated disaster.

adwords runs on mysql (for the next time someone brings up the old “not for enterprise use”)

Google OS Open Source

it has long been argued that the google os, particularly MapReduce and GFS, is google’s real competitive strength. yahoo, meanwhile, is paying developers to develop clones of these. with seeming consolidation on a common computing platform, and ever-rising data center expenses, you gotta wonder how much sense it makes for the big 3 to duplicate all that CAPEX. they might be better off outsourcing their datacenters, and sharing some base datasets, such as a crawler cache (kinda like the feedmesh network).

the outsourced company, on the other hand, would end up running a grid with several million nodes and could optimize running costs overall, by using very low power servers, running on an opensourced processor architecture.

Yahoo

What moron would expire news article URLs when disk space is plentiful and cheap? Apparently Yahoo doesn’t want me to use del.icio.us. The sad thing, just as with their RSS mess, is that they have people working for them who know better. That place needs its own mini. Too many clueless middle managers.
2006-09-29: The red tape saga at Yahoo continues.

“NOW let’s just pause for a second.” It is the 4th pause for thought that Terry Semel, chairman and chief executive of Yahoo!, has requested in 10 minutes. He is trying to marshal various arguments to prove that his firm, the world’s largest internet company by visitors to its website, has a coherent and winning strategy compared with Google, a phenomenally successful search engine. With only slightly bigger revenues, Google has 3.5x the market value of Yahoo!. Twice in 3 months Wall Street has dumped the shares of Yahoo! and widened the gap.

2006-10-09: Good article on the many problems at Yahoo.

But the problems at Yahoo go beyond advertising. From video programming to social networking — areas of interest to users and advertisers alike — the company is losing its initiative. And each time a product fails in the market or is late, Yahoo loses some ability to do more deals and hire more talented employees. The shares are down 38% this year, sending some employees out the door in search of better shots at stock market wealth.

2006-10-14: Who indeed

I think the better buyer is a media company. News Corporation is the one that most comes to mind. Rupert has shown that he’s serious about the Internet and that he is not afraid to make big bets. It would be highly dilutive since News Corp itself has a $65B market cap, but it might be accretive to earnings given that News Corp trades at a higher EBITDA multiple than Yahoo! now. The one reason I think it’s most unlikely is that News Corp has shown an interest in working closely with Google and buying Yahoo! would take them in an opposite direction.

2006-11-23: Heh. Snark at the incompetence manifest in the manifest.

The latest example from Yahoo!, the world’s largest internet company by some measures, reverses the trend. Brad Garlinghouse, a manager just senior enough to be noteworthy, has put forth a “Peanut Butter Manifesto”, which was helpfully “leaked” to the Wall Street Journal. It was meant as part St Crispin’s Day speech to rally the troops, part corporate analysis of Yahoo!’s many troubles, part turnaround plan—and, it seems, part publicity stunt. But it turned out to be a redundant series of platitudes, split infinitives, clichés and mixed metaphors.

2006-12-03: Problems not just with monetization, but basic search, still:

Y! may have 28% of all Internet searches, but for some reason Y! does not generate 28% of Internet traffic.

2006-12-07: Cultural conflict

The story about Braun taking a big corner conference room at Y! HQ and turning it into an office (when even Jerry Yang has a cube out amongst the ‘rank and file’) is a totally rich illustration of SoCal vs NoCal, uh, charm.

2006-12-11: “leaked” like peanut butter?

Facebook flatly rejected the $1B offer, looking for far more. Yahoo was prepared to pay up to $1.62B, but negotiations broke off before the offer could be made.

2007-01-18: Time for Plan B at Yahoo. Funnily self-referential
2007-02-25: +1

At Yahoo, the marketers rule, and at Google the engineers rule. And for that, Yahoo is finally paying the price.

2007-04-19: Nice 1 page summary of an endless list of problems.
2007-06-11: 80 VPs? That’s crazy, and 70 too many.

Yahoo disputes the notion that it is losing people at an unusual rate, saying that it had named 80 vice presidents worldwide this year

2007-07-25: Now there’s a surprise. I am puzzled why good people like Micah Dubinko can stand the nonsense over there.

My time at Yahoo! wasn’t super productive – I had a lot of ideas, but zero ability to get them implemented

2007-09-12: Oy. Talk about a preference for pain.

I’m not going to lie to you, it’s rough going right now. We get smacked around by the media. It’s been a while since we had a really big, notable win. I think morale at the company is low, the future uncertain and the food still sucks (although, I’ve had worse). But despite that, we had a record turnout for our last internal hack day. We had so many people with ideas that we had to completely change the format of the event because the campus could no longer scale to meet our demands. There is still plenty fight in this company and we have no shortage of asses to kick. So lace up all you Yahoo!’s…the ass won’t kick itself.

2007-10-03: 50? Try 500. No wonder they can’t get anything done.

People at Yahoo figure out the average number of employees per VP. The number seems to be around 50.

2007-10-06: Could be worth as much as $45 per share with a dramatic overhaul that would include outsourcing its paid search, cutting staff by 25% and restructuring its graphic display advertising.
2007-11-30: I especially liked the one about the 300 VPs.

2007-12-11: Traffic down

web search queries on Yahoo! were down 10% from November 2006

2008-02-01: Comarketing

Think about it… Yahoo doesn’t mean keyboards. They didn’t do plastics or ergonomic research or think of some insight about key travel distance. The message was that Yahoo was willing to put their name on anything.

2008-02-15: I think it’s related to education levels. Dumb people can’t tell if they are getting crap results.

Yahoo is strong in “struggling societies,” “blue collar backbone,” and “remote America,” whereas Google obtains higher use in “small town contentment,” “affluent suburbia,” and “upscale America.”

2008-04-21: Simple features take 2 years to launch indeed.
2008-07-01: Can I haz flickr / del.icio.us?

Microsoft is trying to put together a sort of take-over coalition where Microsoft would get Yahoo’s search while AOL or News Corp would acquire other parts of Yahoo. However, it doesn’t seem all that likely.

Bot classes

After a couple days of robots.txt love, I now have much less crap in my logs. A good opportunity to see which bots are well-written. Based on what I am seeing with /robots.txt, I am sure glad I blocked most of these festering piles of dung from my site.

not using conditional get while requesting /robots.txt

Only kinjabot, OnetSzukaj/5.0 and Seekbot/1.0 get this right. All other bots, including Google and Yahoo, do not. Lame.
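For reference, here is roughly what getting it right looks like. A minimal sketch, not any bot's actual code: the `CachedRobots` record and both helper names are made up for illustration, and the HTTP transport is left out so only the validator logic is shown. The bot remembers the `ETag` and `Last-Modified` values from its last fetch, sends them back as `If-None-Match` and `If-Modified-Since`, and keeps its cached copy when the server answers 304.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedRobots:
    """Hypothetical record of the last /robots.txt a bot fetched."""
    body: str
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(cached: Optional[CachedRobots]) -> Dict[str, str]:
    """Build the request headers that turn a plain GET into a conditional GET."""
    headers: Dict[str, str] = {}
    if cached is None:
        return headers  # nothing cached yet, so an unconditional GET is fine
    if cached.etag:
        headers["If-None-Match"] = cached.etag
    if cached.last_modified:
        headers["If-Modified-Since"] = cached.last_modified
    return headers


def apply_response(cached: Optional[CachedRobots], status: int,
                   headers: Dict[str, str], body: str) -> CachedRobots:
    """Keep the cached copy on 304 Not Modified, otherwise store the new one."""
    if status == 304 and cached is not None:
        return cached
    return CachedRobots(body=body,
                        etag=headers.get("ETag"),
                        last_modified=headers.get("Last-Modified"))
```

A 304 carries no body, so the server saves the bandwidth and the bot saves the parsing.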

requesting /robots.txt too often

The biggest offender is VoilaBot, checking /robots.txt every 5 minutes, every day. You gotta be kidding me. Google and Yahoo are not much better; you’d think they’d have figured out a way by now to communicate the state of /robots.txt across different crawlers. Other bots fare better by virtue of being less desperate.
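Sharing the state of /robots.txt across crawlers need not be complicated. A minimal sketch under stated assumptions: `RobotsCache` is a hypothetical class, the one-day default lifetime is my own pick rather than anything the bots above use, and `fetch` stands in for whatever function actually downloads the file. Every crawler asks the same cache, and the site is only hit again once the entry is older than the lifetime.

```python
import time
from typing import Callable, Dict, Tuple


class RobotsCache:
    """Hypothetical per-host cache that several crawlers could share."""

    def __init__(self, fetch: Callable[[str], str],
                 max_age_seconds: float = 24 * 3600,  # assumed lifetime
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, host: str) -> str:
        """Return robots.txt for host, refetching only when the entry is stale."""
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]  # still fresh, no request goes out
        body = self._fetch(host)
        self._entries[host] = (now, body)
        return body
```

With a one-day lifetime that is one request per host per day, against the 288 a day that a 5-minute polling interval works out to.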

Problems like this are economic opportunities.

Search as a force for good

peter starts with the well-known globe with the google queries superimposed. “google saves over 9000 person-years of effort every year. So Google saves 9000 lives per year.” 🙂 numerous mentions of people making a living from google ads. shows keyhole. “the computers from star trek are always omniscient but never helpful, they never tell you: don’t do this.” the spelling checker is not dictionary based but works on their huge accumulated data, like the 500 spellings of britney. the web is 1M times larger than the largest computational linguistics corpuses. 1B documents really makes the difference. humans achieve around 95% accuracy. work is being done on semantic understanding. one program they run is extracting categories and members of these categories from their corpus. “done very simply: you take the whole web, break it into sentences, and look for 6 patterns, such as including, as in Software companies, including..” they take an automated approach to machine translation by looking for pages on the web that have documents in 2 languages and derive the model from it. this yields a level of translation that is good enough. “doug cutting was more interested in the crawling and indexing side, so that’s where lucene looks good, not so much in the sorting.” google focuses on the 95%, the easy part, to get more leverage, but will have to go back to the hard part. they found that the feedback button didn’t work. some people were writing them that they were looking for a specific book by typing in “library”, which indicates a deeper problem. the first google logo was done during burning man 1999 to indicate that “no one is at the office, don’t blame us”.
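The category extraction is simple enough to sketch. This is an illustration of the one pattern quoted in the notes ("X, including A, B and C"), not Google's program: the regular expression, the function name and the restriction to capitalized single-word members are all my own assumptions, and the notes do not list the other five patterns.

```python
import re
from typing import List, Optional, Tuple

# Assumed shape: "<Category phrase>, including <Name>, <Name> and <Name>"
INCLUDING = re.compile(
    r"([A-Z][a-z]+(?: [a-z]+)*), including "
    r"([A-Z]\w+(?:(?:, | and |, and )[A-Z]\w+)*)"
)


def extract_category(sentence: str) -> Optional[Tuple[str, List[str]]]:
    """Return (category, members) if the sentence matches the pattern."""
    match = INCLUDING.search(sentence)
    if match is None:
        return None
    category = match.group(1)
    # Split the member list on the same separators the pattern allows.
    members = re.split(r", and | and |, ", match.group(2))
    return category, members
```

Run over enough sentences, even a pattern this crude accumulates a lot of category members, which is the point being made about corpus size.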

optimizing urls for google

certainly of interest to open source cms (some of which have horrible urls) is this article by Brice Dunwoodie

More specifically, Google will parse an underscore literally and will parse a dash as a “token” that represents white space. So if you construct a URL that contains “enterprise_content_management” in it, Google literally sees the word “enterprise_content_management”, which is really not a word at all.
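The effect is easy to reproduce with any tokenizer that treats the underscore as a word character. A sketch only, and certainly not Google's actual tokenizer: it uses Python's `\w` class, which happens to include the underscore, to show why the two URL styles come out differently. The function name is made up.

```python
import re
from typing import List


def url_tokens(slug: str) -> List[str]:
    """Split a URL slug into word tokens; "_" counts as a word character in \\w."""
    return re.findall(r"\w+", slug)


print(url_tokens("enterprise_content_management"))
# ['enterprise_content_management']
print(url_tokens("enterprise-content-management"))
# ['enterprise', 'content', 'management']
```

So a CMS that wants its URLs to match a query for "content management" should use dashes as separators.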