Search Engines are Hard

When you look at all these steps and all the complications, it is clear that this process is rife with things that can go wrong. The hardest part about writing a search engine is that you’re going to process billions of URLs and serve millions, if not billions, of queries. This does not leave a lot of room for error. One super-linear algorithm applied over the wrong-sized list of items and you are sunk. One lock inside another lock and you are sunk. There will be no code paths left unexplored. All of those error messages in your code that say “This will never happen” will happen. When you think that you are done, there is still the load balancing, the caching, the DNS servers, the ad service, the image servers, the update architecture, and (to take off on a familiar tune) a cartridge in a tape drive.
