Given desired values of either size or false-positive probability, computes the appropriate Bloom filter parameters.
Tag: algorithm
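The standard optimal-parameter formulas such a calculator applies can be sketched as follows (the function name and example values are my own, not from the linked tool):

```python
import math

def bloom_parameters(n, p):
    """Given expected item count n and desired false-positive
    probability p, return (m, k): bit-array size and hash count.
    Standard optima: m = -n ln p / (ln 2)^2, k = (m/n) ln 2."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round((m / n) * math.log(2)))
    return m, k

# e.g. 1 million items at a 1% false-positive rate:
m, k = bloom_parameters(1_000_000, 0.01)
# m is roughly 9.6 million bits (~1.2 MB), with k = 7 hash functions
```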
Brute Force Data
Applications include filling holes in images, finding and segmenting objects, recovering 3D scene geometry from an image, and inserting objects into new scenes.
Scene Completion
The algorithm patches up holes in images by finding similar image regions in the database that are not only seamless but also semantically valid. Our chief insight is that while the space of images is effectively infinite, the space of semantically differentiable scenes is actually not that large.

Wikipedia disambiguation
In addition to pages describing different entities where contextual clues can be extracted (example), Wikipedia contains redirects for different surface forms of the same entity, list pages that categorize names, and disambiguation pages that show many of the different entities for a surface form. Wikipedia contains much more than unstructured text. Exploiting the semi-structured data — the redirect, list, and disambiguation pages — gives this work its power.
wikipedia for entity extraction. awesome what crowdsourcing can do.
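As a rough sketch of how the semi-structured pages get used (the lookup tables and names below are hypothetical miniatures, not the paper's data — real systems build them from the full Wikipedia dumps):

```python
# Redirect pages map alternative surface forms to a single entity page;
# disambiguation pages list the many entities a surface form can mean.
redirects = {"JFK": "John F. Kennedy", "NYC": "New York City"}
disambiguation = {
    "Washington": ["George Washington", "Washington (state)",
                   "Washington, D.C."],
}

def candidate_entities(surface_form):
    """Resolve a surface form to candidate entity pages: a redirect
    yields one target, a disambiguation page yields many candidates
    to be scored against contextual clues."""
    if surface_form in redirects:
        return [redirects[surface_form]]
    return disambiguation.get(surface_form, [surface_form])

candidate_entities("JFK")         # one unambiguous target
candidate_entities("Washington")  # several candidates, context decides
```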
Evolutionary algorithms now surpass humans
Among mainstream engineers there is disbelief that a self-organising process like an EA can produce designs that outperform those produced by conventional top-down, intelligent design.
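The core loop behind such self-organising design is remarkably small. A minimal (1+1) evolutionary algorithm on a toy bit-string fitness (the function names and toy fitness are my own illustration, not from the article):

```python
import random

def evolve(fitness, genome_len=20, generations=1000, seed=0):
    """Minimal (1+1) evolutionary algorithm: flip one random bit,
    keep the child only if it is at least as fit as the parent."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(genome_len)]
    for _ in range(generations):
        child = parent[:]
        child[rng.randrange(genome_len)] ^= 1  # point mutation
        if fitness(child) >= fitness(parent):
            parent = child
    return parent

# Toy fitness: number of 1-bits ("OneMax"). Real design EAs run the
# same loop with domain-specific genomes and fitness functions
# (e.g. antenna geometry scored by simulation).
best = evolve(sum)
```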
Bible checksums
The scribes who were in charge of the Old Testament text dedicated their lives to preserving the text’s accuracy when they made copies. The great lengths the scribes went to in order to guarantee the reliability of the copies are illustrated by the fact that they would count every letter and every word, and record in the margins such things as the middle letter and word of the Torah. If a single error was found, the copy was immediately destroyed. As a software engineer, I can personally vouch that the scribes’ method of protecting the text is more rigorous than the common checksumming methods used today to protect software programs from corruption.
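In modern terms the scribes were computing a checksum over the text. A hedged sketch of their fingerprint (the sample strings are mine):

```python
def scribal_fingerprint(text):
    """Record what the scribes recorded: the letter count, the word
    count, and the middle letter and middle word of the text."""
    letters = [c for c in text if c.isalpha()]
    words = text.split()
    return {
        "letters": len(letters),
        "words": len(words),
        "middle_letter": letters[len(letters) // 2],
        "middle_word": words[len(words) // 2],
    }

original = "in the beginning was the word"
copy = "in the begining was the word"  # one dropped letter
# A single dropped letter changes the counts, so the copy fails:
assert scribal_fingerprint(original) != scribal_fingerprint(copy)
```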
Stochastic Encoding
The Minimum Description Length framework is powerful but often overlooked. I believe that one reason for this is that methods for attaining efficient encodings are subtle. In this paper, I discuss one of those techniques, stochastic encoding. When there are multiple nearly equally valuable choices of a parameter, it is better to choose stochastically, according to a probability distribution, rather than selecting the single best choice. Why? Because information can be transmitted in which parameter is chosen. This is exactly the “bits-back” argument.
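The choice-carries-information idea can be sketched concretely (a toy illustration of bits-back, with made-up parameter names — not the paper's construction):

```python
import math

# Hypothetical setup: four parameter settings are equally good, so the
# sender picks one using 2 bits of an unrelated side message. The
# receiver sees which parameter arrived and recovers those 2 bits,
# getting them "back": the net cost of specifying the choice is zero.
equally_good = ["theta_a", "theta_b", "theta_c", "theta_d"]
side_bits = "10"  # 2 bits we want to transmit for free

choice = equally_good[int(side_bits, 2)]          # sender encodes bits in the choice
recovered = format(equally_good.index(choice), "02b")  # receiver reads them back
assert recovered == side_bits

# In general, k interchangeable choices refund log2(k) bits:
refund = math.log2(len(equally_good))
```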
shows how text classifiers can be traced back to compression algorithms
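That connection is easy to demonstrate with an off-the-shelf compressor. A minimal sketch using Python's zlib (the corpora are made up for illustration): a document gets the label of the class whose corpus it compresses best against, a directly MDL-flavored criterion.

```python
import zlib

def compressed_size(text):
    return len(zlib.compress(text.encode("utf-8"), 9))

def classify(document, corpora):
    """Label a document with the class whose corpus yields the fewest
    extra compressed bytes when the document is appended to it."""
    def extra_bytes(corpus):
        return compressed_size(corpus + " " + document) - compressed_size(corpus)
    return min(corpora, key=lambda label: extra_bytes(corpora[label]))

# Tiny illustrative corpora:
corpora = {
    "sports": "goal match team score player league season win coach goal match",
    "finance": "stock market bond yield trader price share index fund stock market",
}
classify("the trader watched the stock price", corpora)  # likely "finance"
```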
Text Rasterization
But it was only a “body show”; the main message of this article is: no more horizontal pixel grid! Really! From now on the horizontal grid is 1/256 of a pixel. You can shift the text horizontally by any fractional value, while the visual appearance does not change a whit. This “little detail” means a lot. How about this:
- You can kern symbols with sub-pixel precision, not worrying about introducing extra blurriness.
- You can freely scale the text as you want, with 100% guarantee of preserving a stable text layout that always fits other graphic elements.
- You can always be sure that the calculated text width exactly corresponds with what you will see on screen and paper.
- You can apply fancy vector effects such as “faux bold” and “faux italic” being sure the text will not look any blurrier.
there is really no need to still have crappy fonts on all platforms in 2007, as this article superbly demonstrates.
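The claim that fractional shifts preserve layout can be illustrated with a toy coverage model (my own sketch, not the article's code): shifting redistributes anti-aliasing coverage between neighbouring pixels, so the total ink, and hence the measured text width, is unchanged at any offset.

```python
def shift_coverage(pixels, frac):
    """Shift a row of anti-aliasing coverage right by frac (0..1)
    pixels via linear redistribution between neighbours."""
    out = [0.0] * (len(pixels) + 1)
    for i, c in enumerate(pixels):
        out[i] += c * (1 - frac)
        out[i + 1] += c * frac
    return out

stem = [0.0, 1.0, 1.0, 0.0]               # a 2px-wide vertical stem
shifted = shift_coverage(stem, 37 / 256)  # move it 37/256 px right
assert abs(sum(shifted) - sum(stem)) < 1e-9  # same total ink
```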
Machine-Readable News
more algorithmic trading. it is quite irritating that in 2007 they still need to write articles explaining that “a computer can execute a trade much faster than a human.” oy
Irrelevant ACM
The ACM seems to be stuck in some pre-Web timewarp, where things like object orientation, components and CORBA are exciting new ground.
+1