Tag: algorithm

Scene Completion

The algorithm patches up holes in images by finding similar image regions in the database that are not only seamless but also semantically valid. Our chief insight is that while the space of images is effectively infinite, the space of semantically differentiable scenes is actually not that large.
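The matching step can be sketched roughly as follows, assuming a tiny downsampled thumbnail and L2 distance as a crude stand-in for the paper's gist-style scene descriptor; the function names and the naive paste step are illustrative, not the authors' actual pipeline:

```python
import numpy as np

def tiny_descriptor(img, size=16):
    # Nearest-neighbor downsample to a size x size thumbnail
    # (a crude stand-in for a real scene descriptor).
    h, w = img.shape
    ys = (np.arange(size) * h) // size
    xs = (np.arange(size) * w) // size
    return img[np.ix_(ys, xs)].astype(float)

def rank_scenes(query, hole_mask, database, size=16):
    """Rank database images by similarity to the query,
    comparing only the region outside the hole."""
    qd = tiny_descriptor(query, size)
    weight = tiny_descriptor(1.0 - hole_mask, size)  # ~1 outside the hole
    scores = [np.sum(weight * (qd - tiny_descriptor(img, size)) ** 2)
              for img in database]
    return np.argsort(scores)

def complete(query, hole_mask, donor):
    """Naively paste donor pixels into the hole
    (the real system blends the seam instead)."""
    out = query.astype(float).copy()
    hole = hole_mask.astype(bool)
    out[hole] = donor[hole]
    return out
```

With a large enough database, the top-ranked scenes tend to be semantically compatible with the query, which is the paper's main point.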

Wikipedia disambiguation

In addition to pages describing different entities where contextual clues can be extracted (example), Wikipedia contains redirects for different surface forms of the same entity, list pages that categorize names, and disambiguation pages that show many of the different entities for a surface form. Wikipedia contains much more than unstructured text. Exploiting the semi-structured data — the redirect, list, and disambiguation pages — gives this work its power.

wikipedia for entity extraction. awesome what crowdsourcing can do.
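The semi-structured pages described above can be folded into a surface-form dictionary along these lines; the miniature page triples here are invented for illustration, and a real pipeline would parse them out of the Wikipedia XML dump:

```python
from collections import defaultdict

# Hypothetical miniature dump: (page_title, page_type, payload) triples.
PAGES = [
    ("JFK", "redirect", "John F. Kennedy"),
    ("Jack Kennedy", "redirect", "John F. Kennedy"),
    ("JFK (disambiguation)", "disambig",
     ["John F. Kennedy", "John F. Kennedy International Airport",
      "JFK (film)"]),
]

def build_surface_forms(pages):
    """Map each surface form to the set of entities it can refer to,
    using redirect and disambiguation pages."""
    forms = defaultdict(set)
    for title, kind, payload in pages:
        if kind == "redirect":
            forms[title].add(payload)
        elif kind == "disambig":
            base = title.replace(" (disambiguation)", "")
            forms[base].update(payload)
    return forms
```

The same dictionary, run in reverse, gives you every known alias for an entity, which is what makes the redirect and disambiguation pages so useful for entity extraction.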

Bible checksums

The scribes who were in charge of the Old Testament text dedicated their lives to preserving the text’s accuracy when they made copies. The great lengths the scribes went to in order to guarantee the reliability of the copies are illustrated by the fact that they would count every letter and every word, and record in the margins such things as the middle letter and word of the Torah. If a single error was found, the copy was immediately destroyed. As a software engineer, I can personally vouch that the scribes’ method of protecting the text is more rigorous than the common checksumming methods used today to protect software programs from corruption.
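The scribal figures described above amount to a checksum, and they can be sketched in a few lines; the function names and the verification rule are my own framing of the passage, not a claim about the historical procedure:

```python
def scribal_checksum(text):
    """Record the figures the scribes tracked: total letters, total
    words, and the middle letter and middle word of the text."""
    words = text.split()
    letters = [c for c in text if c.isalpha()]
    return {
        "letters": len(letters),
        "words": len(words),
        "middle_letter": letters[len(letters) // 2],
        "middle_word": words[len(words) // 2],
    }

def verify_copy(original, copy):
    """Per the passage, a copy is destroyed if any recorded figure
    disagrees with the original's."""
    return scribal_checksum(original) == scribal_checksum(copy)
```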

Stochastic Encoding

The Minimum Description Length framework is powerful but often overlooked. I believe one reason for this is that the methods for attaining efficient encodings are subtle. In this paper, I discuss one of those techniques, stochastic encoding. When there are multiple nearly equally valuable choices of a parameter, it is better to choose stochastically, according to a probability distribution, than to select the single best choice. Why? Because information can be transmitted in which parameter is chosen. This is exactly the “bits-back” argument.
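A small numeric sketch of the bits-back accounting, under a simplified setup of my own: each parameter choice has a known cost for encoding the data, naming choice i costs -log2 p_i, and the entropy of the choice distribution is refunded because the decoder can recover which choice was made:

```python
import math

def effective_length(data_bits, choice_probs):
    """Expected description length when the parameter is chosen
    stochastically. data_bits[i] is the cost of encoding the data
    under choice i; naming choice i costs -log2(p_i); bits-back
    refunds the entropy H(p) of the choice distribution."""
    naive = sum(p * (b - math.log2(p))
                for p, b in zip(choice_probs, data_bits) if p > 0)
    entropy = -sum(p * math.log2(p) for p in choice_probs if p > 0)
    return naive - entropy

# Two equally good parameter settings, each encoding the data in 100 bits.
deterministic = 100 + 1  # pick one and spend 1 bit saying which
stochastic = effective_length([100, 100], [0.5, 0.5])  # the 1 bit comes back
```

With the uniform stochastic choice the expected length is 100 bits, one bit cheaper than deterministically committing to a choice and transmitting its identity.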

shows how text classifiers can be traced back to compression algorithms

Text Rasterization

But it was only a “body show”; the main message of this article is: No more horizontal pixel grid! Really! From now on the horizontal grid is 1/256 of a pixel! You can shift the text horizontally by any fractional value, while the visual appearance does not change a whit! This “little detail” means a lot. How about this:

  • You can kern symbols with sub-pixel precision, not worrying about introducing extra blurriness.
  • You can freely scale the text as you want, with 100% guarantee of preserving a stable text layout that always fits other graphic elements.
  • You can always be sure that the calculated text width exactly corresponds with what you will see on screen and paper.
  • You can apply fancy vector effects such as “faux bold” and “faux italic”, confident that the text will not look any blurrier.
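The layout consequence of the 1/256-pixel grid can be demonstrated with a toy pen-advance loop; the glyph widths here are made up, not real font metrics:

```python
SUBPIXELS = 256  # horizontal grid of 1/256 pixel, per the article

def layout_subpixel(advances_px):
    """Accumulate pen advances in fixed-point 1/256-pixel units instead
    of rounding each pen position to a whole pixel. Returns the glyph
    origins and the exact resulting text width, both in pixels."""
    pen = 0  # pen position in 1/256-pixel units
    positions = []
    for adv in advances_px:
        positions.append(pen / SUBPIXELS)
        pen += round(adv * SUBPIXELS)
    return positions, pen / SUBPIXELS

def layout_pixel_grid(advances_px):
    """The old way: snap every pen position to a whole pixel, so the
    rounding error accumulates glyph by glyph."""
    pen = 0.0
    for adv in advances_px:
        pen = round(pen + adv)
    return pen
```

Ten glyphs of width 6.4 px come out 63.98 px wide on the subpixel grid (off by well under a pixel, and stable under any horizontal shift), versus 60 px on the whole-pixel grid, which is why the calculated width can finally be trusted to match the screen.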

there is really no need to still have crappy fonts on all platforms in 2007, as this article superbly demonstrates.