this looks actually pretty good!
Tag: search
Michigan digitization project
Google sped up the effort ~350x, to ~1.5M volumes/y. in ~5 years, the effort will be complete.
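the numbers in that note can be sanity-checked with a few lines of arithmetic. a rough sketch, taking the ~350x speedup, ~1.5M volumes/y and ~5 years at face value (the variable names are mine, and the results are only as good as those estimates):

```python
# Back-of-the-envelope check of the figures quoted in the note above.
SPEEDUP = 350                 # ~350x faster, per the note
NEW_RATE = 1_500_000          # ~1.5M volumes per year
YEARS_AT_NEW_RATE = 5         # ~5 years to finish

old_rate = NEW_RATE / SPEEDUP                 # implied pace before the speedup
total_volumes = NEW_RATE * YEARS_AT_NEW_RATE  # implied size of the job
years_at_old_rate = total_volumes / old_rate  # equivalent to SPEEDUP * YEARS_AT_NEW_RATE

print(f"implied old rate: ~{old_rate:,.0f} volumes/year")
print(f"implied total: ~{total_volumes:,} volumes")
print(f"time at old rate: ~{years_at_old_rate:,.0f} years")
```

so the estimates imply roughly 4,300 volumes/y before the speedup, about 7.5M volumes in total, and about 1,750 years at the old pace.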
Plagiarism Singularity
While some book authors and publishers have come to embrace Google’s book search, many are still fighting it. Paul Collins is predicting that Google’s book search will help turn up plenty of plagiarists, including some well known authors.
heh
Similar Image Search
isk-daemon is an open source database server capable of adding content-based (visual) image searching to any image-related website or software. This technology allows users of any image-related website or software to sketch the image they want to find on a widget and have the website reply with the most similar images, or simply request more similar photos from each image detail page.
similarity search for images.
Geocrawling
With the phenomenal growth of the WWW, rich data sources on many different subjects have become available online. Some of these sources store daily facts that often involve textual geographic descriptions. These descriptions can be perceived as indirectly georeferenced data – e.g., addresses, telephone numbers, zip codes and place names. Under this perspective, the Web becomes a large geospatial database, often providing up-to-date local or regional information. In this work we focus on using the Web as an important source of urban geographic information and propose to enhance urban Geographic Information Systems (GIS) using indirectly georeferenced data extracted from the Web. We describe an environment that allows the extraction of geospatial data from Web pages, converts them to XML format, and uploads the converted data into spatial databases for later use in urban GIS. The effectiveness of our approach is demonstrated by a real urban GIS application that uses street addresses as the basis for integrating data from different Web sources, combining these data with high-resolution imager
Web scraping taxonomy
In the last few years, several works in the literature have addressed the problem of data extraction from Web pages. The importance of this problem derives from the fact that, once extracted, the data can be handled in a way similar to instances of a traditional database. The approaches proposed in the literature to address the problem of Web data extraction use techniques borrowed from areas such as natural language processing, languages and grammars, machine learning, information retrieval, databases, and ontologies. As a consequence, they present very distinct features and capabilities which make a direct comparison difficult. In this paper, we propose a taxonomy for characterizing Web data extraction tools, briefly survey major Web data extraction tools described in the literature, and provide a qualitative analysis of them. Hopefully, this work will stimulate other studies aimed at a more comprehensive analysis of data extraction approaches and tools for Web data.
sitemaps.org
about time that the engines agree on a common sitemap format. now to take this forward with more TTL and other information, while making it hard to game.
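for reference, a sitemap in the shared format is just a small XML file. a minimal sketch that builds one with Python's standard library; the namespace and element names (urlset, url, loc, lastmod, changefreq, priority) follow the sitemaps.org 0.9 protocol as I understand it, while the example.com URLs, the date, and the helper name build_sitemap are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Return sitemap XML as a string for a list of dicts.

    Each dict needs 'loc'; 'lastmod', 'changefreq' and 'priority' are optional.
    (Hypothetical helper, not part of any library.)
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Emit child elements in a fixed order, skipping missing ones.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")

# Placeholder URLs and values, purely for illustration.
sitemap_xml = build_sitemap([
    {"loc": "http://example.com/", "lastmod": "2006-11-20",
     "changefreq": "daily", "priority": 0.8},
    {"loc": "http://example.com/about"},
])
print(sitemap_xml)
```

changefreq is the closest thing the format has to a TTL, and as far as I know crawlers treat it as a hint rather than a guarantee.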
Google Books Search Review
The most startling problem is the incorrect use of the Boolean OR operation, the simplest of all. It is taught in kindergarten that the search for A OR B cannot produce fewer results than the higher of the counts found for A or B. Still, the query aboulia produces 26 items, abulia yields 40, but aboulia OR abulia produces only 35. Nor can a search for A OR B produce more hits than the sum of the hits found for A and B. But this is what happens, as illustrated by this simple search for books with the word arrogance in the title: it finds 2 books. The search for books with the word arrogant in the title finds 6 documents. (Minutes earlier the software produced 8 hits, and such disappearances add an additional dimension to the confusion.) The search for books with arrogant OR arrogance in the title yields 13 books.
oy. it looks like google book search has trouble with simple boolean operators.
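the two rules the review leans on boil down to max(A, B) <= (A OR B) <= A + B. a small sketch that checks the quoted counts against that bound; the function name is mine and this is only an illustration, not anything Google or the reviewer published:

```python
def or_count_is_plausible(hits_a, hits_b, hits_a_or_b):
    """Check max(A, B) <= (A OR B) <= A + B for result counts."""
    return max(hits_a, hits_b) <= hits_a_or_b <= hits_a + hits_b

# Counts quoted in the review above.
aboulia_check = or_count_is_plausible(26, 40, 35)   # 35 is below max(26, 40)
arrogance_check = or_count_is_plausible(2, 6, 13)   # 13 is above 2 + 6

# A made-up count that would satisfy both bounds, for contrast.
consistent_check = or_count_is_plausible(26, 40, 50)

print(aboulia_check, arrogance_check, consistent_check)
```

both quoted examples fail the check: the first breaks the lower bound, the second the upper bound.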
gregorrothfuss
the wonders / dangers? of unified nicks
Social Local
Bars, clothing shops, coffee shops and gift stores in the area are focusing highly on establishing popular MySpace profiles and connecting with users in their local areas. The effort was so successful that he abandoned his traditional Web site in favor of MySpace.
social networks as the new IYP?