Tag: search

Live Search Mobile

this actually looks pretty good!

Michigan digitization project

Google sped up the effort ~350x, to ~1.5M volumes/year. at that rate, the effort will be complete in ~5 years.

Plagiarism Singularity

While some book authors and publishers have come to embrace Google’s book search, many are still fighting it. Paul Collins is predicting that Google’s book search will help turn up plenty of plagiarists, including some well known authors.

heh

Similar Image Search

isk-daemon is an open source database server capable of adding content-based (visual) image searching to any image related website or software. This technology allows users of any image-related website or software to sketch on a widget which image they want to find and have the website reply to them the most similar images or simply request for more similar photos at each image detail page.

similarity search for images.

Web scraping taxonomy

In the last few years, several works in the literature have addressed the problem of data extraction from Web pages. The importance of this problem derives from the fact that, once extracted, the data can be handled in a way similar to instances of a traditional database. The approaches proposed in the literature to address the problem of Web data extraction use techniques borrowed from areas such as natural language processing, languages and grammars, machine learning, information retrieval, databases, and ontologies. As a consequence, they present very distinct features and capabilities which make a direct comparison difficult to be done. In this paper, we propose a taxonomy for characterizing Web data extraction tools, briefly survey major Web data extraction tools described in the literature, and provide a qualitative analysis of them. Hopefully, this work will stimulate other studies aimed at a more comprehensive analysis of data extraction approaches and tools for Web data.

Geocrawling

With the phenomenal growth of the WWW, rich data sources on many different subjects have become available online. Some of these sources store daily facts that often involve textual geographic descriptions. These descriptions can be perceived as indirectly georeferenced data – e.g., addresses, telephone numbers, zip codes and place names. Under this perspective, the Web becomes a large geospatial database, often providing up-to-date local or regional information. In this work we focus on using the Web as an important source of urban geographic information and propose to enhance urban Geographic Information Systems (GIS) using indirectly georeferenced data extracted from the Web. We describe an environment that allows the extraction of geospatial data from Web pages, converts them to XML format, and uploads the converted data into spatial databases for later use in urban GIS. The effectiveness of our approach is demonstrated by a real urban GIS application that uses street addresses as the basis for integrating data from different Web sources, combining these data with high-resolution imagery.

sitemaps.org

about time that the engines agree on a common sitemap format. now to take this forward with more TTL and other information, while making it hard to game.
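
for reference, a rough sketch of what a sitemap in the shared format looks like, built with python's standard library. the element names (urlset, url, loc, lastmod, changefreq, priority) and the namespace come from the sitemaps.org protocol; the build_sitemap helper and the example.com URL are made up for illustration. changefreq is only a hint to crawlers, which is about as close to a TTL as the format gets today.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a sitemap XML string from a list of dicts (hypothetical helper).

    Each dict needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional hints and are written only when present.
    """
    # Emit the sitemap namespace as the default namespace (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                child = ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}")
                child.text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


# Example values are placeholders, not a real site.
print(build_sitemap([
    {"loc": "http://example.com/", "lastmod": "2006-11-16", "changefreq": "daily"},
]))
```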

Google Books Search Review

The most startling problem is the incorrect use of the Boolean OR operation, the simplest of all. It is taught in kindergarten that the search for A OR B cannot produce less results than the higher found for A or B. Still, the query aboulia produces 26 items, abulia yields 40, but aboulia OR abulia produces only 35. Neither can a search for A OR B produce more hits than the sum of the hits found for A and B together at most. But this is what happens as illustrated by this simple search: for books with the word arrogance in the title. It finds 2 books. The search for books with the word arrogant in the title finds 6 documents. (Minutes earlier the software produced 8 hits, and such disappearances add an additional dimension to the confusion). The search for books with arrogant OR arrogance in the title yields 13 books.

oy. it looks like google book search has trouble with simple boolean operators.
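
the rule the review appeals to is just set union: the hit count for A OR B has to be at least max(A, B) and at most A + B. a tiny sketch that checks the counts quoted above against those bounds (the function name is mine, not anything from google or the review):

```python
def or_count_is_plausible(count_a: int, count_b: int, count_a_or_b: int) -> bool:
    """Check a reported `A OR B` hit count against the bounds of a set union.

    The union of two result sets can never be smaller than the larger set,
    and never larger than both sets combined.
    """
    lower = max(count_a, count_b)
    upper = count_a + count_b
    return lower <= count_a_or_b <= upper


# Counts as reported in the review excerpt.
print(or_count_is_plausible(26, 40, 35))  # aboulia / abulia: False, below the lower bound
print(or_count_is_plausible(2, 6, 13))    # arrogance / arrogant in title: False, above the upper bound
```

even with the earlier count of 8 hits for arrogant, the upper bound would be 10, so 13 is still impossible.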

gregorrothfuss

the wonders / dangers? of unified nicks

Social Local

Bars, clothing shops, coffee shops and gift stores in the area are focusing highly on establishing popular MySpace profiles and connecting with users in their local areas. The effort was so successful that he abandoned his traditional Web site in favor of MySpace.

social networks as the new IYP?