Tag: mapreduce

Let’s build a modern Hadoop

This is interesting as a snapshot of what’s generally possible now.

If you’ve been around the big data block, you’ve probably felt the pain of Hadoop, but we all still use it because we tell ourselves, “that’s just the way infrastructure software is.” However, in the past 10 years, infrastructure tools ranging from NoSQL databases, to distributed deployment, to cloud computing have all advanced by orders of magnitude. Why have large-scale data analytics tools lagged behind? What makes projects like Redis, Docker and CoreOS feel modern and awesome while Hadoop feels ancient?

MapReduce commentary

It is exciting to see a much larger community engaged in the design and implementation of scalable query processing techniques. We, however, assert that they should not overlook the lessons of more than 40 years of database technology — in particular the many advantages that a data model, physical and logical data independence, and a declarative query language, such as SQL, bring to the design, implementation, and maintenance of application programs. Moreover, computer science communities tend to be insular and do not read the literature of other communities. We would encourage the wider community to examine the parallel DBMS literature of the last 25 years. Last, before MapReduce can measure up to modern DBMSs, there is a large collection of unmet features and required tools that must be added.

We fully understand that database systems are not without their problems. The database community recognizes that database systems are too “hard” to use and is working to solve this problem. The database community can also learn something valuable from the excellent fault-tolerance that MapReduce provides its applications. Finally we note that some database researchers are beginning to explore using the MapReduce framework as the basis for building scalable database systems.

Wherein Mr. Stonebraker disses MapReduce.