Tag: database

Using Buttcoin as a database

people are “abusing” bitcoin to embed their own data into a distributed store:

if you are willing to waste money — albeit very small fractions like 0.00000001 bitcoins — by sending that money to invalid bitcoin addresses, you essentially have created a channel for random data transmission. The bitcoin blockchain is in one sense a massively replicated ~7GB database that stores data for all eternity.
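a rough sketch of how such a channel could work, assuming the payload is packed into the 20-byte hash slot of a legacy Base58Check address (version byte 0x00). the helper names (`encode_message`, `decode_message`) are made up for illustration, and actually building and broadcasting the transactions is left out entirely:

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def _checksum(raw):
    # Base58Check checksum: first 4 bytes of a double SHA-256.
    return hashlib.sha256(hashlib.sha256(raw).digest()).digest()[:4]


def base58check_encode(version, payload):
    data = version + payload + _checksum(version + payload)
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    # Each leading zero byte is written as the character '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def base58check_decode(addr):
    n = 0
    for ch in addr:
        n = n * 58 + ALPHABET.index(ch)
    pad = len(addr) - len(addr.lstrip("1"))
    data = b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")
    raw, checksum = data[:-4], data[-4:]
    if _checksum(raw) != checksum:
        raise ValueError("bad checksum")
    return raw[1:]  # drop the version byte, keep the 20-byte payload


def encode_message(message):
    # Hypothetical helper: split into 20-byte chunks, zero-pad the last one.
    chunks = [message[i:i + 20] for i in range(0, len(message), 20)]
    return [base58check_encode(b"\x00", c.ljust(20, b"\x00")) for c in chunks]


def decode_message(addresses):
    # Assumption: the message itself does not end in zero bytes,
    # since trailing zeros are treated as padding and stripped.
    return b"".join(base58check_decode(a) for a in addresses).rstrip(b"\x00")
```

each address carries 20 bytes, so a message of n bytes costs roughly n/20 tiny payments. nobody holds a key for these addresses, which is why the money sent there is wasted.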

Stream Processing

The 2nd new market to consider is stream processing. On Wall Street everyone is doing electronic trading. A feed comes out of the wall and you run it through a workflow to normalize the symbols, clean up the data, discard the outliers, and then compute some sort of secret sauce. An example of the secret sauce would be to compute the momentum of Oracle over the last 5 ticks and compare it with the momentum of IBM over the same time period. Depending on the size of the difference, you want to arbitrage in one direction or the other. This is a fire hose of data. Volumes are going through the roof. It’s business analytics of the same sort we see in databases. You need to compute them over time windows, however, in small numbers of milliseconds. So, again, a specialized architecture can just clobber the relational elephants in this market. I also believe the same statement can be made, believe it or not, about OLTP (online transaction processing). I’m working on a specialized engine for business data processing that I think will be about 30x faster than the elephants on the TPC-C benchmark.
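to make the "secret sauce" example concrete, here is a toy sketch of a sliding-window momentum comparison. everything specific in it is an assumption: momentum is taken to be the newest price minus the oldest price in the last 5 ticks, the symbols `ORCL` and `IBM` stand in for the two feeds, and the `threshold` and the meaning of the returned signal are hypothetical:

```python
from collections import deque


class Momentum:
    """Price change across the last `window` ticks of one symbol."""

    def __init__(self, window=5):
        self.prices = deque(maxlen=window)

    def update(self, price):
        self.prices.append(price)
        if len(self.prices) < self.prices.maxlen:
            return None  # window not full yet
        return self.prices[-1] - self.prices[0]


def spread_signals(feed, window=5, threshold=0.5):
    """Yield (stronger_symbol, momentum_difference) whenever the two
    momenta diverge by at least `threshold`.

    `feed` is an iterable of (symbol, price) ticks, already normalized.
    """
    trackers = {"ORCL": Momentum(window), "IBM": Momentum(window)}
    latest = {}
    for symbol, price in feed:
        if symbol not in trackers:
            continue  # ignore symbols outside the pair
        latest[symbol] = trackers[symbol].update(price)
        a, b = latest.get("ORCL"), latest.get("IBM")
        if a is None or b is None:
            continue  # need a full window on both sides
        diff = a - b
        if abs(diff) >= threshold:
            yield ("ORCL" if diff > 0 else "IBM", diff)
```

the state per symbol is a fixed-size window and each tick is handled once as it arrives, which is the shape of computation the quote argues a specialized engine handles better than a general-purpose relational one.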

MapReduce commentary

It is exciting to see a much larger community engaged in the design and implementation of scalable query processing techniques. We, however, assert that they should not overlook the lessons of more than 40 years of database technology — in particular the many advantages that a data model, physical and logical data independence, and a declarative query language, such as SQL, bring to the design, implementation, and maintenance of application programs. Moreover, computer science communities tend to be insular and do not read the literature of other communities. We would encourage the wider community to examine the parallel DBMS literature of the last 25 years. Last, before MapReduce can measure up to modern DBMSs, there is a large collection of unmet features and required tools that must be added.

We fully understand that database systems are not without their problems. The database community recognizes that database systems are too “hard” to use and is working to solve this problem. The database community can also learn something valuable from the excellent fault-tolerance that MapReduce provides its applications. Finally we note that some database researchers are beginning to explore using the MapReduce framework as the basis for building scalable database systems.

wherein mr. stonebraker disses mapreduce

Real database

We finally decided to go with a commercial database over the objections of a number of engineers, including myself. To ease the transition it was decided to convert AdWords over to the new system first, and to do the main ads system later. It was a project on a par with the internationalization effort in terms of the tedious work required to comb over nearly all of the AdWords code and change all of the database queries. (Databases are supposed to all be compatible with one another, but in reality they pretty much aren’t.)

To make a long story short, it was an unmitigated disaster.

adwords runs on mysql (for the next time someone brings up the old “not for enterprise use”)

Bridging the X-O-R impedance mismatch?

looks like a worthwhile effort on the road to eventually unifying the dominant 3 data models of today, XML, objects and relational. at a first glance, it seems to sidestep some of the really hard problems by defining a zoo of new types to accommodate these 3 worlds. the eventual solution will have to do better than this, but it’s a start. groovy seems to have similar features for the java world.

RAIDb

just learned about RAIDb

RAIDb stands for Redundant Array of Inexpensive Databases. This acronym has been used in reference to the RAID (Redundant Array of Inexpensive Disks) concept that achieves scalability and high availability of disk subsystems at a low cost. RAIDb aims at providing better performance and fault tolerance than a single database by combining multiple inexpensive database instances into an array of databases.
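a minimal sketch of the idea, assuming the simplest arrangement (full mirroring: every write goes to all backends, reads rotate across them). this is not C-JDBC's API; the class `MirroredArray` is hypothetical and in-memory SQLite databases stand in for the inexpensive database instances:

```python
import itertools
import sqlite3


class MirroredArray:
    """Toy array of databases: writes are mirrored, reads are balanced."""

    def __init__(self, n=3):
        # Each in-memory SQLite connection is an independent database.
        self.backends = [sqlite3.connect(":memory:") for _ in range(n)]
        self._next = itertools.cycle(range(n))

    def write(self, sql, params=()):
        # Apply the statement to every replica so they stay identical.
        for db in self.backends:
            db.execute(sql, params)
            db.commit()

    def read(self, sql, params=()):
        # Round-robin: each read is served by the next replica in turn.
        db = self.backends[next(self._next)]
        return db.execute(sql, params).fetchall()


array = MirroredArray(3)
array.write("CREATE TABLE t (k TEXT, v INTEGER)")
array.write("INSERT INTO t VALUES (?, ?)", ("a", 1))
rows = [array.read("SELECT k, v FROM t") for _ in range(3)]
```

reads scale with the number of replicas and any single replica can be lost without losing data, at the price of every write being paid n times. a real middleware also has to deal with failed writes and resynchronizing replicas, which this sketch ignores.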

C-JDBC is a database cluster middleware that implements RAIDb, LGPL licensed.

Java odbms

ozone is a fully featured, object-oriented database management system completely implemented in Java and distributed under an open source license. The ozone project aims to evolve a database system that allows developers to build pure object-oriented, pure Java database applications. Just program your Java objects and let them run in a transactional database environment.