Aster Data Systems presented its background and future plans, with
Steve Wooledge, Director of Marketing, and
Shawn Kung, Sr. Director of Product Management, as speakers. The company was founded in 2005 by three Stanford doctoral colleagues and was
in stealth mode until May 2008. The engineering team is strong, with 26
people, 13 of whom are at the Ph.D. level. Clients include Akamai,
MySpace, Share-This and a few more.
Aster Data Systems focuses on a "software-only relational DBMS for
frontline data warehousing", striving for "always parallel" processing and "always on" operations. They argued that their product,
Aster nCluster 3.0, allows smooth incremental scaling to avoid the cost of excess capacity.
Steve presented an overview of one of their largest clients,
MySpace,
which has 118M users who generate 7B events (2-3 TB) per day, loaded in
high-frequency batches (15 minutes per hour). It takes several
thousand servers to support the data flow into the
frontline data warehouse, which consists of 100 nodes with 400 TB of capacity.
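To get a feel for those MySpace numbers, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above; the decimal definition of a terabyte and the idea of dividing raw capacity by daily volume (ignoring replication, compression and indexes) are my own simplifying assumptions, not anything Aster stated.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the talk.
# Assumption (mine): 1 TB = 10**12 bytes; replication and compression ignored.

EVENTS_PER_DAY = 7_000_000_000   # 7B events per day
DAILY_TB_RANGE = (2, 3)          # 2-3 TB per day
NODES = 100
CAPACITY_TB = 400
BYTES_PER_TB = 10**12


def bytes_per_event(daily_tb):
    """Average size of one event for a given daily volume in TB."""
    return daily_tb * BYTES_PER_TB / EVENTS_PER_DAY


def days_until_full(daily_tb):
    """Days of raw data that fit in the cluster at a given daily volume."""
    return CAPACITY_TB / daily_tb


capacity_per_node_tb = CAPACITY_TB / NODES

for tb in DAILY_TB_RANGE:
    print(f"{tb} TB/day: ~{bytes_per_event(tb):.0f} bytes/event, "
          f"~{days_until_full(tb):.0f} days of raw capacity")
print(f"{capacity_per_node_tb:.0f} TB per node")
```

So each event averages a few hundred bytes, each node holds about 4 TB, and the cluster would hold on the order of a few months of raw data under these assumptions.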
They finally got around to defining the
Frontline Data Warehouse
(FDW). Wow! What a discussion... Aster is essentially arguing for forking
the application data inflow, close to the customer-touch applications,
as shown below.
As
Claudia noted... Is FDW just a BIG operational data store? In addition,
there are several intermixed issues. First, what is the
scope of the subject areas in the FDW? How does it overlap with the
EDW? Second, isn't the FDW duplicating the data quality/cleansing
processing? And third, the FDW is meant to support rapid feedback to the
applications. This last issue seems to be the business justification
for this hybrid approach.
The unique feature of Aster is their
in-database
implementation of MapReduce, which is a parallel data flow approach to
DBMS. This is a very interesting topic, since it uncovers a paradigm
shift beyond SQL. MapReduce allows application programmers to push
their code closer to the data, roughly like a SQL User-Defined
Function, but written in widely used languages like Java, Python
and Perl. Thus, very sophisticated analytics can operate directly on
the data. A question that bothered me was the intellectual property
rights surrounding MapReduce. Can anyone comment on this?
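For colleagues who have not seen the MapReduce pattern, here is a minimal single-process sketch in Python. It is not Aster's API or syntax (I did not see their function interface in detail); the names `map_event`, `reduce_counts` and `run_mapreduce` and the sample event rows are all hypothetical. It only illustrates the idea: a map step emits key/value pairs per row, the pairs are grouped by key, and a reduce step aggregates each group. In a parallel DBMS the grouping and the function calls would be spread across the nodes holding the data.

```python
from collections import defaultdict


def map_event(row):
    """Map step: emit one (key, value) pair per input row."""
    user_id, event_type = row
    yield (event_type, 1)


def reduce_counts(key, values):
    """Reduce step: combine all values that share a key."""
    return key, sum(values)


def run_mapreduce(rows, mapper, reducer):
    """Single-process stand-in for the engine: map, group by key, reduce."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in mapper(row):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


# Hypothetical click-stream rows: (user_id, event_type)
events = [
    ("u1", "login"),
    ("u2", "click"),
    ("u1", "click"),
    ("u3", "login"),
    ("u2", "click"),
]

result = run_mapreduce(events, map_event, reduce_counts)
print(result)
```

The point of the pattern is that the programmer writes only the two small functions; the engine decides where they run.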
I highly recommend that my DW colleagues read up on
MapReduce and, especially, the
Claremont Report on Database Research.
I suggested a marketing slogan: "Aster picks up where SQL lets you down". If Aster uses it, they owe me a nickel per usage.
Oh, finally... Watch for an announcement from Aster in a few weeks...