The Boulder BI Brain Trust


June 2009 Archives

Greenplum redefines DW in the Enterprise Data Cloud

Greenplum (GP) has launched an initiative on the Enterprise Data Cloud (EDC) that "virtualizes the analytic infrastructure over a pool of resources and giving users the power, through self-service, to instantiate their own database instances, without affecting other users." Presenting the strategy behind this initiative are Paul Salazar, VP of Corporate Marketing, and Ben Werther, Director of Product Management.

Paul summarizes their direction as pursuing EDC, wherever that might lead! Currently GP has 100 employees, $28M in funding, and 65 customers, such as NYSE, Fox Interactive Media, iCrossing, Bakrie Telecom, and Reliance Communications. Since Oracle now owns Sun and Sun is an investor in GP, the issue of Oracle's control of GP was raised. Ben's answer is that the investment is minor and comes without a board seat, so Oracle is not involved in confidential matters.

Ben gave an overview of scalable DBMSs. Greenplum's positioning is that it is unique in using commodity components for all of the hardware.

GP supports MapReduce, the programming model for scalable parallel data processing introduced by Google. A recent study by Stonebraker and DeWitt showed that MapReduce did not provide performance advantages over current SQL database engines. Ben confirmed this but added that MapReduce is a different paradigm that caters to analytic programmers. Performing similar functionality in SQL is difficult and time-consuming, even though equivalent performance is possible.
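To make the paradigm difference concrete, here is a minimal sketch of the MapReduce style in plain Python. It is not Greenplum's API and not anything Ben showed; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_mapreduce`) and the click data are made up for illustration, and everything runs in a single process rather than across an MPP cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit (key, value) pairs for one input record (hypothetical click log row)."""
    user, _page = record
    yield (user, 1)


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return (key, sum(values))


def run_mapreduce(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Illustrative data only: (user, page) click events.
clicks = [("ann", "/home"), ("bob", "/home"), ("ann", "/pricing")]
page_views = run_mapreduce(clicks)
print(page_views)
```

This toy count is just as easy in SQL with a `GROUP BY`. The appeal for analytic programmers is that the map and reduce functions can hold arbitrary procedural code, which is where a SQL equivalent tends to become awkward.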

Finally we focused on the new EDC initiative, which addresses two problems: handling the ever-exploding scale of data, and the limitations of one big data warehouse. The key to this EDC approach is self-service, which is a shift of power and control to business users. GP asserts that the EDW cannot move at the speed of the business and actually fosters fragmented data silos. EDC allows the quick and easy provisioning of new data marts/warehouses, all under IT control. It supports the physical consolidation of data and allows its logical unification to occur over time. GP is initially focusing on data mart consolidation and project sandboxes, both of which are commanding the attention of corporate IT.

My Take... GP is naive in positioning their EDC as a replacement for the traditional EDW, as a new view of corporate data. Ben countered that GP's goal was not to displace the EDW but instead to manage the other 90% of data that will not be incorporated into the EDW. This then surfaces issues about the nature of that 90% and the benefits/costs of its incorporation (or not) into the EDW. Further, what really is the business justification for "consistent views of business reality" as embodied in the traditional EDW, as opposed to a "single integrated view" or "various divergent views"?

GP is correct in its sense of urgency to extend the scale and usage of corporate data. However, they are currently addressing only a limited solution for advanced analytical users, who are a HUGE and important force but only in a small segment of progressive corporations. Claudia remarked, "Their current message on EDC was making IT the arms dealer and giving guns to untrained users."

Absent is a vision for the enterprise data architecture that builds upon EDW investment by allowing the unification of information across many sources of data and many degrees of validity. EDC should be positioned as an enhancement or extension to EDW, not its replacement. GP needs to provide unification facilities to balance those provisioning facilities.

EDC is the proper direction, given business challenges and technology advances. However, GP is prematurely launching this direction as a marketing initiative. GP needs a story for IT so that IT will "jump all over it," as Ben stated.

Toward the end of our discussion, Ben presented the technical stack for EDC, as shown in this figure. Note the middle layer for EDC Platform Services...which are the unification facilities needed. Unfortunately GP is not supporting all these components currently but has a roadmap to do so in the future. My feeling is that we are seeing the beginning of a long journey. EDW and Cloud Computing are desirable dancing partners; however, the dance is yet to be choreographed.

The bottom line is that Greenplum is positioning itself in an exciting new direction. With its heavy technical abilities and progressive leading-edge customers, Greenplum is definitely a player to follow.

Sand Technology -

Arthur Ritchie, President and CEO of Sand Technology, along with Linda Ahrens, VP Global Alliances and Marketing, are here in Boulder this morning. Sand is a software solution that is designed to keep ALL OF YOUR DATA in an instant, near-line, high-performance system that allows highly scalable analytics.

Arthur kicked things off by discussing the "Humpty Dumpty" enterprise data warehouse and challenging us with new ways to look at BI. Anyone who believes in a single version of the truth is a crazy dictator, according to Art. His point is that believing in the notion of a "Single Version of the Truth" is the core of what restricts us from learning and deriving value from business intelligence. Art believes that what you really have is a series of boundary conditions, and this is where the value can be found. The challenge is building a system that gives you a comprehensive view of, and access to, these boundary conditions. Along with preconceived ideas around the single version of the truth, analytics are generally throttled or disabled because systems can't support exploratory business intelligence: the IT team steps in to stop long queries or, at best, backlogs them. Sand sees these two areas as the niche where it can deliver great value to the enterprise: the ability to drill to the detail level on any and all data.

The Sand/DNA solution sits between the operational systems and the EDW, collecting atomic-level data before it's transformed and loaded into the EDW. The DNA product also denormalizes the atomic data so that a user can understand and see what is available. The Sand/DNA Access solution sits on the top side of the EDW, providing a platform to access the atomic-level information that normally can't be retrieved from an EDW. This supports data mining and predictive analytics. As I listened to Art, I started to think they are much like an analytic appliance, but the difference is that the appliances can't or don't crunch atomic-level data and can't support hundreds of users at one time.

The ability to store atomic-level information not only supports interesting BI and predictive analytic opportunities but also serves as a great source for the auditing and data governance initiatives that many companies are attempting to tackle. Additionally, not having to load detail/atomic data into the EDW will reduce the cost of the system and eliminate some of the strain power users are already putting on it.

It was a fun session. Thanks, Art and Linda, for joining us today!



This page is an archive of entries from June 2009 listed from newest to oldest.
