discussed their directions in Data Warehouse and Analytics with Bill O'Connell, Distinguished Engineer and the CTO for Warehousing Solutions. This is an update from his BBBT visit on October 9, 2009.
Bill started with a broad overview of IBM's efforts in DW and Analytics, which can be overwhelming with all the technologies, companies, products, and programs involved, along with periodic name changes. We had our usual discussion about defining 'analytics' that surfaced the usual issues, without much resolution. Bill showed the evolution of DW, as shown to the right.
The IBM Smart Analytics System (ISAS) is a packaging of analytics software, data warehouse, and hardware platform. Bill discussed the MPP architecture behind ISAS, as shown on the left. ISAS versions can be powered by System x, Power System, or System z. An interesting point was that there is so much computation power that the bottleneck is I/O speed, even with random-access disk!
Quickly went through MPP architectures, hardware design, query optimization, partitioning, compression, scan performance, workload management, database monitoring, multi-temperature data, Oracle function support, ISAS 7700 hardware configuration, solid-state disk advantages, and then...
IBM InfoSphere Warehouse powered by DB2 was described in detail, as shown at the right. The Design Studio has many different modules, such as Data Architect, SQL Warehouse Tool, Multi-Dimensional Cubing Services, and Data Mining. Bill highlighted the Design Studio's support for data preparation for a Data Mining analysis, which is used to define inputs, transformations, SQL code, and data flows. He also showed an example of integrating forecasting by SPSS and wrapped up with analysis of unstructured data.
Bill concluded with IBM's approach to handling Big Data for data-intensive analytics. IBM took open-source Hadoop and built JAQL. This resulted in IBM's InfoSphere BigInsights. The focus is analytics on enormous volumes of data at rest (much more than can be handled by today's databases). In contrast, InfoSphere Streams does analytics on data in motion.
Today was an intensive overview of IBM's work in data warehousing and business analytics. There is so much, and it is continuously changing and evolving.