The Boulder BI Brain Trust

 

Recently in Vendor Briefing Category

Today, Friday the 13th of May, 2011, the Boulder BI Brain Trust heard from Larry Hill [find @lkhill1 on Twitter] and Rohit Amarnath [find @ramarnat on Twitter] of Full360 [find @full360 on Twitter] about the company's elasticBI™ offering.

Serving up business intelligence in the Cloud has gone through the general hype cycles of all other software applications, from early application service providers (ASPs), through the software as a service (SaaS) pitches, to the current Cloud hype, including infrastructure and platform as a service (IaaS and PaaS). All the early efforts have failed. To my mind, there have been three reasons for these failures.

  1. Security concerns on the part of customers
  2. Logistics difficulties in bringing large amounts of data into the cloud
  3. Operational problems in scaling single-tenant instances of the BI stack to large numbers of customers

Full360, a 15-year-old systems integrator and consultancy with a clientele ranging from startups to the top ten global financial institutions, has come up with a compelling Cloud BI story in elasticBI™, using a combination of open source and proprietary software to build a full BI stack from ETL [Talend Open Studio, as available through Jaspersoft] to the data mart/warehouse [Vertica] to BI reporting, dashboards and data mining [Jaspersoft partnered with Revolution Analytics], all available through Amazon Web Services (AWS). Full360 is building upon its success as Jaspersoft's primary cloud partner and its involvement in the RightScale Cloud Management stack, a 2010 winner of the SIIA CODiE award, with essentially the same stack as elasticBI™.

Full360 has an excellent price point for medium-size businesses, or departments within larger organizations. Initial deployment, covering set-up, engineering time and the first month's subscription, comes to less than a proof of concept might cost for a single piece of the stack. The entry-level monthly subscription, extended out for one year, is far less than annual subscription or licensing costs for similar software once hardware depreciation and the personnel needed to maintain the system are factored in. And since the monthly fee includes operations management and a small amount of consulting time, this is a great deal for medium-size businesses.

The stack being offered is full-featured. Jaspersoft has, arguably, the best open source reporting tool available. Talend Open Studio is a very competitive data integration tool, with options for master data management, data quality and even an enterprise service bus for complete data integration from internal and external data sources and web services. Vertica is a very robust and high-performance column-store Analytic Database Management System (ADBMS) with "big data" capabilities that was recently purchased by HP.

All of this is wonderful, but none of it is really new, nor a differentiator from the failed BI services of the past or from the on-going competition today. Where Full360 may win, however, is in how it answers the three challenges that caused the failure of those past efforts.

Security

Full360's elasticBI™ handles the security question by relying on AWS security. More importantly, the company clearly recognizes the concern: one of today's presentation sections, "Hurdles for Cloud BI", called out cloud security, data security and application security, all three of which are handled by AWS standard security practices. Whether or not this is sufficient, especially in the eyes of customers, remains uncertain.

Operations

Operations and maintenance is one area where Full360 is taking great advantage of the evolution of current Cloud services best practices and "devops", using Opscode Chef recipes to handle deployment, maintenance, ELT and upgrades. However, whether this level of automation will be sufficient to counter the lack of a multi-tenant architecture remains to be seen. There are those who argue that true Cloud (or even the older SaaS) differentiation, and the ability to scale profitably at these price points, depend on multi-tenancy, which keeps all customers on the same version of the stack. The heart of providing multi-tenancy is in the database, and this is the point where most SaaS vendors, other than Salesforce.com (SFDC), fail. Jaspersoft, however, does claim support for a multi-tenant architecture. It may be that Full360 will be able to maintain the balance between security/privacy and scalability with its use of devops, without creating a new multi-tenant architecture. Also, the point of Cloud services isn't the cloud at all; the fact that the hardware, software, platform, or what-have-you sits in a remote or distributed data center isn't the point. The point is elastic self-provisioning: the ability of customers to add resources on their own and be charged accordingly.
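To make the multi-tenancy point concrete, here is a minimal, purely illustrative SQL sketch of the common shared-schema approach; the table and tenant key are hypothetical and are not Full360's or Jaspersoft's actual design. Every fact row carries a tenant identifier, and every query must filter on it:

    -- Illustrative only: one shared schema, tenants separated by a key column.
    CREATE TABLE sales_fact (
        tenant_id   INT           NOT NULL,  -- which customer owns this row
        sale_date   DATE          NOT NULL,
        product_id  INT           NOT NULL,
        amount      DECIMAL(12,2) NOT NULL
    );

    -- A per-tenant view (or an equivalent filter applied by the BI layer)
    -- keeps customers from ever seeing each other's rows.
    CREATE VIEW sales_fact_tenant_42 AS
        SELECT sale_date, product_id, amount
        FROM sales_fact
        WHERE tenant_id = 42;

The trade-off described above falls out directly: one schema and one stack version for everyone, but the database, and every query against it, now carries the burden of keeping tenants isolated.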

Data Volume

The entry-level data volume for elasticBI™ is the size of a departmental data mart today. But even today, successfully loading that much data into the Cloud in a nightly ETL run simply isn't feasible. Full360 is leveraging Aspera's technology for high-speed data transfer, and AWS does support a form of good ol' fashioned "sneaker net", allowing customers to mail in hard drives. In addition, current customers with larger data volumes are drawing that data from the cloud, with the source either already in AWS or coming from SFDC. This is a problem that will continue to be an "arms race" into the future, with data volumes, source location and bandwidth in a three-way pile-up.

In conclusion, Full360 has developed an excellent BI service to supplement its professional services offerings. Larger organizations are still wary of allowing their data out of their control, or may be afraid of the target that web services present to hackers, as exemplified by the recent break-ins at the bank and retailer email spammers (er, marketers) and at Sony. Smaller companies, which might find the price attractive enough to offset security concerns, haven't seen the need for BI. So, the question remains whether or not the market is interested in BI in the Cloud.

Calpont InfiniDB: The Good, The Bad and The Ugly


Last week the BBBT was visited by startup database company Calpont, although 'startup' hardly fits the bill considering that the company was founded in 2000. I won't go into the company or the BBBT session details; you can find the latter here. What I do want to blog about is the product itself. InfiniDB comes in both a Community Edition (CE) and an Enterprise Edition (EE), where the latter adds MPP capabilities, monitoring tools and different support options. For more details, visit the comparison page. Since I don't have an MPP cluster available, I took the CE for a spin and thought I'd share the results.

The Good

As you might have noticed on the edition comparison page, the core server features are the same for both CE and EE. That's good news compared to direct competitor Infobright, which has stripped DML (insert/update/delete) capabilities out of its Community Edition. Some people would call this 'crippleware', as the only way to update a data warehouse table is to drop, recreate and reload it. 1-0 for InfiniDB already.

More on the good side: the installation process. The product installs in minutes and is only a 12.7 MB download. I downloaded the 64-bit RPM version, which requires two steps: extract the zip file and run the command rpm -i Calpont*.rpm as root. This installs the software in the default location /usr/local/Calpont. Then invoke the script /usr/local/Calpont/bin/install-infinidb.sh, configure the InfiniDB aliases with . /usr/local/Calpont/bin/calpontAlias, and you're good to go. In my case: almost good to go, since the data directories are now under /usr/local/Calpont, which is not my 12-disk SSD RAID set. Simply mounting the data device to /usr/local/Calpont/data1 solves that problem too. Alternatively, you can of course move the data1 directory to a different location and create a symlink in the original spot. The command service infinidb start fires up the database engine, and invoking idbmysql gives you command-line access to the database. Remember, it's all MySQL, so if you're familiar with that, working with InfiniDB is a breeze.

The MySQL compatibility is another piece of the goodies: all front-end tools, including MySQL Workbench (query browser, admin console), can be used with InfiniDB, and the same JDBC drivers you already have for MySQL work as well. The only difference when creating a new table is that you have to specify InfiniDB as the engine, but that's about it.
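As a minimal sketch of what that looks like (the table and columns below are borrowed from the TPC-H customer table, purely for illustration), it is ordinary MySQL DDL with the engine spelled out:

    -- Ordinary MySQL DDL; only the ENGINE clause marks the table as InfiniDB.
    CREATE TABLE customer (
        c_custkey  INT,
        c_name     VARCHAR(25),
        c_acctbal  DECIMAL(12,2)
    ) ENGINE=InfiniDB;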

The last item I'd like to mention here is the bulk loader. The TPC-H benchmark consists of 22 queries, and the database contains 8 tables. The data files can be created using the dbgen utility, which generates pipe-delimited text files with a .tbl extension. The default settings InfiniDB uses for bulk loading text files are also the pipe delimiter and a .tbl extension, what a convenience! Other than that, the files have to be named exactly after the tables you want to load (so customer.tbl is data for the table customer) and placed in the Calpont data import directory. Invoking the command colxml will create a bulk loader import job based on these files and the metadata from the tables in the database. To start the import, simply run the cpimport command and InfiniDB starts loading. This may sound more complex than it actually is, so trust me, it couldn't be simpler. Or faster, for that matter: loading the 100GB data set took only about 25 minutes, a new record on my machine!

The Bad

There's actually only one 'bad' thing about the current version of InfiniDB: it's not finished yet. Yes, it works, it's fast (more about that later), but there are still a couple of serious limitations. The most notable of these, at least when running a TPC-H benchmark, is the limited support for subqueries. Version 1.1.0 alpha didn't support any form of subquery, so even a select * from table where column in (select othercolumn from othertable) couldn't be run. Version 1.1.1 alpha, released on April 23, solved this last one, but more complex subquery constructs or correlated subqueries are not yet supported. The upgrade from 1.1.0 to 1.1.1 enabled InfiniDB to complete 10 of the 22 TPC-H queries, instead of the only 5 it could run a week ago. But, as I've said, this should only be a temporary problem. The roadmap shows that in a month or so, phase two of the subquery support should be available in the next alpha release, with GA (General Availability) for version 1.1 set for early July. By then we can have a look at the complete run and see how it behaves, also when multiple query streams are running in parallel.
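For readers wondering what distinguishes a correlated subquery, here is a hypothetical example (using TPC-H's orders columns, not one of the benchmark queries verbatim) of the construct the 1.1.1 alpha would still reject: the inner SELECT references a column from the outer query.

    -- Correlated subquery: the inner SELECT refers to o.o_custkey from the outer query.
    SELECT o.o_orderkey, o.o_totalprice
    FROM orders o
    WHERE o.o_totalprice > (SELECT AVG(o2.o_totalprice)
                            FROM orders o2
                            WHERE o2.o_custkey = o.o_custkey);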

The Ugly

Calpont touts 'no indexes needed' as one of the key benefits of the product; I tend to disagree on that one. It's nice that you don't need to explicitly specify indexes, but when a DBMS doesn't support any constraints AT ALL, well, that's plain ugly. Want to enforce a NOT NULL constraint? Bad luck. Primary/foreign key relationships? Ditto. You could argue that these features are not really mandatory in a data warehouse, but without constraint and index support, all the constraint enforcement must be built into the ETL process. Another issue that will hopefully be solved in the final release is the insert/update speed. Bulk loading is indeed fast, but only 40 rows per second using a regular insert or update won't cut it in the real world.
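To illustrate what pushing enforcement into the ETL process ends up looking like, here is a hypothetical pre-load validation sketch (the staging tables are made up for the example, not taken from the briefing). The checks the database refuses to enforce have to be run as explicit queries before the load:

    -- NOT NULL check: any staged rows missing the business key?
    SELECT COUNT(*) FROM stage_customer WHERE c_custkey IS NULL;

    -- Foreign-key check: staged orders that reference a customer we don't have.
    SELECT COUNT(*)
    FROM stage_orders o
    LEFT JOIN customer c ON c.c_custkey = o.o_custkey
    WHERE c.c_custkey IS NULL;

Anything these queries flag has to be rejected or repaired by the ETL job, because InfiniDB itself will happily load it.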

The $64,000 question...

There is actually only one reason why anyone would want to use a column store like InfiniDB in the first place: performance! So the main question is: does it deliver? Yes, it does. Compared to MySQL the performance improvement is no less than spectacular. In fact, to date nobody has been brave (or patient) enough to try an SF100 TPC-H on MySQL, so a direct comparison isn't even available. There are, however, plenty of other comparisons that can be made. The 10 queries that do run already all outperform Greenplum Single Node Edition (except for query 11), for instance. Some queries are somewhat faster (Q10, Q12, Q18), some are 3-4 times faster (Q1, Q3, Q4, Q14, Q16), and query 6 is more than 20 times faster. For a disk-based analytical database (InfiniDB doesn't seem to take as much advantage of memory as other products I evaluated), it's really, really fast. Query 1 is always a good indicator since it forces a full table scan on the largest fact table (600 million rows in this case). If a database can do this in under a minute on my moderate hardware, you do have a potential winner.
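For reference, this is the shape of that query (TPC-H Q1 as defined by the benchmark specification; the 90-day offset is one of the standard substitution values), which is what forces the full scan of lineitem:

    SELECT l_returnflag, l_linestatus,
           SUM(l_quantity)                                       AS sum_qty,
           SUM(l_extendedprice)                                  AS sum_base_price,
           SUM(l_extendedprice * (1 - l_discount))               AS sum_disc_price,
           SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
           AVG(l_quantity)      AS avg_qty,
           AVG(l_extendedprice) AS avg_price,
           AVG(l_discount)      AS avg_disc,
           COUNT(*)             AS count_order
    FROM lineitem
    WHERE l_shipdate <= DATE '1998-12-01' - INTERVAL 90 DAY
    GROUP BY l_returnflag, l_linestatus
    ORDER BY l_returnflag, l_linestatus;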

Conclusion

My initial thoughts about InfiniDB when I first tested it weren't very positive, to say the least. But given that they are moving in the right direction and have kept their promised delivery dates so far, combined with the ease of installation, ease of use (it's all MySQL) and of course the already great performance, a second look is certainly warranted. Given the limitations of the direct competitors (Kickfire with its proprietary hardware, Infobright with its crippled Community Edition and lack of MPP/scale-out capabilities), InfiniDB should be at the top of your shortlist when looking for a MySQL-based data warehouse solution. When it's finished, of course.


Jos van Dongen, Tholis Consulting

Welcome to the first of what we hope will be many BBBT blogs resulting from the collaboration of BBBT members. Several members were on the April 15, 2010 briefing held by Microsoft for SQL Server 2008 R2 (SS08R2).

We came away deeply concerned:

Microsoft has a scattered message, confused pricing, and serious challenges in product integration.

The following major points sum up our assessment.

   

 
