Today, Friday the 13th of May, 2011, the Boulder BI Brain Trust heard from Larry Hill [find @lkhill1 on Twitter] and Rohit Amarnath [find @ramarnat on Twitter] of Full360 [find @full360 on Twitter] about the company's elasticBI™ offering.
Serving up business intelligence in the Cloud has gone through the general hype cycles of all other software applications, from early application service providers (ASP), through the software as a service (SaaS) pitches to the current Cloud hype, including infrastructure and platform as a service (IaaS and PaaS). All the early efforts have failed. To my mind, there have been three reasons for these failures.
Security concerns on the part of customers
Logistics difficulties in bringing large amounts of data into the cloud
Operational problems in scaling single-tenant instances of the BI stack to a large number of customers
Full360, a 15-year-old system integrator & consultancy, with a clientele ranging from startups to the top ten global financial institutions, has come up with a compelling Cloud BI story in elasticBI™, using a combination of open source and proprietary software to build a full BI stack from ETL [Talend OpenStudio as available through Jaspersoft] to the data mart/warehouse [Vertica] to BI reporting, dashboards and data mining [Jaspersoft partnered with Revolution Analytics], all available through Amazon Web Services (AWS). Full360 is building upon their success as Jaspersoft's primary cloud partner, and their involvement in the Rightscale Cloud Management stack, which was a 2010 winner of the SIIA CODiE award, with essentially the same stack as elasticBI.
Full360 has an excellent price point for medium-size businesses, or departments within larger organizations. Initial deployment, covering set-up, engineering time and the first month's subscription, comes to less than a proof of concept might cost for a single piece of their stack. The entry-level monthly subscription, extended out for one year, is far less than an annual subscription or licensing cost for similar software once hardware depreciation and the personnel cost of maintaining the system are factored in. Given that the monthly fee also includes operations management and a small amount of consulting time, this is a great deal for medium-size businesses.
The stack being offered is full-featured. Jaspersoft has, arguably, the best open source reporting tool available. Talend Open Studio is a very competitive data integration tool, with options for master data management, data quality and even an enterprise service bus for complete data integration from internal and external data sources and web services. Vertica is a very robust and high-performance column-store Analytic Database Management System (ADBMS) with "big data" capabilities that was recently purchased by HP.
All of this is wonderful, but none of it is really new, nor a differentiator from the failed BI services of the past, nor from the ongoing competition today. Where Full360 may win, however, is in how they answer the three challenges that caused the failure of those past efforts.
Security
Full360's elasticBI™ handles the security question with the answer that they're using AWS security. More importantly, they recognize the security concerns: one of their presentation sections today, "Hurdles for Cloud BI," named cloud security, data security and application security, all three of which are handled by AWS standard security practices. Whether or not this is sufficient, especially in the eyes of customers, is uncertain.
Operations
Operations and maintenance is one area where Full360 is taking great advantage of the evolution of current Cloud services best known methods and "devops", using Opscode Chef recipes to handle deployment, maintenance, ELT and upgrades. However, whether or not this level of automation will be sufficient to counter the lack of a multi-tenant architecture remains to be seen. There are those who argue that the true Cloud, or even the older SaaS, differentiator, and the ability to scale profitably at these price points, depend on multi-tenancy, which keeps all customers on the same version of the stack. The heart of providing multi-tenancy is in the database, and this is the point where most SaaS vendors, other than Salesforce.com (SFDC), fail. However, Jaspersoft does claim support for a multi-tenant architecture. It may be that Full360 will be able to maintain the balance between security/privacy and scalability with their use of devops, and without creating a new multi-tenant architecture. Also, the point of Cloud services isn't the cloud at all. That is, the fact that the hardware, software, platform, what-have-you is in a remote or distributed data center isn't the point. The point is elastic self-provisioning: the ability of customers to add resources on their own and be charged accordingly.
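To make the devops idea concrete, here is a minimal sketch in Python (Full360's actual recipes are written for Opscode Chef, in Ruby; the step and component names below are illustrative assumptions, not their recipes) of how a single-tenant stack can be stood up by replaying one codified recipe per customer:

```python
# Minimal "infrastructure as code" sketch: each tenant's BI stack is
# deployed by replaying the same ordered recipe, so every single-tenant
# instance stays consistent without a multi-tenant architecture.
# All step and package names are hypothetical.

RECIPE = [
    ("provision_instance", {"type": "m1.large", "region": "us-east-1"}),
    ("install_package",    {"name": "vertica"}),
    ("install_package",    {"name": "jasperserver"}),
    ("configure_etl",      {"tool": "talend", "schedule": "nightly"}),
    ("run_healthcheck",    {}),
]

def apply_step(tenant, step, params):
    # A real recipe step would call out to AWS or the OS; here we log.
    print(f"[{tenant}] {step} {params}")

def deploy(tenant):
    """Replay the full recipe for one tenant, on deploy or on upgrade."""
    for step, params in RECIPE:
        apply_step(tenant, step, params)

for tenant in ["acme-corp", "globex"]:
    deploy(tenant)
```

Because every tenant is stood up and upgraded by the same recipe, version drift across single-tenant instances is contained, which is exactly the bet being made against multi-tenancy.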
Data Volume
The entry-level data volume for elasticBI™ is the size of a departmental data mart today. But even today, successfully loading that much data into the Cloud in a nightly ETL run simply isn't feasible. Full360 is leveraging Aspera's technology for high-speed data transfer, and AWS does support a form of good ol' fashioned "sneaker net", allowing customers to mail in hard drives. In addition, current customers with larger data volumes are drawing that data from the cloud, with the source being in AWS already, or from SFDC. This is a problem that will continue to be an "arms race" into the future, with data volumes, source location and bandwidth in a three-way pile-up.
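Back-of-the-envelope arithmetic shows why nightly loads hit a wall; the data size and link speed below are illustrative assumptions:

```python
# Back-of-the-envelope transfer time for a nightly load.
# The 500 GB load and 100 Mbps sustained uplink are made-up figures.
data_gb = 500                               # nightly load, in gigabytes
link_mbps = 100                             # sustained uplink, megabits/second
seconds = data_gb * 8 * 1000 / link_mbps    # 1 GB = 8,000 megabits (decimal)
print(f"{seconds / 3600:.1f} hours")        # ~11.1 hours -- the night is gone
```

Even a fraction of a departmental mart saturates a fast corporate uplink for most of the night, which is why Aspera-style transfer acceleration and mailed hard drives matter.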
In conclusion, Full360 has developed an excellent BI service to supplement their professional services offerings. Larger organizations are still wary of allowing their data out of their control, or may be afraid of the target that web services provide for hackers, as exemplified by the recent break-ins at Sony and at the email marketing (er, spamming) providers used by banks and retailers. Smaller companies, which might find the price attractive enough to offset security concerns, haven't seen the need for BI. So, the question remains as to whether or not the market is interested in BI in the Cloud.
Last week I attended Strata, a conference organized by O'Reilly and devoted to big data. It was a large conference (790 attendees) whose content included both technical talks and tutorials about the new generation of big data tools, e.g., Hadoop, Cassandra, visualization, as well as presentations on big data business applications. The diversity and size of the audience and the reported business successes provided a strong indication of how important and popular the area of big data has become.
Big data is pervasive in many of the companies Trident has funded over the last few years. We have invested in companies that generate and/or process big data, e.g., eXelate, Extole, HomeAway, Sojern, Turn, Xata, as well as companies that provide platforms for storing, managing and analyzing big data, e.g., Acteea, Host Analytics, Pivotlink. We recognize that many of the companies we invest in going forward will need to have competence in big data.
There is a big difference between big data and data warehousing, stemming primarily from the nature of the data. Data warehousing was all about analyzing transactional data captured from enterprise applications such as an ERP or POS system. In addition to the actual transactions, big data is about capturing, storing, managing and analyzing data about the behavior around transactions, i.e., what happens before and after a transaction. This has several implications. First, it means that the captured data is less structured. It is easier to analyze a collection of purchasing transactions to try to identify a pattern than to analyze a series of selections made across a set of web pages to establish a pattern of behavior. Second, it implies that meaning must be extracted from events, e.g., the browsing activity prior to buying an item. To be effective in this more open-ended, exploratory data analysis one has to break through the data silos that are typically found in enterprises and bring all available data to bear. It also means that one must collect all available data rather than trying to decide a priori which data to collect and keep.
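To make the event-analysis point concrete, here is a minimal sketch that reconstructs what a visitor browsed before buying; the event-log format is an illustrative assumption:

```python
# Minimal sketch: extract the browsing path that preceded each purchase
# from a raw event stream. The log format below is a made-up example.
from collections import defaultdict

events = [
    ("user1", "view",     "/home"),
    ("user1", "view",     "/product/42"),
    ("user1", "view",     "/reviews/42"),
    ("user1", "purchase", "/product/42"),
    ("user2", "view",     "/home"),
]

paths = defaultdict(list)   # user -> pages seen since their last purchase
for user, action, page in events:
    if action == "purchase":
        # The pre-purchase path is the behavioral "meaning" extracted
        # from otherwise unstructured click events.
        print(user, "bought", page, "after browsing", paths[user])
        paths[user].clear()
    else:
        paths[user].append(page)
```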
Data science is becoming a field. Big data is eliminating the segregation between the people who manage the data, the people who analyze the data, and the people who present/visualize the data. A good data scientist must be able to do all three, though, as I wrote last week, translating business requirements to a data problem, and the resulting insights to business actions and value, remain largely missing skills in data scientists. Good data scientists are in high demand, as indicated by the jobs being advertised at the conference and as reported at the conference by LinkedIn. They are expected to play a significant role in how their companies evolve. That's not something we were used to hearing about data analysts, who were always considered fixtures of the back office. I know because I started my career in data analysis.
Corporations have a lot to learn about big data from consumer-oriented companies that generate, manage and analyze big data, e.g., Amazon, eBay, Facebook, Twitter, and LinkedIn to name a few. This is a reversal of sorts. In the mid-90s, when I was with IBM, I was running an organization devoted to building data warehouses and providing analytical tools and services to Global 1000 companies. At that time various companies, including many of the then nascent Internet companies, were trying to learn from the data warehousing and business intelligence practices of Walmart, Citibank, and First Data. Today such companies would do well to understand and apply the big data techniques being developed by many internet and social media companies. One big difference is how such companies approach data stores. Traditional businesses see the enterprise data warehouse as storing the "single version of truth" about the data. Big data stores are viewed as containing multiple perspectives. Their contents must be analyzed with the right set of tools in order to gain a perspective on the problem at hand.
Talking to the conference’s attendees I got the impression that more companies than ever before are starting to view data as an invaluable asset and a potential key to their success. They are no longer intimidated by data volumes and are using the new generation of big data management and analysis tools to bring more data under their control.
Strata was a great conference that brought under one roof the leaders in big data thinking, and doing. It also showed that, though increasingly important, this is still a small community and in many respects its overall size has not changed since the time I was one of the analysts. We all need to find ways to accelerate the education and introduction to market of new data scientists. The ability of many companies to continuously innovate, become leaders, and remain in this position could largely depend on their ability to recruit data scientists who can effectively exploit their big data assets.
Last week my partners and I hosted a meeting of our IT Advisory Board. This board consists of senior IT executives from Global 2000 companies, including CIOs and CTOs. I will write about the topics discussed in this meeting in a few days, once I have had the opportunity to clean up the several pages of notes I took. Today I wanted to relate one of the conversations I had with a couple of the board's members during one of the meeting's breaks. We started talking about the effective utilization of business analytics by companies. Both executives commented that their companies are increasing their utilization of analytics to understand their consumer customers. In fact, one of them stated that during the last 3 years their analytics group grew from 4 analysts to 20. Moreover, they stated that senior business management in their companies is more sensitized to the importance of analyzing corporate data to gain any type of competitive advantage. When I asked them about the analytic tools and solutions their companies currently employ, and how these have been changing over time to deal with the increasing volumes of data, they both stopped me and said that the biggest obstacle to the broader utilization of business analytics was not technology but the proper and effective application of the information they extract from the analyses they perform.
Both of these companies have always been early adopters of analytic and associated data management technologies. Both executives indicated that they feel particularly good about the caliber of their corporate data analyst groups. However, today, by their own admission, both companies lack the people who can provide a "two-way translation," i.e., first to properly translate a business problem into a data analysis problem (that can subsequently be tackled by the quants), and second to formulate (or re-translate) the analysis results in a way that business executives will understand, appreciate, and be able to act on. Companies that provide business intelligence (BI), analytics and even data warehousing solutions talk about how they target "business analysts." The business analyst has become a mythical position in corporations. It is a generic description for an individual who works for a business unit, as opposed to an IT organization, and uses such analytic solutions in the course of business. However, our advisory board members said that the business analysts in their companies use such solutions to address well-understood problems and activities, not novel situations that may call for a data-driven analytic approach. These business analysts may use a query and reporting solution like Qliktech's or a multidimensional analysis solution such as Microstrategy's, but they are not able to provide the two-way "translation" I described above. In their opinion, the right translator/analyst must have the business understanding and experience to grasp complex business problems in their entirety; the articulateness to describe them appropriately to data analysts; enough data knowledge to broadly identify the types of data the quants will need to provide insights and information; the ability to take these insights and relate them to the original problem, providing actionable solutions; and finally the executive gravitas to present these solutions to the business executive(s) who will act on them.
There exist independent consultants who play this two-way analytics translator role very effectively but their extensive and continuous use by corporations is not feasible, mostly for financial reasons; they are too expensive and in too high demand. In discussions I’ve had with some members of IBM’s 7000-strong analytics and optimization consulting unit I heard that those of their consultants who can provide such services are in the highest demand. In fact IBM can’t find enough of them to hire and deploy with corporate clients around the world.
So while we celebrate the development of new analytic tools or solutions that can deal with even larger and more complex quantities of data that must be processed faster than ever before, we must not lose sight of the fact that we must address our inability to identify enough “translators” who can help us analyze the right problems and effectively use the insights we discover.
In 1996, while I was running IBM's BI solutions organization, one of my groups developed the Surfaid web analytics solution. Surfaid, one of the first such solutions, was later acquired by Coremetrics (which was in turn recently acquired by IBM and made part of its marketing automation solution). Later on Omniture dominated the high-end web analytics market by figuring out the right ingredients of a web analytics solution (quick set up, effective data collection and management, informative reports for the emarketer) and the right model for delivering it (SaaS). Google also entered the market and dominated the low-end with a free offering. The evolution of this market over the past 10 years has taught us that web analytics will remain a relatively small component of the overall analytics and BI market.
Ecommerce marketers and merchandisers were some of the earliest adopters of web analytics. However, the continued growth of ecommerce, combined with the increasing complexity of the decisions these business users must make, is causing retailers (both pureplay and multi-channel) to look for more sophisticated analytic solutions than the web analytics vendors or the ecommerce platform vendors, e.g., Demandware and ATG (recently acquired by Oracle), can offer. While web analytics solutions can be used to determine which landing pages encourage customers to make a purchase, or which pay per click ad campaigns are most effective, ecommerce marketers and merchandisers now want to understand customer loyalty, the impact of their customer retention strategies (discounts, coupons, extra services), and the customer segments with the greatest Lifetime Value (LTV). Today's web analytics solutions cannot address these needs.
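As a rough illustration of the LTV metric these merchandisers want to track, here is a minimal sketch using a common simple formula; all segment figures are made up:

```python
# Simple per-segment customer Lifetime Value sketch. A common simple
# formula: LTV = average order value x orders per year x margin x
# expected years of retention. The segment numbers are made up.
segments = {
    # segment: (avg_order_value, orders_per_year, margin, years_retained)
    "coupon-driven": (40.0, 6, 0.20, 1.5),
    "loyal-repeat":  (65.0, 9, 0.30, 4.0),
}

for name, (aov, freq, margin, years) in segments.items():
    ltv = aov * freq * margin * years
    print(f"{name}: LTV = ${ltv:,.2f}")
# coupon-driven: $72.00 vs loyal-repeat: $702.00 -- the segments call
# for very different retention spending.
```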
During the summer of 2009 Infopia, one of my portfolio companies and an ecommerce platform company at the time, asked the larger of its 300 etailers about their analytics needs. The company found out that while all of its customers were using a web analytics solution, none were able to address their particular ecommerce decision needs through these solutions. The market demand for a sophisticated ecommerce analytic application was so strong that the company’s board decided to direct significant resources towards the development of such an application. A year later the company sold its ecommerce platform business to Versata, changed its name to Acteea, did a pivot and is now a SaaS ecommerce analytic application company. Acteea’s SaaS analytic application enables merchandisers and marketers to track customer LTV and define winning product and customer segment strategies.
Acteea’s analytic application integrates into a cloud-based data mart: pricing data, marketing campaign effectiveness data, adword data (words bid on and words purchased), customer order data, inventory data, web site activity data (what is typically fed to web analytics solutions), customer activity data from other channels, e.g., catalog, promotions data, and competitor pricing data. Data integration and cleaning has become a very complex process compared to the data integration that web analytics solutions must address. Extole, another of my portfolio companies, has developed a SaaS platform for social marketing that is quickly being adopted by etailers. The data produced by the platform’s applications (3 to date) will undoubtedly become another source for Acteea’s analytics, as it can help etailers with decisions around social commerce.
In a recent board meeting we reviewed some of the early successes Acteea’s customers are having through the use of the company’s analytic application. For example, one of the companies analyzed keyword, adword, web analytics, pricing and product catalog data, refined its adword bidding approach, and identified “driver” products that drew existing target customers into making add-on purchases. Another customer began measuring the total return on marketing investment and cart value by customer segment, which led to revamped customer segmentation based on channel loyalty and cross-channel behavior, and to improved quarter-over-quarter sales. Finally, a third customer analyzed product sales and gross margin return on inventory, quickly identified the lowest-performing products, and eliminated them from the appropriate channels, including the web site.
It is still too early to tell whether Acteea’s pivot will be successful, though the initial results are encouraging. Regardless, the company’s work is proving the market’s need for complex ecommerce analytic solutions that are distinct from the existing web analytic toolkits that have been available.
Over the past couple of years I have met with several startups that offer analytic solutions for mobile data. I have not invested in any of them. I had felt that the data captured from feature phones and early generations of smartphones was not rich enough to lead to interesting and distinct analytics. For example, while data captured from a mobile web browser, such as sites visited, pageviews, and time spent browsing, could be analyzed, we didn’t need a new company to do that. Omniture could do that just fine. However, the new smartphones capture more interesting data. These data sets could drive the creation of new and interesting analytics. As a result, I am becoming interested in mobile data analytics and have been actively looking for investment opportunities in this sector.
The new smartphones are becoming sensor platforms as well as computing platforms. In addition to the photo and video camera, touch screen, GPS and accelerometer, new types of sensors are being connected to smartphones. For example, Bling Nation has introduced a sensor that adheres to a smartphone and is linked to the user’s PayPal account. Our own portfolio company Zeo has announced that it will connect its sensor to the iPhone in order to capture sleep-related data. Some of the data sets generated by all these sensors that I find interesting include:
The time-series of GPS and accelerometer data for each subscriber. By analyzing these time-series one can predict where and when the subscriber will be next and offer relevant services at the predicted location, e.g., parking availability with offers from parking garages (a minimal sketch of such a prediction follows this list).
Data generated from the use of augmented reality (AR) applications, which can create new advertising opportunities, as well as opportunities to serve up relevant content the user had not thought to ask for.
Configuration data on the complete software stack running on each phone (from firmware to operating system to application software). This data can then be used, for example, by an app store to recommend newly available applications that will augment the user’s productivity. Such configuration databases today exist only in corporate IT settings.
Mobile payments data combined with geolocation data. Analyzing this data can lead to predictions about customer brand or product loyalty.
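For the first of these data sets, here is a minimal sketch of what such a prediction could look like, using a first-order Markov model over discretized locations; the location history is an illustrative assumption:

```python
# Minimal sketch: predict a subscriber's next location from a GPS
# time-series using a first-order Markov model over discretized cells.
# The location history below is a made-up illustration.
from collections import Counter, defaultdict

history = ["home", "cafe", "office", "cafe", "office", "gym",
           "home", "cafe", "office"]

transitions = defaultdict(Counter)
for here, there in zip(history, history[1:]):
    transitions[here][there] += 1

def predict_next(location):
    """Most frequent observed successor of the current cell."""
    followers = transitions[location]
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("cafe"))   # -> 'office': time to push a parking offer
```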
Entertainment-related applications, e.g., gaming, and health care applications, e.g., prescription dispensing, will also benefit from the analysis of this type of data. I am not certain whether new data management systems will be necessary for such data sets, though I imagine that the data will be big and complex, particularly as various time-series are captured, and will be stored in the cloud.
The wireless carriers may not be in the best position to collect this data, not only because of their lack of experience with diverse data types, but also because they are regulated businesses. Google and Apple are in a much better position because they already collect much of this data through their Android and iOS platforms respectively. While these companies may also be best able to mine the data, they won’t enter this business in the near term. Instead it will be startups that first experiment with creating interesting data sets out of the collected data and analyzing them. My assumption, which also drives my interest in the sector, is that companies like Google will wait to see how these “experiments” go and then proceed to acquire the more interesting of the analytics startups.
Users will need to give their permission for this rich data to be collected and combined. Vendors, including wireless carriers, will get the users’ permission by offering free services (something for which consumers have shown interest and affinity), better experience (optimized bandwidth, improved application performance, more accurate recommendations around applications, products, services, social connections, etc), and more accurate targeting of ads in ad-supported services.
The mobile space remains highly fragmented and the talent to create and analyze these data sets may be hard to find. The new smartphone platforms present opportunities for collecting valuable data sets that will lead to the development of unique analytics which will in turn drive important and novel decisions. Startups can lead the way to create these analytics and the enterprise platforms that manage them.
The survey data presented in last August’s Pacific Crest SaaS workshop pointed to the need for a variety of data analytic services. These services, which can be offered under the name Insight-as-a-Service, can range from business benchmarking, e.g., comparing one business to peers that are also customers of the same SaaS vendor, to business process improvement recommendations based on a SaaS application’s usage, e.g., reducing the amount spent on search keywords by using the SEM application’s keyword optimization module, to improving business practices by integrating syndicated data with a client’s own data, e.g., reducing the response time to customer service requests by crowdsourcing responses. Today I wanted to explore Insight-as-a-Service, as I think it can be the next layer in the cloud stack and can prove the real differentiator between the existing and next-generation SaaS applications (see also here, and Salesforce’s acquisition of Jigsaw).
There are three broad types of data that can be used for the creation of insights:
Company data. This is the data a company stores in a SaaS application’s database. As SaaS applications add social computing components, e.g., Salesforce’s Chatter, or Yammer’s application, company data will become an even richer set.
Usage data. This is the Web data captured in the process of using a SaaS application, e.g., the modules accessed, the fields used, the reports created, even the amount of time spent on each report.
Syndicated data. This is third-party data, e.g., Bloomberg, LinkedIn, or open source, which can be integrated (mashed) with company data and/or usage data to create information-rich data sets.
Some of the issues that will need to be addressed for such services to be possible include:
Permission to use the data. For this to be possible, corporations must give permission for their company data to be used by the SaaS vendor for benchmarking. For example, if Salesforce customers are willing to make their data available then their sales forces’ effectiveness can be benchmarked against that of peer companies. It may be more likely for companies to give their permission if the data is abstracted or even aggregated in some way.
Data ownership. The ownership of usage data has not been addressed thus far. Before creating and offering insights, ownership will have to be settled between the SaaS vendors and their customers. Once ownership is established, as I had written before, this data can, at the very least, be used by the SaaS vendor to provide better customer service, or even to identify upsell opportunities and customer churn situations. While some vendors, e.g., Netsuite, are starting to utilize parts of their usage data, such utilization remains low.
Data privacy. Company and usage data will most definitely include details that may need to be protected and excluded from any analysis. The SaaS vendors will have to understand the data privacy issues and provide corporate clients with the necessary guarantees. Thus far SaaS vendors have only had to make data security guarantees. Privacy concerns around this data will be similar to those that currently surround the internet data that is being used to improve online advertising.
Potential need for pure-play Insight-as-a-Service vendors. The SaaS application companies may not prove capable of providing such insight services. It may be necessary to create specialized vendors to offer them. Such pure-play vendors may have more appropriate and specialized know-how, which will be reflected in their software applications (essentially analytic applications that can organize, manipulate and present insights). In addition, they will be able to offer a broader range of benchmarking since they will be able to evaluate data across SaaS vendors. However, having such vendors will also necessitate the move of company and usage data to yet another location/cloud, thus increasing the security and privacy risks.
Eligibility for accessing these insights, and the business models under which they can be offered. One approach would be for the SaaS application’s vendor to offer such insights to its customers as a separate product. Another approach, particularly if the insights are to be created by a pure-play insights vendor, would be for such vendors to create data coops. Under this scheme corporations contribute company and usage data to the coop, the Insight-as-a-Service vendor analyzes all contributed data, and only offers the results to the companies that belong to the coop. For this service the vendor can charge an annual subscription fee, not unlike what industry analysts like Forrester and Gartner charge. Internet data companies such as Datalogix, which has created a coop with retail purchase data, can serve as good models to consider. Another business model may be for the vendor, either the SaaS application vendor or the Insight-as-a-Service vendor, to share revenue with the companies providing the company and usage data. Internet data exchanges like BlueKai and eXelate would provide good business model examples to imitate.
Geography. As we’ve learned with consumer internet data, each country approaches data differently. For example, European countries are more restrictive with the use of collected data. SaaS companies must try to learn from the relevant experiences of internet data companies as they determine how to best offer such insight services.
Data normalization. Usage data will need to be normalized and then aggregated, since each customer, and maybe even each individual user, uses a SaaS application differently. This could be tricky (a small normalization sketch follows this list).
Hosted applications need not apply. Not all vendors will be able to offer such services. For these services to be successful, data from the entire customer base needs to be aggregated and organized. This implies that vendors claiming to offer SaaS solutions when they are only offering single-tenant hosted solutions deployed in, what amounts to, private clouds will not be able to provide such insight services. In fact, multi-tenant architectures will be even more important for insight-generation because they make data aggregation easier.
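A minimal sketch of the kind of normalization this implies, using illustrative per-customer usage figures: raw report counts mean little until they are scaled by seats and active days.

```python
# Minimal sketch: normalize raw usage counts before benchmarking.
# Customers differ in size and activity, so compare reports per seat
# per active day, not raw totals. All figures are made up.
usage = {
    # customer: (reports_run, seats, active_days)
    "cust_a": (1200, 10, 20),
    "cust_b": (9000, 300, 22),
}

for cust, (reports, seats, days) in usage.items():
    rate = reports / (seats * days)
    print(f"{cust}: {rate:.2f} reports/seat/day")
# cust_a (6.00) is a far heavier user than cust_b (1.36), despite
# cust_b's much larger raw total.
```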
Insight-as-a-Service can become the next layer of the cloud stack (following Infrastructure-as-a-Service, Platform-as-a-Service and Software-as-a-Service). In addition to the SaaS application vendors that can start offering such services, there exists an opportunity to create a new class of pure-play Insight-as-a-Service vendors. Regardless, vendors will need to start addressing these issues, and many more that I can’t anticipate at present. Since surveyed customers are already starting to ask for such services, the time for Insight-as-a-Service has arrived.
Last Tuesday I was invited to speak at an IBM event for technology bloggers. The event addressed two topics that represent key growth priorities for IBM: business analytics and smarter planet. IBM’s Fred Balboni, Global Business Lead for Analytics (responsible for all business analytics services at IBM), Rod Smith, VP of Emerging Technologies, and Mike Rhodin, SVP Software Solutions Group (responsible for all business analytics, BI, and collaboration at IBM) participated in the meeting, along with two IBM clients, the Beacon Institute and the NY Tax Office. IBM invited seven bloggers to the event: Erick Schonfeld (Techcrunch), Derrick Harris (Gigaom), Alex Williams (ReadWriteWeb), Larry Dignan (ZDNet), Mike Vizard (IT BusinessEdge), Mike Loukides (O’Reilly Media) and Mark Kobayashi-Hillary.
I was asked to talk about why VCs are interested today in business analytics.
My presentation followed the talks of the three IBM executives who stated that:
Worldwide customer demand for business analytics services is so strong that IBM is now trying to hire an additional 3000 consultants in Fred Balboni’s group. Just a few months ago that group was supposed to have 4000 consultants.
IBM customers are starting to ask for Hadoop-based data warehousing and analytics solutions in order to determine if they can use them to a) manage and analyze large data sets more cost-effectively and b) quickly perform exploratory data analyses on multi-dimensional data. However, the case studies Rod Smith provided of IBM customers using Hadoop today were not dealing with big data (i.e., TB- or PB-scale data sets).
As part of its smarter planet initiative IBM continues to expand its collaboration with cities around the world where it applies business analytics and uses business processes from the corporate world in order to streamline and optimize the operations of cities.
Even after spending several billion on BI-related acquisitions, IBM intends to continue building out its business analytics technology stack through additional acquisitions and internal product development.
In my presentation I claimed that VCs are investing in business analytics for the following reasons:
The impact of cloud computing. Cloud computing is enabling the creation of new analytic applications, such as Infopia's ecommerce analytics application, and others about which I had written here, where analytics are embedded in the application from the beginning rather than as an afterthought (as was the case in the past). It also enables new takes on existing analytic applications, such as the SaaS corporate performance management application developed by Host Analytics. Finally, cloud computing promotes new usage patterns in BI solutions, such as collaborative decision-making across the extended enterprise, i.e., across customers and partners, and better economics, i.e., better price/performance, than equivalent on-premise solutions. As an example of collaborative decision-making, I mentioned that cloud-based BI solution vendors can offer their clients corporate benchmarking services (think of it as Insight-as-a-Service) by analyzing uploaded and collected data from across their entire customer base, identifying important patterns, and translating them into the appropriate business actions.
The emergence of big data. Ever increasing volumes of structured and semi-structured data, including web data, such as that created by ad networks, and sensor data, such as that created by the Internet of Things, are starting to test the scalability of existing data warehousing solutions, make such solutions prohibitively expensive, and inhibit the timely creation of actionable analytics. I had written about the data volumes created by web applications, and how a new generation of analytic applications and data warehousing infrastructures needs to be created to deal with them effectively, here and here. During the meeting the director of the Beacon Institute spoke about the institute’s collaboration with IBM to monitor the Hudson River using Complex Event Processing and streaming databases. This solution generates real-time analytics from the voluminous data (most of it of little interest, but 2% of extreme interest) produced by thousands of sensors that Beacon has deployed on the Hudson (a toy sketch of this kind of stream filtering follows this list). VCs have already begun to make investments in companies (e.g., Cloudera, Aster Data, Karmasphere) that are developing solutions for these problems.
The need to analyze mobile and social data. The continuous growth of the mobile internet and commerce, as well as the accelerating growth of social internet and commerce create new sources of data and the need for new types of analytics.
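As a toy illustration of the stream-filtering idea behind the Beacon deployment (not IBM's actual system; the readings and thresholds are made up):

```python
# Toy sketch of stream filtering: keep only the small fraction of
# sensor readings that are "of extreme interest". Readings and the
# anomaly band are made up; a real CEP engine evaluates far richer
# predicates over multiple correlated streams.
import random

def sensor_stream(n):
    """Simulate n river-sensor readings (salinity, in made-up units)."""
    for _ in range(n):
        yield random.gauss(35.0, 1.0)

def interesting(readings, lo=33.0, hi=37.0):
    """Pass through only out-of-band readings, discarding the rest."""
    for r in readings:
        if r < lo or r > hi:
            yield r

alerts = list(interesting(sensor_stream(10_000)))
print(f"kept {len(alerts)} of 10,000 readings")   # roughly 4-5%
```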
One of the questions asked was whether IBM’s BI acquisitions are leading to an increased VC appetite for business analytics companies. Over the past couple of years business analytics, BI and data warehousing have been experiencing a renaissance that is indeed leading to increasing funding activity. However, this activity has nothing to do with IBM's acquisitions in this area; in fact, IBM's acquisitions only serve as additional validation of the importance of these areas. Forward-thinking VCs have been funding BI and DW companies for as long as BI has existed, certainly before IBM started its BI acquisition spree. Over this period VCs have created many successful analytics, BI, and data warehousing companies (think of Netezza, Datallegro, Omniture and, more recently, Greenplum). However, it is also true that certain subsectors are getting overfunded.
This was indeed a very interesting event and I look forward to reading the other blogs that will be written about it.
Earlier in the year I wrote about areas of investment interest for 2010 and included Big Data aggregation, management, and data mining/processing. A couple of weeks ago I attended Gigaom’s Structure 2010, a conference devoted to big data. As is also noted by Derrick Harris here, there is a growing market of vendors that are working around open source tools to support big data initiatives. I wanted to summarize my thoughts from the conference but travel got in the way. When EMC’s acquisition of Greenplum was announced, big data came to the fore again, since Greenplum had created a good implementation of MapReduce in its platform (even though I think EMC has been thinking along more traditional data warehousing lines with this acquisition).
Only 10-20 companies have a good grasp of big data issues and are innovating in this area. Though many companies in the Fortune 1000 are starting to experiment with Hadoop, today only 10-20% of enterprises need big data solutions. This number could grow as high as 40-50% in 5 years.
NoSQL databases are emerging as the preferred systems for storing and managing big data sets. The data in these sets is at the terabyte or petabyte scale; it is semi-structured, highly distributed, and much of it is of unknown value, so it must be processed quickly to identify the interesting parts to keep. NoSQL databases provide efficient and effective storage, management and processing of such data sets at low cost. However, it is unlikely that these databases will evolve into general purpose platforms, like relational databases did. It will therefore be important to match the big data problem being solved to the right database. Data analysis and business intelligence are emerging as the best applications for taking full advantage of NoSQL databases. Almost every company that presented at the conference discussed such applications.
Too many NoSQL database companies have already been created (Cloudera, 10gen, MongoDB, VoltDB, CouchDB, etc.). While the user interest in such databases is increasing (many Fortune 1000 companies have started Hadoop evaluation projects), the market won’t be able to sustain them. I expect to see significant consolidation in the next 3-5 years.
Today there is no “LAMP stack” equivalent for big data processing and analytics, and I think that there is an opportunity and need to create one. As Jim Kobielus at Forrester wrote in his blog, there is not going to be a standalone market for Hadoop. The pioneers of the big data movement, e.g., internet portals, social networks, ad networks, etc., are creating ad hoc “stacks” using open source tools for data management, data aggregation, and in-memory storage. In most cases they extend/modify these tools heavily. In addition to base Hadoop, tools such as Hive, Cassandra, Scribe, memcached, MapReduce, and Google App Engine are part of such stacks and, consequently, can be the components of a pre-integrated, pre-certified Big Data Platform. Using these tools still requires a programmer’s talents (Pig and HiveQL are efforts to provide a higher-abstraction programming layer over Hadoop and Hive, respectively, so that database administrators can interact with the databases). The Big Data Platform must be usable by business analysts as well. Hadoop client-side applications such as those being developed by Karmasphere and Datameer provide steps in the right direction but they remain standalone.
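To make concrete why these tools still demand a programmer, here is the classic word-count job written as a Hadoop Streaming mapper and reducer in Python, a minimal sketch:

```python
#!/usr/bin/env python
# Classic word count as a Hadoop Streaming job: the mapper emits
# (word, 1) pairs on stdout; Hadoop sorts them by key and pipes them
# to the reducer, which sums the counts per word.
import sys

def mapper():
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rstrip("\n").rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    # e.g. hadoop jar hadoop-streaming.jar -mapper "wordcount.py map" \
    #      -reducer "wordcount.py reduce" -input ... -output ...
    mapper() if sys.argv[1] == "map" else reducer()
```

In Pig or HiveQL the same job is a couple of declarative statements; that gap is exactly the higher abstraction layer described above.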
For no reason apparent to me, NoSQL database companies are trying to reinvent the data warehousing and business intelligence infrastructures that have been created over the years. Some of the “reinventions” may be absolutely necessary and will lead to important innovations. However, these companies also appear to be ignoring important aspects of the data management and data analysis technology that has been developed over the years around data warehouses built using relational technology.
Last Friday I attended SDForum’s The Analytics Revolution conference. The presenters were startup, public, and private company executives and investors. The presentations were mostly around real-time analytics. For the past few years Gartner analysts have been writing and talking about the Real Time Enterprise. Gartner sees the analysis of the data that is generated by various enterprise applications, such as an ERP, and residing behind the corporate firewall as an important ingredient to achieving the Real Time Enterprise. However, Friday’s presentations brought into focus that the impetus for real-time analytics is not the faster analysis of enterprise data, but of the Big Data captured on the Web (structured and unstructured social data, activity logs, data coming from mobile applications, geolocation data, data mashups, the interactions during the lifecycle of the captured data, etc.). This data is orders of magnitude larger and more complex than the data currently stored in enterprise data warehouses. I had written that Internet SaaS applications will drive the SaaS innovation agenda. Friday’s presentations provided more examples and additional justification for this conviction.
Companies like Facebook, eBay, LinkedIn and Zynga that presented at the conference are interested in making analytics-driven decisions at Internet speed and more cheaply than is currently possible with the existing data warehousing and analytics technologies. They want to use their data to continuously optimize their businesses, simulate the effect of decisions on business processes, and forecast the impact of actions. Making decisions based on reports that summarize and find trends in data that is a week or a month old simply won’t do.
To take advantage of the emerging opportunity of real-time analytics of big data, we are considering investments in the following three areas:
Infrastructures of next-generation data management systems, including open source systems like Hadoop and derivatives, and systems for creating OLAP cubes in real time.
Horizontal and vertical analytic applications, examples of which are listed here. Additionally, these applications will need to find a way to monetize the data they capture, not only the data they process. See, for example, how internet data exchanges like BlueKai are able to monetize the cookie data they capture.
Analytic services offered around the captured big data, e.g., the services offered around big mobile data by companies like Ground Truth and Flurry.
My underlying hypothesis (driving one of my investment theses, and shared by several of the investors present at the conference) is that cloud computing will make big data analysis faster and cheaper.
I am convinced that over the next few years, investments in each of these three areas will result in several strong exits. I don’t believe that the established on-premise data warehousing, BI and analytics vendors will be able to create adequate technological and business model innovations around real-time analytics. The results of a survey conducted by InformationWeek and published on March 27, 2010 make it clear that more enterprises are looking to cloud computing to address the perceived shortcomings of on-premise applications, with SaaS BI being one of the top areas of interest. Finally, as stated here, incumbent on-premise software vendors will continue to face the innovator’s dilemma that won’t allow them to move fast to offer the novel solutions for real-time analytics that the market needs and is starting to demand.
Last Wednesday I presented at an event organized by IBM and Fordham University’s business school, around the launch of a new business analytics curriculum sponsored by IBM. Earlier in the fall, IBM announced the establishment of a NY-based analytics center with 400 consultants. Through its collaboration with Fordham’s business school, as well as through a similar collaboration with North Carolina State University’s computer science department, IBM is hoping to gain access to qualified and well-educated candidates to staff its analytics services organization, which will grow to 4000 consultants.
Through surveys that were recently completed by business and IT leaders around the world, IBM has determined that the corporate interest in business analytics and information-based (rather than gut-based) decisions is increasing, while the number of qualified candidates who can help such organizations remains small. Similar conclusions have been reached by researchers at Villanova University's School of Business in a study found here. The Fordham program will educate individuals on how to blend business with quantitative analysis skills. Today universities graduate “pure quants” who find their way to many corporations, from Wall Street investment banks to internet social networks like Facebook. Pure quants have obviously strong quantitative analysis skills but relatively weak business skills.
I was asked to talk about the drivers for business analytics and their impact on investments in new analytics companies. Trident realized the importance of analytics in business decisions several years ago and has invested in several software companies that either develop applications and platforms to support the creation of business intelligence and analytics, or whose business is mainly driven by analytics. Today almost 20% of our active portfolio companies belong to these two categories. Moreover, we continue to look for additional investment opportunities in these areas, here and abroad.
I see three drivers for today’s growing investor and corporate interest around business analytics:
Big data. Data is becoming strategically important to enterprises of any size and type. It is also being generated in unprecedented volumes. This is particularly the case with Internet companies. Yahoo, Fox, AOL and some of the larger ad networks routinely generate upwards of 100 TB of data per year each. Facebook is generating an order of magnitude more data than that. Once prepared for analysis, each such data set can triple in size. By comparison, in the 90s, when I was running IBM’s BI Solutions organization and later when I was the CEO of Customer Analytics, we were dealing with data warehouses that contained, at most, a few terabytes of data and most frequently only several hundred gigabytes. To analyze big data in an impactful and timely manner, we need new data management paradigms (e.g., we are starting to see the broad use of column-based databases, and the emerging use of Hadoop by a variety of organizations across several industries) and new analysis paradigms that efficiently combine solid analytic techniques with business/industry knowledge, practices, etc.
The quest for performance-driven decisions. Corporations are moving from report-driven decisions (where business intelligence was used only to passively present historical data so that a business executive could make decisions) to performance-driven decisions, where analytics are used to, oftentimes automatically, make decisions that impact corporate performance. For example, ad networks combine sophisticated analytics with novel data management to determine within a few milliseconds which ad to show an Internet user. Sophisticated optimization techniques are used by ecommerce companies to determine the price of keywords used in search advertising or the price they will pay for a new customer lead (a toy bid-pricing sketch follows this list).
The need for broadening the corporate use of analytics. In their quest to achieve analytics-driven decision-making, corporations are accelerating the use of SaaS BI and cloud-based analytic solutions because of their lower total cost of ownership, speedier implementation compared to equivalent on-premise solutions, and support of community-based and collaborative problem-solving.
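As a toy illustration of the kind of keyword-pricing optimization mentioned above (the rates and values are made up; real systems estimate them continuously from live data):

```python
# Toy sketch of expected-value keyword bidding: the most a rational
# ecommerce advertiser pays per click is the expected profit per click.
# All rates and values below are made-up illustrative numbers.
keywords = {
    # keyword: (click -> conversion rate, profit per conversion)
    "running shoes":     (0.025, 38.00),
    "marathon trainers": (0.040, 52.00),
}

for kw, (conv_rate, profit) in keywords.items():
    max_cpc = conv_rate * profit      # expected profit per click
    print(f"{kw}: bid up to ${max_cpc:.2f} per click")
# 'marathon trainers' justifies a $2.08 bid vs $0.95 for 'running
# shoes', even though it converts on a lower-volume query.
```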
Realizing the impact of analytics-driven decision-making, corporations are now thinking about analytics while designing new business processes and the associated applications, rather than after the fact, as was happening to date. By comparison, we used to build data warehouses and BI applications by extracting and reporting on data from systems that had been around and were never designed for analytics-driven decision-making, e.g., ERP systems or supply chain management systems. Today, most state-of-the-art applications are designed with an analytics component instead of being fitted with one.
Over the next couple of years we will be able to assess whether Fordham’s business analytics curriculum will have the desired impact and provide IBM with a rich pipeline of analysts who combine the right mix of the business and quantitative skills needed to satisfy the corporate demand for analytics-driven decision-making.