A little over two years ago I wrote a series of blog posts introducing
Insight-as-a-Service. My idea on how companies can provide insight as a
service started by observing my SaaS portfolio companies. In addition
to each customer's operational data used by their SaaS applications,
like all SaaS companies, these companies collect and store application
usage data. As a result, they have the capacity to benchmark the
performance of their customers and help them improve their corporate and
application performance. I had then determined that insight delivered
as a service can be applied not only for benchmarking but to other
analytic- and data-driven systems. Over the intervening time I came
across several companies that started developing products and services
that were building upon the idea of insight generation and providing
insight as a service. However, the more I thought about
insight-as-a-service, the more I came to understand that we didn't
really have a good enough understanding of what constitutes insight. In
today's environment, where corporate marketing overhypes everything
associated with big data and analytics, the word "insight" is being used
very loosely, most of the time to indicate any type of data analysis or
prediction. For this reason, I felt it was important to attempt to define the
concept of insight. Once we define it we can then determine if we can
deliver it as a service. During the past several months I have been
interacting with colleagues such as Nikos Anerousis of IBM, Bill Mark of
SRI, Ashok Srivastava of Verizon and Ben Lorica of O'Reilly in an
effort to define "insight."
An insight is the identification of cause and effect relations among elements of a data set that leads to the formation of an action plan which results
in an improvement as measured by a set of KPIs. Insights are
discovered by reasoning over the output of analytic models and
techniques. This output can take the form of predictions,
correlations, benchmarks, outlier identifications and optimizations.
The evaluation of a set of established relations to identify an insight, and the creation
of an action plan associated with a particular insight, need to be done
within a particular context and necessitate the use of domain knowledge.
Most analytic model outputs do not provide insights. There are two
reasons for this. First, the models don't suggest a meaning for each of
their findings. Second, they don't put each finding in an actionable
context (even if the meaning were known). Finding a pattern doesn't
imply that you automatically find its meaning or that you understand it.
It only means that you have found a correlation within a data set.
Moreover, finding causality alone is not sufficient for
generating an insight. One needs to be able to derive an action plan
that can successfully and effectively, i.e., with impact, be applied in a
particular context. This requirement implies that even knowing the
meaning of the finding doesn't tell me how to generalize it and use it
for something in the context I am trying to impact. That step requires
knowledge of my environment (business, social, education, etc.),
my strengths and weaknesses, other forces that may enhance or diminish
my efforts, etc.
An insight must be:
Stable. This means that an insight must not vary
depending on the relation-identification algorithm/model being used.
For example, if I use two different samples from the same data set to
create a predictive model employing the same model-creation method, then
the resulting models must produce identical results when given the
same new input data.
Reproducible. This means that regardless of how many
times I feed a particular data set through an insight-generation system,
the same insight will be produced.
Robust. This means that a certain amount of noise
in the input data will not diminish the quality of the insight. This is
a particularly important requirement in big data environments.
Insight-generation systems must be able to organize noisy data and
focus on the data that makes "sense," based on a particular context.
Enduring. This means that the insight is valid for an amount of time that is related to the underlying data's "half life."
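The first three properties can be checked empirically. What follows is a minimal sketch, not a definitive test: it assumes a toy one-feature threshold classifier and synthetic data, and every name in it (fit_threshold, agreement, add_noise) is hypothetical. It measures agreement as the fraction of new inputs on which two models give the same answer, which is a relaxation of the strict "identical result" requirement described above.

```python
import random
from statistics import mean


def fit_threshold(sample):
    """Toy model: a threshold halfway between the two class means."""
    positives = [x for x, label in sample if label]
    negatives = [x for x, label in sample if not label]
    return (mean(positives) + mean(negatives)) / 2


def predict(threshold, x):
    return x > threshold


def agreement(threshold_a, threshold_b, new_inputs):
    """Fraction of new inputs on which two models give the same answer."""
    same = sum(predict(threshold_a, x) == predict(threshold_b, x) for x in new_inputs)
    return same / len(new_inputs)


def add_noise(sample, rng, scale):
    """Perturb every feature value with Gaussian noise of the given scale."""
    return [(x + rng.gauss(0.0, scale), label) for x, label in sample]


rng = random.Random(42)  # fixed seed so the sketch itself is repeatable

# Synthetic data (an assumption): two classes centered at 0 and 10.
data = []
for _ in range(2000):
    label = rng.random() < 0.5
    data.append((rng.gauss(10.0 if label else 0.0, 1.0), label))

sample_a = rng.sample(data, 500)
sample_b = rng.sample(data, 500)
new_inputs = [rng.uniform(-5.0, 15.0) for _ in range(1000)]

model_a = fit_threshold(sample_a)
model_b = fit_threshold(sample_b)

# Stable: same method, different samples, same answers on new inputs.
stability = agreement(model_a, model_b, new_inputs)

# Reproducible: the same data fed through again yields the same model.
reproducible = fit_threshold(sample_a) == model_a

# Robust: moderate input noise barely changes the answers.
noisy_model = fit_threshold(add_noise(sample_a, rng, scale=0.5))
robustness = agreement(model_a, noisy_model, new_inputs)

print(f"stability={stability:.3f} reproducible={reproducible} robustness={robustness:.3f}")
```

In a real system the acceptable agreement level would itself depend on the context; the sketch only shows where such a check would sit.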
Because of the above requirements, insight-generation necessitates
a deeper analysis, including causal analysis, of the underlying
relation-identification models, rather than just the testing of each
model's accuracy, as is typically done in predictive analytics tasks.
Such causal analysis implies that when trying to generate insights it
is preferable to utilize machine learning techniques that describe
patterns declaratively, e.g., decision trees, rather than black box
approaches, e.g., neural nets and genetic algorithms. As a result of
this requirement, one may need to sacrifice prediction accuracy and
speed for expressiveness. Therefore, one needs to identify the domains
where insight-generation may be more important than predictive accuracy.
Moreover, because the models themselves need to be analyzed, simpler
models may be preferred to more complex ones.
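To illustrate what "describing patterns declaratively" can look like, here is a minimal sketch of a one-level decision tree (a decision stump) whose learned pattern can be printed as a readable rule. The data, the feature names (logins_per_week, seats), the "renewed" outcome, and the helper names are all hypothetical, chosen only to echo the SaaS usage-data setting; this is not a production learner.

```python
def fit_stump(rows, feature_names, labels):
    """Pick the single (feature, threshold) split with the fewest errors."""
    best = None
    for j, name in enumerate(feature_names):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            for above_label in (True, False):
                # Prediction is above_label when the value exceeds the
                # threshold, and its negation otherwise.
                errors = sum(
                    ((row[j] > threshold) == above_label) != label
                    for row, label in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, name, threshold, above_label)
    return best


def describe(stump, outcome):
    """Render the learned split as a human-readable rule."""
    _errors, name, threshold, above_label = stump
    negated = f"not {outcome}"
    then, otherwise = (outcome, negated) if above_label else (negated, outcome)
    return f"IF {name} > {threshold} THEN {then} ELSE {otherwise}"


# Hypothetical usage data: (logins_per_week, seats) per customer.
feature_names = ["logins_per_week", "seats"]
rows = [(1, 5), (2, 50), (3, 8), (6, 40), (8, 3), (0, 20)]
renewed = [False, False, True, True, True, False]

stump = fit_stump(rows, feature_names, renewed)
rule = describe(stump, "renewed")
print(rule)  # IF logins_per_week > 2.5 THEN renewed ELSE not renewed
```

The rule can be read, questioned, and placed in a business context by a domain expert, which is the property that a black-box model does not offer directly.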
Insight-generation is not a single-shot process. Once an insight is
generated and the associated action plan is created, it is important to
apply the plan in the particular context and measure its impact. The
collected data must then be compared to the set of established KPIs in
order to determine whether the particular insight/action-plan pair led
to an improvement. Depending on this analysis, the system must then
decide whether to attempt improving the action plan, create a completely
new plan (assuming that alternatives can be found), or try to create a
brand new insight. This means that from a set of initial input data the
insight-generation system must seek to derive all possible predictions,
based on the set of available models.
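The decision step of this feedback loop can be sketched as follows. The decision rule itself is an assumption made for illustration, since the text does not specify one: keep the plan if every KPI reached its target, refine it if every KPI at least improved on its baseline, otherwise try a new plan when alternatives exist, and fall back to seeking a new insight. The names (NextStep, KpiReading, decide_next_step) are hypothetical, and higher KPI values are assumed to be better.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Mapping


class NextStep(Enum):
    KEEP_PLAN = "keep the current action plan"
    REFINE_PLAN = "improve the current action plan"
    NEW_PLAN = "create a completely new action plan"
    NEW_INSIGHT = "try to generate a brand new insight"


@dataclass(frozen=True)
class KpiReading:
    baseline: float  # value before the action plan was applied
    measured: float  # value observed after applying the plan
    target: float    # value the plan was expected to reach


def decide_next_step(readings: Mapping[str, KpiReading],
                     alternatives_available: bool) -> NextStep:
    """Illustrative rule only; assumes higher KPI values are better."""
    if not readings:
        raise ValueError("at least one KPI reading is required")
    if all(r.measured >= r.target for r in readings.values()):
        return NextStep.KEEP_PLAN
    if all(r.measured > r.baseline for r in readings.values()):
        return NextStep.REFINE_PLAN  # moving in the right direction
    if alternatives_available:
        return NextStep.NEW_PLAN
    return NextStep.NEW_INSIGHT


# Hypothetical reading: renewal rate rose from 80% to 84% against a 90% target.
readings = {"renewal_rate": KpiReading(baseline=0.80, measured=0.84, target=0.90)}
print(decide_next_step(readings, alternatives_available=True).value)
# improve the current action plan
```

A real system would likely weigh KPIs differently and account for measurement noise before declaring an improvement; the sketch only makes the three-way choice explicit.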