The physical world (from goods to equipment) is becoming
digitally connected through a multitude of sensors. Sensors can be found today in most industrial
equipment, from metal presses to airplane engines, shipping containers (RFID),
and automobiles (telematics devices).
Consumer mobile devices are essentially sensor platforms. These connected devices can automatically
provide status updates, performance updates, maintenance requirements, and
machine-to-machine (M2M) interaction updates.
They can also be described
in terms of their characteristics, their location, etc. Until recently, these sensors were interconnected using proprietary protocols. More recently, however, sensors have started to be connected via IP, forming the Internet of Things; by some estimates, 50 billion devices will be connected this way by 2020. The connected physical world is becoming a source of an immense amount of low-level, structured and semi-structured data, that is, big data.
Collecting and utilizing sensor data is not new. For example, GE uses data from sensors to monitor the performance of industrial machinery, locomotives, jet engines, and health care equipment. United Airlines uses sensors to monitor the
performance of its planes on each flight. And government organizations, such as
the TSA,
collect data from the various scanners they use at airports. The key applications that have emerged
through these earlier efforts are remote service and predictive
maintenance.
While our ability to collect the data from these
interconnected devices is increasing, our ability to effectively, securely and
economically store, manage, clean and, in general, prepare the data for exploration, analysis, simulation, and
visualization is not
keeping pace. Today we seem to be preoccupied with trying to put all of the data we collect into a single database, and even at this we are not doing a particularly good job. Existing database management systems are proving inadequate: they may be able to store and process the time-series data collected by sensors, but they cannot correlate it across sensors or with other data sets. The effectiveness of newer platforms, e.g., Hadoop and NoSQL databases such as MongoDB and Cassandra, is also proving inconsistent, depending largely on the type of application accessing the database and operating on the collected data.
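To make the correlation gap concrete: aligning even two sensor streams sampled at different rates is work that today typically falls to the application rather than the database. The sketch below is a minimal, hypothetical illustration in Python using pandas; the sensor names, units, and sampling rates are invented for the example. It aligns a one-second vibration stream with a ten-second temperature stream via a time-based join, after which a simple statistical correlation becomes possible.

```python
import pandas as pd

# Hypothetical streams: a vibration sensor reporting every second and a
# temperature sensor reporting every ten seconds on the same machine.
vibration = pd.DataFrame({
    "ts": pd.date_range("2014-01-01", periods=60, freq="1s"),
    "vibration_mm_s": [0.20 + 0.01 * i for i in range(60)],
})
temperature = pd.DataFrame({
    "ts": pd.date_range("2014-01-01", periods=6, freq="10s"),
    "temp_c": [40.0, 41.0, 43.0, 47.0, 52.0, 58.0],
})

# Time-based join: attach to each vibration reading the most recent
# temperature reading at or before it. This alignment is the correlation
# work that current databases leave to the application.
aligned = pd.merge_asof(vibration, temperature, on="ts")

# Once aligned, a simple statistical correlation is straightforward.
print(aligned[["vibration_mm_s", "temp_c"]].corr())
```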
The new generation of applications that will exploit the
big data collected by sensors must take a ground-up approach to the problem they are trying to address, not unlike the one taken by Splunk. In Splunk’s case, the application developers considered how the sensor data collected from data centers must be cleaned, the other data sets with which it must be integrated and fused, how users would interact with the resulting data sets, and so on. Splunk’s developers were able to accomplish this and deliver a very effective application because they understood the problem, the spectrum of data that must be used to address it, and the role the low-level data plays in that spectrum. They also appear to have understood the importance of providing effective analyses of the low-level data, as well as of the higher-level data sets that result when several different data sources are fused.
The Internet of Things necessitates the creation of two types of data-intensive systems. First, a new type of ERP system (the system of record) that will enable organizations to manage their infrastructure (IT, human, manufacturing, field, transportation, etc.) in the same way that the current generation of ERP systems allows corporations to manage their critical business processes. Second, a new analytic system that will enable organizations to organize, clean, fuse, explore, experiment with, simulate, and mine the data being collected, in order to create predictive patterns and insights. Today our ability to analyze the collected data is inadequate for three reasons:
- The sensor data we collect is too low-level; it needs to be integrated with data from other sensors, as well as with higher-level data (e.g., weather data or supply chain logistics data), to create information-richer data sets. Data integration is important because (a) high-velocity sensor data must be brought together and (b) low-granularity sensor data needs to be combined with higher-granularity data. Today, integration of sensor data is still done manually, on a case-by-case basis. Standards-based ways to integrate such data, e.g., RESTful APIs and other types of web services, have not yet been adopted broadly in the Internet of Things world, and they need to be (a sketch of such an API follows this list). We need to start thinking about sensor data APIs in the same way we have been thinking about APIs for higher-level data, and once we start defining these standards-based APIs, we also need to start thinking about API management.
- We don't yet know the full range of complex analyses to perform on the collected sensor data, because we don't yet know which enterprise and government problems we can solve with it.
- Even for the analyses we do perform, we often lack the ability to translate the results into specific actions.
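To illustrate the first point, here is a minimal sketch of what a standards-based sensor data API might look like. It is written in Python with Flask; the endpoint, field names, and in-memory readings are all assumptions made for illustration, not an existing standard.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical in-memory store of raw, low-level sensor readings.
READINGS = [
    {"sensor_id": "press-07", "ts": "2014-01-01T00:00:00Z", "metric": "temp_c", "value": 41.2},
    {"sensor_id": "press-07", "ts": "2014-01-01T00:00:10Z", "metric": "temp_c", "value": 43.1},
    {"sensor_id": "engine-3", "ts": "2014-01-01T00:00:00Z", "metric": "rpm", "value": 2150},
]

@app.route("/sensors/<sensor_id>/readings")
def get_readings(sensor_id):
    # Return a sensor's readings, optionally filtered by metric. A
    # standards-based endpoint like this lets higher-level systems fuse
    # sensor data with other sources without one-off manual integration.
    metric = request.args.get("metric")
    rows = [r for r in READINGS
            if r["sensor_id"] == sensor_id
            and (metric is None or r["metric"] == metric)]
    return jsonify({"sensor_id": sensor_id, "readings": rows})

if __name__ == "__main__":
    app.run(port=5000)
```

A client could then issue GET /sensors/press-07/readings?metric=temp_c and receive JSON it can fuse with weather or logistics data. API management concerns, e.g., authentication, rate limiting, and versioning, would sit in front of endpoints like this.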
Finally, along with these two types of systems, we will need to effectively manage the IP addresses of all the devices being connected in these sensor networks. IPv6 gives us the ability to connect billions of sensors using IP, but we need better ways to manage these connected devices; today, most organizations manage them in spreadsheets.
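As a sketch of what moving beyond spreadsheets might look like, the following uses Python's standard ipaddress module to assign each device an address from a single IPv6 prefix and keep the inventory queryable. The prefix (a reserved documentation prefix) and the device names are hypothetical.

```python
import ipaddress

class DeviceRegistry:
    """A toy registry that assigns each sensor an IPv6 address from one prefix."""

    def __init__(self, prefix: str):
        self.network = ipaddress.IPv6Network(prefix)
        self._pool = self.network.hosts()  # generator over usable addresses
        self.devices = {}                  # IPv6Address -> device metadata

    def register(self, device_id: str, location: str) -> ipaddress.IPv6Address:
        addr = next(self._pool)            # allocate the next free address
        self.devices[addr] = {"device_id": device_id, "location": location}
        return addr

    def lookup(self, addr: str):
        return self.devices.get(ipaddress.IPv6Address(addr))

# Hypothetical usage: a /64 prefix offers 2**64 addresses, ample for
# every sensor in a large deployment.
registry = DeviceRegistry("2001:db8::/64")
a1 = registry.register("vibration-sensor-001", "plant-A/press-07")
a2 = registry.register("temp-sensor-002", "plant-A/press-07")
print(a1, registry.lookup(str(a1)))
```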
The big data
generated by the Internet of Things is opening up great opportunities for a new
generation of operational and analytic applications. Creating these applications will require
taking a ground-up approach, from the basic sensor technology and the data sensors can generate, to the ways sensors are managed and data is integrated, to the actions that can be taken as a result of the analyzed data.