The predictions for growth in data volumes are stark: the number and diversity of devices being attached to networks through the internet of things (IoT) and the internet of everything (IoE) are set to rise sharply, and the impact will be massive. Alongside the need to open up data access to more entities along the value chain, the decentralization of data stores could create major problems for how an organization meets its governance, risk and compliance (GRC) needs.
Maybe it is time to move away from a storage-centric concept of GRC towards a network-centric one?
From islands to streams
The problem with a storage-centric approach is that the data is spread across multiple data stores: databases, file servers and so on. These islands of data were bad enough when all the data sat within datacenters owned and managed by the organization. Now that cloud services are so easy to source and use, many data stores sit outside direct control, hidden behind shadow IT purchases.
However, most of the data used within the business will have to cross the organization’s network at some point. Even mobile users connected via an operator’s wireless WAN will need to pass results or changes over the ‘controlled’ network at some stage. These streams of data should be easier to manage, given the right approach.
What, why, where?
Using network analytics, data streams can be inspected in real time. The type of data (machine-to-machine, transactional, voice, video, image, etc.) can be identified, the points of origin and destination can be ascertained, and the content can be interrogated. An organization therefore knows what the data is and where it is being used, and can infer why something is being done, all before the data is even written to a persistent store.
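As a rough illustration of the classification step, the sketch below tags a single flow record with a data type, its origin and destination, and any content flags. It is a minimal sketch under stated assumptions: the `FlowRecord`, `Classification` and `classify` names, the port-to-type table and the card-number pattern are all hypothetical. Guessing a data type from the destination port is a crude stand-in for the deep packet inspection and traffic fingerprinting that real network analytics tools use.

```python
import re
from dataclasses import dataclass, field

# Hypothetical port-to-type hints. Real tools inspect traffic content and
# behaviour rather than trusting port numbers alone.
PORT_HINTS = {
    5060: "voice",               # SIP signalling
    554: "video",                # RTSP streaming
    1883: "machine-to-machine",  # MQTT
    443: "transactional",        # HTTPS (a gross simplification)
}

# Illustrative content check: 16 digits, optionally separated by spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){15}\d\b")


@dataclass
class FlowRecord:
    src: str
    dst: str
    dst_port: int
    payload_sample: bytes = b""  # a sampled slice of the payload, not the full stream


@dataclass
class Classification:
    data_type: str
    origin: str
    destination: str
    flags: list = field(default_factory=list)


def classify(flow: FlowRecord) -> Classification:
    """Answer 'what' (type and content flags) and 'where' (origin, destination)."""
    data_type = PORT_HINTS.get(flow.dst_port, "unknown")
    text = flow.payload_sample.decode("utf-8", errors="ignore")
    flags = []
    if CARD_PATTERN.search(text):
        flags.append("possible-card-number")
    return Classification(data_type, flow.src, flow.dst, flags)
```

For example, `classify(FlowRecord("10.0.0.5", "203.0.113.9", 443, b"card 4111 1111 1111 1111"))` would be labelled `transactional` and flagged as a possible card number, while an MQTT flow with an empty payload would be labelled `machine-to-machine` with no flags.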
It also gets around the problem of data islands, because the streams can be aggregated more easily: the core network will, by necessity, carry most of the data. Even where the final persistent stores are diverse, this approach gives a more holistic view across the organization’s data.
The importance of this cannot be overstated. Use of external, shadow-IT-driven data stores can be prevented: traffic directed to consumer-grade cloud storage can be redirected to enterprise-grade stores. Information that an employee is trying to send, intentionally or accidentally, to the wrong person can be stopped before it becomes an issue. Data that is transitory or ephemeral can be handled through network filtering and need never be stored at all.
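The three interventions in the previous paragraph (redirecting, blocking and not storing) can be expressed as a small first-match policy function. This is a hypothetical sketch: the host names, the `Stream` fields and the assumption that an upstream inspection step has already set the `sensitive` and `ephemeral` flags are illustrative only, not the behaviour of any particular product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    ALLOW = "allow"        # pass through unchanged
    REDIRECT = "redirect"  # send to an approved enterprise store instead
    BLOCK = "block"        # stop the transfer before it leaves
    NO_STORE = "no-store"  # deliver, but never write to persistent storage


# Hypothetical policy data; a real deployment would load this from its GRC policy.
CONSUMER_STORAGE_HOSTS = {"consumer-cloud.example"}
ENTERPRISE_STORE = "storage.corp.example"
INTERNAL_DOMAIN = "corp.example"


@dataclass
class Stream:
    dest_host: str
    recipient_domain: Optional[str]  # set for e-mail or messaging traffic
    sensitive: bool                  # e.g. flagged by content inspection
    ephemeral: bool                  # e.g. transient telemetry


def decide(stream: Stream) -> Tuple[Action, Optional[str]]:
    """Return the action to take and, for redirects, the new destination."""
    # Rules are checked in order; the first match wins.
    if stream.sensitive and stream.recipient_domain not in (None, INTERNAL_DOMAIN):
        return Action.BLOCK, None
    if stream.ephemeral:
        return Action.NO_STORE, None
    if stream.dest_host in CONSUMER_STORAGE_HOSTS:
        return Action.REDIRECT, ENTERPRISE_STORE
    return Action.ALLOW, None
```

Blocking is checked first so that sensitive data addressed to an outside recipient is stopped even if a later rule would otherwise have let it through.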
Data can be prioritized, tagged and managed according to GRC needs, with metadata applied to the streams as they are interrogated. Reporting and overall business analytics can then be carried out through master data management (MDM), with the metadata created by the network analytics becoming the primary source for tracing and tracking every action that has occurred around a specific item. Dashboards can show what is happening and how well the organization is complying with its own GRC strategy.
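A toy version of such a metadata trail might look like the following. Each tagging event is appended to a per-item log, the log can be read back to trace everything that happened to one item, and a single ratio summarizes how much of the tracked data currently carries a required tag, which is the kind of figure a compliance dashboard could display. The `LineageLog` class, the tag names and the ratio metric are assumptions for illustration. A real MDM hub would add persistence, access control and far richer reconciliation.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class TagEvent:
    item_id: str
    action: str           # e.g. "ingested", "classified", "redirected"
    tags: FrozenSet[str]  # metadata applied by the network analytics layer
    at: datetime


class LineageLog:
    """Append-only event log per data item; a toy stand-in for an MDM hub."""

    def __init__(self) -> None:
        self._events: Dict[str, List[TagEvent]] = defaultdict(list)

    def record(self, item_id: str, action: str, tags: Iterable[str],
               at: Optional[datetime] = None) -> TagEvent:
        event = TagEvent(item_id, action, frozenset(tags),
                         at or datetime.now(timezone.utc))
        self._events[item_id].append(event)
        return event

    def trail(self, item_id: str) -> List[TagEvent]:
        """Every event recorded for one item, in the order it was recorded."""
        # .get() avoids creating an empty entry for unknown items.
        return list(self._events.get(item_id, []))

    def compliance_ratio(self, required_tag: str) -> float:
        """Share of tracked items whose latest event carries required_tag."""
        latest = [events[-1] for events in self._events.values() if events]
        if not latest:
            return 1.0  # nothing tracked yet, so nothing out of policy
        return sum(required_tag in e.tags for e in latest) / len(latest)
```

Recording `"ingested"` and then `"classified"` events for one document (the second carrying a `retention:7y` tag) plus a single untagged event for a second document gives a two-step trail for the first item and a compliance ratio of 0.5 for that tag.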
At the speed of light?
The biggest problem here is dealing with the data at line speed. For most organizations, data volumes will remain within the capabilities of a well-architected software-defined approach: rules for what happens to certain classes and types of data can be written into software-defined networking (SDN) and network functions virtualization (NFV) constructs. The problem comes for large service providers, where handling thousands of customers’ data streams may stretch a purely software-defined approach too far.
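Rules of this kind are usually held as data rather than hard-coded logic. The sketch below is loosely modelled on the priority, match and action structure of OpenFlow-style SDN flow tables: the highest-priority matching rule wins, and a default action applies on a table miss. It is not an SDN controller API. Real flow tables match on packet header fields, so keys such as `data_class` and `dst_zone` assume that a hypothetical upstream classifier has already labelled the traffic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(order=True)
class FlowRule:
    priority: int                                 # the only field used for ordering
    match: Dict[str, str] = field(compare=False)  # an empty dict acts as a wildcard
    action: str = field(compare=False)


class FlowTable:
    """Priority-ordered match-action table, loosely modelled on SDN flow tables."""

    def __init__(self, default_action: str = "forward") -> None:
        self.rules: List[FlowRule] = []
        self.default_action = default_action  # applied on a table miss

    def add(self, priority: int, match: Dict[str, str], action: str) -> None:
        self.rules.append(FlowRule(priority, match, action))
        # Highest priority first; the sort is stable, so equal priorities
        # keep their insertion order.
        self.rules.sort(reverse=True)

    def lookup(self, headers: Dict[str, str]) -> str:
        """Return the action of the first (highest-priority) matching rule."""
        for rule in self.rules:
            if all(headers.get(k) == v for k, v in rule.match.items()):
                return rule.action
        return self.default_action
```

With a high-priority `drop` rule for `{"data_class": "pii", "dst_zone": "external"}` and a lower-priority `mirror-to-audit` rule for `{"data_class": "pii"}`, personal data leaving the organization is dropped, while internal PII traffic is only copied for audit.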
However, hardware-assisted software may well be the answer. Placing the functions that need to run as close to the network as possible into flexible field-programmable gate arrays (FPGAs) can provide the speed that a service provider needs, while still allowing other functions to be abstracted into software.
For a service provider, adding network-based data management to its portfolio could well be a solid value-add service that customers would be willing to pay for. Lower volumes of persistently stored data mean lower storage costs and faster business analytics, and better targeting of where data is stored enables more intelligent placement and so more effective analytics. Holistic knowledge of what is happening with data streams also makes adherence to corporate, local and global governance standards far easier.
A couple of vendors are already well on the way to providing such capabilities: CommVault with its Simpana product and Druva with inSync both deal with data streams and act on them before any data is stored. Others, such as Splunk, provide the means of dealing with upstream data from the myriad devices that will be coming online under the IoT/IoE.
Data is the lifeblood of an organization. Dealing with it in a manner that is both effective and efficient is becoming increasingly difficult. A change in view from data islands to data streams is long overdue.