2013-11-05

by Jelani Harper

There are several governance concerns for incorporating Big Data into business, strategic, and operational decisions. Many of these issues relate to the characteristics of Big Data (high volumes, rapid ingestion, and diverse sources), which can enhance the decision-making process. Contemporary emphases on real-time analysis and predictive analytics attest to this fact.

Other concerns pertain to relatively new technologies (Hadoop, NoSQL) that have been developed to specifically account for Big Data’s incorporation into both the business and the enterprise.

But does all (or any) of that actually exacerbate, heighten, or change the nature of what has come to be known as Big Data Governance?

Bob Seiner, president of KIK Consulting & Educational Services and publisher of The Data Administration Newsletter, doesn’t think so.

“I think it’s a manufactured term,” Seiner said about Big Data Governance. “Is the governance of Big Data any different from the governance of little data? Is it any different from the governance of customer data versus product data versus vendor data? I just see there’s a need for governance.”

Effective governance of Big Data may, however, require shifting the roles – but not the rules – of governance personnel, and it frequently involves different management technologies. The fundamentals of governance – metadata, stewardship, governance councils, etc. – remain largely the same.

The Rules

The pivotal point of Big Data Governance hinges upon the purpose of a particular organization’s Big Data initiative. Initiatives can focus on a variety of areas, from analytics to operations. Whatever an organization uses its Big Data for, it must implement formal accountability for recognizing (and approving) data sources, for who has access to the data, and for how the data is used.

In this respect, the speed and the amount of Big Data matter significantly less than the principles of governance – the formal rules that govern where data comes from and how it is used. Governance is used to orchestrate a chain of command regarding use, lineage, and the formal processes that ultimately increase data quality by designating accountability.

The facilitation of and adherence to such rules is one of the primary points of commonality in Data Governance regardless of data type. After the exploration and content analysis of Big Data, these rules must be established before that data can be incorporated into the enterprise. Seiner reflected on this universal requisite for governance programs.

“If we’re using Big Data for analysis we need to make sure that the people who are analyzing the data know where it came from, how it’s defined, how it can be used, and what the rules are that are associated with the data. But somebody who’s using Master Data or Business Intelligence data or Metadata needs the same thing for the data they use. It’s just a different type of data.”

From this perspective, it is crucial to note that the mainstays of Big Data – its volume and velocity – may require additional management, but not additional governance. The governance rules are fixed regardless of data type, but management techniques, which oftentimes hinge on technological applications, will certainly need to vary to account for these characteristics of Big Data.

The Roles

Another reason for the perceived interest in the term Big Data Governance pertains to the variety of data types. The vast majority of sentiment data on social media networks, for instance, is either semi-structured or unstructured and doesn’t fit neatly into conventional relational databases. Although there is a definite distinction between Big Data and unstructured data, many sources of Big Data are unstructured. More importantly, there is a concrete relationship between data’s structure, Data Governance, and the names of the roles of the individuals involved in governance.

Regardless of what those roles are called and how they are tailored to a particular organization and its Big Data governance needs, there are generally:

Data Definers: Individuals who authorize content, procure it from its source, and ensure that it meets the predefined standards mandated in governance rules.

Data Approvers: Those who evaluate data on a cross-enterprise basis to reduce the risk of a silo culture and ensure continuity between data standards across the various domains.

Data Council Members: Employees from a variety of domains across the enterprise who formally set the rules by which data is to be governed and address any issues, questions or concerns of the Data Approvers.

Executive Data Users: Individuals within business units who appoint representatives to the Data Council and have the final say regarding policy.

The Question of Metadata

The key Metadata issues of Big Data Governance probably pertain more to Data Management than to governance. In order to derive meaning and insight from data (especially unstructured data), organizations must analyze data types and structures, a task typically performed by Data Scientists. Doing so requires a number of different technologies for examining structure and analyzing data content.

Several technologies provide sandbox environments for Data Scientists, in much the same way that many technologies are developed for text analysis and for parsing data for meaning. The findings from this analysis substantially affect the definitions of Metadata, especially as they relate to congruence with existing Metadata terms. Seiner commented on the importance of Metadata to the governance of Big Data.

“If you’re doing Data Governance, you’re going to be collecting additional Metadata about the people associated with the data. So, Metadata itself becomes a by-product of Data Governance. But on the other side of the coin the Metadata itself has to be governed. Somebody has to determine what we’re going to collect, where we’re going to put it, and how we’re going to make it available. You’ve got to govern the Metadata just like you’ve got to govern the data.”

In addition to definitions, the specific categories of Metadata are largely determined by data structure. The sort of Metadata that’s preserved depends on the most pertinent factors for which a Big Data source is used, as well as how that information is derived from data structure. Metadata for videos regarding a company’s products, for instance, may include categories such as product type, favorability, length, source, and others.
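As a rough illustration of the video example, such a record could be modeled as a small data structure. This is only a sketch: the class name `VideoMetadata`, the field names and types, the favorability scale, and the sample values are all assumptions made for illustration, not terms taken from any particular tool or standard.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class VideoMetadata:
    """Hypothetical metadata record for a video about a company's product."""
    product_type: str    # which product the video discusses
    favorability: str    # assumed scale: "positive", "neutral", "negative"
    length_seconds: int  # running time of the video
    source: str          # where the video was obtained


# Sample values are invented for illustration only.
record = VideoMetadata(
    product_type="laptop",
    favorability="positive",
    length_seconds=312,
    source="video-sharing site",
)

# Convert the record to a plain dictionary, e.g. for storage in a catalog.
print(asdict(record))
```

Which fields belong in such a record is itself a governance decision: someone has to determine what is collected and how it is made available.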

Proprietorial Data

Governance of Big Data can also involve issues of propriety. This concern is particularly important when attempting to incorporate sentiment data from internet sources that are significantly less proprietorial than data obtained from an organization’s CRM, for instance. The terms and conditions for using data from such sources (Facebook, Twitter, etc.) are highly specific and may also vary according to state legislation. These concerns should be researched in advance and incorporated into governance rules. They should also be monitored frequently because internet-based data fluctuates so much, which may require additional stewardship roles.

Once such concerns are incorporated into governance rules and responsibilities, however, their governance is similar to that of most other data. Seiner discussed this perspective.

“I think any data you talk about for any organization is proprietary to them. I’ve worked with a lot of different companies and all of their data is proprietary to them. What’s the difference if it’s high volume data or otherwise? They still need to protect it, to secure it, to define it, and do all the same things they would need to do with all the data they have in their organization.”

Regulatory Issues and Privacy

Regulatory issues of Big Data pertain not only to specific sources of data and to regulatory agencies such as state and federal authorities, but also to notions of privacy. However, this particular point of governance for Big Data is more related to Data Management than to Big Data Governance. To account for privacy issues linked to Big Data, organizations would do well to invest in analytic software that monitors certain keywords and can thereby help protect their interests. Text analytics software (some of which varies by technology, such as that specifically designed for email) is effective for this purpose.
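The core idea of keyword monitoring can be sketched in a few lines. This is a minimal illustration rather than a stand-in for commercial text analytics software: the watch list, the function name `find_watched_terms`, and the sample message are all hypothetical.

```python
import re

# Hypothetical watch list; a real one would come from governance rules.
WATCHED_TERMS = ["social security number", "account number", "date of birth"]


def find_watched_terms(text, terms=WATCHED_TERMS):
    """Return the watched terms that appear in text, ignoring case."""
    found = []
    for term in terms:
        # Word boundaries keep a term from matching inside a longer word.
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


# Invented sample message for illustration only.
message = "Please confirm your Account Number and date of birth."
print(find_watched_terms(message))
```

A simple exact-phrase match like this misses misspellings and paraphrases, which is one reason dedicated text analytics tools are used in practice.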

Big Data Governance vs. Management

Most Big Data Governance concerns can actually be addressed by the proper management of Big Data and the technologies to do so. By forming effective governance rules (some of which must account for Big Data sources) and personnel roles – as well as researching and incorporating information regarding privacy, regulations, and non-proprietary sources into them – Big Data can be governed as efficiently as any other data source. Metadata concerns for the governance of Big Data largely pertain to data structure, which may take additional personnel and technologies to decipher and which determines the form of metadata.

But ultimately, the process by which Big Data is governed seems suspiciously similar to that for any other data. Seiner addressed this possibility.

“For Big Data, there are specific technologies that are used to manage, store and make available that volume of data. There’s a Big Data industry. Is there an unstructured data industry? Sure, there are a lot of different tools that are set up to manage unstructured data. But a lot of the disciplines between them are the same.”
