Iphoneeinstein.com

Industry voice: Why leveraging metadata is vital to getting more out of big data

2014-11-05

Companies in virtually every industry are working feverishly to capitalise on big data, and harnessing its power across various lines of business has become the Holy Grail for information technologists. As big data repositories continue to scale both in number and size, so does the game-changing potential for organisations to effectively tap into and manage them.

Whether the ultimate goal is to enhance business strategies and processes, or to gain a competitive advantage in sales and marketing activities, big data can certainly be a catalyst for big results. But big data by itself does not represent actionable business intelligence. In order to yield tangible outcomes, big data must be easily searched, retrieved, analysed and consumed across the enterprise for a variety of applications – a process that can be highly streamlined and enhanced by using metadata.

Metadata, often referred to as "data about the data", can help companies better leverage, manage and ultimately harness the vast information resources that typically reside in multiple systems in order to reach organisational goals. Simply put, a metadata-driven approach to enterprise information management can help organisations achieve high returns on their initiatives where fast access to precise content that resides in large and diverse big data repositories is of paramount importance.

Why metadata matters

Metadata is the set of attributes, properties and tags that describe and classify information. It can capture virtually any distinguishing characteristic of an information asset (type of asset, author, date created, workflow state, and so forth). Once defined, metadata helps expose the value and purpose of the content, and it becomes an effective tool for organising and quickly locating information.

In looking at the value of metadata to get more out of big data, it helps to quickly review its evolution. When applications that leveraged metadata for classifying and organising information first emerged, metadata was mainly used to add keywords to the content.

This need is diminishing because indexing technologies and text analytics tools have evolved substantially during the last few years. While adding the keywords from (text) content to the document metadata is mostly just redundant work these days, adding descriptive metadata that does not directly exist within the content plays an important role in more effectively managing information. For example, while text analytics tools may determine that a proposal pertains to Customer X, they cannot identify whether or not the customer ultimately accepted the proposal.

This status attribute serves as critical business intelligence, helping sales reps pinpoint which proposals translated to successful sales results. When metadata is tied to search algorithms, users can generate highly precise results. This is particularly beneficial in big data scenarios, where standalone keyword-driven results may include an abundance of less relevant information. By leveraging metadata, users can quickly locate the right document, despite the vast amount of content residing within their repositories.
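To make the difference concrete, here is a minimal Python sketch of a keyword search narrowed by metadata. The Document class, the search function and the sample proposals are hypothetical illustrations, not part of any particular product; the point is only that a status attribute, which never appears in the text, can cut a broad keyword match down to the proposals that were actually accepted.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A document plus descriptive metadata that is not present in its text."""
    title: str
    text: str
    metadata: dict = field(default_factory=dict)


def search(docs, keyword, **required):
    """Return docs whose text contains `keyword` and whose metadata
    matches every key/value pair passed in `required`."""
    hits = []
    for doc in docs:
        if keyword.lower() not in doc.text.lower():
            continue
        if all(doc.metadata.get(k) == v for k, v in required.items()):
            hits.append(doc)
    return hits


# Hypothetical sample repository.
docs = [
    Document("Proposal A", "Proposal for Customer X: annual support",
             {"type": "proposal", "customer": "Customer X", "status": "accepted"}),
    Document("Proposal B", "Proposal for Customer X: hardware refresh",
             {"type": "proposal", "customer": "Customer X", "status": "rejected"}),
    Document("Meeting notes", "Notes from a call with Customer X",
             {"type": "memo", "customer": "Customer X"}),
]

# Keyword alone matches every document that mentions the customer.
keyword_only = search(docs, "customer x")

# Adding metadata conditions leaves only the accepted proposal.
accepted = search(docs, "customer x", type="proposal", status="accepted")
```

In this toy repository the keyword alone returns all three documents, while the metadata-qualified query returns just the one accepted proposal.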

Netflix, one of the most successful services for entertainment enthusiasts, offers insight into the power of metadata. Netflix employs teams to curate a comprehensive set of metadata for each title in its database. With all this information, Netflix can identify programming preferences for its viewers based on viewership history.

Similar developments are occurring in enterprise systems: metadata is being applied to help search algorithms better understand users' past behaviour and their connection to the files in their organisation's repositories, applications and databases.

And because metadata exists for all structured data throughout an organisation, such as information that resides in CRM, ERP and other database systems, it can serve as the bridge that connects this structured data with the unstructured content (Microsoft Office documents, PDFs, media files) it relates to. Managing both structured data and unstructured content in one system allows users to gain better insights into data assets.

The big data imperative

A few years ago, Gartner predicted that enterprise data growth would reach 650% in five years, and that 80% of that data would be unstructured. IDC has since stated that the amount of information in the world doubles every 18 months. There is no denying that companies are producing more and more content faster than ever before, and this information often resides in disparate and disconnected business systems and repositories.

While the upsides to effectively harnessing big data are both well-touted and understood, organisations using traditional folder-driven systems for information management are more susceptible to its pitfalls. Employees waste significant amounts of time every day searching for the information they need, with Gartner estimating that it takes professionals an average of 18 minutes to locate each document. In addition, duplicate documents and versions creep across folders and systems, threatening data quality and business outcomes.

With IDC estimating that time wasted searching for corporate information costs an organisation more than US$19,000 (around £12,000, AU$22,000) per information worker per year, it is safe to assume that the larger the organisation and its data repositories, the greater the potential negative impact, based on preventable inefficiencies and lost productivity.

However, a metadata-driven approach can help organisations extract greater value from their big data, while tackling the inherent challenges of data scale…

Quick access to the right information

Dynamic business models and fierce competition require that companies strive to work increasingly faster and smarter. This requires users to be able to find the documents they're looking for instantly, with greater accuracy, and in a manner that is personalised and relevant to them. For example, using the expiry date attribute of agreements, users can quickly locate agreements that are set to expire in the next 90 days. Alternatively, metadata can help to identify content that does not yet exist in the repository. This approach can be used to control work orders so that the job is not started before employees' qualification documents have been delivered.
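Both ideas can be sketched in a few lines of Python. The field names (expiry_date, required_qualifications and so on) and the sample records are assumptions made for illustration only, and the date is fixed so the example is repeatable.

```python
from datetime import date, timedelta


def expiring_within(agreements, today, days=90):
    """Agreements whose expiry date falls between today and today + days."""
    cutoff = today + timedelta(days=days)
    return [a for a in agreements if today <= a["expiry_date"] <= cutoff]


def missing_qualifications(work_order, documents):
    """Required (employee, qualification) pairs with no document on file."""
    on_file = {
        (d["employee"], d["qualification"])
        for d in documents
        if d.get("type") == "qualification"
    }
    return [
        (employee, qualification)
        for employee in work_order["employees"]
        for qualification in work_order["required_qualifications"]
        if (employee, qualification) not in on_file
    ]


# Hypothetical sample data.
today = date(2014, 11, 5)
agreements = [
    {"name": "Lease", "expiry_date": date(2014, 12, 31)},
    {"name": "Support contract", "expiry_date": date(2015, 6, 30)},
    {"name": "NDA", "expiry_date": date(2014, 10, 1)},  # already expired
]
expiring = [a["name"] for a in expiring_within(agreements, today)]

work_order = {
    "id": "WO-1",
    "employees": ["Ana", "Ben"],
    "required_qualifications": ["welding"],
}
documents = [
    {"type": "qualification", "employee": "Ana", "qualification": "welding"},
]
missing = missing_qualifications(work_order, documents)

# The job may start only when nothing required is missing.
can_start = not missing
```

Here only the lease falls inside the 90-day window, and the work order stays blocked because Ben's welding qualification has not been delivered.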

Eliminating information silos and connecting relevant content

In the era of big data, organisations need to bring order to their information chaos. Massive amounts of structured data and unstructured content often reside within multiple and disconnected platforms, applications, locations and devices. This makes it increasingly difficult for employees to find and access the information required to do their jobs.

Using metadata as a bridge to connect structured data with unstructured content, organisations can eliminate information silos across different business systems (ERP, CRM, etc.), departments and devices. Regardless of where the data resides, it can be accessed and synced across various systems and devices with no duplication of content. To this end, metadata breaks down the barriers between companies and their information, and structured data and unstructured content are then freed from the confines of applications, platforms and information silos.

For example, when a company leverages metadata to link its CRM system (structured data application) to its unstructured content repository, the sales force can access proposals, contracts and purchase orders directly from within the CRM system instead of having to navigate to the folder where these documents reside.
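As a rough sketch of that link, assume each document carries a customer_id attribute that matches the key of a CRM account record. The identifiers, file names and helper functions below are hypothetical; a real integration would use whatever keys the CRM system and the repository actually share.

```python
from collections import defaultdict

# Structured data: hypothetical CRM account records keyed by account ID.
crm_accounts = {
    "C-001": {"name": "Customer X"},
    "C-002": {"name": "Customer Y"},
}

# Unstructured content: files described by metadata, wherever they are stored.
documents = [
    {"file": "proposal.docx", "type": "proposal", "customer_id": "C-001"},
    {"file": "contract.pdf", "type": "contract", "customer_id": "C-001"},
    {"file": "po-17.pdf", "type": "purchase order", "customer_id": "C-002"},
]


def index_by(docs, key):
    """Group documents by the value of one metadata attribute."""
    index = defaultdict(list)
    for doc in docs:
        if key in doc:
            index[doc[key]].append(doc)
    return index


by_customer = index_by(documents, "customer_id")


def documents_for_account(account_id):
    """File names related to a CRM account, found via shared metadata."""
    return [d["file"] for d in by_customer.get(account_id, [])]


files_for_x = documents_for_account("C-001")
```

The lookup depends only on the shared attribute, not on which folder or system holds each file.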

Using metadata, organisations can also create associations and relationships between various types of information across one or more repositories or related applications. These relationships establish relevancy and paint a 360-degree view of all the documents, processes and team members related to that information asset based on its metadata. This is extremely helpful when performing root cause analysis, as the operating procedures are automatically linked to the deviations, corrective actions and new learning requirements for the employees.

Best metadata-driven practices

As with any company-wide initiative designed with success in mind, a metadata-driven approach for managing big data requires a plan. Organisations should define processes and create templates to support employees in the development and management of information assets throughout their lifecycles. Metadata can serve as the foundation for maintaining information consistency and data quality, streamlining workflow capabilities, and protecting sensitive information, particularly when compliance is at stake.

Automate business processes

To unlock the full potential of metadata, organisations must look beyond simple search and navigation and identify how it can be leveraged for streamlining business processes and workflows.

Metadata-driven workflows can serve to maintain consistency and quality in documentation while ensuring that employees follow defined processes, and that these processes are executed in a more seamless and efficient manner. By leveraging metadata to execute content-centric workflows, companies can:

Assign tasks to employees, and track the status and state of all assignments

Be notified when materials have been edited or modified

Ensure that important documentation has been reviewed and approved by the appropriate individuals before it is published
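The three capabilities above can be illustrated with a small state machine driven by a workflow_state attribute. This is a simplified sketch: the state names, the approvers attribute and the notify callback are assumptions chosen for the example, not a description of any specific workflow engine.

```python
# Allowed transitions between hypothetical workflow states.
ALLOWED = {
    "draft": {"in review"},
    "in review": {"draft", "approved"},
    "approved": {"published"},
    "published": set(),
}


def transition(doc, new_state, user, notify):
    """Move a document to a new workflow state, enforcing the process."""
    current = doc["workflow_state"]
    if new_state not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current!r} to {new_state!r}")
    if new_state == "approved" and user not in doc["approvers"]:
        raise PermissionError(f"{user} may not approve {doc['title']!r}")
    doc["workflow_state"] = new_state
    doc["last_changed_by"] = user
    # Tell interested parties that the document has changed state.
    notify(f"{doc['title']}: {current} -> {new_state} by {user}")


doc = {"title": "Contract", "workflow_state": "draft", "approvers": {"dana"}}
events = []

transition(doc, "in review", "sam", events.append)

# Publishing before approval is rejected.
try:
    transition(doc, "published", "sam", events.append)
    blocked = False
except ValueError:
    blocked = True

transition(doc, "approved", "dana", events.append)
transition(doc, "published", "sam", events.append)
```

Because the current state lives in metadata, the same attribute can also be used to search for, say, every document still waiting in review.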

To get started, business departments should identify the most commonly used information assets and create document templates for each asset, leveraging agreed-upon metadata semantics. Examples may include proposals, contracts, invoices, product information or any business document that requires one or more individuals to review and/or approve it. These templates can automatically populate metadata attributes while ensuring consistency and accuracy in how content is described and processed, which is particularly valuable as big data gets even bigger.
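One way to picture such a template is as a set of default attributes plus a list of attributes the author must supply. The template names, fields and the new_document helper below are hypothetical and only meant to show the idea.

```python
# Hypothetical templates: defaults are filled in automatically,
# required attributes must be provided when the document is created.
TEMPLATES = {
    "proposal": {
        "defaults": {"type": "proposal", "workflow_state": "draft"},
        "required": ["customer", "author"],
    },
    "invoice": {
        "defaults": {"type": "invoice", "workflow_state": "draft"},
        "required": ["customer", "amount"],
    },
}


def new_document(template_name, **values):
    """Build the metadata for a new document from a template."""
    template = TEMPLATES[template_name]
    metadata = {**template["defaults"], **values}
    missing = [k for k in template["required"] if k not in metadata]
    if missing:
        raise ValueError(f"missing metadata: {missing}")
    return metadata


proposal = new_document("proposal", customer="Customer X", author="sam")
```

Every proposal created this way is described with the same attribute names, which is what keeps later searches and workflows consistent.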

Protect and secure confidential information

Content creation has become a highly collaborative process, adding a new layer of complexity to the management of roles and permissions. As such, the need to secure confidential information within big data repositories and systems has evolved past antiquated, folder-based security models. A metadata-driven approach allows organisations to assign and derive a document's final access control settings from its metadata. With the ability to dynamically establish access permissions to sensitive content, metadata can be used to help ensure that confidential data is protected from individuals who do not have access rights.
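A minimal sketch of deriving access rights from metadata follows. The attributes (author, department, confidentiality), the user directory and the rule itself are illustrative assumptions; real policies would be defined by each organisation.

```python
def allowed_readers(metadata, directory):
    """Derive the set of users who may read a document from its metadata.

    Assumed rule: the author can always read; confidential documents are
    limited to managers of the owning department; anything else is open
    to the whole owning department.
    """
    readers = {metadata["author"]}
    department = metadata.get("department")
    for user, info in directory.items():
        if info["department"] != department:
            continue
        if metadata.get("confidentiality") == "confidential":
            if info["role"] == "manager":
                readers.add(user)
        else:
            readers.add(user)
    return readers


# Hypothetical user directory.
directory = {
    "ana": {"department": "HR", "role": "manager"},
    "ben": {"department": "HR", "role": "staff"},
    "dee": {"department": "HR", "role": "staff"},
    "cy": {"department": "Sales", "role": "staff"},
}

doc = {"author": "ben", "department": "HR", "confidentiality": "confidential"}
confidential_readers = allowed_readers(doc, directory)

# Changing one attribute changes who can read, with no folder moves.
doc["confidentiality"] = "internal"
internal_readers = allowed_readers(doc, directory)
```

The permissions follow the document's attributes, so reclassifying it is a metadata change rather than a move between secured folders.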

Many policies related to access permissions vary from business to business, while others are mandated by compliance requirements. Using a metadata-driven approach, information pertaining to the development and lifecycle of a document is captured at a granular level. As a result, permissions, audit trails and event logs are preserved, helping organisations verify compliance with defined access control policies.

How will you leverage big data?

From government agencies to Global 1000 companies, organisations that have already begun to unlock the value of big data have also inspired new ways of thinking about enterprise information management. We've reached a point today where the question is no longer, "When will you leverage big data?" but rather "How will you do it?"

And metadata is fast emerging as the foundation that can help harness the information that resides across business systems, providing a wide range of quantifiable benefits, regardless of an organisation's size or core objectives.

Mika Javanainen is Senior Director of Product Management at M-Files Corporation. Javanainen is in charge of managing and developing the M-Files product portfolio, roadmaps, and pricing globally.