KEY POINTS
Seventy-one percent of chief marketing officers around the globe say their organization is unprepared to deal with the explosion of big data over the next few years, according to an IBM survey. They cited it as their top challenge, ahead of device fragmentation and shifting demographics.
The data tidal wave shows no signs of abating. By 2015, research firm IDC predicts there will be more than 5,300 exabytes of unstructured digital consumer data stored in databases, and we expect a large share of that to be generated by social networks. For context, one exabyte equals 1 million terabytes, and Facebook's databases ingest approximately 500 terabytes of data each day, roughly 500 times more than the New York Stock Exchange (NYSE) handles. Twitter stores at least 12 times more data each day than the NYSE.
"Unstructured" big data means data that is spontaneously generated and not easily captured and classified ("Structured" data is more akin to data entered into a form, like a user name might be, or generated as part of a pre-classified series, like the time stamp on a tweet.)
Machine learning, a branch of artificial intelligence (AI) — the study of how computer systems can be programmed to exhibit problem-solving and decision-making capabilities that emulate human intelligence — is helping marketers and advertisers glean insights from this vast ocean of unstructured consumer data collected by the world's largest social networks.
Advances in "deep learning," cutting-edge AI research that attempts to program machines to perform high-level thought and abstractions, are allowing marketers to extract information from the billions of photos, videos, and messages uploaded and shared on social networks each day. Image recognition technology is now advanced enough to identify brand logos in photos.
Audience targeting and personalized predictive marketing using social data are expected to be some of the business areas that benefit the most from mining big data — 61% of data professionals say big data will overhaul marketing for the better, according to Booz & Company.
INTRODUCTION
In the first installment of our three-part series on social big data, we looked at the different types of user data collected by each of the major social networks. We explained why marketers should consider these differences as they decide which social platforms are a better fit for their strategies.
But the reality of social media big data is that only a tiny fraction of its potential is currently being realized. In this follow-up report, we'll dive into innovations in interrelated fields like artificial intelligence and image recognition, which are quickly changing the way social big data is mined for insights and used in emerging marketing applications. These include audience clustering and predictive targeting.
Consumer Internet companies are in a race to build out their AI talent and acquire the most advanced machine-learning systems. Here are some of the major acquisitions and hires from the AI field that occurred in recent months:
Facebook launched a new research lab dedicated entirely to advancing the field of AI. Deep learning expert Yann LeCun is directing the efforts of the lab.
Google acquired DeepMind, a company that built learning algorithms for e-commerce, simulations, and games, for $400 million. DeepMind's 50 employees were considered to be among the most talented experts in the field of AI. Google also hired deep learning pioneer Geoff Hinton to improve products such as Android voice search.
LinkedIn acquired Bright, a company that focused on data- and algorithm-driven job matches, for $120 million — its largest acquisition to date.
Pinterest acquired VisualGraph, a company that specialized in image recognition and visual search. VisualGraph CEO Kevin Jing helped build Google's first machine vision application to improve image search.
Much of the value that social networks offer marketers and advertisers is still untapped, locked in what is known as unstructured data, the billions of user-generated written posts, pictures, and videos that circulate on social media.
The richness and nuances of human communication — whether audio, visual, or text-based — make it difficult to represent this type of information as data.
"Social data has a lot of challenging characteristics," explained Tim Barker, who is the chief product officer at DataSift, a company that uses machine-learning methods to work with social media data. "It's fast and high-volume, but behind every tweet, post, and comment is a customer."
New innovations are helping researchers make sense of all this unstructured data. For example, with the right technology, images can be analyzed to reveal an abundance of information about the people who uploaded them, as well as the context and objects they show. Keeping up with the massive volume of messages, photos, and videos that consumers upload and share on social networks every day is a task that only automated, intelligent machines can take on; it would simply be too great for human-directed, manual systems.
That's why artificial intelligence, and in particular a subset of AI known as "deep learning," is key to social media's future as an industry and as a force in society. In this report, we provide a brief explainer of what artificial intelligence is and how the various subsets of the field — including machine learning — are being applied to unlock all the value contained in social media data. We spoke with artificial intelligence experts to find out how advanced their systems are, and how different social networks are leveraging their powerful technologies. Finally, we explain in concrete terms how artificial intelligence will revolutionize social media marketing and advertising, and how it will improve predictive ad targeting, content personalization, social listening, and other specific applications.
Artificial Intelligence
The science of artificial intelligence, or AI, is based on the theory that human reasoning can be defined precisely enough for a machine to be programmed to mimic it.
In the 1950s, psychologist Frank Rosenblatt attempted to develop a kind of algorithmic brain called the Perceptron, which could classify basic inputs. Learning algorithms like the Perceptron, programmed to detect patterns in information, formed the beginnings of machine-learning research. Gradually, this line of research developed more complex algorithms.
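The perceptron's core mechanic, a weighted sum followed by a threshold, with the weights nudged after every mistake, can be sketched in a few lines of Python. The toy task, learning rate, and epoch count below are illustrative choices, not details of Rosenblatt's original work.

    import numpy as np

    def train_perceptron(X, y, epochs=20, lr=0.1):
        """Classic perceptron rule: adjust the weights only when the prediction is wrong."""
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(epochs):
            for xi, target in zip(X, y):
                prediction = 1 if np.dot(w, xi) + b > 0 else 0
                error = target - prediction      # -1, 0, or +1
                w += lr * error * xi             # nudge the weights toward the target
                b += lr * error
        return w, b

    # Toy task: learn the logical AND of two binary inputs.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 0, 0, 1])
    w, b = train_perceptron(X, y)
    print([1 if np.dot(w, xi) + b > 0 else 0 for xi in X])  # expected: [0, 0, 0, 1]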
But according to a faction of scientists, too much focus was placed on developing shortcuts that led to relatively crude brain-like behavior, rather than trying to emulate the way the brain and its complex neural systems actually work.
This community of researchers preferred the term "deep learning" for machine-learning systems that actually simulate human thought, which they believe hews closer to the original goal of artificial intelligence research.
A nice thumbnail history of deep learning and its uphill battle to gain recognition within the field of AI is recounted as part of a recent Wired magazine profile of deep-learning pioneer and current Google employee, Geoff Hinton.
Like the brain, deep-learning systems process information incrementally, beginning with low-level categories, such as letters, before working up to higher-level categories, such as words. They can collect, classify, and react to new information, teasing out meaning and arriving at conclusions without needing humans to step in and supply labels and category names.
The most advanced systems can also generalize from large data sets to predict outcomes.
This is the kind of technology that's needed to make sense of unstructured data, to parse meaning from contextual cues in text, to classify and recognize objects in photographs or video, and to identify voices and make sense of speech.
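To make the low-level-to-high-level idea concrete, here is a deliberately simple, hand-built analogy in Python: words are represented by their character trigrams (a low-level feature), and word similarity (a higher-level judgment) falls out of how many trigrams two words share. A real deep-learning system learns its layered features from data rather than having them specified by hand like this.

    from collections import Counter

    def trigrams(word):
        """Low-level features: overlapping three-character chunks, with padding."""
        padded = f"  {word} "
        return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

    def similarity(a, b):
        """Higher-level judgment: fraction of low-level features the two words share."""
        ta, tb = trigrams(a), trigrams(b)
        shared = sum((ta & tb).values())
        return shared / max(sum(ta.values()), sum(tb.values()))

    for pair in [("marketing", "marketer"), ("marketing", "sailing")]:
        print(pair, round(similarity(*pair), 2))  # the related pair scores higher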
Many industries — from education to medicine — are developing deep learning-style AI systems, but lately the consumer Internet industry's biggest names, like Facebook and Google, have woken up to the power of this technology. There are several use cases that are particularly compelling to social media marketers and advertisers, which we'll detail later.
Unlocking Image And Video Data
Social networking experiences are becoming increasingly centered around photos and videos, but it is extremely difficult to extract information from visual content. Because of this, image and video recognition are two of the most active and exciting areas of research in AI and deep learning.
Facebook users upload 350 million photos each day.
Snapchat users share 400 million "snaps" (Snapchat's term for photos and videos shared over the network) each day.
Instagram users upload 55 million photos each day.
This is just a small sample of the amount of visual content uploaded to social networks on a daily basis.
"I think that trend is specifically why Facebook hired Yann LeCun," explained Jim Hendler, who is an AI researcher at Rensselaer Polytechnic Institute.
LeCun is one of the foremost experts on deep learning and how it can be applied to image recognition. His convolutional neural network models for recognizing handwritten characters now power much of the technology that allows bank ATMs to read deposited checks. Facebook hired LeCun to head its new AI lab, which Mark Zuckerberg is tasking with modeling what Facebook users are interested in, so that the company can predict what consumers will do in the future.
Facebook isn't the only Internet company trying to decipher images. In an interview with Wired, LeCun said that Web giants such as Google and Baidu (a China-based search engine) are using a technique central to deep learning, known as "back-propagation," to classify images in user photo collections.
Back-propagation is a method for training a system to match images with labels or tags that users have already supplied. For example, if enough people upload photos tagged "cat," the system has a large enough sample of labeled examples to learn from, and it can then identify new photos of cats and tag them appropriately.
This is one of the reasons why services such as Facebook and Instagram encourage users to tag objects and people in photos.
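In rough terms, back-propagation compares the network's guess with the user-supplied tag and pushes the resulting error backward through the network's layers, adjusting the weights a little on every pass. The sketch below trains a tiny two-layer network on synthetic "tagged" examples; the data, network size, and learning rate are all invented for illustration and have no connection to Facebook's or Google's production systems.

    import numpy as np

    # Synthetic "tagged" data: in practice the inputs would be image features and
    # the labels user-supplied tags such as "cat"; here both are made up.
    rng = np.random.default_rng(0)
    X = rng.random((200, 4))                                     # 200 examples, 4 features each
    y = (X[:, 0] + X[:, 1] > 1.0).astype(float).reshape(-1, 1)   # pretend tag: 1 or 0

    W1 = rng.normal(scale=0.5, size=(4, 8))   # input -> hidden weights
    W2 = rng.normal(scale=0.5, size=(8, 1))   # hidden -> output weights

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for step in range(10000):
        # Forward pass: the network guesses a tag for every example.
        hidden = sigmoid(X @ W1)
        output = sigmoid(hidden @ W2)

        # Backward pass: the gap between guess and tag is propagated back
        # through the layers to work out how much each weight is to blame.
        d_output = (output - y) * output * (1 - output)
        d_hidden = (d_output @ W2.T) * hidden * (1 - hidden)

        # Nudge both weight matrices downhill along the error gradient.
        W2 -= 2.0 * (hidden.T @ d_output) / len(X)
        W1 -= 2.0 * (X.T @ d_hidden) / len(X)

    # Accuracy on the toy task should end up well above the 50% chance level.
    accuracy = ((output > 0.5) == (y > 0.5)).mean()
    print(f"training accuracy after back-propagation: {accuracy:.2f}")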
However, there is very little user-generated data identifying the contents of online video, which so far means that back-propagation is a poor method for video recognition.
As LeCun explains in the Wired article, object recognition within video is a far more complicated proposition. Researchers have to help their AI systems make sense of moving images. It's easy for a human to see and understand what's happening in a video. For example, a human will easily distinguish that a certain object in a video is in the foreground, and another is in the background. But that's not obvious to a computer.
Researchers have to train the machine-driven systems to isolate an object by detecting that it's not moving, model its dimensions, and make sense of that shape.
In essence, this requires researchers to invent new video-parsing algorithms that learn through trial and error, LeCun tells Wired.
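As a toy illustration of the static-object cue described above, the sketch below flags pixels that barely change from frame to frame as static and reads off the bounding box and size of the static region. The "video" is a synthetic stack of noisy frames with one fixed block; real video-parsing systems are vastly more sophisticated.

    import numpy as np

    # Synthetic "video": 10 frames of low-level flicker, plus one block of pixels
    # that stays exactly the same in every frame.
    rng = np.random.default_rng(1)
    frames = rng.random((10, 64, 64)) * 0.05     # flickering background
    frames[:, 20:40, 20:40] = 0.8                # a static 20x20 object

    # A pixel whose value essentially never changes between consecutive frames is a
    # candidate static-object pixel; everything else is treated as motion or noise.
    change = np.abs(np.diff(frames, axis=0)).max(axis=0)
    static_mask = change < 1e-3                  # threshold chosen for this toy data

    ys, xs = np.where(static_mask)
    print("static region:", (ys.min(), xs.min()), "to", (ys.max(), xs.max()),
          "size:", (ys.max() - ys.min() + 1, xs.max() - xs.min() + 1))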
How Visual Data Helps Marketers
By deciphering image- and video-based data, marketers will become more effective and comprehensive in their "social listening" efforts. Large companies spend a great deal of money monitoring people's attitudes toward a specific brand or product, yet despite all the photo and video sharing happening on social media, these formats have largely been invisible to their analytics tools.
"Applying machine learning to text is hard enough, but applying it to images takes it to the next level," says Tim Barker of DataSift. "Image recognition technology has made huge strides these past few years. We've done a pretty thorough review of the market, and image recognition technology is now advanced enough to allow us to reliably identify brand logos in photos. Precision and recall is a very challenging problem in machine learning, and as data scientists, we don't want to be sending brands false positives as a result of faulty image recognition technology."
Facial recognition isn't quite there yet as a technology, adds Barker. It's far more in the public eye since people are worried about the privacy implications, but the reliable identification of faces is still some ways off.
Jim Hendler agrees with Barker's assessment of where image recognition is today: "Image recognition technology is making some major strides. There are some new techniques out there in practice that can find and identify things in photos. They are not trying to figure out, okay this is a photo of Bill sitting down and eating, but what they are trying to decipher is if this photo seems to be about Bill in a social situation or is this a business's logo."
Soon, marketers will be able to measure brand sentiment by analyzing images that contain their logo or merchandise. Brands will also be able to track how images associated with their brand are being shared across social networks — how often and by what sorts of audiences.
Marketers and advertisers will also gain a better understanding of what characteristics of a photo or video influence its virality. For example, images with a high level of texture (such as fabric) generate 79% more likes on Instagram than images of smooth objects (such as ice cream), according to Curalate, which used machine-learning techniques to analyze more than 8 million photos shared on Instagram.
Conversely, Curalate found that on Pinterest, images of smooth objects generate 17 times more repins on average than photos of highly textured objects.
We know brands need to personalize content for each specific digital channel. Machine learning helps us understand what the fundamental drivers of engagement are on each social network.
Gauging Brand Sentiment
Another challenge with social data is that it's fragmentary and cryptic. The social media posts that generate the most interactions on Facebook, Google+, and Twitter are only around 10 to 20 characters long (just a few words, usually because they're accompanied by some form of media, such as an image or video), according to social media analytics firm Quintly.
But this tendency toward very short posts poses a problem for machine-learning systems. The more characters and words a system has to ingest, the easier it is to identify clusters and correlations in how sentences are strung together to form meaning; a post of only a few words gives it very little to work with.
"The problem with machine learning on social media sites, particularly Twitter and Facebook [is that] the messages tend to be very short, and it's difficult to extract information such as user interests and preferences from these posts," explains Jim Hendler.
"So a lot of those platforms are relying on the network effect," Hendler adds. "What I mean by that is Twitter is saying, okay, John is following Sally on Twitter, so we can draw some correlations between their interests and then build a custom network from those connections."
Text mining is another field that's rapidly evolving. A team of Belgian computer science researchers developed what they call an opinion mining algorithm that can identify positive, negative, and neutral sentiment in Web content with 83% accuracy for English text. Accuracy fluctuates depending on the language of the text because of the variety of linguistic expressions. The more complex a language is, the more training a machine-learning system requires.
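The general shape of such an opinion-mining system can be sketched with off-the-shelf tools. The example below is not the Belgian team's algorithm or DataSift's technology; it is a minimal bag-of-words classifier (scikit-learn's CountVectorizer plus naive Bayes) trained on a handful of invented posts, whereas a production system would need many thousands of labeled examples per language.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Tiny, invented training set of labeled posts.
    posts = [
        "love this phone, the camera is amazing",
        "best customer service I have ever had",
        "totally disappointed, the battery died in a day",
        "worst purchase of the year, avoid",
        "the package arrived on Tuesday",
        "the store opens at 9am according to the website",
    ]
    labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

    # Bag-of-words features plus naive Bayes: a simple baseline for opinion mining.
    model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
    model.fit(posts, labels)

    print(model.predict([
        "amazing camera, love it",
        "the battery is the worst, very disappointed",
    ]))  # should label the first post positive and the second negative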
Although machine-learning systems are often tasked with analyzing users on an individual basis, the value to marketers comes from grouping like-minded consumers together so that they can target people at scale.
Clustering Like-Minded Consumers
"So, we begin to look at information on two different levels," explains Hendler. "The first level is understanding who are these people as individuals — are they real, are they spam bots, and what do they talk about?"
Filtering out spam and spambots is another problem machine-learning systems are being applied to, because incoherent messages confuse a system that is trying to categorize content and infer sentiment. Fortunately for social networks, email providers have been working on spam filtering for years, but spam remains a problem for some services, such as Twitter, where it is simply part of the conversation on the network.
AI experts tell us that Twitter has gotten "pretty good" at detecting spam.
Once a system has filtered out spambots, it can begin building a network of information about human users based on certain parameters such as interests.
"The second level is grouping similar individuals together so that you have a network of say, 20,000 people, and then a machine-learning system will have enough raw data to accurately draw some conclusions about those people. At that point, we begin to move away from studying graph analytics and into a new study of understanding how large networks of people interact with each other," said Jim Hendler.
This part of Hendler's explanation has major implications for advertisers who want to target large groups of consumers by interests.
If AI-driven techniques are effective enough, advertisers won't have to rely on Facebook users listing "skiing" or "travel" as interests in their profile descriptions. Instead, they'll be able to dive into social media conversations and target large groups of people, at scale, who are actively interested in a given pastime or product category.
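A stripped-down version of that kind of interest-based grouping can be sketched as follows: represent each user by the language in their recent posts and let a clustering algorithm group those who talk about similar things. The four users, their posts, and the cluster count below are invented for illustration; this is not the graph-based network analysis Hendler describes, and it is nowhere near the scale of a real system.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    # Each entry concatenates a user's recent posts; all text is invented.
    users = {
        "user_a": "fresh powder on the ski slopes, already planning the next ski weekend",
        "user_b": "booked a ski trip to the alps, ski boots arrived yesterday",
        "user_c": "baked a sourdough loaf today, the crumb on this bread is perfect",
        "user_d": "trying a rye bread recipe, sourdough starter is bubbling nicely",
    }

    # Turn each user's text into a weighted word-count vector, then group users
    # whose vocabulary overlaps. k=2 is chosen by hand for this toy data.
    vectors = TfidfVectorizer(stop_words="english").fit_transform(users.values())
    clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

    for name, cluster in zip(users, clusters):
        print(name, "-> interest cluster", cluster)
    # user_a and user_b (skiing) should share one cluster, user_c and user_d (baking) the other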
Predicting Consumer Actions
Despite all the evident growth in social media advertising, some advertisers and agencies still lack confidence in its performance. One-third of advertisers and one-quarter of agencies believe social media advertising is a "promising new tactic, but its effectiveness is unknown," according to a recent Nielsen survey.
One obvious reason for marketers' lack of confidence in social media advertising is that they realize that many consumers have already made up their mind about a purchase decision by the time they first interact with a brand online.
"Sixty percent of a consumer's journey in deciding whether they want to purchase a product is already decided by the time he or she interacts with a brand," estimates Tim Barker of DataSift.
Currently, much social media advertising is based on past user actions. Facebook serves users ads based on their previous actions and data shared on the network and elsewhere on the Web. They're certainly personalized ads, as Facebook argues when it pitches its ad products, but they're based on where someone has been rather than on where they're going.
But what if social networks become good enough at predicting which products or services customers will be interested in purchasing in the future? Then the ads shown to users could be personalized and predictive, rather than trying to grab a consumer's attention once the purchase process has already begun and their mind may already be made up.
If Facebook can accurately predict what a user will become interested in some time in the future, then that will add significantly more intrinsic value to its ad products. As we mentioned earlier, this is one of Mark Zuckerberg's core objectives for Facebook's new AI lab. Google, Pinterest, and other Web giants are also leveraging AI techniques to improve the performance of their ads.
Imagine an image recognition system working together with deep-learning models to parse your Facebook feed, realizing that you've just graduated from college and have become interested in sailing and Polynesian history. One day you might see an ad on Facebook for a sailing adventure tour through French Polynesia. Users might feel a knee-jerk suspicion at the idea of self-correcting algorithms mining their photos and thoughts for insights, but presumably they'll be happier seeing ads they're actually interested in, rather than spammy banner ads for weight-loss programs and random online learning courses.
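Here is a heavily simplified sketch of that kind of predictive targeting: train a model on how users who did or did not respond to a past campaign behaved beforehand, then score a new user. The features, figures, and labels below are entirely invented, and a real system would draw on far richer signals than a handful of activity counts.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Invented per-user activity features:
    # [travel posts liked, sailing photos shared, friends who sail, days since graduation post]
    X_train = np.array([
        [12, 5, 3,  20],
        [ 9, 7, 4,  35],
        [15, 9, 6,  10],
        [ 1, 0, 0, 400],
        [ 2, 1, 1, 365],
        [ 0, 0, 1, 500],
    ])
    # 1 = engaged with a past sailing-trip ad, 0 = did not (made-up outcomes).
    y_train = np.array([1, 1, 1, 0, 0, 0])

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Score a new user: the estimated probability they'll respond to a similar campaign.
    new_user = np.array([[11, 6, 2, 30]])
    print("predicted interest score:", round(model.predict_proba(new_user)[0, 1], 2))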
If they fulfill their promise, developments in artificial intelligence and big data mining will transform social media advertising from a nascent industry into a powerful channel for targeting consumers and converting Internet users into customers.
THE BOTTOM LINE
Seventy-one percent of chief marketing officers around the globe believe their organization is unprepared to deal with the explosion of big data over the next few years, according to an IBM survey.
By 2015, research firm IDC predicts there will be more than 5,300 exabytes of unstructured digital consumer data stored in databases, and we expect a large share of that to be generated by social networks. For context, one exabyte equals 1 million terabytes, and Facebook's databases ingest approximately 500 terabytes of data each day.
But 90% of big data is "unstructured," meaning it's spontaneously generated and not easily captured and classified.
Artificial intelligence (AI) and deep learning — the study of how computer systems can be programmed to exhibit problem-solving and decision-making capabilities that emulate human intelligence — are helping marketers and advertisers glean insights from this vast ocean of unstructured consumer data.
Targeted and personalized marketing using social data is expected to be among the business areas that benefit the most from mining big data — 61% of data professionals say big data will overhaul the practice for the better, according to Booz & Company.