Say you’re a commodities trader, and you buy cereals for Kellogg’s. Or you’re Wal-Mart, and you make millions of retail transactions every minute. How do you make sense of all the information your business is collecting?
This is big data — assembling the large volumes of information that every business collects, using analytics to determine useful patterns within it and then making key decisions from that information. For instance, Kellogg’s may review cereal price data over time to cut down its supply-chain pricing. Or Wal-Mart may collect purchasing and shelving data to know when to stock which shelves with what particular merchandise in order to reduce waste and maximize sales.
“This is one of the most practical innovations in computing I’ve ever seen,” says Jamie Friedman, a senior analyst with Susquehanna Financial Group LLP of Bala Cynwyd, Pa.
Just a decade ago, big data was mostly the province of federal defense and intelligence agencies, but in the last few years it has become a growing segment of the technology business, driven by technological innovations and the rapid proliferation of ways that users are collecting and creating data.
Northern Virginia’s technology corridor contains some of the nation’s biggest players in big data, Friedman says, including Falls Church-based Computer Sciences Corp. (CSC), Reston-based Leidos and Booz Allen Hamilton in Tysons Corner. Companies headquartered elsewhere, such as Palo Alto, Calif.-based VMware and Massachusetts-based EMC Corp., also have major big data operations in Northern Virginia because of its proximity to federal agencies in the Washington, D.C., area.
Big opportunities
“The business applications for big data are enormous. Since we are in the shadows of the [federal] government, [which is] the largest producer of big data, the business opportunities in Virginia in big data are enormous, whether it’s in government or health care,” says Suresh V. Shenoy, executive vice president at Reston-based Information Management Consultants Inc. (IMC). “We are also fortunate to be the Internet hub of the rest of the world. We have more Internet traffic going through Ashburn, Va., than anywhere else in the world. We are the crossroads for all data traffic and … we’re just scratching the surface.”
In the last year CSC acquired two firms specializing in big data processing and analytics: 42Six Solutions LLC, a U.S. defense and intelligence contractor based in Columbia, Md., and Austin, Texas-based Infochimps. CSC won’t comment on the acquisitions or its big data work, but Friedman says the company, which brought on new CEO Mike Lawrie in 2012, was on target to do as much as $1.5 billion in big data-related business this year before its most recent acquisition. A publicly traded global tech company with 87,000 employees worldwide, CSC has annual revenues of nearly $14 billion.
With the advent of mobile computing, social media and a new generation of multimedia digital content, “a new generation of computing is capturing unstructured data such as audio, video, sensors, shapes,” Friedman says. “So this is a huge technological challenge and what CSC has the ability to do is to make sense of this information that their customers are capturing so that managers can use it to inform their decisions.”
CSC is also said to be working on the “holy grail” of big data: object-oriented computing. “Anyone can [process] rows and columns in a database, but object-oriented computing is a huge challenge,” Friedman says. For instance, he says, “a suitcase in an airport is completely benign until someone steps away from it, so how do you get the camera to [recognize] that?”
Focus on the four V’s
Big data focuses on four V’s: volume (the amount of data collected); velocity (the speed of streaming data, which is being expanded by broadband delivery); variety (the types of data, which can be anything from spreadsheets and emails to multimedia content, social media posts or reports from sensors in cars and a panoply of other “smart” machines); and veracity (the accuracy and security of data), explains David Logsdon, senior director, federal civil public sector, for the TechAmerica technology trade association, which has worked on big data initiatives with the Obama administration.
Social media in particular represent “very much an unmined gold mine,” says Logsdon, because of the sheer volume of information they produce globally every day. New big-data applications are using social media information to help health-care providers and drug manufacturers track flu outbreaks, for instance, he says, and “the intelligence community … is realizing that they can look to social media to identify trends quicker than they have been able to do in the past, and they’re still trying to figure out how to do that.”
Virginia technology consulting and services firm Booz Allen Hamilton also is seeing booming business growth in big data among government and corporate clients. A publicly traded company, Booz Allen brought in $5.8 billion in revenues during the fiscal year ending last March and employs 26,000 workers worldwide.
“Our clients are using big data technologies and data science techniques … to do very advanced analytics to help them run their businesses and improve … to gain insight into how to do things differently or better,” says Peter Guerra, a principal in Booz Allen’s Strategic Innovation Group.
For example, he says, “we were asked by a large manufacturer to use big data and data science techniques to determine how they could increase the efficiency of yield of the product they were making, and our analysis allowed them to get a much better efficiency in their yield, which resulted in millions of dollars of savings throughout their process.
“When organizations embrace big data technologies and analytics, they really get to data-driven decisions. They can model what they want to do and better understand all the different factors to make decisions about how to move their businesses forward.”
Room for small operators
Even smaller technology contractors have entered the big data space. IMC is working on a federal contract to prevent fraud in the federal Lifeline program, which provides discounted phone service and prepaid cellphones to low-income people.
The company used big data techniques to uncover criminal fraud and waste in a Florida health-care system. “We found $8 [million] or $9 million worth of collusion and fraudulent activity,” Shenoy says. IMC also used analytics and big data to learn that a cabinet contractor hired by a hospital in the system was a twice-convicted pedophile. “He had changed his name a couple of times and moved to Florida and started his cabinet company. The CFO of the company was blown away. The liability for the hospital had something gone wrong would have been enormous.”
As adoption of big-data practices increases, so does the need for skilled workers. “They’re very much in demand,” says Logsdon, noting that a recent report by the Obama administration’s federal big-data commission called on the nation’s colleges and universities to offer more data sciences programs.
Last year, the University of Virginia founded its Big Data Institute, which is encouraging interdisciplinary approaches to big data across the university’s fields of study. U.Va. also is offering a new master’s degree in data sciences. The first students in the program will graduate in 2015.
“It’s hard to imagine a field right now that isn’t experiencing a tremendous growth in the amount of data that’s being collected,” says the institute’s director, Don Brown, the university’s William Stansfield Calcott Professor of Engineering and Applied Science.
Leidos has donated funding to the institute to study the creation of cloud infrastructures that can handle large amounts of personal health information. Lockheed Martin also has a partnership with the university around big data, and Amazon is in talks with the institute.
In any discussion of big data, it’s hard to ignore that the biggest story right now is the National Security Agency’s mass monitoring of phone and email records and online content to identify patterns of terrorist activity.
That’s why U.Va. also is focusing on the ethics of data science, Brown says, adding that such discussions should be taking place not just in government but also in corporate boardrooms. “We need to discuss these issues. … We need to think about data ethics. These are critical and important questions and current events are making that quite clear.”