2014-10-22

Marketable machine learning components capability for “a new data science economy”: Predictive analytics components on the Azure Marketplace (“APIs”) consisting of predictive models that can plug into Azure Machine Learning as a web service

Introduced this summer and available now in preview, Microsoft Azure Machine Learning helps customers and partners rapidly design, test, automate and manage predictive analytics solutions in the cloud. For example, search engines, online product recommendations, credit card fraud prevention systems, GPS traffic directions and mobile phone personal assistants all use the power of machine learning to provide people with valuable insight.

On October 15th, Microsoft introduced new machine learning capabilities in the Azure Marketplace enabling customers and partners to access machine learning capabilities as Web services. These include a recommendation engine for adding product recommendations to a website, an anomaly detection service for predictive maintenance or fraud detection, and a set of packages for R, a popular programming language used by data scientists. These new capabilities will be available as finished examples for anyone to try.

Oct 17, 2014:
Joseph Sirosh keynote: “A New Data Science Economy” — Strata + Hadoop 2014

Software and the rise of cloud services have given rise to revolutionary new economies – creating new markets for everything from self-published books, music and videos to mobile apps. Only a few years ago, it would have been hard to imagine developers authoring a million apps for smartphones. But that’s history. Cloud-centric economies are permanently changing the way people author and create in the knowledge economy – whether it be authors or developers – and soon, even data scientists. Joseph Sirosh will share his conviction that the next big software economy will be the Data Science Economy – one where data scientists build predictive models and intelligent services that can be published and monetized as easily as apps for the mobile phone.

About Joseph Sirosh:

I am a Corporate Vice President at Microsoft, and head of the Information Management and Machine Learning group. Our talented team of scientists and engineers is developing Cloud ML services and tools to transform data at scale into intelligence. We are taking the wealth of ML capabilities in Microsoft Research and Product Groups and making it available commercially on Azure. Our first-class ML algorithms, services and tooling will help developers build amazing next-generation ML apps in the cloud and help ML become pervasive across a wide range of future scenarios. Prior to Microsoft, I worked at Amazon as VP for Global Inventory Platform and CTO of the core retail business, and I was VP of R&D at Fair Isaac Corporation before that. I am very passionate about ML and its applications and have been active in the field since 1990.

Real-time analytics for Azure HDInsight service: Enabling real-time predictive analytics in HDInsight with support for Apache Storm clusters

Azure HDInsight cloud analytics service combines the best of Hadoop open source technology with the elasticity and manageability enterprises require. On October 15th, Microsoft announced Azure HDInsight will support Apache Storm clusters in public preview. Storm is an open source project in the Hadoop ecosystem which gives users access to an event-processing analytics platform that can reliably process millions of events. Now, users of Hadoop can gain insights as events happen, in addition to insights from past events. By bringing real-time analytics capabilities to HDInsight, Microsoft is opening up new customer scenarios such as the ability to analyze operational data in real time for predictive maintenance. Storm can also be used with a machine learning solution that has previously been trained by batch processing, such as a solution based on Mahout. However, its generic, distributed computation model also opens the door for stream-based machine learning solutions. For information on real world scenarios, read how companies are using Storm.



Apache Storm is a distributed, fault-tolerant, open source computation system that allows you to process data in real time. Storm solutions can also provide guaranteed processing of data, with the ability to replay data that was not successfully processed the first time. HDInsight Storm is offered as a managed cluster integrated into the Azure environment, where it may be used as part of a larger Azure solution. For example, Storm might consume data from services such as ServiceBus Queues or Event Hub, and use Websites or Cloud Services to provide data visualization. HDInsight Storm clusters may also be configured on an Azure Virtual Network, which reduces latency communicating with other resources on the same Virtual Network and can also allow secure communication with resources within a private datacenter.
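To make the "guaranteed processing with replay" idea concrete, here is a minimal sketch in plain Python. It is a simulation only and does not use the Storm API: the function name `process_with_replay`, the `max_attempts` parameter and the in-memory queue are all assumptions made for illustration. Each event is handed to a handler; a success counts as an acknowledgement, while a failure puts the event back on the queue so it is delivered again.

```python
from collections import deque


def process_with_replay(events, handler, max_attempts=3):
    """Deliver every event to `handler` at least once, replaying failures.

    Hypothetical helper for illustration; this is not Storm's API.
    Returns (processed, dropped): events that were acknowledged, and events
    that still failed after `max_attempts` deliveries.
    """
    # Each queue entry remembers how many times the event has been delivered.
    pending = deque((event, 1) for event in events)
    processed, dropped = [], []
    while pending:
        event, attempt = pending.popleft()
        try:
            handler(event)
            processed.append(event)  # success: the event is acknowledged
        except Exception:
            if attempt < max_attempts:
                # failure: schedule the same event for another delivery
                pending.append((event, attempt + 1))
            else:
                dropped.append(event)  # give up after too many attempts
    return processed, dropped
```

Because a failed event is delivered again, the handler may see the same event more than once, so in a design like this the handler should tolerate duplicates.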

Microsoft has also teamed up with Hortonworks to deliver hybrid data connectors between on-premises and cloud deployments. On-premises Hadoop customers using Hortonworks Data Platform 2.2 can move data from on-premises Hadoop into Azure HDInsight. This gives every on-premises Hadoop customer elastic cloud access for backup, burst capacity, and test/dev.

All that is follow-up to:

– Satya Nadella on “Digital Work and Life Experiences” supported by “Cloud OS” and “Device OS and Hardware” platforms–all from Microsoft [this same blog, July 23, 2014] for  Azure Machine Learning, Big Data, Cortana, in-memory BI, in-memory data warehousing, Power BI and Power Q&A

– Microsoft BUILD 2014 Day 2: “rebranding” to Microsoft Azure and moving toward a comprehensive set of fully-integrated backend services [this same blog, April 27, 2014] for Azure, Hadoop 2.2, Hadoop infrastructure on Azure, HDInsight, Microsoft Azure, Office 365 and Windows Azure

– An upcoming new era: personalised, pro-active search and discovery experiences for Office 365 (Oslo) [this same blog, April 2, 2014] for  machine learning

– The first “post-Ballmer” offering launched: with Power BI for Office 365 everyone can analyze, visualize and share data in the cloud [this same blog, Feb 10, 2014] for  Big Data, Business Intelligence, business intelligence models,  data insights,  insights from data, Power BI as the lead business solution, Power BI for Office 365, Power BI Jumpstart, Power BI Mobile App, Power BI Sites,  Q&A, Q&A of Power BI, self-service analytics, self-service BI, self-service business intelligence solution and visualization

– Satya Nadella’s (?the next Microsoft CEO?) next ten years’ vision of “digitizing everything”, Microsoft opportunities and challenges seen by him with that, and the case of Big Data [this same blog, Dec 13, 2013] for analysis of relational and non-relational data, Apache Hadoop, Big Data, Business Intelligence, Data Explorer, data warehousing, digitizing everything, Hadoop, Hadoop integration, HDInsight, join relational and Hadoop cluster tables, Massively Parallel Processing, Microsoft Parallel Data Warehouse, Microsoft PolyBase, Parallel Data Warehouse, PDW, PolyBase, Power BI for Office 365, Power Map, Power Pivot, Power Query, Power View, Q&A, self-service analytics, self-service BI, the next Big Data revolution, tipping point for Big Data and Windows Azure

– Microsoft partners empowered with ‘cloud first’, high-value and next-gen experiences for big data, enterprise social, and mobility on wide variety of Windows devices and Windows Server + Windows Azure + Visual Studio as the platform [this same blog, July 10, 2013] for Azure Data Marketplace, Big Data, Hadoop infrastructure on Azure,  Office 365, Power BI for Office 365 Preview and Windows Azure

– BUILD 2012: Notes on Day 1 and 2 Keynotes [this same blog, Oct 31, 2012] for Hadoop



June 30, 2012: Career of the Future: Data Scientist Study Results Infographic by EMC. The explosion in digital data, bandwidth, and processing power – combined with new tools for analyzing the data – has sparked massive interest in the field of data science. Organizations of all sizes are turning to people who are capable of translating this trove of data – created by mobile sensors, social media, surveillance, medical imaging, smart grids, and the like – into predictive insights that lead to business value. Despite the growing opportunity, demand for data scientists is outpacing the supply of talent and will do so for the next five years. Who are data science practitioners, what skills do they need, and why are they so different?



July 8, 2013: Becoming a Data Scientist – Curriculum via Metromap by Swami Chandrasekaran. Data Science, Machine Learning, Big Data Analytics, Cognitive Computing …. well all of us have been avalanched with articles, skills demand infographics and points of view on these topics (yawn!). One thing is for sure; you cannot become a data scientist overnight. It’s a journey, for sure a challenging one. But how do you go about becoming one? Where to start? When do you start seeing light at the end of the tunnel? What is the learning roadmap? What tools and techniques do I need to know? How will you know when you have achieved your goal?

Given how critical visualization is for data science, ironically I was not able to find (except for a few) a pragmatic and yet visual representation of what it takes to become a data scientist. So here is my modest attempt at creating a curriculum, a learning plan that one can use in this becoming a data scientist journey. I took inspiration from the metro maps and used it to depict the learning path.

I organized the overall plan progressively into the following areas / domains: Fundamentals, Statistics, Programming, Machine Learning, Text Mining / Natural Language Processing, Data Visualization, Big Data, Data Ingestion, Data Munging, Toolbox. Each area / domain is represented as a “metro line”, with the stations depicting the topics you must learn / master / understand in a progressive fashion. The idea is you pick a line, catch a train and go thru all the stations (topics) till you reach the final destination (or) switch to the next line. I have progressively marked each station (line) 1 thru 10 to indicate the order in which you travel.

You can use this as an individual learning plan to identify the areas you most want to develop and then acquire skills. By no means is this the end, but it is a solid start. Feel free to leave your comments and constructive feedback.

PS: I did not want to impose the use of any commercial tools in this plan. I have based this plan on tools/libraries available as open source for the most part. If you have access to commercial software such as IBM SPSS or SAS Enterprise Miner, by all means go for it. The plan still holds good.

PS: I originally wanted to create an interactive visualization using D3.js or InfoVis. But wanted to get this out quickly. Maybe I will do an interactive map in the next iteration.

Web Services and Marketplaces Create a New Data Science Economy [Machine Learning Blog from Microsoft, Oct 16, 2014]
This blog post is authored by Joseph Sirosh, Corporate Vice President of Machine Learning at Microsoft.

Yesterday, at Strata + Hadoop World, we announced the expansion of our data services with support of real-time analytics for Apache Hadoop in Azure HDInsight and new machine learning (ML) capabilities in the Azure Marketplace. Today, I would like to expand on the new ML capabilities that we announced and share how this is an important step in our journey to jump-start the new data science economy. I’ll also be speaking more about this in my keynote presentation tomorrow at Strata.

Data scientists and their management are often frustrated by just how little of their work makes it into production deployments. Consider this hypothetical, although not uncommon, scenario. A data scientist and his team are asked to create a new sales prediction model that can be run whenever needed. The data scientists perfect the sales model using the popular statistical modeling language, “R”. The new model is presented to management who want to get the model up and running right away as a web app and as a mobile client. Unfortunately, engineering is unable to deploy the model as they don’t have R and the only option is to convert it all to Java – something that will take months to get up and running. So the data scientists end up preparing a batch job to run R code and mail reports on a daily basis, leaving everyone unsatisfied.

Well, now there’s a better way, thanks to Azure Machine Learning.

We built Azure ML to empower data science with all the benefits of the cloud. Data scientists can bring R code and use Microsoft’s world class ML algorithms in our web-based ML Studio. No software installs required for analysis or production – our browser UI works on any machine and operating system. Teams can collaborate in the cloud, share projects, experiment with world-class algorithms and include data from databases or blob storage. They can use enormous storage and compute resources in the cloud to develop the best models from their data, unrestrained by server or storage capacity.

Perhaps best of all, with just one click, users can publish a web service with their data science code embedded in it. Data transformations and models can now run in a web service in the cloud – fully managed, secure, reliable, available, and callable from anywhere in the world.

These web service APIs can be invoked from Excel, as shown in this video, by using this simple plug-in. Now, instead of emailing reports, users can surprise management with cloud-hosted apps that are built in hours. Engineering can hook up APIs to any application easily and even create custom mobile apps. Users can publish as many web services as they like, test multiple models in production and update models with new data. The data science team just became several times more productive and engineering is happy because integration is so easy.
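As a rough illustration of what "hooking up" such a web service from application code might look like, the sketch below builds an HTTP POST request with Python's standard library. Everything specific in it is an assumption: the endpoint URL, the bearer-token header and the `{"inputs": [...]}` payload shape are hypothetical placeholders, not the documented Azure ML request format, which should be taken from the published service's own API help page.

```python
import json
from urllib.request import Request


def build_scoring_request(endpoint_url, api_key, rows):
    """Build (but do not send) a JSON POST request to a scoring web service.

    The payload layout and the Authorization scheme are assumptions made for
    this sketch; a real service defines its own request contract.
    """
    body = json.dumps({"inputs": rows}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    return Request(endpoint_url, data=body, headers=headers, method="POST")
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` and decoding the JSON response. Keeping request construction separate from sending makes the call easy to inspect and test without network access.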

But wait, there’s still more.

Imagine a data scientist hits upon that perfect idea for an intelligent web service that everyone else in the world should be building into their apps. Maybe it is a great forecasting method, or a new churn prediction technique, or a novel approach to pattern recognition. Data scientists can now build that web service in Azure ML, publish the ML web service on the Azure Marketplace and start charging for it in over one hundred currencies. Published APIs can be found via search engines. Anyone in the world can pay and subscribe to them and use them in their apps.

For the first time, data scientists can monetize their know-how and creativity just as app developers do. When this happens, we start changing the dynamics of the industry – essentially, data scientists are able to “self-publish” their domain expertise as cloud services which can then be made accessible to billions of users via smartphone apps that tap into those services.

The Azure Marketplace already has an emerging selection of such services. In just a couple of weeks, four of our data scientists published over 15 analytics APIs into the marketplace by wrapping functions from CRAN. Among others, these include APIs for forecasting, survival analysis and sentiment analysis.

Our marketplace has much more than basic analytics APIs. For example, we went and built a set of finished end-to-end ML applications, all using Azure ML, to solve specific business needs. These ML apps do not require a data scientist or ML expertise to use – the science is already baked into our solution. Users can just bring their own data and start using them. These include APIs for recommendations, items that are frequently bought together as well as anomaly detection to spot anomalous events in time-series data such as server telemetry.
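For readers unfamiliar with the "frequently bought together" idea, the following is a minimal co-occurrence sketch in Python. It is not the algorithm behind the Marketplace API, which the post does not describe; the function name and the basket format (a list of item lists) are assumptions chosen for illustration. It simply counts how often other items appear in the same basket as a given item.

```python
from collections import Counter


def frequently_bought_together(baskets, item, top_n=3):
    """Return the `top_n` items that most often share a basket with `item`.

    Illustrative co-occurrence count only; a production recommender would
    typically also normalize for item popularity.
    """
    partners = Counter()
    for basket in baskets:
        unique_items = set(basket)  # ignore repeated items within one basket
        if item in unique_items:
            partners.update(unique_items - {item})
    return partners.most_common(top_n)
```

Given a list of past orders, `frequently_bought_together(orders, "bread")` would return pairs of (item, number of shared baskets), most frequent first.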

A similar anomaly detection API is used by Sumo Logic, a cloud-based machine data analytics company. They have collaborated with Microsoft to bring metric-based anomaly detection capability to their customers. Our metric-based anomaly detection perfectly complements Sumo Logic’s structure-based anomaly detection capabilities. Any Sumo Logic query which results in a numerical time-series now has a special “metric anomaly detection” button which sends the pre-aggregated time series data to Azure ML for analysis. The data is then annotated with labels provided by the Azure ML service indicating unusual spikes or level shifts. Sumo Logic is now offering this optional integration in a limited beta release.
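The kind of labeling described above can be sketched with a very simple rule. The Python below flags a point as a spike when it lies far from the mean of the preceding window, measured in standard deviations. This is a generic trailing z-score heuristic chosen for illustration, not the method used by the Azure ML service or Sumo Logic, and it covers spikes only, not level shifts; the names `label_spikes`, `window` and `threshold` are assumptions.

```python
from statistics import mean, pstdev


def label_spikes(series, window=5, threshold=3.0):
    """Return one boolean per point: True where the point looks like a spike.

    A point is compared with the `window` values immediately before it.
    Points without a full window of history are never flagged.
    """
    labels = []
    for i, value in enumerate(series):
        history = series[max(0, i - window):i]
        if len(history) < window:
            labels.append(False)
            continue
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Perfectly flat history: any deviation at all is unusual.
            labels.append(value != mu)
        else:
            labels.append(abs(value - mu) / sigma > threshold)
    return labels
```

One design consequence worth noting: after a spike, the spike itself sits in the history window and inflates the standard deviation, so the next few points are less likely to be flagged. Robust statistics such as the median would reduce that effect.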

Third parties too are starting to publish APIs into our marketplace. For instance, Versium, a predictive analytics startup, has published these three sophisticated customer scores, all based on public marketing data – Giving Score (which predicts customer propensity to donate), Green Score (predicts customer propensity to make environmentally conscious purchase decisions) and Wealth Score (helps companies estimate the net worth of customers and prospects). Versium offers these scores by analyzing and associating billions of LifeData® attributes and building predictive models using Azure ML.

Our marketplace also hosts a number of other exciting APIs that use ML, including the Bing Speech Recognition Control, Microsoft Translator, Bing Synonyms API and Bing Search API.

By bringing ML capabilities to the Azure Marketplace and making it easy for anyone to access, we are liberating data science from its confines. This two-minute video recaps how:

This video illustrates how the power of the cloud removes many of the barriers to advanced analytics today. Learn how Microsoft Azure Machine Learning enables deployment of R [the code for a popular programming language used by data scientists] as a web service in minutes and at global scale with the Machine Learning Marketplace. Learn more: http://azure.microsoft.com/en-us/services/machine-learning/

Get going today – sign up for Azure ML and try out some of our easy to use samples.

A new future for machine learning is being born in the cloud.

Joseph

Follow me on Twitter.

Oct 20, 2014: Joseph Sirosh – BigDataNYC 2014 – theCUBE

Microsoft Targets IBM Watson with Azure Machine Learning in Big Data Race [Redmond Magazine, Oct 17, 2014]

Nearly a year after launching its Hadoop-based Azure HDInsight cloud analytics service, Microsoft believes it’s a better and broader solution for real-time analytics and predictive analysis than IBM’s widely touted Watson. Big Blue this year has begun commercializing its Watson technology, made famous in 2011 when it came out of the research labs to appear and win on the television game show Jeopardy.

Both companies had a large presence at this year’s Strata + Hadoop World Conference in New York, attended by 5,000 Big Data geeks. At the Microsoft booth, Eron Kelly, general manager for SQL Server product marketing, highlighted some key improvements to Microsoft’s overall Big Data portfolio since last year’s release of Azure HDInsight, including SQL Server 2014 with support for in-memory processing, Power BI and the launch in June of Azure Machine Learning.

In addition to bolstering the offering, Microsoft showcased Azure ML’s ability to perform real-time predictive analytics for the retail chain Pier One.

“I think it’s very similar,” in terms of the machine learning capabilities of Watson and Azure ML, Kelly said. “We look at our offering as a self-service on the Web solution where you grab a couple of predictive model clips and you’re in production. With Watson, you call in the consultants. It’s just a difference fundamentally [that] goes to market versus IBM. I think we have a good advantage of getting scale and broad reach.”

Not surprisingly, Anjul Bhambhri, vice president of Big Data for IBM’s software group disagreed. “There are certain applications which could be very complicated which require consulting to get it right,” she said. “There’s also a lot of innovation that IBM has brought to market around exploration, visualization and discovery of Big Data which doesn’t require any consulting.” In addition to Watson, IBM offers its InfoSphere BigInsights for Hadoop and Big SQL offerings.

As it broadens its approach with a new “data culture,” Microsoft has come on strong with Azure ML, noting it shares many of the real-time predictive analytics of the new personal assistant in Windows Phone called Cortana. Now Microsoft is looking to further broaden the reach of Azure ML with the launch of a new app store-type marketplace where Microsoft and its partners will offer APIs consisting of predictive models that can plug into Azure Machine Learning.

Kicking off the new marketplace, Joseph Sirosh, Microsoft’s corporate VP for information management and machine learning, gave a talk at the Strata + Hadoop conference this morning. “Now’s the time for us to try to build the new data science economy,” he said in his presentation. “Let’s see how we might be able to build that. What do data science and machine learning people do typically? They build analytical models. But can you buy them?”

Sirosh said with Microsoft’s new data section of the Azure Marketplace, marketplace developers and IT pros can search for predictive analytics components. It consists of APIs developed both by Microsoft and partners. Among those APIs from Microsoft are Frequently Bought Together, Anomaly Detection, Cluster Manager and Lexicon Sentiment Analysis. Third parties selling their APIs and models include Datafinder, MapMechanics and Versium Analytics.

Microsoft’s goal is to build up the marketplace for these data models. “As more of you data scientists publish APIs into that marketplace, that marketplace will become just like other online app stores — an enormous selection of intelligent APIs. And we all know as data scientists that selection is important,” Sirosh said. “Imagine a million APIs appearing in a marketplace and a virtual cycle like this that us data scientists can tap into.”

Real-time predictive analytics is also enabled by support for Apache Storm clusters, announced today. Though it’s in preview, Kelly said Microsoft is adhering to its SLAs with use of the Apache Storm capability, which enables complex event processing and stream analytics, providing much faster responses to queries.

Microsoft also said it would support the forthcoming Hortonworks Data Platform, which has automatic backup to Azure Blob storage, Kelly said. “Any Hortonworks customer can back up all their data to an Azure Blob in a real low cost way of storing their data, and similarly once that data is in Azure, it makes it real easy for them to apply some of these machine learning models to it for analysis with Power BI [or other tools].”

Hortonworks is also bringing HDP to Azure Virtual Machines as an Azure certified partner. This will bring Azure HDInsight to customers who want more control over it in an infrastructure-as-a-service model, Kelly said. Azure HDInsight is currently a platform as a service that is managed by Microsoft.

Sept 18, 2014: Insider’s Introduction to Microsoft Azure Machine Learning (AzureML)

Microsoft has introduced a new technology for developing analytics applications in the cloud. The presenter has an insider’s perspective, having actively provided feedback to the Microsoft team which has been developing this technology over the past 2 years. This session will 1) provide an introduction to the Azure technology including licensing, 2) provide demos of using R version 3 with AzureML, and 3) provide best practices for developing applications with Azure Machine Learning.

Mark Tabladillo is a Microsoft MVP and SAS expert. He helps teams become more confident in making actionable business decisions through the use of data mining and analytics. Mark provides training and consulting for companies in the US and around the world. He also teaches part-time with the University of Phoenix. He tweets @marktabnet and blogs at http://marktab.net.

Attachments:

PASSBAVC_20140918.pptx

Insider’s Introduction to Microsoft Azure Machine Learning (AzureML)

Insider’s Introduction to Microsoft Azure Machine Learning.pdf
