2016-03-28

Originally posted on: http://geekswithblogs.net/hroggero/archive/2016/03/28/accessing-no-sql-data-from-sql-server-2016-and-r-services.aspx

In this post, I will show how to easily extend SQL Server 2016 R Services to consume No-SQL data in real-time, such as Azure Tables. This example could easily be modified to access other No-SQL services such as Couchbase for example. As you may know, SQL Server 2016 (currently in Release Candidate 1) now offers the ability to call an R service for statistical analysis using an SQL statement. What is interesting is that you can execute an R script (that contains the statistical computation) directly from SQL Server 2016, and consume SQL Server data directly from within the R script. This means that you can more easily perform advanced mathematical computations in SQL Server by leveraging an external R service. However, there are situations in which parts of the data needed for the analysis is not readily available inside SQL Server, but in external data stores, such as a No-SQL environment; this usually translates into addition configuration to leverage an ETL tool to load the data into temporary or staging tables. In this blog post, I will show you how SQL Server 2016 with R Services can tap into a No-SQL data store directly using basic SQL statements by leveraging Enzo Unified, without using an ETL tool and without the need to create temporary/staging tables.

I am assuming that you have installed SQL Server 2016 (RC1), and the necessary R services on the same machine; please follow these links for further information about SQL Server 2016 and how to install the R services. Note that you will need to select the Advanced Analytics Extensions option during the installation of SQL Server 2016.

- SQL Server 2016 RC1: https://www.microsoft.com/en-us/evalcenter/evaluate-sql-server-2016

- Install R Services: https://msdn.microsoft.com/en-us/library/mt604883.aspx

- Post Installation Server Configuration: https://msdn.microsoft.com/en-us/library/mt590536.aspx

You will also need Enzo Unified installed on the same machine; to obtain Enzo Unified, contact info@enzounified.com. Enzo Unified looks like SQL Server on the network and provides instant access to a large number of endpoints using the SQL language natively, from SQL Server or applications directly. In this scenario, Enzo Unified is configured to allow access to an Azure Table using the following command:

SELECT * FROM AzureStorage.Table@table1

Note: Enzo Unified uses a specific convention to specify an Azure Table name: add the @ symbol followed by the name of the table. The above SQL command will return all available entities and columns from an Azure Table called “table1”.

In order to make this data available to an R script, I created a Linked Server definition in SQL Server that points to Enzo Unified. The Linked Server is called [localhost,9550]. This means that SQL Server can also access No-SQL data directly using the following command (note the addition of the Linked Server):

SELECT * FROM [localhost,9550].bsc.AzureStorage.Table@table1

However, Azure Tables do not have a firm schema definition; each entity in an Azure Table is an XML document with potentially different nodes; as a result, it is necessary to create a virtual table inside Enzo Unified that binds the table to a known schema. The following command creates a virtual table called “SalesData” and is created under a schema called “shard”; the command accepts the name of the virtual table, the connection information to the Azure Table (not shown here to simplify the example), a comment, and the columns names (schema) that it returns:

exec shard.CreateVirtualTable
'SalesData',        -- name of virtual table
'AllRecentSales',  -- name of the data source pointing to the desired Azure Table
'Retrieves all sales data',    -- comment
'int id|||x,datetime effdate|||x,itemname|||x,category|||x,statename|||x,decimal price|||x,int customerid|||x,int orderid|||x,__source__'      -- list of columns

Now that the virtual table has been created in Enzo Unified, the SalesData virtual table returns the desired columns from the Azure Table:

SELECT id, statename, price, customerid FROM [localhost,9550].bsc.shard.salesdata

Once the virtual table has been configured in Enzo Unified, SQL Server 2016 can present Azure Table data to an R script. The R script accepts an SQL command that will be used as the input data set.  For example, the following R script’s input data set is the SQL statement provided above that returns four columns: the id, statename, price and customerid fields from the Azure Table.



As you can see above, Enzo Unified has abstracted the Azure Table entirely, presenting No-SQL data as a SQL Server table to an R script; this sample script does not actually perform any statistical analysis; it only returns the data it reads to demonstrate the capability to access No-SQL data from an R script. Because the call to Enzo Unified is performed in real-time, this script could yield a different output every time it is executed if the underlying Azure Table data changes.

In conclusion, configuring SQL Server 2016 (RC1), R Services, and Enzo Unified allows you to perform statistical analysis on a large number of data sources, including No-SQL databases, thanks to Enzo Unified, without building temporary tables and complex ETL processes to being the data into SQL Server staging tables.

About Herve Roggero

Herve Roggero, Microsoft Azure MVP, @hroggero, is the founder of Enzo Unified (http://www.enzounified.com). Herve's experience includes software development, architecture, database administration and senior management with both global corporations and startup companies. Herve holds multiple certifications, including an MCDBA, MCSE, MCSD. He also holds a Master's degree in Business Administration from Indiana University. Herve is the co-author of "PRO SQL Azure" and “PRO SQL Server 2012 Practices” from Apress, a PluralSight author, and runs the Azure Florida Association.

About Enzo Unified

Enzo Unified is a data platform that helps companies reduce development and data integration project timelines, improve data quality, and increase operational efficiency by solving some of the most complex real-time data consumption challenges. For more information, contact info@enzounified.com, or visit http://www.enzounified.com/.

Show more