Statsblogs.com

Twelve advantages to calling R from the SAS/IML language

2013-11-25

(This article was originally published at The DO Loop, and syndicated at StatsBlogs.)

For several years, there has been interest in calling R from SAS software, primarily because of the large number of special-purpose R packages. The ability to call R from SAS has been available in SAS/IML since 2009. Previous blog posts about R include a video on how to call R from the SAS/IML language and a detailed example of calling R and importing the results back into SAS. The SAS/IML interface enables you to embed tabular output from R into SAS reports and to transfer matrices and data frames from R into SAS. You can display R graphics in the native R graphics window or tell R to write its graphics as an image file.

Recently I was asked about SAS macros that call R, such as the free %PROC_R macro by Xin Wei, or macros that were written by Phil Holland or Phil Rack. In addition to these general-purpose macros, several programmers have described how to call R from SAS for special purposes. See the examples by Charlie Huang on his SAS Analysis blog and by Liang Xie on his blog. These macros do not require using the SAS/IML language, so they might be a reasonable choice for SAS customers who do not have a license for the SAS/IML product.

The %PROC_R macro, which is described by Wei (2012, p. 8), is typical of a macro that calls R. It implements the following steps:

1. The SAS DATA step writes an R script to a text file.

2. Any SAS data sets that should be made available to R are converted to a form that is readable by R, such as a CSV file. (Other macros might try to read the SAS file directly.)

3. The macro calls the R executable to execute the R file in batch mode, either by using a pipe or by using a SAS statement that calls the operating system, such as the X statement or the %SYSEXEC statement.

4. Optionally, R results are made available to SAS, usually by writing them to a CSV file.

5. Optionally, R output is sent to a SAS Output window and error information is sent to the SAS Log.
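
To make the steps above concrete, here is a minimal sketch of the general macro-style workflow. It is not the code of %PROC_R or of any other macro named above. The folder C:\temp, the file names, and the assumption that Rscript is on the system PATH are all hypothetical, and the example assumes a Windows host on which the X statement is enabled.

```sas
/* Sketch of the macro-style workflow; all paths and file names are hypothetical */
options noxwait xsync;   /* Windows: wait for the command to finish, then return */

/* Step 1: the DATA step writes an R script to a text file */
data _null_;
   file "C:\temp\script.R";
   put "df <- read.csv('C:/temp/class.csv')";
   put "out <- data.frame(MeanHeight = mean(df$Height))";
   put "write.csv(out, 'C:/temp/result.csv', row.names = FALSE)";
run;

/* Step 2: convert the SAS data set to a CSV file that R can read */
proc export data=Sashelp.Class outfile="C:\temp\class.csv"
            dbms=csv replace;
run;

/* Step 3: run the script in batch mode (assumes Rscript is on the PATH) */
x 'Rscript C:\temp\script.R';

/* Step 4: read the R results back into SAS from a CSV file */
proc import datafile="C:\temp\result.csv" out=Work.Result
            dbms=csv replace;
run;

proc print data=Work.Result;
run;
```

Every run of this program starts a new R process and moves the data through two CSV files, which is the cost that several of the advantages below address.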

What is the value of the SAS/IML interface to R?

In light of these macros, why should you use the SAS/IML interface to R? The answer is that the SAS/IML interface contains many features that are not available if you use a SAS macro to call R. Here are a dozen advantages to using the SAS/IML interface to call R:

When you call R from the SAS/IML language, the R session persists until you quit PROC IML. The R session—including all existing R functions and variables—remains active, which means that you can call R multiple times or even call R within a SAS/IML DO loop.

This feature is very powerful. At the top of your SAS/IML program you can load your favorite R packages and functions (perhaps by using a single %INCLUDE statement), and they remain available every time you call R. Because R can be called inside a loop, you can apply an R method to many data sets or vectors, or you can write an iterative algorithm in SAS/IML that calls R during each step of the iteration. The first call to R launches the R process; subsequent calls use the process that is already running. This is in stark contrast to the macro approach, in which each call starts and exits R and therefore pays the startup cost every time. Consequently, if you need to call R multiple times, the SAS/IML interface performs much better.
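
As a small illustration of the persistent session, the following sketch defines an R function and an R variable in one SUBMIT block and then uses them from inside a SAS/IML DO loop. The names sq, total, and i are made up for this example.

```sas
proc iml;
/* The first call launches R; sq and total persist in the R session */
submit / R;
   sq <- function(v) v^2
   total <- 0
endsubmit;

/* Later calls reuse the same R process and see the same objects */
do i = 1 to 3;
   submit i / R;
      total <- total + sq(&i)
   endsubmit;
end;

/* Copy the accumulated R value into a SAS/IML matrix */
call ImportMatrixFromR(total, "total");
print total;
quit;
```

The loop adds sq(1), sq(2), and sq(3) to the same R variable, which works only because the R session stays alive between SUBMIT blocks.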

Missing values are automatically converted from SAS to R and from R to SAS. For example, you can send a vector that contains SAS missing values to R, and you can read an R array that contains R missing values into SAS/IML. In both directions the missing values are handled in a robust way.
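
The original listing for this item is not reproduced here, so the following is a reconstruction in the same spirit rather than the author's exact program. It sends a vector with a missing value to R and reads back an R vector that contains NA.

```sas
proc iml;
x = {1, ., 3};                    /* vector with a SAS missing value */
call ExportMatrixToR(x, "x");     /* SAS/IML matrix x becomes R matrix x */

submit / R;
   print(is.na(x))                # flags the element that arrived as missing
   y <- c(NA, 2, 3)               # R vector with an R missing value
endsubmit;

call ImportMatrixFromR(y, "y");   /* R vector y becomes SAS/IML matrix y */
print x y;
quit;
```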

You can pass parameters from SAS to R. This is very useful because it enables you to communicate options, such as the names of the analysis variables, to R. Last week I showed how to use this feature to construct a general-purpose SAS/IML module that passes arguments to R.
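
A minimal sketch of parameter passing follows. The matrices listed on the SUBMIT statement are substituted into the R code wherever &name appears. The choice of Sashelp.Class and of a linear regression is only an illustration, and the names YVar, XVar, model, and coefs are made up for this example.

```sas
proc iml;
call ExportDataSetToR("Sashelp.Class", "Class");

YVar = "Weight";                  /* names of the analysis variables */
XVar = "Height";

submit XVar YVar / R;
   model <- lm(&YVar ~ &XVar, data = Class)
   coefs <- coef(model)
endsubmit;

call ImportMatrixFromR(b, "coefs");
print b;
quit;
```

Changing the values of YVar and XVar changes the model that R fits, with no edits to the R code itself.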

You can call R and then resume execution of a SAS/IML program. Because you can also call SAS procedures and DATA steps from within the SAS/IML language, the SAS/IML language can serve as "glue" to drive an analysis that uses SAS/IML functions, SAS procedures, and R functions.
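
The "glue" role can be sketched as follows: one SUBMIT block without the R option runs a SAS procedure, a second block runs R, and the SAS/IML program then continues with the R result. The particular statistics computed here are arbitrary.

```sas
proc iml;
/* A SUBMIT block without the R option runs SAS statements */
submit;
   proc means data=Sashelp.Class mean std;
      var Height;
   run;
endsubmit;

/* Call R on the same data */
call ExportDataSetToR("Sashelp.Class", "Class");
submit / R;
   med <- median(Class$Height)
endsubmit;

/* Resume the SAS/IML program with the R result */
call ImportMatrixFromR(med, "med");
doubled = 2 * med;                /* ordinary SAS/IML computation */
print med doubled;
quit;
```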

You can easily transfer data in both directions. You can do this multiple times within your program, and you have complete control over the sequence of transfers. The interactive SAS/IML language enables you to dynamically specify the names of data frames and matrices at run time. Furthermore, the SAS/IML interface does not use CSV files to transfer data. Because CSV files are slow to read and write, the SAS/IML interface is faster at transferring data than a macro that uses CSV files as an intermediate format.
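
For instance, a round trip of a data set might look like the following sketch. The data set and data frame names are held in SAS/IML variables to show that they can be chosen at run time. The output name Work.ClassBMI and the derived column are hypothetical.

```sas
proc iml;
inName  = "Sashelp.Class";        /* names are ordinary SAS/IML strings */
outName = "Work.ClassBMI";        /* hypothetical output data set */

call ExportDataSetToR(inName, "df");       /* SAS data set -> R data frame */

submit / R;
   # Weight is in pounds and Height is in inches in Sashelp.Class
   df$BMI <- 703 * df$Weight / df$Height^2
endsubmit;

call ImportDataSetFromR(outName, "df");    /* R data frame -> SAS data set */

submit outName;
   proc print data=&outName(obs=5);
   run;
endsubmit;
quit;
```

No intermediate CSV file is written at any point in this program.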

Date, time, and datetime values are automatically converted.

The ImportDataSetFromR call automatically converts R names to valid SAS names. (R permits names of variables that are not valid variable names in SAS.)
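
One way to see the name conversion is to build an R data frame whose column names are not valid SAS names and then inspect the imported data set. The sketch below makes no claim about the exact names that result, so PROC CONTENTS is used to display them. The data frame and the data set Work.FromR are made up for this example.

```sas
proc iml;
submit / R;
   df <- data.frame(1:3, c(2.5, 3.5, 4.5))
   names(df) <- c("total.count", "mean value")   # not valid SAS names
endsubmit;

call ImportDataSetFromR("Work.FromR", "df");

/* Display the variable names that SAS assigned */
submit;
   proc contents data=Work.FromR varnum;
   run;
endsubmit;
quit;
```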

In contrast to the SAS macros, the SAS/IML interface to R does not use the X statement to issue commands to the operating system. The programs you write in the SAS/IML language are portable: they run on any operating system that supports both SAS and R. This is important because some SAS administrators disable calls to the operating system from within SAS. If this is the case at your site, the macros that use the X statement or pipes will not work.

In the SAS/IML Studio environment, SAS and R can run on different machines. You can run SAS on a huge server and run R on your local PC. Even though the two software packages are running on different machines, the functions for data transfer work without modification.

This can be an advantage for researchers who work for a large corporation. The macro approach requires that R be installed on the SAS server and that the installation of R contain all packages that might be used by any analyst. In contrast, SAS/IML Studio enables each analyst to install a local copy of R with only the packages that he or she needs. Analysts can upgrade their version of R or their collection of packages at any time without disrupting the work of their colleagues.

In many situations, you can interrupt a long-running R computation by clicking on the usual "Break" icon on the SAS GUI toolbar. This interrupts the R computation without killing the entire SAS process. After you regain control of the program, you can save your program, modify it, and resubmit it.

Along the same lines, the OK= option in the SUBMIT statement enables you to handle errors that occur in R. Depending on the severity of the error, you can either continue processing or choose to write an error message and abort the program. A simple way to see this is to submit R statements that contain an intentional error and to print an error message when the OK= flag reports a failure.
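
The original listing for this item is not reproduced here, so the following is a reconstruction rather than the author's exact program. It assumes that the matrix named in the OK= option is set to a nonzero value when the R statements run without error and to zero otherwise. The names isOK and undefinedVariable are made up for this example.

```sas
proc iml;
submit / R ok=isOK;
   x <- sqrt(undefinedVariable)   # intentional error: the object does not exist
endsubmit;

if ^isOK then
   print "ERROR: The R statements did not run successfully.";
else
   print "The R statements ran without error.";
quit;
```

Because the failure is reported through a SAS/IML matrix, the program can decide whether to continue, retry with different arguments, or stop.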

As with all SAS-supported features, free technical support is just a phone call or email away. If you are confused or your program is not behaving in the way that you expect, call SAS Technical Support for assistance.

So there you have it, a dozen reasons to use the SAS/IML interface to R. Although it is theoretically possible for a SAS macro to add features such as converting missing values, the most powerful features (passing parameters and calling R within a SAS/IML DO loop) are unique to the SAS/IML interface. The SAS/IML interface to R greatly enhances the ways that SAS and R can work together.
