Wednesday, October 16, 2013

VBA front end for R

If you work in the analytics industry, I am sure you would have had this debate sometime or the other – pros and cons of R. While everyone agrees that R is quite powerful and has great graphics, most of us, especially those who have worked on GUI based tools like SASEG, etc agree that the text output of R can be pretty verbose. A colleague of mine ran a linear model and immediately exclaimed ‘it looks so bland!’

This set me exploring ways to beautify R – I found some interesting packages which would help in formatting the output – you can check out prettyR and html converter packages which can do wonders to the plain text output in R. However, my requirements were a little customized. We used excel in most of our day to day activities and VBA is quite powerful in parsing/formatting the results. So, why not use Excel and VBA to create a beautiful front end to run R? It could be a macro enabled tool which will read input from an excel sheet, run the regression code using RScript and display the formatted output on excel. Well, turns out that I was able to do all that and even more – this post explains the findings of my endeavor:

The following questions will be answered during the course of this article:
- How to run an RScript through MS Excel using VBA?
- How to run an RScript through command prompt? [in WINDOWS]
- How to pass arguments to an RScript through command line/external code? [in WINDOWS]
- How to read plain text files in MS Excel using VBA? [obviously WINDOWS :) ]

Just so that we know that our commands are executed correctly, let us write the following simple R code and save it in our directory ‘C:\R_code’ as ‘hello.R’

Contents of ‘C:\R_code\hello.R’
# Prints output to console
cat('Hello World')
var1<-5^3
var2<-7^3
cat('\nThe result of adding',var1,'to',var2,'is',var1+var2)

Running RScript through command prompt:

To be able to run R code through command prompt/other applications, you need to have the path of ‘R.exe’ and ‘RScript.exe’ in your system variable PATH. You can do this easily if you have admin rights to your system. Check this link to know how to do it on WINDOWS7. However, if you don’t have admin rights and want to add something to the PATH variable, don’t worry – you can easily add this to the USER variable PATH. Here are the steps on how to do this:

1. Suppose you have the ‘R.exe’ and ‘RScript.exe’ installed in the following directory: ‘C:\Program Files\R\R-2.15.3\bin\x64’. Copy this path to your clipboard.
2. Go to ‘Computer’ -- > Properties
3. On the left pane, click on ‘Advanced system settings’
4. On the ‘System properties’ dialog that opens up, navigate to the ‘Advanced’ tab and click on ‘Environment variables…’


5. On the ‘User variables’ click on ‘New…’


6. In the field ‘Variable Name:’, type PATH
7. In the field ‘Variable Value:’, paste the clipboard value, ie ‘C:\Program Files\R\R-2.15.3\bin\x64’. Add a semicolon ‘;’ after that.
8. Click on ‘Ok’ as many times to dismiss all dialog boxes.
9. Open command prompt and type ‘Rscript’ and hit ENTER. You will see the following:


Now that you have Rscript on your path, you can run R code from any directory on your system, including applications like MS-Excel through VBA. Just repeat step 9 by passing any *.R file as argument with the full path and it will execute as expected:


Running RScript through VBA:

To run this code in MS-Excel using VBA, you need to open a macro enabled worksheet (*.xlsm). To create a new macro enabled sheet, just create a new workbook and click on ‘Save As..’ and save as ‘Excel Macro-Enabled Workbook (*.xlsm)’. Once you have a macro-enabled workbook open, press the shortcut key combination ‘ALT + F11’ to open up the VBA editor. Once that is done, right click on the ‘Project Explorer’ to create a new module (which will be Module1 by default) and then type the following VBA code:

Sub RunRscript()
'runs an external R code through Shell
'The location of the RScript is 'C:\R_code'
'The script name is 'hello.R'

Dim shell As Object
Set shell = VBA.CreateObject("WScript.Shell")
Dim waitTillComplete As Boolean: waitTillComplete = True
Dim style As Integer: style = 1
Dim errorCode As Integer
Dim path As String
path = "RScript C:\R_code\hello.R"
errorCode = shell.Run(path, style, waitTillComplete)
End Sub

If you look at the VBA code carefully, it creates a Windows Shell object and invokes the R command through the shell. Also, the advantage of using Wscript.shell is that you can get VBA to wait till the execution is finished. To get more information on how to run a macro or use the VBA editor, you can refer to a lot of online tutorials that are easily available. A good place to start would be the MSDN tutorial which you can find here.

When you run this macro, you can see that a command window opens up, executes something and closes. But how do you know if the code has actually executed? A good way to redirect the console output on the R code to a file. You can use this by the sink function in R. Here is the modified R code which accomplishes the same:

# Re-directs the console output to a file 'hello.txt'
# The file is created in the directory 'C:\R_code'

sink('C:/R_code/hello.txt',append=F,type="output")
cat('Hello World')
var1<-5^3
var2<-7^3
cat('\nThe result of adding',var1,'to',var2,'is',var1+var2)
sink(NULL)

Once you’ve run the VBA macro, browse to C:\R_code and check if the ‘hello.txt’ has been created or not. If you can find the file there, then congratulations! You have run successfully used VBA to execute an R script.

Passing arguments to an RScript through command line/VBA:

Most of the work which we do requires us to pass inputs/parameters to a tool at runtime. In the code above, let’s say, we wanted ‘var1’ and ‘var2’ to be passed during runtime instead of being hardcoded the way they are right now. Let us create a simple excel tool which accepts two numbers and adds them, the front end would look like this:


Now, the only change in the VBA code would be to read inputs from cells D5 and F5 and pass it on to the RScript. The modified code would look like:

Sub RunRscript()
'runs an external R code through Shell
'The location of the RScript is 'C:\R_code'
'The script name is 'hello.R'
Dim shell As Object
Set shell = VBA.CreateObject("WScript.Shell")
Dim waitTillComplete As Boolean: waitTillComplete = True
Dim style As Integer: style = 1
Dim errorCode As Integer
Dim var1, var2 As Double
var1 = Sheet1.Range("D5").Value
var2 = Sheet1.Range("F5").Value

Dim path As String
path = "RScript C:\R_code\hello.R " & var1 & " " & var2

errorCode = shell.Run(path, style, waitTillComplete)

End Sub

Now, the VBA code is ready to pass two extra parameters to the Rscript and get it executed. But, the change on the input side means we will also have to change the R code to accept the input parameters and process them. This can be accomplished very well using the commandArgs function in R which will read the arguments and store it as a vector. The code changes as below:

# Accepts two numbers and adds them
# Re-directs the console output to a file 'hello.txt'
# The file is created in the directory 'C:\R_code'

args<-commandArgs(trailingOnly=T)
# cat(paste(args,collapse="\n"))
sink('C:/R_code/hello.txt',append=F,type="output")
cat('Hello World')
var1<-as.numeric(args[1])
var2<-as.numeric(args[2])
cat('\nThe result of adding',var1,'to',var2,'is',var1+var2)
sink(NULL)

Note the use of the ‘trailingOnly=T’ option in the commandArgs function. This would make the args vector store only those arguments which are passed by the USER. In addition to the USER arguments, RScript passes some system arguments by default. If you are interests in modifying those (like the directory of the R file, etc), then you would probably keep the trailingOnly argument to FALSE.

You now learnt how to invoke R from excel and how to pass data between R and excel. You can build on these two functionalities to develop some cool stuff which use Excel as front end and R as the backend. By the use of packages like ‘xlsx’ which can create data frames from excel sheets, you can go on to build so many applications like these:



Reading text/picture files in MS Excel using VBA:

Although this part does not contain any R codes, I am posting this for the sake of completeness. This way, you will have one complete tool to play with. Once you have the output of R in a text file/picture file, you can read it back into Excel using VBA and display the nicely formatted result in excel. This part will be particularly useful if you want to create a tool that reads data from excel, does some statistical analysis using R in the backend and then displays the summary of the analysis. Here is the VBA code you can use to parse through a text file:

this code will read from a file hello.txt and store the result Sheet2 starting from range A1 in consecutive rows
Dim sFile As String
sFile = "C:\R_code\hello.txt"

Dim rowNum As Integer
rowNum = 1
Set dest = Sheet2.Cells(rowNum, 1)

Open sFile For Input As #1
Do Until EOF(1)
    Input #1, ReadData
    If Not IsEmpty(ReadData) Then
        dest.Cells = ReadData
        rowNum = rowNum + 1
        Set dest = Sheet1.Cells(rowNum, 1)
    End If
Loop
Close #1 'close the opened file

And the below code can be used to copy pictures into VBA:

Inserts a picture located in R_code into Sheet2 at position A1 onwards
Sheet2.Range("$A$1").Select
Dim sFile As String
sFile = C:\R_code\mypicture1.jpg"

ActiveSheet.Pictures.Insert(sFile) _
        .Select
Selection.ShapeRange.Height = 324
Selection.ShapeRange.Width = 396
    With Selection.ShapeRange.Line
        .Visible = msoTrue
        .ForeColor.ObjectThemeColor = msoThemeColorText1
        .ForeColor.TintAndShade = 0
        .ForeColor.Brightness = 0
        .Transparency = 0
    End With

So, with just little bit of code to format your results, you can get nicely formatted results in the way that you want. Below is the sample of the output from a linear regression model showing model accuracy, beta coefficients (from text file) and residual plots (from picture):


This is just the beginning. Once you have integrated R with VBA and vice versa, there is no limit to what you can achieve. R can be a powerful backend for computations where excel fails and I am sure we all agree that excel is still the de-facto standard for sharing and displaying summary reports. By using the interface techniques mentioned in this post, you can make the two of these complement each other very well. I would encourage you to try this out and let me know your thoughts in the comments below. If you like this post, then please follow this blog for more interesting posts, and tell your friends too :)