Recently, I wrote several articles about the use of R code in Microsoft tools, gathered in the Talking about R series. The time has come to sum up this chapter. I think the perfect way to do this is with an article in which you will learn how to publish R scripts, how to manage them, and which tools to use for efficient operations.
Imagine that you are faced with the task of deploying R code in an R Server (Machine Learning Server) environment, and additionally exposing it as a web service. The goal is to let a client application later call the logic implemented in R through simple request/response components.
This is a very good use case for DeployR.
The DeployR Repository Manager, a web interface built on top of the central repository, is used to manage the entire DeployR service. With this tool you can publish R code as a web service, and even test whether the input you supply returns the desired result.
It is worth using when you are introducing an application whose data processing logic is implemented in R. Data scientists and developers then have one centralized repository with access to the published web services.
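To make this more concrete, here is a minimal sketch of what publishing R logic as a web service can look like programmatically on Machine Learning Server, using the mrsdeploy package. The endpoint URL, credentials, service name and scoring function below are hypothetical placeholders, not part of any specific deployment.

```r
library(mrsdeploy)

# Log in to the server's operationalization endpoint;
# the URL and credentials are placeholders
remoteLogin("http://localhost:12800",
            username = "admin",
            password = "<password>",
            session = FALSE)

# A trivial scoring function standing in for the real R logic
scoreInput <- function(x) {
  x * 2
}

# Publish the function as a versioned web service
api <- publishService(
  "simpleScoringService",
  code = scoreInput,
  inputs = list(x = "numeric"),
  outputs = list(answer = "numeric"),
  v = "v1.0.0"
)
```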
Let’s consider an example of applying the tool in a real business case.
Below are the steps for introducing the scoring logic and models implemented in R that the application will use.
[Diagram: the deployment steps split between two roles – DATA SCIENTIST and APPLICATION DEVELOPER]
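On the application developer's side, the published service can be consumed over its REST API from any client application, or directly from R. A minimal sketch, assuming the hypothetical service published above:

```r
library(mrsdeploy)

# Fetch a client object for the published service by name and version
api <- getService("simpleScoringService", v = "v1.0.0")

# The generated method is named after the published function
result <- api$scoreInput(21)
print(result$output("answer"))  # expected: 42
```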
This is one of the ways of working productively with R code. At Predica, I use these practices in projects where the solution requires integrating R with clients' applications.
At this point, it is possible to combine two very fashionable buzzwords: machine learning and serverless. How to do it? The answer is – with Azure Functions.
Using this solution brings many benefits and lets us address specific business cases. One of the most popular applications is triggering and scheduling simple R operations.
Azure Functions does not have native R support; however, there are known ways to invoke R.
Azure Functions – or, more precisely, its Kudu development tool – offers an extension that needs to be installed. Then, in a properly prepared environment, you can run R by calling the R script from PowerShell code.
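For illustration, here is a minimal sketch of what the PowerShell side of such a function can look like. Both paths are hypothetical – they depend on where the extension installs R in your Function App and on your function's name:

```powershell
# run.ps1 - entry point of the Azure Function (PowerShell)
# The paths below are placeholders; adjust them to your environment
$rscriptPath = "D:\home\site\tools\R\bin\x64\Rscript.exe"
$scriptPath  = "D:\home\site\wwwroot\MyRFunction\script.R"

# Run the R script and forward its console output to the function log
& $rscriptPath $scriptPath 2>&1 | Write-Output
```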
Here is a very good tutorial which shows how you can easily run R from Azure Functions.
If processing on a single thread takes a very long time, you may want to parallelize it. The more for loops there are in your program, the more advisable this becomes.
In such cases, it is worth transferring your solution to R Server and using parallelization with RevoScaleR, or Azure Batch parallel processing with doAzureParallel.
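As a minimal sketch of the RevoScaleR option (assuming R Server / Machine Learning Server, where the package is available), rxExec fans a function out over the current compute context; the simulation body here is a placeholder:

```r
library(RevoScaleR)

# Use a parallel compute context; on R Server this could instead be
# a cluster context such as RxSpark or RxInSqlServer
rxSetComputeContext(RxLocalParallel())

# rxExec runs the function once per element of elemArgs, in parallel
results <- rxExec(function(seed) {
  set.seed(seed)
  mean(rnorm(1e6))  # placeholder simulation
}, elemArgs = as.list(1:8))
```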
The principle of operation of the latter is very simple: the doAzureParallel package distributes each iteration of a foreach loop to an individual virtual machine, with the machines combined into a cluster in Azure Batch.
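A minimal sketch of that workflow, assuming the credentials and cluster definitions have been generated as JSON config files (the file names follow the package documentation; the loop body is a placeholder):

```r
library(doAzureParallel)

# Point the package at your Azure Batch and storage credentials
setCredentials("credentials.json")

# Provision the Azure Batch cluster defined in the config file
# and register it as the parallel backend for foreach
cluster <- makeCluster("cluster.json")
registerDoAzureParallel(cluster)

# Each %dopar% iteration is shipped to a node in the Batch cluster
results <- foreach(i = 1:100) %dopar% {
  mean(rnorm(1e6))  # placeholder for a real simulation
}

stopCluster(cluster)
```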
This scenario is ideal for problems such as Monte Carlo simulations or parametric sweeps. It therefore fits perfectly into many financial modelling tasks (back-testing, portfolio scenario modelling, as well as testing many algorithms to compare their performance).
The doAzureParallel package is open source under the MIT license and is available on GitHub under this link.
There are many ways to publish solutions in R using Azure components, but which one to choose? Unfortunately, this should be considered on a case-by-case basis – it is impossible to determine one common scheme.
It is also worth carrying out a series of tests that will allow you to choose the right path for publishing your solution, especially as the entry barriers – both technological and cost-related – are very low.
If you have a problem with choosing the right path or installing packages (issues can appear particularly with doAzureParallel) – contact me immediately! I will be happy to help you solve this problem.