Invoking R from Python
Statistical analysis is often best conducted from R, whether within an integrated development environment like RStudio or simply from the command line. But from time to time, it is convenient to use R from within Python. These scenarios range from the simple, such as summarizing or plotting data with ggplot from a Python script, to the complex, such as using Python to orchestrate analytical workflows that incorporate R into a long chain of analysis.
The most popular method for doing this may be the rpy2 package. This package works by loading the R interpreter directly into the Python process and interacting with R through its C language API. This allows Python objects to be translated to R objects and back directly in memory, and even for Python code to call R functions as if they were Python functions. This tight integration is appealing for its high performance, and for a long time, MGET used rpy2’s predecessor rpy to interact with R.
Unfortunately, rpy2 has some drawbacks on Windows, and Windows support is a key requirement for MGET. Because rpy2 binds to R’s C API, a given build of rpy2 may only work with a single version of R, or a small range of versions; if a newer version is needed, rpy2 must be recompiled. And because R and Python are loaded into the same process, there is a risk that R or one of its packages might attempt to load a different version of a shared library (a .DLL file) than Python has already loaded into the process, causing the process to crash. (This is known generically as the DLL Hell problem.) Finally, the rpy2 project has historically lacked a Windows maintainer and only rarely released Python wheels for Windows, leaving Windows users to build rpy2 themselves, a challenging and time-consuming task.
Back when MGET ran on Python 2, we relieved MGET users of this problem by maintaining our own build of rpy for Windows and including it in MGET. In software engineering, this approach is known as dependency vendoring, and it is often problematic. Whenever a new version of R was released, we had to rebuild and release MGET, so its copy of rpy was up to date, and users then had to reinstall MGET to use that version of R. This was a big chore, and MGET’s compatibility with new versions of R often lagged.
When we ported MGET from Python 2 to Python 3, we switched from using rpy to a solution based on the R plumber package. MGET now starts R as a child process and then interacts with it over HTTP using plumber. This isolates R in its own process, eliminating shared library conflicts, and works with any version of R that supports plumber, eliminating the need to rebuild and reinstall MGET when a new version of R is released.
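To make the child-process-plus-HTTP pattern concrete, here is a minimal sketch using only the Python standard library. It is not MGET's implementation. The `api.R` file name, the `/eval` endpoint, the JSON body shape, and all four function names are hypothetical and chosen only for illustration; the only external call assumed is plumber's `pr()` and `pr_run()`.

```python
import json
import subprocess
import urllib.request


def build_r_command(api_file, port):
    """Build a command line asking Rscript to serve a plumber API on localhost.

    api_file is a hypothetical plumber script; use forward slashes in the path
    so that it survives being embedded in an R string literal.
    """
    r_code = f'plumber::pr_run(plumber::pr("{api_file}"), host="127.0.0.1", port={port})'
    return ["Rscript", "-e", r_code]


def build_eval_request(port, expr):
    """Build an HTTP POST that carries an R expression as JSON.

    The /eval endpoint and the {"expr": ...} body are assumptions of this sketch.
    """
    body = json.dumps({"expr": expr}).encode("utf-8")
    return urllib.request.Request(
        f"http://127.0.0.1:{port}/eval",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_worker(api_file, port):
    """Start R as a separate OS process (requires R and plumber to be installed)."""
    return subprocess.Popen(build_r_command(api_file, port))


def eval_in_worker(port, expr, timeout=30):
    """Send the expression to the worker and decode its JSON reply."""
    with urllib.request.urlopen(build_eval_request(port, expr), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because the two programs only exchange HTTP messages, nothing in the Python process is compiled against R, and a crash or library conflict inside R cannot take the Python process down with it. RWorkerProcess takes care of these details for you.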
This example shows you the basics of using MGET’s
RWorkerProcess class to interact with R from a Python
command prompt. You can learn a lot more details by reading its class
documentation. We also have an example showing how to invoke R from ArcGIS
using MGET’s Evaluate R Statements geoprocessing tool.
To run this example, you must have R installed. We recommend a relatively recent version, but any version from the past few years should work.
Getting started
To get started, we recommend you first import MGET’s
Logger and initialize it. This will cause messages
generated by R code to be logged by MGET and then printed to the console. Then
import and instantiate RWorkerProcess:
>>> from GeoEco.Logging import Logger
>>> Logger.Initialize()
>>> from GeoEco.R import RWorkerProcess
>>> r = RWorkerProcess()
Evaluating R expressions
Use the Eval() function to evaluate R
expressions and return the result to Python. Most commonly used R data types
are translated into an appropriate Python type, for example:
>>> r.Eval('TRUE')
True
>>> r.Eval('123')
123
>>> r.Eval('pi')
3.141592653589793
>>> r.Eval('"Hello, world"')
'Hello, world'
>>> r.Eval('Sys.time()')
datetime.datetime(2025, 2, 5, 15, 13, 47, 641000, tzinfo=zoneinfo.ZoneInfo(key='America/New_York'))
>>> r.Eval('NA') is None
True
>>> r.Eval('c(1, 2, NA, 3)')
[1, 2, None, 3]
>>> r.Eval('list(a=c(1,NA,3), b=4, c=c("A", "B", NA))')
{'a': [1, None, 3], 'b': 4, 'c': ['A', 'B', None]}
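Because NA comes back as None, numeric code on the Python side has to decide how to treat missing values. The following sketch works on the list returned for `c(1, 2, NA, 3)` above; the helper names `none_to_nan` and `mean_ignoring_missing` are hypothetical and are not part of MGET.

```python
import math


def none_to_nan(values):
    """Replace None (R's NA) with float NaN so the list holds only floats."""
    return [math.nan if v is None else float(v) for v in values]


def mean_ignoring_missing(values):
    """Average the non-missing values, similar in spirit to mean(x, na.rm=TRUE) in R."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


returned = [1, 2, None, 3]  # the value shown above for r.Eval('c(1, 2, NA, 3)')
floats = none_to_nan(returned)
average = mean_ignoring_missing(returned)
```

Which treatment is appropriate depends on the analysis; converting to NaN keeps positions aligned, while dropping missing values is simpler when only a summary is needed.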
R data frames are returned as pandas data frames:
>>> df = r.Eval('iris')
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   Sepal.Length  150 non-null    float64
 1   Sepal.Width   150 non-null    float64
 2   Petal.Length  150 non-null    float64
 3   Petal.Width   150 non-null    float64
 4   Species       150 non-null    category
dtypes: category(1), float64(4)
memory usage: 5.1 KB
>>> df.head()
   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
Multiple expressions can be evaluated in one call. Separate them with semicolons or newline characters. The value of the last expression will be returned.
>>> r.Eval('x <- 6; y <- 7; x * y')
42
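For longer snippets, newline-separated statements are often easier to read than semicolons. One way to write them, sketched here with the standard library's `textwrap.dedent`, is a triple-quoted Python string; the variable name `R_SCRIPT` is only illustrative.

```python
import textwrap

# Write the R code one statement per line, indented to match the Python source,
# then strip the common indentation and the surrounding blank lines.
R_SCRIPT = textwrap.dedent("""
    x <- 6
    y <- 7
    x * y
""").strip()

# With the RWorkerProcess instance from above, this string would be passed as:
#     r.Eval(R_SCRIPT)
```

As with the semicolon form, the value of the last statement is what would be returned.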
Getting and setting variables in R
You can get and set variables in the R interpreter through the dictionary
interface of the RWorkerProcess instance:
>>> r['my_variable'] = 42 # Set my_variable to 42 in the R interpreter
>>> print(r['my_variable']) # Get back the value of my_variable and print it
42
>>> print(list(r.keys())) # Print a list of the variables defined in the R interpreter
['my_variable']
>>> del r['my_variable'] # Delete my_variable from the R interpreter
>>> print(list(r.keys())) # Now it is gone
[]
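The dictionary interface makes it easy to pass several Python values to R, run something, and clean up afterwards. The sketch below uses only item assignment, item lookup, and `del`, as shown above; the helper name `with_temp_variables` is hypothetical, and a plain dict stands in for the RWorkerProcess instance so that the snippet runs without R.

```python
def with_temp_variables(session, variables, action):
    """Set variables in the session, run action(session), then delete them again.

    session is any object with a dictionary interface, such as the
    RWorkerProcess instance r used in this example.
    """
    for name, value in variables.items():
        session[name] = value
    try:
        return action(session)
    finally:
        # Remove the temporary variables even if the action raised an error.
        for name in variables:
            del session[name]


stand_in = {}  # a plain dict standing in for an RWorkerProcess instance
product = with_temp_variables(stand_in, {"x": 6, "y": 7}, lambda s: s["x"] * s["y"])
```

With a real worker, the action would typically call Eval(), for example `with_temp_variables(r, {"x": 6, "y": 7}, lambda s: s.Eval("x * y"))`.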
Messages from R
Any output that R writes to its stdout pipe, such as that from the R cat()
function, is logged as INFO messages:
>>> r.Eval('cat("Hello\n")')
2025-02-12 11:56:06.257 INFO Hello
Output that R writes to stderr, such as that from the message() function,
is logged as WARNING messages:
>>> r.Eval('message("HELLO")')
2025-02-12 11:55:34.775 WARNING HELLO
(Note that messages written with cat() require a terminating newline
character (\n), while messages written with message() have a newline
added automatically.)
You can print objects in R and see the output as log messages in Python. But
because the R print() function returns its argument, that value will also be
returned by Eval(). For example:
>>> r.Eval('print(summary(cars))')
2025-02-12 11:51:44.969 INFO      speed           dist
2025-02-12 11:51:44.969 INFO  Min.   : 4.0   Min.   :  2.00
2025-02-12 11:51:44.969 INFO  1st Qu.:12.0   1st Qu.: 26.00
2025-02-12 11:51:44.969 INFO  Median :15.0   Median : 36.00
2025-02-12 11:51:44.969 INFO  Mean   :15.4   Mean   : 42.98
2025-02-12 11:51:44.969 INFO  3rd Qu.:19.0   3rd Qu.: 56.00
2025-02-12 11:51:44.969 INFO  Max.   :25.0   Max.   :120.00
[['Min. : 4.0 ', 'Min. : 2.00 '], ['1st Qu.:12.0 ', '1st Qu.: 26.00 '], ['Median :15.0 ', 'Median : 36.00 '], ['Mean :15.4 ', 'Mean : 42.98 '], ['3rd Qu.:19.0 ', '3rd Qu.: 56.00 '], ['Max. :25.0 ', 'Max. :120.00 ']]
>>>
What you get back depends on what was printed. In the example above, the R
summary() function returned an R table, which print() passed
through. Plumber and Eval() then translated
this into a list of lists. To pass None back instead, append
; NULL to the end of your expression:
>>> r.Eval('print(summary(cars)); NULL')
2025-02-12 11:51:44.969 INFO      speed           dist
2025-02-12 11:51:44.969 INFO  Min.   : 4.0   Min.   :  2.00
2025-02-12 11:51:44.969 INFO  1st Qu.:12.0   1st Qu.: 26.00
2025-02-12 11:51:44.969 INFO  Median :15.0   Median : 36.00
2025-02-12 11:51:44.969 INFO  Mean   :15.4   Mean   : 42.98
2025-02-12 11:51:44.969 INFO  3rd Qu.:19.0   3rd Qu.: 56.00
2025-02-12 11:51:44.969 INFO  Max.   :25.0   Max.   :120.00
>>>
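If you print often, a small helper can append the trailing NULL for you. This is a sketch only: `discard_result` is a hypothetical name, not an MGET function, and it assumes the R code does not end in a comment (which would swallow the appended text).

```python
def discard_result(r_code):
    """Return r_code with '; NULL' appended so that Eval() would return None."""
    # Drop trailing whitespace and any trailing semicolon before appending.
    return r_code.rstrip().rstrip(";").rstrip() + "; NULL"


quiet = discard_result("print(summary(cars))")

# With the RWorkerProcess instance from above, this would be used as:
#     r.Eval(discard_result('print(summary(cars))'))
```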
Errors from R
Errors signaled in R are raised as Python RuntimeErrors:
>>> r.Eval('this_function_does_not_exist()')
Traceback (most recent call last):
  File "<python-input-8>", line 1, in <module>
    r.Eval('this_function_does_not_exist()')
    ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jason/Development/MGET/src/GeoEco/R/_RWorkerProcess.py", line 1176, in Eval
    return(self._ProcessResponse(resp, parseReturnValue=True))
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jason/Development/MGET/src/GeoEco/R/_RWorkerProcess.py", line 927, in _ProcessResponse
    raise RuntimeError(f'From R: {respJSON["message"]}')
RuntimeError: From R: Error in this_function_does_not_exist(): could not find function "this_function_does_not_exist"
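Since R errors arrive as ordinary RuntimeError exceptions, they can be handled with a normal try/except. The sketch below relies on the `From R: ` prefix visible in the traceback above; the helper names and the `fake_eval` stand-in are hypothetical and exist only so that the snippet runs without R.

```python
R_PREFIX = "From R: "  # prefix seen in the RuntimeError message above


def r_error_message(exc):
    """Return the exception text without the 'From R: ' prefix, if it is present."""
    text = str(exc)
    return text[len(R_PREFIX):] if text.startswith(R_PREFIX) else text


def eval_or_default(eval_func, expr, default=None):
    """Call eval_func(expr); on RuntimeError, report the R message and return default."""
    try:
        return eval_func(expr)
    except RuntimeError as exc:
        print(f"R evaluation failed: {r_error_message(exc)}")
        return default


def fake_eval(expr):
    """Stand-in for r.Eval that always fails, for demonstration only."""
    raise RuntimeError(f"From R: Error in {expr}: could not find function")


result = eval_or_default(fake_eval, "this_function_does_not_exist()", default=-1)
```

With a real worker, you would pass the bound method instead of the stand-in, for example `eval_or_default(r.Eval, 'sqrt(2)')`.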
Next steps
To learn more, please review the class documentation for
RWorkerProcess.