Invoking R from Python

Statistical analysis is often best conducted in R, whether within an integrated development environment like RStudio or simply from the command line. But from time to time, it is convenient to use R from within Python. These scenarios range from the simple, such as summarizing or plotting data with ggplot from a Python script, to the complex, such as using Python to orchestrate analytical workflows that incorporate R as one step in a long chain of analysis.

The most popular method for doing this may be the rpy2 package. This package works by loading the R interpreter directly into the Python process and interacting with R through its C language API. This allows Python objects to be translated to R objects and back directly in memory, and even for Python code to call R functions as if they were Python functions. This tight integration is appealing for its high performance, and for a long time, MGET used rpy2’s predecessor rpy to interact with R.

Unfortunately, rpy2 has some drawbacks on Windows, and supporting Windows is a key requirement for MGET. Because rpy2 binds to R’s C API, a given build of rpy2 may only work with a single version of R, or a small range of versions, and if a newer version is needed, rpy2 must be recompiled. And because R and Python are loaded into the same process, there is a risk that R or one of its packages might attempt to load a different version of a shared library (a .DLL file) than Python has already loaded, causing the process to crash. (This is known generically as the DLL Hell problem.) Finally, the rpy2 project has historically lacked a Windows maintainer and only rarely released Python wheels for Windows, leaving Windows users to build rpy2 themselves, a challenging and time-consuming task.

Back when MGET ran on Python 2, we relieved MGET users of this problem by maintaining our own build of rpy for Windows and including it in MGET. In software engineering, this approach is known as dependency vendoring, and it is often problematic. Whenever a new version of R was released, we had to rebuild and release MGET, so its copy of rpy was up to date, and users then had to reinstall MGET to use that version of R. This was a big chore, and MGET’s compatibility with new versions of R often lagged.

When we ported MGET from Python 2 to Python 3, we switched from rpy to a solution based on the R plumber package. MGET now starts R as a child process and then interacts with it over HTTP using plumber. This isolates R in its own process, eliminating shared library conflicts, and works with any version of R that supports plumber, eliminating the need to rebuild and reinstall MGET when a new version of R is released.
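The child-process-over-HTTP pattern described above can be sketched with the Python standard library alone. In this sketch a second Python process stands in for R running plumber, and it only echoes the expression back rather than evaluating anything. The names CHILD_CODE and call_child, and the JSON shape of the request and reply, are assumptions made for illustration; this is not MGET's implementation.

```python
import json
import subprocess
import sys
import urllib.request

# Hypothetical stand-in for the R child: a tiny HTTP server that echoes back
# the expression it receives. A real worker would evaluate it in R instead.
CHILD_CODE = r'''
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        reply = json.dumps({"echo": request["expr"]}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # keep the child quiet

# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), Handler)
print(server.server_address[1], flush=True)  # tell the parent which port we got
server.serve_forever()
'''


def call_child(expr):
    """Start the child, send it one expression over HTTP, and shut it down."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        port = int(child.stdout.readline())  # first line of output is the port
        request = urllib.request.Request(
            f"http://127.0.0.1:{port}/",
            data=json.dumps({"expr": expr}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        # An empty ProxyHandler bypasses any proxy configured in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request, timeout=10) as response:
            return json.loads(response.read())
    finally:
        child.terminate()
        child.wait()
        child.stdout.close()
```

Calling call_child("x * y") starts the child, posts the expression, and returns the decoded JSON reply. Because the two processes share nothing but a local socket, the child can load whatever shared libraries it likes without affecting the parent, which is the isolation property the paragraph above describes.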

In this example, we’ll show you the basics of using MGET’s RWorkerProcess class to interact with R from a Python command prompt. You can learn much more from its class documentation. We also have an example showing how to invoke R from ArcGIS using MGET’s Evaluate R Statements geoprocessing tool.

To run this example, you must have R installed. We recommend a relatively recent version, but any version from the past few years should work.

Getting started

To get started, we recommend you first import MGET’s Logger and initialize it. This will cause messages generated by R code to be logged by MGET and then printed to the console. Then import and instantiate RWorkerProcess:

>>> from GeoEco.Logging import Logger
>>> Logger.Initialize()
>>> from GeoEco.R import RWorkerProcess
>>> r = RWorkerProcess()

Evaluating R expressions

Use the Eval() method to evaluate R expressions and return the result to Python. Most commonly used R data types are translated into an appropriate Python type. For example:

>>> r.Eval('TRUE')
True
>>> r.Eval('123')
123
>>> r.Eval('pi')
3.141592653589793
>>> r.Eval('"Hello, world"')
'Hello, world'
>>> r.Eval('Sys.time()')
datetime.datetime(2025, 2, 5, 15, 13, 47, 641000, tzinfo=zoneinfo.ZoneInfo(key='America/New_York'))
>>> r.Eval('NA') is None
True
>>> r.Eval('c(1, 2, NA, 3)')
[1, 2, None, 3]
>>> r.Eval('list(a=c(1,NA,3), b=4, c=c("A", "B", NA))')
{'a': [1, None, 3], 'b': 4, 'c': ['A', 'B', None]}
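Because R's NA arrives in Python as None, a numeric list returned by Eval() may need a little preparation before you do arithmetic on it. The sketch below converts None to a float NaN. The sample values are copied from the output above rather than fetched from R, and na_to_nan is a hypothetical helper, not part of MGET.

```python
import math


def na_to_nan(values):
    """Return a list of floats in which each None (R's NA) becomes NaN."""
    return [math.nan if value is None else float(value) for value in values]


# Copied from the Eval() output shown above, so this runs without R.
result = {'a': [1, None, 3], 'b': 4, 'c': ['A', 'B', None]}

cleaned = na_to_nan(result['a'])

# Average the values that are present, skipping the missing one.
present = [value for value in cleaned if not math.isnan(value)]
mean_a = sum(present) / len(present)
```

Whether to skip missing values or let NaN propagate through a calculation is an analysis decision; the sketch skips them only to show one option.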

R data frames are returned as pandas data frames:

>>> df = r.Eval('iris')
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   Sepal.Length  150 non-null    float64
 1   Sepal.Width   150 non-null    float64
 2   Petal.Length  150 non-null    float64
 3   Petal.Width   150 non-null    float64
 4   Species       150 non-null    category
dtypes: category(1), float64(4)
memory usage: 5.1 KB
>>> df.head()
   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

Multiple expressions can be evaluated in one call. Separate them with semicolons or newline characters. The value of the last expression will be returned.

>>> r.Eval('x <- 6; y <- 7; x * y')
42
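For longer scripts, newline separators are usually easier to read than semicolons. As a minimal sketch, a triple-quoted string passed through textwrap.dedent keeps the R code indented neatly in your Python source; the variable name script is just an illustration.

```python
import textwrap

# dedent() removes the common leading indentation, and strip() drops the
# trailing newline, leaving one R statement per line.
script = textwrap.dedent("""\
    x <- 6
    y <- 7
    x * y
""").strip()

# With a running RWorkerProcess named r, this string could then be passed
# to r.Eval(script); that call is left out here so the sketch runs without R.
```

The resulting string contains the same three statements as the one-line version above, separated by newline characters instead of semicolons.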

Getting and setting variables in R

You can get and set variables in the R interpreter through the dictionary interface of the RWorkerProcess instance:

>>> r['my_variable'] = 42     # Set my_variable to 42 in the R interpreter
>>> print(r['my_variable'])   # Get back the value of my_variable and print it
42
>>> print(list(r.keys()))     # Print a list of the variables defined in the R interpreter
['my_variable']
>>> del r['my_variable']      # Delete my_variable from the R interpreter
>>> print(list(r.keys()))     # Now it is gone
[]
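If you set variables only for the duration of one calculation, a small context manager can tidy them up afterwards. The sketch below relies only on the item assignment, del, and keys() operations shown above. The helper temporary_variables is hypothetical, not part of MGET, and a plain dict stands in for the RWorkerProcess instance so the code runs without R.

```python
from contextlib import contextmanager


@contextmanager
def temporary_variables(session, **variables):
    """Set variables on a dict-like session, then delete them on exit."""
    for name, value in variables.items():
        session[name] = value
    try:
        yield session
    finally:
        for name in variables:
            del session[name]


# A plain dict stands in for an RWorkerProcess instance here.
fake_r = {}
with temporary_variables(fake_r, x=6, y=7) as session:
    inside = sorted(session.keys())   # names defined while the block runs
after = sorted(fake_r.keys())         # names left once the block has exited
```

Note that this sketch deletes every name it was given, so a variable that already existed under the same name would be removed as well.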

Messages from R

Any output that R writes to its stdout pipe, such as that from the R cat() function, is logged as INFO messages:

>>> r.Eval('cat("Hello\n")')
2025-02-12 11:56:06.257 INFO Hello

Output that R writes to stderr, such as that from the message() function, is logged as WARNING messages:

>>> r.Eval('message("HELLO")')
2025-02-12 11:55:34.775 WARNING HELLO

(Note that messages written with cat() require a terminating newline character (\n), while messages written with message() have a newline added automatically.)
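The general idea of mapping a child process's stdout to INFO and its stderr to WARNING can be sketched with Python's standard logging and subprocess modules. This is an illustration of the pattern only, not MGET's Logger: a Python child process stands in for R, the names run_and_log and ListHandler are hypothetical, and output is collected after the child exits, so the original interleaving of the two streams is not preserved.

```python
import logging
import subprocess
import sys


def run_and_log(args, logger):
    """Run a child process, logging its stdout as INFO and stderr as WARNING."""
    completed = subprocess.run(args, capture_output=True, text=True)
    for line in completed.stdout.splitlines():
        logger.info(line)
    for line in completed.stderr.splitlines():
        logger.warning(line)
    return completed.returncode


class ListHandler(logging.Handler):
    """Collect (level name, message) pairs so the result is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append((record.levelname, record.getMessage()))


logger = logging.getLogger("child_output_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

# The child writes one line to stdout and one line to stderr.
child = [sys.executable, "-c",
         "import sys; print('Hello'); print('HELLO', file=sys.stderr)"]
exit_code = run_and_log(child, logger)
```

After this runs, handler.records holds one INFO entry for the stdout line and one WARNING entry for the stderr line.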

You can print objects in R and see the output as log messages in Python. But because the R print() function returns its argument (invisibly), Eval() will also return that value to Python. For example:

>>> r.Eval('print(summary(cars))')
2025-02-12 11:51:44.969 INFO      speed           dist
2025-02-12 11:51:44.969 INFO  Min.   : 4.0   Min.   :  2.00
2025-02-12 11:51:44.969 INFO  1st Qu.:12.0   1st Qu.: 26.00
2025-02-12 11:51:44.969 INFO  Median :15.0   Median : 36.00
2025-02-12 11:51:44.969 INFO  Mean   :15.4   Mean   : 42.98
2025-02-12 11:51:44.969 INFO  3rd Qu.:19.0   3rd Qu.: 56.00
2025-02-12 11:51:44.969 INFO  Max.   :25.0   Max.   :120.00
[['Min.   : 4.0  ', 'Min.   :  2.00  '], ['1st Qu.:12.0  ', '1st Qu.: 26.00  '], ['Median :15.0  ', 'Median : 36.00  '], ['Mean   :15.4  ', 'Mean   : 42.98  '], ['3rd Qu.:19.0  ', '3rd Qu.: 56.00  '], ['Max.   :25.0  ', 'Max.   :120.00  ']]
>>>

What you get back depends on what was printed. In the example above, the R summary() function returned an R table, which print() passed through. Plumber and Eval() then translated this into a list of lists. To pass None back instead, append ; NULL to the end of your expression:

>>> r.Eval('print(summary(cars)); NULL')
2025-02-12 11:51:44.969 INFO      speed           dist
2025-02-12 11:51:44.969 INFO  Min.   : 4.0   Min.   :  2.00
2025-02-12 11:51:44.969 INFO  1st Qu.:12.0   1st Qu.: 26.00
2025-02-12 11:51:44.969 INFO  Median :15.0   Median : 36.00
2025-02-12 11:51:44.969 INFO  Mean   :15.4   Mean   : 42.98
2025-02-12 11:51:44.969 INFO  3rd Qu.:19.0   3rd Qu.: 56.00
2025-02-12 11:51:44.969 INFO  Max.   :25.0   Max.   :120.00
>>>
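If you do want to work with the list of lists returned for the summary table, the padded strings need some parsing first. The sketch below splits each cell at the colon into a label and a number. The table literal is copied from the output shown earlier so the code runs without R, the column names are taken from the printed header because they are not part of the returned value, and parse_summary is a hypothetical helper, not part of MGET.

```python
# Copied from the value returned by r.Eval('print(summary(cars))') above.
table = [
    ['Min.   : 4.0  ', 'Min.   :  2.00  '],
    ['1st Qu.:12.0  ', '1st Qu.: 26.00  '],
    ['Median :15.0  ', 'Median : 36.00  '],
    ['Mean   :15.4  ', 'Mean   : 42.98  '],
    ['3rd Qu.:19.0  ', '3rd Qu.: 56.00  '],
    ['Max.   :25.0  ', 'Max.   :120.00  '],
]

# Assumed from the printed header; the returned list does not include them.
columns = ['speed', 'dist']


def parse_summary(table, columns):
    """Turn 'label : value' cells into {column: {label: float}}."""
    parsed = {name: {} for name in columns}
    for row in table:
        for name, cell in zip(columns, row):
            label, _, value = cell.partition(':')
            parsed[name][label.strip()] = float(value)
    return parsed


stats = parse_summary(table, columns)
```

Parsing formatted text is fragile, so when you need the numbers themselves it is usually more robust to have R compute and return them directly rather than recovering them from printed output.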

Errors from R

Errors signaled in R are raised as Python RuntimeErrors:

>>> r.Eval('this_function_does_not_exist()')
Traceback (most recent call last):
  File "<python-input-8>", line 1, in <module>
    r.Eval('this_function_does_not_exist()')
    ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jason/Development/MGET/src/GeoEco/R/_RWorkerProcess.py", line 1176, in Eval
    return(self._ProcessResponse(resp, parseReturnValue=True))
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jason/Development/MGET/src/GeoEco/R/_RWorkerProcess.py", line 927, in _ProcessResponse
    raise RuntimeError(f'From R: {respJSON["message"]}')
RuntimeError: From R: Error in this_function_does_not_exist(): could not find function "this_function_does_not_exist"
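Since R errors surface as RuntimeError, ordinary try/except handling applies. The sketch below wraps an evaluation call and returns either the value or the error message. Here try_eval is a hypothetical helper, and fake_eval is a stand-in that raises the same message as the traceback above so the code runs without R; with a running RWorkerProcess named r you would pass r.Eval instead.

```python
def try_eval(evaluate, expression):
    """Call evaluate(expression); return (value, None) or (None, error message)."""
    try:
        return evaluate(expression), None
    except RuntimeError as error:
        return None, str(error)


def fake_eval(expression):
    """Hypothetical stand-in for r.Eval that always fails like the call above."""
    raise RuntimeError(
        'From R: Error in this_function_does_not_exist(): '
        'could not find function "this_function_does_not_exist"'
    )


value, message = try_eval(fake_eval, 'this_function_does_not_exist()')
```

Catching RuntimeError this broadly also catches errors that did not originate in R, so in real code you may want to inspect the message before deciding how to respond.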

Next steps

To learn more, please review the class documentation for RWorkerProcess.