.. _arcgis-invoking-r: Invoking R from ArcGIS ====================== Statistical analysis is often best conducted from `R `__, be it within an integrated development environment like `RStudio `__ or simply from the command line. But from time to time, it is convenient to utilize R from within an ArcGIS geoprocessing workflow. In this example, we'll cover the basics of how this can be done with MGET's **Evaluate R Expression** tool. We also have an example showing how use MGET to :ref:`invoke R from Python `. To run this example, you must have R installed. We recommend a relatively recent version, but any version from the past few years should work. Create a project and add MGET ----------------------------- 1. Start ArcGIS Pro and create a new project. 2. Click **Project** and go to the **Package Manager**. Make sure the **Active Environment** is set to the one that has MGET installed into it. Note that if you change your active environment, you have to restart ArcGIS Pro for it to take effect. For more on activating environments, `click here `_. 3. :doc:`Add the MGET toolbox ` to the project's list of toolboxes, using the environment you activated above. Evaluating R expressions ------------------------ To give the tool a try without setting up a geoprocessing model: 1. In the geoprocessing pane's **Find Tools** box, search for the tool named **Evaluate R Expressions** and open it. 2. In the **R expressions** box, enter ``x <- 6`` and press Enter. Then enter a second expression ``y <- 7`` and a third ``print(x*y)``. 3. The tool should look similar to this: .. image:: images/ArcInvokingR1.png :align: center :width: 40% Click **Run**. 4. Click **View Details**. You can click it while the tool is running or after it completes: .. image:: images/ArcInvokingR2.png :align: center :width: 40% The first time you run the tool, it has to install some R packages to allow MGET to communicate with R. (To learn more about how that works, review the documention MGET's :class:`~GeoEco.R.RWorkerProcess` Python class.) This installation will take a few 10s of seconds and the Details window will contain many messages logging which packages were installed: .. image:: images/ArcInvokingR3.png :align: center 5. If you scroll down, you'll see the output from the R print function: ``[1] 42``. If you click **Run** and **View Details** again, it will run much faster and none of the package installation messages will be there: .. image:: images/ArcInvokingR4.png :align: center .. Important:: MGET executes R expressions using the Rscript program, which does not have a graphical user interface. Because of this, R functions like ``plot()`` will not cause anything to appear on the screen. To see plots, you must save them to a file. The next example illustrates one way to do it. Running an R script ------------------- It can be tedious to type many expressions into the geoprocessing tool's dialog boxes. Also, it is sometimes convenient for expressions to span multiple lines, but this can't be done directly in the tool because each text box must contain a complete R expression—it can't contain just part of one. Because of these limitations, once you need to do more than execute a few trivial expressions, you'll want to write a script instead with your favorite text editor, and then use the tool to execute the script. To illustrate this, I wrote a short script to read a feature class and make a plot, then used the tool to execute the script. Here I had a point feature class called ``Survey_608`` in a file geodatabase. The points represent the midpoints of segments of an aerial line-transect survey conducted off the southeast United States in spring 2012 by the NOAA Southeast Fisheries Science Center. .. image:: images/ArcInvokingR5.png :align: center :width: 80% | If you're interested, you can download the original survey data from `OBIS-SEAMAP `__. For this example, I already prepared the original data for analysis by splitting the transects into 5 km segments, excluding segments that had poor survey conditions, computing the centroids of the segments, and sampling a selection of oceanographic data. Here's part of the attribute table: .. image:: images/ArcInvokingR6.png :align: center :width: 80% | I wanted to create a four-panel plot showing the distributions of four oceanographic variables: **Depth**, **DistToShore**, **SST_HYCOM** and **SSS_HYCOM**. I wrote the following script to read the feature class with the `terra `__ package and create a 4-panel histogram with the `ggplot2 `__ package: .. code-block:: R # Load the packages we need. library(dplyr) library(ggplot2) library(terra) library(tidyr) # Load the feature class and print a summary. gdbPath <- "C:/Users/jjr8/Documents/ArcGIS/Projects/MGET_R_Example/MGET_R_Example.gdb" fcName <- "Survey_608" points <- vect(gdbPath, fcName) print(summary(points)) # Convert the SpatVector object to a data frame so tidyverse functions can # work with it, select the columns of interest, drop rows where any variable # is NA, and pivot the columns of interest to rows, to make ggploting # easier. variables <- c("Depth", "DistToShore", "SST_HYCOM", "SSS_HYCOM") df <- points |> as.data.frame() |> select(all_of(variables)) |> na.omit() |> pivot_longer(cols=everything(), names_to="Variable", values_to="Value") # Write a 4-panel plot to a PNG file named after the feature class. p <- ggplot(df, aes(x=Value)) + geom_histogram(bins=30) + facet_wrap(~Variable, scales="free") + labs(title=paste0("Distributions of Variables in ", fcName), x="Value", y="Count") pngPath <- file.path(dirname(gdbPath), paste0(fcName, '.png')) ggsave(pngPath, plot=p, width=8, height=6, dpi=96) Then, to run it, I used the R ``source()`` function to read and execute the script. I also entered the four packages I needed into the list of **Required R packages**: .. image:: images/ArcInvokingR7.png :align: center :width: 70% .. Important:: Be sure to use ``local=TRUE`` as a parameter to ``source()``. The ``local`` parameter controls whether or not the script is "sourced" into the R environment that invoked ``source()`` (``local=TRUE``) or whether it is sourced into the global environment (``local=FALSE``, the default). MGET executes your R expressions in an isolated environment, rather than the global environment, to try to prevent your code from accidentally breaking MGET's R code that manages the communication with Python. Unfortunately, the ``source()`` function operates against the global environment by default. By specifying ``local=TRUE``, you ensure your code operates against the isolated environment that MGET created for you. If you neglect to do this here, chances are you will be fine. But in the next example, we show how to pass in the outputs of geoprocessing tools as R variables. MGET always defines these in the isolated environment. If you then "source" your script into the global environment, it will not be able to access the variables MGET defines for you. .. Tip:: Include ``echo=TRUE`` as a parameter to ``source()`` and your script's expressions will be logged as they are executed. This let's you track progress and quickly identify which line of code failed when R reports an error. Here's the output: .. image:: images/ArcInvokingR8.png :align: center You may have noticed that the Geoprocessing dialog box said "Evaluate R Expressions completed with warnings." You can see them when you click **View Details**: .. image:: images/ArcInvokingR9.png :align: center These warnings were all output by the ``library()`` functions that loaded the packages. We can safely ignore these particular messages, which just advise us of package versions and that one package has functions that have the same names as a previously loaded package so the previously loaded functions are "masked". .. Tip:: While package loading messages can be useful, they are also regarded as a regular irritation, to the point that R has a special ``suppressPackageStartupMessages()`` function for suppressing them. To use it, we can just change this: .. code-block:: R library(dplyr) library(ggplot2) library(terra) library(tidyr) to this: .. code-block:: R suppressPackageStartupMessages({ library(dplyr) library(ggplot2) library(terra) library(tidyr) }) and all of those warnings will go away, and the tool will complete with a green check-mark rather than a yellow warning triangle. Passing geoprocessing outputs into the script --------------------------------------------- To integrate an R script or expressions into a geoprocessing workflow, it can be useful to pass outputs from previous geoprocessing steps into the **Evaluate R Expressions** tool. To do this, connect the outputs of interest to the **Variable values** parameter. Then open the tool, go into the **R variables to define** and enter corresponding names in the **Variable names** parameter. You must put a name there for each entry that appears in **Variable values**. For example, let's say that after I developed the script above, I wanted to run it on several other feature classes in my geodatabase that had the same columns, in addition to the original one. I decided to use the ArcGIS **Iterate Feature Classes** iterator like this: .. image:: images/ArcInvokingR10.png :align: center Then, in **Evaluate R Expressions**, I typed in ``fcPath`` for the variable name: .. image:: images/ArcInvokingR11.png :align: center :width: 35% and edited the script to extract the ``gdbPath`` and ``fcName`` from the ``fcPath``, which is the full path to the feature class, computed by **Iterate Feature Classes**: .. code-block:: R # Load the feature class and print a summary. gdbPath <- dirname(fcPath) fcName <- basename(fcPath) I then ran the workflow. I had three feature classes in my geodatabase; here are the plots generated for each of them, in case you are interested: .. image:: images/ArcInvokingR12.png :align: center Returning a value ----------------- If you check the **Return result** box, **Evaluate R Expressions** will return the value of the last R expression to be evaluated, which you can then use in further geoprocessing steps. For example, I wrote the following script to calculate the mean of a specified column (``columnName``) of a specified feature class (``fcPath``): .. code-block:: R # Load the packages we need. suppressPackageStartupMessages({ library(terra) }) # Load the feature class. gdbPath <- dirname(fcPath) fcName <- basename(fcPath) points <- vect(gdbPath, fcName) # Calculate the mean of the requested column. Note that the Evaluate R # Expressions tool returns the value of the last expression that was # evaluated. In our case, it will be the mean() function (below). mean(points[[columnName]][[1]], na.rm=TRUE) Here's a model showing this script in action: .. image:: images/ArcInvokingR13.png :align: center For **R expressions**, I provided a ``source()`` call to run the script. I then checked the **Return result** box, highlighted in this screenshot with the red arrow. For **Variable names** I provided the two variables we want to pass in from our model, ``fcName`` and ``columnName``. Then, for **Variable values** I provided the feature class and the column. (These could have come as outputs from prior geoprocessing tools, but I'm not illustrating that here.) After running the tool, I opened **Last expression result** and it was set to the floating point value of the mean (at full precision). This could then be used as input to another tool. As outputs, the tool can successfully return most basic data types as atomic values, including ``logical``, ``integer``, ``double``, and ``character``. Vectors of length 2 or more and unnamed R lists will be returned as Python lists, while named R lists will be returned as Python dictionaries. For more details of data type conversions, please see the :class:`~GeoEco.R.RWorkerProcess` documentation. That said, before returning complex data types, you should check whether the geoprocessing tool that will consume those outputs will accept the data types you intend on returning.