Overview

I’ve used MATLAB almost exclusively, both at work and at home, but we’ve migrated most of our analysis pipelines to Python at work, so I’ve done the same at home. The transition was relatively painless, thanks largely to the numpy and pandas libraries.

rpy2

I stumbled upon rpy2, which is a Python library that allows users to execute R code and access R functions directly from Python. It does this by running an embedded R process within Python, and providing a set of classes for passing data back and forth between the two.

Through rpy2, R is accessible through two interfaces: a high-level interface, with convenient classes for mapping data into the R space, and a low-level interface, with more generalized classes that are less convenient but measurably faster.

The high-level interface is the easiest way to use rpy2. You get it by importing the rpy2.robjects module:

import rpy2.robjects as ro

The robjects module provides wrappers for objects in the R space. By objects, I mean variables (like R lists and data.frames), functions (like t.test) and other R objects.

Usage

Using rpy2 with Python is usually a three-step process:

  1. Pass data from Python into R
  2. Manipulate data in R
  3. Pass data from R back into Python

This is best illustrated by examples.

Basic Example

Here’s an example that calls t.test on two random vectors.

import rpy2.robjects as ro
import numpy as np

x = np.random.normal(size=10)
y = np.random.normal(size=10)

# 1. Pass data from Python into R
xr = ro.vectors.FloatVector(x)
yr = ro.vectors.FloatVector(y)

# 2. Call t.test on data in R
ttest = ro.r['t.test']
res = ttest(xr, yr, paired=True)
print(res)

# 3. Pass data from R back into Python
pval = res.rx2('p.value')[0]

Here I use the FloatVector constructor to pass the data into R. rpy2 has wrappers for all types of R data, and their names are self-explanatory (e.g. StrVector, FloatVector, ListVector, DataFrame, etc.). The ro.r object is your handle on the embedded R session: index it with a name to look up an R object, or call it with a string to evaluate raw R code. Here I use it to create a Python variable linked to the t.test function, but you can also use it to execute as many lines of R code as you want (though I wouldn’t recommend doing so).

When the above code block is run, the console should print the value of the variable res (your numbers will differ from run to run, since the inputs are random):

R object with classes: ('htest',) mapped to:
<ListVector - Python:0x7f1c01d94688 / R:0x55c5630e8f18>
[FloatVector, FloatVector, FloatVector, FloatVector, ..., FloatVector, StrVector, StrVector, StrVector]
  statistic: <class 'rpy2.robjects.vectors.FloatVector'>
  R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x7f1ba4863c48 / R:0x55c5640f7590>
[0.308857]
  parameter: <class 'rpy2.robjects.vectors.FloatVector'>
  R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x7f1ba4863e88 / R:0x55c5640f7050>
[9.000000]
  p.value: <class 'rpy2.robjects.vectors.FloatVector'>
  R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x7f1ba48634c8 / R:0x55c5640f7280>
[0.764460]
  conf.int: <class 'rpy2.robjects.vectors.FloatVector'>
  R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x7f1ba4863888 / R:0x55c561e7ac28>
[-0.996924, 1.312192]
  estimate: <class 'rpy2.robjects.vectors.FloatVector'>
  R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x7f1ba4863048 / R:0x55c5640f74e8>
[0.157634]
  null.value: <class 'rpy2.robjects.vectors.FloatVector'>
  R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x7f1ba4804f88 / R:0x55c5640f7018>
[0.000000]
  alternative: <class 'rpy2.robjects.vectors.StrVector'>
  R object with classes: ('character',) mapped to:
<StrVector - Python:0x7f1ba49c9508 / R:0x55c5640ccd08>
['two.sided']
  method: <class 'rpy2.robjects.vectors.StrVector'>
  R object with classes: ('character',) mapped to:
<StrVector - Python:0x7f1ba49c9ac8 / R:0x55c5640cd8d8>
['Paired t-test']
  data.name: <class 'rpy2.robjects.vectors.StrVector'>
  R object with classes: ('character',) mapped to:
<StrVector - Python:0x7f1ba4863848 / R:0x55c5640e1eb8>
['c(-1.86332790..., '-0.1250831111..., '-0.3137351152...]

Not as clean as the output you’d see in R, but it contains the same information. From the first two lines, we can see that the variable is an htest object mapped to a ListVector (i.e. an R list). The remaining lines describe each element of the list, along with its name. For instance, we see that the statistic element is numeric and, as such, is mapped to a FloatVector.

We can access elements of these lists in Python using the rx2 method, which functions as the [[ operator in R.

print(res.rx2('p.value')[0])
# Output: 0.764460

The [0] at the end grabs the value itself. In R, everything is a vector (including single numbers or strings), so res.rx2('p.value') returns another FloatVector, just as res$p.value in R returns a vector holding a single float.

If you want to convert everything back into pure Python, you could convert the named list into a dict like so.

res = dict(zip(res.names, res))
print(res['p.value'][0])
# Output: 0.764460

Note that the values of the dict are still rpy2 vectors, so we include the [0] again to access the number itself.
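If you would rather not carry the [0] around, one option is a small helper that unwraps length-one vectors as it builds the dict. The sketch below is only an illustration: unwrap is a hypothetical helper, not part of rpy2, and it is shown on plain Python lists standing in for res.names and res (with values copied from the output above) so that it runs without R. It relies only on the elements supporting iteration, which rpy2 vectors do.

```python
def unwrap(names, values):
    """Build a dict from names and vector-like values.

    Length-one vectors become bare values; longer ones become lists.
    """
    out = {}
    for name, value in zip(names, values):
        items = list(value)
        out[name] = items[0] if len(items) == 1 else items
    return out


# Stand-ins for res.names and res from the t-test example.
names = ['statistic', 'p.value', 'conf.int', 'method']
values = [[0.308857], [0.764460], [-0.996924, 1.312192], ['Paired t-test']]

result = unwrap(names, values)
print(result['p.value'])
print(result['conf.int'])
```

With the real result object, the call would be unwrap(res.names, res), made before res is reassigned to a dict.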