Overview
I’ve used MATLAB almost exclusively, both at work and at home, but we’ve migrated most of our analysis pipelines to Python at work, so I’ve largely done the same at home. The transition was relatively painless, thanks mainly to the numpy and pandas libraries.
rpy2
I stumbled upon rpy2, which is a Python library that allows users to execute R code and access R functions directly from Python. It does this by running an embedded R process within Python, and providing a set of classes for passing data back and forth between the two.
Through rpy2, R is accessible through two interfaces: a high-level interface, with convenient classes for mapping data into the R space, and a low-level interface, with more generalized classes that are less convenient but measurably faster.
The high-level interface is the easiest way to use rpy2. It becomes available when you import the rpy2.robjects module:
import rpy2.robjects as ro
The robjects module provides wrappers for objects in the R space. By objects, I mean variables (like R lists and data.frames), functions (like t.test), and other R objects.
Usage
Using rpy2 with Python is usually a three-step process:
- Pass data from Python into R
- Manipulate data in R
- Pass data from R back into Python
This is best illustrated by examples.
Basic Example
Here’s an example that calls t.test on some random vectors.
import rpy2.robjects as ro
import numpy as np
x = np.random.normal(size=10)
y = np.random.normal(size=10)
# 1. Pass data from Python into R
xr = ro.vectors.FloatVector(x)
yr = ro.vectors.FloatVector(y)
# 2. Call t.test on data in R
ttest = ro.r['t.test']
res = ttest(xr, yr, paired=True)
print(res)
# 3. Pass data from R back into Python
pval = res.rx2('p.value')[0]
Here I use the FloatVector constructor to pass the data into R. rpy2 has wrappers for all types of R data, and their names are self-explanatory (e.g. StrVector, FloatVector, ListVector, DataFrame).
The ro.r object allows you to execute raw R code. Here I use it to create a Python variable linked to the t.test function, but you can also use it to execute as many lines of R code as you want (though I wouldn’t recommend doing so).
When the t.test example above is run, the console should print the value of the variable res:
R object with classes: ('htest',) mapped to:
<ListVector - Python:0x7f1c01d94688 / R:0x55c5630e8f18>
[FloatVector, FloatVector, FloatVector, FloatVector, ..., FloatVector, StrVector, StrVector, StrVector]
statistic: <class 'rpy2.robjects.vectors.FloatVector'>
R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x7f1ba4863c48 / R:0x55c5640f7590>
[0.308857]
parameter: <class 'rpy2.robjects.vectors.FloatVector'>
R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x7f1ba4863e88 / R:0x55c5640f7050>
[9.000000]
p.value: <class 'rpy2.robjects.vectors.FloatVector'>
R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x7f1ba48634c8 / R:0x55c5640f7280>
[0.764460]
conf.int: <class 'rpy2.robjects.vectors.FloatVector'>
R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x7f1ba4863888 / R:0x55c561e7ac28>
[-0.996924, 1.312192]
estimate: <class 'rpy2.robjects.vectors.FloatVector'>
R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x7f1ba4863048 / R:0x55c5640f74e8>
[0.157634]
null.value: <class 'rpy2.robjects.vectors.FloatVector'>
R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x7f1ba4804f88 / R:0x55c5640f7018>
[0.000000]
alternative: <class 'rpy2.robjects.vectors.StrVector'>
R object with classes: ('character',) mapped to:
<StrVector - Python:0x7f1ba49c9508 / R:0x55c5640ccd08>
['two.sided']
method: <class 'rpy2.robjects.vectors.StrVector'>
R object with classes: ('character',) mapped to:
<StrVector - Python:0x7f1ba49c9ac8 / R:0x55c5640cd8d8>
['Paired t-test']
data.name: <class 'rpy2.robjects.vectors.StrVector'>
R object with classes: ('character',) mapped to:
<StrVector - Python:0x7f1ba4863848 / R:0x55c5640e1eb8>
['c(-1.86332790..., '-0.1250831111..., '-0.3137351152...]
Not as clean as the output you’d see in R, but it contains the same information. From the first two lines, we can see that the variable is an htest object mapped to a ListVector (i.e. an R list). The remaining lines describe each element of the list, along with its name. For instance, we see that the statistic element is numeric and, as such, is mapped to a FloatVector.
We can access elements of these lists in Python using the rx2 method, which functions like the [[ operator in R.
print(res.rx2('p.value')[0])
# Output: 0.764460
The [0] at the end grabs the value itself. In R, everything is a vector (including single numbers or strings), so res.rx2('p.value') on its own would just return another FloatVector (since res$p.value in R would return a vector with a single float).
If you want to convert everything back into pure Python, you could convert the named list into a dict like so:
res = dict(zip(res.names, res))
print(res['p.value'][0])
# Output: 0.764460
Note that the values of the dict are still rpy2 vectors, so we include the [0] again to access the number itself.
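If the trailing [0] gets tedious, one option is to unwrap length-1 vectors while building the dict. The sketch below is a hypothetical helper, not part of rpy2; unwrap and named_list_to_dict are names invented for this example. Plain Python lists stand in for rpy2 vectors so the snippet runs without R, using a few values from the printed output above.

```python
def unwrap(vec):
    """Return the lone item of a length-1 sequence, else a plain list."""
    items = list(vec)
    return items[0] if len(items) == 1 else items


def named_list_to_dict(names, values):
    """Pair each name with its unwrapped value (hypothetical helper)."""
    return {name: unwrap(value) for name, value in zip(names, values)}


# Plain lists standing in for the names and elements of an R named list.
names = ["statistic", "p.value", "conf.int"]
values = [[0.308857], [0.764460], [-0.996924, 1.312192]]

plain = named_list_to_dict(names, values)
print(plain["p.value"])   # a bare float, no [0] needed
print(plain["conf.int"])  # longer vectors become plain lists
```

With a real rpy2 named list, the call would presumably look like named_list_to_dict(r_list.names, r_list), on the assumption that iterating over each rpy2 vector yields ordinary Python values.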