.. _r: ################################################################################ R ################################################################################ .. image:: images/RLogo.png :align: center R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity. ........................................... Sample R Job ------------------------------------------------------------------------------- .. code-block:: shell #!/bin/bash #$ -M afs_id@nd.edu #$ -m abe module load R R CMD BATCH your_input_R_file.r your_output_R_file.out ........................................... Installing Local Packages ------------------------------------------------------------------------------- Due to the wide range of packages available for **R**, we are unable to install every one. Fortunately, it is easy for users to install additional libraries. To begin with, load the **R** module. If the default version is sufficient, this can be done with the command: .. code-block:: shell module load R For convenience, although not necessary, we suggest making a central directory to hold this and any future packages: .. code-block:: shell mkdir ~/myRlibs Next, there are two different ways of installing R packages: ........................................... Installing packages within R ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Open an R shell and execute the following command: .. code-block:: shell install.packages("package_name", lib="install_location", repos="mirror_location") library('package_name', lib.loc='install_location') For our example, this would be: .. code-block:: shell install.packages("bizdays", lib="~/myRlibs",repos='https://cran.us.r-project.org') library('bizdays', lib='~/myRlibs') To avoid having to specify the installation location every time you use this library, you can create an ``.Renviron`` file in your home directory using any text editor. Then, add the following line to it: .. code-block:: shell R_LIBS=install_location For our example, this would be: .. code-block:: shell R_LIBS=~/myRlibs Now, we can simply do: .. code-block:: shell install.packages("bizdays") library(bizdays) ........................................... Installing R packages from source code ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ You will need to obtain the source code for the package you want to install. The most common repository of these are at `The Comprehensive R Archive Network (CRAN) `_. A simple method to get the package to the CRC is to copy the location of the file, usually through a right click sub-menu, and then use the ''wget'' command: .. code-block:: shell wget https://cran.r-project.org/src/contrib/bizdays_1.0.1.tar.gz Once we have the package, we will need to decide where to install it. Now, issue the following command to install the package: .. code-block:: shell R CMD INSTALL -l install_location package_name For our example, this would be: .. code-block:: shell R CMD INSTALL -l ~/myRlibs bizdays_1.0.1.tar.gz The last step is to tell **R** the location of our new installation. In a CSH environment, this is: .. code-block:: shell setenv R_LIBS install_location If you are using BASH, it would be: .. code-block:: shell export R_LIBS=install_location Add this command to your .cshrc or .bashrc file, respectively, to permanently set it. ........................................... Profiling R Code ------------------------------------------------------------------------------- Profiling R code can help determine which sections in the R code need to be optimized for better performance. In order to profile the R code, one needs to use the **Rprof()** function. **Rprof()** records how many seconds have been spent on each function of the R code. The functions that get timed are the ones that get executed after the **Rprof()** function gets declared. Any function before the **Rprof()** declaration will not be timed. One needs to pass a parameter to Rprof. The parameter is the name of the file that will contain the results. If only a section of the R code needs to be profiled, one can use the **Rprof()** to specify when to start profiling the functions and when to stop profiling the functions. To start profiling the functions, one should place **Rprof("file_name")** before the functions that need to be profiled get executed. In order to stop profiling the rest of the R Code, one needs to place **Rprof(NULL)** to stop profiling the rest of the R Code that does not need to be profiled. The following is an example on how **RProf()** is used in an actual R script. .. code-block:: shell # load sources dyn.load("readbfile3_crc.so") source("readbfile.r") source("snpsel24_data.r") Rprof("test1b.out") #Begin profiling functions # try to read in data dat.M <- read.bfile("hapmap_sim_chr1_test.bed") # try to run snpsel selmat.M <- snp_sel(dat.M,k=300,b=10,t=.1) Rprof(NULL) #Stop profiling functions # write selmat for reference write.table(selmat.M,file="test_selmat_v1.txt",quote=F,sep=" ",col.names=F,row.names=F) In the example above, the functions read.bfile() and snp_sel() as well a the functions within these functions will be profiled. The function write.table() will not be profiled by **Rprof()**. ........................................... Parallel Computing in R ------------------------------------------------------------------------------- R itself does not provide parallel execution. Therefore, in order to realize parallel computing in R, an appropriate parallel R package should be invoked. Test for Rmpi ------------------------------------------------------------------------------- Here is an Rmpi test file: .. code-block:: shell # Load the Rmpi pacakge: library(Rmpi) # Spawn N-1 workers ==> Don't need this on UGE so that commented out, by ISS on 04012019 # mpi.spawn.Rslaves(nslaves=mpi.universe.size()-1) # The command we want to run on all the nodes/processors we have mpi.remote.exec(paste("I am ", mpi.comm.rank(), " of ", mpi.comm.size(), " on ", Sys.info() [c("nodename")])) # Tell all slaves to close down, and exit the program mpi.close.Rslaves() mpi.quit() and save this with "Rmpi-test-on-CRC.R". A job script file for this Rmpi parallel test on the CRC Grid Engine: .. code-block:: shell #!/bin/tcsh # #$ -M Your_NetID@nd.edu #$ -m abe # #$ -pe mpi-24 48 # # Specify a queue name, for example, #$ -q debug # module load R/4.2.0 mpirun -np ${NSLOTS} Rscript Rmpi-test-on-CRC.R > Rmpi-test-on-CRC.out The R/4.2.0 version in the CRC R modules supports "foreach", "parallel", "doParallel", "snow", "snowfall",... parallel packages as a default. For a single node SMP parallel, you can easily download/install on your own space. For example, you can invoke the library with: .. code-block:: shell >library(parallel) in your R script and then can specify a number of core you want. Typically, we set the ``cores`` variable to the same value as that we set in our ``#$ -pe smp`` flag via an environment variable, ``NSLOTS``. For example, .. code-block:: shell >options(cores = Sys.getenv("NSLOTS")) >getOption('cores') Here is a typical example to compare single-core and multi-core parallel computing in R: .. code-block:: shell module load R R > library(parallel) > detectCores() [1] 24 > options(cores = 24) > getOption('cores') [1] 24 > test <- lapply(1:10,function(x) rnorm(100000)) > system.time(x <- lapply(test,function(x) loess.smooth(x,x))) <<<== single-core running > system.time(x <- mclapply(test,function(x) loess.smooth(x,x))) <<<== multi-core (24-core) running ........................................... Related Software ------------------------------------------------------------------------------- For a GUI IDE for R, see :ref:`rstudio`. ........................................... Further Information ------------------------------------------------------------------------------- See the official website: `R `__