R - general description, overview, and resources in the hydro lab

 

R is a statistical software package almost identical to S-Plus, with the notable exception that it is free and open source. It is installed on the hydro system and can be run either interactively (by typing "R" without the quotes) or in batch mode (by creating a script in an editor like emacs and running it by typing "R CMD BATCH <r-script-name>" at the command line). Batch mode makes the powerful statistical functions of R available to any script. You can quit R by typing "q()".

 

A straightforward way to develop a script is to use R interactively: write the script in emacs, then run it in R by typing source("<script name>"), with the quotes, at the R ">" prompt. Pressing the up arrow scrolls back through your command history, so it is easy to repeat runs.
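As a minimal sketch of this workflow, the following writes a two-line script to disk and runs it with source(). The script contents and the temporary file are made up for illustration; in practice the file would be the one you are editing in emacs.

```r
# Hypothetical example: create a tiny script file, then run it with source().
script <- tempfile(fileext = ".R")      # stand-in for a file edited in emacs
writeLines(c("x <- c(2, 4, 6, 8)",
             "xbar <- mean(x)"), script)

source(script)   # runs the script; x and xbar now exist in the workspace
print(xbar)
```

After editing the script, repeating the source() call reruns everything from the top.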

 

R scripts can build file names by string manipulation (for example, with paste()).
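A short sketch of a few ways to assemble file names. The dam codes and the "_local.txt" suffix follow the example in this document; the "data" directory, the year, and the month are invented for illustration.

```r
# Dam codes as used elsewhere in this document
dam <- c("ftpk", "garr", "oahe")

# paste() is vectorized, so all three file names are built at once
infiles <- paste(dam, "_local.txt", sep = "")

# sprintf() gives control over number formatting (zero-padded month here);
# the year and month are hypothetical
outfile <- sprintf("%s_%04d_%02d.txt", dam[1], 1995, 3)

# file.path() joins a directory (hypothetical) and a file name
full <- file.path("data", infiles[2])

print(infiles)
print(outfile)
print(full)
```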

R works easily with arrays and can read in an ASCII table with something like:

dam <- c("ftpk", "garr", "oahe")

for (k in 1:3) {
  inf <- paste(dam[k], "_local.txt", sep="")  # file name, e.g. "ftpk_local.txt"
  local.df <- read.table(inf, header=FALSE)   # overwritten on each pass
}
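The loop above leaves only the last table in local.df, because each pass overwrites it. One possible way to keep all three is to store them in a list, sketched below. The example is self-contained: it first writes three small made-up tables to a temporary directory, so the data values and the column layout (year, month, flow) are assumptions, not the real lab files.

```r
# Write three small invented tables so the example runs anywhere
dam <- c("ftpk", "garr", "oahe")
tmpdir <- tempdir()

for (k in 1:3) {
  fake <- data.frame(year = rep(1990, 3), month = 1:3,
                     flow = k * c(10, 20, 30))
  write.table(fake, file.path(tmpdir, paste(dam[k], "_local.txt", sep = "")),
              row.names = FALSE, col.names = FALSE)
}

# Read each file into its own list element so nothing is overwritten
local.list <- list()
for (k in 1:3) {
  inf <- file.path(tmpdir, paste(dam[k], "_local.txt", sep = ""))
  local.list[[dam[k]]] <- read.table(inf, header = FALSE)
}

str(local.list$garr)   # columns are named V1, V2, V3 when header = FALSE
```

Individual tables are then available as local.list$ftpk, local.list$garr, and so on.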

 

If values will be filled into arrays they need to be initialized:

qavg <- rep(0,12) #for a vector

cpgrid <- array(0,dim=c(12,12)) #for an array
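A few equivalent ways to set up storage before filling it, as a sketch. The name qmax and the fill values are invented for illustration.

```r
qavg   <- numeric(12)                       # 12 zeros, like rep(0, 12)
cpgrid <- matrix(0, nrow = 12, ncol = 12)   # 12 x 12 grid of zeros

# NA can be a safer starting value than 0: cells that never get filled
# stay visibly missing instead of looking like real zeros
qmax <- rep(NA_real_, 12)                   # hypothetical vector

# Fill the vector element by element (made-up values)
for (m in 1:12) qavg[m] <- m * 1.5

print(qavg)
print(dim(cpgrid))
```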

 

Statistical summaries of arrays and vectors can be obtained quickly. This line obtains a set of 12 monthly averages from a long time series:

for (kk in 1:12) qavg[kk] <- mean(local.df[seq(kk,length(local.df[,1]),12),3])
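The loop above assumes that column 3 holds the values and that the record starts in month 1 with no gaps. The sketch below reproduces it on a made-up three-year series and shows two loop-free alternatives; the column names year, month, and flow are assumptions for illustration.

```r
# Invented monthly series: 3 years x 12 months
local.df <- data.frame(year  = rep(1990:1992, each = 12),
                       month = rep(1:12, times = 3),
                       flow  = rep(1:12, times = 3) + rep(c(0, 10, 20), each = 12))

# Loop form, as in the text: every 12th row starting at month kk
qavg <- rep(0, 12)
for (kk in 1:12) qavg[kk] <- mean(local.df[seq(kk, nrow(local.df), 12), 3])

# Alternative 1: group the values by the month column
qavg2 <- tapply(local.df$flow, local.df$month, mean)

# Alternative 2: reshape to years x months and average each column
# (valid only for a complete record starting in month 1)
qavg3 <- colMeans(matrix(local.df$flow, ncol = 12, byrow = TRUE))

print(qavg)
```

The tapply() form does not depend on row order, so it is the more forgiving choice when the file includes a month column.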

 

Many types of plots can be generated (line plots, histograms, color spatial plots). An example of a partial autocorrelation function postscript plot:

postscript(file="plot.ps", horizontal=FALSE, width=7, height=5)

acf(data, type="partial", main="title", xlim=c(0,12), ylim=c(-0.1,0.8), xlab="Lag, months")

graphics.off()
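A self-contained version of the same plot, as a sketch. The series is synthetic (a simulated AR(1) process standing in for real monthly data), and the output file name is hypothetical.

```r
# Synthetic series with some persistence, made up so the example runs alone
set.seed(1)
flow <- as.numeric(arima.sim(model = list(ar = 0.6), n = 240))

# Hypothetical output file in the session's temporary directory
psfile <- file.path(tempdir(), "pacf_plot.ps")

postscript(file = psfile, horizontal = FALSE, width = 7, height = 5)
acf(flow, type = "partial", lag.max = 12,
    main = "Partial autocorrelation", xlab = "Lag, months")
dev.off()   # close the device so the file is written completely
```

dev.off() closes only the current device, while graphics.off() closes every open device; either finishes the postscript file here.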

 

There are extensive packages for time series analysis, geostatistics, multivariate statistics, and many, many other applications. These advanced functions are often included in the R distribution (and so are installed on the hydro system), but they are often not loaded when R starts. You must type, for example, "library(ts)" at the beginning of your session to load the time series functions. (In recent versions of R the time series functions are part of the stats package, which is loaded by default, so this particular call is needed only on older versions.) The library() call can also be added to the top of a script that uses these functions, or to a startup file such as .Rprofile in your home directory (see ?Startup for the exact names and locations) to load the desired libraries automatically whenever R starts.
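A sketch of loading a package only when it is available. MASS is used purely as an example: it normally ships with R as a recommended package but is not attached at startup. Whether it is installed on a given system is an assumption, which is why the code checks first.

```r
# Check that the package is installed before trying to attach it
if (requireNamespace("MASS", quietly = TRUE)) {
  library(MASS)                             # attach for this session
  loaded <- "package:MASS" %in% search()
} else {
  loaded <- FALSE                           # not installed on this system
}

# List everything attached to the current session
print(search())
print(loaded)
```

search() is a quick way to confirm which packages a session or script has actually attached.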

 

As often happens with popular open source software, users write a great deal of documentation and contribute it to the project web site. For beginners there are documents like:

"Simple R"

"R for Beginners"

or, if you prefer French, "R pour les débutants"

These and many others are available at:

http://lib.stat.cmu.edu/R/CRAN/

There is also a document for people moving from Matlab (or its open source equivalent, Octave) to R:

"R and Octave"

 

On Dennis' shelf there is a thick book entitled "Environmental Statistics with S-Plus" with many example analyses; all of its scripts should work, without modification, in R.

 

Google searches are invaluable -- since it is free, R is used by many people and institutions, and there are many university classes using the software.  Just searching under "R statistical software" (or "S-Plus") and the command or method you are trying to use usually brings back something useful.

 

A concise summary is provided by the one-page reference card, also available on the CRAN web site.