R - general description, overview, and resources in the hydro lab
R is a statistical software package very similar to S-Plus (both implement the S language), with the notable exception that R is free and open source. It is installed on the hydro system and can be run either interactively (by typing "R", without the quotes, at the command line) or in batch mode (by writing a script in an editor such as emacs and running it with "R CMD BATCH <r-script-name>" at the command line). Batch mode makes R's powerful statistical functions available to any shell script. You can quit R by typing "q()".
A straightforward way to develop a script is to use R interactively: write the script in emacs and run it in R by typing source("<script name>"), including the quotes, at the R ">" prompt. The up arrow scrolls back through your command history, so it is easy to repeat runs.
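As a minimal sketch of this edit-and-source cycle, the code below writes a three-line script to a temporary file (standing in for a file you would edit in emacs; the file name and contents are made up) and runs it with source(). Objects the script creates stay in the workspace afterwards, which is handy for checking results interactively.

```r
# Stand-in for a script edited in emacs: write three lines to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c(
  "x <- c(2.1, 3.4, 5.0)",
  "xbar <- mean(x)",
  "cat('mean =', xbar, '\\n')"
), script)

# Equivalent of typing source("<script name>") at the > prompt
source(script)

# Objects created by the script remain available in the session
print(xbar)
```

The same file could be run non-interactively with R CMD BATCH, which by default writes its output to a file named after the script with an .Rout extension.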
R scripts can build file names on the fly by string manipulation, for example with paste().
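A few ways to build file names from pieces, shown as a short sketch: paste() with sep = "" (the form used in the loop that reads the dam files), paste0(), which is the same thing with no separator (available in R 2.15.0 and later), and sprintf() for zero-padded numbers. The dam codes come from the text; the "_month" naming pattern is hypothetical.

```r
dam <- c("ftpk", "garr", "oahe")

# paste() is vectorized: one call builds all three names
files <- paste(dam, "_local.txt", sep = "")
# paste0() is shorthand for paste(..., sep = "")
files0 <- paste0(dam, "_local.txt")

# sprintf() gives zero-padded numbers, e.g. a hypothetical monthly naming scheme
monthly <- sprintf("%s_month%02d.txt", "ftpk", 1:12)

print(files)
print(monthly[1:3])
```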
R works easily with arrays and can read in an ascii table with something like:
dam <- c("ftpk", "garr", "oahe")
for (k in 1:3) {
  inf <- paste(dam[k], "_local.txt", sep="")   # e.g. "ftpk_local.txt"
  local.df <- read.table(inf, header=FALSE)
  # local.df is replaced on each pass, so process each file inside the loop
}
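Because the loop above reuses the name local.df, each pass replaces the previous file's data. One way to keep all three tables is a named list, sketched below. So that the example runs on its own, it first writes three small made-up files to a temporary directory; the column layout (year, month, flow) is an assumption for illustration only.

```r
dam <- c("ftpk", "garr", "oahe")
dir <- tempdir()

# Made-up data: one year of monthly values per dam, written without headers
for (d in dam) {
  fake <- data.frame(year = 2000, month = 1:12, flow = 100 * (1:12))
  write.table(fake, file.path(dir, paste(d, "_local.txt", sep = "")),
              row.names = FALSE, col.names = FALSE)
}

# Read each file into its own element of a named list
local.list <- list()
for (d in dam) {
  inf <- file.path(dir, paste(d, "_local.txt", sep = ""))
  local.list[[d]] <- read.table(inf, header = FALSE)
}

# Columns are named V1, V2, V3 because header = FALSE
str(local.list[["garr"]])
```

Any element, e.g. local.list[["oahe"]], can then be used wherever a single data frame like local.df is expected.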
Arrays and vectors that will be filled in element by element must first be created (initialized):
qavg <- rep(0, 12)                  # for a vector
cpgrid <- array(0, dim=c(12,12))    # for an array
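The reason for initializing is that R cannot assign to an element of an object that does not exist yet. The sketch below shows the two initializations from the text, an equivalent matrix() form, and the error you get without initialization (the name not_created is hypothetical and deliberately left undefined).

```r
qavg <- rep(0, 12)                           # numeric vector of 12 zeros
cpgrid <- array(0, dim = c(12, 12))          # 12 x 12 array of zeros
cpgrid.m <- matrix(0, nrow = 12, ncol = 12)  # same shape, built with matrix()

# Elements can now be filled in one at a time
qavg[3] <- 1.5
cpgrid[2, 5] <- 7

# Assigning into an object that was never created fails
msg <- tryCatch({
  not_created[1] <- 1
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)   # the error message reports that 'not_created' was not found
```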
Statistical summaries of arrays and vectors can be obtained quickly. Assuming local.df holds a monthly series that starts in January, with the values in column 3, this line computes the 12 monthly averages:
for (kk in 1:12) qavg[kk] <- mean(local.df[seq(kk, nrow(local.df), 12), 3])
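To check the loop, and to show a loop-free alternative, the sketch below builds a made-up three-year monthly series whose monthly means are known in advance (10, 20, ..., 120). If the table has a month column, tapply() groups by that column directly, so the result does not depend on the series starting in January. The month column is an assumption here; the text does not specify the file layout beyond column 3.

```r
# Made-up data: 3 years of monthly values; each month's mean is 10 * month
local.df <- data.frame(year  = rep(2001:2003, each = 12),
                       month = rep(1:12, times = 3),
                       flow  = rep(1:12, times = 3) * 10)

# Loop form from the text (every 12th row, starting at row kk)
qavg <- rep(0, 12)
for (kk in 1:12) qavg[kk] <- mean(local.df[seq(kk, nrow(local.df), 12), 3])

# Loop-free form: group column 3 by the month column
qavg2 <- tapply(local.df[, 3], local.df$month, mean)

print(qavg)
```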
Many types of plots can be generated (line plots, histograms, color spatial plots). This example writes a partial autocorrelation function plot to a PostScript file; here data stands for your own time series:
postscript(file="plot.ps", horizontal=FALSE, width=7, height=5)
acf(data, type="partial", main="title", xlim=c(0,12),
    ylim=c(-0.1,0.8), xlab="Lag, months")
graphics.off()   # close the device so plot.ps is written
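A self-contained version of the plot, as a sketch: since no data file is given, it simulates a hypothetical AR(1) series with arima.sim() and uses pacf(), which is equivalent to acf(..., type = "partial"). It writes to a temporary file and closes the device with dev.off(), which closes only the current device rather than all of them.

```r
set.seed(42)
# Hypothetical "monthly" series: 20 years of a simulated AR(1) process
x <- arima.sim(model = list(ar = 0.6), n = 240)

outfile <- file.path(tempdir(), "pacf.ps")
postscript(file = outfile, horizontal = FALSE, width = 7, height = 5)
# pacf() draws the plot and also returns the estimates invisibly
p <- pacf(x, lag.max = 12, main = "Simulated AR(1) series",
          xlab = "Lag, months", ylim = c(-0.2, 0.8))
invisible(dev.off())   # close the PostScript device so the file is complete

# For an AR(1) process only the lag-1 partial autocorrelation should stand out
print(round(p$acf[1:3], 2))
```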
There are extensive packages for time series analysis, geostatistics, multivariate statistics, and many other applications. Many of these are included in the R distribution (and so are installed on the hydro system), but they are not loaded when R starts. You must type, for example, "library(MASS)" at the beginning of your session to make that package's functions available. (Older versions of R kept the time series functions in a separate "ts" package, loaded with "library(ts)"; they are now part of the "stats" package, which is loaded automatically.) The library() call can also go at the top of a script that uses these functions, or in your R startup file (normally .Rprofile in your home directory; type ?Startup in R for the exact names and locations) so that the desired packages are loaded automatically whenever R starts.
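A quick way to see the difference between installed and attached packages, sketched with splines (chosen here only because it ships with every R installation but is not attached at start-up): search() lists what is attached, and require() returns TRUE or FALSE instead of stopping, which suits the top of a batch script. The .Rprofile line in the final comment is one possible way to attach a package automatically; check ?Startup before relying on it.

```r
# Is splines attached yet? (normally FALSE in a fresh session)
print("package:splines" %in% search())

library(splines)   # attach it for this session
print("package:splines" %in% search())

# require() returns a logical, so a script can react if a package is missing
ok <- require(splines)
if (!ok) stop("package 'splines' is not installed")

# A line like the following in ~/.Rprofile would add splines to the packages
# attached at every start-up:
# options(defaultPackages = c(getOption("defaultPackages"), "splines"))
```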
As often happens with popular open-source software, users have written a great deal of documentation and contributed it to the R web site. For beginners there are documents such as:
"Simple R"
"R for Beginners"
"R pour les débutants" (for those who prefer French)
These and many others are available at:
http://lib.stat.cmu.edu/R/CRAN/
There is also a document for people who have used Matlab (or its open-source equivalent, Octave) and are moving to R:
"R and Octave"
On Dennis' shelf there is a thick book entitled "Environmental Statistics with S-Plus" that has many examples of how S-Plus, and therefore R, can be used for different analyses; many of its scripts should run in R with little or no modification.
Google searches are invaluable: because R is free, it is used by many people and institutions, and many university classes use it. Searching for "R statistical software" (or "S-Plus") together with the command or method you are trying to use usually turns up something useful.
A concise summary of common commands is provided on the one-page R reference card, also available on the CRAN web site.