<- to give a data (or other) object its values. ->, which points the other way, can also be used, although the assignment is then from left to right. A very common mistake (due to conventions that used the = sign for both comparison and assignment) is to mix them up in R.
> x<-2
assigns the value of 2 to x.
> y<-c(1,2,3,4,5)
assigns the vector of values shown to y. Note here that you must use the c ("combine") function. However, once you have assigned the value of y, you may then assign its value to other data objects
> z<-y
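The assignments described above can be tried directly at the prompt. The following is only a minimal sketch; the object names w and is_two are invented here for illustration and do not appear in the text.

```r
# Leftward assignment: the value on the right is stored in x
x <- 2

# Rightward assignment: the value on the left is stored in w (hypothetical name)
3 -> w

# c() combines its arguments into a single vector
y <- c(1, 2, 3, 4, 5)

# Once y has a value, that value can be assigned to another object
z <- y

# Comparison uses ==, not =; this tests whether x equals 2
is_two <- (x == 2)

print(z)
print(is_two)
```

Keeping == for comparison and <- for assignment avoids the mix-up mentioned above.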
The cryptically named c will also combine character strings into vectors
> names<-c("Abe","Bob","Con")
The underscore character '_' also acted as an assignment operator until R v1.8.0. This was a real bummer if you used underscores in place of spaces in naming objects. Fortunately, it has been officially evicted from the pantheon of operators, but may still bedevil users of earlier versions.
Getting pre-existing data into R
Manually entering data is only suitable for small data sets. How do you get
your rectangular data file or spreadsheet or database table into
R? The foreign package will import many different data formats.
One of the most straightforward ways to retrieve data is through plain text.
Almost all applications used for handling data will export data as a delimited
file in ASCII text, and this gives us a rough and ready way to
get the vast majority of data into R.
First, export the data, usually using a command like Save As... and selecting
ASCII text, CSV or just text. Some spreadsheets export numeric fields with
embedded spaces. These are usually translated as factors, which is often not
what you want. Stripping out any embedded spaces with:
tr -d '\40' < old.dat > new.dat
will usually fix things up. Text editors may also be used if they have a search
and replace facility, by searching for spaces and replacing them with nothing.
You may have a choice of delimiters. Suppose you have a comma-delimited file
named infert.dat that looks like this
education,age,parity,induced,case,spontaneous,stratum,pooled.stratum
0-5yrs,26,6,1,1,2,1,3
0-5yrs,42,1,1,1,0,2,1
0-5yrs,39,6,2,1,0,3,4
...
and want to import it.
> infert<-read.table("/home/jim/infert.dat",header=T,sep=",")
What read.table does is try to read data from the file named as the
first argument. If header is specified as T (True), the first line will be
read as the column names for the data frame. header defaults to F (False).
If we had used something like TAB for a delimiter, sep would have been
defined as a C-style escape sequence ("\t").
Getting it out again
The function write.table() performs the opposite transformation,
writing out an R data frame object into a
rectangular data file. There are other output options like write()
to write out a matrix to a data file, and the functions in the
foreign package that let you write out data in proprietary
formats.
Squeezing in big data sets
R uses a memory-based model to process data.
This means that the amount of data that can be handled is critically
dependent upon how much memory is available. Earlier versions required the user
to increase the amount of memory available when starting up, but there is now a
dynamic allocation. However, if you still run out of memory while trying to
import a large data set, using scan() to import the file will use less memory.
scan isn't as easy to use, and you have to enter the column names separately.
Because the file is comma-delimited, sep="," must be given as well.
> infert<-data.frame(scan("/home/jim/infert.dat",list("",0,0,0,0,0,0,0),sep=",",skip=1))
Read 128 lines
> names(infert)<-c("education","age","parity","induced","case","spontaneous","stratum","pooled.stratum")
Note how the assignment operator was used to assign the names to the data
frame.
Going beyond scan(), there are methods to store your data in a
database table and access the table using the appropriate interface. This
enables the user to access huge amounts of data by only processing it in
bits.
For more information, see An Introduction to R: Reading data from files,
and the documentation from the foreign package.
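The import and export functions described in this section can be tried without an existing file. The sketch below is only an illustration: it writes a three-row data frame (values copied from the first rows of the infert.dat listing) to a temporary file created with tempfile(), then reads it back with read.table() and with scan(). The object names infert_small, tmp, by_table and by_scan are made up for this example.

```r
# A small data frame built from the first rows and columns of the listing
infert_small <- data.frame(
  education = c("0-5yrs", "0-5yrs", "0-5yrs"),
  age       = c(26, 42, 39),
  parity    = c(6, 1, 6)
)

# Write it out as a comma-delimited text file with a header line
tmp <- tempfile(fileext = ".dat")
write.table(infert_small, tmp, sep = ",", row.names = FALSE, quote = FALSE)

# Read it back with read.table(); the header line supplies the column names
by_table <- read.table(tmp, header = TRUE, sep = ",")

# Read it back with scan(); skip the header line and name the columns by hand
by_scan <- data.frame(scan(tmp, what = list("", 0, 0), sep = ",", skip = 1))
names(by_scan) <- c("education", "age", "parity")

print(by_table)
print(by_scan)

# Remove the temporary file
unlink(tmp)
```

With read.table() the column names come from the file itself, while the scan() version needs the types listed in what and the names assigned afterwards, which is the extra effort mentioned above.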