Repetitive Execution: "for" loops, "repeat" and "while"
There is a for loop construction in R which has the form
> for (name in expr_1) expr_2
where name is the loop variable, expr_1 is a vector expression (often a sequence like 1:20), and expr_2 is often a grouped expression with its sub-expressions written in terms of the dummy variable name. expr_2 is evaluated repeatedly as name ranges through the values in the vector result of expr_1.
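As a minimal sketch of this construction, the loop below adds up the squares of the integers 1 to 5. The names total and i are chosen only for this illustration; here 1:5 plays the role of expr_1 and the braces form the grouped expression expr_2.

```r
# Accumulate the sum of squares of 1, 2, ..., 5.
total <- 0
for (i in 1:5) {
  # i takes the values 1, 2, 3, 4, 5 in turn
  total <- total + i^2
}
total
```

The braces are optional when expr_2 is a single expression, but they make it clear where the loop body ends.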
Suppose that, based on the ozone measurements in the airquality data set, we want to classify each day as a good air quality day (1) or a bad air quality day (0), where a day counts as bad if its ozone level is above 60 ppb. Let us create a new vector called "goodair" that stores this information. We can do this using a for loop.
> numdays = nrow(airquality)
> numdays
[1] 153
> goodair = numeric(numdays)
# creates a numeric vector of zeros, one entry per day, to hold the results
> for(i in 1:numdays)
if (airquality$Ozone[i] > 60) goodair[i] = 0 else goodair[i] = 1
## (Notice that we have an if statement here within a for loop.)
Does the command above work? Why/why not?
Let's check the Ozone variable. What do you notice below?
> airquality$Ozone
[1] 41 36 12 18 NA 28 23 19 8 NA 7 16 11 14 18 14 34 6
[19] 30 11 1 11 4 32 NA NA NA 23 45 115 37 NA NA NA NA NA
[37] NA 29 NA 71 39 NA NA 23 NA NA 21 37 20 12 13 NA NA NA
[55] NA NA NA NA NA NA NA 135 49 32 NA 64 40 77 97 97 85 NA
[73] 10 27 NA 7 48 35 61 79 63 16 NA NA 80 108 20 52 82 50
[91] 64 59 39 9 16 78 35 66 122 89 110 NA NA 44 28 65 NA 22
[109] 59 23 31 44 21 9 NA 45 168 73 NA 76 118 84 85 96 78 73
[127] 91 47 32 20 23 21 24 44 21 28 9 13 46 18 13 24 16 13
[145] 23 36 7 14 30 NA 14 18 20
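When there are missing values (NA), many operations in R fail or return NA. Here the comparison NA > 60 evaluates to NA, and the if statement needs TRUE or FALSE, so the loop stops with an error at the first missing value. One way to get around this is to create a new data frame that drops every row containing a missing value. This can be done by means of the command "na.omit".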
> airqualfull = na.omit(airquality)
> dim(airqualfull)
[1] 111 6
> dim(airquality)
[1] 153 6
# How many cases were deleted because of missing data?
Sometimes deleting all cases with missing values is useful, and sometimes it is a horrible idea…
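We could also get around this without deleting the incomplete cases by using ifelse() in place of the if statement within the for loop: ifelse() returns NA when its condition is NA rather than stopping with an error.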
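To make the difference concrete, here is a minimal sketch using the built-in airquality data. The name goodair_na is introduced only for this illustration. Note that ifelse() is vectorized, so the sketch applies it to the whole Ozone column at once instead of looping; the same call also works on one element at a time inside a for loop.

```r
# Comparing NA with a number gives NA, not TRUE or FALSE,
# which is why if () stops at the first missing ozone value.
airquality$Ozone[5] > 60

# ifelse() passes NA through instead of failing.
# goodair_na is a hypothetical name used only in this sketch.
goodair_na <- ifelse(airquality$Ozone > 60, 0, 1)

# One entry per day, with NA wherever Ozone is missing.
length(goodair_na)
table(goodair_na, useNA = "ifany")
```

This keeps all 153 days, at the cost of having to handle the NA entries in any later calculation.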
Now let's try this again using only the complete cases.
> numdays = nrow(airqualfull)
> numdays
[1] 111
> goodair = numeric(numdays) # initialize the vector
> for(i in 1:numdays)
if (airqualfull$Ozone[i] >60) goodair[i] = 0 else goodair[i] = 1
> goodair
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 0 1 1 0
[38] 1 0 0 0 0 1 1 1 1 1 0 0 0 1 0 0 1 1 0 1 0 1 1 1 1 0 0 0 1 1 0 1 1 1 1 1 1
[75] 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
At this point we might be interested in which days were the ones with good air quality. The "which" command returns the indices at which the specified condition is TRUE. We can then use these indices to find the month and day that each one corresponds to.
> which(goodair == 1) ## notice the double "=" signs!
> goodindices <- which(goodair == 1)
> airqualfull[goodindices,]
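The steps above can be sketched as one self-contained script. As an assumption made for brevity, goodair is rebuilt here with a vectorized ifelse() call instead of the for loop; for the complete cases it gives the same 0/1 values. Selecting the Month, Day and Ozone columns is just one way to display the dates.

```r
# Rebuild the complete-case data and the 0/1 indicator.
airqualfull <- na.omit(airquality)
goodair <- ifelse(airqualfull$Ozone > 60, 0, 1)  # vectorized stand-in for the loop

# which() returns the positions where the condition is TRUE.
goodindices <- which(goodair == 1)

# Use those positions to look up the month and day of each good-air day.
head(airqualfull[goodindices, c("Month", "Day", "Ozone")])
```

Logical indexing, as in airqualfull[goodair == 1, ], selects the same rows here; which() is useful when the positions themselves are needed.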
Suppose we want to define a day with good air quality as one with an ozone level below 60 ppb and a temperature below 80 degrees F. Write an R loop to do this, and output the resulting subset of the data to a file called goodquality.txt. (Hint: use an ifelse() statement inside the for loop.)