Repetitive Execution: "for" loops, "repeat" and "while"


R has a for loop construction of the form

     > for (name in expr_1) expr_2

where name is the loop variable, expr_1 is a vector expression (often a sequence such as 1:20), and expr_2 is often a grouped expression (several statements enclosed in braces { }) whose sub-expressions are written in terms of the dummy variable name. expr_2 is evaluated repeatedly as name takes on each value in the vector result of expr_1.
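As a minimal illustration of this form (the names squares and month are ours, chosen only for the example), the loop variable i below takes each value of 1:5 in turn, and the grouped expression in braces is evaluated once per value. The loop vector does not have to be numeric, as the second loop shows.

```r
# Pre-allocate a numeric vector of five zeros to hold the results
squares <- numeric(5)

# i takes the values 1, 2, 3, 4, 5 in turn; the braced body runs once for each
for (i in 1:5) {
  squares[i] <- i^2            # store the square of the current value of i
}
print(squares)                 # 1 4 9 16 25

# The loop vector can be any vector, for example a character vector
for (month in c("May", "June")) print(month)
```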

Suppose that, using the ozone measurements in the airquality data set, we want to classify each day as a good air quality day (1) or a bad air quality day (0), where a day counts as bad if its ozone level is above 60 ppb. Let us create a new vector called "goodair" to store this classification. We can do this with a for loop.

> numdays = nrow(airquality)

> numdays

[1] 153

 

> goodair = numeric(numdays)      # creates a vector of zeros that will store the results

 

> for(i in 1:numdays)

    if (airquality$Ozone[i] > 60) goodair[i] = 0 else goodair[i] = 1

 

## (Notice that we have an if statement here within a for loop.)
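The same if/else logic is often written with braces, which makes it easier to read and to extend with more statements. Below is a minimal sketch on a handful of Ozone readings copied from the airquality data (41, 36, 115, 12, 71), chosen so that none of them is missing; the names ozone and flag are only illustrative. seq_along(ozone) generates the indices 1, 2, ..., length(ozone).

```r
# A few Ozone readings (ppb) taken from airquality$Ozone, none of them NA
ozone <- c(41, 36, 115, 12, 71)

flag <- numeric(length(ozone))   # one slot per reading, initially 0

for (i in seq_along(ozone)) {
  if (ozone[i] > 60) {
    flag[i] <- 0                 # above the 60 ppb cutoff: bad air quality
  } else {
    flag[i] <- 1                 # at or below the cutoff: good air quality
  }
}
flag                             # 1 1 0 1 0
```

Keeping "} else {" on a single line is the conventional layout; it also works when the code is typed directly at the top level of the console.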

 

Does the command above work? Why/why not?

 

Let's check the Ozone variable. What do you notice below?

 

>  airquality$Ozone

 [1]  41  36  12  18  NA  28  23  19   8  NA   7  16  11  14  18  14  34   6

 [19]  30  11   1  11   4  32  NA  NA  NA  23  45 115  37  NA  NA  NA  NA  NA

 [37]  NA  29  NA  71  39  NA  NA  23  NA  NA  21  37  20  12  13  NA  NA  NA

 [55]  NA  NA  NA  NA  NA  NA  NA 135  49  32  NA  64  40  77  97  97  85  NA

 [73]  10  27  NA   7  48  35  61  79  63  16  NA  NA  80 108  20  52  82  50

 [91]  64  59  39   9  16  78  35  66 122  89 110  NA  NA  44  28  65  NA  22

[109]  59  23  31  44  21   9  NA  45 168  73  NA  76 118  84  85  96  78  73

[127]  91  47  32  20  23  21  24  44  21  28   9  13  46  18  13  24  16  13

[145]  23  36   7  14  30  NA  14  18  20

 

When there are missing values (NA), many operations in R fail or return NA. One way to get around this is to create a new data frame that deletes every row containing a missing value in any column. This can be done with the command "na.omit":

> airqualfull = na.omit(airquality)

> dim(airqualfull)

[1] 111   6

> dim(airquality)  

[1] 153   6             

# How many cases were deleted because of missing data?

 

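Sometimes deleting all cases with missing values is reasonable, and sometimes it is a horrible idea: it throws away every other measurement in those rows, and it can bias the results if the values are not missing at random.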
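To see exactly what na.omit() discards, we can count the incomplete rows with complete.cases() and check how many of them actually had an ozone reading. This is a sketch assuming the built-in airquality data set; the variable names are only illustrative.

```r
# TRUE for rows that contain at least one NA in any column
incomplete <- !complete.cases(airquality)
n_dropped <- sum(incomplete)

# Rows that na.omit() drops even though their Ozone value is recorded
ozone_missing <- is.na(airquality$Ozone)
n_dropped_with_ozone <- sum(incomplete & !ozone_missing)

colSums(is.na(airquality))       # number of missing values in each column
c(dropped = n_dropped, dropped_but_ozone_present = n_dropped_with_ozone)
```

Any row dropped only because of a missing Solar.R value is a day whose ozone reading we could have classified.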

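We could also avoid deleting incomplete cases by extending the if/else statement inside the for loop so that it checks for a missing value first (for example, with is.na()) before comparing the ozone level with the cutoff.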
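One way to handle missing values inside the loop (a sketch, not the only approach) is to test is.na() before comparing with the cutoff and to record NA for days with no ozone reading, so that all 153 days are kept. The vector name goodair_all is hypothetical, used here so the result does not overwrite goodair.

```r
numdays <- nrow(airquality)
goodair_all <- numeric(numdays)       # one slot per day; no rows are deleted

# seq_len(numdays) behaves like 1:numdays but gives an empty loop if numdays is 0
for (i in seq_len(numdays)) {
  oz <- airquality$Ozone[i]
  if (is.na(oz)) {
    goodair_all[i] <- NA              # no reading: leave the day unclassified
  } else if (oz > 60) {
    goodair_all[i] <- 0               # bad air quality day
  } else {
    goodair_all[i] <- 1               # good air quality day
  }
}

table(goodair_all, useNA = "ifany")   # counts of 0, 1 and NA
```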

 

Now let's run the loop again, this time on the complete-case data.

> numdays = nrow(airqualfull)

 

> numdays

[1] 111

 

> goodair = numeric(numdays)       # initialize the vector

    

> for(i in 1:numdays)

    if (airqualfull$Ozone[i] > 60) goodair[i] = 0 else goodair[i] = 1

 

> goodair

  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 0 1 1 0

 [38] 1 0 0 0 0 1 1 1 1 1 0 0 0 1 0 0 1 1 0 1 0 1 1 1 1 0 0 0 1 1 0 1 1 1 1 1 1

 [75] 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
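The same 0/1 coding can be produced without an explicit loop, because comparison operators and ifelse() work element-wise on whole vectors. The sketch below assumes the built-in airquality data set (from the datasets package, loaded by default), recomputes the loop result, and compares it with the vectorized version; the names goodair_loop and goodair_vec are only illustrative.

```r
airqualfull <- na.omit(airquality)

# Loop version, as in the text
goodair_loop <- numeric(nrow(airqualfull))
for (i in seq_len(nrow(airqualfull))) {
  if (airqualfull$Ozone[i] > 60) goodair_loop[i] <- 0 else goodair_loop[i] <- 1
}

# Vectorized version: the test is applied to every element at once
goodair_vec <- ifelse(airqualfull$Ozone > 60, 0, 1)

all.equal(goodair_loop, goodair_vec)   # the two approaches should agree
```

For simple element-by-element rules like this one, the vectorized form is usually shorter and faster; the loop becomes worthwhile when each step depends on earlier results.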

 

At this point we might want to know which days had good air quality. The "which" command returns the indices of the elements for which a logical condition is TRUE. We can then use these indices to select the corresponding rows of the data, whose Month and Day columns give the date of each good day.

> which(goodair == 1)    ## notice the double "=" signs: == tests equality, while = assigns

> goodindices <-  which(goodair == 1)

> airqualfull[goodindices,] 
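To answer the question of which dates were good days, it is enough to keep the Month and Day columns. A sketch assuming the built-in airquality data set; the 0/1 coding is recomputed with ifelse() so the block runs on its own, and good_days and same_rows are illustrative names.

```r
airqualfull <- na.omit(airquality)
goodair <- ifelse(airqualfull$Ozone > 60, 0, 1)   # same 0/1 coding as the loop

goodindices <- which(goodair == 1)                # positions where goodair is 1
good_days <- airqualfull[goodindices, c("Month", "Day")]
head(good_days)

# Logical indexing selects the same rows without calling which()
same_rows <- airqualfull[goodair == 1, c("Month", "Day")]
all.equal(good_days, same_rows)
```

The two forms agree here because airqualfull has no missing values. which() silently skips NA entries, whereas logical indexing with an NA produces a row of NAs, so which() is the safer choice when the flag vector may contain missing values.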

Suppose we now define a good air quality day as one with an ozone level below 60 ppb and a temperature below 80 degrees F. Write an R loop that identifies these days, and write the resulting subset of the data to a file called goodquality.txt. (Hint: use an ifelse() statement inside the for loop.)