Introduction
This tutorial looks at some of the common control structures in R. Control structures define the flow of a program and decide which path it takes. The decision is based on the evaluation of a condition, typically one that tests the value of a variable. These are the basic types of control structures:
- If a variable satisfies a particular condition, follow path A; otherwise follow path B (if, if-else, ifelse).
- Continue evaluating a body of expressions until a specified condition is met. This is a looping construct, and there are several ways to specify the termination condition (for, repeat, while, and the related next and break).
- Select a block of code for evaluation depending on the value of a variable (switch).
- R has overloaded certain operators so that they work internally on a looped structure. For example, if you use ‘+’ to add two arrays, R adds the individual elements of the arrays.
- R has certain convenience or helper functions to ‘apply’ a function to multiple elements of a list (apply, lapply, sapply, vapply, replicate, tapply, by). These will be covered in the next tutorial.
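The element-wise behaviour of ‘+’ mentioned in the list above can be seen in a minimal sketch. The vector names x, y and z and their values are made up for illustration:

```r
# Two numeric vectors of the same length (illustrative values)
x <- c(1, 2, 3)
y <- c(10, 20, 30)

# '+' adds the vectors element by element; no explicit loop is needed
z <- x + y
print(z)
```

Written as an explicit loop, the same addition would need a for over the indices of x, which is why the overloaded operator is usually preferred.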
if, if-else, ifelse
> b=2
> c=3
> d=4
# if b is 2 set d=b+c
> if(b==2){d=b+c}
> d
[1] 5
# if b is 3 set d=b+c else set d=c
> if(b==3){d=b+c}else{d=c}
> d
[1] 3
There is a more compact way to write an if-else:
> d=ifelse(b==3,b+c,c)
> d
[1] 3
The ifelse function is read like this: evaluate the expression specified in the first argument; if it is true, return the result of the second argument, otherwise return the result of the third argument. The test is applied to each element, so ifelse also works on whole vectors.
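As a small sketch of how ifelse behaves on a vector rather than a single value, the example below labels each element separately. The names scores and result, the values, and the threshold of 50 are made up for illustration:

```r
# Illustrative data: four scores and a pass mark of 50
scores <- c(35, 72, 50, 90)

# The test scores >= 50 is evaluated for every element,
# and the matching label is chosen element by element
result <- ifelse(scores >= 50, "pass", "fail")
print(result)
```

A plain if statement would not do this, because if expects a single TRUE or FALSE rather than a vector of them.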
for and while
The for construct can be used to loop over the values of a variable.
> d=c(1,2,3)
> for(v in d) { cat(v) }
123
In the example above we iterate over the values of the vector d. The vector contains three elements, and in each iteration the variable v is assigned the next element of d: first 1, then 2, then 3. In each iteration we simply print the current value of v.
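When the position of each element is needed as well as its value, one common approach is to loop over the indices instead of the values. The following is a minimal sketch; the values in d and the name total are made up for illustration:

```r
# Illustrative vector
d <- c(10, 20, 30)
total <- 0

# seq_along(d) yields the indices 1, 2, 3 of d
for (i in seq_along(d)) {
  cat("element", i, "is", d[i], "\n")
  total <- total + d[i]   # running sum of the elements
}
print(total)
```

seq_along is used here rather than 1:length(d) because it yields an empty sequence for an empty vector, so the loop body is simply skipped.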
It is possible to iterate over a matrix too:
> d=matrix(1:6,nrow=2,ncol=3)
> d
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
# iterate over all the elements in the matrix
> for(v in d) { cat(v,',') }
1 ,2 ,3 ,4 ,5 ,6 ,
Iterating over a sequence of numbers:
> for(v in 1:10) { cat(v,',') }
1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10 ,
Let's see what iterating over a data frame looks like. In the loop we will not print the variable but look at its structure instead:
> d=data.frame(c1=c(1,2,3,4),c2=c(5,6,7,8))
> for(v in d) { str(v) }
 num [1:4] 1 2 3 4
 num [1:4] 5 6 7 8
So each iteration gets one column. If you now want to iterate over the values within each column, you use a nested loop:
> for(v in d) { for (k in v) {cat (k,',' ) }}
1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,
So that's how you iterate over a data frame column first. How about row first? Well, you can use the ‘apply’ or the ‘by’ function. We will look at them in the next tutorial.
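Until then, a row-first pass can also be written with a plain for loop over the row indices. This is only a sketch of one possible approach, not the method the next tutorial uses; the name row_sums is made up for illustration:

```r
# Same shape of data frame as in the example above
d <- data.frame(c1 = c(1, 2, 3, 4), c2 = c(5, 6, 7, 8))

# Pre-allocate the result, then visit the rows one at a time
row_sums <- numeric(nrow(d))
for (i in seq_len(nrow(d))) {
  row <- d[i, ]                     # a one-row data frame
  row_sums[i] <- row$c1 + row$c2    # combine the values in this row
}
print(row_sums)
```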
We now look at the while loop.
> a=0;
> b=0;
> while (a<10) { b=b+a;a=a+1; }
> b
[1] 45
> a
[1] 10
This is how it works. We initialize a and b to 0. Control reaches the while loop and checks the value of a. The first time, a is less than 10, so control enters the body, where a is added to b and a is then incremented by 1. Control then returns to the condition of the while loop. At this point a is 1, which is still less than 10, so the loop continues. Once a becomes 10, the while condition evaluates to false and the loop ends, leaving b as the sum of 0 through 9, which is 45.
Be careful with the while loop. The condition in the while brackets has to evaluate to false at some point; otherwise the loop never ends and the code just hangs.
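One defensive pattern, shown here as a sketch rather than a rule, is to add an iteration cap to the loop condition so that a mistake in the main condition cannot hang the session. The names x, iter and max_iter and the cap of 1000 are made up for illustration:

```r
x <- 100          # value that is halved until it drops to 1 or below
iter <- 0         # number of iterations performed so far
max_iter <- 1000  # safety cap (arbitrary illustrative value)

# The loop ends when x <= 1, or when the cap is reached, whichever comes first
while (x > 1 && iter < max_iter) {
  x <- x / 2
  iter <- iter + 1
}
cat("stopped after", iter, "iterations with x =", x, "\n")
```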
next, break and repeat
> sum=0;
> for(i in 1:10)
+ {
+   if(i%%2 == 0)
+   {
+     sum = sum + i;
+     next;
+   };
+   cat(i)
+ }
13579
> sum
[1] 30
This is how it works: when the number is even (the modulus is 0), control enters the body of the if statement, increments sum by i, and then next skips the rest of that iteration. The following iteration starts with i odd. This time control does not enter the if body; it prints i and the iteration is complete.
To explain the break statement, let's continue with the loop above and say that we want to stop everything as soon as the sum becomes greater than 10.
> sum=0;
> for(i in 1:10)
+ {
+   if(i%%2 == 0)
+   {
+     sum = sum + i;
+     if (sum > 10)
+       break;
+     next;
+   };
+   cat(i)
+ }
135
> sum
[1] 12
In the above example, when the sum becomes greater than 10 (it reaches 12), we want to stop processing and get out of the for loop. The break statement does just that.
It now seems wasteful to specify an upper limit for i. We can use the repeat keyword instead:
> sum=0;
> i=0;
> repeat
+ {
+   i = i +1;
+   if(i%%2 == 0)
+   {
+     sum = sum + i;
+     if (sum > 10)
+       break;
+     next;
+   };
+   cat(i)
+ }
135
The repeat keyword keeps looping over a body until it is told to stop. Remember to include an exit condition (break); otherwise the loop runs forever.
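Because the exit test sits inside the body, repeat is a natural fit when the body must run at least once before the condition can be checked. A minimal sketch, with made-up names value and steps and an arbitrary limit of 100:

```r
value <- 1
steps <- 0

repeat {
  value <- value * 2       # the body always runs at least once
  steps <- steps + 1
  if (value > 100) break   # exit test at the end of the body
}
cat("value =", value, "after", steps, "steps\n")
```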
Switch
We end this tutorial with a look at the switch function. Let's look at an example:
> a=1
> switch (a,"a","b","c")
[1] "a"
> a=2;
> switch (a,"a","b","c")
[1] "b"
> a=3
> switch (a,"a","b","c")
[1] "c"
The function works like this: the first argument of switch is evaluated. If it evaluates to 1, the first argument after the expression is evaluated (“a”); if it evaluates to 2, the second one (“b”); and so on. The expression can also evaluate to a character string (but not a factor, which is coerced to integer).
> name="Robert"
# The expression evaluated in the first argument is matched to the names of the other arguments
> switch(name,Robert=20,James=22,Laura=21,24)
[1] 20
# James is 22
> name="James"
> switch(name,Robert=20,James=22,Laura=21,24)
[1] 22
# If the name is not present then the argument without a name is returned.
> name="Bob"
> switch(name,Robert=20,James=22,Laura=21,24)
[1] 24
# If there are two arguments without a name then R throws an error.
> switch(name,Robert=20,James=22,Laura=21,24,25)
Error: duplicate 'switch' defaults: '24' and '25'
> switch(name,Robert=20,James=22,Robert=25,Laura=21,24)
[1] 24
# If the name matches two arguments then the first one is returned.
> name="Robert"
> switch(name,Robert=20,James=22,Robert=25,Laura=21,24)
[1] 20
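A related feature of character-based switch that is not shown in the examples above is that a named argument can be left empty, in which case the next non-empty argument is used. The sketch below wraps this in a helper; file_kind and the file extensions are hypothetical names chosen only for illustration:

```r
# file_kind is a hypothetical helper used only for illustration
file_kind <- function(ext) {
  switch(ext,
         jpg = ,            # empty: falls through to the next non-empty value
         png = "image",
         txt = "text",
         "unknown")         # unnamed argument acts as the default
}

print(file_kind("jpg"))
print(file_kind("txt"))
print(file_kind("doc"))
```

This lets several names share one result without repeating it.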
This completes our tutorial on control structures. In the next tutorial, we will look at the “apply” group of functions.