DataCamp - Intermediate R

DataCamp course notes on conditionals, loops, functions, the apply family, and utilities

Loops

An if statement inside a for loop can use break to stop the loop entirely, or next to skip the rest of the current iteration and move on to the next element.

# The linkedin vector has already been defined for you
linkedin = c(16, 9, 13, 5, 2, 17, 14)

# Extend the for loop
for (li in linkedin) {
  if (li > 10) {
    print("You're popular!")
  } else {
    print("Be more visible!")
  }

  # Add if statement with break
  if (li > 16) {
    print("This is ridiculous, I'm outta here!")
    break
  }
  # Add if statement with next
  if (li < 5) {
    print("This is too embarrassing!")
    next
  }
  print(li)
}
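As a smaller illustration of the difference between the two keywords, here is a minimal sketch. The vector `values` and the thresholds are made up for this example and are not part of the course exercise. `next` skips the rest of the current iteration, while `break` ends the loop altogether.

```r
# Hypothetical data, chosen only to trigger both branches
values <- c(3, 8, 1, 12, 6)
kept <- c()

for (v in values) {
  if (v < 2) {
    next   # skip this element and continue with the following one
  }
  if (v > 10) {
    break  # leave the loop; the remaining elements are never visited
  }
  kept <- c(kept, v)
}

print(kept)  # 3 8
```

Here 1 is skipped by `next`, and the loop stops at 12, so 6 is never reached.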

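strsplit() splits a string into pieces and returns a list; with split = "" it splits into single characters, and [[1]] extracts the character vector for the first (here, only) string.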

# Pre-defined variables
rquote = "r's internals are irrefutably intriguing"
chars = strsplit(rquote, split = "")[[1]]

# Initialize rcount
rcount = 0

# Finish the for loop
for (char in chars) {
  if (char == "r") {
    rcount = rcount + 1
  } else if (char == "u") {
    break
  }
}

# Print out rcount
print(rcount)
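To see why the `[[1]]` is needed, it may help to look at what strsplit() returns for more than one string. The sketch below also shows one possible vectorized way to get the same count as the loop, assuming the goal is to count the "r" characters before the first "u". The names `parts`, `first_u`, and `rcount_vec` are introduced here for illustration only.

```r
# strsplit() returns a list with one character vector per input string
parts <- strsplit(c("abc", "de"), split = "")
class(parts)    # "list"
lengths(parts)  # 3 2
parts[[1]]      # "a" "b" "c"

# Vectorized alternative to the loop above
rquote <- "r's internals are irrefutably intriguing"
chars <- strsplit(rquote, split = "")[[1]]
first_u <- match("u", chars)  # position of the first "u"
rcount_vec <- sum(chars[seq_len(first_u - 1)] == "r")
print(rcount_vec)  # 5
```

Note that this sketch assumes the string contains at least one "u"; otherwise `match()` returns NA and the subsetting would need extra handling.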

Functions

args() is useful for inspecting the arguments that a function accepts.

args(mean)

return() allows us to exit the function at any point.

math_magic = function(a, b = 1) {
  if (b == 0) {
    return(0) # return 0 and exit the function immediately
  }
  a + b + a / b # not reached if b is 0
}

math_magic(4, 0) # returns 0

A user-defined function can be called inside another function.
If an argument is given a default value, it becomes optional when the function is called.

# The linkedin and facebook vectors have already been created for you
linkedin = c(16, 9, 13, 5, 2, 17, 14)
facebook = c(17, 7, 5, 16, 8, 13, 14)

# interpret() can be used inside interpret_all()
interpret <- function(num_views) {
  if (num_views > 15) {
    print("You're popular!")
    return(num_views)
  } else {
    print("Try to be more visible!")
    return(0)
  }
}

# Define the interpret_all() function
# views: vector with data to interpret
# return_sum: whether to return the total number of views on popular days (optional, defaults to TRUE)
interpret_all = function(views, return_sum = TRUE) {
  count = 0

  for (v in views) {
    count = count + interpret(v)
  }

  if (return_sum) {
    return(count)
  } else {
    return(NULL)
  }
}

# Call the interpret_all() function on both linkedin and facebook
interpret_all(linkedin)
interpret_all(facebook)
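To make the role of a default value concrete, here is a minimal sketch with a hypothetical function `power()` that is not part of the course. The argument `exponent` can be omitted, overridden by name, or overridden by position.

```r
# Hypothetical helper: exponent is optional because it has a default value
power <- function(base, exponent = 2) {
  base^exponent
}

power(3)                # 9, exponent falls back to the default of 2
power(3, exponent = 3)  # 27, default overridden by name
power(3, 3)             # 27, default overridden by position
```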

Reminder: learn data.table on DataCamp.

The Apply Family

Summary

  • lapply(): apply a function over a list or vector. The output is always a list.
  • sapply(): apply a function over a list or vector, then try to simplify the resulting list to a vector or array.
  • vapply(): apply a function over a list or vector, with an explicitly specified output format.

lapply

lapply takes a vector or list X, and applies the function FUN to each of its members. If FUN requires additional arguments, you pass them after you’ve specified X and FUN (...). The output of lapply() is a list, the same length as X, where each element is the result of applying FUN on the corresponding element of X.

lapply() always returns a list. We can use unlist() to quickly turn that list into a vector. When every result has the same type and length, as in the example here, it is easier to use sapply(), which simplifies the list to a vector for us.

cities = c("New York", "Paris", "London", "Tokyo", "Rio de Janeiro", "Cape Town")
unlist(lapply(cities, nchar))
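The relationship between the three approaches can be checked directly. The following is a minimal sketch; the variable names `as_list`, `as_vector`, and `via_sapply` are introduced here for illustration.

```r
cities <- c("New York", "Paris", "London", "Tokyo", "Rio de Janeiro", "Cape Town")

as_list <- lapply(cities, nchar)   # list with one integer per city
as_vector <- unlist(as_list)       # flattened to an integer vector
via_sapply <- sapply(cities, nchar, USE.NAMES = FALSE)

class(as_list)                     # "list"
print(as_vector)                   # 8 5 6 5 14 9
identical(as_vector, via_sapply)   # TRUE
```

Without `USE.NAMES = FALSE`, sapply() would attach the city names to the result, so the two vectors would no longer be identical.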

Anonymous Functions

When you create a function, you can use the assignment operator to give the function a name. It’s perfectly possible, however, to not give the function a name. This is called an anonymous function:

# Named function
triple <- function(x) { 3 * x }

# Anonymous function with same implementation
function(x) { 3 * x }

# Use anonymous function inside lapply()
lapply(list(1,2,3), function(x) { 3 * x })

lapply() can also pass additional arguments to the function; specify them after FUN:

multiply <- function(x, factor) {
  x * factor
}
lapply(list(1, 2, 3), multiply, factor = 3)

sapply

With sapply(), the output is simplified to a vector or matrix whenever possible, so it is usually no longer a list.
The following example returns a one-dimensional vector. The USE.NAMES argument controls whether the original elements of cities are used as the names of the result; it defaults to TRUE.

sapply(cities, nchar, USE.NAMES = FALSE)

The following example returns a matrix, because the function returns a vector of length 2 for every city:

first_and_last = function(name) {
  name = gsub(" ", "", name) # remove all spaces from the string
  letters = strsplit(name, split = "")[[1]]
  c(first = min(letters), last = max(letters))
}

first_and_last("New York")

sapply(cities, first_and_last)
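The shape of the simplified result can be inspected with a smaller example. This is a sketch using a hypothetical function `value_and_square()`, not part of the course: each call returns a named vector of length 2, so sapply() binds the results as the columns of a matrix.

```r
# Hypothetical function returning a named vector of length 2
value_and_square <- function(x) {
  c(value = x, square = x^2)
}

m <- sapply(1:3, value_and_square)
is.matrix(m)    # TRUE
dim(m)          # 2 3: one row per returned element, one column per input
m["square", ]   # 1 4 9
```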

vapply

Sometimes the simplification does not work out because the elements of the output do not all have the same length. In this case, sapply() silently returns a list. vapply() is safer in this sense: it requires you to specify the data type and length of each result, and throws an error if any result does not match them.

cities = c("New York", "Paris", "London", "Tokyo", "Rio de Janeiro", "Cape Town")
vapply(cities, nchar, numeric(1))

unique_letters = function(name) {
  name = gsub(" ", "", name)
  letters = strsplit(name, split = "")[[1]]
  unique(letters)
}
sapply(cities, unique_letters) # This returns a list of vectors with different lengths.
vapply(cities, unique_letters, character(4), USE.NAMES = TRUE) # This leads to an error, because not every result has length 4. USE.NAMES is an optional argument that defaults to TRUE.
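The failure can be observed without stopping a script by wrapping the call in tryCatch(). The sketch below uses a shortened, two-city vector; the variable `chars` replaces `letters` to avoid masking R's built-in `letters`, and the anonymous function that counts unique letters is an illustration, not part of the course.

```r
unique_letters <- function(name) {
  name <- gsub(" ", "", name)
  chars <- strsplit(name, split = "")[[1]]
  unique(chars)
}

cities <- c("Paris", "Tokyo")  # 5 and 4 unique letters respectively

# vapply() refuses results that do not match character(4)
result <- tryCatch(
  vapply(cities, unique_letters, character(4)),
  error = function(e) "vapply failed"
)
print(result)  # "vapply failed"

# Returning a fixed-length summary instead satisfies vapply()
counts <- vapply(cities, function(n) length(unique_letters(n)), integer(1))
print(counts)  # Paris 5, Tokyo 4
```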

vapply() can also take an anonymous function and additional arguments:

# temp is not defined in these notes; it is assumed to be a list of numeric vectors
vapply(temp,
       function(x, y) { mean(x) > y },
       y = 5,
       logical(1))
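Since `temp` is not defined in these notes, here is a self-contained sketch with made-up data. The list `temp` below is hypothetical; the call also names `FUN.VALUE` explicitly, which is optional but arguably easier to read than relying on argument position.

```r
# Hypothetical data: three numeric vectors
temp <- list(c(3, 7, 9), c(2, 4, 5), c(8, 6, 10))

above <- vapply(temp,
                function(x, y) { mean(x) > y },
                FUN.VALUE = logical(1),
                y = 5)
print(above)  # TRUE FALSE TRUE
```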

Utilities

List of useful functions used in this course

  1. unlist() turns a list into a vector.
  2. tolower() converts character strings to lowercase; use lapply() to apply it to every element of a list.
  3. strsplit(x, split = "") splits each string in x at the pattern given by split and returns a list with one character vector per string; with split = "", every character is separated.
    pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
    split <- strsplit(pioneers, split = ":")
    split_low <- lapply(split, tolower)

Here, split_low is a list of 4 character vectors, each containing 2 elements.

  4. To extract the first element of every vector in a list, apply a function that returns x[1] to each element, for example with lapply().
  5. identical() checks whether two objects are exactly identical.
  6. runif(10) generates 10 random deviates from the uniform distribution on [0, 1].
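The last three utilities can be tried together in a short sketch. The helper `select_first()` and the variable names `first_names` and `u` are introduced here for illustration.

```r
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split_low <- lapply(strsplit(pioneers, split = ":"), tolower)

# Hypothetical helper that returns the first element of a vector
select_first <- function(x) {
  x[1]
}
first_names <- unlist(lapply(split_low, select_first))
print(first_names)  # "gauss" "bayes" "pascal" "pearson"

# identical() is strict: type matters as well as value
identical(first_names, c("gauss", "bayes", "pascal", "pearson"))  # TRUE
identical(1L, 1)                                                  # FALSE: integer versus double

# runif() draws from the uniform distribution, by default on [0, 1]
u <- runif(10)
length(u)             # 10
all(u >= 0 & u <= 1)  # TRUE
```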