Datacamp course notes on conditions, loops, functions, apply family, utilities
Loops
An if statement inside a for loop can use break (stop the loop) and next (skip to the next element).
# The linkedin vector has already been defined for you
linkedin = c(16, 9, 13, 5, 2, 17, 14)
# Extend the for loop
for (li in linkedin) {
  if (li > 10) {
    print("You're popular!")
  } else {
    print("Be more visible!")
  }
  # Add if statement with break
  if (li > 16) {
    print("This is ridiculous, I'm outta here!")
    break
  }
  # Add if statement with next
  if (li < 5) {
    print("This is too embarrassing!")
    next
  }
  print(li)
}
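As a minimal sketch of the difference between the two keywords, the loop below collects only the values that pass both checks. The views vector and the thresholds are made up for illustration and are not from the course.

```r
# Hypothetical data, chosen only to trigger both keywords
views <- c(3, 12, 7, 20, 9)

kept <- numeric(0)
for (v in views) {
  if (v < 5) next    # skip this element and continue with the next one
  if (v > 15) break  # leave the loop entirely; 9 is never visited
  kept <- c(kept, v)
}
print(kept)  # kept is c(12, 7)
```

next only drops the current element, while break also discards everything that comes after it.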
strsplit() splits a string and returns a list; with split = "" it splits the string into single characters, and [[1]] extracts the resulting character vector.
# Pre-defined variables
rquote = "r's internals are irrefutably intriguing"
chars = strsplit(rquote, split = "")[[1]]
# Initialize rcount
rcount = 0
# Finish the for loop
for (char in chars) {
  if (char == "r") {
    rcount = rcount + 1
  } else if (char == "u") {
    break
  }
}
# Print out rcount
print(rcount)
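The reason for the [[1]] may be easier to see with more than one input string. A small sketch, using two made-up strings:

```r
# Hypothetical input: two strings instead of one
parts <- strsplit(c("ab", "cde"), split = "")

class(parts)   # "list": one element per input string
length(parts)  # 2
parts[[2]]     # "c" "d" "e": the characters of the second string
```

strsplit() returns one character vector per input string, so even a single string comes back wrapped in a list of length one.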
Functions
args() is useful for inspecting the arguments of a function.
args(mean)
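args() works on self-defined functions as well. In the sketch below, scale_by is a hypothetical function written only for this illustration; formals() is a base R function that returns the same arguments as a named list.

```r
# Hypothetical function with one required and one optional argument
scale_by <- function(x, factor = 2) {
  x * factor
}

args(scale_by)                          # shows the arguments and their defaults
arg_names <- names(formals(scale_by))   # "x" "factor"
print(arg_names)
```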
return() allows us to exit the function at any time.
math_magic = function(a, b = 1) {
  if (b == 0) {
    return(0)  # return 0 and exit the function
  }
  a + b + a / b  # not reached if b is 0
}
math_magic(4, 0)  # returns 0
We can use a self-defined function inside the definition of another function.
If we give an argument a default value, it becomes optional in later calls.
# The linkedin and facebook vectors have already been created for you
linkedin = c(16, 9, 13, 5, 2, 17, 14)
facebook = c(17, 7, 5, 16, 8, 13, 14)
# interpret() can be used inside interpret_all()
interpret <- function(num_views) {
  if (num_views > 15) {
    print("You're popular!")
    return(num_views)
  } else {
    print("Try to be more visible!")
    return(0)
  }
}
# Define the interpret_all() function
# views: vector with data to interpret
# return_sum: return the total number of views on popular days (optional argument, defaults to TRUE)
interpret_all = function(views, return_sum = TRUE) {
  count = 0
  for (v in views) {
    count = count + interpret(v)
  }
  if (return_sum) {
    return(count)
  } else {
    return(NULL)
  }
}
# Call the interpret_all() function on both linkedin and facebook
interpret_all(linkedin)
interpret_all(facebook)
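To show the default value in action, here is a small sketch with a hypothetical helper, sum_views(), whose threshold argument is optional. The function and its threshold parameter are not part of the course code; only the linkedin values are.

```r
# Hypothetical helper: threshold has a default, so callers may omit it
sum_views <- function(views, threshold = 15) {
  sum(views[views > threshold])
}

linkedin <- c(16, 9, 13, 5, 2, 17, 14)

default_total <- sum_views(linkedin)                  # uses threshold = 15: 16 + 17 = 33
custom_total  <- sum_views(linkedin, threshold = 10)  # 16 + 13 + 17 + 14 = 60
print(c(default_total, custom_total))
```

Naming the argument in the call (threshold = 10) keeps it readable when a function has several optional arguments.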
Remember to learn data.table on Datacamp.
The Apply Family
Summary
- lapply(): apply a function over a list or vector. The output is a list.
- sapply(): apply a function over a list or vector. Try to simplify the list to an array.
- vapply(): apply a function over a list or vector. Explicitly specify the output format.
lapply
lapply takes a vector or list X, and applies the function FUN to each of its members. If FUN requires additional arguments, you pass them after you’ve specified X and FUN (...). The output of lapply() is a list, the same length as X, where each element is the result of applying FUN on the corresponding element of X.
lapply will always return a list. We can use unlist() to quickly turn a list into a vector. In this case, it’s easier to just use sapply(), which simplifies the result to a vector when each result has length one.
cities = c("New York", "Paris", "London", "Tokyo", "Rio de Janeiro", "Cape Town")
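A minimal sketch of the two routes, using three of the city names above with nchar() as the applied function (the choice of nchar() here is an assumption for illustration):

```r
cities <- c("New York", "Paris", "London")

lengths_list <- lapply(cities, nchar)   # a list with 3 elements
lengths_vec  <- unlist(lengths_list)    # c(8, 5, 6)

# USE.NAMES = FALSE drops the element names that sapply() would otherwise add
lengths_sapply <- sapply(cities, nchar, USE.NAMES = FALSE)

identical(lengths_vec, lengths_sapply)  # TRUE
```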
Anonymous Functions
When you create a function, you can use the assignment operator to give the function a name. It’s perfectly possible, however, to not give the function a name. This is called an anonymous function:
# Named function
triple <- function(x) { 3 * x }
# Anonymous function with same implementation
function(x) { 3 * x }
# Use anonymous function inside lapply()
lapply(list(1,2,3), function(x) { 3 * x })
lapply can also pass additional named arguments on to the function:
multiply <- function(x, factor) {
  x * factor
}
lapply(list(1,2,3), multiply, factor = 3)
sapply
With sapply, the output is simplified to a vector or matrix when possible, so it is no longer a list.
The following example returns a vector (1 dimension). The USE.NAMES argument determines whether the original elements of cities are used as the names of the elements in the new vector.
sapply(cities, nchar, USE.NAMES = FALSE)
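For comparison, a short sketch of both settings on a shortened, made-up version of the cities vector:

```r
cities <- c("Paris", "Tokyo")

named   <- sapply(cities, nchar)                     # USE.NAMES defaults to TRUE
unnamed <- sapply(cities, nchar, USE.NAMES = FALSE)

names(named)    # "Paris" "Tokyo"
names(unnamed)  # NULL
```

The names are only added automatically when the input is a character vector without names of its own.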
The following example returns a matrix:
first_and_last = function(name) {
  name = gsub(" ", "", name)  # substitute the " " in the string with ""
  letters = strsplit(name, split = "")[[1]]
  c(first = min(letters), last = max(letters))
}
first_and_last("New York")
sapply(cities, first_and_last)
vapply
Note that the simplification sometimes fails because the elements of the output do not all have the same length. In this case, sapply will still return a list. vapply is better in this sense, since it requires you to define the data type and length of each result, and generates an error if the output does not satisfy the predefined type and length.
cities = c("New York", "Paris", "London", "Tokyo", "Rio de Janeiro", "Cape Town")
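A small sketch of that difference. The words vector and the unique_chars() helper are hypothetical and chosen so that the results have different lengths:

```r
# Hypothetical input: the two words have 2 and 3 distinct characters
words <- c("aab", "abc")

unique_chars <- function(w) {
  unique(strsplit(w, split = "")[[1]])
}

# sapply() cannot simplify results of unequal length, so it returns a list
res <- sapply(words, unique_chars)
is.list(res)  # TRUE

# vapply() expects every result to be character(2) and stops at the second word
checked <- tryCatch(
  vapply(words, unique_chars, FUN.VALUE = character(2)),
  error = function(e) "vapply signalled an error"
)
print(checked)
```

The error from vapply() surfaces the problem at the call site instead of handing a list to code that expects a vector or matrix.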
vapply can also take an anonymous function and additional arguments:
# temp is not defined in these notes; it is assumed to be a list of numeric vectors
vapply(temp,
       function(x, y) { mean(x) > y },
       y = 5,
       FUN.VALUE = logical(1))
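Since temp is not defined in these notes, the following sketch supplies a made-up list of numeric vectors so that the call can be run as is:

```r
# Hypothetical stand-in for temp: a list of numeric vectors
temp <- list(c(3, 7, 9), c(2, 4, 1), c(8, 6, 10))

# y = 5 is passed through to the anonymous function as an additional argument
above_five <- vapply(temp,
                     function(x, y) { mean(x) > y },
                     FUN.VALUE = logical(1),
                     y = 5)
print(above_five)  # TRUE FALSE TRUE
```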
Utilities
List of useful functions used in this course
- unlist(): turn a list into a vector.
- tolower(): convert characters to lowercase (vectorized over a character vector; use lapply() to apply it to each element of a list).
- strsplit(x, split = ""): split each string at the pattern given by split (with split = "", into single characters) and place the resulting pieces into one vector per string, all returned in a list.
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split <- strsplit(pioneers, split = ":")
split_low <- lapply(split, tolower)
Here, split_low contains 4 vectors, each of which contains 2 elements.
- To get the first element of every vector in a list, apply function(x) x[1] to the list with lapply() or sapply().
- identical(): check whether two objects are exactly identical.
- runif(10): generate 10 random deviates from the uniform distribution on [0, 1].
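The utilities above can be combined in a short sketch. It reuses two of the pioneers strings; the set.seed() call and the value 42 are assumptions added only to make the random draw reproducible.

```r
pioneers <- c("GAUSS:1777", "BAYES:1702")
split_low <- lapply(strsplit(pioneers, split = ":"), tolower)

# First element of every vector in the list
first_names <- sapply(split_low, function(x) x[1])  # "gauss" "bayes"

# Indexing the list itself with [1] returns a list of length one,
# not the first element of each vector
first_entry <- split_low[1]

identical(c(1, 2), c(1, 2))  # TRUE
identical(1L, 1)             # FALSE: integer versus double

set.seed(42)       # assumed seed, only for reproducibility
draws <- runif(10) # 10 uniform deviates between 0 and 1
```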