DataCamp - Python Data Science Toolbox (Part 1)

DataCamp course notes on writing functions and error handling.

User-Defined Functions

Defining a function

Return
If we don’t want to print a value directly, but want to hand it back to the caller so that it can be assigned to a variable, use return.
Note: A function that prints a value but does not return one returns None. Assigning the result of calling such a function to a variable therefore leaves that variable holding None, which is of type NoneType.
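
The difference between printing and returning can be seen in a small sketch. The two function names below are made up for illustration and are not part of the course:

```python
def print_square(value):
    """Print the square of a value; nothing is returned."""
    print(value ** 2)

def return_square(value):
    """Return the square of a value."""
    return value ** 2

printed = print_square(4)    # prints 16
returned = return_square(4)  # prints nothing

print(printed)        # None
print(type(printed))  # <class 'NoneType'>
print(returned + 1)   # 17
```

Only the returned value can be used in later computations; printed holds None.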

Docstrings

  • describe what your function does
  • serve as documentation for your function
  • placed on the line immediately after the function header, between triple double quotes """
def square(value):
    """Return the square of a value."""  # Docstring
    new_value = value ** 2
    return new_value

num = square(4)
print(num)

Multiple function parameters and returns

To have multiple parameters, simply accept more than one parameter when defining the function. When calling it, the number of arguments passed must equal the number of parameters.

To return multiple values, we use tuples.
Tuples:

  • Like a list - can contain multiple values
  • Immutable - values can’t be modified, which means you cannot update an element of the tuple with x[0] = a
  • Constructed using parentheses ()
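
A short sketch of what immutability means in practice. The name updated is only illustrative:

```python
even_nums = (2, 4, 6)

try:
    even_nums[0] = 8  # tuples do not support item assignment
except TypeError as err:
    print('TypeError:', err)

# To "change" a tuple, build a new one instead
updated = (8,) + even_nums[1:]
print(updated)    # (8, 4, 6)
print(even_nums)  # (2, 4, 6), the original is untouched
```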

Unpack a tuple into several variables

even_nums = (2, 4, 6)
a, b, c = even_nums
print(a)  # prints 2

Access tuple elements as with lists, using zero-indexing

print(even_nums[1])  # prints 4
second_num = even_nums[1]

Example

def raise_both(value1, value2):  # Function header
    """Raise value1 to the power of value2 and vice versa."""
    new_value1 = value1 ** value2  # Function body
    new_value2 = value2 ** value1
    new_tuple = (new_value1, new_value2)
    return new_tuple
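
Calling a function that returns a tuple might look like the sketch below. It restates raise_both in a shorter form so that the block runs on its own:

```python
def raise_both(value1, value2):
    """Raise value1 to the power of value2 and vice versa."""
    return (value1 ** value2, value2 ** value1)

# Unpack the returned tuple into two names
a, b = raise_both(2, 3)
print(a, b)  # 8 9

# Or keep the tuple and index into it
result = raise_both(2, 3)
print(result[0])  # 8
```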

Case Study: Twitter language counts

# Define count_entries()
def count_entries(df, col_name):
    """Return a dictionary with counts of
    occurrences as value for each key."""

    # Initialize an empty dictionary: langs_count
    langs_count = {}

    # Extract column from DataFrame: col
    col = df[col_name]

    # Iterate over lang column in DataFrame
    for entry in col:

        # If the language is in langs_count, add 1
        if entry in langs_count.keys():
            langs_count[entry] += 1
        # Else add the language to langs_count, set the value to 1
        else:
            langs_count[entry] = 1

    # Return the langs_count dictionary
    return langs_count

# Call count_entries(): result
# (tweets_df is the course's tweets DataFrame, assumed to be loaded already)
result = count_entries(tweets_df, 'lang')

# Print the result
print(result)
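
The counting logic does not depend on pandas: any iterable of entries can be tallied the same way. The sketch below uses a made-up list of language codes (sample_langs is a hypothetical stand-in, not the course data), and count_values is an illustrative name. It swaps the if/else for dict.get and uses collections.Counter from the standard library as a cross-check:

```python
from collections import Counter

# Hypothetical stand-in for the 'lang' column; not the course data
sample_langs = ['en', 'en', 'et', 'und', 'en']

def count_values(entries):
    """Return a dictionary mapping each entry to its number of occurrences."""
    counts = {}
    for entry in entries:
        # dict.get returns 0 for keys that have not been seen yet
        counts[entry] = counts.get(entry, 0) + 1
    return counts

print(count_values(sample_langs))  # {'en': 3, 'et': 1, 'und': 1}
print(count_values(sample_langs) == dict(Counter(sample_langs)))  # True
```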

Scope and User-Defined Functions

Scope is the part of the program where an object or name is accessible, since not all objects are accessible everywhere in a script.

  • Global scope: defined in the main body of a script or Python program
  • Local scope: defined inside a function. Once the function has finished executing, any name inside the local scope ceases to exist
  • Built-in scope: names in the pre-defined built-ins module that Python provides, such as print() and sum()

The order in which Python searches the scopes when resolving a name is: local scope -> enclosing functions (if any) -> global scope -> built-in scope (the LEGB rule).
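
The lookup order can be traced with a small sketch. All names here are made up for illustration:

```python
x = 'global x'  # global scope

def outer():
    x = 'enclosing x'  # enclosing scope, as seen from the inner functions

    def inner():
        x = 'local x'  # local scope of inner()
        return x

    def inner_no_local():
        # No local x here, so Python finds the enclosing one
        return x

    return inner(), inner_no_local()

def no_local_or_enclosing():
    # No local or enclosing x, so Python finds the global one
    return x

print(outer())                  # ('local x', 'enclosing x')
print(no_local_or_enclosing())  # global x
print(len('abc'))               # 3, len is found in the built-in scope
```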

Examples
First we define a function:

def square(value):
    """Return the square of a value."""
    new_value = value ** 2
    return new_value

new_value  # raises NameError: name 'new_value' is not defined

We cannot access the variable new_value outside the function, since this variable is defined only within the local scope of the function, and is not defined globally.

Below, we define the name globally before defining and calling the function.

new_value = 10

def square(value):
    """Return the square of a value."""
    new_value = value ** 2
    return new_value

square(3)  # returns 9
new_value  # still 10

  • Any time we refer to the name in the global scope, we access the global name.
  • Any time we refer to the name inside the function, Python looks first in the local scope (that’s why square(3) returns 9 and the global new_value is still 10 afterwards). If Python cannot find the name in the local scope, it will then, and only then, look in the global scope.

Below, we access new_val, which is defined globally, within the function square. Note that the global value accessed is the value at the time the function is called, not the value when the function is defined.

Thus, if we reassign new_val and call the function square again, we can see that the new value of new_val is used.

new_val = 10

def square(value):
    """Return the square of a value"""
    new_value2 = new_val ** 2  # refers to the name `new_val` in the global scope (the argument is ignored)
    return new_value2

square(3)  # returns 100
new_val = 20
square(3)  # returns 400

What if we want to alter the value of a global name within a function call? We can use global to specify that.

new_val = 10

def square(value):
    """Return the square of a value"""
    global new_val  # the global name that we wish to access and alter
    new_val = new_val ** 2  # refers to the name `new_val` in the global scope
    return new_val

square(3)  # returns 100
new_val    # now 100

Another example

num = 5

def func2():
    global num
    double_num = num * 2
    num = 6
    print(double_num)

func2()  # prints 10
num      # now 6
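
It can help to see what happens when global is left out. In the sketch below (the names are hypothetical), assigning to count inside the function makes it a local name, so reading it first fails:

```python
count = 0  # global name

def increment_without_global():
    # Assigning to count makes it local to this function, so reading it
    # on the right-hand side fails: the local name has no value yet.
    count = count + 1
    return count

def increment_with_global():
    global count  # rebind the global name instead of creating a local one
    count = count + 1
    return count

try:
    increment_without_global()
except UnboundLocalError as err:
    print('UnboundLocalError:', err)

print(increment_with_global())  # 1
print(count)                    # 1
```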

Nested functions

Nested functions help when the same computation has to be applied several times inside a function.

def mod2plus5(x1, x2, x3):
    """Returns the remainder plus 5 of three values."""

    def inner(x):
        """Returns the remainder plus 5 of a value."""
        return x % 2 + 5

    return (inner(x1), inner(x2), inner(x3))

print(mod2plus5(1, 2, 3))  # (6, 5, 6)

A nested function can also be returned from the outer function:

def raise_val(n):
    """Return the inner function."""

    def inner(x):
        """Raise x to the power of n."""
        raised = x ** n
        return raised

    return inner

square = raise_val(2)
cube = raise_val(3)
print(square(2), cube(4))  # 4 64

Using nonlocal to access and alter names in an enclosing scope:

def outer():
    """Prints the value of n."""
    n = 1

    def inner():
        nonlocal n  # like global, but for a name in the enclosing scope
        n = 2
        print(n)

    inner()
    print(n)

outer()  # prints 2 twice: inner() changed n in the scope of outer()
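
A common use of nonlocal together with a returned inner function is keeping state between calls. The sketch below is illustrative only; make_counter and tick are made-up names:

```python
def make_counter():
    """Return a function that counts how many times it has been called."""
    calls = 0  # lives in the enclosing scope of tick()

    def tick():
        nonlocal calls  # rebind the enclosing name rather than create a local
        calls += 1
        return calls

    return tick

counter = make_counter()
print(counter())  # 1
print(counter())  # 2

other = make_counter()  # each call to make_counter() gets its own state
print(other())    # 1
```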

Default and Flexible Arguments

Add a default argument:

def power(number, pow=1):
    """Raise number to the power of pow."""
    new_value = number ** pow
    return new_value

power(9, 2)  # 81
power(9)     # 9, using the default pow=1

Flexible arguments

We use flexible arguments (*args) when we are not sure how many arguments will be passed to the function. Inside the function, args is a tuple of the values passed.

def add_all(*args):
    """Sum all values in *args together, irrespective of how many there are."""

    # Initialize sum
    sum_all = 0

    # Accumulate the sum
    for num in args:
        sum_all += num

    return sum_all

add_all(1)  # 1
add_all(1, 2)  # 3
add_all(5, 10, 15, 20)  # 50
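
When the values already sit in a list, the * operator at the call site spreads them into separate positional arguments. A minimal sketch, restating add_all in a shorter form so the block runs on its own:

```python
def add_all(*args):
    """Sum all positional arguments."""
    print(type(args).__name__, args)  # args arrives as a tuple
    total = 0
    for num in args:
        total += num
    return total

nums = [5, 10, 15, 20]
print(add_all(*nums))  # same call as add_all(5, 10, 15, 20), gives 50
print(add_all())       # no arguments: args is an empty tuple, gives 0
```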

We can also pass an arbitrary number of keyword arguments, that is, arguments preceded by identifiers, with **kwargs. Inside the function, kwargs is a dictionary.

# Define report_status
def report_status(**kwargs):
    """Print out the status of a movie character."""

    print("\nBEGIN: REPORT\n")

    # Iterate over the key-value pairs of kwargs
    for key, value in kwargs.items():
        # Print out the keys and values, separated by a colon ':'
        print(key + ": " + value)

    print("\nEND REPORT")

# First call to report_status()
report_status(name='luke',
              affiliation='jedi',
              status='missing')

# Second call to report_status()
report_status(name="anakin",
              affiliation="sith lord",
              status="deceased")
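
Because kwargs is an ordinary dictionary, its values need not be strings, and an existing dictionary can be unpacked into keyword arguments with **. The sketch below uses a hypothetical describe function and formats with an f-string, since key + ": " + value would raise a TypeError for a non-string value:

```python
def describe(**kwargs):
    """Return one 'key: value' line per keyword argument."""
    lines = []
    for key, value in kwargs.items():
        # An f-string formats any value, including ints
        lines.append(f"{key}: {value}")
    return lines

print(describe(name='luke', episodes=3))  # ['name: luke', 'episodes: 3']

# A dictionary can be unpacked into keyword arguments with **
character = {'name': 'anakin', 'status': 'deceased'}
print(describe(**character))  # ['name: anakin', 'status: deceased']
```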

Case study

Generalize the previous function to count the occurrences in any number of columns in the DataFrame.

# Define count_entries()
def count_entries(df, *args):
    """Return a dictionary with counts of
    occurrences as value for each key."""

    # Initialize an empty dictionary: cols_count
    cols_count = {}

    # Iterate over column names in args
    for col_name in args:

        # Extract column from DataFrame: col
        col = df[col_name]

        # Iterate over the column in DataFrame
        for entry in col:

            # If entry is in cols_count, add 1
            if entry in cols_count.keys():
                cols_count[entry] += 1

            # Else add the entry to cols_count, set the value to 1
            else:
                cols_count[entry] = 1

    # Return the cols_count dictionary
    return cols_count

# Call count_entries(): result1
result1 = count_entries(tweets_df, 'lang')

# Call count_entries(): result2
result2 = count_entries(tweets_df, 'lang', 'source')

# Print result1 and result2
print(result1)
print(result2)

Lambda Functions & Error-Handling

Lambda Functions

Lambda allows you to write a function in a quick and potentially dirty way.

raise_to_power = lambda x, y: x ** y
raise_to_power(2, 3)  # returns 8

map()

  • takes two arguments: map(func, seq)
  • applies the function to all elements in the sequence
  • In this case, the function does not even need a name, and is thus referred to as an anonymous function.
    nums = [48, 6, 9, 21, 1]
    square_all = map(lambda num: num ** 2, nums)
    print(square_all)  # only shows that this is a map object, not its contents
    print(list(square_all))  # converting to a list makes the results printable: [2304, 36, 81, 441, 1]
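
One consequence of map() returning a map object rather than a list is that the object can only be consumed once. A short sketch, with first_pass and second_pass as illustrative names:

```python
nums = [48, 6, 9, 21, 1]
square_all = map(lambda num: num ** 2, nums)

first_pass = list(square_all)   # consumes the map object
second_pass = list(square_all)  # nothing left to yield

print(first_pass)   # [2304, 36, 81, 441, 1]
print(second_pass)  # []

# An equivalent list comprehension builds the list directly
print([num ** 2 for num in nums])  # [2304, 36, 81, 441, 1]
```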

filter()

  • filters out elements from a list that don’t satisfy certain criteria
    # Create a list of strings: fellowship
    fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

    # Use filter() to apply a lambda function over fellowship: result
    result = filter(lambda member: len(member) > 6, fellowship)

    # Convert result to a list: result_list
    result_list = list(result)

    # Print result_list
    print(result_list)  # ['samwise', 'aragorn', 'legolas', 'boromir']
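
filter() accepts any callable that returns a truthy or falsy value, not only a lambda, and the same selection can be written as a list comprehension. A sketch, where has_long_name is a made-up helper:

```python
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

def has_long_name(member):
    """Return True for names longer than six characters."""
    return len(member) > 6

# Pass a named function instead of a lambda
long_names = list(filter(has_long_name, fellowship))
print(long_names)  # ['samwise', 'aragorn', 'legolas', 'boromir']

# The same selection as a list comprehension
also_long = [member for member in fellowship if has_long_name(member)]
print(also_long == long_names)  # True
```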

reduce()

  • useful for performing some computation on a list and returning a single value as a result
  • needs to be imported from the functools module before use
    # Import reduce from functools
    from functools import reduce

    # Create a list of strings: stark
    stark = ['robb', 'sansa', 'arya', 'eddard', 'jon']

    # Use reduce() to apply a lambda function over stark: result
    result = reduce(lambda item1, item2: item1 + item2, stark)

    # Print the result
    print(result)  # robbsansaaryaeddardjon
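
reduce() combines the elements pairwise from left to right. The sketch below shows this with a product, and also the optional third argument (an initial value), which matters for empty sequences:

```python
from functools import reduce

nums = [1, 2, 3, 4]

# reduce folds left to right: ((1 * 2) * 3) * 4
product = reduce(lambda acc, num: acc * num, nums)
print(product)  # 24

# With an initial value, reduce also works on an empty sequence
print(reduce(lambda acc, num: acc * num, [], 1))  # 1

# Without one, an empty sequence raises TypeError
try:
    reduce(lambda acc, num: acc * num, [])
except TypeError as err:
    print('TypeError:', err)
```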

Error Handling

We should endeavor to produce useful error messages for the functions that we write. Exceptions raised during execution can be caught with a try-except clause:

  • Runs the code following try
  • If there’s an exception, run the code following except
def sqrt(x):
    """Returns the square root of a number."""
    try:
        return x ** 0.5
    except:  # a bare except catches every kind of exception
        print('x must be an int or float')

sqrt(4)     # 2.0
sqrt(10.0)  # 3.162277...
sqrt('hi')  # prints 'x must be an int or float'

If we only wish to catch type errors, and let other errors pass through: (more errors can be specified - refer to online documentation)

def sqrt(x):
    """Returns the square root of a number."""
    try:
        return x ** 0.5
    except TypeError:
        print('x must be an int or float')
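
Several except clauses can follow one try, each handling a different exception type. A minimal sketch with a hypothetical to_ratio function (not from the course):

```python
def to_ratio(numerator, denominator):
    """Return numerator / denominator, or None if the inputs are unusable."""
    try:
        return numerator / denominator
    except TypeError:
        print('both arguments must be numbers')
    except ZeroDivisionError:
        print('denominator must not be zero')
    return None

print(to_ratio(1, 4))    # 0.25
print(to_ratio(1, 'a'))  # prints the TypeError message, then None
print(to_ratio(1, 0))    # prints the ZeroDivisionError message, then None
```

Any other exception type would still propagate to the caller, which is usually what we want for errors we did not anticipate.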

If we don’t want our function to work when some specific criteria are met, we can manually raise an error with an if clause (e.g. the input must be positive):

def sqrt(x):
    """Returns the square root of a number."""
    if x < 0:
        raise ValueError('x must be non-negative')
    try:
        return x ** 0.5
    except TypeError:
        print('x must be an int or float')

sqrt(-2)  # raises ValueError: x must be non-negative
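
The code that calls such a function can catch the raised error and read its message. The sketch below uses a simplified sqrt (without the TypeError handling) so that the block stays short:

```python
def sqrt(x):
    """Return the square root of a non-negative number."""
    if x < 0:
        raise ValueError('x must be non-negative')
    return x ** 0.5

for candidate in (9, -2):
    try:
        print(sqrt(candidate))
    except ValueError as err:
        # err carries the message passed to ValueError(...)
        print('could not take the square root:', err)
```

The first iteration prints 3.0; the second prints the message built from the ValueError.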

Case Study

  1. Select all the retweets

    # Select retweets from the Twitter DataFrame: result
    result = filter(lambda x: x[0:2] == 'RT', tweets_df['text'])

    # Create list from filter object result: res_list
    res_list = list(result)

    # Print all retweets in res_list
    for tweet in res_list:
        print(tweet)
  2. Add a try-except block to the function defined in the previous case study

    # Define count_entries()
    def count_entries(df, col_name='lang'):
        """Return a dictionary with counts of
        occurrences as value for each key."""
        cols_count = {}

        # Add try block
        try:
            col = df[col_name]
            for entry in col:
                if entry in cols_count.keys():
                    cols_count[entry] += 1
                else:
                    cols_count[entry] = 1
            return cols_count

        # Add except block
        except:
            print('The DataFrame does not have a ' + col_name + ' column.')

    # Call count_entries(): result1
    result1 = count_entries(tweets_df, 'lang')
    print(result1)

    # Call count_entries(): result2
    result2 = count_entries(tweets_df, 'lang1')  # prints the message from the except block; result2 is None
  3. Raise a ValueError with an if clause.

    # Define count_entries()
    def count_entries(df, col_name='lang'):
        """Return a dictionary with counts of
        occurrences as value for each key."""

        # Raise a ValueError if col_name is NOT in DataFrame
        if col_name not in df.columns:
            raise ValueError('The DataFrame does not have a ' + col_name + ' column.')

        cols_count = {}
        col = df[col_name]
        for entry in col:
            if entry in cols_count.keys():
                cols_count[entry] += 1
            else:
                cols_count[entry] = 1

        return cols_count

    result1 = count_entries(tweets_df, 'lang')
    print(result1)
    count_entries(tweets_df, 'lang1')  # raises ValueError with the specified message