Mastering purrr: From Basic Maps to Functional Magic in R
Welcome back to the world of purrr
! Last time (about a year ago), we spun a metaphorical yarn about the wonders of purrr
in R. Today, we’re rolling up our sleeves and diving into a hands-on tutorial. We’re going to explore how purrr
makes working with lists and vectors a breeze, transforming and manipulating them like a data wizard.
With purrr
, you can apply functions to each element of a list or vector, manipulate them, check conditions, and so much more. It’s all about making your data dance to your commands with elegance and efficiency. Ready to unleash some functional magic?
Are map
Functions Like apply
Functions?
You might be wondering, “Aren’t map
functions just fancy versions of apply
functions?” It’s a fair question! Both map
and apply
functions help you apply a function to elements in a data structure, but purrr
takes it to a whole new level.
Here’s why purrr
and its map
functions are worth your attention:
- Consistency:
purrr
functions have a consistent naming scheme, making them easier to learn and remember. - Type Safety:
map
functions inpurrr
return outputs of consistent types, reducing unexpected errors. - Integration: Seamlessly integrate with other tidyverse packages, making your data wrangling pipeline smoother.
Let’s see a quick comparison:
library(tidyverse)
# Using lapply (base R)
<- list(1, 2, 3, 4, 5)
numbers <- lapply(numbers, function(x) x^2)
squared_lapply
# Using map (purrr)
<- map(numbers, ~ .x^2)
squared_map
print(squared_lapply)
1]]
[[1] 1
[
2]]
[[1] 4
[
3]]
[[1] 9
[
4]]
[[1] 16
[
5]]
[[1] 25
[
print(squared_map)
1]]
[[1] 1
[
2]]
[[1] 4
[
3]]
[[1] 9
[
4]]
[[1] 16
[
5]]
[[1] 25 [
Both do the same thing, but purrr
’s map
function is more readable and concise, especially when paired with the tidyverse syntax.
Here’s another example with a built-in dataset:
# Using lapply with a built-in dataset
<- split(iris, iris$Species)
iris_split <- lapply(iris_split, function(df) mean(df$Sepal.Length))
mean_sepal_length_lapply
# Using map with a built-in dataset
<- map(iris_split, ~ mean(.x$Sepal.Length))
mean_sepal_length_map
print(mean_sepal_length_lapply)
$setosa
1] 5.006
[
$versicolor
1] 5.936
[
$virginica
1] 6.588
[
print(mean_sepal_length_map)
$setosa
1] 5.006
[
$versicolor
1] 5.936
[
$virginica
1] 6.588 [
Again, the purrr
version is cleaner and easier to understand at a glance.
Convinced? Let’s move on to explore simple maps and their variants to see more of purrr
’s magic. Ready?
Simple Maps and Their Variants
Now that we know why purrr
’s map
functions are so cool, let’s dive into some practical examples. The map
function family is like a Swiss Army knife for data transformation. It comes in different flavors depending on the type of output you want: logical, integer, character, or double.
Let’s start with the basic map
function:
library(tidyverse)
# Basic map example
<- list(1, 2, 3, 4, 5)
numbers <- map(numbers, ~ .x^2)
squared_numbers squared_numbers
Easy, right? Yes, but we have one twist here. Result is returned as list, and we don’t always need list. So now, let’s look at the type-specific variants. These functions ensure that the output is of a specific type, which can help avoid unexpected surprises in your data processing pipeline.
- Logical (
map_lgl
):
# Check if each number is even
<- map_lgl(numbers, ~ .x %% 2 == 0)
is_even
is_even
1] FALSE TRUE FALSE TRUE FALSE
[
# it is not list anymore, it is logical vector
- Integer (
map_int
):
# Double each number and return as integers
<- map_int(numbers, ~ .x * 2)
doubled_integers
doubled_integers
1] 2 4 6 8 10 [
- Character (
map_chr
):
# Convert each number to a string
<- map_chr(numbers, ~ paste("Number", .x))
number_strings
number_strings
1] "Number 1" "Number 2" "Number 3" "Number 4" "Number 5" [
- Double (
map_dbl
):
# Half each number and return as doubles
<- map_dbl(numbers, ~ .x / 2)
halved_doubles
halved_doubles
1] 0.5 1.0 1.5 2.0 2.5 [
Let’s apply this to a built-in dataset to see it in action:
# Using map_dbl on the iris dataset to get the mean of each numeric column
<- iris %>%
iris_means select(-Species) %>%
map_dbl(mean)
iris_means
Sepal.Length Sepal.Width Petal.Length Petal.Width 5.843333 3.057333 3.758000 1.199333
Here, we’ve calculated the mean of each numeric column in the iris
dataset, and the result is a named vector of doubles.
Pretty neat, huh? The map
family makes it easy to ensure your data stays in the format you expect.
Ready to see how purrr
handles multiple vectors with map2
and pmap
?
Not Only One Vector: map2
and pmap
+ Variants
So far, we’ve seen how map
functions work with a single vector or list. But what if you have multiple vectors and want to apply a function to corresponding elements from each? Enter map2
and pmap
.
map2
: This function applies a function to corresponding elements of two vectors or lists.pmap
: This function applies a function to corresponding elements of multiple lists.
Let’s start with map2
:
library(tidyverse)
# Two vectors to work with
<- c(1, 2, 3)
vec1 <- c(4, 5, 6)
vec2
# Adding corresponding elements of two vectors
<- map2(vec1, vec2, ~ .x + .y)
sum_vecs
sum_vecs
1]]
[[1] 5
[
2]]
[[1] 7
[
3]]
[[1] 9 [
Here, map2
takes elements from vec1
and vec2
and adds them together.
Now, let’s step it up with pmap
:
# Creating a tibble for multiple lists
<- tibble(
df a = 1:3,
b = 4:6,
c = 7:9
)
# Summing corresponding elements of multiple lists
<- pmap(df, ~ ..1 + ..2 + ..3)
sum_pmap
sum_pmap
1]]
[[1] 12
[
2]]
[[1] 15
[
3]]
[[1] 18 [
In this example, pmap
takes elements from columns a
, b
, and c
of the tibble and sums them up.
Look at syntax in those two examples. In map2
, we give two vectors or lists, and then we are reffering to them as .x and .y. Further in pmap
example we have data.frame, but it can be a list of lists, and we need to refer to them with numbers like ..1, ..2 and ..3 (and more if needed).
Variants of map2
and pmap
Just like map
, map2
and pmap
have type-specific variants. Let’s see a couple of examples using data structures already defined above:
map2_dbl:
# Multiplying corresponding elements of two vectors and returning doubles
<- map2_dbl(vec1, vec2, ~ .x * .y)
product_vecs
product_vecs
1] 4 10 18 [
pmap_chr:
# Concatenating corresponding elements of multiple lists into strings
<- pmap_chr(df, ~ paste(..1, ..2, ..3, sep = "-"))
concat_pmap
concat_pmap
1] "1-4-7" "2-5-8" "3-6-9" [
These variants ensure that your results are of the expected type, just like the basic map
variants.
With map2
and pmap
, you can handle more complex data transformations involving multiple vectors or lists with ease.
Ready to move on and see what lmap
and imap
can do for you?
Using imap
for Indexed Mapping and Conditional Maps with _if
and _at
Let’s combine our exploration of imap
with the conditional mapping functions map_if
and map_at
. These functions give you more control over how and when functions are applied to your data, making your code more precise and expressive.
imap
: Indexed Mapping
The imap
function is a handy tool when you need to include the index or names of elements in your function calls. This is particularly useful for tasks where the position or name of an element influences the operation performed on it.
Here’s a practical example with a named list:
library(tidyverse)
# A named list of scores
<- list(math = 90, science = 85, history = 78)
named_scores
# Create descriptive strings for each score
<- imap(named_scores, ~ paste(.y, "score is", .x))
score_descriptions
score_descriptions
$math
1] "math score is 90"
[
$science
1] "science score is 85"
[
$history
1] "history score is 78" [
In this example:
- We have a named list
named_scores
with subject scores. - We use
imap
to create a descriptive string for each score that includes the subject name and the score.
Conditional Maps with map_if
and map_at
Sometimes, you don’t want to apply a function to all elements of a list or vector — only to those that meet certain conditions. This is where map_if
and map_at
come into play.
map_if
: Conditional Mapping
Use map_if
to apply a function to elements that satisfy a specific condition (predicate).
# Mixed list of numbers and characters
<- list(1, "a", 3, "b", 5)
mixed_list
# Double only the numeric elements
<- map_if(mixed_list, is.numeric, ~ .x * 2)
doubled_numbers
doubled_numbers
1]]
[[1] 2
[
2]]
[[1] "a"
[
3]]
[[1] 6
[
4]]
[[1] "b"
[
5]]
[[1] 10 [
In this example:
- We have a mixed list of numbers and characters.
- We use
map_if
to double only the numeric elements, leaving the characters unchanged.
map_at
: Specific Element Mapping
Use map_at
to apply a function to specific elements of a list or vector, identified by their indices or names.
# A named list of mixed types
<- list(a = 1, b = "hello", c = 3, d = "world")
specific_list
# Convert only the character elements to uppercase
<- map_at(specific_list, c("b", "d"), ~ toupper(.x))
uppercase_chars
uppercase_chars
$a
1] 1
[
$b
1] "HELLO"
[
$c
1] 3
[
$d
1] "WORLD" [
In this example:
- We have a named list with mixed types.
- We use
map_at
to convert only the specified character elements to uppercase.
Combining imap
, map_if
, and map_at
allows you to handle complex data transformation tasks with precision and clarity. These functions make it easy to tailor your operations to the specific needs of your data.
Shall we move on to the next chapter to explore walk
and its friends for side-effect operations?
Make Something Happen Outside of Data: walk
and Its Friends
Sometimes, you want to perform operations that have side effects, like printing, writing to a file, or plotting, rather than returning a transformed list or vector. This is where the walk
family of functions comes in handy. These functions are designed to be used for their side effects, as they return NULL
.
walk
The basic walk
function applies a function to each element of a list or vector and performs actions like printing or saving files.
library(tidyverse)
# A list of numbers
<- list(1, 2, 3, 4, 5)
numbers
# Print each number
walk(numbers, ~ print(.x))
1] 1
[1] 2
[1] 3
[1] 4
[1] 5 [
In this example, walk
prints each element of the numbers
list.
walk2
When you have two lists or vectors and you want to perform side-effect operations on their corresponding elements, walk2
is your friend.
# Two vectors to work with
<- c("apple", "banana", "cherry")
vec1 <- c("red", "yellow", "dark red")
vec2
# Print each fruit with its color
walk2(vec1, vec2, ~ cat(.x, "is", .y, "\n"))
apple is red
banana is yellow cherry is dark red
Here, walk2
prints each fruit with its corresponding color.
iwalk
iwalk
is the side-effect version of imap
. It includes the index or names of the elements, which can be useful for logging or debugging.
# A named list of scores
<- list(math = 90, science = 85, history = 78)
named_scores
# Print each subject with its score
iwalk(named_scores, ~ cat("The score for", .y, "is", .x, "\n"))
for math is 90
The score for science is 85
The score for history is 78 The score
In this example, iwalk
prints each subject name with its corresponding score.
Practical Example with Built-in Data
Let’s use a built-in dataset and perform some side-effect operations. Suppose you want to save plots of each numeric column in the mtcars
dataset to separate files.
# Directory to save plots
dir.create("plots")
# Save histograms of each numeric column to files
walk(names(mtcars), ~ {
if (is.numeric(mtcars[[.x]])) {
<- paste0("plots/", .x, "_histogram.png")
plot_path png(plot_path)
hist(mtcars[[.x]], main = paste("Histogram of", .x), xlab = .x)
dev.off()
} })
In this example:
- We create a directory called “plots”.
- We use
walk
to iterate over the names of themtcars
dataset. - For each numeric column, we save a histogram to a PNG file.
This is a practical demonstration of how walk
can be used for side-effect operations such as saving files.
Why Do We Need modify
Then?
Sometimes you need to tweak elements within a list or vector without completely transforming them. This is where modify
functions come in handy. They allow you to make specific changes to elements while preserving the overall structure of your data.
modify
The modify
function applies a transformation to each element of a list or vector and returns the modified list or vector.
library(tidyverse)
# A list of numbers
<- list(1, 2, 3, 4, 5)
numbers
# Add 10 to each number
<- modify(numbers, ~ .x + 10)
modified_numbers
modified_numbers
1]]
[[1] 11
[
2]]
[[1] 12
[
3]]
[[1] 13
[
4]]
[[1] 14
[
5]]
[[1] 15 [
In this example, modify
adds 10 to each element of the numbers
list.
modify_if
modify_if
is used to conditionally modify elements that meet a specified condition (predicate).
# Modify only the even numbers by multiplying them by 2
<- modify_if(numbers, ~ .x %% 2 == 0, ~ .x * 2)
modified_if
modified_if
1]]
[[1] 1
[
2]]
[[1] 4
[
3]]
[[1] 3
[
4]]
[[1] 8
[
5]]
[[1] 5 [
Here, modify_if
multiplies only the even numbers by 2.
modify_at
modify_at
allows you to specify which elements to modify based on their indices or names.
# A named list of mixed types
<- list(a = 1, b = "hello", c = 3, d = "world")
named_list
# Convert only the specified elements to uppercase
<- modify_at(named_list, c("b", "d"), ~ toupper(.x))
modified_at
modified_at
$a
1] 1
[
$b
1] "HELLO"
[
$c
1] 3
[
$d
1] "WORLD" [
In this example, modify_at
converts the specified character elements to uppercase.
modify
with Built-in Dataset
Let’s use the iris
dataset to demonstrate how modify
functions can be applied in a practical scenario. Suppose we want to normalize numeric columns by dividing each value by the maximum value in its column.
# Normalizing numeric columns in the iris dataset
<- iris %>%
normalized_iris modify_at(vars(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width),
~ .x / max(.x))
head(normalized_iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species1 0.6455696 0.7954545 0.2028986 0.08 setosa
2 0.6202532 0.6818182 0.2028986 0.08 setosa
3 0.5949367 0.7272727 0.1884058 0.08 setosa
4 0.5822785 0.7045455 0.2173913 0.08 setosa
5 0.6329114 0.8181818 0.2028986 0.08 setosa
6 0.6835443 0.8863636 0.2463768 0.16 setosa
head(iris)
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
In this example:
- We use
modify_at
to specify the numeric columns of theiris
dataset. - Each value in these columns is divided by the maximum value in its respective column, normalizing the data.
modify
functions offer a powerful way to make targeted changes to your data, providing flexibility and control.
Predicates: Does Data Satisfy Our Assumptions? every
, some
, and none
When working with data, it’s often necessary to check if certain conditions hold across elements in a list or vector. This is where predicate functions like every
, some
, and none
come in handy. These functions help you verify whether elements meet specified criteria, making your data validation tasks easier and more expressive.
every
The every
function checks if all elements in a list or vector satisfy a given predicate. If all elements meet the condition, it returns TRUE
; otherwise, it returns FALSE
.
library(tidyverse)
# A list of numbers
<- list(2, 4, 6, 8)
numbers
# Check if all numbers are even
<- every(numbers, ~ .x %% 2 == 0)
all_even
all_even
1] TRUE [
In this example, every
checks if all elements in the numbers
list are even.
some
The some
function checks if at least one element in a list or vector satisfies a given predicate. If any element meets the condition, it returns TRUE
; otherwise, it returns FALSE
.
# Check if any number is greater than 5
<- some(numbers, ~ .x > 5)
any_greater_than_five
any_greater_than_five
1] TRUE [
Here, some
checks if any element in the numbers
list is greater than 5.
none
The none
function checks if no elements in a list or vector satisfy a given predicate. If no elements meet the condition, it returns TRUE
; otherwise, it returns FALSE
.
# Check if no number is odd
<- none(numbers, ~ .x %% 2 != 0)
none_odd
none_odd
1] TRUE [
In this example, none
checks if no elements in the numbers
list are odd.
Practical Example with Built-in Dataset
Let’s use the mtcars
dataset to demonstrate how these predicate functions can be applied in a practical scenario. Suppose we want to check various conditions on the columns of this dataset.
# Check if all cars have more than 10 miles per gallon (mpg)
<- mtcars %>%
all_mpg_above_10 select(mpg) %>%
map_lgl(~ every(.x, ~ .x > 10))
all_mpg_above_10
mpgTRUE
# Check if some cars have more than 150 horsepower (hp)
<- mtcars %>%
some_hp_above_150 select(hp) %>%
map_lgl(~ some(.x, ~ .x > 150))
some_hp_above_150
hpTRUE
# Check if no car has more than 8 cylinders
<- mtcars %>%
none_cyl_above_8 select(cyl) %>%
map_lgl(~ none(.x, ~ .x > 8))
none_cyl_above_8
cylTRUE
In this example:
- We check if all cars in the
mtcars
dataset have more than 10 mpg usingevery
. - We check if some cars have more than 150 horsepower using
some
. - We check if no car has more than 8 cylinders using
none
.
These predicate functions provide a straightforward way to validate your data against specific conditions, making your analysis more robust.
What If Not: keep
and discard
When you’re working with lists or vectors, you often need to filter elements based on certain conditions. The keep
and discard
functions from purrr
are designed for this purpose. They allow you to retain or remove elements that meet specified criteria, making it easy to clean and subset your data.
keep
The keep
function retains elements that satisfy a given predicate. If an element meets the condition, it is kept; otherwise, it is removed.
library(tidyverse)
# A list of mixed numbers
<- list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
numbers
# Keep only the even numbers
<- keep(numbers, ~ .x %% 2 == 0)
even_numbers
even_numbers
1]]
[[1] 2
[
2]]
[[1] 4
[
3]]
[[1] 6
[
4]]
[[1] 8
[
5]]
[[1] 10 [
In this example, keep
retains only the even numbers from the numbers
list.
discard
The discard
function removes elements that satisfy a given predicate. If an element meets the condition, it is discarded; otherwise, it is kept.
# Discard the even numbers
<- discard(numbers, ~ .x %% 2 == 0)
odd_numbers
odd_numbers
1]]
[[1] 1
[
2]]
[[1] 3
[
3]]
[[1] 5
[
4]]
[[1] 7
[
5]]
[[1] 9 [
Here, discard
removes the even numbers, leaving only the odd numbers in the numbers
list.
Practical Example with Built-in Dataset
Let’s use the iris
dataset to demonstrate how keep
and discard
can be applied in a practical scenario. Suppose we want to filter rows based on specific conditions for the Sepal.Length
column.
library(tidyverse)
# Keep rows where Sepal.Length is greater than 5.0
<- iris %>%
iris_keep split(1:nrow(.)) %>%
keep(~ .x$Sepal.Length > 5.0) %>%
bind_rows()
head(iris_keep)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species1 5.1 3.5 1.4 0.2 setosa
2 5.4 3.9 1.7 0.4 setosa
3 5.4 3.7 1.5 0.2 setosa
4 5.8 4.0 1.2 0.2 setosa
5 5.7 4.4 1.5 0.4 setosa
6 5.4 3.9 1.3 0.4 setosa
# Discard rows where Sepal.Length is less than or equal to 5.0
<- iris %>%
iris_discard split(1:nrow(.)) %>%
discard(~ .x$Sepal.Length <= 5.0) %>%
bind_rows()
head(iris_discard)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species1 5.1 3.5 1.4 0.2 setosa
2 5.4 3.9 1.7 0.4 setosa
3 5.4 3.7 1.5 0.2 setosa
4 5.8 4.0 1.2 0.2 setosa
5 5.7 4.4 1.5 0.4 setosa
6 5.4 3.9 1.3 0.4 setosa
In this example:
- We split the
iris
dataset into a list of rows. - We apply
keep
to retain rows whereSepal.Length
is greater than 5.0. - We apply
discard
to remove rows whereSepal.Length
is less than or equal to 5.0. - Finally, we use
bind_rows()
to combine the list back into a data frame.
Combining keep
and discard
with mtcars
Similarly, let’s fix the mtcars
example:
# Keep cars with mpg greater than 20 and discard cars with hp less than 100
<- mtcars %>%
filtered_cars split(1:nrow(.)) %>%
keep(~ .x$mpg > 20) %>%
discard(~ .x$hp < 100) %>%
bind_rows()
filtered_cars
mpg cyl disp hp drat wt qsec vs am gear carb21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Mazda RX4 Wag 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Lotus Europa 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 Volvo
In this combined example:
- We split the
mtcars
dataset into a list of rows. - We use
keep
to retain cars withmpg
greater than 20. - We use
discard
to remove cars withhp
less than 100. - We combine the filtered list back into a data frame using
bind_rows()
.
Do Things in Order of List/Vector: accumulate
, reduce
Sometimes, you need to perform cumulative or sequential operations on your data. This is where accumulate
and reduce
come into play. These functions allow you to apply a function iteratively across elements of a list or vector, either accumulating results at each step or reducing the list to a single value.
accumulate
The accumulate
function applies a function iteratively to the elements of a list or vector and returns a list of intermediate results.
Let’s start with a simple example:
library(tidyverse)
# A list of numbers
<- list(1, 2, 3, 4, 5)
numbers
# Cumulative sum of the numbers
<- accumulate(numbers, `+`)
cumulative_sum
cumulative_sum
1] 1 3 6 10 15 [
reduce
The reduce
function applies a function iteratively to reduce the elements of a list or vector to a single value.
Here’s a basic example:
# Sum of the numbers
<- reduce(numbers, `+`)
total_sum
total_sum
1] 15 [
Practical Example with Built-in Dataset
Let’s use the mtcars
dataset to demonstrate how accumulate
and reduce
can be applied in a practical scenario.
Using accumulate
with mtcars
Suppose we want to calculate the cumulative sum of the miles per gallon (mpg) for each car.
# Cumulative sum of mpg values
<- mtcars %>%
cumulative_mpg pull(mpg) %>%
accumulate(`+`)
cumulative_mpg
1] 21.0 42.0 64.8 86.2 104.9 123.0 137.3 161.7 184.5 203.7 221.5 237.9 255.2 270.4 280.8 291.2 305.9 338.3 368.7
[20] 402.6 424.1 439.6 454.8 468.1 487.3 514.6 540.6 571.0 586.8 606.5 621.5 642.9 [
In this example, accumulate
gives us a cumulative sum of the mpg
values for the cars in the mtcars
dataset.
Using reduce
with mtcars
Now, let’s say we want to find the product of all mpg
values:
# Product of mpg values
<- mtcars %>%
product_mpg pull(mpg) %>%
reduce(`*`)
product_mpg
1] 1.264241e+41 [
In this example, reduce
calculates the product of all mpg
values in the mtcars
dataset.
Do It Another Way: compose
and negate
Creating flexible and reusable functions is a hallmark of efficient programming. purrr
provides tools like compose
and negate
to help you build and manipulate functions more effectively. These tools allow you to combine multiple functions into one or invert the logic of a predicate function.
compose
The compose
function combines multiple functions into a single function that applies them sequentially. This can be incredibly useful for creating pipelines of operations.
Here’s a basic example:
library(tidyverse)
# Define some simple functions
<- function(x) x + 1
add1 <- function(x) x * x
square
# Compose them into a single function
<- compose(square, add1)
add1_and_square
# Apply the composed function
<- add1_and_square(2) # (2 + 1)^2 = 9
result
result
1] 9 [
In this example:
- We define two simple functions:
add1
andsquare
. - We use
compose
to create a new function,add1_and_square
, which first adds 1 to its input and then squares the result. - We apply the composed function to the number 2, yielding 9.
Practical Example with Built-in Dataset
Let’s use compose
with a more practical example involving the mtcars
dataset. Suppose we want to create a function that first scales the horsepower (hp
) by 10 and then calculates the logarithm.
# Define scaling and log functions
<- function(x) x * 10
scale_by_10 <- safely(log, otherwise = NA)
safe_log
# Compose them into a single function
<- compose(safe_log, scale_by_10)
scale_and_log
# Apply the composed function to the hp column
<- mtcars %>%
mtcars mutate(log_scaled_hp = map_dbl(hp, ~ scale_and_log(.x)$result))
head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb log_scaled_hp21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 7.003065
Mazda RX4 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 7.003065
Mazda RX4 Wag 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 6.835185
Datsun 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 7.003065
Hornet 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 7.467371
Hornet Sportabout 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 6.956545 Valiant
In this example:
- We define two functions:
scale_by_10
andsafe_log
. - We compose these functions into
scale_and_log
. - We apply the composed function to the
hp
column of themtcars
dataset and add the results as a new column.
negate
The negate
function creates a new function that returns the logical negation of a predicate function. This is useful when you want to invert the logic of a condition.
Here’s a simple example:
# Define a simple predicate function
<- function(x) x %% 2 == 0
is_even
# Negate the predicate function
<- negate(is_even)
is_odd
# Apply the negated function
<- map_lgl(1:10, is_odd)
results
results
1] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE [
In this example:
- We define a predicate function
is_even
to check if a number is even. - We use
negate
to create a new functionis_odd
that returns the opposite result. - We apply
is_odd
to the numbers 1 through 10.
Practical Example with Built-in Dataset
Let’s use negate
in a practical scenario with the iris
dataset. Suppose we want to filter out rows where the Sepal.Length
is not greater than 5.0.
# Define a predicate function
<- function(x) x > 5.0
is_long_sepal
# Negate the predicate function
<- negate(is_long_sepal)
is_not_long_sepal
# Filter out rows where Sepal.Length is not greater than 5.0
<- iris %>%
iris_filtered split(1:nrow(.)) %>%
discard(~ is_not_long_sepal(.x$Sepal.Length)) %>%
bind_rows()
head(iris_filtered)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species1 5.1 3.5 1.4 0.2 setosa
2 5.4 3.9 1.7 0.4 setosa
3 5.4 3.7 1.5 0.2 setosa
4 5.8 4.0 1.2 0.2 setosa
5 5.7 4.4 1.5 0.4 setosa
6 5.4 3.9 1.3 0.4 setosa
In this example:
- We define a predicate function
is_long_sepal
to check ifSepal.Length
is greater than 5.0. - We use
negate
to create a new functionis_not_long_sepal
that returns the opposite result. - We use
discard
to remove rows whereSepal.Length
is not greater than 5.0, then combine the filtered list back into a data frame.
With compose
and negate
, you can create more flexible and powerful functions, allowing for more concise and readable code.
Conclusion
Congratulations! You’ve journeyed through the world of purrr
, mastering a wide array of functions and techniques to manipulate and transform your data. From basic mapping to creating powerful function compositions, purrr
equips you with tools to make your data wrangling tasks more efficient and expressive.
Whether you’re applying functions conditionally, dealing with side effects, or validating your data, purrr
has you covered. Keep exploring and experimenting with these functions to unlock the full potential of functional programming in R.
Gift for patient readers
I decided to give you some useful, yet not trivial use cases of purrr functions.
Define list of function to apply on data
<- function(x, ...) purrr::map_dbl(list(...), ~ .x(x)) apply_funs
Want to apply multiple functions to a single vector and get a tidy result? Meet apply_funs
, your new best friend! This nifty little function takes a value and a bunch of functions, then maps each function to the vector, returning the results as a neat vector.
Let’s break it down:
x
: The value you want to transform....
: A bunch of functions you want to apply tox
.purrr::map_dbl
: Maps each function in the list tox
and returns the results as a vector of doubles.
Suppose that you want to apply 3 summary functions on vector of numbers. Here’s how you can do it:
<- 1:48
number
<- apply_funs(number, mean, median, sd)
results
results
1] 24.5 24.5 14.0 [
Using pmap as equivalent of Python’s zip
Sometimes you need to zip two tables or columns together. In Python there is zip function for it, but we do not have twin function in R, unless you use pmap. I will not make it longer, so check it out in one of my previous articles.
Rendering parameterized RMarkdown reports
Assuming that you have kind of report you use for each salesperson, there is possibility, that you are changing parameters manually to generate report for person X, for date range Y, for product Z. Why not prepare lists of people, time range, and list of products, and then based on them generate series of reports by one click only.