Logical vectors

tidyverse
R for Data Science (2e)
Solutions. Chapter 12, R for Data Science (2e)
Author

Gladymar Pérez Chacón

Published

April 6, 2025

A soft intro to logical vectors in R

In this chapter we will work with the nycflights13 and tidyverse packages.

Exercises 12.2.4

Q1. How does dplyr::near() work? Type near (without parentheses) to see the source code. Is sqrt(2)^2 near 2?

Step 1. To begin, I asked Google: “How can I view the source code of a function in R?”.

Typing near in the console returns the following:

near
function (x, y, tol = .Machine$double.eps^0.5) 
{
    abs(x - y) < tol
}
<bytecode: 0x000001b7c23a4650>
<environment: namespace:dplyr>

Typing View(near) in the RStudio console (or view(near) with the tidyverse loaded) displays the same source code in a separate tab.

Step 2. Compare the two vectors provided in Chapter 12.

x <- c(1 / 49 * 49, sqrt(2) ^ 2)
x
[1] 1 2

To better understand how near() works, write down the values we expect x to contain. For example:

y <- c(1,2)

Are x and y identical?

x == y
[1] FALSE FALSE

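Not really! Computers store numbers with a fixed number of digits, so values such as 1/49 and sqrt(2) cannot be represented exactly and the results carry a tiny rounding error. Printing with more digits makes this visible: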

print(x, digits = 16)
[1] 0.9999999999999999 2.0000000000000004
print(y, digits = 16)
[1] 1 2
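To get a feel for how small these rounding errors are, here is a minimal base R sketch that compares them with the default tolerance from the near() source shown earlier. The names err and tol are my own and are only used for illustration.

```r
# Same vectors as in the text above.
x <- c(1 / 49 * 49, sqrt(2)^2)
y <- c(1, 2)

# Absolute rounding error in each element.
err <- abs(x - y)

# Default tolerance of dplyr::near(), copied from its source code.
tol <- .Machine$double.eps^0.5

err        # tiny, but not exactly zero
tol        # roughly 1.5e-8
err < tol  # both errors are far below the tolerance
```

The errors are many orders of magnitude smaller than the tolerance, which is why an exact comparison with == fails while a tolerance-based comparison succeeds.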

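dplyr::near() ignores differences smaller than its tolerance tol, which defaults to .Machine$double.eps^0.5 (about 1.5e-8). So yes, sqrt(2)^2 is near 2.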

abs(x - y) < .Machine$double.eps^0.5
[1] TRUE TRUE
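The same check can be done by calling near() directly instead of writing out its body. This is a small sketch that assumes dplyr is installed. The tolerance of 1e-17 in the last call is an arbitrary value I picked to show what happens when the tolerance is smaller than the rounding error.

```r
# Same vectors as in the text above.
x <- c(1 / 49 * 49, sqrt(2)^2)
y <- c(1, 2)

# Element-wise comparison with the default tolerance.
default_check <- dplyr::near(x, y)

# The specific case asked about in Q1.
q1_check <- dplyr::near(sqrt(2)^2, 2)

# Illustrative, very strict tolerance: smaller than the rounding errors,
# so the comparison fails for both elements.
strict_check <- dplyr::near(x, y, tol = 1e-17)

default_check
q1_check
strict_check
```

In practice the default tolerance is usually a sensible choice. The tol argument is there for cases where the data call for a looser or stricter comparison.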

Q2. Use mutate(), is.na(), and count() together to describe how the missing values in dep_time, sched_dep_time and dep_delay are connected.

One way to figure this out is:

nycflights13::flights |> 
  mutate(
    missing_sched_dep_time = is.na(sched_dep_time),
    missing_dep_time = is.na(dep_time),
    missing_dep_delay = is.na(dep_delay)
  ) |> 
  count(missing_sched_dep_time, missing_dep_time, missing_dep_delay)
# A tibble: 2 × 4
  missing_sched_dep_time missing_dep_time missing_dep_delay      n
  <lgl>                  <lgl>            <lgl>              <int>
1 FALSE                  FALSE            FALSE             328521
2 FALSE                  TRUE             TRUE                8255

Another way, without dplyr::mutate():

nycflights13::flights |> 
  group_by(is.na(sched_dep_time), is.na(dep_time), is.na(dep_delay)) |> 
  count()
# A tibble: 2 × 4
  `is.na(sched_dep_time)` `is.na(dep_time)` `is.na(dep_delay)`      n
  <lgl>                   <lgl>             <lgl>               <int>
1 FALSE                   FALSE             FALSE              328521
2 FALSE                   TRUE              TRUE                 8255

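Every flight with a missing departure time (dep_time) also has a missing departure delay (dep_delay), and vice versa: 8,255 flights in total. This makes sense, because a delay cannot be worked out without an actual departure time. The scheduled departure time (sched_dep_time) was always reported (i.e., no missing data). These flights were most likely cancelled.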
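As a quick cross-check, the missing values can also be counted column by column. This is a sketch that assumes dplyr and nycflights13 are installed. The name na_counts is my own and is not part of the exercise.

```r
library(dplyr)

# Count the missing values in each of the three departure columns.
na_counts <- nycflights13::flights |>
  summarize(
    across(
      c(sched_dep_time, dep_time, dep_delay),
      \(col) sum(is.na(col))
    )
  )

na_counts
```

The per-column totals should agree with the table above: no missing values in sched_dep_time and 8255 each in dep_time and dep_delay. On their own these totals do not prove that the same rows are affected, which is why the combined count() above is the more informative answer.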