5  Appendix

5.1 More About Comparisons

5.1.1 Equality

The == operator is the primary way to test whether two values are equal, as explained in Section 1.2.3. Nonetheless, equality can be defined in many different ways, especially when dealing with computers. As a result, R also provides several different functions to test for different kinds of equality. This describes tests of equality in more detail, and also describes some other important details of comparisons.

5.1.1.1 The == Operator

The == operator tests whether its two arguments have the exact same representation as a binary number in your computer’s memory. Before testing the arguments, the operator applies R’s rules for vectorization (Section 2.1.3), recycling (Section 2.1.4), and implicit coercion (Section 2.2.2). Until you’ve fully internalized these three rules, some results from the equality operator may seem surprising. For example:

# Recycling:
c(1, 2) == c(1, 2, 1, 2)
[1] TRUE TRUE TRUE TRUE
# Implicit coercion:
TRUE == 1
[1] TRUE
TRUE == "TRUE"
[1] TRUE
1 == "TRUE"
[1] FALSE

The length of the result from the equality operator is usually the same as its longest argument (with some exceptions).

5.1.1.2 The all.equal Function

The all.equal function tests whether its two arguments are equal up to some acceptable difference called a tolerance. Computer representations for decimal numbers are inherently imprecise, so it’s necessary to allow for very small differences between computed numbers. For example:

x = 0.5 - 0.3
y = 0.3 - 0.1

# FALSE on most machines:
x == y
[1] FALSE
# TRUE:
all.equal(x, y)
[1] TRUE

The all.equal function does not apply R’s rules for vectorization, recycling, or implicit coercion. The function returns TRUE when the arguments are equal, and returns a string summarizing the differences when they are not. For instance:

all.equal(1, c(1, 2, 1))
[1] "Numeric: lengths (1, 3) differ"

The all.equal function is often used together with the isTRUE function, which tests whether the result is TRUE:

all.equal(3, 4)
[1] "Mean relative difference: 0.3333333"
isTRUE(all.equal(3, 4))
[1] FALSE

You should generally use the all.equal function when you want to compare decimal numbers.

5.1.1.3 The identical Function

The identical function checks whether its arguments are completely identical, including their metadata (names, dimensions, and so on). For instance:

x = list(a = 1)
y = list(a = 1)
z = list(1)

identical(x, y)
[1] TRUE
identical(x, z)
[1] FALSE

The identical function does not apply R’s rules for vectorization, recycling, or implicit coercion. The result is always a single logical value.

You’ll generally use the identical function to compare non-vector objects such as lists or data frames. The function also works for vectors, but most of the time the equality operator == is sufficient.

5.1.2 The %in% Operator

Another common comparison is to check whether elements of one vector are contained in another vector at any position. For instance, suppose you want to check whether 1 or 2 appear anywhere in a longer vector x. Here’s how to do it:

x = c(3, 4, 2, 7, 3, 7)
c(1, 2) %in% x
[1] FALSE  TRUE

R returns FALSE for the 1 because there’s no 1 in x, and returns TRUE for the 2 because there is a 2 in x.

Notice that this is different from comparing with the equality operator ==. If you use use the equality operator, the shorter vector is recycled until its length matches the longer one, and then compared element-by-element. For the example, this means only the elements at odd-numbered positions are compared to 1, and only the elements at even-numbered positions are compared to 2:

c(1, 2) == x
[1] FALSE FALSE FALSE FALSE FALSE FALSE

5.1.3 Summarizing Comparisons

The comparison operators are vectorized, so they compare their arguments element-by-element:

c(1, 2, 3) < c(1, 3, -3)
[1] FALSE  TRUE FALSE
c("he", "saw", "her") == c("she", "saw", "him")
[1] FALSE  TRUE FALSE

What if you want to summarize whether all the elements in a vector are equal (or unequal)? You can use the all function on any logical vector to get a summary. The all function takes a vector of logical values and returns TRUE if all of them are TRUE, and returns FALSE otherwise:

all(c(1, 2, 3) < c(1, 3, -3))
[1] FALSE

The related any function returns TRUE if any one element is TRUE, and returns FALSE otherwise:

any(c("hi", "hello") == c("hi", "bye"))
[1] TRUE

5.1.4 Other Pitfalls

New programmers sometimes incorrectly think they need to append == TRUE to their comparisons. This is redundant, makes your code harder to understand, and wastes computational time. Comparisons already return logical values. If the result of the comparison is TRUE, then TRUE == TRUE is again just TRUE. If the result is FALSE, then FALSE == TRUE is again just FALSE. Likewise, if you want to invert a condition, choose an appropriate operator rather than appending == FALSE.

5.2 The drop Parameter

If you use two-dimensional indexing with [ to select exactly one column, you get a vector:

result = banknotes[1:3, 2]
class(result)
[1] "character"

The container is dropped, even though the indexing operator [ usually keeps containers. This also occurs for matrices. You can control this behavior with the drop parameter:

result = banknotes[1:3, 2, drop = FALSE]
class(result)
[1] "data.frame"

The default is drop = TRUE.