dplyr – Page 19 – Tarik Billa

Change value of variable with dplyr

November 29, 2022 by Tarik

We can use replace() to change the values in mpg to NA wherever cyl == 4:

library(dplyr)
mtcars %>% mutate(mpg = replace(mpg, cyl == 4, NA)) %>% as.data.frame()
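Here is a self-contained sketch on the built-in mtcars data. It also shows dplyr's if_else() as an alternative to base replace(). That alternative is an addition for comparison, not part of the original answer.

```r
library(dplyr)

# Set mpg to NA for every 4-cylinder car using base replace()
with_replace <- mtcars %>%
  mutate(mpg = replace(mpg, cyl == 4, NA))

# Same result with dplyr::if_else(); it is strict about types,
# so a typed missing value (NA_real_) is used for the "true" branch
with_if_else <- mtcars %>%
  mutate(mpg = if_else(cyl == 4, NA_real_, mpg))

# Count how many mpg values were blanked out
sum(is.na(with_replace$mpg))
```

replace() is handy for a single condition. if_else() or case_when() read better once several conditions are involved.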

Filter rows which contain a certain string

November 26, 2022 by Tarik

The answer to the question was already posted by @latemail in the comments above. You can use regular expressions for the second and subsequent arguments of filter(), like this:

dplyr::filter(df, !grepl("RTB", TrackingPixel))

Since you have not provided the original data, I will add a toy example using the mtcars data set. Imagine you are only …
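Below is a minimal runnable sketch of the same filter. It uses a small hypothetical tibble with a TrackingPixel column and is not the truncated mtcars example. It shows that grepl() keeps the matching rows and !grepl() drops them.

```r
library(dplyr)

# Hypothetical data: some pixel IDs contain the substring "RTB"
df <- tibble(
  TrackingPixel = c("abc_RTB_1", "xyz_2", "RTB_3", "q4"),
  clicks = c(10, 20, 30, 40)
)

# Drop rows whose TrackingPixel contains "RTB"
without_rtb <- filter(df, !grepl("RTB", TrackingPixel))

# Keep only those rows; fixed = TRUE treats the pattern as a literal string
with_rtb <- filter(df, grepl("RTB", TrackingPixel, fixed = TRUE))
```

fixed = TRUE is worth using when the pattern contains regex metacharacters such as "." or "+" that should be matched literally.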

Can dplyr join on multiple columns or composite key?

November 21, 2022 by Tarik

Updated to use tibble(): you can pass a named vector of length greater than 1 to the by argument of left_join():

library(dplyr)
d1 <- tibble(
  x = letters[1:3],
  y = LETTERS[1:3],
  a = rnorm(3)
)
d2 <- tibble(
  x2 = letters[3:1],
  y2 = LETTERS[3:1],
  b = rnorm(3)
)
left_join(d1, d2, by = c("x" = "x2", …
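The snippet below is a complete composite-key join on hypothetical, deterministic data. Here both pairs of differently named key columns must match. One row in d2 deliberately matches on x but not on y, so its b value stays NA.

```r
library(dplyr)

# Hypothetical tables whose key columns have different names
d1 <- tibble(
  x = c("a", "b", "c"),
  y = c("A", "B", "C"),
  a = c(1, 2, 3)
)
d2 <- tibble(
  x2 = c("c", "b", "a"),
  y2 = c("C", "B", "Z"),  # "a"/"Z" does not match d1's "a"/"A"
  b  = c(30, 20, 10)
)

# Composite key: a row matches only when x == x2 AND y == y2
joined <- left_join(d1, d2, by = c("x" = "x2", "y" = "y2"))
joined
```

In dplyr >= 1.1.0 the same key can also be written as by = join_by(x == x2, y == y2).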

How to select the rows with maximum values in each group with dplyr? [duplicate]

November 16, 2022 by Tarik

Try this:

result <- df %>%
  group_by(A, B) %>%
  filter(value == max(value)) %>%
  arrange(A, B, C)

It seems to work (ddply() comes from the plyr package):

identical(
  as.data.frame(result),
  ddply(df, .(A, B), function(x) x[which.max(x$value), ])
)
#[1] TRUE

As pointed out in the comments, slice may be preferred here, as per @RoyalITS' answer below, if you strictly only want one row per group. This answer will …
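The sketch below uses hypothetical data with a tie for the maximum to show the difference. The filter() approach returns every tied row. slice_max(..., with_ties = FALSE) returns exactly one row per group. slice_max() requires dplyr >= 1.0.0 and is used here in place of the slice() call the answer refers to.

```r
library(dplyr)

# Hypothetical data: group (1, "x") has two rows tied at value 9
df <- tibble(
  A = c(1, 1, 1, 2, 2),
  B = c("x", "x", "x", "y", "y"),
  C = 1:5,
  value = c(5, 9, 9, 3, 7)
)

# filter(): keeps every row equal to the group maximum, so ties survive
all_max <- df %>%
  group_by(A, B) %>%
  filter(value == max(value)) %>%
  ungroup()

# slice_max() with with_ties = FALSE: exactly one row per group
one_per_group <- df %>%
  group_by(A, B) %>%
  slice_max(value, n = 1, with_ties = FALSE) %>%
  ungroup()
```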

Sum across multiple columns with dplyr

November 16, 2022 by Tarik

dplyr >= 1.0.0, using across()

Sum up each row using rowSums() (rowwise() works for any aggregation, but is slower):

df %>%
  replace(is.na(.), 0) %>%
  mutate(sum = rowSums(across(where(is.numeric))))

Sum down each column:

df %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))

dplyr < 1.0.0

Sum up each row:

df %>%
  replace(is.na(.), 0) %>%
  mutate(sum = rowSums(.[1:5]))

Sum down …
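Here is a runnable sketch on a small hypothetical tibble with missing values. It assumes dplyr >= 1.1.0, where pick() is the recommended way to pass a set of columns to a function such as rowSums(). Passing na.rm = TRUE skips NAs directly instead of replacing them with 0 first.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for pick()

# Hypothetical data with missing values
df <- tibble(
  a = c(1, NA, 3),
  b = c(4, 5, NA),
  c = c(7, 8, 9)
)

# Row totals: rowSums() over all numeric columns, skipping NAs
row_totals <- df %>%
  mutate(total = rowSums(pick(where(is.numeric)), na.rm = TRUE))

# Row totals with rowwise() + c_across(): works for any aggregation, but slower
row_totals_rw <- df %>%
  rowwise() %>%
  mutate(total = sum(c_across(where(is.numeric)), na.rm = TRUE)) %>%
  ungroup()

# Column totals: one value per column, skipping NAs
col_totals <- df %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))
```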

Group by multiple columns in dplyr, using string vector input

October 24, 2022 by Tarik

Just to write the code in full, here's an update on Hadley's answer with the new syntax:

library(dplyr)
df <- data.frame(
  asihckhdoydk = sample(LETTERS[1:3], 100, replace = TRUE),
  a30mvxigxkgh = sample(LETTERS[1:3], 100, replace = TRUE),
  value = rnorm(100)
)
# Columns you want to group by
grp_cols <- names(df)[-3]
# Convert character vector to list of symbols
…
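One way to finish the idea is sketched below with hypothetical, deterministic data. This is not necessarily the code in the truncated answer. rlang::syms() turns the character vector into a list of symbols, and the !!! operator splices them into group_by() as if they had been typed by hand.

```r
library(dplyr)

# Hypothetical data and a character vector naming the grouping columns
df <- data.frame(
  g1 = c("A", "A", "B", "B"),
  g2 = c("x", "y", "x", "x"),
  value = c(1, 2, 3, 4)
)
grp_cols <- c("g1", "g2")

# syms() converts strings to symbols; !!! splices them into group_by()
totals <- df %>%
  group_by(!!!rlang::syms(grp_cols)) %>%
  summarise(total = sum(value), .groups = "drop")
totals
```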

Summarizing multiple columns with dplyr? [duplicate]

October 24, 2022 by Tarik

In dplyr (>= 1.0.0) you may use across(everything()) in summarise() to apply a function to all variables:

library(dplyr)
df %>% group_by(grp) %>% summarise(across(everything(), mean))
#> # A tibble: 3 x 5
#>     grp     a     b     c     d
#>   <int> <dbl> <dbl> <dbl> <dbl>
#> 1     1  3.08  2.98  2.98  2.91
#> 2     2  3.03  3.04  2.97 …
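Here is a small self-contained sketch on hypothetical data. Inside a grouped summarise(), everything() automatically leaves out the grouping column. A single function keeps the original column names. A named list of functions produces names of the form column_function.

```r
library(dplyr)

# Hypothetical grouped data
df <- tibble(
  grp = c(1, 1, 2, 2),
  a = c(1, 3, 5, 7),
  b = c(2, 4, 6, 8)
)

# One function: columns keep their names
means <- df %>%
  group_by(grp) %>%
  summarise(across(everything(), mean))

# Several functions: a named list yields a_mean, a_sd, b_mean, b_sd
stats <- df %>%
  group_by(grp) %>%
  summarise(across(everything(), list(mean = mean, sd = sd)))
```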

Remove duplicated rows using dplyr

October 22, 2022 by Tarik

Here is a solution using dplyr >= 0.5:

library(dplyr)
set.seed(123)
df <- data.frame(
  x = sample(0:1, 10, replace = TRUE),
  y = sample(0:1, 10, replace = TRUE),
  z = 1:10
)

> df %>% distinct(x, y, .keep_all = TRUE)
  x y z
1 0 1 1
2 1 0 2
3 1 1 4
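The version below uses fixed hypothetical values so its result does not depend on the random number generator. It shows that distinct(..., .keep_all = TRUE) keeps the first row of each (x, y) combination along with all other columns.

```r
library(dplyr)

# Hypothetical data: rows 3 and 5 repeat earlier (x, y) pairs
df <- data.frame(
  x = c(0, 1, 0, 1, 1),
  y = c(1, 0, 1, 1, 0),
  z = 1:5
)

# Keep the first row for each distinct (x, y) pair, retaining z
first_per_pair <- df %>% distinct(x, y, .keep_all = TRUE)

# Without .keep_all, only the x and y columns are returned
pairs_only <- df %>% distinct(x, y)
```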

Select first and last row from grouped data

October 21, 2022 by Tarik

There is probably a faster way:

df %>%
  group_by(id) %>%
  arrange(stopSequence) %>%
  filter(row_number() == 1 | row_number() == n())
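The same pipeline is shown below on hypothetical data, with column names assumed from the answer. Setting .by_group = TRUE in arrange() is an optional addition that sorts by id first, so the output stays grouped together. A group with a single row is returned once, because both conditions select the same row.

```r
library(dplyr)

# Hypothetical stop data; id 3 has only one stop
df <- tibble(
  id = c(1, 1, 1, 2, 2, 3),
  stopSequence = c(3, 1, 2, 2, 1, 1),
  stopId = c("c", "a", "b", "e", "d", "f")
)

first_last <- df %>%
  group_by(id) %>%
  arrange(stopSequence, .by_group = TRUE) %>%  # sort by id, then stopSequence
  filter(row_number() == 1 | row_number() == n()) %>%  # first and last per id
  ungroup()
```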

How to interpret dplyr message `summarise()` regrouping output by ‘x’ (override with `.groups` argument)?

October 20, 2022 by Tarik

It is just a friendly informational message. By default, if there is any grouping before summarise(), it drops one grouping variable, i.e. the last one specified in group_by(). If there is only one grouping variable, the result has no grouping attribute after summarise(); if there is more than one, i.e. here …
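Here is a minimal sketch on hypothetical data showing how the .groups argument (dplyr >= 1.0.0) controls the grouping left after summarise(). Supplying it explicitly also silences the message. group_vars() reports which grouping variables remain.

```r
library(dplyr)

# Hypothetical data grouped by two variables
df <- tibble(
  a = c(1, 1, 2),
  b = c("x", "y", "x"),
  v = 1:3
)

# "drop_last" is the default behaviour: the last grouping variable (b) is dropped
drop_last <- df %>% group_by(a, b) %>% summarise(total = sum(v), .groups = "drop_last")

# "drop" removes all grouping; "keep" keeps both a and b
dropped <- df %>% group_by(a, b) %>% summarise(total = sum(v), .groups = "drop")
kept    <- df %>% group_by(a, b) %>% summarise(total = sum(v), .groups = "keep")

group_vars(drop_last)
```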