data.table – Page 10

Left join using data.table

February 21, 2023 by Tarik

If you want to add the b values of B to A, it’s best to join A with B and update A by reference:

    A[B, on = 'a', bb := i.b]

which gives:

    > A
       a  b bb
    1: 1 12 NA
    2: 2 13 13
    3: 3 14 14
    4: 4 … Read more
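As a minimal runnable sketch of the update join described above: the tables below are hypothetical and their values are chosen only for illustration, they are not the data from the post. Inside the join, `i.b` refers to the `b` column of the table passed in `i` (here `B`).

```r
library(data.table)

# Hypothetical inputs, not the data from the original post.
A <- data.table(a = 1:4, b = c(10, 20, 30, 40))
B <- data.table(a = c(2L, 4L), b = c(200, 400))

# For each row of B, find the rows of A with the same `a` and write
# B's b (i.b) into a new column bb of A. A is modified by reference,
# so no assignment is needed. Rows of A without a match keep NA.
A[B, on = "a", bb := i.b]

print(A)
```

Because the update happens by reference, `B` is left untouched and `A` keeps all of its rows, which is what makes this behave like a left join onto `A`.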

Summarizing multiple columns with data.table

February 10, 2023 by Tarik

You can use a simple lapply statement with .SD:

    dt[, lapply(.SD, sum, na.rm=TRUE), by=category]

       category index        a        b        z         c        d
    1:        c    19 51.13289 48.49994 42.50884  9.535588 11.53253
    2:        b     9 17.34860 20.35022 10.32514 11.764105 10.53127
    3:        a    27 25.91616 31.12624  0.00000 29.197343 31.71285

If you only want to summarize over certain columns, … Read more
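A small self-contained sketch of the same pattern, using made-up data. The second call uses `.SDcols`, a standard data.table argument that limits which columns `.SD` contains; whether the truncated text goes on to recommend exactly this is an assumption.

```r
library(data.table)

# Hypothetical data: one grouping column and two numeric columns.
dt <- data.table(
  category = c("a", "a", "b", "b"),
  x = c(1, 2, NA, 4),
  y = c(10, 20, 30, 40)
)

# Sum every non-grouping column within each category, ignoring NAs.
totals <- dt[, lapply(.SD, sum, na.rm = TRUE), by = category]

# Restrict .SD to column x only (assumed to be one way to summarize
# "certain columns").
only_x <- dt[, lapply(.SD, sum, na.rm = TRUE), by = category, .SDcols = "x"]

print(totals)
print(only_x)
```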

What can you do with a data.frame that you can’t with a data.table?

February 7, 2023 by Tarik

From the data.table FAQ:

FAQ 1.8: OK, I’m starting to see what data.table is about, but why didn’t you enhance data.frame in R? Why does it have to be a new package?

As FAQ 1.1 highlights, j in [.data.table is fundamentally different from j in [.data.frame. Even something as simple as DF[,1] would break existing … Read more
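To make the difference in `j` concrete, here is a minimal sketch with hypothetical data. It only illustrates that `j` in a data.frame selects columns while `j` in a data.table is an expression evaluated with the columns in scope; it does not reproduce the FAQ's own example.

```r
library(data.table)

# Hypothetical data, used for both objects.
DF <- data.frame(g = c("a", "b", "a"), v = c(1, 2, 3))
DT <- as.data.table(DF)

# data.frame: j picks columns by name or position.
df_col <- DF[, "v"]                         # a plain numeric vector

# data.table: j is an expression; column names are visible as variables.
dt_sum <- DT[, sum(v)]                      # a single number
dt_by  <- DT[, .(total = sum(v)), by = g]   # one row per group

print(dt_by)
```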

Add multiple columns to R data.table in one function call?

February 7, 2023 by Tarik

Since data.table v1.8.3, you can do this:

    DT[, c("new1", "new2") := myfun(y, v)]

Another option is storing the output of the function and adding the columns one by one:

    z <- myfun(DT$y, DT$v)
    head(DT[, new1 := z$r1][, new2 := z$r2])
    #      x y  v new1 new2
    # [1,] a 1 42   43  -41
    # [2,] a 3 42   45  -39
    # [3,] a 6 42   48 … Read more
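A runnable sketch of the multi-column assignment follows. The excerpt does not show `myfun`; the definition below is an assumption that merely returns a list with elements `r1` and `r2`, and the three-row table is hypothetical.

```r
library(data.table)

# Assumed definition: the post only implies that myfun returns a list
# with elements r1 and r2. Sum and difference are used as placeholders.
myfun <- function(y, v) list(r1 = y + v, r2 = y - v)

# Hypothetical table; v is recycled to the length of the other columns.
DT <- data.table(x = c("a", "a", "a"), y = c(1, 3, 6), v = 42)

# The left-hand side names the new columns; the right-hand side must be
# a list with one element per new column.
DT[, c("new1", "new2") := myfun(y, v)]

print(DT)
```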

Error: package or namespace load failed for ggplot2 and for data.table

February 3, 2023 by Tarik

This solved the issue:

    remove.packages(c("ggplot2", "data.table"))
    install.packages("Rcpp", dependencies = TRUE)
    install.packages("ggplot2", dependencies = TRUE)
    install.packages("data.table", dependencies = TRUE)

How to create a lag variable within each group?

January 30, 2023 by Tarik

You could do this within data.table:

    library(data.table)
    data[, lag.value := c(NA, value[-.N]), by = groups]
    data
    #   time groups       value   lag.value
    #1:    1      a  0.02779005          NA
    #2:    2      a  0.88029938  0.02779005
    #3:    3      a -1.69514201  0.88029938
    #4:    1      b -1.27560288          NA
    #5:    2      b -0.65976434 -1.27560288
    #6:    3      b -1.37804943 -0.65976434
    #7:    4      b  0.12041778 -1.37804943

For multiple columns: … Read more
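The grouped lag can be tried on a small hypothetical table as sketched below. The second assignment uses `shift()`, a data.table function that lags by one position and fills with NA by default; it is shown as an alternative and is not necessarily what the truncated text suggests.

```r
library(data.table)

# Hypothetical data with two groups, already ordered by time within group.
data <- data.table(
  time   = c(1, 2, 3, 1, 2),
  groups = c("a", "a", "a", "b", "b"),
  value  = c(10, 20, 30, 40, 50)
)

# Approach from the excerpt: drop each group's last value and prepend NA.
data[, lag.value := c(NA, value[-.N]), by = groups]

# Alternative: shift() lags by one and fills with NA by default.
data[, lag.shift := shift(value), by = groups]

print(data)
```

Both columns should agree as long as the rows are sorted by time within each group; neither approach sorts the data for you.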

Convert a data frame to a data.table without copy

January 30, 2023 by Tarik

This is available from v1.9.0+. From NEWS:

    o Following this S.O. post, a function setDT is now implemented that takes a list (named and/or unnamed), data.frame (or data.table) as input and returns the same object as a data.table by reference (without any copy). See ?setDT examples for more.

This is in accordance with data.table naming … Read more
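A minimal sketch of the conversion, using a hypothetical data frame. Note that `setDT()` is called for its side effect, so the result does not need to be assigned.

```r
library(data.table)

# Hypothetical data frame.
DF <- data.frame(id = 1:3, score = c(0.5, 0.7, 0.9))

# Convert in place: DF itself becomes a data.table, without a copy.
setDT(DF)

# data.table syntax now works on the same object.
DF[, double_score := score * 2]

print(class(DF))
print(DF)
```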

Using data.table package inside my own package

January 27, 2023 by Tarik

Andrie’s guess is right, +1. There is a FAQ on it (see vignette("datatable-faq")), as well as a new vignette on importing data.table:

FAQ 6.9: I have created a package that depends on data.table. How do I ensure my package is data.table-aware so that inheritance from data.frame works?

Either i) include data.table in the Depends: field … Read more