Left join using data.table
If you want to add the b values of B to A, then it’s best to join A with B and update A by reference as follows:

    A[B, on = 'a', bb := i.b]

which gives:

    > A
       a  b bb
    1: 1 12 NA
    2: 2 13 13
    3: 3 14 14
    4: 4 …
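A minimal runnable sketch of the update-by-reference join, using small hypothetical tables (only the column names a and b follow the answer; the values are made up). Rows of A with no match in B get NA in the new column. The second call, B[A, on = "a"], is an alternative that builds a new table holding all rows of A instead of modifying A in place.

```r
library(data.table)

# Hypothetical data: only the column names follow the answer
A <- data.table(a = 1:4, b = c(5, 6, 7, 8))
B <- data.table(a = c(2L, 3L, 5L), b = c(100, 200, 300))

# Update A by reference: for each row of A matched by B on 'a',
# set bb to B's b (i.b means "column b of the table passed as i", here B).
# Unmatched rows of A (a = 1 and a = 4) get NA; B's a = 5 is ignored.
A[B, on = "a", bb := i.b]

# Alternative: a left join that returns a new table with every row of A.
# B's b keeps its name; A's clashing b column is returned as i.b.
res <- B[A, on = "a"]
```

The by-reference form avoids copying A, which matters for large tables; the B[A] form is handy when you want a separate result.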
I’ve added this to the list here, and hopefully we’ll be able to deliver as planned. The reason is most likely that by=.EACHI is a recent feature (since 1.9.4), but what it does isn’t. Let me explain with an example. Suppose we have two data.tables X and Y:

    X = data.table(x = c(1,1,1,2,2,5,6), y = …
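A small sketch of what by=.EACHI does, on hypothetical data (not the elided X and Y from the answer): with by=.EACHI, j is evaluated once for each row of i against the rows of X it matches; without it, j is evaluated once over all matched rows together.

```r
library(data.table)

# Hypothetical tables, unrelated to the truncated example above
X <- data.table(x = c(1, 1, 2, 3), val = c(10, 20, 30, 40))
Y <- data.table(x = c(1, 3))

# by = .EACHI: one sum per row of Y (x = 1 -> 10 + 20, x = 3 -> 40)
res <- X[Y, on = "x", .(total = sum(val)), by = .EACHI]

# No by: sum over all rows matched by Y at once (10 + 20 + 40)
total_all <- X[Y, on = "x", sum(val)]
```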
You can use a simple lapply statement with .SD:

    dt[, lapply(.SD, sum, na.rm=TRUE), by=category]

       category index        a        b        z         c        d
    1:        c    19 51.13289 48.49994 42.50884  9.535588 11.53253
    2:        b     9 17.34860 20.35022 10.32514 11.764105 10.53127
    3:        a    27 25.91616 31.12624  0.00000 29.197343 31.71285

If you only want to summarize over certain columns, …
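A self-contained sketch on hypothetical toy data, showing that extra arguments to lapply (here na.rm = TRUE) are passed through to sum for every column of .SD. Restricting the columns with .SDcols is one way to summarize only some of them; it is shown here as an illustration and not necessarily what the truncated answer goes on to suggest.

```r
library(data.table)

# Hypothetical data with an NA, to show na.rm = TRUE being passed to sum()
dt <- data.table(category = c("a", "b", "a", "b"),
                 p = c(1, 2, 3, NA),
                 q = c(10, 20, 30, 40))

# Sum every non-grouping column within each category
sums <- dt[, lapply(.SD, sum, na.rm = TRUE), by = category]

# Sum only column q: .SDcols limits which columns .SD contains
sums_q <- dt[, lapply(.SD, sum, na.rm = TRUE), by = category, .SDcols = "q"]
```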
From the data.table FAQ:

FAQ 1.8: OK, I’m starting to see what data.table is about, but why didn’t you enhance data.frame in R? Why does it have to be a new package?

As FAQ 1.1 highlights, j in [.data.table is fundamentally different from j in [.data.frame. Even something as simple as DF[,1] would break existing …
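A short illustration of the difference in j, using a hypothetical column name price: in [.data.table, j is an expression evaluated with the columns in scope, while in [.data.frame, j is expected to select columns, so the same expression looks for price outside the data and fails.

```r
library(data.table)

# Hypothetical data
DF <- data.frame(id = 1:3, price = c(10, 20, 30))
DT <- as.data.table(DF)

# [.data.table: j is evaluated within the table, so price is a column
dt_sum <- DT[, sum(price)]

# [.data.frame: j is a column selector; price is not found outside DF
df_result <- tryCatch(DF[, sum(price)], error = function(e) "error")
```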
Since data.table v1.8.3, you can do this:

    DT[, c("new1", "new2") := myfun(y, v)]

Another option is storing the output of the function and adding the columns one by one:

    z <- myfun(DT$y, DT$v)
    head(DT[, new1 := z$r1][, new2 := z$r2])
    #      x y  v new1 new2
    # [1,] a 1 42   43  -41
    # [2,] a 3 42   45  -39
    # [3,] a 6 42   48 …
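The answer does not show myfun, so the version below is hypothetical: it returns a list of two vectors (y + v and y - v), chosen only to match the printed rows. The point it illustrates is that := with a character vector on the left and a list on the right assigns each list element to the corresponding new column, by reference.

```r
library(data.table)

# Hypothetical myfun: the body is an assumption, not taken from the answer
myfun <- function(y, v) list(r1 = y + v, r2 = y - v)

DT <- data.table(x = c("a", "a", "a"), y = c(1, 3, 6), v = 42)

# Element 1 of the returned list goes to new1, element 2 to new2
DT[, c("new1", "new2") := myfun(y, v)]
```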
This solved the issue:

    remove.packages(c("ggplot2", "data.table"))
    install.packages('Rcpp', dependencies = TRUE)
    install.packages('ggplot2', dependencies = TRUE)
    install.packages('data.table', dependencies = TRUE)
Use by=list(adShown, url) instead of by=c("adShown", "url"). Example:

    set.seed(007)
    DF <- data.frame(X=1:20, Y=sample(c(0,1), 20, TRUE), Z=sample(0:5, 20, TRUE))
    library(data.table)
    DT <- data.table(DF)
    DT[, Mean:=mean(X), by=list(Y, Z)]

        X Y Z     Mean
     1: 1 1 3 1.000000
     2: 2 0 1 9.333333
     3: 3 0 5 7.400000
     4: 4 0 5 7.400000
     5: 5 0 5 7.400000
     6: 6 …
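The random example above depends on the random number generator, so here is a deterministic sketch on hypothetical data: := with by=list(Y, Z) writes each group's mean of X back onto every row of that group, without collapsing the table.

```r
library(data.table)

# Hypothetical, fixed data instead of sample()
DT <- data.table(X = 1:6,
                 Y = c(0, 0, 1, 1, 0, 1),
                 Z = c(1, 1, 2, 2, 1, 3))

# Groups: (Y=0, Z=1) -> X = 1, 2, 5; (1, 2) -> 3, 4; (1, 3) -> 6
DT[, Mean := mean(X), by = list(Y, Z)]
```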
You could do this within data.table:

    library(data.table)
    data[, lag.value:=c(NA, value[-.N]), by=groups]
    data
    #   time groups       value   lag.value
    #1:    1      a  0.02779005          NA
    #2:    2      a  0.88029938  0.02779005
    #3:    3      a -1.69514201  0.88029938
    #4:    1      b -1.27560288          NA
    #5:    2      b -0.65976434 -1.27560288
    #6:    3      b -1.37804943 -0.65976434
    #7:    4      b  0.12041778 -1.37804943

For multiple columns: …
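A runnable sketch on hypothetical data: c(NA, value[-.N]) drops the last value of each group and prepends NA, which is a one-step lag within the group. For comparison it also uses data.table's shift(), whose defaults (n = 1, fill = NA, type = "lag") give the same result; shift() is an alternative added here, not part of the answer's snippet.

```r
library(data.table)

# Hypothetical data with two groups
data <- data.table(time = c(1, 2, 3, 1, 2),
                   groups = c("a", "a", "a", "b", "b"),
                   value = c(5, 7, 9, 2, 4))

# Manual lag: NA followed by all but the last value, per group
data[, lag.value := c(NA, value[-.N]), by = groups]

# Same lag with shift() (defaults: n = 1, fill = NA, type = "lag")
data[, lag.shift := shift(value), by = groups]
```

Note that the manual form assumes every group has at least two rows; for a one-row group, c(NA, value[-1]) is a bare logical NA, which can clash with the numeric column type.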
This is available from v1.9.0+. From NEWS:

    o Following this S.O. post, a function setDT is now implemented that takes a list (named and/or unnamed), data.frame (or data.table) as input and returns the same object as a data.table by reference (without any copy). See ?setDT examples for more. This is in accordance with data.table naming …
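A minimal sketch of setDT on hypothetical objects: it converts a data.frame or a list of equal-length vectors into a data.table in place, so no assignment of the result is needed.

```r
library(data.table)

# Hypothetical data.frame converted in place
df <- data.frame(a = 1:3, b = c("x", "y", "z"))
setDT(df)

# A named list of equal-length vectors works too
lst <- list(p = 1:2, q = c("u", "v"))
setDT(lst)
```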
Andrie’s guess is right, +1. There is a FAQ on it (see vignette("datatable-faq")), as well as a new vignette on importing data.table:

FAQ 6.9: I have created a package that depends on data.table. How do I ensure my package is data.table-aware so that inheritance from data.frame works?

Either i) include data.table in the Depends: field …