Why is rbindlist “better” than rbind?

rbindlist is an optimized version of do.call(rbind, list(…)), which is known for being slow when using rbind.data.frame. Where does it really excel? Some questions that show where rbindlist shines are "Fast vectorized merge of list of data.frames by row" and "Trouble converting long list of data.frames (~1 million) to single data.frame using do.call and ldply". These … Read more
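As a rough illustration of the comparison in this excerpt, here is a minimal sketch that binds the same list of data.frames both ways. The input list `dfs` and its columns are hypothetical, made up for this example; only `rbindlist` and `do.call(rbind, ...)` come from the excerpt.

```r
library(data.table)

# Hypothetical input: a list of small data.frames with identical columns
dfs <- lapply(1:1000, function(i) data.frame(id = i, value = i * 2))

# Base R approach: dispatches to rbind.data.frame
slow <- do.call(rbind, dfs)

# data.table approach: binds the whole list in one call, returns a data.table
fast <- rbindlist(dfs)

# Both results hold the same rows in the same order
stopifnot(nrow(slow) == nrow(fast))
```

Wrapping each call in system.time() on a longer list is one way to see the difference on your own machine.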

Why were pandas merges in python faster than data.table merges in R in 2012?

pandas is faster because I came up with a better algorithm, which is implemented very carefully using a fast hash table implementation (klib) and written in C/Cython to avoid the Python interpreter overhead for the non-vectorizable parts. The algorithm is described in some detail in my presentation: A look inside pandas design … Read more

How to delete a row by reference in data.table?

Good question. data.table can’t delete rows by reference yet. data.table can add and delete columns by reference since it over-allocates the vector of column pointers, as you know. The plan is to do something similar for rows and allow fast insert and delete. A row delete would use memmove in C to budge up the … Read more

Fastest way to replace NAs in a large data.table

Here’s a solution using data.table’s := operator, building on Andrie and Ramnath’s answers.
require(data.table) # v1.6.6
require(gdata) # v2.8.2
set.seed(1)
dt1 = create_dt(2e5, 200, 0.1)
dim(dt1)
[1] 200000 200 # more columns than Ramnath’s answer which had 5 not 200
f_andrie = function(dt) remove_na(dt)
f_gdata = function(dt, un = 0) gdata::NAToUnknown(dt, un)
f_dowle = function(dt) … Read more

What does .SD stand for in data.table in R

.SD stands for something like “Subset of Data.table”. There’s no significance to the initial “.”, except that it makes it even more unlikely that there will be a clash with a user-defined column name. If this is your data.table:
DT = data.table(x=rep(c("a","b","c"),each=2), y=c(1,3), v=1:6)
setkey(DT, y)
DT
# x y v
# 1: a 1 … Read more
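To make the idea of .SD concrete, here is a minimal sketch that rebuilds the same DT and uses .SD in a grouped query. The grouped sum is an illustration chosen for this example and is not necessarily the one in the full answer.

```r
library(data.table)

# Same table as in the excerpt; y = c(1, 3) is recycled to length 6
DT <- data.table(x = rep(c("a", "b", "c"), each = 2), y = c(1, 3), v = 1:6)

# Within each group of x, .SD is a data.table holding that group's rows
# for every column except the grouping column
sums <- DT[, lapply(.SD, sum), by = x]
print(sums)
```

Here each group's .SD contains the columns y and v, so the result has one row per value of x with the per-group sums of both columns.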
