Lua vs. embedded Lisp and other potential candidates for set-based data processing

I strongly agree with @jpjacobs’s points. Lua is an excellent choice for embedding unless you need something very specific to Lisp (for instance, if your data maps particularly well to cons cells). I’ve used Lisp for many years and quite like its syntax, but these days I’d generally pick Lua. While …

How to use pandas filter with IQR

As far as I know, the query method gives the most compact notation.

    # Some test data
    np.random.seed(33454)
    df = pd.concat([
        # Uniformly distributed baseline values
        pd.DataFrame({'nb': np.random.randint(0, 100, 20)}),
        # Adding some outliers
        pd.DataFrame({'nb': np.random.randint(100, 200, 2)}),
    ]).reset_index(drop=True)  # Resetting the index (pd.concat replaces DataFrame.append, removed in pandas 2.0)

    # Computing IQR
    Q1 = df['nb'].quantile(0.25)
    Q3 = …
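The excerpt is cut off before the filtering step, so the following is not the author's code but a minimal sketch of how an IQR filter can be written with `DataFrame.query`. The function name `filter_iqr_nb`, the multiplier `k = 1.5` (the conventional Tukey fence), and the sample values are assumptions made for illustration; only the column name `nb` comes from the excerpt.

```python
import pandas as pd


def filter_iqr_nb(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose 'nb' value lies within k * IQR of the quartiles.

    Hypothetical helper: the name and the default k are illustrative assumptions.
    """
    q1 = df["nb"].quantile(0.25)
    q3 = df["nb"].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # '@name' lets the query string refer to local Python variables.
    return df.query("@lower <= nb <= @upper")


# Deterministic sample data (an assumption, not the excerpt's random data):
# twenty ordinary values plus two obvious outliers.
sample = pd.DataFrame({"nb": list(range(10, 30)) + [150, 180]})
filtered = filter_iqr_nb(sample)
print(len(sample), len(filtered))
```

Computing the bounds first and passing them in with `@` keeps the query string short. A boolean mask such as `df[df["nb"].between(lower, upper)]` would select the same rows if the query syntax is not wanted.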

Large-scale data processing: HBase vs Cassandra [closed]

As a Cassandra developer, I’m better at answering the other side of the question: Cassandra scales better. Cassandra is known to scale to over 400 nodes in a cluster; when Facebook deployed Messaging on top of HBase they had to shard it across 100-node HBase sub-clusters. Cassandra supports hundreds, even thousands of ColumnFamilies. “HBase currently …
