Experience with using h5py to do analytical work on big data in Python?

We use Python in conjunction with h5py, numpy/scipy and boost::python to do data analysis. Our typical datasets have sizes of up to a few hundred GB. HDF5 advantages: data can be inspected conveniently using the h5view application, h5py/ipython and the h5* command-line tools; APIs are available for different platforms and languages; structure data using groups …
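As a rough illustration of the group-based layout mentioned above, here is a minimal sketch that assumes h5py and numpy are installed. The file name, the group name run_001, the dataset name signal and the units attribute are all made up for the example and do not come from the original answer.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file location, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

# Write: one group holding one chunked dataset with an attribute.
with h5py.File(path, "w") as f:
    grp = f.create_group("run_001")
    dset = grp.create_dataset(
        "signal", data=np.arange(1000, dtype="f8"), chunks=(100,)
    )
    dset.attrs["units"] = "mV"

# Read: slicing the dataset pulls only the requested range from disk,
# which is what makes files far larger than RAM workable.
with h5py.File(path, "r") as f:
    dset = f["run_001/signal"]
    window = dset[200:210]
    total_rows = dset.shape[0]

print(window, total_rows)
```

The slice on the read side is the part that matters for large files: the dataset object behaves like a lazy array, and only the indexed region is loaded.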

Nested ifelse statement

If you use any spreadsheet application, you will know the basic function if() with the syntax: if(<condition>, <yes>, <no>). The syntax is exactly the same for ifelse() in R: ifelse(<condition>, <yes>, <no>). The only difference from if() in a spreadsheet application is that R's ifelse() is vectorized (it takes vectors as input and returns a vector as output). Consider the …
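For readers coming from Python, the same vectorized and nested pattern can be sketched with numpy.where, which plays a role similar to R's ifelse(). This is an analogy rather than the R code from the original answer; the scores array and the thresholds are invented for illustration.

```python
import numpy as np

# Hypothetical input vector.
scores = np.array([35, 62, 88, 50])

# Nested np.where mirrors nested ifelse(): the "no" branch of the
# outer call is itself another vectorized condition.
labels = np.where(
    scores >= 80,
    "high",
    np.where(scores >= 50, "medium", "low"),
)

print(labels.tolist())
```

Each condition is evaluated over the whole vector at once, so no explicit loop is needed.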

Large, persistent DataFrame in pandas

Wes is of course right! I’m just chiming in to provide slightly more complete example code. I had the same issue with a 129 MB file, which was solved by:

    import pandas as pd
    tp = pd.read_csv('large_dataset.csv', iterator=True, chunksize=1000)  # gives TextFileReader, which is iterable with chunks of 1000 rows.
    df = pd.concat(tp, ignore_index=True)

…
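The chunked-read approach above can be tried without a large file on disk. The following is a small self-contained sketch that assumes pandas is installed; the in-memory CSV, the column names and the per-chunk filter are made up for illustration, and the filter is an optional variation that is not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large CSV file: 10 rows held in memory.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10))

# chunksize makes read_csv return an iterator of DataFrames
# instead of loading everything at once.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=4)

# Optionally reduce each chunk before concatenating, so the full
# unfiltered table never has to exist in memory.
chunks = [chunk[chunk["value"] > 5] for chunk in reader]
df = pd.concat(chunks, ignore_index=True)

print(df)
```

If no per-chunk reduction is needed, passing the reader straight to pd.concat, as in the answer above, gives the same combined frame.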
