Pythonic way of detecting outliers in one dimensional observation data

The problem with using percentiles is that the points identified as outliers are a function of your sample size. There are a huge number of ways to test for outliers, and you should give some thought to how you classify them. Ideally, you should use a priori information (e.g. “anything above/below this value is unrealistic because…”) …
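To illustrate the point about sample size: a percentile cutoff always flags a fixed fraction of the data, however the data are distributed. One commonly used alternative is the modified z-score based on the median absolute deviation (MAD). The sketch below is only one possible approach, not necessarily the one the answer goes on to recommend. The function name `mad_outliers` is hypothetical, and the threshold of 3.5 and the constant 0.6745 are the conventional choices for this test rather than values taken from the excerpt.

```python
from statistics import median


def mad_outliers(points, thresh=3.5):
    """Flag points whose modified z-score exceeds `thresh`.

    Assumption: `points` is a non-empty sequence of numbers.
    Returns a list of booleans, one per input point.
    """
    med = median(points)
    abs_dev = [abs(p - med) for p in points]
    mad = median(abs_dev)
    if mad == 0:
        # More than half the points equal the median; the score is undefined,
        # so this sketch flags nothing rather than dividing by zero.
        return [False] * len(points)
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    # for normally distributed data.
    return [0.6745 * d / mad > thresh for d in abs_dev]


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 100]
flags = mad_outliers(data)
```

Because the median and the MAD are barely affected by the extreme value, only the last point is flagged here, and adding more well-behaved points would not change that.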

ValueError: numpy.dtype has the wrong size, try recompiling

(To expand a bit on my comment.) NumPy developers generally follow a policy of keeping the binary interface (ABI) backward compatible. However, the ABI is not forward compatible. What that means: a package that uses NumPy in a compiled extension is compiled against a specific version of NumPy. Future versions of NumPy will be …
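As a rough illustration of what "backward but not forward compatible" implies, the sketch below encodes the rule as a version comparison: an extension built against one NumPy version is expected to load with that version or a newer one, but not with an older one. This is not a NumPy API. The helper names and version strings are hypothetical, and the check deliberately ignores pre-release tags and any ABI breaks across major releases.

```python
def parse_version(text):
    """Turn a plain dotted version such as '1.8.0' into a tuple of ints.

    Assumption: the string contains only numeric components.
    """
    return tuple(int(part) for part in text.split("."))


def extension_should_load(built_against, installed):
    """Simplified rule: the installed NumPy must not be older than the
    NumPy the extension was compiled against."""
    return parse_version(installed) >= parse_version(built_against)


# Built against an older NumPy, running on a newer one: expected to work.
ok_case = extension_should_load("1.7.0", "1.8.0")
# Built against a newer NumPy, running on an older one: the situation that
# typically produces "wrong size, try recompiling" style errors.
bad_case = extension_should_load("1.8.0", "1.7.0")
```

In practice the usual fix for the second case is to upgrade NumPy, or to rebuild the extension against the NumPy version that is actually installed.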

Weighted standard deviation in NumPy

How about the following short “manual calculation”?

    import math
    import numpy

    def weighted_avg_and_std(values, weights):
        """
        Return the weighted average and standard deviation.

        values, weights -- NumPy ndarrays with the same shape.
        """
        average = numpy.average(values, weights=weights)
        # Fast and numerically precise:
        variance = numpy.average((values - average)**2, weights=weights)
        return (average, math.sqrt(variance))
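To make the formula explicit, here is a dependency-free sketch of the same calculation, intended only as a way to see what the weighted averages in `weighted_avg_and_std` compute. The name `weighted_avg_and_std_plain` and the sample numbers are hypothetical. Like the NumPy version, it divides by the sum of the weights, so it gives the biased (population) weighted standard deviation.

```python
import math


def weighted_avg_and_std_plain(values, weights):
    """Pure-Python weighted mean and (population) standard deviation.

    Assumption: `values` and `weights` are equal-length sequences and the
    weights sum to a non-zero total.
    """
    total_weight = sum(weights)
    average = sum(v * w for v, w in zip(values, weights)) / total_weight
    # Weighted mean of the squared deviations from the weighted average.
    variance = sum(
        w * (v - average) ** 2 for v, w in zip(values, weights)
    ) / total_weight
    return average, math.sqrt(variance)


avg, std = weighted_avg_and_std_plain([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])
```

With these inputs the weighted average is 2.25 and the weighted variance is 0.6875, so the standard deviation is the square root of 0.6875.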

Run an OLS regression with Pandas Data Frame

I think you can do almost exactly what you thought would be ideal, using the statsmodels package, which was one of pandas' optional dependencies before pandas version 0.20.0 (it was used for a few things in pandas.stats).

    >>> import pandas as pd
    >>> import statsmodels.formula.api as sm
    >>> df = pd.DataFrame({"A": [10,20,30,40,50], "B": [20, 30, …
