In SciPy, how and why does curve_fit calculate the covariance of the parameter estimates?

OK, I think I found the answer. First, the solution: cov_x*s_sq is simply the covariance of the parameters, which is what you want. Taking the square root of the diagonal elements gives you the standard deviations (but be careful about covariances!). Residual variance = reduced chi-square = s_sq = sum[(f(x)-y)^2]/(N-n), where N is the number of data …
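To make the cov_x*s_sq relation concrete, here is a minimal sketch. It assumes an unweighted fit of a straight line; the model, the synthetic data, and names such as pcov_manual are made up for illustration and do not come from the original answer. It scales the cov_x returned by leastsq by the residual variance and compares the result with the pcov that curve_fit reports.

```python
import numpy as np
from scipy.optimize import curve_fit, leastsq

# Hypothetical model and synthetic data, for illustration only.
def model(x, a, b):
    return a * x + b

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = model(x, 2.0, 1.0) + rng.normal(0.0, 0.5, x.size)

def residuals(p, x, y):
    return model(x, *p) - y

# leastsq returns cov_x, which is NOT yet scaled by the residual variance.
p_opt, cov_x, infodict, mesg, ier = leastsq(
    residuals, [1.0, 0.0], args=(x, y), full_output=True
)

N, n = x.size, len(p_opt)                            # data points, fit parameters
s_sq = np.sum(residuals(p_opt, x, y) ** 2) / (N - n)  # residual variance
pcov_manual = cov_x * s_sq                           # parameter covariance matrix
perr = np.sqrt(np.diag(pcov_manual))                 # one-sigma parameter errors

# curve_fit applies the same scaling by default (absolute_sigma=False).
p_opt_cf, pcov_cf = curve_fit(model, x, y, p0=[1.0, 0.0])

print(pcov_manual)
print(pcov_cf)
```

The two printed matrices should agree closely. The off-diagonal entries are the covariances the answer warns about: if they are large, the diagonal-only errors in perr understate how strongly the parameters are tied together.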

How to correctly use scipy’s skew and kurtosis functions?

These functions calculate moments of the probability density distribution (that’s why they take only one parameter) and don’t care about the “functional form” of the values. They are meant for “random datasets” (think of them as measures like mean, standard deviation, variance):
import numpy as np
from scipy.stats import kurtosis, skew
x = np.random.normal(0, 2, …
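As a small self-contained illustration of that point (the sample size and seed here are arbitrary choices, not taken from the original answer), both functions are applied directly to a one-dimensional sample of draws:

```python
import numpy as np
from scipy.stats import kurtosis, skew

# Arbitrary seed and sample size, chosen only for this illustration.
rng = np.random.default_rng(42)
x = rng.normal(0, 2, 10000)  # draws from a normal distribution

# Both functions take the sample itself, not (x, y) pairs of a curve.
sample_skew = skew(x)
excess_kurt = kurtosis(x)                 # Fisher definition (default): normal -> about 0
pearson_kurt = kurtosis(x, fisher=False)  # Pearson definition: normal -> about 3

print("skewness:", sample_skew)
print("excess kurtosis:", excess_kurt)
print("Pearson kurtosis:", pearson_kurt)
```

For a normal sample the skewness and the excess kurtosis should both come out near zero. Note that kurtosis uses the Fisher (excess) definition by default, so pass fisher=False if you expect a value near 3.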

Python out of memory on large CSV file (numpy)

As other folks have mentioned, for a really large file, you’re better off iterating. However, you do commonly want the entire thing in memory for various reasons. genfromtxt is much less efficient than loadtxt (though it handles missing data, whereas loadtxt is more “lean and mean”, which is why the two functions co-exist). If your …
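One way to combine the two ideas (iterate over the file, but still end up with the whole array in memory) is to stream values into np.fromiter, which avoids building large intermediate Python lists. This is only a sketch under stated assumptions: the file is purely numeric with no missing values, and load_numeric_csv is a hypothetical helper name, not a NumPy function.

```python
import io
import numpy as np

def load_numeric_csv(fileobj, delimiter=",", skiprows=0, dtype=float):
    """Hypothetical helper: stream a purely numeric CSV into a 2-D array."""
    ncols = 0

    def values():
        nonlocal ncols
        for i, line in enumerate(fileobj):
            if i < skiprows or not line.strip():
                continue  # skip header rows and blank lines
            fields = line.strip().split(delimiter)
            ncols = len(fields)  # assumes every row has the same width
            for field in fields:
                yield dtype(field)

    # np.fromiter consumes the generator one value at a time.
    flat = np.fromiter(values(), dtype=dtype)
    if ncols == 0:
        return np.empty((0, 0), dtype=dtype)
    return flat.reshape(-1, ncols)

# Small in-memory stand-in for a large file on disk.
demo = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")
data = load_numeric_csv(demo, skiprows=1)
print(data)
```

With a real file you would pass an open file handle instead of the StringIO object. Unlike genfromtxt, this sketch does no handling of missing or malformed fields; a bad value simply raises a ValueError.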

SciPy build/install on Mac OS X

Your problem is that you need to install a Fortran compiler to build scipy. Also, if you already have a numpy that’s built with Fortran support disabled, you may have to replace it. Some of Apple’s pre-installed Python versions have such a numpy build pre-installed. The easiest way to get Fortran is with Homebrew. As …

Show confidence limits and prediction limits in scatter plot

Here’s what I put together. I tried to closely emulate your screenshot.
Given
import numpy as np
import scipy as sp
import scipy.stats as stats
import matplotlib.pyplot as plt
%matplotlib inline
# Raw Data
heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])
weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])
Two detailed options to plot confidence intervals:
def plot_ci_manual(t, s_err, n, x, x2, y2, ax=None): …
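The plotting function above is cut off, so the following does not attempt to reproduce it. It is a separate minimal sketch of how the two bands can be computed for the same data, assuming a straight-line fit, a 95% level, and the textbook formulas for simple linear regression. Variable names such as ci and pi are illustrative choices.

```python
import numpy as np
from scipy import stats

# Raw data from the excerpt above.
heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])
weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])

x, y = heights, weights
n = x.size
slope, intercept = np.polyfit(x, y, 1)   # straight-line fit (an assumption)
resid = y - (slope * x + intercept)

dof = n - 2                              # two fitted parameters
t = stats.t.ppf(0.975, dof)              # two-sided 95% critical value
s_err = np.sqrt(np.sum(resid**2) / dof)  # standard error of the residuals

x2 = np.linspace(x.min(), x.max(), 100)  # grid on which to draw the bands
y2 = slope * x2 + intercept
sxx = np.sum((x - x.mean())**2)

# Half-width of the confidence band (uncertainty of the fitted mean).
ci = t * s_err * np.sqrt(1/n + (x2 - x.mean())**2 / sxx)
# Half-width of the prediction band (uncertainty of a new observation).
pi = t * s_err * np.sqrt(1 + 1/n + (x2 - x.mean())**2 / sxx)
```

To draw them, something like ax.fill_between(x2, y2 - ci, y2 + ci) for the confidence band and dashed lines at y2 - pi and y2 + pi for the prediction limits should work. The prediction band is always the wider of the two because it includes the scatter of individual points.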

Calculating Slopes in Numpy (or Scipy)

The fastest and most efficient way would be to use SciPy’s native linregress function, which calculates everything:
slope : slope of the regression line
intercept : intercept of the regression line
r-value : correlation coefficient
p-value : two-sided p-value for a hypothesis test whose null hypothesis is that the slope is zero …
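A minimal usage sketch follows. The five data points are made up for illustration, and the attribute names are those of the result object returned by scipy.stats.linregress.

```python
import numpy as np
from scipy.stats import linregress

# Made-up points lying roughly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

result = linregress(x, y)

print("slope:", result.slope)
print("intercept:", result.intercept)
print("r-value:", result.rvalue)
print("p-value:", result.pvalue)
print("stderr of slope:", result.stderr)
```

If you only need the slope, result.slope is enough. The other fields come from the same call at no extra cost.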