statistics – Page 15

How to calculate cumulative normal distribution?

December 12, 2022 by Tarik

Here’s an example:

    >>> from scipy.stats import norm
    >>> norm.cdf(1.96)
    0.9750021048517795
    >>> norm.cdf(-1.96)
    0.024997895148220435

In other words, approximately 95% of the area under the standard normal curve lies within 1.96 standard deviations of the mean of zero (0.9750 - 0.0250 ≈ 0.95). If you need the inverse CDF, use norm.ppf:

    >>> norm.ppf(norm.cdf(1.96))
    array(1.9599999999999991)

The exact display depends on your SciPy/NumPy version; either way the result is 1.96 up to floating-point rounding.
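If SciPy isn't available, the same values can be computed from the standard library. This is a minimal sketch, assuming Python 3.8+ for statistics.NormalDist; normal_cdf is a hypothetical helper name using the identity Φ(x) = ½(1 + erf(x/√2)), not a SciPy API.

```python
import math
from statistics import NormalDist

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative normal distribution Phi((x - mu) / sigma) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Mass of the standard normal between -1.96 and +1.96 (about 0.95)
central = normal_cdf(1.96) - normal_cdf(-1.96)

# The standard library's NormalDist offers both the CDF and its inverse
std_normal = NormalDist(mu=0.0, sigma=1.0)
cdf_value = std_normal.cdf(1.96)
recovered = std_normal.inv_cdf(cdf_value)  # back to (approximately) 1.96
```

NormalDist.inv_cdf plays the role of norm.ppf here.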

How do I calculate r-squared using Python and Numpy?

December 8, 2022 by Tarik

A very late reply, but just in case someone needs a ready-made function for this: scipy.stats.linregress, i.e.

    slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(x, y)

as in @Adam Marples’s answer. Note that r_value is the correlation coefficient; square it (r_value**2) to get r-squared.
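To compute r-squared directly with NumPy, one option is to fit a line with np.polyfit and compare the residual sum of squares with the total sum of squares. A minimal sketch, assuming a simple linear fit with an intercept (r_squared is a hypothetical helper name); for that case the result equals the squared Pearson correlation, i.e. linregress's r_value**2.

```python
import numpy as np

def r_squared(x, y):
    """Coefficient of determination for a degree-1 least-squares fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)   # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)    # total variation around the mean
    return 1.0 - ss_res / ss_tot
```

For higher-degree polynomial fits, the same 1 - SS_res/SS_tot formula applies, but it no longer equals the squared correlation.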

Statistics: combinations in Python

December 1, 2022 by Tarik

See scipy.special.comb (scipy.misc.comb in older versions of SciPy). When exact=False, it uses the gammaln function to obtain good precision without taking much time. When exact=True, it returns an arbitrary-precision integer, which may take a long time to compute for large inputs.
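The same exact-versus-approximate trade-off can be sketched with the standard library. This is not SciPy's implementation, only the same log-gamma idea: comb_exact and comb_approx are hypothetical helper names, and math.comb assumes Python 3.8+.

```python
import math
from itertools import combinations

def comb_exact(n, k):
    """Exact binomial coefficient as an arbitrary-precision int (Python 3.8+)."""
    return math.comb(n, k)

def comb_approx(n, k):
    """Floating-point C(n, k) via log-gamma, in the spirit of exact=False."""
    if k < 0 or k > n:
        return 0.0
    log_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_c)

# Enumerating the combinations themselves rather than counting them
pairs = list(combinations("abcd", 2))
```

The log-gamma route stays in floating point, so it is fast but inexact for large n; the exact route grows the integer without bound.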

Multiple linear regression in Python

November 21, 2022 by Tarik

sklearn.linear_model.LinearRegression will do it:

    from sklearn import linear_model

    clf = linear_model.LinearRegression()
    # texts: records with attributes x1 ... x7 (features) and y (target)
    clf.fit([[getattr(t, 'x%d' % i) for i in range(1, 8)] for t in texts],
            [t.y for t in texts])

Then clf.coef_ will hold the regression coefficients (and clf.intercept_ the intercept). sklearn.linear_model also offers similar interfaces for various kinds of regularized regression.
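If you'd rather not depend on scikit-learn, ordinary least squares with several predictors can be solved with numpy.linalg.lstsq. This is a sketch of a swapped-in technique, not what LinearRegression does internally; fit_multiple_regression is a hypothetical name, and X is assumed to be an n-by-p array of features.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first fitted parameter is the intercept
    design = np.column_stack([np.ones(X.shape[0]), X])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params[0], params[1:]
```

The returned coefficients correspond to clf.coef_ and the intercept to clf.intercept_.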

How to make execution pause, sleep, wait for X seconds in R?

November 12, 2022 by Tarik

See help(Sys.sleep). For example, from ?Sys.sleep:

    testit <- function(x)
    {
        p1 <- proc.time()
        Sys.sleep(x)
        proc.time() - p1 # The cpu usage should be negligible
    }
    testit(3.7)

Yielding

    > testit(3.7)
       user  system elapsed
      0.000   0.000   3.704

Compute a confidence interval from sample data

October 23, 2022 by Tarik

You can calculate it like this:

    import numpy as np
    import scipy.stats

    def mean_confidence_interval(data, confidence=0.95):
        a = 1.0 * np.array(data)
        n = len(a)
        m, se = np.mean(a), scipy.stats.sem(a)
        # Half-width: standard error times the two-sided t critical value with n-1 df
        h = se * scipy.stats.t.ppf((1 + confidence) / 2., n - 1)
        return m, m - h, m + h
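The function above uses Student's t critical value, which is the right choice for small samples. If SciPy is unavailable, one rough large-sample alternative swaps in the normal (z) critical value from the standard library's statistics.NormalDist. This gives a different, narrower interval when n is small, so treat it as an approximation; the function name is hypothetical and Python 3.8+ is assumed.

```python
import math
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval_normal(data, confidence=0.95):
    """Large-sample (z) interval for the mean; uses the normal, not Student's t."""
    n = len(data)
    m = fmean(data)
    # Sample standard deviation (n - 1 denominator), matching scipy.stats.sem's default ddof=1
    se = stdev(data) / math.sqrt(n)
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    h = z * se
    return m, m - h, m + h
```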

Fitting empirical distribution to theoretical ones with Scipy (Python)?

October 21, 2022 by Tarik

Distribution Fitting with Sum of Square Error (SSE)

This is an update and modification to Saullo’s answer that uses the full list of the current scipy.stats distributions and returns the distribution with the least SSE between the distribution’s histogram and the data’s histogram.

Example Fitting

Using the El Niño dataset from statsmodels, the distributions are …
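The SSE criterion itself is easy to sketch. Instead of looping over every scipy.stats distribution as the answer describes, this simplified version hand-codes two candidates (normal and shifted exponential, fitted by maximum likelihood) in NumPy, evaluates each fitted density at the histogram bin centers, and keeps the one with the smallest squared error. All function names and the bin count of 50 are illustrative assumptions.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def exponential_pdf(x, loc, scale):
    z = (x - loc) / scale
    return np.where(z >= 0, np.exp(-z) / scale, 0.0)

def histogram_sse(data, pdf, bins=50):
    """Sum of squared differences between the density histogram and a fitted pdf."""
    density, edges = np.histogram(data, bins=bins, density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    return float(np.sum((density - pdf(centers)) ** 2))

def best_fit(data, bins=50):
    data = np.asarray(data, dtype=float)
    # Maximum-likelihood parameters for each candidate family
    candidates = {
        "normal": lambda x: normal_pdf(x, data.mean(), data.std()),
        "exponential": lambda x: exponential_pdf(x, data.min(), data.mean() - data.min()),
    }
    scores = {name: histogram_sse(data, pdf, bins) for name, pdf in candidates.items()}
    return min(scores, key=scores.get), scores
```

Adding more families just means adding entries to the candidates dictionary; with SciPy, each distribution's fit() and pdf() would replace the hand-written formulas.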

Workflow for statistical analysis and report writing

October 20, 2022 by Tarik

I generally break my projects into 4 pieces:

    load.R
    clean.R
    func.R
    do.R

load.R: Takes care of loading in all the data required. Typically this is a short file, reading in data from files, URLs and/or ODBC. Depending on the project, at this point I’ll either write out the workspace using save() or just keep things …

Calculating Pearson correlation and significance in Python

October 12, 2022 by Tarik

You can have a look at scipy.stats:

    from scipy.stats import pearsonr
    help(pearsonr)

(Older answers import it from scipy.stats.stats, which is now deprecated; help is a built-in, so importing it from pydoc is unnecessary.) In older SciPy versions the help text reads:

    Help on function pearsonr in module scipy.stats.stats:

    pearsonr(x, y)
        Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.

    The Pearson correlation coefficient measures the linear relationship between two datasets. Strictly speaking, Pearson’s correlation requires that each …
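To see what pearsonr computes, here is a plain-Python sketch of the coefficient r = Sxy / sqrt(Sxx * Syy) and the t statistic r * sqrt((n - 2) / (1 - r²)) on which the usual significance test is based. The function names are hypothetical; turning t into a p-value needs the t distribution with n - 2 degrees of freedom (for example scipy.stats.t.sf), which is not reproduced here. On Python 3.10+, statistics.correlation also returns r.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient from centered sums of squares and products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pearson_t_statistic(r, n):
    """t statistic for testing zero correlation; undefined when |r| == 1."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```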

Find p-value (significance) in scikit-learn LinearRegression

October 11, 2022 by Tarik

This is kind of overkill, but let’s give it a go. First, let’s use statsmodels to find out what the p-values should be:

    import pandas as pd
    import numpy as np
    from sklearn import datasets, linear_model
    from sklearn.linear_model import LinearRegression
    import statsmodels.api as sm
    from scipy import stats

    diabetes = datasets.load_diabetes()
    X = diabetes.data
    y …
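Under the standard OLS assumptions, the coefficient p-values that statsmodels reports are derived from t statistics: each coefficient divided by its standard error, where the covariance matrix is σ²(XᵀX)⁻¹ and σ² is estimated from the residuals. A minimal NumPy sketch of that computation, assuming a model with an intercept (ols_t_statistics is a hypothetical name):

```python
import numpy as np

def ols_t_statistics(X, y):
    """Return (coefficients, standard errors, t statistics, residual dof) for OLS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept
    design = np.column_stack([np.ones(len(X)), X])
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    dof = n - k
    sigma2 = residuals @ residuals / dof          # unbiased residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se, dof
```

With the scipy.stats import from the snippet above, two-sided p-values would then be 2 * stats.t.sf(np.abs(t), dof).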