pandas columns correlation with statistical significance

To calculate all the p-values at once, you can use the calculate_pvalues function (code below):
    df = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 5, 3], 'C': [5, 2, 1], 'D': ['text', 2, 3]})
    calculate_pvalues(df)
The output is similar to that of corr(), but with p-values:
            A       B       C
    A       0  0.7877  0.1789
    B  0.7877       0  0.6088
    C  0.1789  0.6088       0
Details: Column D is automatically ignored as it …
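The body of calculate_pvalues is not shown in this excerpt, so the following is only a sketch of how such a helper might look, assuming Pearson correlation computed with scipy.stats.pearsonr. The name pairwise_pvalues and the way non-numeric columns are dropped are illustrative assumptions, not the original function.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def pairwise_pvalues(df):
    """Hypothetical helper: p-values of pairwise Pearson correlations."""
    # Keep numeric columns only; an object column such as 'D' is dropped here.
    numeric = df.select_dtypes(include="number").dropna()
    cols = numeric.columns
    # Start from zeros so the diagonal stays 0, as in the output shown above.
    pvals = pd.DataFrame(np.zeros((len(cols), len(cols))), index=cols, columns=cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            p = pearsonr(numeric[a], numeric[b])[1]  # second element is the p-value
            pvals.loc[a, b] = pvals.loc[b, a] = p
    return pvals


df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 5, 3], "C": [5, 2, 1], "D": ["text", 2, 3]})
pvals = pairwise_pvalues(df)
print(pvals.round(4))
```

On the three-row example from the excerpt, this sketch gives the same off-diagonal values as the table quoted there.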

making square axes plot with log2 scales in matplotlib

Just specify base=2 when setting each scale (Matplotlib releases before 3.3 spelled this basex=2 and basey=2):
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.set_xscale('log', base=2)
    ax.set_yscale('log', base=2)
    ax.plot(range(1024))
    plt.show()
For the zero-crossing behavior, what you’re referring to is a “Symmetric Log” plot (a.k.a. “symlog”). For what it’s worth, data isn’t filtered out; the plot is simply linear near 0 and logarithmic everywhere else. …
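The symlog behavior described here can be sketched as follows. This is a minimal illustration that assumes Matplotlib 3.3 or later (where the keywords are base and linthresh); the linthresh value of 1, the sample data, and the use of set_box_aspect to get a square axes box are choices made for the example, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Data that crosses zero, which a plain log scale cannot show.
x = np.linspace(-1024, 1024, 2049)

fig, ax = plt.subplots()
# Linear within [-linthresh, linthresh], base-2 logarithmic outside it.
ax.set_xscale("symlog", base=2, linthresh=1)
ax.set_yscale("symlog", base=2, linthresh=1)
ax.plot(x, x)
ax.set_box_aspect(1)  # make the axes box itself square
# Call fig.savefig(...) or switch to an interactive backend and plt.show() to view it.
```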

How to plot a 3D density map in python with matplotlib

Thanks to mwaskon for suggesting the mayavi library. I recreated the density scatter plot in mayavi as follows:
    import numpy as np
    from scipy import stats
    from mayavi import mlab
    mu, sigma = 0, 0.1
    x = 10*np.random.normal(mu, sigma, 5000)
    y = 10*np.random.normal(mu, sigma, 5000)
    z = 10*np.random.normal(mu, sigma, 5000)
    xyz = np.vstack([x,y,z])
    kde = …

Bézier curve fitting with SciPy

Here’s a way to do Bézier curves with numpy:
    import numpy as np
    from scipy.special import comb
    def bernstein_poly(i, n, t):
        """
        The Bernstein polynomial of n, i as a function of t
        """
        return comb(n, i) * (t**(n-i)) * (1 - t)**i
    def bezier_curve(points, nTimes=1000):
        """
        Given a set of control points, …
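Since bezier_curve is cut off in this excerpt, here is a separate, self-contained sketch of evaluating a Bézier curve from control points. It is not the original function: the names bernstein and bezier_points are hypothetical, and it uses the textbook basis t**i * (1 - t)**(n - i), which traverses the control points in the opposite direction from the exponent order in bernstein_poly above.

```python
import numpy as np
from scipy.special import comb


def bernstein(i, n, t):
    """Textbook Bernstein basis polynomial B_{i,n}(t)."""
    return comb(n, i) * t**i * (1 - t)**(n - i)


def bezier_points(control_points, num=100):
    """Evaluate a Bezier curve at `num` evenly spaced parameters in [0, 1]."""
    pts = np.asarray(control_points, dtype=float)  # shape (n + 1, dims)
    n = len(pts) - 1
    t = np.linspace(0.0, 1.0, num)
    # One row of basis values per control point: shape (n + 1, num).
    basis = np.array([bernstein(i, n, t) for i in range(n + 1)])
    # Weighted sum of control points for every t: shape (num, dims).
    return basis.T @ pts


control = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 0.0)]
curve = bezier_points(control, num=50)
```

With this convention the curve starts at the first control point and ends at the last one, which is a quick sanity check for any implementation.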

Parse a Pandas column to Datetime when importing table from SQL database and filtering rows by date

Pandas understands datetime objects, but some of the import functions read a date column in as strings. So make sure the column has a datetime dtype rather than a string dtype, and then run your query:
    df['date'] = pd.to_datetime(df['date'])
    df_masked = df[(df['date'] …
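The filter expression is truncated in this excerpt, so the following is only a sketch of the overall pattern, using an in-memory SQLite database. The table name events, its columns, and the start and end bounds are all made up for illustration. It also uses the parse_dates argument of pd.read_sql_query, which converts the column while the table is being loaded.

```python
import sqlite3

import pandas as pd

# Hypothetical table with dates stored as text, as many databases return them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("2020-01-15", 1), ("2020-02-10", 2), ("2020-03-05", 3), ("2020-04-20", 4)],
)

# parse_dates converts the column to datetime64 during the import.
df = pd.read_sql_query("SELECT date, value FROM events", conn, parse_dates=["date"])
conn.close()

# Explicit conversion, as in the excerpt; harmless if the column is already datetime.
df["date"] = pd.to_datetime(df["date"])

# Boolean mask keeping rows between two illustrative bounds (inclusive).
start, end = pd.Timestamp("2020-02-01"), pd.Timestamp("2020-03-31")
df_masked = df[(df["date"] >= start) & (df["date"] <= end)]
```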

How is a Pandas crosstab different from a Pandas pivot_table?

The main difference between the two is that pivot_table expects your input data to already be a DataFrame; you pass a DataFrame to pivot_table and specify the index/columns/values by passing the column names as strings. With crosstab, you don’t necessarily need a DataFrame going in, as you just pass array-like objects for index/columns/values. …
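A small sketch of that difference, with made-up data: crosstab is fed two bare NumPy arrays, while pivot_table needs a DataFrame and column names. The helper column n (a constant 1 to sum over) is an assumption added so that both calls produce the same frequency table.

```python
import numpy as np
import pandas as pd

# Illustrative data held in plain arrays, not in a DataFrame.
sex = np.array(["F", "M", "F", "M", "F"])
smoker = np.array(["yes", "no", "no", "no", "yes"])

# crosstab works directly on array-likes and counts co-occurrences by default.
ct = pd.crosstab(sex, smoker, rownames=["sex"], colnames=["smoker"])

# pivot_table needs a DataFrame; columns are referred to by name.
df = pd.DataFrame({"sex": sex, "smoker": smoker, "n": 1})
pt = df.pivot_table(index="sex", columns="smoker", values="n",
                    aggfunc="sum", fill_value=0)

print(ct)
print(pt)
```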

Fitting a gamma distribution with (python) Scipy

Generate some gamma data:
    import scipy.stats as stats
    alpha = 5
    loc = 100.5
    beta = 22
    data = stats.gamma.rvs(alpha, loc=loc, scale=beta, size=10000)
    print(data)
    # [ 202.36035683 297.23906376 249.53831795 …, 271.85204096 180.75026301
    # 364.60240242]
Here we fit the data to the gamma distribution:
    fit_alpha, fit_loc, fit_beta = stats.gamma.fit(data)
    print(fit_alpha, fit_loc, fit_beta)
    # (5.0833692504230008, 100.08697963283467, 21.739518937816108)
    print(alpha, loc, …

ANOVA in python using pandas dataframe with statsmodels or scipy?

I set up a direct comparison to test them, found that their assumptions can differ slightly, got a hint from a statistician, and here is an example of ANOVA on a pandas dataframe matching R’s results:
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols
    # R code on R sample …
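The statsmodels example is cut off in this excerpt. As a stand-in for the scipy route mentioned in the title, here is a minimal one-way ANOVA sketch on a pandas DataFrame using scipy.stats.f_oneway. The column names group and y and the data are invented for illustration, and this is not the formula-based statsmodels analysis the answer goes on to show.

```python
import pandas as pd
from scipy import stats

# Made-up long-format data: one row per observation, three groups of three.
df = pd.DataFrame({
    "group": ["a"] * 3 + ["b"] * 3 + ["c"] * 3,
    "y": [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
})

# f_oneway expects one array of observations per group.
samples = [g["y"].to_numpy() for _, g in df.groupby("group")]
f_stat, p_value = stats.f_oneway(*samples)
print(f_stat, p_value)
```

For this data the between-group sum of squares is 26 on 2 degrees of freedom and the within-group sum of squares is 6 on 6 degrees of freedom, so the F statistic works out to 13.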

Fit sigmoid function (“S” shape curve) to data using Python

After great help from @Brenlla, the code was modified to:
    import numpy as np
    from scipy.optimize import curve_fit
    def sigmoid(x, L, x0, k, b):
        y = L / (1 + np.exp(-k*(x-x0))) + b
        return y
    p0 = [max(ydata), np.median(xdata), 1, min(ydata)]  # this is a mandatory initial guess
    popt, pcov = curve_fit(sigmoid, xdata, ydata, p0, method='dogbox')
The parameters optimized are L, x0, k, b, which are …
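The snippet relies on xdata and ydata from the question, which are not shown here. To try the fit end to end, one option is to generate synthetic data from known parameters, as in the sketch below; the true_params values and the noise-free data are assumptions made purely for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit


def sigmoid(x, L, x0, k, b):
    """Four-parameter logistic curve, same form as in the snippet above."""
    return L / (1 + np.exp(-k * (x - x0))) + b


# Hypothetical ground truth (L, x0, k, b) used to fabricate test data.
true_params = (2.0, 5.0, 1.5, 0.5)
xdata = np.linspace(0, 10, 50)
ydata = sigmoid(xdata, *true_params)

# Same initial-guess recipe as the snippet: data range, median x, unit slope.
p0 = [max(ydata), np.median(xdata), 1, min(ydata)]
popt, pcov = curve_fit(sigmoid, xdata, ydata, p0, method="dogbox")
print(popt)
```

With real, noisy measurements the recovered parameters will only approximate the underlying ones, and a poor initial guess can still prevent convergence.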