scipy – Page 7 – Tarik Billa

pandas columns correlation with statistical significance

September 14, 2023 by Tarik

To calculate all the p-values at once, you can use the calculate_pvalues function (code below):

    df = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 5, 3], 'C': [5, 2, 1], 'D': ['text', 2, 3]})
    calculate_pvalues(df)

The output is similar to that of corr(), but with p-values:

            A       B       C
    A       0  0.7877  0.1789
    B  0.7877       0  0.6088
    C  0.1789  0.6088       0

Details: Column D is automatically ignored as it …
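The excerpt cuts off before the function body, so the following is only a minimal sketch of what such a helper could look like, not the post's own code. It assumes Pearson correlation via scipy.stats.pearsonr and drops non-numeric columns with select_dtypes; both choices are assumptions made for illustration.

```python
import pandas as pd
from scipy.stats import pearsonr


def calculate_pvalues(df):
    """Return a square DataFrame of pairwise Pearson p-values (sketch)."""
    # Assumption: keep numeric columns only, so a mixed column like 'D' is skipped.
    numeric = df.select_dtypes(include="number")
    cols = numeric.columns
    pvalues = pd.DataFrame(index=cols, columns=cols, dtype=float)
    for r in cols:
        for c in cols:
            # Use only the rows where both columns have a value.
            mask = numeric[r].notna() & numeric[c].notna()
            _, p = pearsonr(numeric.loc[mask, r], numeric.loc[mask, c])
            pvalues.loc[r, c] = p
    return pvalues.round(4)


df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 5, 3], "C": [5, 2, 1], "D": ["text", 2, 3]})
pvalues = calculate_pvalues(df)
print(pvalues)
```

With the three-row frame from the excerpt, the off-diagonal entries should agree with the table shown there.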

Determining the byte size of a scipy.sparse matrix?

September 10, 2023 by Tarik

A sparse matrix is constructed from regular numpy arrays, so you can get the byte count for any of these just as you would for a regular array. If you just want the number of bytes of the array elements:

    >>> import numpy as np
    >>> from scipy.sparse import csr_matrix
    >>> a = csr_matrix(np.arange(12).reshape((4,3)))
    >>> a.data.nbytes
    88

If you want the …
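As a rough sketch of the idea that a sparse matrix is just a few numpy arrays: a CSR matrix stores its values in data and its structure in indices and indptr, so one plausible way to estimate the full footprint is to sum the nbytes of all three. The int64 dtype is fixed here as an assumption so the element count is the same on every platform.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Assumption: force int64 so each stored element takes 8 bytes everywhere.
a = csr_matrix(np.arange(12, dtype=np.int64).reshape((4, 3)))

# Bytes used by the stored (nonzero) elements only: 11 values * 8 bytes.
element_bytes = a.data.nbytes

# Bytes used by the values plus the CSR index arrays.
total_bytes = a.data.nbytes + a.indices.nbytes + a.indptr.nbytes

print(element_bytes, total_bytes)
```

The size of the index arrays depends on the index dtype SciPy picks, so total_bytes can vary between installations even when element_bytes does not.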

making square axes plot with log2 scales in matplotlib

September 7, 2023 by Tarik

Just specify base=2 when setting each scale (Matplotlib versions before 3.3 used basex=2 and basey=2 instead):

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.set_xscale('log', base=2)
    ax.set_yscale('log', base=2)
    ax.plot(range(1024))
    plt.show()

For the zero-crossing behavior, what you’re referring to is a “Symmetric Log” plot (a.k.a. “symlog”). For whatever it’s worth, data isn’t filtered out; it’s just a linear plot near 0 and a log plot everywhere else. …
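For the "square axes" part of the question, here is a small sketch that combines log2 scales with a square plotting box. It assumes Matplotlib 3.3 or newer (for the base keyword and set_box_aspect) and uses the non-interactive Agg backend so it runs without a display; the data are made up.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run, no window needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Base-2 logarithmic scales on both axes.
ax.set_xscale("log", base=2)
ax.set_yscale("log", base=2)

# Make the axes box itself square, independent of the data limits.
ax.set_box_aspect(1)

# Start at 1 because 0 has no position on a log axis.
x = range(1, 1025)
ax.plot(x, x)

fig.canvas.draw()
```

set_box_aspect(1) squares the drawing area only; it does not force both axes to span the same data range.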

How to plot a 3D density map in python with matplotlib

September 1, 2023 by Tarik

Thanks to mwaskon for suggesting the mayavi library. I recreated the density scatter plot in mayavi as follows:

    import numpy as np
    from scipy import stats
    from mayavi import mlab

    mu, sigma = 0, 0.1
    x = 10*np.random.normal(mu, sigma, 5000)
    y = 10*np.random.normal(mu, sigma, 5000)
    z = 10*np.random.normal(mu, sigma, 5000)

    xyz = np.vstack([x,y,z])
    kde = …

Bézier curve fitting with SciPy

August 30, 2023 by Tarik

Here’s a way to do Bezier curves with numpy:

    import numpy as np
    from scipy.special import comb

    def bernstein_poly(i, n, t):
        """
        The Bernstein polynomial of n, i as a function of t
        """
        return comb(n, i) * t**i * (1 - t)**(n - i)

    def bezier_curve(points, nTimes=1000):
        """
        Given a set of control points, …
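Since the excerpt stops inside bezier_curve, here is a self-contained sketch of evaluating a Bézier curve from control points with the standard Bernstein basis, B_{i,n}(t) = C(n, i) t^i (1 - t)^(n - i). The parameter name n_times and the matrix-product formulation are choices made for this illustration, not the post's code, and the sketch evaluates a curve rather than fitting one to data.

```python
import numpy as np
from scipy.special import comb


def bernstein_poly(i, n, t):
    """Standard Bernstein basis polynomial B_{i,n}(t)."""
    return comb(n, i) * t**i * (1 - t) ** (n - i)


def bezier_curve(points, n_times=1000):
    """Evaluate the Bezier curve of the given control points at n_times values of t."""
    points = np.asarray(points, dtype=float)
    n = len(points) - 1  # degree of the curve
    t = np.linspace(0.0, 1.0, n_times)
    # One row per control point, one column per parameter value.
    basis = np.array([bernstein_poly(i, n, t) for i in range(n + 1)])
    # Weighted sum of control points: result has shape (n_times, dimension).
    return basis.T @ points


control = [(0.0, 0.0), (1.0, 2.0), (3.0, 0.0)]
curve = bezier_curve(control, n_times=5)
print(curve)
```

With this basis ordering the curve starts at the first control point (t = 0) and ends at the last (t = 1).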

Parse a Pandas column to Datetime when importing table from SQL database and filtering rows by date

August 28, 2023 by Tarik

Pandas supports datetime values, but some of the import functions read date columns in as strings. So make sure the column has a datetime dtype, not a string dtype, before you run your query:

    df['date'] = pd.to_datetime(df['date'])
    df_masked = df[(df['date'] …
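A minimal end-to-end sketch of this, assuming an in-memory SQLite database with a hypothetical events table (the table, column names, and dates are invented for illustration). It shows both converting after import with pd.to_datetime and asking read_sql to parse the column through its parse_dates argument.

```python
import sqlite3

import pandas as pd

# Hypothetical table: SQLite stores the dates as plain text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("2023-08-01", 10), ("2023-08-15", 20), ("2023-09-02", 30)],
)
conn.commit()

# Option 1: import as strings, then convert the column.
df = pd.read_sql("SELECT * FROM events", conn)
df["date"] = pd.to_datetime(df["date"])

# Option 2: let read_sql parse the column while importing.
parsed = pd.read_sql("SELECT * FROM events", conn, parse_dates=["date"])
conn.close()

# Filter rows by a date range using a boolean mask.
start, end = pd.Timestamp("2023-08-10"), pd.Timestamp("2023-08-31")
df_masked = df[(df["date"] >= start) & (df["date"] <= end)]
print(df_masked)
```

Either option leaves the column with a datetime dtype, so comparisons against Timestamp objects work as date comparisons rather than string comparisons.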

How is a Pandas crosstab different from a Pandas pivot_table?

August 28, 2023 by Tarik

The main difference between the two is that pivot_table expects your input data to already be a DataFrame; you pass a DataFrame to pivot_table and specify the index/columns/values by passing the column names as strings. With crosstab, you don’t necessarily need to have a DataFrame going in, as you just pass array-like objects for index/columns/values. …
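The contrast can be sketched on a small made-up table: pivot_table takes the DataFrame plus column names, while crosstab takes the arrays themselves. The column names city, kind, and n are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "city": ["A", "A", "B", "B", "B"],
        "kind": ["x", "y", "x", "x", "y"],
        "n": [1, 2, 3, 4, 5],
    }
)

# pivot_table: pass the DataFrame and refer to columns by name.
pivot = df.pivot_table(index="city", columns="kind", values="n", aggfunc="sum")

# crosstab: pass array-like objects directly; no DataFrame is required.
cross = pd.crosstab(
    index=df["city"].to_numpy(),
    columns=df["kind"].to_numpy(),
    values=df["n"].to_numpy(),
    aggfunc="sum",
)

# Without values/aggfunc, crosstab counts occurrences of each pair.
counts = pd.crosstab(df["city"], df["kind"])

print(pivot, cross, counts, sep="\n\n")
```

Here pivot and cross hold the same sums; only the way the inputs are supplied differs.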

Fitting a gamma distribution with (python) Scipy

August 28, 2023 by Tarik

Generate some gamma data:

    import scipy.stats as stats

    alpha = 5
    loc = 100.5
    beta = 22
    data = stats.gamma.rvs(alpha, loc=loc, scale=beta, size=10000)
    print(data)
    # [ 202.36035683 297.23906376 249.53831795 …, 271.85204096 180.75026301
    # 364.60240242]

Here we fit the data to the gamma distribution:

    fit_alpha, fit_loc, fit_beta = stats.gamma.fit(data)
    print(fit_alpha, fit_loc, fit_beta)
    # (5.0833692504230008, 100.08697963283467, 21.739518937816108)

    print(alpha, loc, …
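A possible variation, not taken from the post: when the location is known in advance, it can be held fixed with the floc argument of fit, so only the shape and scale are estimated. The sketch below reuses the same true parameters and seeds the sampler (an assumption added here) so repeated runs give the same data.

```python
import scipy.stats as stats

alpha, loc, beta = 5, 100.5, 22

# random_state makes the sample reproducible (not part of the original excerpt).
data = stats.gamma.rvs(alpha, loc=loc, scale=beta, size=10000, random_state=0)

# Hold the location fixed at its known value; fit shape and scale only.
fit_alpha, fit_loc, fit_beta = stats.gamma.fit(data, floc=loc)

print(fit_alpha, fit_loc, fit_beta)
print(alpha, loc, beta)
```

Fixing the location usually makes the shape and scale estimates more stable, at the cost of assuming the offset is really known.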

ANOVA in python using pandas dataframe with statsmodels or scipy?

August 25, 2023 by Tarik

I set up a direct comparison to test them, found that their assumptions can differ slightly, got a hint from a statistician, and here is an example of ANOVA on a pandas dataframe matching R’s results:

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    # R code on R sample …
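The post's statsmodels example is cut off, so as a stand-in here is a sketch of the SciPy route mentioned in the title: a one-way ANOVA with scipy.stats.f_oneway on groups taken from a pandas DataFrame. The data and the column names group and score are invented, and this is not the statsmodels comparison against R that the post describes.

```python
import pandas as pd
from scipy import stats

# Made-up scores for three groups of four observations each.
df = pd.DataFrame(
    {
        "group": ["a"] * 4 + ["b"] * 4 + ["c"] * 4,
        "score": [4.1, 3.9, 4.3, 4.0, 5.2, 5.0, 5.4, 5.1, 6.1, 5.9, 6.3, 6.0],
    }
)

# One array of observations per group, in group order.
samples = [g["score"].to_numpy() for _, g in df.groupby("group")]

# One-way ANOVA: tests whether all group means are equal.
f_stat, p_value = stats.f_oneway(*samples)
print(f_stat, p_value)
```

f_oneway covers only the one-way case; designs with several factors or interactions need a model-based approach such as the statsmodels formula interface the post refers to.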

Fit sigmoid function (“S” shape curve) to data using Python

August 25, 2023 by Tarik

After great help from @Brenlla, the code was modified to:

    def sigmoid(x, L, x0, k, b):
        y = L / (1 + np.exp(-k*(x-x0))) + b
        return y

    p0 = [max(ydata), np.median(xdata), 1, min(ydata)]  # this is a mandatory initial guess
    popt, pcov = curve_fit(sigmoid, xdata, ydata, p0, method='dogbox')

The parameters optimized are L, x0, k, b, which are …
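To make the snippet runnable on its own, the sketch below adds the imports and fits the same four-parameter sigmoid to synthetic data. The true parameter values, noise level, and random seed are assumptions chosen only so the fit can be checked.

```python
import numpy as np
from scipy.optimize import curve_fit


def sigmoid(x, L, x0, k, b):
    """Logistic curve: height L, midpoint x0, steepness k, vertical offset b."""
    return L / (1 + np.exp(-k * (x - x0))) + b


# Synthetic data with known parameters plus a little noise (illustrative only).
rng = np.random.default_rng(0)
true_params = (2.0, 5.0, 1.5, 0.5)
xdata = np.linspace(0, 10, 50)
ydata = sigmoid(xdata, *true_params) + rng.normal(0, 0.02, xdata.size)

# Initial guess built from the data, as in the excerpt.
p0 = [max(ydata), np.median(xdata), 1, min(ydata)]
popt, pcov = curve_fit(sigmoid, xdata, ydata, p0, method="dogbox")

print(popt)
```

The diagonal of pcov gives the estimated variances of the fitted parameters, which is a quick way to judge how well each one is constrained by the data.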