Leaving values blank if not passed in str.format

You can follow the recommendation in PEP 3101 and subclass string.Formatter:

    import string

    class BlankFormatter(string.Formatter):
        def __init__(self, default=""):
            self.default = default

        def get_value(self, key, args, kwds):
            # Named fields fall back to the default when the key is missing
            if isinstance(key, str):
                return kwds.get(key, self.default)
            else:
                # Positional fields keep the standard behaviour
                return string.Formatter.get_value(self, key, args, kwds)

    kwargs = {"name": "mark", "adj": "mad"}
    fmt = BlankFormatter()
    print(fmt.format("My name is {name} and I'm really {adj}.", **kwargs))
    # …
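For comparison, here is a minimal sketch of a different technique with the same effect: passing a dict subclass with `__missing__` to `str.format_map`. The class name `BlankDict` is a hypothetical helper invented for this illustration and is not part of the answer above.

```python
class BlankDict(dict):
    """Hypothetical helper: a dict that yields "" for any missing key."""

    def __missing__(self, key):
        return ""


template = "My name is {name} and I'm really {adj}."

# "adj" is deliberately not supplied, so its field is left blank.
result = template.format_map(BlankDict(name="mark"))
print(result)
```

This only covers named fields. Positional fields such as `{0}` would still need the Formatter subclass approach.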

Randomly insert NA values in a pandas DataFrame

Here’s a way to clear exactly 10% of cells (or rather, as close to 10% as can be achieved with the existing data frame’s size):

    import random
    import numpy as np

    # Every (row, col) position in the frame
    ix = [(row, col) for row in range(df.shape[0]) for col in range(df.shape[1])]
    # Sample 10% of the positions without replacement and blank them
    for row, col in random.sample(ix, int(round(.1*len(ix)))):
        df.iat[row, col] = np.nan

Here’s a way to clear cells …
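The same idea can be sketched in vectorised form: choose distinct flat cell positions with NumPy and apply them as a boolean mask. The example frame, its column names, and the seed are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
# Hypothetical example data: 20 rows x 5 columns, no missing values
df = pd.DataFrame(rng.random((20, 5)), columns=list("abcde"))

n_cells = df.size
n_missing = int(round(0.1 * n_cells))  # as close to 10% as the size allows

# Pick distinct flat positions, then turn them into a boolean mask
flat_idx = rng.choice(n_cells, size=n_missing, replace=False)
mask = np.zeros(n_cells, dtype=bool)
mask[flat_idx] = True

# DataFrame.mask replaces cells where the condition is True with NaN
df_missing = df.mask(mask.reshape(df.shape))
print(df_missing.isna().sum().sum())
```

Unlike the loop above, this returns a new frame and leaves `df` unchanged.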

python scikit-learn clustering with missing data

I think you can use an iterative EM-type algorithm:

Initialize missing values to their column means
Repeat until convergence:
    Perform K-means clustering on the filled-in data
    Set the missing values to the centroid coordinates of the clusters to which they were assigned

Implementation

    import numpy as np
    from sklearn.cluster import KMeans

    def kmeans_missing(X, n_clusters, max_iter=10):
    …
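The steps listed above could be sketched roughly as follows. This is not the original (truncated) implementation. The function name `kmeans_missing_sketch`, the `random_state` parameter, the convergence check on unchanged labels, and the toy data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans


def kmeans_missing_sketch(X, n_clusters, max_iter=10, random_state=0):
    """Illustrative sketch: K-means with iterative centroid imputation."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Step 1: initialise missing values to their column means
    col_means = np.nanmean(X, axis=0)
    X_hat = np.where(missing, col_means, X)

    prev_labels = None
    for _ in range(max_iter):
        # Step 2a: cluster the filled-in data
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
        labels = km.fit_predict(X_hat)
        centroids = km.cluster_centers_

        # Step 2b: overwrite missing cells with their cluster's centroid value
        X_hat[missing] = centroids[labels][missing]

        # Assumed stopping rule: labels unchanged since the last pass.
        # If label numbering permutes between fits, the loop simply runs
        # until max_iter.
        if prev_labels is not None and np.array_equal(labels, prev_labels):
            break
        prev_labels = labels

    return labels, centroids, X_hat


# Toy data (assumption): two well-separated blobs with two missing cells
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
X[0, 1] = np.nan
X[25, 0] = np.nan

labels, centroids, X_hat = kmeans_missing_sketch(X, n_clusters=2)
print(X_hat[[0, 25]])
```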

Fill in missing pandas data with previous non-missing value, grouped by key

You could perform a groupby/forward-fill operation on each group:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'id': [1,1,2,2,1,2,1,1], 'x':[10,20,100,200,np.nan,np.nan,300,np.nan]})
    df['x'] = df.groupby(['id'])['x'].ffill()
    print(df)

yields

       id      x
    0   1   10.0
    1   1   20.0
    2   2  100.0
    3   2  200.0
    4   1   20.0
    5   2  200.0
    6   1  300.0
    7   1  300.0

Pandas Dataframe: Replacing NaN with row average

As commented, the axis argument to fillna is NotImplemented:

    df.fillna(df.mean(axis=1), axis=1)

Note: this would be critical here, as you don’t want to fill in your nth column with the nth row average. For now you’ll need to iterate through:

    m = df.mean(axis=1)
    for i, col in enumerate(df):
        # using i allows for duplicate columns
        # …
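One way the column-by-column iteration might look is sketched below. The loop body is an assumption, since the original code is cut off, and the example frame is hypothetical. It relies on `Series.fillna` aligning the row means with each column by index label.

```python
import numpy as np
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [2.0, 4.0, np.nan],
    "c": [3.0, 6.0, 9.0],
})

m = df.mean(axis=1)  # row means; NaN cells are skipped by default

filled = df.copy()
for i in range(filled.shape[1]):
    # Positional access (iloc) keeps working if column labels are duplicated.
    # fillna(m) matches on the row index, so row k gets row k's mean.
    filled.iloc[:, i] = filled.iloc[:, i].fillna(m)

print(filled)
```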

Missing values in scikits machine learning

Missing values are simply not supported in scikit-learn. There has been discussion on the mailing list about this before, but no attempt to actually write code to handle them. Whatever you do, don’t use NaN to encode missing values, since many of the algorithms refuse to handle samples containing NaNs. The above answer is outdated; …
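As one hedged illustration of how missing values can be handled in more recent scikit-learn releases, `sklearn.impute.SimpleImputer` replaces NaNs with a per-column statistic before the data reaches an estimator. This may not be what the truncated update goes on to recommend. The small array is made up for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up data with NaN marking the missing entries
X = np.array([
    [1.0, 2.0],
    [np.nan, 4.0],
    [5.0, np.nan],
])

# Replace each NaN with the mean of the observed values in its column
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imputer.fit_transform(X)
print(X_filled)
```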

Pandas: print column name with missing values

df.isnull().any() generates a boolean Series (True if the column has a missing value, False otherwise). You can use it to index into df.columns:

    df.columns[df.isnull().any()]

will return an Index of the columns which have missing values.

    df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, np.nan], 'C': [4, 5, 6], 'D': [np.nan, np.nan, np.nan]})
    df
    Out: …
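A self-contained sketch of the lookup, using the same example frame; the variable names `has_missing` and `cols_with_missing` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [1, 2, np.nan],
    "C": [4, 5, 6],
    "D": [np.nan, np.nan, np.nan],
})

# One boolean per column: True where at least one value is missing
has_missing = df.isnull().any()

# Boolean indexing on the column Index, converted to a plain list
cols_with_missing = df.columns[has_missing].tolist()
print(cols_with_missing)  # ['B', 'D']
```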
