Pandas – check if ALL values are NaN in Series
Yes, that’s correct, but I think a more idiomatic way would be: mys.isnull().all()
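As a runnable sketch of that one-liner (the series name `mys` follows the answer above; the sample values are made up for illustration):

```python
import numpy as np
import pandas as pd

# a series that is entirely NaN
mys = pd.Series([np.nan, np.nan, np.nan])
print(mys.isnull().all())   # True: every value is NaN

# a series with at least one real value
mys2 = pd.Series([1.0, np.nan])
print(mys2.isnull().all())  # False
```

`isnull` and `isna` are aliases in pandas, so `mys.isna().all()` is equivalent.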
If the relevant entries in Charge_Per_Line are empty (NaN) when you read into pandas, you can use df.dropna: df = df.dropna(axis=0, subset=['Charge_Per_Line']) If the values are genuinely '-', then you can replace them with np.nan and then use df.dropna: import numpy as np df['Charge_Per_Line'] = df['Charge_Per_Line'].replace('-', np.nan) df = df.dropna(axis=0, subset=['Charge_Per_Line'])
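Putting the second case together end to end (the sample frame below is hypothetical; only the `Charge_Per_Line` column name comes from the answer):

```python
import numpy as np
import pandas as pd

# hypothetical data: '-' marks a missing charge
df = pd.DataFrame({'Charge_Per_Line': ['1.50', '-', '2.75', '-'],
                   'Line': ['a', 'b', 'c', 'd']})

# turn the '-' placeholders into real NaN, then drop those rows
df['Charge_Per_Line'] = df['Charge_Per_Line'].replace('-', np.nan)
df = df.dropna(axis=0, subset=['Charge_Per_Line'])
print(df['Charge_Per_Line'].tolist())  # ['1.50', '2.75']
```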
If you’re sticking to using pure-pandas, for both performance and practicality I think you should use regex for this task. However, you will need to properly escape any special characters in the substrings first to ensure that they are matched literally (and not used as regex meta characters). This is easy to do using re.escape: …
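A minimal sketch of the escaping step (the substrings and series here are invented for illustration; `re.escape` and `Series.str.contains` are the pieces the answer relies on):

```python
import re
import pandas as pd

s = pd.Series(['foo (bar)', 'foo bar', 'baz'])
substrings = ['foo (bar)', 'baz']  # '(' and ')' are regex metacharacters

# escape each substring so it matches literally, then join into one pattern
pattern = '|'.join(map(re.escape, substrings))
mask = s.str.contains(pattern)
print(mask.tolist())  # [True, False, True]
```

Without `re.escape`, the parentheses in `'foo (bar)'` would be treated as a capture group rather than literal characters.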
Here’s a simple method using only pandas functions: import pandas as pd s = pd.Series([ ['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']]) Then s.apply(pd.Series).stack().reset_index(drop=True) gives the desired output. In some cases you might want to save the original index and add a second level to index the nested elements, e.g. 0 0 slim 1 waist 2 …
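The same example as a runnable block (this is the answer's own data; `stack` silently drops the NaN padding that `apply(pd.Series)` introduces for the shorter lists):

```python
import pandas as pd

s = pd.Series([['slim', 'waist', 'man'],
               ['slim', 'waistline'],
               ['santa']])

# each list becomes a row of a DataFrame; stack flattens it back to a Series
flat = s.apply(pd.Series).stack().reset_index(drop=True)
print(flat.tolist())  # ['slim', 'waist', 'man', 'slim', 'waistline', 'santa']
```

On pandas 0.25 and later, `s.explode().reset_index(drop=True)` gives the same flattened result more directly.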
You can just use map: In [7]: df = pd.DataFrame({'Status':['Delivered', 'Delivered', 'Undelivered', 'SomethingElse']}) In [8]: df Out[8]: Status 0 Delivered 1 Delivered 2 Undelivered 3 SomethingElse In [9]: d = {'Delivered': True, 'Undelivered': False} In [10]: df['Status'].map(d) Out[10]: 0 True 1 True 2 False 3 NaN Name: Status, dtype: object
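Note that values missing from the dict map to NaN, as `SomethingElse` does above. A small sketch of the answer's example, plus `fillna` to supply a default for unmapped values:

```python
import pandas as pd

df = pd.DataFrame({'Status': ['Delivered', 'Delivered',
                              'Undelivered', 'SomethingElse']})
d = {'Delivered': True, 'Undelivered': False}

mapped = df['Status'].map(d)          # 'SomethingElse' -> NaN
with_default = mapped.fillna(False)   # treat unmapped statuses as False
print(with_default.tolist())          # [True, True, False, False]
```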
DataFrame/Series.to_string These methods have a variety of arguments that allow you to configure what, and how, information is displayed when you print. By default Series.to_string has name=False and dtype=False, so we additionally specify index=False: s = pd.Series(['race', 'gender'], index=[311, 317]) print(s.to_string(index=False)) # race # gender If the Index is important the default is index=True: print(s.to_string()) # 311 …
You can create a dict and pass this as the data param to the dataframe constructor: In [235]: df = pd.DataFrame({'Gene':s.index, 'count':s.values}) df Out[235]: Gene count 0 Ezh2 2 1 Hmgb 7 2 Irf1 1 Alternatively you can create a df from the series; you need to call reset_index as the index will be used …
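Both routes side by side (the gene counts mirror the answer's output; the series itself is reconstructed here for illustration):

```python
import pandas as pd

s = pd.Series([2, 7, 1], index=['Ezh2', 'Hmgb', 'Irf1'], name='count')

# route 1: build the frame from the index and values explicitly
df1 = pd.DataFrame({'Gene': s.index, 'count': s.values})

# route 2: reset_index moves the index into a regular column
df2 = s.reset_index()
df2.columns = ['Gene', 'count']

print(df1.equals(df2))  # True
```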
Index has a special meaning in Pandas. It’s used to optimise specific operations and can be used in various methods such as merging / joining data. Therefore, make a choice: If it’s “just another column”, use reset_index and treat it as another column. If it’s genuinely used for indexing, keep it as an index and …
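A minimal sketch of that choice (the `key`/`val` frame below is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b'], 'val': [1, 2]})

# treat 'key' as a real index: label-based lookups and joins use it directly
indexed = df.set_index('key')
print(indexed.loc['b', 'val'])  # 2

# treat it as "just another column" again
print(indexed.reset_index().columns.tolist())  # ['key', 'val']
```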
It looks like you may have some nulls in the column. You can drop them with df = df.dropna(subset=['item']). Then df['item'].value_counts().max() should give you the max counts, and df['item'].value_counts().idxmax() should give you the most frequent value.
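A runnable sketch with invented sample data (note that `value_counts` already ignores NaN by default, so the `dropna` mainly matters if you go on to use the cleaned frame elsewhere):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'item': ['apple', 'apple', 'pear', np.nan]})

df = df.dropna(subset=['item'])
counts = df['item'].value_counts()
print(counts.max())     # 2
print(counts.idxmax())  # 'apple'
```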
Two easy ways to accomplish just the filtering are: Using filter: names = filter(lambda name: name[-5:] != "Smith", names) Using list comprehensions: names = [name for name in names if name[-5:] != "Smith"] Note that both cases keep the values for which the predicate function evaluates to True, so you have to reverse the logic …
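Both variants together (the sample names are invented; also note that in Python 3, `filter` returns a lazy iterator, so wrap it in `list()` if you need a list):

```python
names = ['John Smith', 'Jane Doe', 'Alice Smith']

# filter keeps items where the predicate is True, i.e. names NOT ending in 'Smith'
filtered = list(filter(lambda name: name[-5:] != 'Smith', names))

# equivalent list comprehension
comprehension = [name for name in names if name[-5:] != 'Smith']

print(filtered)       # ['Jane Doe']
print(comprehension)  # ['Jane Doe']
```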