Pandas – check if ALL values are NaN in Series
Yes, that’s correct, but I think a more idiomatic way would be: mys.isnull().all()
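As a runnable sketch of that one-liner (the series name `mys` follows the answer above; the sample values are made up for illustration):

```python
import numpy as np
import pandas as pd

# a series that is entirely NaN
mys = pd.Series([np.nan, np.nan, np.nan])
print(mys.isnull().all())   # True: every value is NaN

# a series with at least one real value
mys2 = pd.Series([1.0, np.nan])
print(mys2.isnull().all())  # False
```

`isnull` and `isna` are aliases in pandas, so `mys.isna().all()` is equivalent.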
If the relevant entries in Charge_Per_Line are empty (NaN) when you read into pandas, you can use df.dropna: df = df.dropna(axis=0, subset=['Charge_Per_Line']) If the values are genuinely '-', then you can replace them with np.nan and then use df.dropna: import numpy as np df['Charge_Per_Line'] = df['Charge_Per_Line'].replace('-', np.nan) df = df.dropna(axis=0, subset=['Charge_Per_Line'])
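Putting the second case together end to end (the sample frame below is hypothetical; only the `Charge_Per_Line` column name comes from the answer):

```python
import numpy as np
import pandas as pd

# hypothetical data: '-' marks a missing charge
df = pd.DataFrame({'Charge_Per_Line': ['1.50', '-', '2.75', '-'],
                   'Line': ['a', 'b', 'c', 'd']})

# turn the '-' placeholders into real NaN, then drop those rows
df['Charge_Per_Line'] = df['Charge_Per_Line'].replace('-', np.nan)
df = df.dropna(axis=0, subset=['Charge_Per_Line'])
print(df['Charge_Per_Line'].tolist())  # ['1.50', '2.75']
```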
If you’re sticking to using pure-pandas, for both performance and practicality I think you should use regex for this task. However, you will need to properly escape any special characters in the substrings first to ensure that they are matched literally (and not used as regex meta characters). This is easy to do using re.escape: …
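A minimal sketch of the escaping step (the substrings and series here are invented for illustration; `re.escape` and `Series.str.contains` are the pieces the answer relies on):

```python
import re
import pandas as pd

s = pd.Series(['foo (bar)', 'foo bar', 'baz'])
substrings = ['foo (bar)', 'baz']  # '(' and ')' are regex metacharacters

# escape each substring so it matches literally, then join into one pattern
pattern = '|'.join(map(re.escape, substrings))
mask = s.str.contains(pattern)
print(mask.tolist())  # [True, False, True]
```

Without `re.escape`, the parentheses in `'foo (bar)'` would be treated as a capture group rather than literal characters.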
Here’s a simple method using only pandas functions: import pandas as pd s = pd.Series([ ['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']]) Then s.apply(pd.Series).stack().reset_index(drop=True) gives the desired output. In some cases you might want to save the original index and add a second level to index the nested elements, e.g. 0 0 slim 1 waist 2 …
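The same example as a runnable block (this is the answer's own data; `stack` silently drops the NaN padding that `apply(pd.Series)` introduces for the shorter lists):

```python
import pandas as pd

s = pd.Series([['slim', 'waist', 'man'],
               ['slim', 'waistline'],
               ['santa']])

# each list becomes a row of a DataFrame; stack flattens it back to a Series
flat = s.apply(pd.Series).stack().reset_index(drop=True)
print(flat.tolist())  # ['slim', 'waist', 'man', 'slim', 'waistline', 'santa']
```

On pandas 0.25 and later, `s.explode().reset_index(drop=True)` gives the same flattened result more directly.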
You can just use map: In [7]: df = pd.DataFrame({'Status':['Delivered', 'Delivered', 'Undelivered', 'SomethingElse']}) In [8]: df Out[8]: Status 0 Delivered 1 Delivered 2 Undelivered 3 SomethingElse In [9]: d = {'Delivered': True, 'Undelivered': False} In [10]: df['Status'].map(d) Out[10]: 0 True 1 True 2 False 3 NaN Name: Status, dtype: object
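Note that values missing from the dict map to NaN, as `SomethingElse` does above. A small sketch of the answer's example, plus `fillna` to supply a default for unmapped values:

```python
import pandas as pd

df = pd.DataFrame({'Status': ['Delivered', 'Delivered',
                              'Undelivered', 'SomethingElse']})
d = {'Delivered': True, 'Undelivered': False}

mapped = df['Status'].map(d)          # 'SomethingElse' -> NaN
with_default = mapped.fillna(False)   # treat unmapped statuses as False
print(with_default.tolist())          # [True, True, False, False]
```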
DataFrame/Series.to_string These methods have a variety of arguments that allow you to configure what, and how, information is displayed when you print. By default Series.to_string has name=False and dtype=False, so we additionally specify index=False: s = pd.Series(['race', 'gender'], index=[311, 317]) print(s.to_string(index=False)) # race # gender If the Index is important the default is index=True: print(s.to_string()) # 311 …
You can create a dict and pass this as the data param to the dataframe constructor: In [235]: df = pd.DataFrame({'Gene':s.index, 'count':s.values}) df Out[235]: Gene count 0 Ezh2 2 1 Hmgb 7 2 Irf1 1 Alternatively you can create a df from the series; you need to call reset_index as the index will be used …
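Both routes side by side (the gene counts mirror the answer's output; the series itself is reconstructed here for illustration):

```python
import pandas as pd

s = pd.Series([2, 7, 1], index=['Ezh2', 'Hmgb', 'Irf1'], name='count')

# route 1: build the frame from the index and values explicitly
df1 = pd.DataFrame({'Gene': s.index, 'count': s.values})

# route 2: reset_index moves the index into a regular column
df2 = s.reset_index()
df2.columns = ['Gene', 'count']

print(df1.equals(df2))  # True
```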
Index has a special meaning in Pandas. It’s used to optimise specific operations and can be used in various methods such as merging / joining data. Therefore, make a choice: If it’s “just another column”, use reset_index and treat it as another column. If it’s genuinely used for indexing, keep it as an index and …
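A minimal sketch of that choice (the `key`/`val` frame below is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b'], 'val': [1, 2]})

# treat 'key' as a real index: label-based lookups and joins use it directly
indexed = df.set_index('key')
print(indexed.loc['b', 'val'])  # 2

# treat it as "just another column" again
print(indexed.reset_index().columns.tolist())  # ['key', 'val']
```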
It looks like you may have some nulls in the column. You can drop them with df = df.dropna(subset=['item']). Then df['item'].value_counts().max() should give you the max counts, and df['item'].value_counts().idxmax() should give you the most frequent value.
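A runnable sketch with invented sample data (note that `value_counts` already ignores NaN by default, so the `dropna` mainly matters if you go on to use the cleaned frame elsewhere):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'item': ['apple', 'apple', 'pear', np.nan]})

df = df.dropna(subset=['item'])
counts = df['item'].value_counts()
print(counts.max())     # 2
print(counts.idxmax())  # 'apple'
```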
Two easy ways to accomplish just the filtering are: Using filter: names = filter(lambda name: name[-5:] != "Smith", names) Using list comprehensions: names = [name for name in names if name[-5:] != "Smith"] Note that both cases keep the values for which the predicate function evaluates to True, so you have to reverse the logic …
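Both variants together (the sample names are invented; also note that in Python 3, `filter` returns a lazy iterator, so wrap it in `list()` if you need a list):

```python
names = ['John Smith', 'Jane Doe', 'Alice Smith']

# filter keeps items where the predicate is True, i.e. names NOT ending in 'Smith'
filtered = list(filter(lambda name: name[-5:] != 'Smith', names))

# equivalent list comprehension
comprehension = [name for name in names if name[-5:] != 'Smith']

print(filtered)       # ['Jane Doe']
print(comprehension)  # ['Jane Doe']
```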