pandas – Page 3 – Tarik Billa

Counting duplicate values in Pandas DataFrame

April 9, 2024 by Tarik

You can use groupby with the size function, then reset the index and rename column 0 to count.

print(df)

     Month  LSOA code  Longitude   Latitude     Crime type
0  2015-01  E01000916  -0.106453  51.518207  Bicycle theft
1  2015-01  E01000914  -0.111497  51.518226       Burglary
2  2015-01  E01000914  -0.111497  51.518226       Burglary
3  2015-01  E01000914  -0.111497  51.518226    Other theft
4  2015-01  E01000914  …
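
As a rough sketch of the groupby-plus-size approach described above, the snippet below rebuilds only the four complete rows visible in the excerpt (the truncated fifth row is left out) and counts identical rows. Grouping by every column and passing name="count" to reset_index are choices made for this illustration; the latter is a shortcut for renaming column 0 afterwards.

```python
import pandas as pd

# Sample data: only the four complete rows visible in the excerpt.
df = pd.DataFrame({
    "Month": ["2015-01"] * 4,
    "LSOA code": ["E01000916", "E01000914", "E01000914", "E01000914"],
    "Longitude": [-0.106453, -0.111497, -0.111497, -0.111497],
    "Latitude": [51.518207, 51.518226, 51.518226, 51.518226],
    "Crime type": ["Bicycle theft", "Burglary", "Burglary", "Other theft"],
})

# Group by every column, count the rows in each group, and turn the
# resulting Series back into a DataFrame with a "count" column.
counts = (
    df.groupby(list(df.columns))
      .size()
      .reset_index(name="count")
)
print(counts)
```

With this sample, the duplicated Burglary row should be the only group with a count above 1.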

pandas plot value counts barplot in descending manner [duplicate]

April 9, 2024 by Tarik

You can do it by changing your plotting line like this:

    df.letters.value_counts().sort_values().plot(kind='barh')
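
A minimal sketch of why this works, using a made-up letters column (the data and the Agg backend are assumptions for illustration): a horizontal bar chart draws the first row at the bottom, so sorting the counts in ascending order puts the largest bar at the top.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Hypothetical sample data, not taken from the original question.
df = pd.DataFrame({"letters": ["a", "b", "b", "c", "c", "c"]})

# value_counts() returns counts in descending order; sort_values()
# flips them to ascending so barh draws the largest bar on top.
counts = df.letters.value_counts().sort_values()
ax = counts.plot(kind="barh")
print(counts)
```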

dataframe, set index from list

April 9, 2024 by Tarik

Convert the column to a list before assigning it to the index:

    df.index = list(df["First"])
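
For illustration, here is a small sketch on a made-up frame; only the column name First comes from the answer, the values and the Second column are hypothetical.

```python
import pandas as pd

# Hypothetical data; only the column name "First" comes from the answer.
df = pd.DataFrame({"First": ["x", "y", "z"], "Second": [1, 2, 3]})

# Assigning a plain list replaces the row labels and keeps the column.
df.index = list(df["First"])
print(df)
```

The First column stays in the frame, and the new index has no name because a plain list carries none.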

Pandas: Remove NaN only at beginning and end of dataframe

April 9, 2024 by Tarik

Use the built-in first_valid_index and last_valid_index methods, which are designed specifically for this, and slice your df:

In [5]:
first_idx = df.first_valid_index()
last_idx = df.last_valid_index()
print(first_idx, last_idx)
df.loc[first_idx:last_idx]

1950 1954
Out[5]:
      sum
1950    5
1951    3
1952  NaN
1953    4
1954    8
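
A self-contained sketch of the same idea follows. The rows for 1950 to 1954 mirror the output above; the NaN rows for 1949 and 1955 are assumed here so that there is something to trim.

```python
import numpy as np
import pandas as pd

# Rows 1950-1954 mirror the output above; the leading and trailing
# NaN rows (1949 and 1955) are assumptions for this illustration.
df = pd.DataFrame(
    {"sum": [np.nan, 5, 3, np.nan, 4, 8, np.nan]},
    index=range(1949, 1956),
)

first_idx = df.first_valid_index()
last_idx = df.last_valid_index()

# .loc slicing is inclusive on both ends, so interior NaN rows survive.
trimmed = df.loc[first_idx:last_idx]
print(trimmed)
```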

DataFrame modified inside a function

April 9, 2024 by Tarik

def test(df):
    df = df.copy(deep=True)
    df['tt'] = np.nan
    return df

If you pass a DataFrame into a function, modify it there, and return it, the caller's original DataFrame is modified too, because both names refer to the same object. If you want to keep your old dataframe and create a new dataframe with your modifications then by …
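
To make the difference concrete, here is a small sketch comparing a function that copies first with one that does not. The function names and the sample frame are hypothetical.

```python
import numpy as np
import pandas as pd

def add_column_on_copy(df):
    # Work on a deep copy so the caller's frame is left alone.
    df = df.copy(deep=True)
    df["tt"] = np.nan
    return df

def add_column_in_place(df):
    # No copy: this adds the column to the caller's frame itself.
    df["tt"] = np.nan
    return df

original = pd.DataFrame({"a": [1, 2]})

copied = add_column_on_copy(original)
untouched_after_copy = "tt" not in original.columns

mutated = add_column_in_place(original)
changed_after_in_place = "tt" in original.columns
print(untouched_after_copy, changed_after_in_place, mutated is original)
```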

How to check if a pandas dataframe contains only numeric values column-wise?

April 9, 2024 by Tarik

You can check that using to_numeric and coercing errors:

    pd.to_numeric(df['column'], errors="coerce").notnull().all()

For all columns, you can iterate through the columns or just use apply:

    df.apply(lambda s: pd.to_numeric(s, errors="coerce").notnull().all())

E.g.

    df = pd.DataFrame({'col': [1, 2, 10, np.nan, 'a'],
                       'col2': ['a', 10, 30, 40, 50],
                       'col3': [1, 2, 3, 4, 5.0]})

Outputs

col     False
col2    False
col3     True
dtype: bool
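
The example above can be run end to end as the sketch below. The variable name is_numeric is introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col": [1, 2, 10, np.nan, "a"],
    "col2": ["a", 10, 30, 40, 50],
    "col3": [1, 2, 3, 4, 5.0],
})

# errors="coerce" turns anything unparseable into NaN, so a column is
# reported as numeric only if no NaN remains after conversion.
is_numeric = df.apply(lambda s: pd.to_numeric(s, errors="coerce").notnull().all())
print(is_numeric)
```

Note that a value that is already NaN also fails this check, so a numeric column with missing entries is reported as False.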

Plotting Pandas Multiindex Bar Chart

April 9, 2024 by Tarik

import pandas as pd

data = pd.DataFrame([
        ('Q1', 'Blue', 100),
        ('Q1', 'Green', 300),
        ('Q2', 'Blue', 200),
        ('Q2', 'Green', 350),
        ('Q3', 'Blue', 300),
        ('Q3', 'Green', 400),
        ('Q4', 'Blue', 400),
        ('Q4', 'Green', 450),
    ],
    columns=['quarter', 'company', 'value']
)
data = data.set_index(['quarter', 'company']).value

data.unstack().plot(kind='bar', stacked=True)

If you don't want to stack your bar chart:

data.unstack().plot(kind='bar')
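
To see what the plot call receives, the sketch below stops before plotting and prints the unstacked frame. The variable name wide is introduced here for illustration.

```python
import pandas as pd

data = pd.DataFrame(
    [
        ("Q1", "Blue", 100), ("Q1", "Green", 300),
        ("Q2", "Blue", 200), ("Q2", "Green", 350),
        ("Q3", "Blue", 300), ("Q3", "Green", 400),
        ("Q4", "Blue", 400), ("Q4", "Green", 450),
    ],
    columns=["quarter", "company", "value"],
)

# A Series indexed by (quarter, company).
series = data.set_index(["quarter", "company"])["value"]

# unstack() moves the inner index level (company) into the columns,
# leaving one row per quarter and one column per company.
wide = series.unstack()
print(wide)
```

Each row of the unstacked frame becomes one position on the x-axis and each column becomes one bar series.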

Grouped Bar graph Pandas

April 8, 2024 by Tarik

Using pandas:

import pandas as pd

groups = [[23, 135, 3], [123, 500, 1]]
group_labels = ['views', 'orders']

# Convert data to pandas DataFrame.
df = pd.DataFrame(groups, index=group_labels).T

# Plot.
pd.concat(
    [
        df.mean().rename('average'),
        df.min().rename('min'),
        df.max().rename('max')
    ],
    axis=1,
).plot.bar()
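
As a quick check of what gets plotted, this sketch builds the same summary frame and prints it instead of plotting. The variable name summary is introduced here for illustration.

```python
import pandas as pd

groups = [[23, 135, 3], [123, 500, 1]]
group_labels = ["views", "orders"]

# Transpose so each label becomes a column of raw values.
df = pd.DataFrame(groups, index=group_labels).T

# One row per label, one column per statistic: plot.bar() would draw
# one group of bars per row and one bar per column.
summary = pd.concat(
    [
        df.mean().rename("average"),
        df.min().rename("min"),
        df.max().rename("max"),
    ],
    axis=1,
)
print(summary)
```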

How to select rows in Pandas dataframe where value appears more than once

April 8, 2024 by Tarik

You can use value_counts + isin:

    v = df.Parameter.value_counts()
    df[df.Parameter.isin(v.index[v.gt(5)])]

For example, where K = 2 (get all items which have more than 2 readings):

df

   ID Parameter  Value
0   0         A    4.3
1   1         B    3.1
2   2         C    8.9
3   3         A    2.1
4   4         A    3.9
5   5         B    4.5
…
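
Here is a runnable sketch of the K = 2 case. It uses only the six rows visible in the excerpt (the rest of the table is truncated), so the result may differ from the original answer's output.

```python
import pandas as pd

# Only the six rows visible in the excerpt; the full table is truncated.
df = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4, 5],
    "Parameter": ["A", "B", "C", "A", "A", "B"],
    "Value": [4.3, 3.1, 8.9, 2.1, 3.9, 4.5],
})

K = 2  # keep parameters that appear more than K times

v = df.Parameter.value_counts()

# v.index[v.gt(K)] selects the labels whose count exceeds K;
# isin() then keeps every row carrying one of those labels.
result = df[df.Parameter.isin(v.index[v.gt(K)])]
print(result)
```

With these six rows only A appears more than twice, so the rows with ID 0, 3, and 4 are kept.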