dataframe – Page 115

How to sort a pandas DataFrame by one column

September 13, 2022 by Tarik

Use sort_values to sort the df by a specific column’s values:

In [18]: df.sort_values('2')
Out[18]:
        0          1    2
4    85.6    January  1.0
3    95.5   February  2.0
7   104.8      March  3.0
0   354.7      April  4.0
8   283.5        May  5.0
6   238.7       June  6.0
5   152.0       July  7.0
1    55.4     August  8.0
11  212.7  September  9.0
10 … Read more
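A minimal sketch of the same idea on a small made-up frame. The column names `value`, `month` and `month_num` are assumptions for illustration, not the columns from the original question.

```python
import pandas as pd

# Hypothetical data; the original question's frame is not reproduced here.
df = pd.DataFrame({
    "value": [354.7, 55.4, 85.6],
    "month": ["April", "August", "January"],
    "month_num": [4.0, 8.0, 1.0],
})

# sort_values returns a new DataFrame and keeps the original row labels.
by_month = df.sort_values("month_num")

# Descending sort, then renumber the rows 0..n-1.
by_value_desc = df.sort_values("value", ascending=False).reset_index(drop=True)

print(by_month)
print(by_value_desc)
```

Because sort_values does not modify df in place by default, the result has to be assigned to a name to be kept.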

How to check if a column exists in Pandas

September 13, 2022 by Tarik

This will work:

if 'A' in df:

But for clarity, I’d probably write it as:

if 'A' in df.columns:
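A short sketch showing both spellings side by side, plus one way to check several columns at once. The frame and the `required` set are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with two columns.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Membership on a DataFrame tests the column labels, not the cell values.
has_a = "A" in df
has_a_explicit = "A" in df.columns
has_c = "C" in df.columns

# Checking a group of columns: every required name must be a column label.
required = {"A", "B"}
has_all = required.issubset(df.columns)

print(has_a, has_a_explicit, has_c, has_all)
```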

Selecting a row of pandas series/dataframe by integer index

September 13, 2022 by Tarik

Echoing @HYRY, see the new docs in 0.11: http://pandas.pydata.org/pandas-docs/stable/indexing.html

Here we have new operators: .iloc to explicitly support only integer indexing, and .loc to explicitly support only label indexing. For example, imagine this scenario:

In [1]: df = pd.DataFrame(np.random.randn(5,2),index=range(0,10,2),columns=list('AB'))

In [2]: df
Out[2]:
          A         B
0  1.068932 -0.794307
2 -0.470056  1.192211
4 -0.284561  0.756029
6  1.037563 … Read more
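A small sketch of the difference, using the same index layout (labels 0, 2, 4, 6, 8). The seeded random generator is an assumption added so the example is repeatable.

```python
import numpy as np
import pandas as pd

# Index labels are 0, 2, 4, 6, 8, so labels and positions disagree.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 2)),
                  index=range(0, 10, 2), columns=list("AB"))

row_by_position = df.iloc[2]  # third row, whatever its label is
row_by_label = df.loc[2]      # the row whose index label is 2 (second row)

print(row_by_position.name)   # label of the third row
print(row_by_label.name)      # label that was asked for
```

With this index, df.iloc[2] and df.loc[2] return different rows, which is why it helps to pick the operator that matches the intent.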

Quickly reading very large tables as dataframes

September 12, 2022 by Tarik

An update, several years later

This answer is old, and R has moved on. Tweaking read.table to run a bit faster has precious little benefit. Your options are:

Using vroom from the tidyverse package vroom for importing data from csv/tab-delimited files directly into an R tibble. See Hector’s answer.

Using fread in data.table for importing … Read more

pandas create new column based on values from other columns / apply a function of multiple columns, row-wise

September 12, 2022 by Tarik

OK, two steps to this. First is to write a function that does the translation you want. I’ve put an example together based on your pseudo-code:

def label_race(row):
    if row['eri_hispanic'] == 1:
        return 'Hispanic'
    if row['eri_afr_amer'] + row['eri_asian'] + row['eri_hawaiian'] + row['eri_nat_amer'] + row['eri_white'] > 1:
        return 'Two Or More'
    … Read more
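The excerpt is cut off before the rest of the function and the second step, so the following is only a sketch of the general row-wise pattern the title describes, not the original answer. The function `label_row`, the three-column frame, the `'Other'` fallback and the `race_label` column name are all assumptions.

```python
import pandas as pd

# Hypothetical, reduced frame: only three of the indicator columns.
df = pd.DataFrame({
    "eri_hispanic": [1, 0, 0],
    "eri_asian":    [0, 1, 0],
    "eri_white":    [0, 1, 1],
})

def label_row(row):
    """Return a label computed from several columns of one row."""
    if row["eri_hispanic"] == 1:
        return "Hispanic"
    if row["eri_asian"] + row["eri_white"] > 1:
        return "Two Or More"
    return "Other"  # placeholder fallback, not taken from the original answer

# axis=1 passes each row (as a Series) to the function.
df["race_label"] = df.apply(label_row, axis=1)

print(df)
```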

How to check whether a pandas DataFrame is empty?

September 12, 2022 by Tarik

You can use the attribute df.empty to check whether it’s empty or not:

if df.empty:
    print('DataFrame is empty!')

Source: Pandas Documentation
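A brief sketch of how df.empty behaves on two made-up frames. One detail worth knowing: a frame whose only row holds NaN is not considered empty.

```python
import pandas as pd

# Hypothetical frames for illustration.
no_rows = pd.DataFrame(columns=["A", "B"])     # columns but no rows
all_nan = pd.DataFrame({"A": [float("nan")]})  # one row containing NaN

no_rows_is_empty = no_rows.empty   # True: there are no rows
all_nan_is_empty = all_nan.empty   # False: a NaN row still counts as data

# An alternative check on the number of rows.
no_rows_len_zero = len(no_rows) == 0

print(no_rows_is_empty, all_nan_is_empty, no_rows_len_zero)
```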

Filter dataframe rows if value in column is in a set list of values [duplicate]

September 12, 2022 by Tarik

Use the isin method:

rpt[rpt['STK_ID'].isin(stk_list)]
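A runnable sketch of the same filter. The contents of `rpt` and `stk_list`, and the `price` column, are invented here since the question's data is not shown.

```python
import pandas as pd

# Hypothetical report and list of IDs to keep.
rpt = pd.DataFrame({"STK_ID": [1, 2, 3, 4],
                    "price": [10.0, 11.5, 9.8, 12.1]})
stk_list = [2, 4]

# isin builds a boolean mask; indexing with it keeps the matching rows.
selected = rpt[rpt["STK_ID"].isin(stk_list)]

# Negating the mask with ~ keeps the rows that are not in the list.
excluded = rpt[~rpt["STK_ID"].isin(stk_list)]

print(selected)
print(excluded)
```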

Create an empty data.frame

September 11, 2022 by Tarik

Just initialize it with empty vectors:

df <- data.frame(Date=as.Date(character()),
                 File=character(),
                 User=character(),
                 stringsAsFactors=FALSE)

Here’s another example with different column types:

df <- data.frame(Doubles=double(),
                 Ints=integer(),
                 Factors=factor(),
                 Logicals=logical(),
                 Characters=character(),
                 stringsAsFactors=FALSE)

str(df)

> str(df)
'data.frame': 0 obs. of 5 variables:
 $ Doubles : num
 $ Ints : int
 $ Factors : Factor w/ 0 levels:
 $ Logicals … Read more

Drop unused factor levels in a subsetted data frame

September 11, 2022 by Tarik

Since R version 2.12, there’s a droplevels() function.

levels(droplevels(subdf$letters))

How to replace NaN values by Zeroes in a column of a Pandas Dataframe?

September 11, 2022 by Tarik

I believe DataFrame.fillna() will do this for you. Link to Docs for a dataframe and for a Series.

Example:

In [7]: df
Out[7]:
          0         1
0       NaN       NaN
1 -0.494375  0.570994
2       NaN       NaN
3  1.876360 -0.229738
4       NaN       NaN

In [8]: df.fillna(0)
Out[8]:
          0         1
0  0.000000  0.000000
1 -0.494375  0.570994
2  0.000000  0.000000
… Read more
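Since the question asks about a single column, here is a sketch that contrasts filling the whole frame with filling only one column. The column names `a` and `b` and the values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with NaN in both columns.
df = pd.DataFrame({"a": [np.nan, 1.5, np.nan],
                   "b": [np.nan, 2.0, 3.0]})

# Replace NaN everywhere; returns a new frame and leaves df unchanged.
filled_all = df.fillna(0)

# Replace NaN only in column "a" by passing a column-to-value mapping.
filled_a = df.fillna({"a": 0})

print(filled_all)
print(filled_a)
```

The same single-column result can be had with df["a"] = df["a"].fillna(0), which writes the filled column back into df.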