pandas – Page 3 – Tarik Billa

How is a Pandas crosstab different from a Pandas pivot_table?

August 28, 2023 by Tarik

The main difference between the two is that pivot_table expects your input data to already be a DataFrame; you pass a DataFrame to pivot_table and specify the index/columns/values by passing the column names as strings. With crosstab, you don't necessarily need to have a DataFrame going in, as you just pass array-like objects for index/columns/values. …
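As a rough illustration of that contrast, here is a minimal sketch. The column names (city, kind, sales) and the sample values are made up for the example and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "kind": ["x", "y", "x", "x"],
    "sales": [1, 2, 3, 4],
})

# pivot_table: start from a DataFrame and refer to columns by name.
pivot = df.pivot_table(index="city", columns="kind", values="sales",
                       aggfunc="sum", fill_value=0)

# crosstab: plain array-likes are enough, no DataFrame required.
cities = np.array(["A", "A", "B", "B"])
kinds = np.array(["x", "y", "x", "x"])
sales = np.array([1, 2, 3, 4])

# With no values given, crosstab counts co-occurrences.
counts = pd.crosstab(cities, kinds, rownames=["city"], colnames=["kind"])

# With values and aggfunc it aggregates, much like pivot_table.
sums = pd.crosstab(cities, kinds, values=sales, aggfunc="sum",
                   rownames=["city"], colnames=["kind"])

print(pivot)
print(counts)
print(sums)
```

Note that crosstab defaults to counting, while pivot_table defaults to the mean of the values column.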

Pandas text matching like SQL’s LIKE?

August 14, 2023 by Tarik

You can use the Series method str.startswith (which takes a literal prefix, not a regex):

In [11]: s = pd.Series(['aa', 'ab', 'ca', np.nan])

In [12]: s.str.startswith('a', na=False)
Out[12]:
0     True
1     True
2    False
3    False
dtype: bool

You can also do the same with str.contains (using a regex):

In [13]: s.str.contains('^a', na=False)
Out[13]:
0     True
1     True
2 …
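A minimal, self-contained sketch of how common SQL LIKE patterns might map onto the string methods, reusing the Series from the excerpt. The mapping in the comments is an informal analogy, not an exact equivalence.

```python
import numpy as np
import pandas as pd

s = pd.Series(["aa", "ab", "ca", np.nan])

# LIKE 'a%': literal prefix match (no regex involved).
starts_with_a = s.str.startswith("a", na=False)

# LIKE '%a': literal suffix match.
ends_with_a = s.str.endswith("a", na=False)

# LIKE '%b%': substring match; regex=False treats the pattern literally.
contains_b = s.str.contains("b", regex=False, na=False)

# The same prefix test expressed as a regex anchored at the start.
regex_prefix = s.str.contains("^a", na=False)

# Boolean masks can be used directly to filter the Series.
print(s[starts_with_a])
```

Passing na=False makes missing values count as non-matches, so the resulting mask can be used for filtering.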

Pandas rank by column value [duplicate]

August 8, 2023 by Tarik

Here's one way to do it the Pandas way. You could groupby on Auction_ID and take rank() on Bid_Price with ascending=False:

In [68]: df['Auction_Rank'] = df.groupby('Auction_ID')['Bid_Price'].rank(ascending=False)

In [69]: df
Out[69]:
   Auction_ID  Bid_Price  Auction_Rank
0         123          9             1
1         123          7             2
2         123          6             3
3         123          2             4
4         124          3             1
5         124          2 …
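For a runnable version, here is a small sketch built only from the rows visible in the excerpt; the rest of the original data is truncated and is not reconstructed here.

```python
import pandas as pd

# Only the rows shown in the excerpt above.
df = pd.DataFrame({
    "Auction_ID": [123, 123, 123, 123, 124, 124],
    "Bid_Price": [9, 7, 6, 2, 3, 2],
})

# Rank bids within each auction, highest bid first.
# rank() returns floats because ties default to averaged ranks.
df["Auction_Rank"] = (
    df.groupby("Auction_ID")["Bid_Price"].rank(ascending=False)
)

print(df)
```

If integer ranks without gaps are preferred for tied bids, rank(method="dense", ascending=False) followed by astype(int) is one option.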

How to remove illegal characters so a dataframe can write to Excel

August 3, 2023 by Tarik

Based on Haipeng Su's answer, I added a function that does this:

dataframe = dataframe.applymap(lambda x: x.encode('unicode_escape').decode('utf-8') if isinstance(x, str) else x)

Basically, it escapes the unicode characters if they exist. It worked and I can now write to Excel spreadsheets again!
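The same idea can be sketched without applymap, which newer pandas releases deprecate. The helper name escape_text and the sample frame below are hypothetical, and the column-wise apply/map form is a substitute for the one-liner above, not the author's code.

```python
import pandas as pd

def escape_text(value):
    """Escape control and non-ASCII characters in strings; pass other values through."""
    if isinstance(value, str):
        return value.encode("unicode_escape").decode("utf-8")
    return value

# Hypothetical data: \x0b (vertical tab) is the kind of control
# character that Excel writers tend to reject.
df = pd.DataFrame({"text": ["ok", "bad\x0bchar"], "n": [1, 2]})

# Apply the helper element-wise, one column at a time.
cleaned = df.apply(lambda col: col.map(escape_text))

print(cleaned)
```

One side effect to be aware of: unicode_escape also rewrites legitimate non-ASCII text (for example, "é" becomes the four characters \xe9), so this trades readability for writability.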

pyspark show dataframe as table with horizontal scroll in ipython notebook

July 23, 2023 by Tarik

This is a workaround:

spark_df.limit(5).toPandas().head()

I do not know the computational burden of this query, although I am thinking limit() is not expensive. Corrections welcome.

Pandas & AWS Lambda

July 22, 2023 by Tarik

Scrape tables into dataframe with BeautifulSoup

July 19, 2023 by Tarik

Pandas already has a built-in method to convert a table on the web to a dataframe:

table = soup.find_all('table')
df = pd.read_html(str(table))[0]

Jupyter notebook display two pandas tables side by side

June 19, 2023 by Tarik

I have ended up writing a function that can do this: [update: added titles based on suggestions (thnx @Antony_Hatchkins et al.)]

from IPython.display import display_html
from itertools import chain, cycle

def display_side_by_side(*args, titles=cycle([''])):
    html_str = ''
    for df, title in zip(args, chain(titles, cycle(['</br>']))):
        html_str += '<th style="text-align:center"><td style="vertical-align:top">'
        html_str += f'<h2 style="text-align: center;">{title}</h2>'
        html_str += df.to_html().replace('table', 'table style="display:inline"')
        html_str += '</td></th>'
    display_html(html_str, raw=True)

Example usage:

df1 = pd.DataFrame(np.arange(12).reshape((3,4)), columns=['A','B','C','D',])
df2 = …

Disabling Pylint no member- E1101 error for specific libraries

June 14, 2023 by Tarik

You can mark their attributes as dynamically generated using the generated-members option. E.g. for pandas:

generated-members=pandas.*

Python Pandas: drop a column from a multi-level column index?

June 11, 2023 by Tarik

With a multi-index we have to specify the column using a tuple in order to drop a specific column, or specify the level to drop all columns with that key on that index level. Instead of saying drop column 'c', say drop ('a','c') as shown below:

df.drop(('a', 'c'), axis = 1, inplace = True)

Or …
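Both approaches described above (a tuple for one specific column, a level for every column sharing a key) can be sketched as follows. The column labels and values are invented for the example, and the level-based call is an illustration of the description rather than a reconstruction of the truncated text.

```python
import pandas as pd

# Hypothetical two-level column index.
columns = pd.MultiIndex.from_tuples([("a", "b"), ("a", "c"), ("d", "c")])
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=columns)

# Drop one specific column by giving its full tuple key.
one_dropped = df.drop(("a", "c"), axis=1)

# Drop every column whose second-level label is "c".
level_dropped = df.drop("c", axis=1, level=1)

print(one_dropped)
print(level_dropped)
```

Both calls return new frames here rather than using inplace=True, which leaves the original df untouched for comparison.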