dataframe – Page 14 – Tarik Billa

Python – “case insensitive” in a string or “case ignore”

November 24, 2023 by Tarik

Series.str.contains has a case parameter that is True by default. Set it to False to do a case-insensitive match: df2 = df1['company_name'].str.contains("apple", na=False, case=False)
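Here is a minimal sketch of that call on made-up data. The sample values are hypothetical; only the column name company_name comes from the snippet above. Note that str.contains returns a boolean mask, so filtering the frame takes one more step.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the post.
df1 = pd.DataFrame(
    {"company_name": ["Apple Inc.", "APPLE Records", "Banana Ltd.", None]}
)

# case=False ignores letter case; na=False turns missing values into False.
mask = df1["company_name"].str.contains("apple", na=False, case=False)

# The mask is a boolean Series; index the frame with it to keep matching rows.
matches = df1[mask]
print(matches)
```

Without na=False, the missing value would stay missing in the mask, and indexing with such a mask typically fails.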

How to print like jupyter notebook’s default cell output

November 24, 2023 by Tarik

You can use IPython's display function to achieve that:

    from IPython.display import display

    display(d)

Transposing a dataframe maintaining the first column as heading

November 24, 2023 by Tarik

Adding two columns to existing DataFrame using withColumn

November 23, 2023 by Tarik

AFAIK, you need to call withColumn twice (once for each new column). But if your UDF is computationally expensive, you can avoid calling it twice by storing the "complex" result in a temporary column and then "unpacking" the result, e.g. using the apply method of column (which gives access to the array element). Note …

Print Visually Pleasing DataFrames in For Loop in Jupyter Notebook Pandas

November 23, 2023 by Tarik

You can use this:

    from IPython.display import display

    for i in df_list:
        display(i)

Learn more tricks about rich and flexible formatting at Jupyter Notebook Viewer.

Writing large Pandas Dataframes to CSV file in chunks

November 22, 2023 by Tarik

Solution:

    import os

    header = True
    for chunk in chunks:
        # Older pandas called this parameter "cols"; current versions use "columns".
        chunk.to_csv(os.path.join(folder, new_folder, "new_file_" + filename),
                     header=header, columns=['TIME', 'STUFF'], mode="a")
        header = False

Notes: mode="a" tells pandas to append. We only write a column header on the first chunk.
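To try the pattern end to end, here is a self-contained sketch. The sample frame, the chunk size, the temporary output path, and index=False are all assumptions made for illustration; only the TIME and STUFF column names and the header/append logic follow the snippet above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data; OTHER is an extra column that should not be written.
df = pd.DataFrame(
    {"TIME": range(10), "STUFF": list("abcdefghij"), "OTHER": range(10)}
)

chunk_size = 4  # assumed chunk size, small enough to force several writes
out_path = os.path.join(tempfile.mkdtemp(), "new_file_example.csv")

header = True
for start in range(0, len(df), chunk_size):
    chunk = df.iloc[start:start + chunk_size]
    # mode="a" appends; the header is written for the first chunk only.
    chunk.to_csv(out_path, header=header, columns=["TIME", "STUFF"],
                 mode="a", index=False)
    header = False

# Read the file back to check that the chunks line up as one table.
result = pd.read_csv(out_path)
print(result.shape)
```

Because the file is opened in append mode, rerunning the loop against an existing file would add the rows a second time, so start from a fresh path.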

Boolean indexing on multiple pandas columns [duplicate]

November 21, 2023 by Tarik

It is an operator precedence issue. You should add extra parentheses to make your multi-condition test work: d[(d['x']>2) & (d['y']>7)] This section of the tutorial you mentioned shows an example with several boolean conditions, and parentheses are used there.
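A small sketch of why the parentheses matter, using made-up values for x and y. In Python, & binds more tightly than >, so the unparenthesized form is read as a chained comparison around (2 & d['y']) and ends up asking for the truth value of a whole Series.

```python
import pandas as pd

# Hypothetical data for illustration.
d = pd.DataFrame({"x": [1, 3, 5, 4], "y": [8, 6, 9, 10]})

# With parentheses, each comparison is evaluated first, then combined with &.
filtered = d[(d["x"] > 2) & (d["y"] > 7)]

# Without them, Python parses d["x"] > (2 & d["y"]) > 7, which fails.
try:
    d[d["x"] > 2 & d["y"] > 7]
    unparenthesized_failed = False
except ValueError:
    unparenthesized_failed = True

print(filtered)
```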

How do I melt a pandas dataframe?

November 21, 2023 by Tarik

Note for pandas versions < 0.20.0: I will be using df.melt(…) for my examples, but you will need to use pd.melt(df, …) instead. Documentation references: Most of the solutions here use melt, so to get to know the method, see its documentation: Unpivot a DataFrame from wide to long format, optionally leaving …
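As a rough illustration of the wide-to-long reshaping, here is a sketch on a tiny made-up frame. The column names Name, Math, English, Subject, and Score are hypothetical and do not come from the original post.

```python
import pandas as pd

# Hypothetical wide table: one row per person, one column per subject.
df = pd.DataFrame(
    {"Name": ["Bob", "Ann"], "Math": [90, 80], "English": [70, 85]}
)

# Keep Name as the identifier; every other column becomes a (Subject, Score) row.
long_df = df.melt(id_vars="Name", var_name="Subject", value_name="Score")

# Function form, which is the one available before pandas 0.20.0.
long_df_func = pd.melt(df, id_vars="Name", var_name="Subject", value_name="Score")

print(long_df)
```

Both calls should produce the same long table, with one row per person and subject.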

Reverse a get_dummies encoding in pandas

November 21, 2023 by Tarik

Pretty one-liner 🙂 new_df = df.idxmax(axis=1)
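A quick round trip shows the one-liner at work. The sample labels are made up, and dtype=int is passed to get_dummies only to keep the indicator columns numeric. The trick assumes every row has exactly one indicator set; a row of all zeros would most likely come back as the first column label.

```python
import pandas as pd

# Hypothetical categorical data.
original = pd.Series(["a", "b", "c", "a"])

# One indicator column per category, 1 where the row has that category.
dummies = pd.get_dummies(original, dtype=int)

# For each row, idxmax returns the label of the column holding the largest value.
recovered = dummies.idxmax(axis=1)

print(recovered.tolist())
```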

Start row index from 1 instead of zero without creating additional column in pandas [duplicate]

November 15, 2023 by Tarik

Just assign a new index array directly:

    df.index = np.arange(1, len(df) + 1)

Example:

    In [151]: df = pd.DataFrame({'a': np.random.randn(5)})
              df
    Out[151]:
              a
    0  0.443638
    1  0.037882
    2 -0.210275
    3 -0.344092
    4  0.997045

    In [152]: df.index = np.arange(1, len(df) + 1)
              df
    Out[152]:
              a
    1  0.443638
    2  0.037882
    3 -0.210275
    4 -0.344092
    5  0.997045

Or just: df.index = df.index …
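The np.arange assignment can also be run as a plain script. This sketch uses fixed, made-up values instead of random ones so the result is easy to check.

```python
import numpy as np
import pandas as pd

# Hypothetical data with the default 0-based RangeIndex.
df = pd.DataFrame({"a": [10, 20, 30]})

# Replace the index with 1..len(df); no extra column is created.
df.index = np.arange(1, len(df) + 1)

print(df)
```

After the assignment, label-based lookups such as df.loc[1] refer to the first row.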