dataframe – Page 7 – Tarik Billa

Python: How to turn a dictionary of Dataframes into one big dataframe with column names being the key of the previous dict?

December 27, 2023 by Tarik

You can first call set_index on every DataFrame in a dict comprehension, then use concat and drop the last level of the resulting MultiIndex in the columns: print(d) {'17012016': Fruit Price 0 Orange 7 1 Apple 8 2 Pear 9, '16012016': Fruit Price 0 Orange 4 1 Apple 5 2 Pear 6, '15012016': Fruit Price 0 Orange 1 1 … Read more
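A minimal sketch of the set_index-then-concat approach described above. It uses only the two DataFrames that appear in full in the excerpt (the third is cut off, so it is left out), and the names `wide` and `key` are illustrative, not taken from the original answer.

```python
import pandas as pd

# Two of the frames from the excerpt; the truncated third one is omitted.
d = {
    "17012016": pd.DataFrame({"Fruit": ["Orange", "Apple", "Pear"], "Price": [7, 8, 9]}),
    "16012016": pd.DataFrame({"Fruit": ["Orange", "Apple", "Pear"], "Price": [4, 5, 6]}),
}

# Index every frame by Fruit, then concatenate side by side.
# Passing a dict makes its keys the outer level of the column MultiIndex.
wide = pd.concat({key: df.set_index("Fruit") for key, df in d.items()}, axis=1)

# Drop the inner level ("Price") so only the dict keys remain as column names.
wide.columns = wide.columns.droplevel(-1)

print(wide)
```

Each original dict key ends up as one column, with the fruits as the shared row index.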

How to deal with “divide by zero” with pandas dataframes when manipulating columns? [duplicate]

December 26, 2023 by Tarik

It would probably be more useful to use a dataframe that actually has zero in the denominator (see the last row of column two). one two three four five a 0.469112 -0.282863 -1.509059 bar True b 0.932424 1.224234 7.823421 bar False c -1.135632 1.212112 -0.173215 bar False d 0.232424 2.342112 0.982342 unbar True e 0.119209 … Read more

How to reindex a MultiIndex dataframe

December 26, 2023 by Tarik

To get B using reindex: B.reindex(pd.MultiIndex.from_product([B.index.levels[0], A.index], names=['Bank', 'Curency']), fill_value=0) Out[62]: Notional Bank Curency Bank_1 AUD 16 BRL 0 CAD 13 EUR 22 INR 0 Bank_2 AUD 24 BRL 0 CAD 20 EUR 17 INR 0 To get A using concat: pd.concat([A]*2, keys=B.index.levels[0]) Out[69]: AUD BRL CAD EUR INR Bank Bank_1 AUD 10 5 … Read more
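The reindex step can be sketched as follows. The nonzero values of `B` are read back from the output shown in the excerpt; `A` is a stand-in whose values are placeholders, since only its index matters here and its real contents are truncated in the excerpt.

```python
import pandas as pd

currencies = ["AUD", "BRL", "CAD", "EUR", "INR"]

# Placeholder for A: only A.index is used below, the zeros are not real data.
A = pd.DataFrame(0, index=currencies, columns=currencies)

# B as it would look before reindexing: BRL and INR are missing for both banks.
B = pd.DataFrame(
    {"Notional": [16, 13, 22, 24, 20, 17]},
    index=pd.MultiIndex.from_tuples(
        [
            ("Bank_1", "AUD"), ("Bank_1", "CAD"), ("Bank_1", "EUR"),
            ("Bank_2", "AUD"), ("Bank_2", "CAD"), ("Bank_2", "EUR"),
        ],
        names=["Bank", "Curency"],
    ),
)

# Build every (bank, currency) pair, then reindex and fill the gaps with 0.
full_index = pd.MultiIndex.from_product(
    [B.index.levels[0], A.index], names=["Bank", "Curency"]
)
B_full = B.reindex(full_index, fill_value=0)

print(B_full)
```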

handling zeros in pandas DataFrames column divisions in Python

December 25, 2023 by Tarik

You need to work in floats, otherwise you will have integer division, which is probably not what you want. In [12]: df = pandas.DataFrame({"a": [1, 2, 0, 1, 5], "b": [0, 10, 20, 30, 50]}).astype('float64') In [13]: df Out[13]: a b 0 1 0 1 2 10 2 0 20 3 1 30 4 5 50 In … Read more
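A short sketch of what the float division does with the zero denominator, using the DataFrame from the excerpt. Replacing infinities with NaN afterwards is one common way to handle the result and is an assumption here, since the rest of the original answer is cut off. Note that in Python 3 the `/` operator on integer columns already returns floats, so the cast mainly makes the intent explicit.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 0, 1, 5], "b": [0, 10, 20, 30, 50]}).astype("float64")

# Float division by zero does not raise in pandas; 1.0 / 0.0 becomes inf.
ratio = df["a"] / df["b"]

# Assumed follow-up step: treat the infinite results as missing values.
safe_ratio = ratio.replace([np.inf, -np.inf], np.nan)

print(ratio)
print(safe_ratio)
```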

Create a dataframe of unequal lengths

December 25, 2023 by Tarik

Why does Spark report “java.net.URISyntaxException: Relative path in absolute URI” when working with DataFrames?

December 25, 2023 by Tarik

It’s the SPARK-15565 issue in Spark 2.0 on Windows, and it has a simple solution (the fix appears to be part of Spark’s codebase already and may soon be released as 2.0.2 or 2.1.0). The solution in Spark 2.0.0 is to set spark.sql.warehouse.dir to some properly referenced directory, say file:///c:/Spark/spark-2.0.0-bin-hadoop2.7/spark-warehouse, which uses /// (triple slashes). Start spark-shell with the --conf argument … Read more

Reversed cumulative sum of a column in pandas.DataFrame

December 23, 2023 by Tarik

Reverse column A, take the cumsum, then reverse again: df['C'] = df.loc[::-1, 'A'].cumsum()[::-1] import pandas as pd df = pd.DataFrame( {'A': [False, True, False, False, False, True, False, True], 'B': [0.03771, 0.315414, 0.33248, 0.445505, 0.580156, 0.741551, 0.796944, 0.817563],}, index=[6, 2, 4, 7, 3, 1, 5, 0]) df['C'] = df.loc[::-1, 'A'].cumsum()[::-1] print(df) yields A B C … Read more
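Laid out as a script, the reverse, cumsum, reverse idea looks like this. The data is the same as in the excerpt; the comments tracing the intermediate values are added for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [False, True, False, False, False, True, False, True],
        "B": [0.03771, 0.315414, 0.33248, 0.445505,
              0.580156, 0.741551, 0.796944, 0.817563],
    },
    index=[6, 2, 4, 7, 3, 1, 5, 0],
)

# Reverse the rows, accumulate from the bottom up, then restore the order.
# Each row ends up with the number of True values at or below it.
df["C"] = df.loc[::-1, "A"].cumsum()[::-1]

print(df)
```

The first row gets 3 because all three True values lie at or below it, and the last row gets 1 because only its own True value remains.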

python pandas: filter out records with null or empty string for a given field

December 23, 2023 by Tarik

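You can filter out empty strings in your dataframe like this: df = df[df['str_field'].str.len() > 0] This also drops null values, because .str.len() returns NaN for them and the comparison NaN > 0 is False.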
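A small sketch of the filter in action. The sample values are made up for illustration; only the column name `str_field` and the filter expression come from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two real strings, one empty string, and two nulls.
df = pd.DataFrame({"str_field": ["apple", "", None, "pear", np.nan]})

# Keep only rows whose string has at least one character.
# Null entries have no length, so the comparison is False and they are dropped.
filtered = df[df["str_field"].str.len() > 0]

print(filtered)
```

A string containing only whitespace still has a positive length and would be kept; calling `.str.strip()` before `.str.len()` is one way to exclude those as well.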

How to calculate number of words in a string in DataFrame? [duplicate]

December 22, 2023 by Tarik

IIUC then you can do the following: In [89]: count = df['fruits'].str.split().apply(len).value_counts() count.index = count.index.astype(str) + ' words:' count.sort_index(inplace=True) count Out[89]: 1 words: 2 2 words: 2 3 words: 1 4 words: 1 Name: fruits, dtype: int64 Here we use the vectorised str.split to split on spaces, and then apply len to get the count … Read more
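The steps above can be sketched as a self-contained script. The original question's data is not shown in the excerpt, so the `fruits` values below are hypothetical, chosen only so that the word counts match the output shown.

```python
import pandas as pd

# Hypothetical input: two 1-word, two 2-word, one 3-word and one 4-word entry.
df = pd.DataFrame(
    {
        "fruits": [
            "apple",
            "banana",
            "red apple",
            "green pear",
            "ripe yellow banana",
            "small sweet red cherry",
        ]
    }
)

# Split each string on whitespace, count the pieces, then tally the counts.
count = df["fruits"].str.split().apply(len).value_counts()

# Turn the numeric index into readable labels and sort them.
count.index = count.index.astype(str) + " words:"
count.sort_index(inplace=True)

print(count)
```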