dataframe – Page 2 – Tarik Billa

Writing a Python Pandas DataFrame to Word document

April 6, 2024 by Tarik

You can write the table straight into a .docx file using the python-docx library. If you are using Conda, or installed Python through Anaconda, you can run this command from the command line:

conda install python-docx --channel conda-forge

Or, to install with pip from the command line:

pip install python-docx

After that is installed, we …

Create a dataframe with random numbers in each column

April 6, 2024 by Tarik
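One way to build such a DataFrame is with NumPy's random Generator. This is a sketch, and the 5 × 4 shape and the column names A to D are hypothetical. The first frame holds integers in [0, 100), and the second holds floats in [0, 1).

```python
import numpy as np
import pandas as pd

# Seeded generator so the same numbers come out on every run.
rng = np.random.default_rng(seed=0)

# Integers drawn from [0, 100): the upper bound is exclusive.
df = pd.DataFrame(rng.integers(0, 100, size=(5, 4)), columns=list("ABCD"))

# Floats drawn uniformly from [0, 1).
floats = pd.DataFrame(rng.random((5, 4)), columns=list("ABCD"))

print(df)
```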

Python – Json List to Pandas Dataframe

April 6, 2024 by Tarik

If you change this line in your function:

dfItem = jsonToDataFrame(data)

to:

dfItem = pd.DataFrame.from_records(data)

it should work. I tested your function with this line replaced, passing ['INAG'] as the parameter to your getFinanceHistoricalStockFromByma function, and it returned a DataFrame.
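The snippet below shows what DataFrame.from_records does with a JSON list. Each dict in the list becomes one row, and its keys become the columns. The raw JSON string and its keys are hypothetical and stand in for whatever the API actually returns.

```python
import json

import pandas as pd

# Hypothetical JSON payload: a list of objects, one object per row.
raw = '[{"date": "2024-04-01", "value": 1}, {"date": "2024-04-02", "value": 2}]'
data = json.loads(raw)  # -> list of dicts

# Keys become columns, list order becomes row order.
dfItem = pd.DataFrame.from_records(data)
print(dfItem)
```

For a list of dicts, pd.DataFrame(data) produces the same result.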

Sum operation on PySpark DataFrame giving TypeError when type is fine

April 6, 2024 by Tarik

You are not using the correct sum function; by default you are calling Python's built-in sum. The built-in function won't work because it takes an iterable of numbers as its argument, whereas here the argument passed is the column name, which is a string, and the built-in function can't be applied to a string. …
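The failure can be reproduced without Spark. Built-in sum() starts from 0 and iterates over its argument, so a column name like "amount" is iterated character by character and 0 + "a" raises a TypeError. The second function sketches the fix: it calls Spark's column aggregate pyspark.sql.functions.sum through df.agg. Both function names and the "amount" column are hypothetical, and PySpark is imported only when the function is called.

```python
def builtin_sum_error(column_name: str) -> str:
    """Return the TypeError message built-in sum() raises for a column name."""
    try:
        sum(column_name)  # iterates the characters; 0 + "a" fails
    except TypeError as exc:
        return str(exc)
    return ""


def spark_column_total(df, column_name: str):
    """Sum a Spark column with Spark's aggregate instead of the built-in sum."""
    # Imported inside the function so this sketch loads without PySpark.
    from pyspark.sql import functions as F
    return df.agg(F.sum(column_name).alias("total"))


print(builtin_sum_error("amount"))
```

Importing the functions module under an alias (F.sum) keeps Spark's sum from being confused with Python's built-in one.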

Strategy for partitioning dask dataframes efficiently

April 5, 2024 by Tarik

As of Dask 2.0.0 you can call .repartition(partition_size="100MB"). This method sizes partitions by their actual memory footprint, measured with .memory_usage(deep=True), so object columns such as strings are counted fully. It will combine smaller partitions or split partitions that have grown too large. Dask's documentation also outlines the usage.

Drop a specific row in Pandas

April 1, 2024 by Tarik

import pandas as pd

df = pd.DataFrame([['Jhon', 15, 'A'], ['Anna', 19, 'B'], ['Paul', 25, 'D']])
df.columns = ['Name', 'Age', 'Grade']
df
Out[472]:
   Name  Age Grade
0  Jhon   15     A
1  Anna   19     B
2  Paul   25     D

You can get the index of your row (the string comparison is case-sensitive, so 'Jhon' must match exactly):

i = df[(df.Name == 'Jhon') & (df.Age == 15) & (df.Grade == 'A')].index

and then drop it:

df.drop(i)
Out[474]:
   Name  Age Grade
1 …
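Here is a self-contained version of the same approach, using the example rows from the post. It builds the boolean mask once, drops the matching index labels, and shows the equivalent filter that keeps every row where the mask is False. Note that drop returns a new DataFrame and leaves df unchanged unless you assign the result.

```python
import pandas as pd

df = pd.DataFrame(
    [["Jhon", 15, "A"], ["Anna", 19, "B"], ["Paul", 25, "D"]],
    columns=["Name", "Age", "Grade"],
)

# True for the row(s) to remove; comparisons are exact and case-sensitive.
mask = (df.Name == "Jhon") & (df.Age == 15) & (df.Grade == "A")

dropped = df.drop(df[mask].index)  # drop by index label
kept = df[~mask]                   # equivalent: keep all other rows

print(dropped)
```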

How to drop a row whose particular column is empty/NaN?

March 10, 2024 by Tarik

Use dropna with the subset parameter to specify which column to check for NaNs:

data = data.dropna(subset=['sms'])
print(data)
   id city department   sms  category
1   2  lhr    revenue  good         1

Another solution uses boolean indexing with notnull:

data = data[data['sms'].notnull()]
print(data)
   id city department   sms  category
1   2  lhr    revenue  good         1

Alternative with query:

print(data.query("sms …
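The two approaches can be checked side by side on a small frame. The data below is hypothetical, and only the row with id 2 has a value in the sms column. Both dropna(subset=...) and the notnull mask remove every row where sms is NaN and leave NaNs in other columns untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'sms' is missing in rows 0 and 2.
data = pd.DataFrame({
    "id": [1, 2, 3],
    "city": ["lhr", "lhr", "lhr"],
    "sms": [np.nan, "good", np.nan],
})

by_dropna = data.dropna(subset=["sms"])  # only 'sms' is checked for NaN
by_mask = data[data["sms"].notnull()]    # same rows via boolean indexing

print(by_dropna)
```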

How to convert single-row pandas data frame to series?

February 27, 2024 by Tarik

You can transpose the single-row DataFrame (which still results in a DataFrame) and then squeeze the result into a Series (the inverse of to_frame). Calling squeeze(axis=0) on the original DataFrame gives the same Series without the transpose:

import pandas as pd

df = pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)])
>>> df.squeeze(axis=0)
a0    0
a1    1
a2    2
a3    3
a4    4
Name: 0, dtype: int64

Note: To accommodate the point raised by …
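Both routes can be compared directly. The transposed frame has shape 5 × 1, so squeeze() collapses it into a Series. squeeze(axis=0) collapses the single row of the original frame. In both cases the column names become the index, and the Series is named after the original row label, 0.

```python
import pandas as pd

df = pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)])

via_transpose = df.T.squeeze()   # 5x1 frame -> Series
via_axis = df.squeeze(axis=0)    # 1x5 frame -> Series along the row axis

print(via_axis)
```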

How to remove timezone from a Timestamp column in pandas

February 13, 2024 by Tarik

The column must have a datetime dtype, for example after using pd.to_datetime. Then you can use tz_localize to change the time zone; a naive timestamp corresponds to time zone None:

testdata['time'] = testdata['time'].dt.tz_localize(None)

Unless the column is an index (DatetimeIndex), the .dt accessor must be used to access pandas datetime functions.
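Here is an end-to-end sketch, assuming a hypothetical column of timestamp strings that all carry the same +01:00 offset. pd.to_datetime turns the strings into a time-zone-aware datetime column. dt.tz_localize(None) then removes the zone while keeping the local wall-clock time, so 10:00+01:00 becomes a naive 10:00.

```python
import pandas as pd

# Hypothetical data: timestamps with a uniform +01:00 offset.
testdata = pd.DataFrame({"time": ["2024-02-13 10:00:00+01:00",
                                  "2024-02-13 11:30:00+01:00"]})

testdata["time"] = pd.to_datetime(testdata["time"])       # tz-aware datetime
testdata["time"] = testdata["time"].dt.tz_localize(None)  # naive, same wall time

print(testdata)
```

If you want the naive values expressed in UTC instead, dt.tz_convert(None) converts to UTC before dropping the zone, so 10:00+01:00 would become 09:00.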