dataframe – Page 9 – Tarik Billa

How to combine two vectors into a data frame

December 18, 2023 by Tarik

In R, how do you loop over the rows of a data frame really fast?

December 16, 2023 by Tarik

Is it possible to modify a data.frame in-place (destructively)?

December 16, 2023 by Tarik

Convert Pandas DataFrame Column From String to Int Based on Conditional

December 16, 2023 by Tarik

You’re trying to compare a scalar with the entire Series, which raises the ValueError you saw. A simple method is to cast the boolean Series to int:

    In [84]: df['viz'] = (df['viz'] != 'n').astype(int)
             df
    Out[84]:
       viz  a1_count  a1_mean     a1_std
    0    0         3        2   0.816497
    1    1         0      NaN        NaN
    2    0         2       51  50.000000
    … Read more
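A minimal, self-contained sketch of the same idea. The sample frame below is hypothetical (the original data is not shown in the excerpt); it only illustrates why using a Series comparison as an `if` condition fails and how the boolean cast avoids it.

```python
import pandas as pd

# Hypothetical data standing in for the frame in the question.
df = pd.DataFrame({"viz": ["n", "y", "n"], "a1_count": [3, 0, 2]})

# Using a whole-Series comparison as a plain condition is ambiguous,
# so pandas raises ValueError.
try:
    if df["viz"] == "n":
        pass
    ambiguous = False
except ValueError:
    ambiguous = True

# Element-wise comparison gives a boolean Series; casting maps
# False -> 0 and True -> 1.
df["viz"] = (df["viz"] != "n").astype(int)
print(df)
```

Here rows equal to 'n' become 0 and every other value becomes 1.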

Pandas update multiple columns at once

December 15, 2023 by Tarik

You want to replace:

    print(df.loc[df['Col1'].isnull(), ['Col1', 'Col2', 'Col3']])

      Col1 Col2 Col3
    2  NaN  NaN  NaN
    3  NaN  NaN  NaN

With:

    replace_with_this = df.loc[df['Col1'].isnull(), ['col1_v2', 'col2_v2', 'col3_v2']]
    print(replace_with_this)

      col1_v2 col2_v2 col3_v2
    2       a       b       d
    3       d       e       f

Seems reasonable. However, when you do the assignment, you need to account for index alignment, which includes columns. So, … Read more
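The excerpt is cut off before the fix is shown, so the following is only a sketch of one common way to sidestep alignment, not necessarily the approach the full post takes: assign the raw values so pandas has no row or column labels to align on. The sample data is hypothetical and chosen to match the printed output above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: rows 2 and 3 are missing in Col1..Col3.
df = pd.DataFrame({
    "Col1": ["x", "y", np.nan, np.nan],
    "Col2": ["p", "q", np.nan, np.nan],
    "Col3": ["m", "n", np.nan, np.nan],
    "col1_v2": ["u", "v", "a", "d"],
    "col2_v2": ["u", "v", "b", "e"],
    "col3_v2": ["u", "v", "d", "f"],
})

mask = df["Col1"].isnull()
targets = ["Col1", "Col2", "Col3"]
sources = ["col1_v2", "col2_v2", "col3_v2"]

# to_numpy() drops the labels, so values are assigned by position
# instead of being aligned on the (non-matching) column names.
df.loc[mask, targets] = df.loc[mask, sources].to_numpy()
print(df[targets])
```

Assigning the DataFrame slice directly would try to match 'Col1' against 'col1_v2' and so on, which is the alignment issue the excerpt warns about.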

dplyr mutate in R – add column as concat of columns

December 15, 2023 by Tarik

PySpark DataFrame Column Reference: df.col vs. df[‘col’] vs. F.col(‘col’)?

December 14, 2023 by Tarik

In most practical applications, there is almost no difference. However, they are implemented by calls to different underlying functions (source) and thus are not exactly the same. We can illustrate with a small example:

    df = spark.createDataFrame(
        [(1, 'a', 0), (2, 'b', None), (None, 'c', 3)],
        ['col', '2col', 'third col']
    )
    df.show()
    #+----+----+---------+
    #| col|2col|third col|
    #+----+----+---------+
    #|   1|   a| … Read more

Index must be called with a collection of some kind: assign column name to dataframe

December 14, 2023 by Tarik

Documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html

    columns : Index or array-like
        Column labels to use for resulting frame. Will default to np.arange(n) if no column labels are provided

Example:

    df3 = DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])

Try to use:

    pd.DataFrame(reweightTarget, columns=['t'])
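As a small illustration of the difference, the sketch below passes a bare string and then a one-element list as `columns`. The array `reweight_target` is a hypothetical stand-in for the `reweightTarget` object in the question, whose real contents are not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data in the question.
reweight_target = np.array([0.1, 0.2, 0.3])

# A bare string is a scalar, not a collection of labels, so building
# the column Index fails.
try:
    pd.DataFrame(reweight_target, columns="t")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Wrapping the name in a list gives pandas the collection it expects.
df = pd.DataFrame(reweight_target, columns=["t"])
print(error_message)
print(df)
```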

Copy pandas dataframe to excel using openpyxl

December 13, 2023 by Tarik

openpyxl 2.4 comes with a utility for converting pandas DataFrames into something that openpyxl can work with directly. Code would look a bit like this:

    from openpyxl.utils.dataframe import dataframe_to_rows

    rows = dataframe_to_rows(df)
    for r_idx, row in enumerate(rows, 1):
        for c_idx, value in enumerate(row, 1):
            ws.cell(row=r_idx, column=c_idx, value=value)

You can adjust the start of the enumeration … Read more
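For a version that runs on its own, here is a sketch that builds a workbook in memory and copies a small frame into it. The sample data and the names `start_row` and `start_col` are assumptions for illustration, and `index=False` is passed so that only the header and data rows are written.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# Hypothetical sample data.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

wb = Workbook()
ws = wb.active

# The enumerate() start values decide the top-left cell of the block.
start_row, start_col = 1, 1

rows = dataframe_to_rows(df, index=False, header=True)
for r_idx, row in enumerate(rows, start_row):
    for c_idx, value in enumerate(row, start_col):
        ws.cell(row=r_idx, column=c_idx, value=value)

# wb.save("output.xlsx") would write the workbook to disk.
```

Changing `start_row` or `start_col` shifts where the data lands in the sheet.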

Select columns with all zero entries in a pandas dataframe

December 13, 2023 by Tarik

I’d simply compare the values to 0 and use .all():

    >>> df = pd.DataFrame(np.random.randint(0, 2, (2, 8)))
    >>> df
       0  1  2  3  4  5  6  7
    0  0  0  0  1  0  0  1  0
    1  1  1  0  0  0  1  1  1
    >>> df == 0
       0  1  2  3  4  5 … Read more
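A deterministic sketch of the same approach, using a small hypothetical frame instead of random data so the result is predictable: compare to 0, reduce each column with `.all()`, and use the resulting boolean mask to pick columns.

```python
import pandas as pd

# Hypothetical frame: columns "a" and "c" contain only zeros.
df = pd.DataFrame({
    "a": [0, 0, 0],
    "b": [0, 1, 0],
    "c": [0, 0, 0],
    "d": [2, 0, 0],
})

# True for each column whose entries are all equal to 0.
is_all_zero = (df == 0).all()

zero_columns = df.columns[is_all_zero.to_numpy()].tolist()
zero_only = df.loc[:, is_all_zero]      # keep only the all-zero columns
nonzero_only = df.loc[:, ~is_all_zero]  # or drop them instead
print(zero_columns)
```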