dataframe
Convert Pandas DataFrame Column From String to Int Based on Conditional
You're trying to compare a scalar with the entire Series, which raises the ValueError you saw. A simple method is to cast the boolean Series to int:

    In [84]: df['viz'] = (df['viz'] != 'n').astype(int)
             df
    Out[84]:
       viz  a1_count  a1_mean     a1_std
    0    0         3        2   0.816497
    1    1         0      NaN        NaN
    2    0         2       51  50.000000
    …
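As a minimal sketch of the cast described above, assuming a small made-up frame whose `viz` column holds `'y'`/`'n'` strings (the values are illustrative, not the asker's data):

```python
import pandas as pd

# Hypothetical data: 'viz' holds 'n' or some other marker string.
df = pd.DataFrame({"viz": ["n", "y", "n"], "a1_count": [3, 0, 2]})

# The comparison runs element-wise and yields a boolean Series;
# astype(int) then maps False -> 0 and True -> 1.
df["viz"] = (df["viz"] != "n").astype(int)

viz_values = df["viz"].tolist()
print(viz_values)  # [0, 1, 0]
```

Because the comparison is vectorised, no explicit loop or `if` over the column is needed.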
Pandas update multiple columns at once
You want to replace

    print(df.loc[df['Col1'].isnull(), ['Col1', 'Col2', 'Col3']])

      Col1 Col2 Col3
    2  NaN  NaN  NaN
    3  NaN  NaN  NaN

with:

    replace_with_this = df.loc[df['Col1'].isnull(), ['col1_v2', 'col2_v2', 'col3_v2']]
    print(replace_with_this)

      col1_v2 col2_v2 col3_v2
    2       a       b       d
    3       d       e       f

That seems reasonable. However, when you do the assignment, you need to account for index alignment, which includes the columns. So, …
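The excerpt stops before the fix, so the following is only a sketch of one common way to sidestep alignment, not necessarily the approach the original answer takes: convert the right-hand side to a NumPy array so the values are assigned by position. The frame and its filler values are made up to mirror the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: rows 2 and 3 are missing in Col1..Col3.
df = pd.DataFrame({
    "Col1": ["x", "y", np.nan, np.nan],
    "Col2": ["p", "q", np.nan, np.nan],
    "Col3": ["r", "s", np.nan, np.nan],
    "col1_v2": ["-", "-", "a", "d"],
    "col2_v2": ["-", "-", "b", "e"],
    "col3_v2": ["-", "-", "d", "f"],
})

mask = df["Col1"].isnull()
targets = ["Col1", "Col2", "Col3"]
sources = ["col1_v2", "col2_v2", "col3_v2"]

# .to_numpy() drops the row and column labels, so pandas does not try to
# align 'col1_v2' with 'Col1'; the values are written positionally.
df.loc[mask, targets] = df.loc[mask, sources].to_numpy()
```

Rows where `Col1` was already populated are left untouched, since the mask selects only the missing rows.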
PySpark DataFrame Column Reference: df.col vs. df[‘col’] vs. F.col(‘col’)?
In most practical applications, there is almost no difference. However, they are implemented by calls to different underlying functions (source) and thus are not exactly the same. We can illustrate with a small example:

    df = spark.createDataFrame(
        [(1, 'a', 0), (2, 'b', None), (None, 'c', 3)],
        ['col', '2col', 'third col']
    )
    df.show()
    #+----+----+---------+
    #| col|2col|third col|
    #+----+----+---------+
    #|   1|   a| …
Index must be called with a collection of some kind: assign column name to dataframe
Documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html

    columns : Index or array-like
        Column labels to use for resulting frame. Will default to np.arange(n) if no column labels are provided

Example:

    df3 = DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])

Try to use:

    pd.DataFrame(reweightTarget, columns=['t'])
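A short sketch of the suggestion above. The error in the title typically appears when a bare string such as `columns='t'` is passed instead of a list. The array below is a made-up stand-in for the question's `reweightTarget`, whose real contents are not shown.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's `reweightTarget`; the real data is not shown.
reweight_target = np.array([0.1, 0.2, 0.3])

# `columns` expects a collection of labels, so a single name goes in a list.
df = pd.DataFrame(reweight_target, columns=["t"])

column_labels = df.columns.tolist()
print(column_labels)  # ['t']
```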
Copy pandas dataframe to excel using openpyxl
openpyxl 2.4 comes with a utility for converting pandas DataFrames into something that openpyxl can work with directly. Code would look a bit like this:

    from openpyxl.utils.dataframe import dataframe_to_rows

    rows = dataframe_to_rows(df)
    for r_idx, row in enumerate(rows, 1):
        for c_idx, value in enumerate(row, 1):
            ws.cell(row=r_idx, column=c_idx, value=value)

You can adjust the start of the enumeration …
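For a self-contained variant of the loop above, here is a sketch that also creates the workbook and worksheet, which the excerpt leaves out. The sample frame and the file name in the final comment are made up for illustration; `index=False` is a choice made here to keep the index out of the sheet, not something the excerpt prescribes.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# Hypothetical sample data.
df = pd.DataFrame({"name": ["a", "b"], "count": [3, 5]})

wb = Workbook()
ws = wb.active

# index=False skips the index column; header=True emits the column names first.
rows = dataframe_to_rows(df, index=False, header=True)
for r_idx, row in enumerate(rows, 1):
    for c_idx, value in enumerate(row, 1):
        ws.cell(row=r_idx, column=c_idx, value=value)

# wb.save("frame.xlsx")  # hypothetical file name; uncomment to write to disk
```

Starting either `enumerate` at a value other than 1 shifts where the block of cells lands in the sheet.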
Select columns with all zero entries in a pandas dataframe
I'd simply compare the values to 0 and use .all():

    >>> df = pd.DataFrame(np.random.randint(0, 2, (2, 8)))
    >>> df
       0  1  2  3  4  5  6  7
    0  0  0  0  1  0  0  1  0
    1  1  1  0  0  0  1  1  1
    >>> df == 0
       0  1  2  3  4  5 …
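To make the idea concrete without random data, here is a small sketch on a fixed, made-up frame. The names `all_zero`, `zero_columns`, and `zero_only` are illustrative.

```python
import pandas as pd

# Hypothetical frame: columns 'a' and 'c' contain only zeros.
df = pd.DataFrame({"a": [0, 0], "b": [0, 1], "c": [0, 0], "d": [2, 0]})

# (df == 0) is a boolean frame; .all() reduces each column to a single bool.
all_zero = (df == 0).all()

# Use the boolean Series to pick the matching labels or the columns themselves.
zero_columns = df.columns[all_zero].tolist()
zero_only = df.loc[:, all_zero]

print(zero_columns)  # ['a', 'c']
```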