Convert a list to a data frame
With rbind: do.call(rbind.data.frame, your_list). Edit: The previous version returned a data.frame of lists instead of vectors (as @IanSudbery pointed out in the comments).
read_csv takes an encoding option to deal with files in different formats. I mostly use read_csv('file', encoding = "ISO-8859-1"), or alternatively encoding = "utf-8" for reading, and generally utf-8 for to_csv. You can also use one of several alias options like 'latin' or 'cp1252' (Windows) instead of 'ISO-8859-1' (see python docs, also for numerous other …
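The encoding option described above can be sketched as follows. This is a minimal illustration, not the original answer's code: the file names and sample rows are hypothetical, and the files are written to a temporary directory so the example is self-contained.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample file, written in ISO-8859-1 so there is something to read.
tmp_dir = tempfile.mkdtemp()
latin1_path = os.path.join(tmp_dir, "sample_latin1.csv")
with open(latin1_path, "w", encoding="ISO-8859-1", newline="") as fh:
    fh.write("name,city\nJosé,Málaga\n")

# Passing the matching encoding decodes the accented characters correctly.
df_latin1 = pd.read_csv(latin1_path, encoding="ISO-8859-1")
print(df_latin1)

# Write the same data back out as UTF-8 and read it again.
utf8_path = os.path.join(tmp_dir, "sample_utf8.csv")
df_latin1.to_csv(utf8_path, index=False, encoding="utf-8")
df_utf8 = pd.read_csv(utf8_path, encoding="utf-8")
```

Reading a file with the wrong encoding usually either raises a UnicodeDecodeError or silently produces garbled characters, so it helps to know how the file was produced.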
If you have a DataFrame with only one row, then access the first (only) row as a Series using iloc, and then the value using the column name:

    In [3]: sub_df
    Out[3]:
              A         B
    2 -0.133653 -0.030854

    In [4]: sub_df.iloc[0]
    Out[4]:
    A   -0.133653
    B   -0.030854
    Name: 2, dtype: float64

    In [5]: sub_df.iloc[0]['A']
    Out[5]: -0.13365288513107493
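The same access pattern as a self-contained script. The frame is rebuilt here from the rounded values shown in the session output, so the numbers are an assumption rather than the original data.

```python
import pandas as pd

# One-row frame rebuilt from the rounded values displayed above (assumed data).
sub_df = pd.DataFrame({"A": [-0.133653], "B": [-0.030854]}, index=[2])

# iloc[0] selects the first (and only) row by position and returns a Series.
row = sub_df.iloc[0]

# Index that Series by column name to get the scalar value.
value = row["A"]

# Equivalent: select the column first, then take the first element by position.
same_value = sub_df["A"].iloc[0]

print(value, same_value)
```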
The R Language Definition is handy for answering these types of questions: http://cran.r-project.org/doc/manuals/R-lang.html#Indexing

R has three basic indexing operators, with syntax displayed by the following examples

    x[i]
    x[i, j]
    x[[i]]
    x[[i, j]]
    x$a
    x$"a"

For vectors and matrices the [[ forms are rarely used, although they have some slight semantic differences from the [ form …
g1 here is a DataFrame. It has a hierarchical index, though:

    In [19]: type(g1)
    Out[19]: pandas.core.frame.DataFrame

    In [20]: g1.index
    Out[20]: MultiIndex([('Alice', 'Seattle'), ('Bob', 'Seattle'), ('Mallory', 'Portland'), ('Mallory', 'Seattle')], dtype=object)

Perhaps you want something like this?

    In [21]: g1.add_suffix('_Count').reset_index()
    Out[21]:
          Name     City  City_Count  Name_Count
    0    Alice  Seattle           1           1
    1      Bob  Seattle           2           2
    2  Mallory …
jwilner's response is spot on. I was exploring to see if there's a faster option, since in my experience, summing flat arrays is (strangely) faster than counting. This code seems faster:

    df.isnull().values.any()

    import numpy as np
    import pandas as pd
    import perfplot

    def setup(n):
        df = pd.DataFrame(np.random.randn(n))
        df[df > 0.9] = np.nan
        return df

    def …
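A small self-contained check of the df.isnull().values.any() idiom mentioned above. The sample frame is hypothetical, and this sketch only demonstrates the result, not the timing comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a single missing value.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, 6.0]})

# isnull() gives a boolean frame, .values exposes the underlying NumPy array,
# and .any() reduces it to a single flag for the whole frame.
has_nan = df.isnull().values.any()

# After dropping incomplete rows, the same check reports no missing values.
clean = df.dropna()
has_nan_clean = clean.isnull().values.any()

print(has_nan, has_nan_clean)
```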
There is a clean, one-line way of doing this in Pandas:

    df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)

This allows f to be a user-defined function with multiple input values, and uses (safe) column names rather than (unsafe) numeric indices to access the columns. Example with data (based on original question):

    import pandas as pd …
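A runnable sketch of the one-liner above. The body of f and the sample values are placeholders chosen for illustration; they are not the data from the original question.

```python
import pandas as pd


def f(a, b):
    # Placeholder two-argument function; any user-defined function works here.
    return a + 2 * b


# Hypothetical input data.
df = pd.DataFrame({"col_1": [1, 2, 3], "col_2": [10, 20, 30]})

# axis=1 passes each row to the lambda as a Series, so columns can be
# read by name (x.col_1, x.col_2) rather than by position.
df["col_3"] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)

print(df)
```

For large frames a vectorized expression such as df.col_1 + 2 * df.col_2 is typically much faster; the row-wise apply is mainly useful when f cannot easily be vectorized.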
Straight from Wes McKinney's Python for Data Analysis book, pg. 132 (I highly recommend this book):

Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame's apply method does exactly this:

    In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

    In [117]: frame
    Out[117]:
    b d e …
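To illustrate applying a function on 1D arrays to each column or row, here is a minimal sketch. It uses fixed values instead of np.random.randn so the result is reproducible, and value_range is a hypothetical helper rather than code quoted from the book.

```python
import numpy as np
import pandas as pd

# Same labels as the frame above, but with fixed values (an assumption).
frame = pd.DataFrame(
    np.arange(12.0).reshape(4, 3),
    columns=list("bde"),
    index=["Utah", "Ohio", "Texas", "Oregon"],
)


def value_range(x):
    # x is a Series: one column (default) or one row (axis=1).
    return x.max() - x.min()


# Applied once per column; the result is indexed by column label.
per_column = frame.apply(value_range)

# Applied once per row; the result is indexed by row label.
per_row = frame.apply(value_range, axis=1)

print(per_column)
print(per_row)
```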
The error message says that if you're passing scalar values, you have to pass an index. So you can either not use scalar values for the columns, e.g. use a list:

    >>> df = pd.DataFrame({'A': [a], 'B': [b]})
    >>> df
       A  B
    0  2  3

or use scalar values and pass an index:

    >>> …
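Both options can be tried side by side in a short script. The values a = 2 and b = 3 are taken from the output shown above; index=[0] is an assumed choice of index for the second option.

```python
import pandas as pd

a, b = 2, 3

# All-scalar values with no index: pandas cannot infer the number of rows.
try:
    pd.DataFrame({"A": a, "B": b})
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)

# Option 1: wrap each scalar in a list so every column has length 1.
df_from_lists = pd.DataFrame({"A": [a], "B": [b]})

# Option 2: keep the scalars and pass an explicit index (assumed to be [0]).
df_with_index = pd.DataFrame({"A": a, "B": b}, index=[0])

print(df_from_lists)
print(df_with_index)
```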
Use df.to_numpy(). It's better than df.values; here's why.*

It's time to deprecate your usage of values and as_matrix(). pandas v0.24.0 introduced two new methods for obtaining NumPy arrays from pandas objects: to_numpy(), which is defined on Index, Series, and DataFrame objects, and array, which is defined on Index and Series objects only. If you visit …
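A short sketch of to_numpy() and array as described above, assuming pandas 0.24.0 or later. The sample frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one integer column and one float column.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]})

# DataFrame.to_numpy() returns a single NumPy array; mixed int/float
# columns are upcast to a common dtype (float64 here).
arr = df.to_numpy()

# Series.to_numpy() returns a one-dimensional NumPy array.
col = df["A"].to_numpy()

# Series.array returns a pandas array wrapper rather than a bare ndarray.
ext = df["A"].array

print(type(arr), arr.dtype, arr.shape)
print(type(col), type(ext))
```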