What is the fastest and most efficient way to append rows to a DataFrame?

As Mohit Motwani suggested, the fastest way is to collect the data into a list of dictionaries and then load it all into a DataFrame at once. Below are some example speed measurements: import pandas as pd import numpy as np import time import random end_value = 10000 Measurement for creating a list of dictionaries and loading them all into a DataFrame at the end: start_time … Read more
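The approach described above can be sketched roughly as follows. This is a minimal illustration, not the original benchmark: the function name `build_frame_from_records` and the column names `id` and `value` are hypothetical, chosen only for the example.

```python
import pandas as pd


def build_frame_from_records(n_rows):
    """Collect rows as plain dicts, then build the DataFrame once at the end."""
    records = []
    for i in range(n_rows):
        # Appending to a Python list is cheap; no DataFrame is touched here.
        records.append({"id": i, "value": i * 2})
    # A single constructor call turns the list of dicts into a DataFrame.
    return pd.DataFrame(records)


df = build_frame_from_records(1000)
print(df.shape)
```

The point of the design is that the DataFrame is constructed a single time, instead of being copied and extended once per row.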

Is there a query method or similar for pandas Series (pandas.Series.query())?

If I understand correctly, you can add query("Points > 100"): df = pd.DataFrame({'Points':[50,20,38,90,0, np.inf], 'Player':['a','a','a','s','s','s']}) print (df) Player Points 0 a 50.000000 1 a 20.000000 2 a 38.000000 3 s 90.000000 4 s 0.000000 5 s inf points_series = df.query("Points < inf").groupby("Player").agg({"Points": "sum"})['Points'] print (points_series) a = points_series[points_series > 100] print (a) Player a 108.0 … Read more
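As a small self-contained sketch of filtering a Series in a chained style, the example below uses the same sample data. Two things here are assumptions of this sketch rather than part of the answer above: the infinite value is dropped with `np.isfinite` instead of `query`, and the final filter uses `.loc` with a callable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Points": [50, 20, 38, 90, 0, np.inf],
    "Player": ["a", "a", "a", "s", "s", "s"],
})

# Keep only rows with finite points (swapped in for the query() call).
finite = df[np.isfinite(df["Points"])]

# Sum the points per player; the result is a Series indexed by Player.
points_series = finite.groupby("Player")["Points"].sum()

# .loc accepts a callable, which gives a query-like filter on a Series.
above_100 = points_series.loc[lambda s: s > 100]
print(above_100)
```

Passing a callable to `.loc` keeps the filter inside a method chain, which is the closest a Series gets to a `query`-style call in this sketch.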

Pandas pd.Series.isin performance with set versus array

This might not be obvious, but pd.Series.isin uses an O(1) lookup per element. After an analysis that proves this statement, we will use its insights to create a Cython prototype that can easily beat the fastest out-of-the-box solution. Let's assume that the "set" has n elements and the "series" has m elements. The running time is then: … Read more
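To make the setup concrete, here is a minimal sketch showing that `isin` accepts either an array or a set and returns the same mask. The last part is a plain-Python illustration of a per-element hash lookup; it is an assumption of this sketch and not how pandas implements `isin` internally. The variable names are hypothetical.

```python
import numpy as np
import pandas as pd

series = pd.Series([1, 5, 7, 5, 2])
wanted_array = np.array([5, 2])
wanted_set = {5, 2}

# Both calls return a boolean Series aligned with `series`.
mask_from_array = series.isin(wanted_array)
mask_from_set = series.isin(wanted_set)

# Illustration only: build a hash set once, then probe it once per element.
lookup = set(wanted_array.tolist())
mask_manual = [value in lookup for value in series.tolist()]

print(mask_from_array.tolist())
```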

Pandas reset index on series to remove multiindex

Just call reset_index(): In [130]: s Out[130]: 0 1 1999-03-31 SOLD_PRICE NaN 1999-06-30 SOLD_PRICE NaN 1999-09-30 SOLD_PRICE NaN 1999-12-31 SOLD_PRICE 3 2000-03-31 SOLD_PRICE 3 Name: 2, dtype: float64 In [131]: s.reset_index() Out[131]: 0 1 2 0 1999-03-31 SOLD_PRICE NaN 1 1999-06-30 SOLD_PRICE NaN 2 1999-09-30 SOLD_PRICE NaN 3 1999-12-31 SOLD_PRICE 3 4 2000-03-31 SOLD_PRICE 3 … Read more
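A minimal runnable sketch of the same idea follows. The level names `date` and `field` and the Series name `value` are hypothetical, added so the resulting columns are easy to read; the original Series used numeric labels.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("1999-03-31", "SOLD_PRICE"),
        ("1999-06-30", "SOLD_PRICE"),
        ("1999-12-31", "SOLD_PRICE"),
    ],
    names=["date", "field"],
)
s = pd.Series([np.nan, np.nan, 3.0], index=index, name="value")

# Move every index level into columns; the result is a DataFrame.
flat = s.reset_index()

# Drop a single level instead; the result stays a Series indexed by date.
by_date = s.reset_index(level="field", drop=True)

print(flat)
print(by_date)
```

Which form fits depends on whether the index values are still needed: `reset_index()` keeps them as columns, while `drop=True` discards the chosen level.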

Sort dataframe by string length

You can reindex by the index of the Series created by str.len and sort_values: print (df.name.str.len()) 0 5 1 2 2 6 3 4 Name: name, dtype: int64 print (df.name.str.len().sort_values()) 1 2 3 4 0 5 2 6 Name: name, dtype: int64 s = df.name.str.len().sort_values().index print (s) Int64Index([1, 3, 0, 2], dtype='int64') print (df.reindex(s)) name score … Read more
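The steps above can be put together in a short runnable sketch. The sample names and scores are made up for illustration; only their lengths (5, 2, 6, 4) mirror the output shown. The `key=` variant at the end is an alternative that assumes pandas 1.1 or later.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["alpha", "be", "gammas", "delt"],
    "score": [1, 2, 3, 4],
})

# Compute the string lengths, sort them, and keep the resulting index order.
order = df["name"].str.len().sort_values().index

# Reindex the original frame by that order.
sorted_df = df.reindex(order)

# Alternative: let sort_values apply the length function itself.
sorted_with_key = df.sort_values("name", key=lambda col: col.str.len())

print(sorted_df)
```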

Python Pandas iterate over rows and access column names

I also like itertuples(): for row in df.itertuples(): print(row.A) print(row.Index) Since row is a named tuple, this should be MUCH faster if you need to access the values in each row. Speed run: df = pd.DataFrame([x for x in range(1000*1000)], columns=['A']) st=time.time() for index, row in df.iterrows(): row.A print(time.time()-st) 45.05799984931946 st=time.time() for row in df.itertuples(): … Read more
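A minimal sketch of iterating with `itertuples()` and reading the column names from the named tuple is shown below. The small two-column frame is an assumption made for the example, and no timings are reproduced here.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": ["x", "y", "z"]})

seen = []
for row in df.itertuples():
    # Each row is a named tuple: columns are attributes, the index is row.Index.
    seen.append((row.Index, row.A, row.B))

# The field names of the named tuple expose the column names.
first_row = next(df.itertuples())
fields = first_row._fields

print(seen)
print(fields)
```

Attribute access only works for column names that are valid Python identifiers; for other names, positional access on the tuple is the fallback.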
