series – Page 2 – Tarik Billa

Print series of prime numbers in python

September 14, 2023 by Tarik

You need to check every number from 2 to n-1 (checking up to sqrt(n) is actually enough, but ok, let it be n). If n is divisible by any of these numbers, it is not prime. If a number is prime, print it.

    for num in range(2,101):
        prime = True
        for i in range(2,num):
            if (num%i==0):
                prime = False
… Read more
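The excerpt is cut off before the print step. As a minimal sketch of the same trial-division idea, this time using the sqrt(n) bound mentioned in the text (the helper name `is_prime` and the use of `math.isqrt` are choices made here, not taken from the original post):

```python
import math


def is_prime(n):
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    # A composite n always has a divisor no larger than its square root.
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True


# Same range as the excerpt: the numbers 2..100.
primes = [num for num in range(2, 101) if is_prime(num)]
for p in primes:
    print(p)
```

Stopping at the square root only changes how much work is done per number, not which numbers are printed.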

What is the fastest and most efficient way to append rows to a DataFrame?

September 8, 2023 by Tarik

As Mohit Motwani suggested, the fastest way is to collect the data into a list of dictionaries and then load it all into a data frame at once. Below are some example speed measurements:

    import pandas as pd
    import numpy as np
    import time
    import random

    end_value = 10000

Measurement for creating a list of dictionaries and loading it all into a data frame at the end:

start_time … Read more
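The measurement code itself is truncated here, so the following is only a sketch of the pattern being described, not the original benchmark. The column names `id` and `value` are made up for illustration, and `end_value` is reduced to keep it quick:

```python
import random

import pandas as pd

end_value = 1000  # smaller than the excerpt's 10000 to keep the sketch quick

# Collect each row as a plain dict; appending to a Python list is cheap.
rows = []
for i in range(end_value):
    rows.append({"id": i, "value": random.random()})  # hypothetical columns

# Build the DataFrame once, after all rows have been collected.
df = pd.DataFrame(rows)
print(df.shape)
```

The point of the pattern is that the data frame is constructed a single time instead of being grown row by row.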

Is there a query method or similar for pandas Series (pandas.Series.query())?

September 2, 2023 by Tarik

If I understand correctly, you can add query("Points > 100"):

    df = pd.DataFrame({'Points':[50,20,38,90,0, np.inf],
                       'Player':['a','a','a','s','s','s']})
    print (df)
      Player     Points
    0      a  50.000000
    1      a  20.000000
    2      a  38.000000
    3      s  90.000000
    4      s   0.000000
    5      s        inf

    points_series = df.query("Points < inf").groupby("Player").agg({"Points": "sum"})['Points']
    print (points_series)

    a = points_series[points_series > 100]
    print (a)
    Player
    a    108.0
… Read more
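As far as I know, a Series has no query() method of its own, so the filtering is usually done with a boolean mask, as in the last step above. A small sketch of three equivalent ways to write that filter (the totals below are hypothetical stand-ins for `points_series`):

```python
import pandas as pd

# Hypothetical per-player totals, standing in for points_series.
points_series = pd.Series({"a": 108.0, "s": 90.0}, name="Points")
points_series.index.name = "Player"

# Boolean mask: keep only entries greater than 100.
by_mask = points_series[points_series > 100]

# Callable indexer: the same filter, usable inside a method chain.
by_callable = points_series.loc[lambda s: s > 100]

# Round trip through a DataFrame when the query() string syntax is wanted.
by_query = points_series.to_frame().query("Points > 100")["Points"]

print(by_mask)
```

The callable form is handy when the Series is an intermediate result that has no name to refer to.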

how to convert a Series of arrays into a single matrix in pandas/numpy?

August 29, 2023 by Tarik

Another way is to extract the values of your series and use numpy.stack on them:

    np.stack(s.values)

PS. I’ve run into similar situations often.
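A minimal sketch of that call on a made-up Series, assuming every element is a 1-D array of the same length:

```python
import numpy as np
import pandas as pd

# Hypothetical Series whose elements are equal-length 1-D arrays.
s = pd.Series([np.array([1, 2]), np.array([3, 4]), np.array([5, 6])])

# Stack the per-row arrays along a new first axis to get a 2-D matrix.
matrix = np.stack(s.values)
print(matrix.shape)  # (3, 2)
```

If the arrays differ in length, np.stack raises an error instead of producing a ragged result.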

Pandas mask / where methods versus NumPy np.where

August 24, 2023 by Tarik

I’m using pandas 0.23.3 and Python 3.6, so I can see a real difference in running time only for your second example. But let’s investigate a slightly different version of your second example (so we get 2*df[0] out of the way). Here is our baseline on my machine:

    twice = df[0]*2
    mask = df[0] > 0.5
… Read more
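For orientation, here is a sketch of how the three approaches in the title relate to each other. It is not the benchmark from the post; the single-column frame of random numbers is an assumption made for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1000, 1)))  # hypothetical single-column frame

twice = df[0] * 2
mask = df[0] > 0.5

# Series.mask replaces values where the condition is True.
via_mask = df[0].mask(mask, twice)

# Series.where keeps values where the condition is True, so negate it.
via_where = df[0].where(~mask, twice)

# np.where picks element-wise and returns a plain ndarray.
via_numpy = pd.Series(np.where(mask, twice, df[0]), index=df.index)
```

All three produce the same values; the difference the post goes on to measure is in running time.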

Pandas pd.Series.isin performance with set versus array

August 15, 2023 by Tarik

This might not be obvious, but pd.Series.isin uses an O(1) lookup per element. After an analysis that proves this statement, we will use its insights to create a Cython prototype that can easily beat the fastest out-of-the-box solution. Let’s assume that the “set” has n elements and the “series” has m elements. The running time is then: … Read more
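To make the idea of a constant-time lookup per element concrete, here is a pure-Python sketch of hash-based membership testing. It is a conceptual stand-in only, not pandas' actual implementation, and `isin_sketch` is a name invented for this example:

```python
def isin_sketch(series_values, lookup_values):
    """Hash-based membership test: a conceptual stand-in, not pandas' code."""
    # Build the hash table once; this touches each of the n lookup values.
    table = set(lookup_values)
    # Each of the m membership checks is then O(1) on average.
    return [value in table for value in series_values]


result = isin_sketch([1, 5, 2, 9], [1, 2, 3])
print(result)
```

The table is built once up front, so the per-element cost of the checks does not grow with the size of the lookup collection.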

Pandas reset index on series to remove multiindex

July 31, 2023 by Tarik

Just call reset_index():

    In [130]: s
    Out[130]:
    0           1
    1999-03-31  SOLD_PRICE   NaN
    1999-06-30  SOLD_PRICE   NaN
    1999-09-30  SOLD_PRICE   NaN
    1999-12-31  SOLD_PRICE     3
    2000-03-31  SOLD_PRICE     3
    Name: 2, dtype: float64

    In [131]: s.reset_index()
    Out[131]:
                0           1   2
    0  1999-03-31  SOLD_PRICE NaN
    1  1999-06-30  SOLD_PRICE NaN
    2  1999-09-30  SOLD_PRICE NaN
    3  1999-12-31  SOLD_PRICE   3
    4  2000-03-31  SOLD_PRICE   3
… Read more
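A self-contained sketch of the same call, on a small made-up Series (the level names `date` and `field` and the Series name `value` are assumptions for illustration):

```python
import pandas as pd

# Hypothetical two-level index; the level and Series names are illustrative.
index = pd.MultiIndex.from_tuples(
    [("1999-12-31", "SOLD_PRICE"), ("2000-03-31", "SOLD_PRICE")],
    names=["date", "field"],
)
s = pd.Series([3.0, 3.0], index=index, name="value")

# reset_index() moves every index level into an ordinary column,
# so the result is a DataFrame with a default integer index.
flat = s.reset_index()
print(flat)

# drop=True discards the index levels instead and keeps a Series.
values_only = s.reset_index(drop=True)
```

Use drop=True only when the index levels are not needed afterwards, since they are thrown away.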

Sort dataframe by string length

July 15, 2023 by Tarik

You can use reindex with the index of the Series of string lengths, sorted with sort_values:

    print (df.name.str.len())
    0    5
    1    2
    2    6
    3    4
    Name: name, dtype: int64

    print (df.name.str.len().sort_values())
    1    2
    3    4
    0    5
    2    6
    Name: name, dtype: int64

    s = df.name.str.len().sort_values().index
    print (s)
    Int64Index([1, 3, 0, 2], dtype='int64')

    print (df.reindex(s))
    name score … Read more
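The frame used in the answer is not shown in the excerpt, so here is a runnable sketch of the same steps on made-up data whose name lengths match the printed ones (5, 2, 6, 4):

```python
import pandas as pd

# Hypothetical data; the names are chosen to have lengths 5, 2, 6 and 4.
df = pd.DataFrame({"name": ["alice", "al", "robert", "jack"],
                   "score": [1, 2, 3, 4]})

# Index labels ordered by the length of the string in the "name" column.
order = df["name"].str.len().sort_values().index

# Reindexing by those labels reorders the rows from shortest to longest name.
sorted_df = df.reindex(order)
print(sorted_df)
```

The original row labels are kept, which makes it easy to see where each row came from.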

how to convert pandas series to tuple of index and value

July 14, 2023 by Tarik

Well, it seems simply zip(s,s.index) works too! For Python 3.x, we need to wrap it with list: list(zip(s,s.index)). To get a tuple of tuples, use tuple(): tuple(zip(s,s.index)).

Sample run:

    In [8]: s
    Out[8]:
    a    1
    b    2
    c    3
    dtype: int64

    In [9]: list(zip(s,s.index))
    Out[9]: [(1, 'a'), (2, 'b'), (3, 'c')]

    In [10]: … Read more
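Note that zip(s, s.index) puts the value first. If the (index, value) order from the title is what you are after, a sketch of two ways to get it, using the same Series as the sample run:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# zip(s, s.index) yields (value, index) pairs, as in the sample run.
value_first = list(zip(s, s.index))

# Swapping the arguments, or using Series.items(), yields (index, value).
index_first = list(zip(s.index, s))
via_items = tuple(s.items())

print(index_first)
```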

Python Pandas iterate over rows and access column names

July 11, 2023 by Tarik

I also like itertuples():

    for row in df.itertuples():
        print(row.A)
        print(row.Index)

Since row is a named tuple, this should be MUCH faster if you mean to access the values on each row.

Speed run:

    df = pd.DataFrame([x for x in range(1000*1000)], columns=['A'])
    st=time.time()
    for index, row in df.iterrows():
        row.A
    print(time.time()-st)
    45.05799984931946

    st=time.time()
    for row in df.itertuples():
… Read more
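A small sketch of what each row looks like, on a made-up two-column frame, including how the column names can be read off the named tuple:

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": ["x", "y", "z"]})  # hypothetical data

seen = []
for row in df.itertuples():
    # Each row is a namedtuple: columns are attributes, the index is row.Index.
    seen.append((row.Index, row.A, row.B))

# The namedtuple's field names are "Index" followed by the column names.
fields = row._fields
print(fields)
```

Attribute access like row.A only works for column names that are valid Python identifiers; other columns are renamed positionally by itertuples.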