indexing – Page 14 – Tarik Billa

Get row and column indices of matches using `which()`

July 25, 2023 by Tarik

Python: Removing Rows on Count condition

July 23, 2023 by Tarik

Here you go, with `filter`: `df.groupby('city').filter(lambda x: len(x) > 3)`. In the example, `Out[1743]` shows only the `city` column with the four NYC rows (index 0 to 3). Solution two uses `transform`: `sub_df = df[df.groupby('city').city.transform('count') > 3].copy()`. The `.copy()` is there to avoid a warning later, when you need to modify the sub-DataFrame.
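A minimal sketch of both approaches side by side, to check that they select the same rows. The sample data is invented for illustration (only the `city` column name comes from the excerpt above).

```python
import pandas as pd

# Hypothetical sample data: NYC appears four times, the other cities fewer.
df = pd.DataFrame({"city": ["NYC", "NYC", "NYC", "NYC", "LA", "LA", "SF"]})

# Approach 1: keep only the groups that have more than 3 rows.
filtered = df.groupby("city").filter(lambda g: len(g) > 3)

# Approach 2: broadcast each group's row count back onto its rows, then mask.
mask = df.groupby("city")["city"].transform("count") > 3
sub_df = df[mask].copy()  # copy so that later edits do not act on a slice of df

print(filtered)
print(filtered.equals(sub_df))
```

On large frames the `transform` variant is often the faster of the two, since it avoids calling a Python lambda once per group, but that is worth measuring on your own data.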

How to disable index in innodb

July 22, 2023 by Tarik

Have you tried the following? `SET autocommit=0; SET unique_checks=0; SET foreign_key_checks=0;` This is from the MySQL Reference Manual: https://dev.mysql.com/doc/refman/8.0/en/optimizing-innodb-bulk-data-loading.html See the section "Bulk Data Loading Tips".

git: Unable to index file – permission denied

July 22, 2023 by Tarik

If you are using Visual Studio or something similar that is generating the .mdf file, simply close it and retry your git command. This time it should work. To avoid constantly closing and reopening the IDE, add the relevant entries to the .gitignore file in the project root. For example, if it is a database causing the …

How does a geospatial index work? [closed]

July 20, 2023 by Tarik

Depending on the data type and usage pattern, it is typically an R-tree or a variant (R*, R+), a quadtree, or perhaps even a k-d tree.
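To make one of these structures concrete, here is a minimal point-quadtree sketch. It is an illustration only, not how any particular database implements its spatial index; the class name, the leaf capacity of 4, and the sample coordinates are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class QuadTree:
    # Region covered by this node: [x0, x1) x [y0, y1)
    x0: float
    y0: float
    x1: float
    y1: float
    capacity: int = 4  # points a leaf holds before it splits (arbitrary choice)
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def insert(self, x, y):
        if not (self.x0 <= x < self.x1 and self.y0 <= y < self.y1):
            return False  # point lies outside this node's region
        if not self.children:
            if len(self.points) < self.capacity:
                self.points.append((x, y))
                return True
            self._split()
        return any(child.insert(x, y) for child in self.children)

    def _split(self):
        # Divide the region into four quadrants and push the points down.
        # Assumes distinct points; a real implementation would cap the depth.
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        self.children = [
            QuadTree(self.x0, self.y0, mx, my, self.capacity),
            QuadTree(mx, self.y0, self.x1, my, self.capacity),
            QuadTree(self.x0, my, mx, self.y1, self.capacity),
            QuadTree(mx, my, self.x1, self.y1, self.capacity),
        ]
        for px, py in self.points:
            any(child.insert(px, py) for child in self.children)
        self.points = []

    def query(self, qx0, qy0, qx1, qy1):
        """Return the points inside the rectangle [qx0, qx1] x [qy0, qy1]."""
        if qx1 < self.x0 or qx0 >= self.x1 or qy1 < self.y0 or qy0 >= self.y1:
            return []  # query box does not overlap this node, so skip the subtree
        found = [(x, y) for x, y in self.points
                 if qx0 <= x <= qx1 and qy0 <= y <= qy1]
        for child in self.children:
            found.extend(child.query(qx0, qy0, qx1, qy1))
        return found


tree = QuadTree(0, 0, 100, 100)
sample = [(i * 7 % 100, i * 13 % 100) for i in range(50)]  # made-up, distinct points
for px, py in sample:
    tree.insert(px, py)
print(sorted(tree.query(0, 0, 50, 50)))
```

The point of the structure is the early return in `query`: whole quadrants that do not overlap the search rectangle are skipped without looking at their points.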

R-Tree and Quadtree Comparison

July 20, 2023 by Tarik

Here's a paper with a pretty nice comparison of quadtrees and R-trees: "Quadtree and R-tree Indexes in Oracle Spatial: A Comparison using GIS Data". Some differences: Quadtrees require fine-tuning by choosing an appropriate tiling level in order to optimize performance. No specific tuning is required for R-trees. A quadtree can be implemented on top of an existing B-tree. …

PostgreSQL UUID type performance

July 19, 2023 by Tarik

We had a table with about 30k rows that (for a specific, unrelated architectural reason) had UUIDs stored in a text field and indexed. I noticed that query performance was slower than I'd have expected. I created a new UUID column, copied in the text UUID primary key, and compared below: 2.652 ms vs 0.029 ms. …

Speeding up row counting in MySQL

July 15, 2023 by Tarik

So the question is: are there any techniques for speeding up these kinds of queries? Well, not really. A column-based storage engine would probably be faster for those `SELECT COUNT(*)` queries, but it would perform worse for pretty much any other query. Your best bet is to maintain a summary table via triggers. It …
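A minimal sketch of the summary-table idea. To keep it self-contained it runs against an in-memory SQLite database through Python's `sqlite3` module rather than MySQL, so the trigger syntax differs slightly from what MySQL expects; the table and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);

-- One-row summary table that holds the current row count.
CREATE TABLE items_count (n INTEGER NOT NULL);
INSERT INTO items_count VALUES (0);

CREATE TRIGGER items_after_insert AFTER INSERT ON items
BEGIN
    UPDATE items_count SET n = n + 1;
END;

CREATE TRIGGER items_after_delete AFTER DELETE ON items
BEGIN
    UPDATE items_count SET n = n - 1;
END;
""")

conn.executemany("INSERT INTO items (name) VALUES (?)", [("a",), ("b",), ("c",)])
conn.execute("DELETE FROM items WHERE name = 'b'")

# Reading the counter is a single-row lookup instead of a scan of the table.
fast_count = conn.execute("SELECT n FROM items_count").fetchone()[0]
slow_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(fast_count, slow_count)
```

The trade-off is that every insert and delete now also pays for an update of the counter row, so this suits tables that are read far more often than they are written.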

Retrieve name of column from its Index in Pandas

July 15, 2023 by Tarik

I think you need to index the column names by position (Python counts from 0, so for the fourth column you need 3): `colname = df.columns[pos]`
Sample:
df = pd.DataFrame({'A':[1,2,3], 'B':[4,5,6], 'C':[7,8,9], 'D':[1,3,5], 'E':[5,3,6], 'F':[7,4,3]})
print(df)
   A  B  C  D  E  F
0  1  4  7  1  5  7
1  2  5  8  3  3  4
2  3 …
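A runnable version of the lookup, using the sample frame from the excerpt. The reverse lookup with `get_loc` is an addition for illustration and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9],
                   "D": [1, 3, 5], "E": [5, 3, 6], "F": [7, 4, 3]})

pos = 3                     # zero-based position of the fourth column
colname = df.columns[pos]   # label stored at that position
print(colname)              # D

# Going the other way, from label to position:
print(df.columns.get_loc("D"))
```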

In Python pandas, start row index from 1 instead of zero without creating additional column

July 10, 2023 by Tarik

Just assign a new index array directly: `df.index = np.arange(1, len(df) + 1)`
Example:
In [151]: df = pd.DataFrame({'a':np.random.randn(5)})
          df
Out[151]:
          a
0  0.443638
1  0.037882
2 -0.210275
3 -0.344092
4  0.997045
In [152]: df.index = np.arange(1,len(df)+1)
          df
Out[152]:
          a
1  0.443638
2  0.037882
3 -0.210275
4 -0.344092
5  0.997045
Or just: df.index = df.index …
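The same assignment as a self-contained script. The `pd.RangeIndex` variant at the end is an alternative added here for illustration; it is not taken from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.random.randn(5)})

# Replace the default 0-based index with labels 1..len(df); no column is added.
df.index = np.arange(1, len(df) + 1)
print(df.index.tolist())

# Alternative (an assumption of this sketch): a lazy RangeIndex with the same labels.
df2 = pd.DataFrame({"a": np.random.randn(5)})
df2.index = pd.RangeIndex(start=1, stop=len(df2) + 1)
print(df2.index.tolist())
```

Note that only the labels change: positional access such as `df.iloc[0]` still returns the first row, while `df.loc[1]` now refers to that same row by label.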