PySpark: count rows on condition

count doesn't sum True values; it only counts the number of non-null values. To count the True values, convert the condition to 1/0 and then sum:

import pyspark.sql.functions as F

cnt_cond = lambda cond: F.sum(F.when(cond, 1).otherwise(0))

test.groupBy('x').agg(
    cnt_cond(F.col('y') > 12453).alias('y_cnt'),
    cnt_cond(F.col('z') > 230).alias('z_cnt')
).show()

+---+-----+-----+
|  x|y_cnt|z_cnt|
+---+-----+-----+
| bn| … Read more
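For readers who want to run the answer end to end, here is a minimal self-contained sketch; the test DataFrame, its column names, and the sample values are assumptions chosen to make the snippet reproducible, not data from the original question.

# Hedged, runnable sketch; the contents of `test` are invented for illustration.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.master("local[*]").getOrCreate()

test = spark.createDataFrame(
    [("bn", 12452, 221), ("mb", 14521, 330), ("mb", 14843, 231), ("bn", 2, 220)],
    ["x", "y", "z"],
)

# Sum 1 for every row where the condition holds, 0 otherwise.
cnt_cond = lambda cond: F.sum(F.when(cond, 1).otherwise(0))

test.groupBy("x").agg(
    cnt_cond(F.col("y") > 12453).alias("y_cnt"),
    cnt_cond(F.col("z") > 230).alias("z_cnt"),
).show()

An equivalent formulation is F.sum(cond.cast("int")), since casting a boolean column to int yields 1 for True and 0 for False.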

Counting frequency of values by date using pandas

It might be easiest to turn your Series into a DataFrame and use pandas' groupby functionality (if you already have a DataFrame, skip straight to adding another column below). If your Series is called s, turn it into a DataFrame like so:

>>> df = pd.DataFrame({'Timestamp': s.index, 'Category': s.values})
>>> df
   Category   Timestamp … Read more
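Here is a hedged, self-contained sketch of the full workflow, counting how often each category occurs per date; the Series s and its sample timestamps are invented for illustration.

import pandas as pd

# Invented sample data: a Series of category labels indexed by timestamps.
s = pd.Series(
    ["a", "b", "a", "a", "b"],
    index=pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 10:00", "2024-01-02 09:30",
        "2024-01-02 11:00", "2024-01-02 12:00",
    ]),
)

df = pd.DataFrame({"Timestamp": s.index, "Category": s.values})
df["Date"] = df["Timestamp"].dt.date  # the extra column mentioned above

# Frequency of each category per date.
counts = df.groupby(["Date", "Category"]).size().unstack(fill_value=0)
print(counts)
# Category    a  b
# Date
# 2024-01-01  1  1
# 2024-01-02  2  1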

Spark: How to translate count(distinct(value)) in the DataFrame API

What you need is the DataFrame aggregation function countDistinct:

import sqlContext.implicits._
import org.apache.spark.sql.functions._

case class Log(page: String, visitor: String)

val logs = data.map(p => Log(p._1, p._2)).toDF()

val result = logs.select("page", "visitor")
  .groupBy('page)
  .agg('page, countDistinct('visitor))

result.foreach(println)
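For comparison, here is a hedged PySpark sketch of the same aggregation in Python; the sample page/visitor rows are invented for illustration.

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.master("local[*]").getOrCreate()

logs = spark.createDataFrame(
    [("home", "v1"), ("home", "v1"), ("home", "v2"), ("about", "v1")],
    ["page", "visitor"],
)

# countDistinct mirrors SQL's count(distinct(...)) in the DataFrame API.
logs.groupBy("page").agg(
    F.countDistinct("visitor").alias("distinct_visitors")
).show()
# home -> 2 distinct visitors, about -> 1 (row order may vary)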
