Scala Partition/Collect Usage

collect (defined on TraversableLike and available in all subclasses) works with a collection and a PartialFunction. It also just so happens that a bunch of case clauses defined inside braces are a partial function (see section 8.5 of the Scala Language Specification [warning – PDF]). As in exception handling: try { … do something risky … Read more

Ruby: using the “&:methodname” shortcut from array.map(&:methodname) for hash key strings rather than methodname

You can do this with a lambda:

    extract_keyname = ->(h) { h[:keyname] }
    ary_of_hashes.map(&extract_keyname)

This tends to be more useful if the block’s logic is more complicated than simply extracting a value from a Hash. Also, attaching names to your bits of logic can help clarify what a chain of Enumerable method calls is trying … Read more

Java 8 is not maintaining the order while grouping

Not maintaining the order is a property of the Map that stores the result. If you need a specific Map behavior, you need to request a particular Map implementation. E.g. LinkedHashMap maintains the insertion order:

    groupedResult = people.collect(Collectors.groupingBy(
        p -> new GroupingKey(p, groupByColumns),
        LinkedHashMap::new,
        Collectors.mapping((Map<String, Object> p) -> p, toList())));

By the way, there is … Read more
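To illustrate the point about choosing the Map implementation, here is a minimal, self-contained sketch. The class name OrderedGrouping, the method groupByFirstLetter, and the sample words are all hypothetical and not taken from the original question; the only thing it shares with the answer is the three-argument form of Collectors.groupingBy with LinkedHashMap::new as the map factory.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class OrderedGrouping {

    // Hypothetical helper: groups non-empty words by their first character.
    // Passing LinkedHashMap::new as the map factory makes the keys iterate
    // in the order in which they were first encountered in the stream.
    static Map<Character, List<String>> groupByFirstLetter(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(
                (String w) -> w.charAt(0),
                LinkedHashMap::new,
                Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "apple", "plum", "banana", "avocado");
        // Keys are visited in first-encounter order: p, a, b
        System.out.println(groupByFirstLetter(words));
    }
}
```

With the single-argument groupingBy, the result is typically a HashMap, whose iteration order is unspecified, so the key order may differ from the encounter order.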

Static context cannot access non-static in Collectors

Unfortunately, the error message “Non-static method cannot be referenced from a static context.” is just a placeholder for any type mismatch problem when method references are involved. The compiler simply failed to determine the actual problem. In your code, the target type Map<Integer, Map<String, List<String>>> doesn’t match the result type of the combined collector which … Read more
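The original code is not shown in this excerpt, so the following is only a sketch under assumptions: a hypothetical class NestedGrouping whose nested groupingBy collectors produce exactly the target type named above, Map<Integer, Map<String, List<String>>>. It shows what a declared type that matches the combined collector's result type looks like; the grouping keys (word length, then first letter) are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NestedGrouping {

    // Hypothetical helper: groups non-empty words by length, then by first letter.
    // The outer groupingBy yields Map<Integer, D>, where D is the result type of
    // the downstream collector; the inner groupingBy yields Map<String, List<String>>.
    // The declared return type therefore matches the combined collector's result.
    static Map<Integer, Map<String, List<String>>> byLengthThenInitial(List<String> words) {
        return words.stream().collect(
                Collectors.groupingBy(
                        String::length,
                        Collectors.groupingBy((String w) -> w.substring(0, 1))));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ant", "bee", "bear", "ape");
        System.out.println(byLengthThenInitial(words));
    }
}
```

If the declared type and the collector's result type disagree, one way to locate the mismatch is to assign each downstream collector to an explicitly typed local variable, so that the compiler reports the failing step instead of the method reference.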

pyspark collect_set or collect_list with groupby

You need to use agg. Example:

    from pyspark import SparkContext
    from pyspark.sql import HiveContext
    from pyspark.sql import functions as F

    sc = SparkContext("local")
    sqlContext = HiveContext(sc)
    df = sqlContext.createDataFrame([
        ("a", None, None),
        ("a", "code1", None),
        ("a", "code2", "name2"),
    ], ["id", "code", "name"])
    df.show()

    +---+-----+-----+
    | id| code| name|
    +---+-----+-----+
    |  a| null| null|
    |  a|code1| … Read more

Extract a dplyr tbl column as a vector

With dplyr >= 0.7.0, you can use pull() to get a vector from a tbl.

    library(dplyr, warn.conflicts = FALSE)
    db <- src_sqlite(tempfile(), create = TRUE)
    iris2 <- copy_to(db, iris)
    vec <- pull(iris2, Species)
    head(vec)
    #> [1] "setosa" "setosa" "setosa" "setosa" "setosa" "setosa"
