Spark final task takes 100x longer than the first 199, how to improve

Spark >= 3.0: Since 3.0, Spark provides built-in optimizations for handling skewed joins, which can be enabled via the spark.sql.adaptive.skewJoin.enabled property (see SPARK-29544 for details). Spark < 3.0: You clearly have a problem with huge data skew on the right side of the join. Let's take a look at the statistics you've provided: df1 = [mean=4.989209978967438, stddev=2255.654165352454, count=2400088] df2 = … Read more
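A minimal sketch of enabling the Spark >= 3.0 path, assuming an existing SparkSession named `spark`. The property names below are the ones shipped in released Spark 3.x; adaptive query execution must be on for the skew-join optimization to take effect, and the two threshold values shown are the defaults, listed here only to show what you would tune:

```scala
// Enable adaptive query execution and its skewed-join optimization (Spark >= 3.0).
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

// Knobs controlling when a partition counts as skewed (defaults shown; tune to your data).
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "5")
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "256MB")
```

With these set, Spark splits oversized join partitions at runtime instead of funneling the skewed keys into one long-running final task.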

Integrating native system libraries with SBT

From the research I’ve done in the past, there are only two ways to get native libraries loaded: modifying java.library.path and using System.loadLibrary (I feel like most people do this), or using System.load with an absolute path. As you’ve alluded to, messing with java.library.path can be annoying in terms of configuring SBT and Eclipse, and … Read more
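A small sketch of the second approach (loading by absolute path, so java.library.path never needs configuring). The `NativeLoader` object and the `/opt/native` directory are illustrative, not from the answer; `System.mapLibraryName` is the standard JDK call that turns a base name into the platform-specific file name (`libfoo.so`, `foo.dll`, `libfoo.dylib`):

```scala
import java.io.File

object NativeLoader {
  // Build the platform-specific absolute path for a native library base name.
  def nativeLibPath(dir: String, base: String): String =
    new File(dir, System.mapLibraryName(base)).getAbsolutePath

  // Load by absolute path: no java.library.path juggling in SBT or Eclipse.
  def loadFrom(dir: String, base: String): Unit =
    System.load(nativeLibPath(dir, base))
}
```

For example, `NativeLoader.loadFrom("/opt/native", "mylib")` would load `/opt/native/libmylib.so` on Linux.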

Spark : Read file only if the path exists

You can filter out the irrelevant files as in @Psidom’s answer. In Spark, the best way to do so is to use Spark’s internal Hadoop configuration. Given that the Spark session variable is called “spark”, you can do: import org.apache.hadoop.fs.FileSystem import org.apache.hadoop.fs.Path val hadoopfs: FileSystem = FileSystem.get(spark.sparkContext.hadoopConfiguration) def testDirExist(path: String): Boolean = { val p … Read more
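The filtering pattern itself can be sketched independently of Hadoop. The hypothetical helper below takes the existence check as a parameter, so the same logic serves local files (`java.nio.file.Files.exists`) or HDFS, where you would pass `p => hadoopfs.exists(new Path(p))` using the `FileSystem` handle from the answer:

```scala
// Keep only the paths that actually exist; the existence check is pluggable,
// so it works for local filesystems and HDFS alike.
def existingPaths(paths: Seq[String], exists: String => Boolean): Seq[String] =
  paths.filter(exists)
```

You could then read only the surviving paths, e.g. `spark.read.parquet(existingPaths(candidates, p => hadoopfs.exists(new Path(p))): _*)` (names here are illustrative).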

Defining a UDF that accepts an Array of objects in a Spark DataFrame?

What you’re looking for is Seq[o.a.s.sql.Row]: import org.apache.spark.sql.Row val my_size = udf { subjects: Seq[Row] => subjects.size } Explanation: The current representation of ArrayType is, as you already know, WrappedArray, so Array won’t work and it is better to stay on the safe side. According to the official specification, the local (external) type for StructType is … Read more
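A plain-Scala sketch of why declaring the parameter as Seq is the safe choice: Spark hands an ArrayType column to a UDF as a wrapped array, which is a Seq but not an Array, so a function written against Seq accepts it (and any other sequence) without caring about the concrete class:

```scala
// A function written against Seq accepts whatever concrete sequence the
// caller passes, including the wrapped arrays Spark uses for ArrayType.
val mySize: Seq[Any] => Int = _.size

// Arrays are not Seqs themselves, but Scala wraps them implicitly into one
// (WrappedArray in Scala 2.12, ArraySeq in 2.13).
val wrapped: Seq[Int] = Array(1, 2, 3)
```

Declaring the UDF argument as `Array[Row]` would fail at runtime precisely because the value Spark passes is the wrapper, not a raw array.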

Difference between Scala’s existential types and Java’s wildcard by example?

This is Martin Odersky’s answer on the Scala-users mailing list: The original Java wildcard types (as described in the ECOOP paper by Igarashi and Viroli) were indeed just shorthands for existential types. I am told and I have read in the FOOL ’05 paper on Wild FJ that the final version of wildcards has some … Read more
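The correspondence in its simplest form (a sketch, not from the mailing-list post): Java’s wildcard `List<?>` is shorthand for an existential type, which Scala 2 writes as `List[_]`, itself sugar for `List[T] forSome { type T }`:

```scala
// Java:    static int size(List<?> xs) { return xs.size(); }
// Scala 2: List[_] is sugar for the existential List[T] forSome { type T }.
def size(xs: List[_]): Int = xs.size
```

Both versions say the same thing: the list holds elements of *some* unknown type, so only operations that do not depend on that type (like `size`) are available.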

Scala: circular references in immutable data types?

Let’s try to work it out step by step. As a rule of thumb, when creating an immutable object, all constructor parameters should be known at the point of instantiation — but let’s cheat and pass constructor parameters by name, then use lazy fields to delay evaluation, so we can create a bidirectional link between elements: … Read more
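The by-name-plus-lazy trick described above can be sketched as follows (the `Person` class and the names are illustrative, not from the answer). Neither constructor argument is evaluated until the corresponding `lazy val` is first accessed, by which point both objects exist:

```scala
object Circular {
  // A by-name constructor parameter plus a lazy field delays evaluation,
  // letting two immutable objects reference each other.
  final class Person(val name: String, friend0: => Person) {
    lazy val friend: Person = friend0
  }

  lazy val alice: Person = new Person("Alice", bob)
  lazy val bob: Person   = new Person("Bob", alice)
}
```

Following the cycle round-trips to the same instance: `Circular.alice.friend.friend` is `Circular.alice` itself.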

Cake pattern with Java8 possible?

With inspiration from other answers I came up with the following (rough) class hierarchy that is similar to the cake pattern in Scala: interface UserRepository { String authenticate(String username, String password); } interface UserRepositoryComponent { UserRepository getUserRepository(); } interface UserServiceComponent extends UserRepositoryComponent { default UserService getUserService() { return new UserService(getUserRepository()); } } class UserService { … Read more
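For comparison, here is a minimal Scala version of the wiring the Java 8 sketch imitates — self-types and an abstract member instead of default interface methods. The names mirror the excerpt; the stub repository returning a token string is an assumption for illustration:

```scala
trait UserRepository { def authenticate(user: String, password: String): String }

// A component declares the dependency it provides.
trait UserRepositoryComponent {
  def userRepository: UserRepository
}

// The self-type says: this component can only be mixed in where a
// UserRepositoryComponent is also present.
trait UserServiceComponent { this: UserRepositoryComponent =>
  class UserService {
    def login(user: String, password: String): String =
      userRepository.authenticate(user, password)
  }
  lazy val userService = new UserService
}

// "Baking the cake": mix the components and supply a concrete repository.
object App extends UserRepositoryComponent with UserServiceComponent {
  val userRepository: UserRepository = new UserRepository {
    def authenticate(user: String, password: String): String = "token-" + user
  }
}
```

The Java 8 version replaces the self-type with interface inheritance and the `lazy val` with a `default` method, which is as close as Java gets.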

Google guava vs Scala collection framework comparison

Google Guava is a fantastic library, there’s no doubt about it. However, it’s implemented in Java and suffers from all the restrictions that that implies: no immutable collection interface in the standard library; no lambda literals (closures), so there’s some heavy boilerplate around the SAM types needed for e.g. predicates; lots of duplication in type … Read more
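A sketch of the first two points: Scala’s standard library gives you immutable collections and closures directly, where pre-lambda Java with Guava needed an anonymous `Predicate` class (the Guava call shown in the comment is `Iterables.filter`):

```scala
// Immutable by default; filter takes a plain closure instead of a SAM type.
val evens: List[Int] = List(1, 2, 3, 4).filter(_ % 2 == 0)

// The pre-lambda Java + Guava equivalent:
//   Iterables.filter(list, new Predicate<Integer>() {
//     public boolean apply(Integer n) { return n % 2 == 0; }
//   });
```

The result is a new immutable list; the original is untouched.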

Get companion object of class by given generic type Scala

A gist by Miles Sabin may give you a hint: trait Companion[T] { type C def apply() : C } object Companion { implicit def companion[T](implicit comp : Companion[T]) = comp() } object TestCompanion { trait Foo object Foo { def bar = "wibble" // Per-companion boilerplate for access via implicit resolution implicit def companion … Read more
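A self-contained sketch of the gist’s idea, completed here under the assumption that the per-companion boilerplate exposes an implicit with a refined `C` member (`Foo` and `bar` follow the excerpt; the `Companion.apply` summoner is my addition):

```scala
trait Companion[T] {
  type C
  def apply(): C
}

object Companion {
  // Companion[Foo] summons Foo's companion object via implicit resolution.
  def apply[T](implicit comp: Companion[T]): comp.C = comp()
}

trait Foo
object Foo {
  def bar = "wibble"
  // Per-companion boilerplate for access via implicit resolution; the
  // refinement { type C = Foo.type } preserves the companion's precise type.
  implicit val companion: Companion[Foo] { type C = Foo.type } =
    new Companion[Foo] {
      type C = Foo.type
      def apply(): Foo.type = Foo
    }
}
```

Because the implicit lives in `Foo`’s companion object, it is found automatically whenever a `Companion[Foo]` is required, so `Companion[Foo].bar` resolves to the real companion’s member.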