Efficient way to divide a list into lists of n size

You’ll want to do something that makes use of List.subList(int, int) views rather than copying each sublist. To do this really easily, use Guava’s Lists.partition(List, int) method:

    List<Foo> foos = …
    for (List<Foo> partition : Lists.partition(foos, n)) {
      // do something with partition
    }

Note that this, like many things, isn’t very efficient with a …

Pandas: Sampling a DataFrame [duplicate]

What version of pandas are you using? For me your code works fine (I’m on git master). Another approach could be:

    import random
    import numpy as np
    import pandas
    df = pandas.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))
    rows = random.sample(list(df.index), 10)  # pick 10 index labels at random
    df_10 = df.loc[rows]                      # the sampled rows
    df_90 = df.drop(rows)                     # the remaining 90 rows

Newer version (from …
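For comparison, pandas also provides a built-in DataFrame.sample method, so the manual index sampling is not strictly necessary. The following is a minimal sketch, assuming a pandas release that includes DataFrame.sample; the names df_10 and df_90 simply mirror the ones used above, and random_state=0 is an arbitrary choice made here for repeatability.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the answer above: 100 rows, columns A-D.
df = pd.DataFrame(np.random.randn(100, 4), columns=list("ABCD"))

# Draw 10 rows at random, without replacement.
# random_state is optional; it only makes the draw repeatable.
df_10 = df.sample(n=10, random_state=0)

# The complement: drop the sampled index labels from the original frame.
df_90 = df.drop(df_10.index)
```

The split relies on the index labels being unique; with duplicate labels, drop would remove every row sharing a sampled label.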

How does HashPartitioner work?

Well, let’s make your dataset marginally more interesting:

    val rdd = sc.parallelize(for {
        x <- 1 to 3
        y <- 1 to 2
    } yield (x, None), 8)

We have six elements:

    rdd.count
    Long = 6

no partitioner:

    rdd.partitioner
    Option[org.apache.spark.Partitioner] = None

and eight partitions:

    rdd.partitions.length
    Int = 8

Now let’s define a small helper to …

How to define partitioning of DataFrame?

Spark >= 2.3.0

SPARK-22614 exposes range partitioning.

    val partitionedByRange = df.repartitionByRange(42, $"k")

    partitionedByRange.explain
    // == Parsed Logical Plan ==
    // 'RepartitionByExpression ['k ASC NULLS FIRST], 42
    // +- AnalysisBarrier Project [_1#2 AS k#5, _2#3 AS v#6]
    //
    // == Analyzed Logical Plan ==
    // k: string, v: int
    // RepartitionByExpression [k#5 ASC NULLS FIRST], 42 …

Is Zookeeper a must for Kafka? [closed]

Yes, Zookeeper is required for running Kafka. From the Kafka Getting Started documentation:

Step 2: Start the server

Kafka uses zookeeper so you need to first start a zookeeper server if you don’t already have one. You can use the convenience script packaged with kafka to get a quick-and-dirty single-node zookeeper instance.

As to why, …