Starting with Spark 1.0 there are two methods you can use to solve this easily:
- `RDD.zipWithIndex` is just like `Seq.zipWithIndex`: it adds contiguous (`Long`) numbers. This needs to count the elements in each partition first, so your input will be evaluated twice. Cache your input RDD if you want to use this.
- `RDD.zipWithUniqueId` also gives you unique `Long` IDs, but they are not guaranteed to be contiguous. (They will only be contiguous if each partition has the same number of elements.) The upside is that this does not need to know anything about the input, so it will not cause double-evaluation.
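
To make the difference concrete, here is a rough plain-Scala model of the two numbering schemes. It does not use Spark at all: partitions are modelled as a `Seq` of `Seq`s, and the names `ZipIdSketch`, `zipWithIndexModel` and `zipWithUniqueIdModel` are made up for illustration. It assumes the scheme described in the Spark API docs, where element `i` of partition `k` (out of `n` partitions) gets the unique ID `i * n + k`. Treat it as a sketch of the idea, not as Spark's actual implementation.

```scala
// Plain-Scala model of the two numbering schemes; no Spark dependency.
// "Partitions" are modelled as a Seq of Seqs (an assumption for illustration).
object ZipIdSketch {

  // Models zipWithIndex: a first pass counts every partition to find its
  // starting offset, then a second pass numbers the elements.
  def zipWithIndexModel[T](parts: Seq[Seq[T]]): Seq[Seq[(T, Long)]] = {
    // Pass 1: running total of partition sizes gives each partition's start.
    val offsets = parts.map(_.size.toLong).scanLeft(0L)(_ + _)
    // Pass 2: number elements within each partition, shifted by its start.
    parts.zip(offsets).map { case (part, start) =>
      part.zipWithIndex.map { case (x, i) => (x, start + i) }
    }
  }

  // Models zipWithUniqueId: each partition only needs its own index k and
  // the partition count n, so no counting pass over the data is required.
  def zipWithUniqueIdModel[T](parts: Seq[Seq[T]]): Seq[Seq[(T, Long)]] = {
    val n = parts.size.toLong
    parts.zipWithIndex.map { case (part, k) =>
      part.zipWithIndex.map { case (x, i) => (x, i * n + k) }
    }
  }
}
```

With three uneven partitions such as `Seq(Seq("a", "b", "c"), Seq("d"), Seq("e", "f"))`, the first model yields the IDs 0 through 5 in order, while the second yields 0, 3, 6 for the first partition, 1 for the second, and 2, 5 for the third: unique, but with a gap at 4. In Spark itself the corresponding calls are `rdd.zipWithIndex()` and `rdd.zipWithUniqueId()`, both of which return an `RDD[(T, Long)]`.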