Parquet without Hadoop?

While investigating the same question, I found that this is apparently not possible at the moment.
I found this GitHub issue, which proposes decoupling Parquet from the Hadoop API. Apparently that has not been done yet.

In the Apache Jira I also found an issue that asks for a way to read a Parquet file outside Hadoop. It is unresolved at the time of writing.

EDIT:

Issues are no longer tracked on GitHub (the first link above is dead). A newer issue I found is on Apache's Jira, with the following headline:

make it easy to read and write parquet files in java without depending on hadoop
