How do I get schema / column names from parquet file?

Question

You won’t be able “open” the file using a hdfs dfs -text because its not a text file. Parquet files are written to disk very differently compared to text files.

And for the same matter, the Parquet project provides parquet-tools to do tasks like which you are trying to do. Open and see the schema, data, metadata etc.

Check out the parquet-tool project
parquet-tools

Also Cloudera which support and contributes heavily to Parquet, also has a nice page with examples on usage of parquet-tools. A example from that page for your use case is

parquet-tools schema part-m-00000.parquet

Checkout the Cloudera page. Using the Parquet File Format with Impala, Hive, Pig, HBase, and MapReduce

Leave a Comment Cancel reply