Using pyarrow how do you append to parquet file?

I ran into the same issue and I think I was able to solve it using the following:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

chunksize = 10000  # this is the number of lines

pqwriter = None
for i, df in enumerate(pd.read_csv('sample.csv', chunksize=chunksize)):
    table = pa.Table.from_pandas(df)
    # for the first chunk …

Index in Parquet

Update Dec/2018: Parquet Format version 2.5 added column indexes (https://github.com/apache/parquet-format/blob/master/CHANGES.md#version-250). See https://issues.apache.org/jira/browse/PARQUET-1201 for the list of sub-tasks for this feature. Note that it has only just been merged into the Parquet format itself, so it will take some time for the different backends (Spark, Hive, Impala, etc.) to start supporting it. This new feature is called Column Indexes. Basically …

Convert csv to parquet file using python

Using the pyarrow and pandas packages, you can convert CSVs to Parquet without a JVM running in the background:

import pandas as pd

df = pd.read_csv('example.csv')
df.to_parquet('output.parquet')

One limitation you will run into is that pyarrow is only available for Python 3.5+ on Windows. Either use Linux/OSX to run the code as Python 2 …

How to convert a csv file to parquet

I already posted an answer on how to do this using Apache Drill. However, if you are familiar with Python, you can now do this using Pandas and PyArrow!

Install dependencies

Using pip:

pip install pandas pyarrow

or using conda:

conda install pandas pyarrow -c conda-forge

Convert CSV to Parquet in chunks

# csv_to_parquet.py
import …

Python: save pandas data frame to parquet file

pandas has a built-in DataFrame method, to_parquet(). Just write the dataframe to Parquet format like this:

df.to_parquet('myfile.parquet')

You still need to install a Parquet library such as pyarrow or fastparquet. If you have more than one Parquet library installed, you also need to specify which engine you want pandas to use, otherwise it will take the first one …