I’ve used this many times, as it’s a particularly easy implementation of multiprocessing.
import pandas as pd
from multiprocessing import Pool

def reader(filename):
    # each worker process loads one Excel file into a DataFrame
    return pd.read_excel(filename)

def main():
    file_list = ['file1.xlsx', 'file2.xlsx', 'file3.xlsx', ...]
    with Pool(4) as pool:  # number of cores you want to use
        df_list = pool.map(reader, file_list)  # list of the loaded DataFrames
    df = pd.concat(df_list)  # concatenate them into a single DataFrame

if __name__ == '__main__':
    main()
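Note that wrapping the Pool in a with block, as above, shuts the worker processes down cleanly once the map is finished.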
Using this, you should be able to substantially increase the speed of your program without much work at all. If you don’t know how many processors you have, you can check by pulling up your shell and typing (on Windows):
echo %NUMBER_OF_PROCESSORS%
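If you’d rather not leave your script, Python can report this directly on any platform; a minimal check:

from multiprocessing import cpu_count
print(cpu_count())  # number of logical processors available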
EDIT: To make this run even faster, consider converting your files to CSV and loading them with pandas.read_csv instead of pandas.read_excel.
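If you do convert them, the only change needed in the snippet above is the reader function (with the file_list entries pointing at the .csv versions instead); a minimal sketch:

def reader(filename):
    # read_csv is generally much faster than read_excel
    return pd.read_csv(filename)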