How to open a huge Excel file efficiently

Most programming languages that work with Office products go through some middle layer, and this is usually where the bottleneck is. Good examples are the PIAs/Interop and the Open XML SDK.

One way to get at the data at a lower level (bypassing the middle layer) is to use a driver such as an OLE DB provider.

From the question: "150MB one-sheet excel file that takes about 7 minutes."

The best I could do is a 130MB file in 135 seconds, roughly 3 times faster:

using System.Data;
using System.Data.OleDb;
using System.Diagnostics;

Stopwatch sw = Stopwatch.StartNew();

DataSet excelDataSet = new DataSet();

string filePath = @"c:\temp\BigBook.xlsx";

// For .xlsx files use Microsoft.ACE.OLEDB.12.0; for .xls files use Microsoft.Jet.OLEDB.4.0 with Extended Properties=\"Excel 8.0;HDR=YES;\"
string connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + filePath + "';Extended Properties=\"Excel 12.0;HDR=YES;\"";

using (OleDbConnection conn = new OleDbConnection(connectionString))
using (OleDbDataAdapter objDA = new OleDbDataAdapter("select * from [Sheet1$]", conn))
{
    conn.Open();
    // Load the whole sheet into the DataSet in one pass
    objDA.Fill(excelDataSet);
    //dataGridView1.DataSource = excelDataSet.Tables[0];
}
sw.Stop();
Debug.Print("Load XLSX tool: " + sw.ElapsedMilliseconds + " millisecs. Records = " + excelDataSet.Tables[0].Rows.Count);

Test machine: Windows 7 x64, Intel i5 at 2.3 GHz, 8 GB RAM, 250 GB SSD.

If I may recommend a hardware solution as well: if you're on standard HDDs, try moving to an SSD.

Note: I can’t download your Excel spreadsheet example as I’m behind a corporate firewall.

PS. See the MSDN thread "Fastest Way to import xlsx files with 200 MB of Data"; the consensus there is that OleDB is the fastest.

PS 2. Here’s how you can do it with Python:
http://code.activestate.com/recipes/440661-read-tabular-data-from-excel-spreadsheets-the-fast/
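
As a rough Python counterpart to the C# snippet above, the same OLE DB provider can be driven through ADO. This is only a sketch, not the code from the linked recipe: it assumes Windows, the pywin32 package, and an installed ACE (or Jet) provider, and the names build_connection_string and read_sheet are made up for illustration.

```python
import os


def build_connection_string(path):
    """Build an OLE DB connection string for an Excel workbook.

    Mirrors the provider choice in the C# snippet:
    ACE for .xlsx, Jet for .xls. (Hypothetical helper name.)
    """
    ext = os.path.splitext(path)[1].lower()
    if ext == ".xlsx":
        provider, excel_version = "Microsoft.ACE.OLEDB.12.0", "Excel 12.0"
    elif ext == ".xls":
        provider, excel_version = "Microsoft.Jet.OLEDB.4.0", "Excel 8.0"
    else:
        raise ValueError("unsupported extension: " + ext)
    return ("Provider=%s;Data Source='%s';Extended Properties=\"%s;HDR=YES;\""
            % (provider, path, excel_version))


def read_sheet(path, sheet="Sheet1"):
    """Return the rows of one worksheet as a list of tuples.

    Assumption: pywin32 is installed and the provider is registered.
    The import is done lazily so the helper above works without pywin32.
    """
    import win32com.client

    conn = win32com.client.Dispatch("ADODB.Connection")
    conn.Open(build_connection_string(path))
    try:
        rs = win32com.client.Dispatch("ADODB.Recordset")
        rs.Open("SELECT * FROM [%s$]" % sheet, conn)
        try:
            if rs.EOF:
                return []
            # GetRows() hands back the data column by column,
            # so transpose it to get one tuple per row.
            return list(zip(*rs.GetRows()))
        finally:
            rs.Close()
    finally:
        conn.Close()
```

Calling read_sheet(r"c:\temp\BigBook.xlsx") would then pull the whole of Sheet1 in a single query, the same way objDA.Fill does in the C# version. I have not timed this variant, so don't assume it matches the numbers above.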
