source of historical stock data [closed]

Let me add my 2¢. It’s my job to get good, clean data for a hedge fund, and I’ve seen quite a lot of data feeds and historical data providers. This is mainly about US stock data.

To start with, if you have some money, don’t bother downloading data from Yahoo. Get the end-of-day (EOD) data straight from CSI Data; AFAIK this is where Yahoo gets its EOD data as well. They have an API that lets you extract the data in whatever format you want. I think the yearly subscription costs a few hundred dollars.

The main problem with downloading data from a free service is that you only get stocks that still exist. This is called survivorship bias, and it can give you wrong results if you look at many stocks, because you’ll only include the ones that have made it so far and not the ones that were de-listed.
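To make the effect concrete, here is a minimal sketch with made-up numbers. The tickers, returns, and the `universe` layout are all hypothetical and only illustrate how dropping de-listed names shifts an average:

```python
from statistics import mean

# Hypothetical toy universe (NOT real data):
# (ticker, total return over the test period, still listed today?)
universe = [
    ("AAA", 0.40, True),
    ("BBB", 0.10, True),
    ("CCC", -0.90, False),  # de-listed during the period
    ("DDD", -0.60, False),  # de-listed during the period
]


def average_return(rows):
    """Equal-weighted average of the return column."""
    return mean(ret for _, ret, _ in rows)


# A free feed that only carries live symbols effectively gives you this subset.
survivors = [row for row in universe if row[2]]

full_avg = average_return(universe)
survivor_avg = average_return(survivors)

print(f"all stocks:     {full_avg:.2%}")
print(f"survivors only: {survivor_avg:.2%}")
```

In this toy case the survivors-only average is positive while the full universe lost money, which is exactly the kind of distortion a backtest on a survivors-only feed can suffer from.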

For playing around with some intraday data I’d look into IQFeed. They provide several APIs to extract historical data, although they are mainly an outfit for real-time feeds. There are quite a few other options here, though; some brokers even provide historical data downloads via their APIs, so just pick your poison.

BUT usually all of this data is not very clean. Once you really start backtesting you’ll see that certain stocks are missing or appear as two different symbols, or that stock splits are not properly accounted for, etc. Then you realize that historical dividend data is needed as well, and you start running in circles, patching data together from 100 different data sources and so on. So a “discount” data feed will do to start with, but as soon as you run more comprehensive backtests you might run into problems, depending on what you do. If you just look at, let’s say, the S&P 500 stocks, this will not be so much of a problem, and a “cheap” intraday feed will do.
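As one example of the kind of sanity check this leads to, here is a rough sketch that flags days where an unadjusted close drops by roughly a common split ratio. The function name `find_suspect_splits`, the ratio list, and the 5% tolerance are my own assumptions, not anything a particular vendor provides, and a flagged day still needs to be checked against a corporate-actions source (a real crash can look the same):

```python
def find_suspect_splits(closes, ratios=(2.0, 3.0, 4.0), tolerance=0.05):
    """Return (index, ratio) pairs where previous close / current close
    is within `tolerance` (relative) of one of the given split ratios."""
    suspects = []
    for i in range(1, len(closes)):
        prev, cur = closes[i - 1], closes[i]
        if prev <= 0 or cur <= 0:
            continue  # skip obviously bad prints instead of dividing by zero
        drop = prev / cur
        for ratio in ratios:
            if abs(drop - ratio) / ratio <= tolerance:
                suspects.append((i, ratio))
                break
    return suspects


# Hypothetical unadjusted closes: the price roughly halves at index 2.
closes = [100.0, 101.5, 50.9, 51.2, 50.8]
suspects = find_suspect_splits(closes)
print(suspects)
```

This only catches forward splits at the listed ratios; reverse splits, odd ratios, and dividend adjustments would need their own checks.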

What you will not find is free intraday data. I mean, you might find some samples; I’m sure there are 5 years of MSFT tick data floating around somewhere, but that will not get you very far.

Then, if you need the real stuff (level II order book, all ticks as they happened at all exchanges), one “affordable” yet excellent option is Nanex. They’ll actually ship you a drive with terabytes of data. If I remember right, it’s about $3k-4k per year of data. But trust me, once you understand how hard it is to get good intraday data, you won’t think this is very much money at all.

Not to discourage you, but getting good data is hard, so hard in fact that many hedge funds and banks spend hundreds of thousands of dollars a month to get data they can trust. Again, you can start somewhere and go from there, but it’s good to see it in context.


Edit: The answer above is from my own experience. This write-up from Caltech about available data feeds will give more insights, and especially recommends QuantQuote.
