I investigated the same question a while back. After chatting to people who have worked on FPGAs, this is what I get:
- FPGAs are great for realtime systems, where even 1ms of delay might be too long. This does not apply in your case;
- FPGAs can be very fast, espeically for well-defined digital signal processing usages (e.g. radar data) but the good ones are much more expensive and specialised than even professional GPGPUs;
- FPGAs are quite cumbersome to programme. Since there is a hardware configuration component to compiling, it could take hours. It seems to be more suited to electronic engineers (who are generally the ones who work on FPGAs) than software developers.
If you can make CUDA work for you, it’s probably the best option at the moment. It will certainly be more flexible than a FPGA.
Other options include Brook from ATI, but until something big happens, it is simply not as well adopted as CUDA. After that, there’s still all the traditional HPC options (clusters of x86/PowerPC/Cell), but they are all quite expensive.
Hope that helps.