Edit: I've added another answer on the poor man's profiler, which IMHO is better for multithreaded apps.
Have a look at oprofile. Its profiling overhead is negligible and it supports multithreaded applications, as long as you don't need to profile mutex contention (which is a very important part of profiling multithreaded applications).
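To illustrate why mutex contention tends to be invisible to a sampling profiler that counts CPU events, here is a minimal sketch. It is not oprofile-specific, and the names `HOLD_SECONDS`, `NUM_THREADS`, `worker` and `measure` are made up for the example. Several threads fight over one lock, and the program compares elapsed wall-clock time with the CPU time the process actually consumed. A thread blocked on a lock burns almost no CPU, so there is very little for a CPU-based sampler to attribute to the wait.

```python
import threading
import time

HOLD_SECONDS = 0.05  # how long each thread holds the lock (arbitrary value)
NUM_THREADS = 4      # number of contending threads (arbitrary value)

lock = threading.Lock()


def worker():
    # Each thread takes the shared lock and sleeps while holding it,
    # so the other threads spend their time blocked, not computing.
    with lock:
        time.sleep(HOLD_SECONDS)


def measure():
    """Return (wall_seconds, cpu_seconds) for one contended run."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()

    threads = [threading.Thread(target=worker) for _ in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = measure()
    print(f"wall-clock: {wall:.3f}s, CPU: {cpu:.3f}s")
```

The lock serializes the threads, so the wall-clock time is at least `NUM_THREADS * HOLD_SECONDS`, while the CPU time stays far smaller. That gap is the time lost to waiting, and it is exactly the part a CPU-sampling profile will not show you.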