For instruction-level profiling, use Valgrind's callgrind, which counts the instructions executed per function and per source line. Its sibling tool cachegrind can also simulate cache and branch-prediction behaviour, which is quite nice.
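Here is a minimal sketch of a workload worth feeding to these tools. It assumes a C++ program built with `g++ -g -O2`, and the function names `sum_row_major`, `sum_col_major` and `run_workload` are made up for illustration. Both functions add up the same matrix, but one walks memory with stride 1 and the other with stride `n`. Run it under callgrind with `valgrind --tool=callgrind ./app` and inspect `callgrind.out.<pid>` with `callgrind_annotate` or KCachegrind. For the cache and branch view, use `valgrind --tool=cachegrind --cache-sim=yes --branch-sim=yes ./app` followed by `cg_annotate cachegrind.out.<pid>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sums an n x n matrix stored row-major in a flat vector, visiting elements
// in memory order (stride 1), so consecutive reads share cache lines.
std::uint64_t sum_row_major(const std::vector<std::uint32_t>& m, std::size_t n) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            total += m[i * n + j];
    return total;
}

// Same sum, but column by column: each read jumps n elements ahead, so for
// large n nearly every read lands on a different cache line.
std::uint64_t sum_col_major(const std::vector<std::uint32_t>& m, std::size_t n) {
    std::uint64_t total = 0;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            total += m[i * n + j];
    return total;
}

// Hypothetical driver to call from main(): fills the matrix with 0, 1, 2, ...
// and runs both traversals, which must agree on the result.
std::uint64_t run_workload(std::size_t n) {
    std::vector<std::uint32_t> m(n * n);
    for (std::size_t k = 0; k < m.size(); ++k)
        m[k] = static_cast<std::uint32_t>(k);
    const std::uint64_t row = sum_row_major(m, n);
    const std::uint64_t col = sum_col_major(m, n);
    assert(row == col && "both traversal orders must give the same sum");
    (void)col;  // silence unused-variable warnings when NDEBUG disables assert
    return row;
}
```

Call something like `run_workload(2000)` from `main`. Under cachegrind, the data-read miss counts (`D1mr`, `DLmr`) should typically come out much higher for `sum_col_major` than for `sum_row_major`.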
For time measurements, use Google's CPU profiler from gperftools. It gives far better results than gprof. You can set the sampling frequency, and it can show the output as a nicely annotated call graph.
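Here is a minimal sketch, assuming gperftools is installed. The simplest route needs no code changes. Link with `-lprofiler`, run `CPUPROFILE=app.prof CPUPROFILE_FREQUENCY=1000 ./app` (the frequency is in samples per second, default 100), and then view the result with `pprof --web ./app app.prof`, or use `--text` for a flat listing. Some distributions package the viewer as `google-pprof`. To profile only one region, wrap it in the library's `ProfilerStart`/`ProfilerStop` calls from `<gperftools/profiler.h>`, as in the example that follows. `USE_GPERFTOOLS`, `fib` and `profiled_region` are hypothetical names. Without that macro, the code builds and runs with the profiler compiled out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

#ifdef USE_GPERFTOOLS  // hypothetical opt-in macro: build with -DUSE_GPERFTOOLS ... -lprofiler
#include <cstdio>
#include <gperftools/profiler.h>
#endif

// Hypothetical hot spot: naive recursive Fibonacci, exponential in n, so the
// sampling profiler has plenty of CPU time to attribute to one function.
std::uint64_t fib(unsigned n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Runs fib(n); when USE_GPERFTOOLS is defined, only this region is profiled
// and the samples are written to out_file.
std::uint64_t profiled_region(unsigned n, const std::string& out_file) {
    assert(n <= 40 && "naive fib is exponential; keep n small");
#ifdef USE_GPERFTOOLS
    if (ProfilerStart(out_file.c_str()) == 0)  // returns 0 on failure
        std::fprintf(stderr, "could not start profiler for %s\n", out_file.c_str());
#else
    (void)out_file;  // unused when the profiler is compiled out
#endif
    const std::uint64_t result = fib(n);
#ifdef USE_GPERFTOOLS
    ProfilerStop();  // stop sampling and flush the profile to disk
#endif
    return result;
}
```

With `g++ -g -O2 -DUSE_GPERFTOOLS app.cpp -lprofiler` and a call such as `profiled_region(35, "fib.prof")` in `main`, nearly all samples in the resulting call graph should land in `fib`.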