I’ve run my home project on my 4-core with hyperthreading laptop and recorded the results. This is a fairly compiler-heavy project but it includes a unit test of 17.7 seconds at the end. The compiles are not very IO intensive; there is very much memory available and if not the rest is on a fast SSD.
1 job real 2m27.929s user 2m11.352s sys 0m11.964s
2 jobs real 1m22.901s user 2m13.800s sys 0m9.532s
3 jobs real 1m6.434s user 2m29.024s sys 0m10.532s
4 jobs real 0m59.847s user 2m50.336s sys 0m12.656s
5 jobs real 0m58.657s user 3m24.384s sys 0m14.112s
6 jobs real 0m57.100s user 3m51.776s sys 0m16.128s
7 jobs real 0m56.304s user 4m15.500s sys 0m16.992s
8 jobs real 0m53.513s user 4m38.456s sys 0m17.724s
9 jobs real 0m53.371s user 4m37.344s sys 0m17.676s
10 jobs real 0m53.350s user 4m37.384s sys 0m17.752s
11 jobs real 0m53.834s user 4m43.644s sys 0m18.568s
12 jobs real 0m52.187s user 4m32.400s sys 0m17.476s
13 jobs real 0m53.834s user 4m40.900s sys 0m17.660s
14 jobs real 0m53.901s user 4m37.076s sys 0m17.408s
15 jobs real 0m55.975s user 4m43.588s sys 0m18.504s
16 jobs real 0m53.764s user 4m40.856s sys 0m18.244s
inf jobs real 0m51.812s user 4m21.200s sys 0m16.812s
Basic results:
- Scaling to the core count increases the performance nearly linearly. The real time went down from 2.5 minutes to 1.0 minute (2.5x as fast), but the time taken during compile went up from 2.11 to 2.50 minutes. The system noticed barely any additional load in this bit.
- Scaling from the core count to the thread count increased the user load immensely, from 2.50 minutes to 4.38 minutes. This near doubling is most likely because the other compiler instances wanted to use the same CPU resources at the same time. The system is getting a bit more loaded with requests and task switching, causing it to go to 17.7 seconds of time used. The advantage is about 6.5 seconds on a compile time of 53.5 seconds, making for a 12% speedup.
- Scaling from thread count to double thread count gave no significant speedup. The times at 12 and 15 are most likely statistical anomalies that you can disregard. The total time taken increases ever so slightly, as does the system time. Both are most likely due to increased task switching. There is no benefit to this.
My guess right now: If you do something else on your computer, use the core count. If you do not, use the thread count. Exceeding it shows no benefit. At some point they will become memory limited and collapse due to that, making the compiling much slower. The “inf” line was added at a much later date, giving me the suspicion that there was some thermal throttling for the 8+ jobs. This does show that for this project size there’s no memory or throughput limit in effect. It’s a small project though, given 8GB of memory to compile in.