The STUN server is NOT the signalling server.
The purpose of the signalling server is to pass information between the peers at the start up of the session(how can they send an offer without knowing who to send to?). This information includes the SDPs that are created on the offers and the answers and also any Ice Candidates that are created by either party.
The reason to have a STUN server is so that the two peers can send the media to each other. The media streams will not hit your signalling server but instead will go straight to the other party(the definition of a peer-to-peer connection), the exception to this would be the case when a TURN server is used.
Media cannot magically go through a NAT or a firewall because the two parties do not have direct access to each other(like they would if they were on the same LAN).
In short STUN server is needed the large majority of the time when the two parties are not on the same network(to get valid connection candidates for peer-to-peer media streaming) and a signalling server is ALWAYS needed(whether they are on different networks or not) so that the negotiation and connection build up can take place. Good explanation of the connection and streaming process