Well, blocking IO means that a given thread cannot do anything else until the IO call completes (for a socket read, that means waiting until data arrives, which could take a long time).
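Non-blocking IO means the call returns straight away instead of waiting. Depending on the API, it either reports that the operation would block (e.g. EAGAIN/EWOULDBLOCK) so you can try again later, or the request is queued and the actual IO is processed at some later point by the kernel.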
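To make the difference concrete, here is a small sketch in Python (the answer itself names no language). It assumes a Unix-like system where socket.socketpair() gives two connected sockets standing in for a real network connection; try_recv is a hypothetical helper name. In non-blocking mode an empty socket raises BlockingIOError instead of stalling the thread, while in blocking mode recv() simply waits.

```python
import socket
from typing import Optional

def try_recv(sock: socket.socket) -> Optional[bytes]:
    """Return whatever data is available right now, or None instead of waiting."""
    sock.setblocking(False)
    try:
        return sock.recv(1024)
    except BlockingIOError:          # EAGAIN/EWOULDBLOCK: a blocking call would wait here
        return None

a, b = socket.socketpair()           # a connected pair standing in for a network socket
print(try_recv(b))                   # None: nothing has been sent yet
a.sendall(b"hello")
b.setblocking(True)                  # blocking mode: recv() waits until data is there
print(b.recv(1024))                  # b'hello'
a.close()
b.close()
```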
For blocking IO you either accept that you are going to wait for every IO request, or you fire off a thread per request (which gets very complicated very quickly).
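A minimal sketch of the thread-per-request approach, assuming time.sleep() as a stand-in for a blocking read; fake_request is a hypothetical placeholder, not a real API. Each thread is stuck in its own wait, but because the waits overlap, five 0.1 s requests finish in roughly 0.1 s rather than 0.5 s.

```python
import threading
import time

def fake_request(i: int, results: dict) -> None:
    """Hypothetical stand-in for a blocking IO call such as a socket read."""
    time.sleep(0.1)                  # the thread can do nothing else while it waits
    results[i] = f"response {i}"

results: dict = {}
threads = [threading.Thread(target=fake_request, args=(i, results)) for i in range(5)]
start = time.monotonic()
for t in threads:
    t.start()
for t in threads:
    t.join()                         # wait for every request to finish
elapsed = time.monotonic() - start
print(sorted(results.values()))
print(f"{elapsed:.2f}s")             # roughly 0.1s, since the waits overlap
```

What the sketch leaves out (capping the number of threads, sharing state safely, handling errors per thread) is where the complexity mentioned above comes from.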
For non-blocking IO you can send off multiple requests, but you need to bear in mind that the data will not be available until some “later” point. Checking that the data has actually arrived (typically with select, poll, epoll or a similar readiness mechanism) is probably the most complicated part.
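One common way to do that checking is to ask the OS which sockets are ready and only read from those. The sketch below uses Python's selectors module (which picks epoll, kqueue or select depending on the platform); collect is a hypothetical helper, and the socketpairs with pre-sent replies are an assumption standing in for several outstanding requests.

```python
import selectors
import socket

def collect(socks, nbytes):
    """Gather nbytes from each non-blocking socket, reading only when the OS says it is ready."""
    sel = selectors.DefaultSelector()   # epoll, kqueue or select, depending on the platform
    received = {}
    for s in socks:
        s.setblocking(False)
        sel.register(s, selectors.EVENT_READ)
        received[s] = b""
    while len(sel.get_map()) > 0:
        events = sel.select(timeout=1.0)
        if not events:
            break                       # nothing arrived within a second; give up
        for key, _ in events:
            s = key.fileobj
            chunk = s.recv(4096)        # readiness was reported, so this does not block
            received[s] += chunk
            if not chunk or len(received[s]) >= nbytes:
                sel.unregister(s)       # finished (or peer closed) for this socket
    sel.close()
    return [received[s] for s in socks]

# Simulated replies: each socketpair stands in for one outstanding request.
pairs = [socket.socketpair() for _ in range(3)]
for i, (_, server) in enumerate(pairs):
    server.sendall(f"reply {i}".encode())
print(collect([client for client, _ in pairs], nbytes=7))  # [b'reply 0', b'reply 1', b'reply 2']
for client, server in pairs:
    client.close()
    server.close()
```

A single thread services all the sockets here; the bookkeeping of partial reads per socket is the part that grows as the protocol gets more complex.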
In most applications you will not need to care that your IO blocks. Sometimes, however, you need the extra performance of initiating an IO request, doing something else, and then coming back and, hopefully, finding that the request has completed.
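The "initiate, do something else, come back" pattern might look like the following sketch. It assumes an echo peer running in another thread (hypothetical, purely for illustration) and uses select() with a zero timeout to check for the reply without waiting; if the reply has not arrived yet, it falls back to waiting after all.

```python
import select
import socket
import threading

def request_then_work(sock, request, n):
    """Send a request, do unrelated work, then pick up the reply."""
    sock.sendall(request)                            # initiate the IO
    total = sum(i * i for i in range(n))             # do something else meanwhile
    ready, _, _ = select.select([sock], [], [], 0)   # timeout 0: check, don't wait
    finished_early = bool(ready)
    if not finished_early:
        select.select([sock], [], [])                # not done yet, so wait after all
    return total, sock.recv(1024), finished_early

client, server = socket.socketpair()
# Hypothetical peer: echoes whatever it receives, from another thread.
echo = threading.Thread(target=lambda: server.sendall(server.recv(1024)))
echo.start()
total, reply, early = request_then_work(client, b"ping", 100_000)
echo.join()
print(reply, early)  # reply is b'ping'; early depends on timing
client.close()
server.close()
```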
Anyway, just my tuppence.
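Edit: To answer how to design an application that handles blocking IO while keeping good performance: coroutines could be a good fit, since they let you write sequential, blocking-looking code while an event loop performs the IO without tying up a thread per request.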
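As one example of the coroutine approach, here is a sketch using Python's asyncio (an assumption, since the answer does not name a framework). Each handler reads like ordinary blocking code, but every await hands control back to the event loop, which drives the underlying non-blocking sockets. The uppercasing server on 127.0.0.1 with an OS-chosen port is purely illustrative.

```python
import asyncio

async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    data = await reader.readline()       # looks blocking, but yields to the event loop
    writer.write(data.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def client(port: int, msg: bytes) -> bytes:
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(msg)
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply

async def main() -> list:
    server = await asyncio.start_server(handle, "127.0.0.1", 0)  # port 0: let the OS pick
    port = server.sockets[0].getsockname()[1]
    async with server:
        # Three "blocking-style" clients run concurrently on a single thread.
        return await asyncio.gather(*(client(port, f"hi {i}\n".encode()) for i in range(3)))

print(asyncio.run(main()))  # [b'HI 0\n', b'HI 1\n', b'HI 2\n']
```

The design trade-off is that any coroutine that makes a genuinely blocking call (one that is not awaited) stalls every other coroutine on the loop, so all IO has to go through the event loop's APIs.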