Until we get a post from someone who really knows, here’s my understanding of the question, FWIW.
A subroutine and a function are essentially the same thing, with one difference: a function returns some sort of value (usually via the stack or a CPU register), while a subroutine does not. Either way, it is a block of executable code with exactly one point of entry. A co-routine is also a block of executable code, and, just like a subroutine, it has one point of entry. However, it also has one or more points of re-entry. More on this later.
Before getting to threads, let’s review: a running computer program, also known as a process, will generally have its allocation of memory organized into a code space, a heap, and a stack. The code space stores the one or more blocks of its executable code. The stack stores the parameters, automatic variables, and return addresses of subroutines, functions, and co-routines (and other things too). The heap is the wide-open memory space available to the process for whatever purposes it has. In addition to these memory spaces there are the CPU registers, each of which stores a set of bits. These bits could be an integer value, a memory address, a bunch of status flags, or whatever. Most programmers don’t need to know much about them, but they’re there and essential to the operation of the CPU. Probably the ones worth knowing about are the Program Counter, Stack Pointer, and Status Register, but we’re not going to get into them here.
A thread is a single logical flow of execution. In a primitive computing system, there is only one thread available to a process. In modern computing systems, a process is composed of one or more threads. Each thread gets its own stack and its own set of CPU registers. (There are usually not enough physical registers for that, so the system saves and restores each thread’s register values whenever it switches between threads; we’ll skip over the details here.) However, while each thread of a process has its own stack and registers, they all share the same heap and code space. They are also (presumably) running simultaneously, something that can truly happen only on a multi-core CPU. So two or more parts of your program can run at the same time.
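To make the "own stack, shared heap" point concrete, here is a minimal Python sketch. The names run_workers and worker are made up for illustration. One caveat I should state up front: in standard CPython the interpreter lock keeps threads from executing Python code truly in parallel, so this shows the memory-sharing model rather than real simultaneity.

```python
import threading

def run_workers(num_threads=3, count=5):
    shared_results = []        # one list object on the heap, visible to every thread
    lock = threading.Lock()    # guards the shared list against concurrent updates

    def worker(name):
        local_total = 0        # automatic variable: each thread has its own copy
        for i in range(count):
            local_total += i
        with lock:
            shared_results.append((name, local_total))

    threads = [threading.Thread(target=worker, args=(f"t{n}",))
               for n in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()               # wait until every thread has finished
    return sorted(shared_results)

print(run_workers())  # [('t0', 10), ('t1', 10), ('t2', 10)]
```

Each thread computes its own local_total without disturbing the others, but all of them append to the same shared_results list, which is why the lock is there.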
Back to the co-routine: As mentioned before, it has one or more points of re-entry. A point of re-entry is a place where the co-routine gives up execution to some other block of code outside of itself, and where execution later resumes within the co-routine. This implies that the parameters and automatic variables of the co-routine are preserved (and restored if need be) whenever execution is yielded to an external block of code and then returns to the co-routine. Not every programming language implements co-routines directly, although the technique is common in assembly language. Even where it is not built in, a co-routine can be emulated, for example by saving the routine’s state explicitly between calls. There is a good article on co-routines at http://en.wikipedia.org/wiki/Coroutine.
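As a rough illustration of a point of re-entry, here is a small sketch using a Python generator, which is one language's take on the idea and not the only way to do it. The function name running_average is just an example. Each yield is a point of re-entry, and the local variables survive between resumptions.

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        # Point of re-entry: execution pauses here and hands control back
        # to the caller; it resumes here when the caller sends a value.
        value = yield average
        total += value         # total and count were preserved while paused
        count += 1
        average = total / count

avg = running_average()
next(avg)                      # single point of entry: run up to the first yield
print(avg.send(10))            # 10.0
print(avg.send(20))            # 15.0
print(avg.send(30))            # 20.0
```

Between the send calls the caller is free to do anything else; the co-routine simply picks up where it left off.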
It seems to me there are two principal motivations for implementing a co-routine design pattern: (1) overcoming the limitations of a single-threaded process; and (2) hoping to achieve better computational performance. Motivation (1) is easy to understand when the process must attend to many things at once but is restricted to a single thread. Motivation (2) may not be as easy to understand, since it is tied to a lot of particulars about the system hardware, compiler design, and language design. I can only imagine that computational effort might be reduced by cutting back on stack manipulations, avoiding repeated initializations in a subroutine, or relieving some of the overhead of maintaining a multi-threaded process.
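For motivation (1), here is a sketch of how co-routines let a single thread juggle several jobs by taking turns. It again assumes Python generators, and the names task and run_round_robin are hypothetical. It says nothing about motivation (2); I have not measured any performance.

```python
from collections import deque

def task(name, steps, log):
    for i in range(steps):
        log.append(f"{name}:{i}")   # do one small piece of work
        yield                       # then give the other tasks a turn

def run_round_robin(tasks):
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)           # resume the task until its next yield
        except StopIteration:
            continue                # task has finished; do not requeue it
        queue.append(current)       # not finished: back of the line

log = []
run_round_robin([task("a", 2, log), task("b", 3, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

The work of the two tasks is interleaved even though only one thread ever runs, and no locking is needed because a task only gives up control where it says so.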
HTH