Continuous output in Neural Networks

Much of the work in the field of neuroevolution involves using neural networks with continuous inputs and outputs.

There are several common approaches:

  • One node per value
    • Linear activation functions – as others have noted, you can use non-sigmoid activation functions on output nodes if you are concerned about the limited range of sigmoid functions. However, the output can then grow arbitrarily large, which can cause problems during training.
    • Sigmoid activation functions – simply scaling the sigmoid output (or shifting and scaling, if you want negative values) is a common approach in neuroevolution; a short sketch of this scale-and-shift appears after the figures below. However, it is worth making sure that your sigmoid function isn’t too steep: a steep activation function means that the “useful” range of input values is small, which forces network weights to be small. (This is mainly an issue with genetic algorithms that use a fixed weight-modification strategy, which doesn’t work well when small weights are desired.)

[Figure: regular sigmoid (source: natekohl.net)]
[Figure: steep sigmoid (source: natekohl.net)]
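
As a rough illustration of the scale-and-shift idea (a minimal sketch, not from any particular library; the range bounds `lo`/`hi` and the `slope` parameter are illustrative assumptions):

```python
import math

def sigmoid(x, slope=1.0):
    # Standard logistic sigmoid; `slope` controls how steep it is.
    return 1.0 / (1.0 + math.exp(-slope * x))

def scaled_output(x, lo=-1.0, hi=1.0, slope=1.0):
    # Shift and scale the (0, 1) sigmoid output into the range (lo, hi).
    return lo + (hi - lo) * sigmoid(x, slope)

# With slope=1.0, inputs in roughly [-4, 4] sweep most of the output range.
# With slope=4.0, that useful input range shrinks to roughly [-1, 1], which
# forces the incoming weights to stay small to reach intermediate outputs.
print(scaled_output(0.0), scaled_output(3.0), scaled_output(-3.0))
```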

  • Multiple nodes per value – spreading a single continuous value over multiple nodes is a common strategy for representing continuous inputs. It has the benefit of providing more “features” for a network to play with, at the cost of increasing network size.
    • Binning – spread a single input over multiple nodes (e.g. RBF networks, where each node is a basis function with a different center that is partially activated by the input). You get some of the benefits of discrete inputs without losing a smooth representation; see the first sketch after this list.
    • Binary representation – divide a single continuous value into 2^N chunks, then feed that value into the network as a binary pattern across N nodes. This approach is compact but somewhat brittle, and the resulting input changes in a non-continuous manner as the underlying value varies; the second sketch after this list illustrates that behavior.
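
A minimal sketch of the binning idea using Gaussian basis functions (the `centers` and `width` values are illustrative assumptions, not taken from any specific RBF-network implementation):

```python
import math

def rbf_encode(x, centers, width=0.25):
    # Each node is a Gaussian basis function with its own center; a node is
    # partially activated according to how close x is to that center.
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]

# Spread a value in [0, 1] over five input nodes.
centers = [0.0, 0.25, 0.5, 0.75, 1.0]
print(rbf_encode(0.6, centers))  # the nodes centered at 0.5 and 0.75 fire most
```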
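
And a sketch of the binary representation (the bit count `n_bits` and the [lo, hi] range are assumptions chosen for the example), which also shows the non-continuous behavior mentioned above:

```python
def binary_encode(x, n_bits=8, lo=0.0, hi=1.0):
    # Quantize x in [lo, hi] into 2**n_bits chunks and return the chunk index
    # as a 0/1 pattern for n_bits input nodes.
    x = min(max(x, lo), hi)
    index = int((x - lo) / (hi - lo) * (2 ** n_bits - 1))
    return [(index >> i) & 1 for i in reversed(range(n_bits))]

# Nearby values can map to very different bit patterns, which is why this
# encoding is compact but brittle.
print(binary_encode(0.50))  # [0, 1, 1, 1, 1, 1, 1, 1]
print(binary_encode(0.51))  # [1, 0, 0, 0, 0, 0, 1, 0]
```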
