Convolutional Neural Networks – Multiple Channels

How is the convolution operation carried out when multiple channels are present at the input layer? (e.g. RGB) In such a case you have one 2D kernel per input channel (a.k.a. plane). So you perform each convolution (2D input, 2D kernel) separately and sum the contributions, which gives the final output feature map. Please … Read more
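
A minimal NumPy sketch of that per-channel convolution and summation (assuming "valid" padding and, as most CNN frameworks do, no kernel flipping, i.e. cross-correlation):

```python
import numpy as np

def conv2d_multichannel(x, kernels):
    """'Valid' convolution of a multi-channel input with one 2D kernel per channel.

    x       : (C, H, W) input, e.g. C=3 for RGB
    kernels : (C, kH, kW) one 2D kernel per input channel
    returns : (H-kH+1, W-kW+1) single output feature map
    """
    C, H, W = x.shape
    _, kH, kW = kernels.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for c in range(C):                                  # one 2D convolution per channel
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] += np.sum(x[c, i:i + kH, j:j + kW] * kernels[c])
    return out                                          # per-channel results summed

# hypothetical 3-channel 5x5 input with 3x3 kernels -> one 3x3 feature map
x = np.random.rand(3, 5, 5)
k = np.random.rand(3, 3, 3)
print(conv2d_multichannel(x, k).shape)                  # (3, 3)
```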

How To Determine the ‘filter’ Parameter in the Keras Conv2D Function

Actually, there is no single good answer to your question. Most architectures are carefully designed and fine-tuned over many experiments. I can share some of the rules of thumb one should apply when designing one's own architecture: Avoid a dimension collapse in the first layer. Let's assume that your … Read more
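
For illustration only, a Keras sketch of one common pattern for choosing filter counts: start small and roughly double them each time the spatial resolution is halved. The specific values (32/64/128, a 64×64×3 input, 10 classes) are assumptions, not part of the original answer:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative filter counts only; tune them experimentally for your data.
model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(filters=32, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(filters=64, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(filters=128, kernel_size=3, padding="same", activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),   # 10 classes, purely as an example
])
model.summary()
```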

Convolutional Neural Network (CNN) for Audio [closed]

We used deep convolutional networks on spectrograms for a spoken language identification task. We achieved around 95% accuracy on a dataset provided in this TopCoder contest. The details are here. Plain convolutional networks do not capture temporal characteristics, so, for example, in this work the output of the convolutional network was fed to a … Read more
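
As a hedged sketch (not the contest architecture), one common way to turn raw audio into a spectrogram that a 2D convolutional network can consume as a single-channel image; the sampling rate and STFT parameters below are illustrative assumptions:

```python
import numpy as np
from scipy import signal

sr = 16000
audio = np.random.randn(sr * 3)                     # 3 seconds of placeholder audio
freqs, times, spec = signal.spectrogram(audio, fs=sr, nperseg=512, noverlap=256)
log_spec = np.log(spec + 1e-10)                     # log-compress the magnitudes
cnn_input = log_spec[np.newaxis, ..., np.newaxis]   # (batch, freq, time, channels)
print(cnn_input.shape)                              # feed this to a Conv2D stack
```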

Tensorflow: loss decreasing, but accuracy stable

A decrease in binary cross-entropy loss does not imply an increase in accuracy. Consider label 1, predictions 0.2, 0.4 and 0.6 at timesteps 1, 2, 3, and a classification threshold of 0.5. Timesteps 1 and 2 produce a decrease in loss but no increase in accuracy, since both predictions still fall below the threshold. Ensure that your model has enough capacity by overfitting the … Read more
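
A quick NumPy check of that example (binary cross-entropy with true label 1 and a 0.5 threshold):

```python
import numpy as np

y_true = 1.0
preds = [0.2, 0.4, 0.6]                  # predictions at timesteps 1, 2, 3

for t, p in enumerate(preds, start=1):
    # binary cross-entropy for a single example
    loss = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    correct = (p >= 0.5) == bool(y_true)
    print(f"timestep {t}: loss={loss:.3f}, correct={correct}")

# timestep 1: loss=1.609, correct=False
# timestep 2: loss=0.916, correct=False   <- loss fell, accuracy did not move
# timestep 3: loss=0.511, correct=True
```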

Activation function after pooling layer or convolutional layer?

Well, max-pooling and monotonically non-decreasing non-linearities commute. This means that MaxPool(Relu(x)) = Relu(MaxPool(x)) for any input, so the result is the same in that case. It is therefore technically better to first subsample through max-pooling and then apply the non-linearity (if it is costly, such as the sigmoid), because the non-linearity is then computed on the smaller, pooled feature map. In practice it is often done the … Read more
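
A small NumPy check of the commutation claim; the 2×2 pooling helper is just an illustrative sketch:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def maxpool2x2(x):
    h, w = x.shape
    # crop to even dimensions, then take the max over non-overlapping 2x2 blocks
    return x[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.random.randn(6, 8)
print(np.allclose(relu(maxpool2x2(x)), maxpool2x2(relu(x))))  # True
```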