How To Determine the ‘filter’ Parameter in the Keras Conv2D Function
Actually, there is no single good answer to your question. Most architectures are carefully designed and fine-tuned over many experiments. I can share some of the rules of thumb one should apply when designing one's own architecture: Avoid a dimension collapse in the first layer. Let's assume that your … Read more
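To make the "dimension collapse" point concrete, here is a quick back-of-the-envelope shape check in plain Python (conv_output_size is an illustrative helper written for this sketch, not a Keras API):

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output spatial dimension along one axis of a conv layer
    (the standard formula: (size - kernel + 2*padding) // stride + 1)."""
    return (size - kernel + 2 * padding) // stride + 1

# A 28x28 input hit with a large kernel and stride in the FIRST layer
# collapses almost immediately, throwing away spatial information:
print(conv_output_size(28, kernel=11, stride=4))  # -> 5

# A gentler first layer preserves resolution for later layers:
print(conv_output_size(28, kernel=3, stride=1))   # -> 26
```

A common pattern is then to grow the filter count (e.g. 32, 64, 128) as the spatial dimensions shrink deeper in the network.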
Is it good learning rate for Adam method?
The learning rate looks a bit high. The curve decreases too fast for my taste and flattens out very soon. I would try 0.0005 or 0.0001 as a base learning rate if I wanted to squeeze out additional performance. You can quit after several epochs anyway if you see that it does not work. The question … Read more
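For intuition on how the base learning rate changes convergence, here is a minimal Adam update in plain Python on a toy quadratic (standard beta1/beta2/epsilon defaults assumed; adam_minimize is an illustrative function, not a Keras or PyTorch API):

```python
import math

def adam_minimize(lr, steps=200, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam on f(x) = x**2 starting from x = 5 and return the final x."""
    x, m, v = 5.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = 2 * x                         # gradient of x**2
        m = beta1 * m + (1 - beta1) * g   # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g  # second-moment estimate
        m_hat = m / (1 - beta1 ** t)      # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# A larger base learning rate reaches the minimum much faster; a smaller
# one moves slowly but smoothly, which can matter near a good optimum:
print(abs(adam_minimize(lr=0.1)), abs(adam_minimize(lr=0.001)))
```

The same trade-off is what the loss curve in the question reflects: a high base rate drops fast and flattens early, while a lower one descends more gradually.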
Soft attention vs. hard attention
What is exactly attention? To be able to understand this question, we need to dive a little into certain problems which attention seeks to solve. I think one of the seminal papers on hard attention is Recurrent Models of Visual Attention and I would encourage the reader to go through that paper, even if it … Read more
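A rough NumPy sketch of the soft/hard distinction (illustrative only; in a real model the scores come from a learned network, and hard attention is typically trained with REINFORCE-style estimators because the sampling step is non-differentiable):

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(size=(4, 8))   # 4 candidate locations, 8-dim features each
scores = rng.normal(size=4)        # unnormalized attention scores

# Soft attention: a differentiable weighted average over ALL locations.
weights = np.exp(scores) / np.exp(scores).sum()  # softmax
soft_out = weights @ values

# Hard attention: commit to ONE location (here, sampled from the weights).
idx = rng.choice(4, p=weights)
hard_out = values[idx]

print(soft_out.shape, hard_out.shape)  # both (8,)
```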
What is the difference between register_parameter and register_buffer in PyTorch?
The PyTorch doc for the register_buffer() method reads: "This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm's running_mean is not a parameter, but is part of the persistent state." As you already observed, model parameters are learned and updated using SGD during the training process. However, … Read more
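A minimal sketch of the difference (Scaler is a made-up module for illustration; the two registration calls are the real PyTorch API):

```python
import torch
import torch.nn as nn

class Scaler(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered as a parameter: returned by model.parameters(),
        # so the optimizer updates it during training.
        self.register_parameter("scale", nn.Parameter(torch.ones(1)))
        # Registered as a buffer: saved in state_dict() but NOT returned
        # by model.parameters(), so the optimizer never touches it
        # (the same mechanism BatchNorm uses for running_mean).
        self.register_buffer("running_mean", torch.zeros(1))

    def forward(self, x):
        return self.scale * (x - self.running_mean)

m = Scaler()
print([name for name, _ in m.named_parameters()])  # ['scale']
print(list(m.state_dict().keys()))                 # ['scale', 'running_mean']
```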
How does binary cross entropy loss work on autoencoders?
In the context of autoencoders the input and output of the model are the same. So, if the input values are in the range [0,1], then it is acceptable to use sigmoid as the activation function of the last layer. Otherwise, you need to use an appropriate activation function for the last layer (e.g. linear), which … Read more
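A minimal NumPy sketch of the loss itself, assuming inputs scaled to [0, 1] (binary_cross_entropy here is hand-rolled for illustration, not the Keras implementation):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean BCE over all elements; clipping avoids log(0)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# For an autoencoder, the "target" is the input itself, scaled to [0, 1]:
x = np.array([0.0, 0.5, 1.0])
reconstruction = np.array([0.1, 0.5, 0.9])
print(binary_cross_entropy(x, reconstruction))
```

Note that BCE remains a valid reconstruction loss even for non-binary targets in [0, 1]: it is minimized exactly when the reconstruction equals the input.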
Why does sklearn Imputer need to fit?
The Imputer fills missing values with some statistic (e.g. mean, median, …) of the data. To avoid data leakage during cross-validation, it computes the statistic on the train data during the fit, stores it, and applies it to the test data during the transform. from sklearn.preprocessing import Imputer obj = Imputer(strategy='mean') obj.fit([[1, 2, 3], [2, … Read more
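Note that in current scikit-learn versions, Imputer has been replaced by SimpleImputer in sklearn.impute; the fit/transform split works the same way. A minimal sketch:

```python
import numpy as np
from sklearn.impute import SimpleImputer

imp = SimpleImputer(strategy="mean")
X_train = [[1.0, 2.0], [3.0, 4.0], [np.nan, 6.0]]
imp.fit(X_train)               # learns per-column means on TRAIN only: [2.0, 4.0]

X_test = [[np.nan, 5.0]]
print(imp.transform(X_test))   # fills with the TRAIN mean -> [[2.0, 5.0]]
```

Filling the test set with statistics learned on the training set is exactly the leakage-avoidance the answer describes.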
Precision/recall for multiclass-multilabel classification
For multi-label classification you have two ways to go. First, consider the following: n is the number of examples, x_i is the i-th example, Y_i is the ground truth label assignment of the i-th example, and h(x_i) is the set of predicted labels for the i-th example. Example-based: the metrics are computed in a per-datapoint manner. For each predicted label, only its … Read more
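A small NumPy sketch of the example-based variant under these definitions (the function name is illustrative; rows are examples, columns are labels, entries are 0/1 indicators):

```python
import numpy as np

def example_based_precision_recall(Y_true, Y_pred):
    """Example-based precision and recall for multi-label classification:
    computed per datapoint, then averaged over all examples."""
    Y_true, Y_pred = np.asarray(Y_true), np.asarray(Y_pred)
    tp = (Y_true & Y_pred).sum(axis=1)          # correct labels per example
    precision = np.mean(tp / np.maximum(Y_pred.sum(axis=1), 1))
    recall = np.mean(tp / np.maximum(Y_true.sum(axis=1), 1))
    return precision, recall

Y_true = [[1, 0, 1], [0, 1, 0]]   # ground truth label sets Y_i
Y_pred = [[1, 1, 0], [0, 1, 0]]   # predicted label sets h(x_i)
print(example_based_precision_recall(Y_true, Y_pred))  # (0.75, 0.75)
```

The label-based alternative would instead aggregate true/false positives per label column across all examples before averaging.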
What’s the difference between LSTM() and LSTMCell()?
LSTM is a recurrent layer; LSTMCell is an object (which happens to be a layer too) used by the LSTM layer that contains the calculation logic for one step. A recurrent layer contains a cell object. The cell contains the core code for the calculations of each step, while the recurrent layer commands the cell … Read more
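The layer/cell split can be sketched framework-free (toy classes with made-up names; a real LSTMCell computes gates and carries a hidden/cell state pair, but the division of labor is the same):

```python
class Cell:
    """Holds the per-step computation (stands in for LSTMCell)."""
    def step(self, x_t, state):
        new_state = state + x_t       # toy recurrence, not real LSTM math
        return new_state, new_state   # (output at this step, next state)

class RecurrentLayer:
    """Loops the cell over the time dimension (stands in for the LSTM layer)."""
    def __init__(self, cell):
        self.cell = cell

    def __call__(self, sequence, initial_state=0):
        state, outputs = initial_state, []
        for x_t in sequence:
            out, state = self.cell.step(x_t, state)
            outputs.append(out)
        return outputs

layer = RecurrentLayer(Cell())
print(layer([1, 2, 3]))  # running sums: [1, 3, 6]
```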