How to detect how similar a speech recording is to another speech recording?

A lot of people seem to be suggesting some sort of edit distance, which IMO is a totally wrong approach for determining the similarity of two speech patterns, especially for patterns as short as OP is implying. The specific algorithms used by speech recognition are in fact nearly the opposite of what you would like to … Read more

Custom transformer for sklearn Pipeline that alters both X and y

Modifying the sample axis, e.g. removing samples, does not (yet?) comply with the scikit-learn transformer API. So if you need to do this, you should do it outside any calls to scikit learn, as preprocessing. As it is now, the transformer API is used to transform the features of a given sample into something new. … Read more
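Since transformers may not change the number of samples, one way to handle sample removal is a plain function applied before the pipeline. A minimal sketch (the function name `drop_nan_rows` is hypothetical, not part of scikit-learn):

```python
import numpy as np

def drop_nan_rows(X, y):
    """Remove samples whose features contain NaN, keeping X and y aligned.

    This runs *before* any scikit-learn Pipeline, since the transformer
    API only transforms features and must preserve the sample axis.
    """
    mask = ~np.isnan(X).any(axis=1)
    return X[mask], y[mask]

X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0]])
y = np.array([0, 1, 0])
X_clean, y_clean = drop_nan_rows(X, y)
# X_clean and y_clean now contain only the two complete samples
```

The cleaned arrays can then be passed to `pipeline.fit(X_clean, y_clean)` as usual.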

How does TensorFlow SparseCategoricalCrossentropy work?

SparseCategoricalCrossentropy and CategoricalCrossentropy both compute categorical cross-entropy. The only difference is in how the targets/labels should be encoded. When using SparseCategoricalCrossentropy the targets are represented by the index of the category (starting from 0). Your outputs have shape 4×2, which means you have two categories. Therefore, the targets should be a 4-dimensional vector with … Read more
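The encoding difference can be illustrated without TensorFlow at all; this is a plain-NumPy sketch of the same computation, not the actual Keras implementation:

```python
import numpy as np

# Model outputs: 4 samples, 2 categories (already softmax probabilities)
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.7, 0.3],
                  [0.4, 0.6]])

# Sparse labels: one class index per sample
# (the format SparseCategoricalCrossentropy expects)
sparse_labels = np.array([0, 1, 0, 1])

# One-hot labels: the format CategoricalCrossentropy expects
one_hot = np.eye(2)[sparse_labels]

# Both encodings yield the same mean cross-entropy
ce_sparse = -np.mean(np.log(probs[np.arange(4), sparse_labels]))
ce_onehot = -np.mean(np.sum(one_hot * np.log(probs), axis=1))
```

The two losses are numerically identical; only the label format differs.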

numpy convert categorical string arrays to an integer array

np.unique has some optional returns. return_inverse gives the integer encoding, which I use very often: >>> b, c = np.unique(a, return_inverse=True) >>> b array(['a', 'b', 'c'], dtype='|S1') >>> c array([0, 1, 2, 0, 1, 2]) >>> c+1 array([1, 2, 3, 1, 2, 3]) It can be used to recreate the original array from uniques: >>> … Read more
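Put together as a runnable snippet, the round trip looks like this:

```python
import numpy as np

a = np.array(['a', 'b', 'c', 'a', 'b', 'c'])

# b holds the sorted unique values; c holds, for each element of a,
# the index of its value within b (the integer encoding)
b, c = np.unique(a, return_inverse=True)

# Indexing the uniques with the codes reconstructs the original array
restored = b[c]
```

This makes `return_inverse` a quick label encoder: `c` is the integer array and `b` is the mapping back to the original strings.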

How to save to disk / export a lightgbm LGBMRegressor model trained in python?

Try: my_model.booster_.save_model('mode.txt') # load from model: bst = lgb.Booster(model_file='mode.txt') Note: the API states that bst = lgb.train(…) bst.save_model('model.txt', num_iteration=bst.best_iteration) Depending on the version, one of the above works. More generically, you can also use pickle or something similar to freeze your model. import joblib # save model joblib.dump(my_model, 'lgb.pkl') # load model gbm_pickle = joblib.load('lgb.pkl') Let … Read more
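Because the joblib route is generic pickling, it works for any fitted estimator. A sketch using a scikit-learn regressor as a stand-in (an `lgb.LGBMRegressor` would round-trip the same way):

```python
import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.arange(20, dtype=float).reshape(-1, 2)
y = X.sum(axis=1)

# Stand-in for lgb.LGBMRegressor().fit(X, y)
model = LinearRegression().fit(X, y)

joblib.dump(model, 'model.pkl')        # save to disk
restored = joblib.load('model.pkl')    # load back

# The restored model predicts identically to the original
same = np.allclose(model.predict(X), restored.predict(X))
```

Unlike `booster_.save_model`, the pickle/joblib approach preserves the full scikit-learn wrapper object, not just the underlying booster.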

using confusion matrix as scoring metric in cross validation in scikit learn

You could use cross_val_predict (see the scikit-learn docs) instead of cross_val_score. Instead of doing: from sklearn.model_selection import cross_val_score scores = cross_val_score(clf, x, y, cv=10) you can do: from sklearn.model_selection import cross_val_predict from sklearn.metrics import confusion_matrix y_pred = cross_val_predict(clf, x, y, cv=10) conf_mat = confusion_matrix(y, y_pred)
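A self-contained version of this idea on the iris dataset (the dataset and classifier are illustrative choices, not from the original answer):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# cross_val_predict returns one out-of-fold prediction per sample,
# so a single confusion matrix can summarize all 10 folds at once
y_pred = cross_val_predict(clf, X, y, cv=10)
conf_mat = confusion_matrix(y, y_pred)
# conf_mat is 3x3 (three iris classes) and its entries sum to 150 samples
```

Note this gives one aggregate matrix over all folds, not a per-fold score the way `cross_val_score` does.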

XGBoost for multilabel classification?

One possible approach, instead of using OneVsRestClassifier which is for multi-class tasks, is to use MultiOutputClassifier from the sklearn.multioutput module. Below is a small reproducible sample code with the number of input features and target outputs requested by the OP: import xgboost as xgb from sklearn.datasets import make_multilabel_classification from sklearn.model_selection import train_test_split from sklearn.multioutput import … Read more
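The pattern looks like this; a RandomForestClassifier stands in for `xgb.XGBClassifier` here so the sketch runs without xgboost installed, but the XGBoost estimator slots into `MultiOutputClassifier` the same way:

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

# Multilabel data: each sample can carry several of 3 labels at once
X, y = make_multilabel_classification(n_samples=100, n_features=6,
                                      n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# MultiOutputClassifier fits one copy of the base estimator per label;
# replace the base estimator with xgb.XGBClassifier() for XGBoost
clf = MultiOutputClassifier(RandomForestClassifier(random_state=0))
clf.fit(X_train, y_train)

preds = clf.predict(X_test)
# preds has one column per label, same shape as y_test
```

The key distinction: OneVsRestClassifier decomposes a single multi-class target, whereas MultiOutputClassifier handles several independent binary targets per sample, which is what multilabel classification requires.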

Predicting how long an scikit-learn classification will take to run

There are very specific classes of classifiers and regressors that directly report the remaining time or progress of the algorithm (number of iterations etc.). Most of this can be turned on by passing the verbose=2 (any number > 1) option to the constructor of individual models. Note: this behavior is according to sklearn-0.14. Earlier versions have … Read more
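As one concrete illustration (the estimator choice is an assumption; any iterative scikit-learn model with a `verbose` parameter behaves similarly):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, random_state=0)

# verbose=2 prints per-iteration progress (the training loss at each
# boosting stage), giving a rough sense of how far along the fit is
clf = GradientBoostingClassifier(n_estimators=10, verbose=2, random_state=0)
clf.fit(X, y)
```

Watching how long the first few iterations take then gives a crude estimate of total runtime: roughly (time per stage) × n_estimators.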

How To Determine the ‘filter’ Parameter in the Keras Conv2D Function

Actually, there is no one good answer to your question. Most architectures are carefully designed and fine-tuned over many experiments. I can share some rules of thumb to apply when designing your own architecture: Avoid a dimension collapse in the first layer. Let's assume that your … Read more
