data-science – Tarik Billa

import langchain => Error : TypeError: issubclass() arg 1 must be a class

April 5, 2024 by Tarik

How to use advanced activation layers in Keras?

December 3, 2023 by Tarik

The correct way to use advanced activations such as PReLU is to add them as a layer of their own with the add() method, not to wrap them in the Activation class. Example:

model = Sequential()
act = keras.layers.advanced_activations.PReLU(init="zero", weights=None)
model.add(Dense(64, input_dim=14, init="uniform"))
model.add(act)

Difference between Standard scaler and MinMaxScaler

September 24, 2023 by Tarik

MinMaxScaler(feature_range=(0, 1)) rescales each value in a column linearly into the range [0, 1]. Use it as the first scaler to try on a feature, because it preserves the shape of the column's distribution (no distortion). StandardScaler() rescales each value in a column so that the column has mean 0 and standard deviation … Read more
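
The two rescalings described above can be sketched in plain Python, without scikit-learn, to show the arithmetic on a single column. This is only an illustration: the function names min_max_scale and standard_scale and the sample column are made up for this sketch, and the handling of a constant column is an assumption of the sketch.

```python
from statistics import fmean, pstdev


def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Linearly map values so the minimum lands on low and the maximum on high."""
    low, high = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # Assumption of this sketch: a constant column maps to the lower bound.
        return [low for _ in values]
    return [low + (v - v_min) / span * (high - low) for v in values]


def standard_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # Assumption of this sketch: a constant column maps to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


ages = [18, 92, 98]  # illustrative column
scaled_01 = min_max_scale(ages)   # smallest value becomes 0.0, largest 1.0
scaled_z = standard_scale(ages)   # result has mean 0 and standard deviation 1
```

Both functions are linear in the input, so the relative spacing of the values is the same in either output. What differs is the target: fixed bounds for the first, fixed mean and spread for the second.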

GridSearchCV – XGBoost – Early Stopping

September 1, 2023 by Tarik

When using early_stopping_rounds you also have to pass eval_metric and eval_set as input parameters to the fit method. Early stopping works by computing the error on an evaluation set. The error has to decrease at least once in every early_stopping_rounds rounds; otherwise the generation of additional trees is stopped early. See the documentation of xgboost's fit method for details. … Read more
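
The stopping rule described above can be sketched in plain Python. This is not XGBoost's implementation, only a minimal illustration of the idea: the function name early_stopping_round and the error values are hypothetical, and the sketch assumes that lower error is better.

```python
def early_stopping_round(eval_errors, early_stopping_rounds):
    """Return (stop_round, best_round) for a sequence of per-round eval errors.

    Training stops at the first round where the error has gone
    `early_stopping_rounds` rounds without improving on the best so far.
    """
    best_error = float("inf")
    best_round = -1
    for round_idx, error in enumerate(eval_errors):
        if error < best_error:
            # New best score on the evaluation set: remember where it happened.
            best_error = error
            best_round = round_idx
        elif round_idx - best_round >= early_stopping_rounds:
            # No improvement for the allowed number of rounds: stop here.
            return round_idx, best_round
    # The error kept improving often enough, so all rounds were used.
    return len(eval_errors) - 1, best_round


# Hypothetical evaluation errors, one per boosting round.
errors = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.30]
stop_round, best_round = early_stopping_round(errors, early_stopping_rounds=3)
```

With these numbers the best error is at round 2, and rounds 3 to 5 bring no improvement, so training stops at round 5 and never reaches the lower error in the last position. A larger early_stopping_rounds value would have let it continue.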

What is the difference between Spyder and Jupyter?

August 30, 2023 by Tarik

Here’s just a basic summary of the two tools. Jupyter is a very popular application used for data analysis. It’s an IPython notebook (“interactive python”). You can run each block of code separately. For example, I can plot a graph using matplotlib, then create a new block of code and plot another graph. There are also … Read more

pandas reset_index after groupby.value_counts()

August 30, 2023 by Tarik

You need the name parameter in reset_index, because the Series name is the same as the name of one of the levels of the MultiIndex:

df_grouped.reset_index(name="count")

Another solution is to rename the Series:

print(df_grouped.rename("count").reset_index())

   A  Amt  count
0  1   30      4
1  1   20      3
2  1   40      2
3  2   40      3
4  2   10      2

More common solution instead … Read more

Logistic Regression PMML won’t Produce Probabilities

July 21, 2023 by Tarik

How to tell which Keras model is better?

June 8, 2023 by Tarik

Do I use the “acc” (from the training data?) one or the “val acc” (from the validation data?) one? If you want to estimate the ability of your model to generalize to new data (which is probably what you want to do), then you look at the validation accuracy, because the validation split contains only … Read more

Apply StandardScaler to parts of a data set [duplicate]

May 20, 2023 by Tarik

Introduced in v0.20 is ColumnTransformer, which applies transformers to a specified set of columns of an array or pandas DataFrame.

import pandas as pd

data = pd.DataFrame({"Name": [3, 4, 6], "Age": [18, 92, 98], "Weight": [68, 59, 49]})
col_names = ["Name", "Age", "Weight"]
features = data[col_names]

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

ct … Read more

How to do superscripts and subscripts in Jupyter Notebook?

May 2, 2023 by Tarik

You can do this inside a markdown cell. To create one, select a cell, press the Esc key, then press the M key. You can tell that a cell has been converted to markdown when the In [ ]: prompt shown to the left of a default code cell is gone. … Read more