How to split data into a balanced training set and test set in sklearn

Although Christian’s suggestion is correct, train_test_split can give you stratified results directly through the stratify parameter. So you could do: X_train, X_test, y_train, y_test = cross_validation.train_test_split(Data, Target, test_size=0.3, random_state=0, stratify=Target). The catch is that this parameter is only available from sklearn version 0.17 onward. From the documentation about the parameter stratify: stratify : array-like or None …
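
A minimal sketch of the stratified split described above. It assumes a recent scikit-learn, where train_test_split lives in sklearn.model_selection (the cross_validation module used in the excerpt was removed in later releases). The toy Data and Target arrays are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced toy data: 90 samples of class 0, 10 of class 1.
rng = np.random.RandomState(0)
Data = rng.normal(size=(100, 4))
Target = np.array([0] * 90 + [1] * 10)

# stratify=Target asks for the same class proportions in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    Data, Target, test_size=0.3, random_state=0, stratify=Target
)

# Per-class counts in each subset; both should keep the 9:1 ratio.
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print("train:", train_counts, "test:", test_counts)
```

Note that stratify preserves the original class proportions in each subset; it does not rebalance the classes.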

Determining the most contributing features for SVM classifier in sklearn

Yes, the SVM classifier has a coef_ attribute, but it only works for an SVM with a linear kernel. For other kernels it is not available, because the kernel method transforms the data into another space that is not directly related to the input space; check the explanation. from matplotlib import pyplot as plt from sklearn import svm def …
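
As a rough illustration of reading coef_ from a linear-kernel SVM, here is a sketch that ranks features by the absolute value of their weights. The synthetic dataset and the feature_names list are assumptions for the example, and this is not the plotting function from the truncated excerpt.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification

# Synthetic binary problem; only 2 of the 6 features are informative.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=2, n_redundant=0, random_state=0
)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical names

clf = svm.SVC(kernel="linear")
clf.fit(X, y)

# For a binary problem coef_ has shape (1, n_features); flatten it.
weights = clf.coef_.ravel()

# Sort feature indices by absolute weight, largest first.
order = np.argsort(np.abs(weights))[::-1]
for i in order:
    print(f"{feature_names[i]}: {weights[i]:+.3f}")
```

Weight magnitudes are only comparable across features when the features are on similar scales, so scaling the inputs first is usually sensible.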

Why is scikit-learn SVM.SVC() extremely slow?

General remarks about SVM learning: SVM training with nonlinear kernels, which is the default in sklearn’s SVC, has a complexity of approximately O(n_samples^2 * n_features) (an approximation given by one of sklearn’s devs in a linked question). This applies to the SMO algorithm used within libsvm, which is the core solver in sklearn for this type of problem. This changes much when …

Converting LinearSVC’s decision function to probabilities (scikit-learn, Python)

scikit-learn provides CalibratedClassifierCV, which can be used to solve this problem: it lets you add probability output to LinearSVC or any other classifier that implements the decision_function method:

    svm = LinearSVC()
    clf = CalibratedClassifierCV(svm)
    clf.fit(X_train, y_train)
    y_proba = clf.predict_proba(X_test)

The user guide has a nice section on that. By default, CalibratedClassifierCV + LinearSVC will get you Platt scaling, but …
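
A self-contained version of the snippet above, with imports and a synthetic dataset added so it can be run as is. The data and the default split are assumptions made for illustration.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

svm = LinearSVC()
# With the default method="sigmoid" the calibration is Platt scaling.
clf = CalibratedClassifierCV(svm)
clf.fit(X_train, y_train)

# One row per test sample, one column per class; each row sums to 1.
y_proba = clf.predict_proba(X_test)
print(y_proba[:3])
```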

How to do multi class classification using Support Vector Machines (SVM)

LibSVM uses the one-against-one approach for multi-class learning problems. From the FAQ: Q: What method does libsvm use for multi-class SVM? Why don’t you use the “1-against-the rest” method? It is one-against-one. We chose it after doing the following comparison: C.-W. Hsu and C.-J. Lin. A comparison of methods for multi-class support vector …
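
One way to see the one-against-one scheme from Python is through scikit-learn's SVC, which wraps libsvm. The sketch below assumes a synthetic four-class dataset and asks for the raw pairwise decision values with decision_function_shape="ovo".

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic 4-class problem (illustrative only).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, n_redundant=0,
    n_classes=4, random_state=0,
)

# "ovo" exposes one decision value per pair of classes.
clf = SVC(kernel="rbf", decision_function_shape="ovo").fit(X, y)

n_classes = len(clf.classes_)
n_pairs = n_classes * (n_classes - 1) // 2  # number of pairwise classifiers

scores = clf.decision_function(X[:5])
print(n_classes, "classes ->", scores.shape[1], "pairwise decision values")
```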

Does the SVM in sklearn support incremental (online) learning?

While online algorithms for SVMs do exist, it has become important to specify whether you want kernel or linear SVMs, as many efficient algorithms have been developed for the special case of linear SVMs. For the linear case, if you use the SGD classifier in scikit-learn with the hinge loss and L2 regularization you will …
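
For the linear case, a sketch of incremental training with SGDClassifier using the hinge loss and L2 penalty mentioned above. Feeding the data through partial_fit in batches of 100 is an assumption made here to imitate a stream; the dataset is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic data standing in for a stream of samples (illustrative only).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
classes = np.unique(y)  # partial_fit needs the full label set up front

clf = SGDClassifier(loss="hinge", penalty="l2", random_state=0)

# Update the same model one mini-batch at a time.
batch_size = 100
for start in range(0, len(X), batch_size):
    batch = slice(start, start + batch_size)
    clf.partial_fit(X[batch], y[batch], classes=classes)

accuracy = clf.score(X, y)
print(f"training accuracy after streaming all batches: {accuracy:.3f}")
```

In practice the features would usually be standardized first, since stochastic gradient descent is sensitive to feature scale.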

Making SVM run faster in python

If you want to stick with SVC as much as possible and train on the full dataset, you can use ensembles of SVCs that are trained on subsets of the data to reduce the number of records per classifier (which apparently has a quadratic influence on complexity). scikit-learn supports that with the BaggingClassifier wrapper. That should …
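
A sketch of the ensemble idea, assuming ten SVCs that each train on roughly a tenth of the data. The number of estimators, the max_samples fraction, and the synthetic dataset are illustrative choices rather than values from the original answer.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

# Synthetic dataset (illustrative only).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

n_estimators = 10
clf = BaggingClassifier(
    SVC(kernel="rbf"),                # base classifier, cloned per member
    n_estimators=n_estimators,
    max_samples=1.0 / n_estimators,   # each SVC is fit on about 10% of the rows
    random_state=0,
)
clf.fit(X, y)

predictions = clf.predict(X[:10])
print(len(clf.estimators_), "SVCs trained;", predictions)
```

Because each member sees fewer samples, the per-classifier training cost drops, and the members can also be trained in parallel through the n_jobs parameter.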
