User guide: contents
- 1. Supervised learning
- 1.1. Generalized Linear Models
- 1.1.1. Ordinary Least Squares
- 1.1.2. Ridge Regression
- 1.1.3. Lasso
- 1.1.4. Elastic Net
- 1.1.5. Multi-task Lasso
- 1.1.6. Least Angle Regression
- 1.1.7. LARS Lasso
- 1.1.8. Orthogonal Matching Pursuit (OMP)
- 1.1.9. Bayesian Regression
- 1.1.10. Logistic regression
- 1.1.11. Stochastic Gradient Descent - SGD
- 1.1.12. Perceptron
- 1.1.13. Passive Aggressive Algorithms
- 1.2. Support Vector Machines
- 1.3. Stochastic Gradient Descent
- 1.4. Nearest Neighbors
- 1.5. Gaussian Processes
- 1.6. Cross decomposition
- 1.7. Naive Bayes
- 1.8. Decision Trees
- 1.9. Ensemble methods
- 1.10. Multiclass and multilabel algorithms
- 1.11. Feature selection
- 1.12. Semi-Supervised
- 1.13. Linear and quadratic discriminant analysis
- 1.14. Isotonic regression
- 2. Unsupervised learning
- 2.1. Gaussian mixture models
- 2.2. Manifold learning
- 2.3. Clustering
- 2.3.1. Overview of clustering methods
- 2.3.2. K-means
- 2.3.3. Affinity Propagation
- 2.3.4. Mean Shift
- 2.3.5. Spectral clustering
- 2.3.6. Hierarchical clustering
- 2.3.7. DBSCAN
- 2.3.8. Clustering performance evaluation
- 2.4. Biclustering
- 2.5. Decomposing signals in components (matrix factorization problems)
- 2.6. Covariance estimation
- 2.7. Novelty and Outlier Detection
- 2.8. Hidden Markov Models
- 2.9. Density Estimation
- 2.10. Neural network models (unsupervised)
- 3. Model selection and evaluation
- 3.1. Cross-validation: evaluating estimator performance
- 3.2. Grid Search: Searching for estimator parameters
- 3.2.1. Exhaustive Grid Search
- 3.2.2. Randomized Parameter Optimization
- 3.2.3. Alternatives to brute force parameter search
- 3.2.3.1. Model specific cross-validation
- 3.2.3.2. Information Criterion
- 3.2.3.3. Out of Bag Estimates
- 3.2.3.3.1. sklearn.ensemble.RandomForestClassifier
- 3.2.3.3.2. sklearn.ensemble.RandomForestRegressor
- 3.2.3.3.3. sklearn.ensemble.ExtraTreesClassifier
- 3.2.3.3.4. sklearn.ensemble.ExtraTreesRegressor
- 3.2.3.3.5. sklearn.ensemble.GradientBoostingClassifier
- 3.2.3.3.6. sklearn.ensemble.GradientBoostingRegressor
- 3.3. Pipeline: chaining estimators
- 3.4. FeatureUnion: Combining feature extractors
- 3.5. Model evaluation: quantifying the quality of predictions
- 3.5.1. The scoring parameter: defining model evaluation rules
- 3.5.2. Function for prediction-error metrics
- 3.5.2.1. Classification metrics
- 3.5.2.1.1. Accuracy score
- 3.5.2.1.2. Average precision score
- 3.5.2.1.3. Confusion matrix
- 3.5.2.1.4. Classification report
- 3.5.2.1.5. Hamming loss
- 3.5.2.1.6. Jaccard similarity coefficient score
- 3.5.2.1.7. Precision, recall and F-measures
- 3.5.2.1.8. Hinge loss
- 3.5.2.1.9. Log loss
- 3.5.2.1.10. Matthews correlation coefficient
- 3.5.2.1.11. Receiver operating characteristic (ROC)
- 3.5.2.1.12. Zero one loss
- 3.5.2.2. Regression metrics
- 3.5.3. Clustering metrics
- 3.5.4. Biclustering metrics
- 3.5.5. Dummy estimators
- 4. Dataset transformations
- 4.1. Feature extraction
- 4.1.1. Loading features from dicts
- 4.1.2. Feature hashing
- 4.1.3. Text feature extraction
- 4.1.3.1. The Bag of Words representation
- 4.1.3.2. Sparsity
- 4.1.3.3. Common Vectorizer usage
- 4.1.3.4. Tf–idf term weighting
- 4.1.3.5. Decoding text files
- 4.1.3.6. Applications and examples
- 4.1.3.7. Limitations of the Bag of Words representation
- 4.1.3.8. Vectorizing a large text corpus with the hashing trick
- 4.1.3.9. Performing out-of-core scaling with HashingVectorizer
- 4.1.3.10. Customizing the vectorizer classes
- 4.1.4. Image feature extraction
- 4.2. Preprocessing data
- 4.3. Kernel Approximation
- 4.4. Random Projection
- 4.5. Pairwise metrics, Affinities and Kernels
- 5. Dataset loading utilities
- 5.1. General dataset API
- 5.2. Toy datasets
- 5.3. Sample images
- 5.4. Sample generators
- 5.5. Datasets in svmlight / libsvm format
- 5.6. The Olivetti faces dataset
- 5.7. The 20 newsgroups text dataset
- 5.8. Downloading datasets from the mldata.org repository
- 5.9. The Labeled Faces in the Wild face recognition dataset
- 5.10. Forest covertypes