Machine Learning Tutorials
Go from fundamentals to real models — supervised & unsupervised learning, regression, classification, clustering, evaluation, and neural networks.
Introduction to Machine Learning
Understand what machine learning is, how it differs from rule-based programming, where it is used, the core terminology, and see one complete end-to-end scikit-learn example from data to prediction.
Types of Machine Learning
Understand the main machine learning paradigms — supervised (classification and regression), unsupervised (clustering, dimensionality reduction, association), reinforcement, and semi/self-supervised learning — with real-world examples and how to decide which one your problem needs.
The Machine Learning Workflow
A step-by-step tour of the end-to-end machine learning lifecycle — from framing the problem and understanding data through training, evaluation, tuning, deployment, and monitoring — with a concrete scikit-learn walkthrough and Pipelines.
Data Preprocessing & Cleaning
Turn messy raw data into a model-ready dataset — handle missing values, duplicates and outliers, encode categorical variables, fix dirty data, and combine steps with ColumnTransformer while avoiding data leakage.
Feature Engineering & Scaling
Craft better inputs for your models — create ratios, datetime and text features, interaction and polynomial terms, then scale with StandardScaler, MinMaxScaler and RobustScaler, and select the features that actually matter.
Train-Test Split & Cross-Validation
Learn why models must be judged on unseen data, how to split with train_test_split, and how k-fold cross-validation gives a robust performance estimate while pipelines block silent data leakage.
Linear Regression
Learn how linear regression predicts a continuous target — the model equation, mean squared error, fitting via the normal equation and gradient descent, coefficient interpretation, R-squared, assumptions, and a full scikit-learn example.
Logistic Regression
Learn how logistic regression turns a linear model into a probability for classification — the sigmoid function, decision boundary, log loss, interpreting coefficients as odds ratios, and a full scikit-learn workflow.
K-Nearest Neighbors (KNN)
Learn the instance-based KNN algorithm — how majority vote and averaging make predictions, the distance metrics behind them, choosing k, why feature scaling is essential, the curse of dimensionality, and a full scikit-learn workflow.
Decision Trees
Understand how decision trees split the feature space with if/else questions — Gini impurity, entropy and information gain, regression trees, controlling overfitting with depth and pruning, feature importance, and a full scikit-learn workflow with plot_tree.
Random Forests
Learn how a Random Forest combines many decorrelated decision trees through bagging and random feature selection to cut variance, get free validation from out-of-bag error, read feature importances, and tune the key hyperparameters in scikit-learn.
Naive Bayes
Learn how Naive Bayes turns Bayes' theorem into a fast probabilistic classifier: the conditional-independence assumption, the GaussianNB, MultinomialNB and BernoulliNB variants, a full spam-detection pipeline with CountVectorizer, and Laplace smoothing.
Support Vector Machines (SVM)
Learn how SVMs find the maximum-margin hyperplane between classes, why the closest points (support vectors) matter, hard vs soft margins and the C parameter, the kernel trick with RBF and gamma, why scaling is essential, and a full scikit-learn workflow.
K-Means Clustering
Learn how K-Means groups unlabeled data into k clusters — the assign-and-update algorithm, the inertia objective, choosing k with the elbow method and silhouette score, k-means++ initialization, why scaling matters, and a full scikit-learn customer-segmentation workflow.
Hierarchical Clustering & DBSCAN
Two clustering methods beyond k-means — agglomerative hierarchical clustering with linkage and dendrograms, and density-based DBSCAN that finds arbitrary shapes and labels outliers, with scikit-learn examples and a comparison table.
Dimensionality Reduction & PCA
Understand why high-dimensional data hurts models and how Principal Component Analysis compresses features into a few high-variance directions — with the intuition, the scaling requirement, explained-variance ratios, and scikit-learn code for 2D visualisation.
Model Evaluation Metrics
Judge classification and regression models correctly — the confusion matrix, accuracy, precision, recall, F1, ROC-AUC, multiclass averaging, and MAE, MSE, RMSE, R-squared, MAPE — with scikit-learn code and guidance on choosing the metric that matches your business goal.
Bias-Variance, Overfitting & Regularization
Understand the bias-variance trade-off, diagnose underfitting and overfitting with learning curves and cross-validation, and reduce variance with more data, simpler models, early stopping, and L1/L2/ElasticNet regularization.
Ensemble Methods — Bagging, Boosting & XGBoost
Learn how bagging, boosting, and stacking combine many base models into one stronger model: AdaBoost and gradient boosting explained, XGBoost, LightGBM and CatBoost, voting classifiers, key hyperparameters, and how to avoid overfitting with early stopping.
Introduction to Neural Networks & Deep Learning
Bridge from classic ML to deep learning — the perceptron, layers, activation functions, forward pass, loss, backpropagation and gradient descent, why deep learning wins on images/text/audio, and a tiny illustrative Keras example.