Advanced Machine Learning Algorithms

Performance Evaluation

Definition

Performance evaluation is the process of assessing the effectiveness and accuracy of a machine learning model on a given dataset.

Performance Evaluation Methods

Common approaches include the holdout method (splitting the data into separate training and test sets), k-fold cross-validation, and task-appropriate metrics such as accuracy, precision, and recall for classification, or mean squared error for regression.
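
A minimal sketch of the holdout method and k-fold cross-validation, assuming scikit-learn is available and using its bundled iris dataset:

```python
# A minimal sketch of holdout evaluation and 5-fold cross-validation,
# assuming scikit-learn and its bundled iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Holdout: reserve 25% of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("holdout accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Cross-validation: average accuracy over 5 different train/test splits.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print("cross-validation accuracy:", scores.mean())
```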

Linearity-based Models

Objective

Linearity-based models assume that the relationship between input variables (features) and the output (target) can be represented as a linear equation.
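
A minimal sketch of the idea, assuming NumPy and hypothetical one-dimensional data: fit the line $y = wx + b$ by least squares.

```python
# A minimal sketch of a linearity-based model: ordinary least squares
# on hypothetical 1-D data, assuming NumPy.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
y = 3.0 * X + 2.0 + rng.normal(0, 1, size=50)  # noisy samples of y = 3x + 2

# Solve the least-squares problem for slope w and intercept b.
A = np.column_stack([X, np.ones_like(X)])
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"learned line: y = {w:.2f}x + {b:.2f}")
```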

Algorithms

Typical examples are Linear Regression and Logistic Regression.

Advantages

These models are simple, fast to train, and easy to interpret, since each feature's learned weight directly indicates its contribution to the prediction.

Limitations

They cannot capture non-linear relationships between the features and the target unless the features are explicitly transformed.

Distance-based Models

Objective

These algorithms rely on measuring the "distance" between points in feature space to classify or predict new instances.
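
A minimal sketch of the idea, assuming NumPy and hypothetical two-dimensional points: classify a new point by majority vote among its $k$ nearest neighbours.

```python
# A minimal sketch of a distance-based classifier: k-nearest neighbours
# on hypothetical 2-D points, assuming NumPy.
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances
    nearest = np.argsort(distances)[:k]              # indices of the k closest
    return np.bincount(y_train[nearest]).argmax()    # most common label

X_train = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [8.5, 9.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([2.0, 1.5])))  # -> 0
```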

Algorithms

k-Nearest Neighbours (k-NN) is the canonical example; k-Means clustering also relies on distances between points.

Advantages

They are simple and intuitive, and k-NN requires no explicit training phase.

Limitations

They are sensitive to the scale of the features and can be slow at prediction time on large datasets, since distances to many stored points must be computed.

Probabilistic-based Models

Objective

These models aim to predict the likelihood of different outcomes. They are especially useful when data is noisy or uncertain and can be applied to both classification and regression problems.

Algorithms

Naïve Bayes (covered below) is the classic example; Logistic Regression can also be read probabilistically, since it outputs class probabilities.

Advantages

They express predictions as degrees of belief rather than hard labels, which makes uncertainty explicit.

Limitations

Their accuracy depends on how well the assumed probability model matches the true data distribution.

Tree-based Models

These models use a tree-like structure to represent decisions and their possible consequences. They split the data into branches based on feature values, with the goal of making more accurate predictions with each split.

Algorithms

Decision Trees and Random Forests (both covered below) are the main examples.

Advantages

They are easy to interpret and visualise, and handle both numerical and categorical features without requiring scaling.

Limitations

A single tree can easily overfit the training data; ensemble methods such as Random Forest mitigate this.

Naïve Bayes

Definition

Naïve Bayes is a classification algorithm based on Bayes' theorem, which assumes that features are conditionally independent given the class.

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$

where $P(A|B)$ is the posterior probability of class $A$ given evidence $B$, $P(B|A)$ is the likelihood, $P(A)$ is the prior probability of the class, and $P(B)$ is the probability of the evidence.
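
A hypothetical worked example: suppose 20% of emails are spam, and the word "free" appears in 60% of spam emails but only 10% of non-spam emails. Then

$$P(\text{spam} \mid \text{free}) = \frac{0.6 \cdot 0.2}{0.6 \cdot 0.2 + 0.1 \cdot 0.8} = \frac{0.12}{0.20} = 0.6$$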

Remark

Naïve Bayes assumes that the features are independent of one another given the class; this simplifying assumption is why the method is called "naïve".

Likelihood in Naïve Bayes

In the Naïve Bayes algorithm, the "likelihood" is the probability of observing a particular feature value given a class label. It quantifies how likely a specific feature value is when the class of the data point is known.
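
A minimal sketch of estimating likelihoods by counting, using a hypothetical categorical dataset of (outlook, play) pairs:

```python
# A minimal sketch of estimating Naive Bayes likelihoods from counts,
# using hypothetical categorical weather data.
from collections import Counter

# (feature value, class label) pairs.
data = [("sunny", "no"), ("sunny", "no"), ("overcast", "yes"),
        ("rain", "yes"), ("rain", "yes"), ("sunny", "yes")]

def likelihood(feature_value, class_label):
    """P(feature = value | class), estimated by relative frequency."""
    in_class = [f for f, c in data if c == class_label]
    return Counter(in_class)[feature_value] / len(in_class)

print(likelihood("sunny", "yes"))  # 1 of the 4 "yes" days -> 0.25
```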

Decision Tree

Decision Tree is a supervised learning algorithm that can be used for both classification and regression problems, though it is most often applied to classification. It is a tree-structured classifier in which internal nodes represent features of the dataset, branches represent decision rules, and each leaf node represents an outcome.

Definition

A Decision Tree is a flowchart-like structure in which each internal node represents a test on a feature, each branch represents an outcome of the test, and each leaf node represents a class label: the decision reached after evaluating the features along the path from the root.

Vocabulary

Root node: the topmost node, which holds the entire training set. Internal (decision) node: a node that tests a feature. Branch: one possible outcome of a test. Leaf node: a terminal node that carries the final prediction. Splitting: dividing a node's data into subsets according to a feature. Pruning: removing branches to reduce overfitting.

Decision Tree Algorithm

The algorithm to build a decision tree is as follows (a code sketch follows the list):

  1. Start with the root node and all training data.

  2. For each node, select the best feature to split on based on a criterion.

  3. Split the data into subsets based on the selected feature.

  4. Create child nodes for each subset and repeat the process recursively until a stopping condition is met.
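
A minimal sketch of these steps, assuming scikit-learn; here criterion="entropy" drives the feature selection in step 2, and max_depth serves as the stopping condition in step 4:

```python
# A minimal sketch of decision-tree training, assuming scikit-learn
# and its bundled iris dataset.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# Entropy-based splits (step 2), recursion stopped at depth 2 (step 4).
tree = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
tree.fit(X, y)
print(export_text(tree))  # prints the learned tests and leaf outcomes
```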

Attribute Selection Measure, ASM

Definition

Attribute Selection Measure is a method used to determine the best feature to split on at each node in a decision tree. It evaluates the quality of a split based on how well it separates the data into distinct classes.

Information Gain

Definition

Information gain is a measure used to determine which feature should be used to split the data at each internal node of the decision tree. It is calculated using entropy.

Formula

$$\text{IG}(T, A) = \text{Entropy}(T) - \sum_{v \in \text{Values}(A)} \frac{|T_v|}{|T|} \cdot \text{Entropy}(T_v)$$

where $T$ is the set of training instances, $A$ is the attribute being evaluated, $T_v$ is the subset of $T$ where attribute $A$ has value $v$, and $\text{Values}(A)$ is the set of all possible values for attribute $A$.
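
A minimal sketch of this formula, assuming NumPy and a hypothetical binary-labelled node split into two subsets:

```python
# A minimal sketch of information gain for a candidate split, assuming NumPy.
import numpy as np

def entropy(labels):
    """Entropy of an array of class labels, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, subsets):
    """Parent entropy minus the size-weighted entropy of the subsets."""
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - weighted

parent = np.array([1, 1, 1, 1, 1, 0, 0, 0])                 # 5 positive, 3 negative
subsets = [np.array([1, 1, 1, 1]), np.array([1, 0, 0, 0])]  # one candidate split
print(information_gain(parent, subsets))  # ~0.55 bits
```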

Entropy

Definition

Entropy is a metric that measures the impurity of a set of instances; it quantifies the randomness in the data. In a decision tree, the goal is to decrease the entropy of the dataset by creating purer subsets: since entropy measures impurity, decreasing entropy increases purity.

$$ \text{Entropy}(S) = - \sum_{i=1}^{c} p_i \cdot \log_2(p_i) $$

where $p_i$ is the proportion of instances in class $i$ and $c$ is the total number of classes.
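
A worked example: for a set $S$ containing 9 positive and 5 negative instances,

$$\text{Entropy}(S) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.940$$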

Gini Impurity

Definition

Gini impurity is a measure of how often a randomly chosen element from the dataset would be incorrectly labeled if it was randomly labeled according to the distribution of labels in the dataset.

$$ \text{Gini}(S) = 1 - \sum_{i=1}^{c} p_i^2 $$

where $p_i$ is the proportion of instances in class $i$ and $c$ is the total number of classes.
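
For the same 9-positive, 5-negative set, $\text{Gini}(S) = 1 - \left(\frac{9}{14}\right)^2 - \left(\frac{5}{14}\right)^2 = \frac{90}{196} \approx 0.459$. Both measures are zero for a pure node and maximal when the classes are evenly mixed; Gini is slightly cheaper to compute because it avoids logarithms.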

Random Forest

Definition

Random Forest is an ensemble learning method that constructs multiple decision trees during training and outputs the mode of the classes (for classification) or the mean prediction (for regression) of the individual trees.

Random Forest Algorithm

The algorithm to build a random forest is as follows (a code sketch follows the list):

  1. From the training dataset, create multiple bootstrap samples.

  2. For each bootstrap sample, train a decision tree using a random subset of features at each split.

  3. Repeat the process to create a large number of trees (the forest).

  4. For classification, aggregate the predictions of all trees by majority vote; for regression, average the predictions.
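
A minimal sketch of these steps, assuming scikit-learn, whose RandomForestClassifier performs the bootstrap sampling and per-split feature subsampling internally:

```python
# A minimal sketch of a random forest, assuming scikit-learn
# and its bundled iris dataset.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# 100 trees, each grown on a bootstrap sample (steps 1-3), with a random
# subset of sqrt(n_features) features considered at every split (step 2).
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                bootstrap=True, random_state=0)
forest.fit(X, y)
print(forest.predict(X[:3]))  # majority vote over the 100 trees (step 4)
```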

Neural Networks

Definition

Neural networks are a class of machine learning models inspired by the structure and function of the human brain. They consist of interconnected nodes (neurons) organized in layers, where each connection has a weight that is adjusted during training to minimize prediction errors.

Neural Networks Techniques

Artificial Neural Networks, ANN

Artificial Neural Networks are computational models inspired by the human brain's neural networks. They consist of layers of nodes, called neurons, which are interconnected and work together to process information and make predictions or decisions based on input data.

Artificial Neural Networks Architecture

The architecture of an Artificial Neural Network typically consists of three main types of layers: input layer, hidden layers, and output layer. Each layer is made up of neurons that are connected to neurons in the adjacent layers through weighted connections.
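
A minimal sketch of this architecture as a single forward pass, assuming NumPy and hypothetical randomly initialised weights: three input neurons, one hidden layer of four neurons, and two output neurons.

```python
# A minimal sketch of one feedforward pass through a small network,
# assuming NumPy; the weights here are hypothetical (randomly initialised).
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)  # input (3) -> hidden (4)
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)  # hidden (4) -> output (2)

def relu(z):
    return np.maximum(0, z)  # a common activation function

x = np.array([0.5, -1.2, 0.3])  # one input example with 3 features
hidden = relu(x @ W1 + b1)      # weighted sums, then activation
output = hidden @ W2 + b2       # raw scores from the output layer
print(output)
```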

Vocabulary

Input layer: the layer that receives the raw features. Hidden layers: intermediate layers that transform their inputs. Output layer: the layer that produces the final prediction. Weights: the connection strengths adjusted during training. Bias: an offset added to each neuron's weighted sum. Activation function: the non-linearity applied to each neuron's output.

Deep Learning

Definition

Deep Learning is a subset of machine learning that focuses on using neural networks with many layers, deep neural networks, to model complex patterns in data. It has been particularly successful in areas such as image and speech recognition, natural language processing, and game playing.

Feature Engineering

Definition

Feature engineering is the process of using domain knowledge to extract or create features from raw data that can improve the performance of machine learning models.

Feature Engineering Techniques

Scaling

Definition

Scaling is the process of transforming features to a specific range, often $[0, 1]$ or $[-1, 1]$. This is important because many machine learning algorithms are sensitive to the scale of the input data.

Normalisation

Definition

Normalisation is a form of scaling that maps features onto a common range, most often $[0, 1]$, so that no feature dominates merely because of its magnitude; in practice the terms scaling and normalisation are often used interchangeably.

Z-score Normalization, Standardization

Definition

Z-score normalization transforms features to have a mean of 0 and a standard deviation of 1. This is achieved by subtracting the mean and dividing by the standard deviation for each feature.

Formula

$$ z = \frac{x - \mu}{\sigma} $$

where $x$ is the original value, $\mu$ is the mean of the feature, and $\sigma$ is its standard deviation.
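
A minimal sketch, assuming NumPy and hypothetical values:

```python
# A minimal sketch of z-score standardization, assuming NumPy.
import numpy as np

x = np.array([10.0, 12.0, 14.0, 18.0, 26.0])  # hypothetical feature values
z = (x - x.mean()) / x.std()
print(z.mean(), z.std())  # approximately 0.0 and 1.0 after standardization
```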

Min-Max scaling

Definition

Min-Max scaling transforms features to a fixed range, typically $[0, 1]$. It is calculated by subtracting the minimum value and dividing by the range for each feature.

Formula

$$ x' = \frac{x - x_{min}}{x_{max} - x_{min}} $$

where $x_{min}$ and $x_{max}$ are the minimum and maximum values of the feature.
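
A minimal sketch on the same hypothetical values:

```python
# A minimal sketch of min-max scaling to [0, 1], assuming NumPy.
import numpy as np

x = np.array([10.0, 12.0, 14.0, 18.0, 26.0])  # hypothetical feature values
x_scaled = (x - x.min()) / (x.max() - x.min())
print(x_scaled)  # [0.    0.125 0.25  0.5   1.   ]
```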

Hyperparameters

Definition

Hyperparameters are settings chosen before training a machine learning model, as opposed to the parameters learned from the data. They control the behavior of the learning algorithm and can significantly impact the model's performance; examples include the learning rate and the maximum depth of a decision tree.

Learning Rate

Definition

The learning rate is a hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated.
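
A minimal sketch of the learning rate in action, minimising the hypothetical loss $L(w) = (w - 3)^2$ by gradient descent:

```python
# A minimal sketch of how the learning rate scales each weight update,
# minimising the hypothetical loss L(w) = (w - 3)^2.
learning_rate = 0.1
w = 0.0
for _ in range(50):
    gradient = 2 * (w - 3)          # dL/dw
    w -= learning_rate * gradient   # step size scaled by the learning rate
print(w)  # converges towards the minimum at w = 3
```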

Regularisation

Definition

Regularisation is a technique used to prevent overfitting in machine learning models. It adds a penalty term to the loss function, which discourages the model from learning overly complex patterns.
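
For example, L2 (ridge) regularisation adds the squared magnitude of the weights to the loss:

$$ \text{Loss} = \sum_{i} (y_i - \hat{y}_i)^2 + \lambda \sum_{j} w_j^2 $$

where $\lambda$ is a hyperparameter that controls the strength of the penalty: larger values force the weights towards zero and hence yield simpler models.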

Hyperparameter Tuning Techniques

Definition

Hyperparameter tuning techniques are methods used to find good values for the hyperparameters of machine learning models. They improve model performance by systematically searching over candidate hyperparameter settings, for example via grid search or random search.
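
A minimal sketch of one common technique, grid search with cross-validation, assuming scikit-learn:

```python
# A minimal sketch of hyperparameter tuning via grid search,
# assuming scikit-learn and its bundled iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Evaluate every combination of candidate hyperparameter values,
# scoring each with 5-fold cross-validation.
grid = {"max_depth": [2, 3, 4], "criterion": ["gini", "entropy"]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```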