# AI and Machine Learning - An Overview
While preparing for the AWS Certified AI Practitioner exam, I realized that I needed a quick refresher on the key concepts of Artificial Intelligence (AI) and Machine Learning (ML). This blog post serves as a concise overview of the fundamental concepts, techniques, and applications in AI and ML.
Many thanks to Stephane Maarek, whose Udemy course helped me understand the basics of AI and ML and provided a solid foundation for the exam.
## From AI to Generative AI
Let's start with the very basics.
- Artificial Intelligence (AI) - the simulation of human intelligence processes by machines.
- Machine Learning (ML) - a subset of AI that enables systems to learn from data and improve their performance over time without being explicitly programmed.
- Deep Learning (DL) - a subset of ML that uses neural networks with many layers to analyze various factors of data.
- Generative AI - a type of AI that can generate new content, such as text, images, or music, based on learned patterns from existing data.
## AI Components
AI systems typically consist of the following components:
- Data: the raw material needed to train models.
- Algorithms: the techniques used to process data and learn patterns.
- Models: the trained representations of patterns in data, used for inference or prediction.
- Applications: the practical implementations of AI models in real-world scenarios.
The following diagram illustrates the relationship between these components:
```mermaid
graph LR;
A[Data] --> B[Algorithms];
B --> C[Models];
C --> D[Applications];
D --> E[Users];
```
## Definitions
### Machine Learning
- a subset of AI
- enables systems to learn from data
- improves performance over time without explicit programming
### Deep Learning
- a subset of ML
- uses neural networks with many layers
- Node - a basic unit in a neural network that processes input data
- Layer - a collection of nodes that process data in parallel
- Neural Network - a network of interconnected nodes organized in layers
- analyzes various factors of data
- Examples: image recognition, natural language processing
### Generative AI
- subset of deep learning
- a type of AI that can generate new content
- based on learned patterns from existing data
- Examples: text generation, image synthesis, music composition
```mermaid
graph TD;
X[Unlabeled Data] --> A[Generative AI];
A --> B[Text Generation];
A --> C[Image Synthesis];
A --> D[Music Composition];
A --> E[Video Generation];
```
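As a toy illustration of "learning patterns from existing data and generating new content," here is a minimal word-level Markov chain text generator in plain Python. The corpus and the order-1 model are made up for the example; real generative models such as LLMs are vastly more sophisticated, but the core idea — sample new output from learned statistics — is the same.

```python
import random

def build_model(text, order=1):
    """Learn transition counts: which word tends to follow which."""
    words = text.split()
    model = {}
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        model.setdefault(key, []).append(words[i + order])
    return model

def generate(model, length=8, seed=None):
    """Sample new text from the learned transitions."""
    rng = random.Random(seed)
    key = rng.choice(list(model))
    out = list(key)
    for _ in range(length):
        choices = model.get(tuple(out[-len(key):]))
        if not choices:
            break
        out.append(rng.choice(choices))
    return " ".join(out)

# Hypothetical tiny "training set"
corpus = "the cat sat on the mat the cat ran on the mat"
model = build_model(corpus)
print(generate(model, seed=42))
```

The generated sentence is new (it may never appear verbatim in the corpus), yet every transition in it was observed in the training data — a miniature version of generating content "based on learned patterns from existing data."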
## Types of Data
Data can be categorized into two main types:
- Structured Data: organized in a predefined format (e.g., tables, databases)
- Unstructured Data: not organized in a predefined format (e.g., text, images, audio)
```mermaid
graph TD;
A[Data] --> B[Structured Data];
A --> C[Unstructured Data];
B --> D[Tables];
B --> E[Databases];
C --> F[Text];
C --> G[Images];
C --> H[Audio];
```
## Types of Machine Learning Algorithms
Machine learning algorithms can be broadly classified into four main categories:
- Supervised Learning: algorithms learn from labeled data, where the input data is paired with the correct output.
- Classification: predicting discrete labels (e.g., spam detection, image classification).
- Regression: predicting continuous values (e.g., house price prediction, stock price forecasting).
- Unsupervised Learning: algorithms learn from unlabeled data, where the input data does not have corresponding output labels.
- Clustering: grouping similar data points together (e.g., customer segmentation, image clustering).
- Association Rule Learning: discovering interesting relationships between variables in large datasets (e.g., market basket analysis).
- Anomaly Detection: identifying unusual patterns or outliers in data (e.g., fraud detection, network security).
- Semi-supervised Learning: a combination of supervised and unsupervised learning, where the algorithm learns from a small amount of labeled data and a large amount of unlabeled data.
- Reinforcement Learning: algorithms learn by interacting with an environment and receiving feedback in the form of rewards or penalties.
- RLHF (Reinforcement Learning from Human Feedback) is a technique that combines reinforcement learning with feedback from human evaluators to better align the learning process with human preferences.
- Agent - the learner or decision-maker in a reinforcement learning environment.
- Environment - the external system with which the agent interacts.
- Action - a decision made by the agent that affects the environment.
- Reward - feedback received by the agent from the environment after taking an action, used to reinforce learning.
- State - a representation of the current situation of the environment, which the agent uses to make decisions.
- Policy - a strategy used by the agent to determine the next action based on the current state.
- Value Function - a function that estimates the expected return (cumulative reward) from a given state, guiding the agent's decisions.
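The agent/environment/action/reward loop described above can be sketched with tabular Q-learning on a made-up one-dimensional world. Everything here — the world, the rewards, and the hyperparameter values — is illustrative, not a production setup.

```python
import random

# Tiny 1-D world: states 0..4, actions -1 (left) / +1 (right),
# reward 1.0 for reaching the goal state, 0.0 otherwise.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]

def step(state, action):
    """Environment: apply the action, return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Agent: learn a value for each (state, action) pair from rewards."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy policy: mostly exploit, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: move toward reward + discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = q_learning()
# Greedy policy; it should prefer +1 (right) in every state before the goal.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)}
print(policy)
```

Every term from the list maps onto the code: the `step` function is the environment, `q_learning` is the agent, the epsilon-greedy rule is its policy, and the `q` table is a value function estimating expected cumulative reward.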
```mermaid
graph LR;
A[Machine Learning Algorithms] --> B[Supervised Learning];
A --> C[Unsupervised Learning];
A --> D[Semi-supervised Learning];
A --> E[Reinforcement Learning];
B --> F[Classification];
B --> G[Regression];
C --> H[Clustering];
C --> I[Dimensionality Reduction];
D --> J[Combining Labeled and Unlabeled Data];
E --> K[Agent-Environment Interaction];
```
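For a concrete (if hypothetical) taste of supervised classification, here is a 1-nearest-neighbor classifier in plain Python: given labeled training pairs, it predicts the label of the closest training point. The data and labels are invented for the example.

```python
def nearest_neighbor(train, query):
    """1-NN: predict the label of the closest labeled training point."""
    def dist(a, b):
        # Squared Euclidean distance (no sqrt needed for comparison).
        return sum((x - y) ** 2 for x, y in zip(a, b))
    features, label = min(train, key=lambda pair: dist(pair[0], query))
    return label

# Labeled data: (features, label) pairs — made-up 2-D points.
train = [
    ((1.0, 1.0), "cat"),
    ((1.2, 0.9), "cat"),
    ((8.0, 9.0), "dog"),
    ((9.0, 8.5), "dog"),
]
print(nearest_neighbor(train, (1.1, 1.0)))  # → "cat"
```

This is "learning from labeled data" in its simplest form: the model is literally the training set, and inference is a lookup of the nearest example.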
## Model Fitting
- Overfitting:
- occurs when a model learns the training data too well,
- capturing noise and outliers,
- leads to poor generalization on new data.
- To prevent overfitting, you can:
- use simpler models,
- apply regularization techniques,
- gather more training data.
- Underfitting:
- occurs when a model is too simple to capture the underlying patterns in the data
- results in poor performance on both training and test data.
- To prevent underfitting, you can:
- use more complex models,
- add more features,
- increase the training time.
- Balanced Model:
- achieves a good trade-off between bias and variance,
- generalizes well to new data.
- Bias: error due to overly simplistic assumptions in the learning algorithm, leading to underfitting.
- High bias models are too rigid and fail to capture the complexity of the data.
- To reduce bias, you can use more complex models or add more features.
- Variance: error due to excessive sensitivity to small fluctuations in the training data, leading to overfitting.
- High variance models are too flexible and capture noise in the data.
- To reduce variance, you can use simpler models, regularization techniques, or gather more training data.
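To make the trade-off concrete, here is a small sketch with made-up data: an "underfit" model that ignores the input entirely, an "overfit" model that memorizes the training set, and a "balanced" least-squares line.

```python
def mse(pairs, predict):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((y - predict(x)) ** 2 for x, y in pairs) / len(pairs)

# Made-up data: y is roughly 2x plus some noise.
train = [(1, 2.1), (2, 4.3), (3, 5.8), (4, 8.2), (5, 9.9)]
test  = [(1.5, 3.2), (2.5, 5.1), (3.5, 6.9), (4.5, 9.0)]

# Underfit: ignore x, always predict the training mean (high bias).
mean_y = sum(y for _, y in train) / len(train)
underfit = lambda x: mean_y

# Overfit: memorize the training set; fall back to the mean elsewhere (high variance).
table = dict(train)
overfit = lambda x: table.get(x, mean_y)

# Balanced: least-squares line, capturing the real trend.
mx = sum(x for x, _ in train) / len(train)
slope = (sum((x - mx) * (y - mean_y) for x, y in train)
         / sum((x - mx) ** 2 for x, _ in train))
balanced = lambda x: mean_y + slope * (x - mx)

for name, model in [("underfit", underfit), ("overfit", overfit), ("balanced", balanced)]:
    print(name, round(mse(train, model), 3), round(mse(test, model), 3))
```

The overfit model scores a perfect 0 on the training data yet does no better than the plain mean on unseen points, while the balanced line does well on both — exactly the generalization gap described above.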
## Confusion Matrix
A confusion matrix is a table used to evaluate the performance of a classification model by comparing predicted and actual labels. It provides insights into the model's accuracy, precision, recall, and F1 score.
```mermaid
quadrantChart
title Confusion Matrix
x-axis Predicted Positive --> Predicted Negative
y-axis Actual Negative --> Actual Positive
quadrant-1 False Negative
quadrant-2 True Positive
quadrant-3 False Positive
quadrant-4 True Negative
```
- Precision = True Positives / (True Positives + False Positives)
- Best when false positives are costly (e.g., spam detection).
- Recall = True Positives / (True Positives + False Negatives)
- Best when false negatives are costly (e.g., disease screening).
- F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
- Best when you need a balance between precision and recall (e.g., information retrieval).
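These formulas can be checked with a few lines of Python. The counts below are a hypothetical spam-filter example, not real measurements.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical spam filter: 90 true positives, 10 false positives,
# 30 false negatives, 870 true negatives.
p, r, f1 = classification_metrics(tp=90, fp=10, fn=30, tn=870)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.9 0.75 0.82
```

Note how the filter's precision (0.9) is higher than its recall (0.75): it rarely flags legitimate mail, but it misses a quarter of the spam — the F1 score balances the two.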
## AUC and ROC Curve
AUC (Area Under the Curve) measures the area under the ROC (Receiver Operating Characteristic) curve.
- value between 0 and 1, where 0.5 corresponds to random guessing and 1.0 to a perfect classifier
- the ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) across classification thresholds
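AUC also has a handy probabilistic interpretation: it equals the probability that a randomly chosen positive example is ranked above a randomly chosen negative one. A brute-force sketch of that interpretation, with made-up labels and scores:

```python
def auc(labels, scores):
    """AUC as P(random positive outranks random negative); ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores: higher score = more confident "positive".
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
print(auc(labels, scores))  # 8 of 9 positive/negative pairs are ranked correctly
```

This pairwise comparison is O(n²), so real libraries compute AUC from the sorted ROC curve instead, but both give the same number.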
## Regression metrics
Regression metrics are used to evaluate the performance of regression models, which predict continuous values. Common regression metrics include:
- Mean Absolute Error (MAE): the average absolute difference between predicted and actual values.
- Mean Absolute Percentage Error (MAPE): the average absolute percentage difference between predicted and actual values, useful for understanding errors relative to the scale of the data.
- Mean Squared Error (MSE): the average of the squared differences between predicted and actual values, penalizing larger errors more than MAE.
- Root Mean Squared Error (RMSE): the square root of MSE, providing an error metric in the same units as the target variable.
- R-squared (Coefficient of Determination): a statistical measure that represents the proportion of variance for a dependent variable that's explained by an independent variable or variables in a regression model.
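A quick sketch computing all five metrics on made-up house-price data (values are illustrative only):

```python
import math

def regression_metrics(actual, predicted):
    """MAE, MSE, RMSE, MAPE, and R-squared from paired values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)                       # same units as the target
    mape = sum(abs(e / a) for a, e in zip(actual, errors)) / n * 100
    mean_a = sum(actual) / n
    ss_res = sum(e ** 2 for e in errors)        # unexplained variance
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot                    # fraction of variance explained
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}

# Hypothetical house prices (in $1000s): actual vs. model predictions.
actual = [200, 250, 300, 350]
predicted = [210, 240, 310, 340]
print(regression_metrics(actual, predicted))
```

Here every prediction is off by $10k, so MAE and RMSE are both 10; on less uniform errors, RMSE would exceed MAE because squaring penalizes large errors more.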
## More terms
- Feature: an individual measurable property or characteristic of the data.
- Feature Engineering: the process of selecting, modifying, or creating features to improve model performance.
- Hyperparameter: a parameter that is set before training a model and controls the learning process (e.g., learning rate, regularization strength).
- Hyperparameter Tuning: the process of searching for the best hyperparameters to optimize model performance.
- Important hyperparameters include:
- Learning rate: controls how much to change the model in response to the estimated error each time the model weights are updated.
- Regularization strength: controls the amount of regularization applied to prevent overfitting.
- Number of layers and nodes in a neural network: determines the complexity of the model.
- Batch size: the number of training examples used in one iteration of model training.
- Training: the process of teaching a model to learn patterns from data.
- Validation: the process of evaluating a model's performance on a separate dataset during training to tune hyperparameters.
- Testing: the process of evaluating a model's performance on a separate dataset after training to assess its generalization ability.
- Cross-validation: a technique for assessing how the results of a statistical analysis will generalize to an independent dataset, often used to prevent overfitting.
- ROC Curve: a graphical representation of a model's performance across different thresholds, plotting true positive rate against false positive rate.
- AUC (Area Under the Curve): a single scalar value that summarizes the performance of a classification model across all thresholds, with higher values indicating better performance.
- Bias-Variance Tradeoff: the balance between error from overly simple assumptions (bias) and error from sensitivity to noise in the training data (variance).
- Ensemble Learning: a technique that combines multiple models to improve overall performance, often used to reduce overfitting and increase robustness.
- Transfer Learning: a technique where a pre-trained model is fine-tuned on a new task, leveraging knowledge from previous tasks to improve performance on the new task.
- Explainable AI (XAI): techniques and methods that make AI models more interpretable and understandable to humans, helping to build trust and transparency in AI systems.
- Model Deployment: the process of making a trained model available for use in production environments, often involving integration with applications or services.
- GPT (Generative Pre-trained Transformer): a type of deep learning model designed for natural language processing tasks, capable of generating human-like text based on input prompts.
- LLM (Large Language Model): a type of AI model trained on vast amounts of text data to understand and generate human language, often used in applications like chatbots, translation, and content generation.
- RAG (Retrieval-Augmented Generation): a technique that combines retrieval of relevant information from a knowledge base with generative capabilities to produce more accurate and contextually relevant responses in AI applications.
- LLMOps: a set of practices and tools for managing the lifecycle of large language models, including training, deployment, monitoring, and maintenance.
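As a final illustration, the cross-validation idea mentioned above can be sketched in a few lines: the data is split into k folds, and each fold serves once as the validation set while the rest is used for training. The round-robin split and the fold count here are arbitrary choices for the example.

```python
def k_fold_splits(data, k=3):
    """Yield (train, validation) splits; each fold is held out exactly once."""
    folds = [data[i::k] for i in range(k)]  # simple round-robin split
    for i in range(k):
        validation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation

data = list(range(9))
for train, val in k_fold_splits(data, k=3):
    print(sorted(val), "held out;", len(train), "items used for training")
```

Averaging a model's score over all k validation folds gives a more reliable estimate of generalization than a single train/test split, which is why cross-validation is a standard guard against overfitting during hyperparameter tuning.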