# AI and Machine Learning - An Overview
While preparing for the AWS Certified AI Practitioner exam, I realized that I needed a quick refresher on the key concepts of Artificial Intelligence (AI) and Machine Learning (ML). This blog post serves as a concise overview of the fundamental concepts, techniques, and applications in AI and ML.
Many thanks to Stephane Maarek, whose Udemy course helped me understand the basics of AI and ML and provided a solid foundation for the exam.
## From AI to Generative AI
Let's start with the very basics.
- Artificial Intelligence (AI) - the simulation of human intelligence processes by machines.
- Machine Learning (ML) - a subset of AI that enables systems to learn from data and improve their performance over time without being explicitly programmed.
- Deep Learning (DL) - a subset of ML that uses neural networks with many layers to analyze various factors of data.
- Generative AI - a type of AI that can generate new content, such as text, images, or music, based on learned patterns from existing data.
## AI Components
AI systems typically consist of the following components:
- Data: the raw material needed to train models.
- Algorithms: the techniques used to process data and learn patterns.
- Models: the trained representations of patterns in data, used for inference or prediction.
- Applications: the practical implementations of AI models in real-world scenarios.
The following diagram illustrates the relationship between these components:
```mermaid
graph LR;
A[Data] --> B[Algorithms];
B --> C[Models];
C --> D[Applications];
D --> E[Users];
```
## Definitions
### Machine Learning
- a subset of AI
- enables systems to learn from data
- improves performance over time without explicit programming
### Deep Learning
- a subset of ML
- uses neural networks with many layers
- Node - a basic unit in a neural network that processes input data
- Layer - a collection of nodes that process data in parallel
- Neural Network - a network of interconnected nodes organized in layers
- analyzes various factors of data
- Examples: image recognition, natural language processing
### Generative AI
- subset of deep learning
- a type of AI that can generate new content
- based on learned patterns from existing data
- Examples: text generation, image synthesis, music composition
```mermaid
graph TD;
X[Unlabeled Data] --> A[Generative AI];
A --> B[Text Generation];
A --> C[Image Synthesis];
A --> D[Music Composition];
A --> E[Video Generation];
```
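As a toy illustration of "learning patterns from existing data and generating new content," here is a minimal word-level Markov chain text generator in plain Python. The corpus and the order-1 model are made up for the example; real generative models such as LLMs are vastly more sophisticated, but the core idea — sample new output from learned statistics — is the same.

```python
import random

def build_model(text, order=1):
    """Learn transition counts: which word tends to follow which."""
    words = text.split()
    model = {}
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        model.setdefault(key, []).append(words[i + order])
    return model

def generate(model, length=8, seed=None):
    """Sample new text from the learned transitions."""
    rng = random.Random(seed)
    key = rng.choice(list(model))
    out = list(key)
    for _ in range(length):
        choices = model.get(tuple(out[-len(key):]))
        if not choices:
            break
        out.append(rng.choice(choices))
    return " ".join(out)

# Hypothetical tiny "training set"
corpus = "the cat sat on the mat the cat ran on the mat"
model = build_model(corpus)
print(generate(model, seed=42))
```

The generated sentence is new (it may never appear verbatim in the corpus), yet every transition in it was observed in the training data — a miniature version of generating content "based on learned patterns from existing data."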
## Types of Data
Data can be categorized into two main types:
- Structured Data: organized in a predefined format (e.g., tables, databases)
- Unstructured Data: not organized in a predefined format (e.g., text, images, audio)
```mermaid
graph TD;
A[Data] --> B[Structured Data];
A --> C[Unstructured Data];
B --> D[Tables];
B --> E[Databases];
C --> F[Text];
C --> G[Images];
C --> H[Audio];
```
## Types of Machine Learning Algorithms
Machine learning algorithms can be broadly classified into four main categories:
- Supervised Learning: algorithms learn from labeled data, where the input data is paired with the correct output.
- Classification: predicting discrete labels (e.g., spam detection, image classification).
- Regression: predicting continuous values (e.g., house price prediction, stock price forecasting).
- Unsupervised Learning: algorithms learn from unlabeled data, where the input data does not have corresponding output labels.
- Clustering: grouping similar data points together (e.g., customer segmentation, image clustering).
- Association Rule Learning: discovering interesting relationships between variables in large datasets (e.g., market basket analysis).
- Anomaly Detection: identifying unusual patterns or outliers in data (e.g., fraud detection, network security).
- Semi-supervised Learning: a combination of supervised and unsupervised learning, where the algorithm learns from a small amount of labeled data and a large amount of unlabeled data.
- Reinforcement Learning: algorithms learn by interacting with an environment and receiving feedback in the form of rewards or penalties.
- RLHF (Reinforcement Learning from Human Feedback) is a technique that combines reinforcement learning with feedback from human evaluators to better align the learning process with human preferences.
- Agent - the learner or decision-maker in a reinforcement learning environment.
- Environment - the external system with which the agent interacts.
- Action - a decision made by the agent that affects the environment.
- Reward - feedback received by the agent from the environment after taking an action, used to reinforce learning.
- State - a representation of the current situation of the environment, which the agent uses to make decisions.
- Policy - a strategy used by the agent to determine the next action based on the current state.
- Value Function - a function that estimates the expected return (cumulative reward) from a given state, guiding the agent's decisions.
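The agent/environment/action/reward loop described above can be sketched with tabular Q-learning on a made-up one-dimensional world. Everything here — the world, the rewards, and the hyperparameter values — is illustrative, not a production setup.

```python
import random

# Tiny 1-D world: states 0..4, actions -1 (left) / +1 (right),
# reward 1.0 for reaching the goal state, 0.0 otherwise.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]

def step(state, action):
    """Environment: apply the action, return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Agent: learn a value for each (state, action) pair from rewards."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy policy: mostly exploit, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: move toward reward + discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = q_learning()
# Greedy policy; it should prefer +1 (right) in every state before the goal.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)}
print(policy)
```

Every term from the list maps onto the code: the `step` function is the environment, `q_learning` is the agent, the epsilon-greedy rule is its policy, and the `q` table is a value function estimating expected cumulative reward.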
```mermaid
graph LR;
A[Machine Learning Algorithms] --> B[Supervised Learning];
A --> C[Unsupervised Learning];
A --> D[Semi-supervised Learning];
A --> E[Reinforcement Learning];
B --> F[Classification];
B --> G[Regression];
C --> H[Clustering];
C --> I[Dimensionality Reduction];
D --> J[Combining Labeled and Unlabeled Data];
E --> K[Agent-Environment Interaction];
```
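For a concrete (if hypothetical) taste of supervised classification, here is a 1-nearest-neighbor classifier in plain Python: given labeled training pairs, it predicts the label of the closest training point. The data and labels are invented for the example.

```python
def nearest_neighbor(train, query):
    """1-NN: predict the label of the closest labeled training point."""
    def dist(a, b):
        # Squared Euclidean distance (no sqrt needed for comparison).
        return sum((x - y) ** 2 for x, y in zip(a, b))
    features, label = min(train, key=lambda pair: dist(pair[0], query))
    return label

# Labeled data: (features, label) pairs — made-up 2-D points.
train = [
    ((1.0, 1.0), "cat"),
    ((1.2, 0.9), "cat"),
    ((8.0, 9.0), "dog"),
    ((9.0, 8.5), "dog"),
]
print(nearest_neighbor(train, (1.1, 1.0)))  # → "cat"
```

This is "learning from labeled data" in its simplest form: the model is literally the training set, and inference is a lookup of the nearest example.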
## Model Fitting
- Overfitting:
- occurs when a model learns the training data too well,
- capturing noise and outliers,
- leads to poor generalization on new data.
- To prevent overfitting, you can:
- use simpler models,
- apply regularization techniques,
- gather more training data.
- Underfitting:
- occurs when a model is too simple to capture the underlying patterns in the data
- results in poor performance on both training and test data.
- To prevent underfitting, you can:
- use more complex models,
- add more features,
- increase the training time.
- Balanced Model:
- achieves a good trade-off between bias and variance,
- generalizes well to new data.
- Bias: error due to overly simplistic assumptions in the learning algorithm, leading to underfitting.
- High bias models are too rigid and fail to capture the complexity of the data.
- To reduce bias, you can use more complex models or add more features.
- Variance: error due to excessive sensitivity to small fluctuations in the training data, leading to overfitting.
- High variance models are too flexible and capture noise in the data.
- To reduce variance, you can use simpler models, regularization techniques, or gather more training data.
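To make the trade-off concrete, here is a small sketch with made-up data: an "underfit" model that ignores the input entirely, an "overfit" model that memorizes the training set, and a "balanced" least-squares line.

```python
def mse(pairs, predict):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((y - predict(x)) ** 2 for x, y in pairs) / len(pairs)

# Made-up data: y is roughly 2x plus some noise.
train = [(1, 2.1), (2, 4.3), (3, 5.8), (4, 8.2), (5, 9.9)]
test  = [(1.5, 3.2), (2.5, 5.1), (3.5, 6.9), (4.5, 9.0)]

# Underfit: ignore x, always predict the training mean (high bias).
mean_y = sum(y for _, y in train) / len(train)
underfit = lambda x: mean_y

# Overfit: memorize the training set; fall back to the mean elsewhere (high variance).
table = dict(train)
overfit = lambda x: table.get(x, mean_y)

# Balanced: least-squares line, capturing the real trend.
mx = sum(x for x, _ in train) / len(train)
slope = (sum((x - mx) * (y - mean_y) for x, y in train)
         / sum((x - mx) ** 2 for x, _ in train))
balanced = lambda x: mean_y + slope * (x - mx)

for name, model in [("underfit", underfit), ("overfit", overfit), ("balanced", balanced)]:
    print(name, round(mse(train, model), 3), round(mse(test, model), 3))
```

The overfit model scores a perfect 0 on the training data yet does no better than the plain mean on unseen points, while the balanced line does well on both — exactly the generalization gap described above.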
## Confusion Matrix
A confusion matrix is a table used to evaluate the performance of a classification model by comparing predicted and actual labels. It provides insights into the model's accuracy, precision, recall, and F1 score.
```mermaid
quadrantChart
title Confusion Matrix
x-axis Predicted Positive --> Predicted Negative
y-axis Actual Negative --> Actual Positive
quadrant-1 False Negative
quadrant-2 True Positive
quadrant-3 False Positive
quadrant-4 True Negative
```
- Precision = True Positives / (True Positives + False Positives)
- Best when false positives are costly (e.g., spam detection).
- Recall = True Positives / (True Positives + False Negatives)
- Best when false negatives are costly (e.g., disease screening).
- F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
- Best when you need a balance between precision and recall (e.g., information retrieval).
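These formulas can be checked with a few lines of Python. The counts below are a hypothetical spam-filter example, not real measurements.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical spam filter: 90 true positives, 10 false positives,
# 30 false negatives, 870 true negatives.
p, r, f1 = classification_metrics(tp=90, fp=10, fn=30, tn=870)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.9 0.75 0.82
```

Note how the filter's precision (0.9) is higher than its recall (0.75): it rarely flags legitimate mail, but it misses a quarter of the spam — the F1 score balances the two.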
## AUC and ROC Curve
AUC (Area Under the Curve) measures the area under the ROC (Receiver Operating Characteristic) curve.
- value between 0 and 1, where 0.5 corresponds to random guessing and 1.0 to a perfect classifier
- the ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) across classification thresholds
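AUC also has a handy probabilistic interpretation: it equals the probability that a randomly chosen positive example is ranked above a randomly chosen negative one. A brute-force sketch of that interpretation, with made-up labels and scores:

```python
def auc(labels, scores):
    """AUC as P(random positive outranks random negative); ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores: higher score = more confident "positive".
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
print(auc(labels, scores))  # 8 of 9 positive/negative pairs are ranked correctly
```

This pairwise comparison is O(n²), so real libraries compute AUC from the sorted ROC curve instead, but both give the same number.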
## Regression metrics
Regression metrics are used to evaluate the performance of regression models, which predict continuous values. Common regression metrics include:
- Mean Absolute Error (MAE): the average absolute difference between predicted and actual values.
- Mean Absolute Percentage Error (MAPE): the average absolute percentage difference between predicted and actual values, useful for understanding errors relative to the scale of the data.
- Mean Squared Error (MSE): the average of the squared differences between predicted and actual values, penalizing larger errors more than MAE.
- Root Mean Squared Error (RMSE): the square root of MSE, providing an error metric in the same units as the target variable.
- R-squared (Coefficient of Determination): a statistical measure that represents the proportion of variance for a dependent variable that's explained by an independent variable or variables in a regression model.
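A quick sketch computing all five metrics on made-up house-price data (values are illustrative only):

```python
import math

def regression_metrics(actual, predicted):
    """MAE, MSE, RMSE, MAPE, and R-squared from paired values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)                       # same units as the target
    mape = sum(abs(e / a) for a, e in zip(actual, errors)) / n * 100
    mean_a = sum(actual) / n
    ss_res = sum(e ** 2 for e in errors)        # unexplained variance
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot                    # fraction of variance explained
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}

# Hypothetical house prices (in $1000s): actual vs. model predictions.
actual = [200, 250, 300, 350]
predicted = [210, 240, 310, 340]
print(regression_metrics(actual, predicted))
```

Here every prediction is off by $10k, so MAE and RMSE are both 10; on less uniform errors, RMSE would exceed MAE because squaring penalizes large errors more.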
## More terms
- Feature: an individual measurable property or characteristic of the data.
- Feature Engineering: the process of selecting, modifying, or creating features to improve model performance.
- Hyperparameter: a parameter that is set before training a model and controls the learning process (e.g., learning rate, regularization strength).
- Hyperparameter Tuning: the process of searching for the best hyperparameters to optimize model performance.
- Important hyperparameters include:
- Learning rate: controls how much to change the model in response to the estimated error each time the model weights are updated.
- Regularization strength: controls the amount of regularization applied to prevent overfitting.
- Number of layers and nodes in a neural network: determines the complexity of the model.
- Batch size: the number of training examples used in one iteration of model training.
- Training: the process of teaching a model to learn patterns from data.
- Validation: the process of evaluating a model's performance on a separate dataset during training to tune hyperparameters.
- Testing: the process of evaluating a model's performance on a separate dataset after training to assess its generalization ability.
- Cross-validation: a technique for assessing how the results of a statistical analysis will generalize to an independent dataset, often used to prevent overfitting.
- ROC Curve: a graphical representation of a model's performance across different thresholds, plotting true positive rate against false positive rate.
- AUC (Area Under the Curve): a single scalar value that summarizes the performance of a classification model across all thresholds, with higher values indicating better performance.
- Bias-Variance Tradeoff: the balance between error from overly simple assumptions (bias) and error from sensitivity to noise in the training data (variance).
- Ensemble Learning: a technique that combines multiple models to improve overall performance, often used to reduce overfitting and increase robustness.
- Transfer Learning: a technique where a pre-trained model is fine-tuned on a new task, leveraging knowledge from previous tasks to improve performance on the new task.
- Explainable AI (XAI): techniques and methods that make AI models more interpretable and understandable to humans, helping to build trust and transparency in AI systems.
- Model Deployment: the process of making a trained model available for use in production environments, often involving integration with applications or services.
- GPT (Generative Pre-trained Transformer): a type of deep learning model designed for natural language processing tasks, capable of generating human-like text based on input prompts.
- LLM (Large Language Model): a type of AI model trained on vast amounts of text data to understand and generate human language, often used in applications like chatbots, translation, and content generation.
- RAG (Retrieval-Augmented Generation): a technique that combines retrieval of relevant information from a knowledge base with generative capabilities to produce more accurate and contextually relevant responses in AI applications.
- LLMOps: a set of practices and tools for managing the lifecycle of large language models, including training, deployment, monitoring, and maintenance.
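As a final illustration, the cross-validation idea mentioned above can be sketched in a few lines: the data is split into k folds, and each fold serves once as the validation set while the rest is used for training. The round-robin split and the fold count here are arbitrary choices for the example.

```python
def k_fold_splits(data, k=3):
    """Yield (train, validation) splits; each fold is held out exactly once."""
    folds = [data[i::k] for i in range(k)]  # simple round-robin split
    for i in range(k):
        validation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation

data = list(range(9))
for train, val in k_fold_splits(data, k=3):
    print(sorted(val), "held out;", len(train), "items used for training")
```

Averaging a model's score over all k validation folds gives a more reliable estimate of generalization than a single train/test split, which is why cross-validation is a standard guard against overfitting during hyperparameter tuning.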