After completing the Principal Component Analysis section of module 1, I asked Gemini to suggest anything else I should know about PCA, along with related topics. It also split the module into parts. Part A is titled “Linear Algebra in Action – The Geometric Interpreter” and consists of the PCA work I have already completed, with the addition of a scree plot.
Part B is titled “Advanced Dimensionality Reduction” and has three sections. Section 1 consists of calculating how many principal components are needed to capture a specific cumulative variance threshold. Section 2 covers Kernel PCA, an approach to performing PCA on data with non-linear relationships. Section 3 entails using different techniques, t-SNE and UMAP, to visualize the data.
Part C is titled “Calculus and Statistical Analysis” and has two sections. Section 1 covers implementing a gradient descent algorithm from scratch. Section 2 entails telling a data story that summarizes all findings from PCA, Kernel PCA, t-SNE, and UMAP.
I have reproduced the updated module 1 outline below.
Module 1: The Mathematical Bedrock (Revised)
Gemini 2.5 Pro
This module is designed to solidify the mathematical concepts that form the language of machine learning by building tools from scratch and exploring the nuances of dimensionality reduction.
Part A: Linear Algebra in Action – The Geometric Interpreter
Project: Implement Principal Component Analysis (PCA) from scratch to transform the wine dataset into a 2D space and visualize it.
Tasks:
1. Standardize the data.
2. Calculate the covariance matrix from scratch.
3. Implement the Power Iteration method to find all 11 eigenvectors and eigenvalues of the covariance matrix (see the sketch below).
4. Create a Scree Plot by plotting the eigenvalues in descending order to visualize the “elbow” and the importance of each component.
5. Project the data onto the first two principal components and create the 2D scatter plot, colored by wine quality.
Goal: To understand how to manually dissect and visualize high-dimensional data using PCA and to interpret the meaning of the principal components.
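To make Part A concrete, here is a minimal sketch of how the from-scratch steps might fit together, assuming the 11 wine features are already loaded into a NumPy array X. The function names, the deflation approach, and the iteration limits are my own choices, not part of the outline:

```python
import numpy as np

def standardize(X):
    """Center each feature to zero mean and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def covariance_matrix(Xs):
    """Sample covariance of standardized data: (Xs^T Xs) / (n - 1)."""
    return Xs.T @ Xs / (Xs.shape[0] - 1)

def power_iteration(A, num_iters=1000, tol=1e-10):
    """Dominant eigenvalue/eigenvector of a symmetric matrix A."""
    v = np.random.default_rng(0).standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        w = A @ v
        w /= np.linalg.norm(w)
        if np.linalg.norm(w - v) < tol:
            v = w
            break
        v = w
    return v @ A @ v, v  # Rayleigh quotient gives the eigenvalue

def all_eigenpairs(A):
    """Recover every eigenpair by repeatedly deflating the matrix."""
    values, vectors = [], []
    B = A.copy()
    for _ in range(A.shape[0]):
        lam, v = power_iteration(B)
        values.append(lam)
        vectors.append(v)
        B -= lam * np.outer(v, v)  # subtract the component just found
    return np.array(values), np.array(vectors)

# The scree plot comes from `vals`; the 2D scatter plot from `scores`.
Xs = standardize(X)                  # X: (n_samples, 11) wine features
vals, vecs = all_eigenpairs(covariance_matrix(Xs))
scores = Xs @ vecs[:2].T             # projection onto the first two PCs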
Part B: Advanced Dimensionality Reduction
Project: Apply advanced and non-linear techniques to the same dataset to compare their results and understand their distinct advantages.
Choosing the Right Number of Dimensions:
Task: Write a function that calculates how many principal components are needed to capture a specific cumulative variance threshold (e.g., 90%); see the sketch below.
Goal: To learn the standard method for selecting the number of components for data preprocessing in a machine learning pipeline, beyond simple visualization.
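Here is one way such a function might look, operating on the eigenvalues from Part A (the name and default threshold are placeholders):

```python
import numpy as np

def components_for_variance(eigenvalues, threshold=0.90):
    """Smallest k whose top-k eigenvalues explain >= threshold of the variance."""
    vals = np.sort(eigenvalues)[::-1]          # largest first
    cumulative = np.cumsum(vals) / vals.sum()  # cumulative explained-variance ratio
    return int(np.searchsorted(cumulative, threshold) + 1)

k = components_for_variance(vals, 0.90)  # vals: eigenvalues from Part A
```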
Non-Linearity with Kernel PCA:
Task: Using a library like Scikit-learn, implement Kernel PCA (with an RBF kernel) on the wine data and generate a new 2D visualization (sketched below).
Goal: To see how a non-linear method might reveal different structures compared to standard PCA and to understand when it might be preferred.
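A sketch of what this might look like, where Xs is the standardized feature matrix and quality is the label column from Part A; the gamma value is an untuned guess that needs experimentation:

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import KernelPCA

# RBF-kernel PCA; gamma sets the kernel width and strongly shapes the result
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.1)
X_kpca = kpca.fit_transform(Xs)

plt.scatter(X_kpca[:, 0], X_kpca[:, 1], c=quality, cmap="viridis", s=10)
plt.xlabel("Kernel PC 1"); plt.ylabel("Kernel PC 2")
plt.colorbar(label="wine quality")
plt.show()
```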
Visualizing Local Structure with t-SNE and UMAP:
Task: Create two more visualizations of the wine data: one using t-SNE and one using UMAP. Experiment with their key parameters (e.g., perplexity for t-SNE, n_neighbors for UMAP); a sketch follows below.
Goal: To compare these modern visualization techniques against PCA. Observe how they prioritize preserving local neighborhood structures, often revealing clearer clusters than PCA is capable of showing.
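Both fit in a few lines. This sketch assumes the umap-learn package is installed alongside Scikit-learn, and the parameter values are just common starting points to experiment from:

```python
from sklearn.manifold import TSNE
import umap  # provided by the umap-learn package

# perplexity trades off local vs. global structure; 5-50 is the usual range
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(Xs)

# n_neighbors plays a similar role for UMAP; smaller values favor local detail
X_umap = umap.UMAP(n_components=2, n_neighbors=15, random_state=0).fit_transform(Xs)
```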
Part C: Calculus and Statistical Analysis
Project: Implement a simple optimization algorithm and formalize the data’s story.
Calculus in a Computational Context:
Task: Implement a simple gradient descent algorithm from the ground up to find the minimum of a function (sketched below).
Goal: To understand the fundamental iterative process that powers the training of most machine learning models.
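A bare-bones version might look like the following, demonstrated on a one-variable function whose minimum is known; the step size and iteration count are arbitrary choices:

```python
def gradient_descent(grad, x0, learning_rate=0.1, num_steps=100):
    """Repeatedly step against the gradient to walk downhill toward a minimum."""
    x = x0
    for _ in range(num_steps):
        x -= learning_rate * grad(x)
    return x

# Example: f(x) = (x - 3)^2 has gradient 2(x - 3) and its minimum at x = 3
print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))  # converges toward 3.0
```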
Probability and Statistics for Data Storytelling:
Task: Formally write a “Data Story” that summarizes your findings from all the visualizations (PCA, Kernel PCA, t-SNE, UMAP). Analyze feature distributions and correlations to support your narrative (sketched below).
Goal: To practice communicating data-driven insights, proposing hypotheses based on visual and statistical evidence.
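For the distribution and correlation evidence, a few pandas calls go a long way. This assumes the wine data lives in a semicolon-separated CSV like the UCI wine-quality files; the path is a placeholder:

```python
import pandas as pd

df = pd.read_csv("winequality.csv", sep=";")  # placeholder path
print(df.describe())                          # per-feature distribution summary
print(df.corr()["quality"].sort_values())     # which features track quality
```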