AI Mastery Path
Mathematics for AI
The exact mathematics you need to master AI — machine learning, deep learning, and large language models — and the computer science under them. It starts at counting and assumes nothing. Every lesson is here because something in AI demands it: nothing more, nothing less, each one resting on the last.
Curriculum · 13 areas, 100 lessons
Part I · Arithmetic & number sense
Start from zero — what numbers are and how they combine. The ground everything else stands on.
Part II · Algebra — the language of patterns
Letters for numbers: the notation every later idea is written in, and the function — the object ML is built from.
- 8. Variables and expressions (soon)
- 9. Solving linear equations (soon)
- 10. Inequalities (soon)
- 11. The laws of exponents (soon)
- 12. Polynomials and factoring (soon)
- 13. The coordinate plane and graphs (soon)
- 14. What a function is — the input–output machine (soon)
Part III · Logic, sets & counting
The discrete foundation under both probability and computer science — how to reason, and how to count.
- 15. Statements and logic — and, or, not, if–then (soon)
- 16. Sets and their operations (soon)
- 17. Relations and functions, formally (soon)
- 18. Counting — permutations and combinations (soon)
- 19. Proof and mathematical induction (soon)
- 20. Sequences, recursion, and recurrences (soon)
Part IV · Functions, exponentials & logarithms
The specific functions ML runs on — exponentials, logs, sigmoids — and the rate-of-change idea that opens calculus.
- 21. Linear functions and slope — the seed of "rate of change" (soon)
- 22. Quadratic and polynomial functions (soon)
- 23. Exponential functions — growth and decay (soon)
- 24. Logarithms →
- 25. The number e and the natural logarithm (soon)
- 26. The logistic (sigmoid) function — your first activation (soon)
- 27. Sinusoids and periodicity — for positional encodings (soon)
Part V · Single-variable calculus
The mathematics of change. The derivative and the chain rule are the literal machinery of how networks learn.
- 28. Limits, intuitively (soon)
- 29. Continuity (soon)
- 30. The derivative as a rate of change →
- 31. Rules of differentiation (soon)
- 32. The chain rule — the engine of backpropagation (soon)
- 33. Maxima, minima, and optimization (soon)
- 34. The integral as accumulation (soon)
- 35. The fundamental theorem of calculus (soon)
Part VI · Linear algebra — the engine of ML
Vectors, matrices, and the operations on them. This is the single most-used branch of math in all of AI.
- 36. Vectors — points, arrows, and lists of numbers (soon)
- 37. Vector addition, scaling, and linear combinations (soon)
- 38. The dot product, norms, and distance — similarity (soon)
- 39. Matrices and matrix multiplication (soon)
- 40. Matrices as linear transformations (soon)
- 41. Systems of linear equations and Gaussian elimination (soon)
- 42. Linear independence, span, basis, and rank (soon)
- 43. The four fundamental subspaces (soon)
- 44. Determinants and invertibility (soon)
- 45. Eigenvalues and eigenvectors (soon)
- 46. Diagonalization and the spectral theorem (soon)
- 47. The singular value decomposition (SVD) (soon)
- 48. Orthogonality, projections, and least squares (soon)
- 49. Tensors and broadcasting — the data of deep learning (soon)
Part VII · Multivariable & matrix calculus
Calculus and linear algebra meet. Gradients, Jacobians, and the chain rule on a computation graph: this is backprop.
- 50. Functions of several variables (soon)
- 51. Partial derivatives (soon)
- 52. The gradient and directional derivatives — steepest ascent (soon)
- 53. The multivariable chain rule (soon)
- 54. The Jacobian and the Hessian (soon)
- 55. Matrix calculus — gradients of vector and matrix expressions (soon)
- 56. Backpropagation as the chain rule on a computation graph (soon)
Part VIII · Optimization — how models learn
Turning learning into minimizing a loss, and the algorithms — gradient descent and its kin — that actually do it.
- 57. The optimization problem — minimizing a loss (soon)
- 58. Convexity and why it matters (soon)
- 59. Gradient descent (soon)
- 60. Stochastic and mini-batch gradient descent (soon)
- 61. Momentum and adaptive methods (Adam) (soon)
- 62. Constrained optimization and Lagrange multipliers (soon)
- 63. Regularization — L1, L2, and the bias toward simplicity (soon)
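As a small preview of what this part builds toward, here is a minimal sketch of plain gradient descent on a one-variable loss. The loss L(w) = (w - 3)^2, the learning rate, the step count, and the function names are all assumptions chosen for illustration; they are not taken from the lessons themselves.

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)  # move downhill by lr times the slope
    return w


# Hypothetical loss: L(w) = (w - 3)^2, whose derivative is 2 * (w - 3).
def loss_grad(w):
    return 2.0 * (w - 3.0)


w_min = gradient_descent(loss_grad, w0=0.0)
print(w_min)  # very close to 3, the minimizer of this loss
```

With this loss and learning rate, each step shrinks the distance to the minimizer by a constant factor, which is why a fixed number of steps is enough here. Real losses are rarely this well behaved.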
Part IX · Probability — reasoning under uncertainty
AI is probabilistic to the core. Distributions, expectation, and Bayes' theorem are the language of every model's predictions.
- 64. Sample spaces, events, and probability (soon)
- 65. Conditional probability and independence (soon)
- 66. Bayes' theorem (soon)
- 67. Random variables — discrete and continuous (soon)
- 68. Distributions you'll meet — Bernoulli, binomial, Gaussian (soon)
- 69. Expectation, variance, and covariance (soon)
- 70. Joint, marginal, and conditional distributions (soon)
- 71. The law of large numbers and the central limit theorem (soon)
- 72. Sampling and Monte Carlo methods (soon)
Part X · Statistics & data analysis
Learning from data: estimation, maximum likelihood, the bias–variance tradeoff, and the first models that come straight out of statistics.
- 73. Descriptive statistics and data summaries (soon)
- 74. Populations, samples, and sampling distributions (soon)
- 75. Estimation and confidence intervals (soon)
- 76. Maximum likelihood estimation — the objective behind most ML (soon)
- 77. Hypothesis testing, p-values, and A/B testing (soon)
- 78. The bias–variance tradeoff (soon)
- 79. Correlation, causation, and confounders (soon)
- 80. Linear regression — your first model (soon)
- 81. Logistic regression and classification (soon)
Part XI · Information theory — the math of LLMs
Entropy, cross-entropy, and KL divergence: the quantities that define how language models are trained and scored.
- 82. Information and surprise (soon)
- 83. Entropy — measuring uncertainty (soon)
- 84. Cross-entropy — the loss function of language models (soon)
- 85. KL divergence — the distance between distributions (soon)
- 86. Mutual information (soon)
- 87. Perplexity — how language models are scored (soon)
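To make the link between cross-entropy and perplexity concrete, the sketch below computes both from the probabilities a model assigned to the tokens that actually occurred. It assumes the common convention that perplexity is the exponential of the average negative log-probability (natural log); the function names and the sample probabilities are hypothetical.

```python
import math


def cross_entropy(true_token_probs):
    """Average negative log-probability (in nats) of the observed tokens."""
    return -sum(math.log(p) for p in true_token_probs) / len(true_token_probs)


def perplexity(true_token_probs):
    """Exponential of the cross-entropy: an effective number of choices."""
    return math.exp(cross_entropy(true_token_probs))


# Hypothetical model that gives each observed token probability 1/4.
probs = [0.25, 0.25, 0.25, 0.25]
print(cross_entropy(probs))  # equals log(4)
print(perplexity(probs))     # about 4: as uncertain as a fair 4-way guess
```

A lower cross-entropy means the model put more probability on what actually happened, and perplexity restates the same quantity on a scale that reads as a number of equally likely options.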
Part XII · Algorithms & computation
The math under computer science and data structures — how to measure cost and reason about programs. Builds on Part III; take it any time after.
- 88. Asymptotics and Big-O notation (soon)
- 89. Analyzing recursion and recurrence relations (soon)
- 90. Combinatorics for algorithms (soon)
- 91. Graphs and trees (soon)
- 92. Graph algorithms and traversal (soon)
- 93. Computational complexity — P, NP, and intractability (soon)
- 94. Modular arithmetic and hashing (soon)
Part XIII · Numerical & computational methods
Where the math meets the machine — floating point, numerical stability, and how autodiff actually computes every gradient.
- 95. Floating-point numbers and precision (soon)
- 96. Numerical stability — log-sum-exp and the softmax trick (soon)
- 97. The cost of linear algebra — vectorization and complexity (soon)
- 98. Iterative and approximate methods (soon)
- 99. Automatic differentiation — how backprop is really computed (soon)
- 100. Putting it together — the math inside a training loop (soon)
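The log-sum-exp trick named in this part can be sketched in a few lines. This is a minimal illustration using only the standard library, not the lesson's own code; the function names and the sample scores are assumptions. The idea is to subtract the largest score before exponentiating, so no call to `exp` receives a large positive argument.

```python
import math


def log_sum_exp(xs):
    """Compute log(sum(exp(x))) without overflowing on large inputs."""
    m = max(xs)  # shift so the largest exponent is exp(0) = 1
    return m + math.log(sum(math.exp(x - m) for x in xs))


def softmax(xs):
    """Softmax via log-sum-exp: exp(x - logsumexp(xs)) for each x."""
    lse = log_sum_exp(xs)
    return [math.exp(x - lse) for x in xs]


# Hypothetical scores large enough that math.exp(1000.0) alone would
# raise OverflowError in double precision.
scores = [1000.0, 1001.0, 1002.0]
probs = softmax(scores)
print(probs)       # three probabilities, increasing with the score
print(sum(probs))  # approximately 1
```

Shifting by the maximum does not change the result mathematically, because the shift cancels between the numerator and the denominator of the softmax.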