AI Mastery Path

Mathematics for AI

The exact mathematics you need to master AI — machine learning, deep learning, and large language models — and the computer science under them. It starts at counting and assumes nothing. Every lesson is here because something in AI demands it: nothing more, nothing less, each one resting on the last.

Curriculum · 13 areas, 100 lessons

Part I · Arithmetic & number sense

Start from zero — what numbers are and how they combine. The ground everything else stands on.

Part II · Algebra — the language of patterns

Letters for numbers: the notation every later idea is written in, and the function — the object ML is built from.

8. Variables and expressions (soon)
9. Solving linear equations (soon)
10. Inequalities (soon)
11. The laws of exponents (soon)
12. Polynomials and factoring (soon)
13. The coordinate plane and graphs (soon)
14. What a function is: the input–output machine (soon)

Part III · Logic, sets & counting

The discrete foundation under both probability and computer science — how to reason, and how to count.

15. Statements and logic: and, or, not, if–then (soon)
16. Sets and their operations (soon)
17. Relations and functions, formally (soon)
18. Counting: permutations and combinations (soon)
19. Proof and mathematical induction (soon)
20. Sequences, recursion, and recurrences (soon)

Part IV · Functions, exponentials & logarithms

The specific functions ML runs on — exponentials, logs, sigmoids — and the rate-of-change idea that opens calculus.

21. Linear functions and slope: the seed of "rate of change" (soon)
22. Quadratic and polynomial functions (soon)
23. Exponential functions: growth and decay (soon)
24. Logarithms →
25. The number e and the natural logarithm (soon)
26. The logistic (sigmoid) function: your first activation (soon)
27. Sinusoids and periodicity, for positional encodings (soon)

Part V · Single-variable calculus

The mathematics of change. The derivative and the chain rule are the literal machinery of how networks learn.

28. Limits, intuitively (soon)
29. Continuity (soon)
30. The derivative as a rate of change →
31. Rules of differentiation (soon)
32. The chain rule: the engine of backpropagation (soon)
33. Maxima, minima, and optimization (soon)
34. The integral as accumulation (soon)
35. The fundamental theorem of calculus (soon)

Part VI · Linear algebra — the engine of ML

Vectors, matrices, and the operations on them. This is the single most-used branch of math in all of AI.

36. Vectors: points, arrows, and lists of numbers (soon)
37. Vector addition, scaling, and linear combinations (soon)
38. The dot product, norms, and distance: similarity (soon)
39. Matrices and matrix multiplication (soon)
40. Matrices as linear transformations (soon)
41. Systems of linear equations and Gaussian elimination (soon)
42. Linear independence, span, basis, and rank (soon)
43. The four fundamental subspaces (soon)
44. Determinants and invertibility (soon)
45. Eigenvalues and eigenvectors (soon)
46. Diagonalization and the spectral theorem (soon)
47. The singular value decomposition (SVD) (soon)
48. Orthogonality, projections, and least squares (soon)
49. Tensors and broadcasting: the data of deep learning (soon)

Part VII · Multivariable & matrix calculus

Calculus and linear algebra meet. Gradients, Jacobians, and the chain rule on a computation graph: this is backprop.

50. Functions of several variables (soon)
51. Partial derivatives (soon)
52. The gradient and directional derivatives: steepest ascent (soon)
53. The multivariable chain rule (soon)
54. The Jacobian and the Hessian (soon)
55. Matrix calculus: gradients of vector and matrix expressions (soon)
56. Backpropagation as the chain rule on a computation graph (soon)

Part VIII · Optimization — how models learn

Turning learning into minimizing a loss, and the algorithms — gradient descent and its kin — that actually do it.

57. The optimization problem: minimizing a loss (soon)
58. Convexity and why it matters (soon)
59. Gradient descent (soon)
60. Stochastic and mini-batch gradient descent (soon)
61. Momentum and adaptive methods (Adam) (soon)
62. Constrained optimization and Lagrange multipliers (soon)
63. Regularization: L1, L2, and the bias toward simplicity (soon)

Part IX · Probability — reasoning under uncertainty

AI is probabilistic to the core. Distributions, expectation, and Bayes' theorem are the language of every model's predictions.

64. Sample spaces, events, and probability (soon)
65. Conditional probability and independence (soon)
66. Bayes' theorem (soon)
67. Random variables: discrete and continuous (soon)
68. Distributions you'll meet: Bernoulli, binomial, Gaussian (soon)
69. Expectation, variance, and covariance (soon)
70. Joint, marginal, and conditional distributions (soon)
71. The law of large numbers and the central limit theorem (soon)
72. Sampling and Monte Carlo methods (soon)

Part X · Statistics & data analysis

Learning from data: estimation, maximum likelihood, the bias–variance tradeoff, and the first models that come straight out of statistics.

73. Descriptive statistics and data summaries (soon)
74. Populations, samples, and sampling distributions (soon)
75. Estimation and confidence intervals (soon)
76. Maximum likelihood estimation: the objective behind most ML (soon)
77. Hypothesis testing, p-values, and A/B testing (soon)
78. The bias–variance tradeoff (soon)
79. Correlation, causation, and confounders (soon)
80. Linear regression: your first model (soon)
81. Logistic regression and classification (soon)

Part XI · Information theory — the math of LLMs

Entropy, cross-entropy, and KL divergence: the quantities that define how language models are trained and scored.

82. Information and surprise (soon)
83. Entropy: measuring uncertainty (soon)
84. Cross-entropy: the loss function of language models (soon)
85. KL divergence: the distance between distributions (soon)
86. Mutual information (soon)
87. Perplexity: how language models are scored (soon)

Part XII · Algorithms & computation

The math under computer science and data structures — how to measure cost and reason about programs. Builds on Part III; take it any time after.

88. Asymptotics and Big-O notation (soon)
89. Analyzing recursion and recurrence relations (soon)
90. Combinatorics for algorithms (soon)
91. Graphs and trees (soon)
92. Graph algorithms and traversal (soon)
93. Computational complexity: P, NP, and intractability (soon)
94. Modular arithmetic and hashing (soon)

Part XIII · Numerical & computational methods

Where the math meets the machine — floating point, numerical stability, and how autodiff actually computes every gradient.

95. Floating-point numbers and precision (soon)
96. Numerical stability: log-sum-exp and the softmax trick (soon)
97. The cost of linear algebra: vectorization and complexity (soon)
98. Iterative and approximate methods (soon)
99. Automatic differentiation: how backprop is really computed (soon)
100. Putting it together: the math inside a training loop (soon)

Learn · Shawon Chowdhury