## Mathematical Methods in Data Science (with Python)

### Description

This textbook on the mathematics of data has two intended audiences:

• For students majoring in math (or other quantitative fields such as physics, economics, or engineering): it is meant as an invitation to data science and AI from a rigorous mathematical perspective.
• For mathematically inclined students in data science-related fields (at the undergraduate or graduate level): it can serve as a mathematical companion to machine learning, AI, and statistics courses.

Content-wise, it is a second course in linear algebra, multivariable calculus, and probability theory, motivated by and illustrated on data science applications. As such, the reader is expected to be familiar with the basics of those areas and to have been exposed to proofs, but no knowledge of data science is assumed. Moreover, while the emphasis is on mathematical concepts, programming is used throughout. Basic familiarity with Python will suffice. The book provides an introduction to some specialized packages, especially NumPy, NetworkX, and PyTorch.

### Archive

This book is based on Jupyter notebooks that were developed for MATH 535: Mathematical Methods in Data Science, a one-semester advanced undergraduate and Master's level course offered at UW-Madison. Websites from previous semesters are below. Warning: They are no longer maintained and may have broken links.

### Online book and Jupyter notebooks

Links to specific chapters are below, together with some additional materials (assignments, Jupyter notebooks, datasets, auto-quizzes, etc.). Most of these resources are also available on the GitHub page of the book.

Exercises: Assignments and practice exams for Spring 2024 follow.

Python package: To run some of the code below, you will need mmids.py.
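Before opening a notebook, it can help to confirm that the required modules are visible to your Python environment. The following is a minimal sketch, not part of the book's code: it assumes that `mmids.py` has been saved under that name in the working directory (or elsewhere on the import path), and the helper name `missing_modules` is hypothetical. The import names `numpy`, `networkx`, and `torch` correspond to the NumPy, NetworkX, and PyTorch packages mentioned in the description.

```python
import importlib.util

# Import names for the packages mentioned in the description, plus the
# book's helper module. "mmids" assumes mmids.py is saved under that name.
REQUIRED_MODULES = ["numpy", "networkx", "torch", "mmids"]


def missing_modules(names):
    """Return the subset of top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only looks the modules up without importing them, so it runs quickly even when a large package such as PyTorch is installed.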

Chap 1: Introduction

Chap 2: Least squares: geometric, algebraic, and numerical aspects

Chap 3: Singular value decomposition

Chap 4: Spectral graph theory

Chap 5: Random walks on graphs and Markov chains

Chap 6: Optimization theory and algorithms

Chap 7: Probabilistic models: from simple to complex

### Programming languages

• Python: I recommend using Google Colaboratory to run the notebooks. Some resources for learning Python:
• Julia, R, etc.: If you would like to use a different programming language, try converting the code in the notebooks with your favorite AI chatbot.

The material on this website was partly influenced by the following excellent textbooks.

• [Ara] C. Arangala, Linear Algebra With Machine Learning and Data, Chapman & Hall, 2023
• [Axl] S. Axler, Linear Algebra Done Right, Springer, 2015
• [BHK] A. Blum, J. Hopcroft, R. Kannan, Foundations of Data Science, Cambridge University Press, 2020
• [Bis] C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006 (Chaps 2, 8, 9, 13)
• [Data8] A. Adhikari, J. DeNero, D. Wagner, Computational and Inferential Thinking: The Foundations of Data Science
• [DS100] S. Lau, J. Gonzalez, D. Nolan, Learning Data Science, O'Reilly, 2023
• [ISLP] G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning: with Applications in Python, Springer, 2023
• [MSMB] S. Holmes, W. Huber, Modern Statistics for Modern Biology, Cambridge University Press, 2019
• [Nic] B. Nica, A Brief Introduction to Spectral Graph Theory, EMS Textbooks in Mathematics, 2018
• [Sol] J. Solomon, Numerical Algorithms, CRC Press, 2015 (Chaps 4-7)
• [Str] G. Strang, Linear Algebra and Learning from Data, Wellesley-Cambridge Press, 2019
• [TB] L. N. Trefethen, D. Bau, III, Numerical Linear Algebra, SIAM, 1997
• [VMLS] S. Boyd, L. Vandenberghe, Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares, Cambridge University Press, 2018
• [Wri] S. Wright, Optimization Algorithms for Data Analysis, in: The Mathematics of Data, AMS, 2018 (Sections 2-4)
• [WR] S. Wright, B. Recht, Optimization for Data Analysis, Cambridge University Press, 2022

Last updated: Apr 22, 2024