Aneesh Pappu

AI Research & Policy

Advice

After advising several undergraduate and graduate students on pursuing work in machine learning, I thought it would be useful to share my thoughts publicly. Of course, your mileage may vary, and I am not an expert by any means, so take these notes with a grain of salt :)

Academically, I have found the most important foundations to be linear algebra, probability, and optimization.

Linear algebra is not only the primary language of data analysis and neural networks, but also builds intuition for how objects can be decomposed into combinations of simpler parts (e.g. basis decompositions, eigenvalue decompositions, and the SVD). In AI research, it is often useful to decompose a phenomenon into a combination of simpler, atomic parts, and linear algebra is usually the language for doing so.
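As a small illustration of "decomposing into simpler parts", here is a sketch that uses the SVD to write a matrix as a sum of rank-one pieces and checks how the approximation error shrinks as more pieces are added. The matrix is random and the helper name `rank_k_approx` is made up for this example; it assumes NumPy is installed.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))  # arbitrary toy matrix, not real data

# Thin SVD: A = U @ diag(s) @ Vt, with singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)


def rank_k_approx(U, s, Vt, k):
    """Sum of the first k rank-one terms s[i] * outer(U[:, i], Vt[i])."""
    return sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(k))


# Using every term recovers A (up to floating-point error).
full = rank_k_approx(U, s, Vt, len(s))

# Frobenius error of the best rank-k approximation for k = 1..4.
errors = [
    np.linalg.norm(A - rank_k_approx(U, s, Vt, k))
    for k in range(1, len(s) + 1)
]
print(errors)
```

Each term is about as simple as a matrix can be (one direction in, one direction out, one scale), and the error after keeping k terms is determined entirely by the singular values that were dropped.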

Probability is foundational because modern machine learning relies on the idea of using models to approximate probability distributions over data of interest (e.g. internet text and human preferences via a language model, or images via a diffusion model).
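To make "approximating a distribution over data" concrete, here is a deliberately tiny sketch: a character-level unigram model fit by maximum likelihood, compared against a uniform model using average negative log-likelihood. The corpus string and function names are invented for illustration; a real language model is vastly richer, but the objective has the same shape.

```python
import math
from collections import Counter

text = "the cat sat on the mat"  # toy corpus, purely illustrative


def fit_unigram(tokens):
    """Maximum-likelihood categorical model: relative frequencies."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def avg_nll(model, tokens):
    """Average negative log-likelihood of the tokens under the model."""
    return -sum(math.log(model[tok]) for tok in tokens) / len(tokens)


tokens = list(text)
vocab = sorted(set(tokens))

mle_model = fit_unigram(tokens)
uniform_model = {tok: 1 / len(vocab) for tok in vocab}

nll_mle = avg_nll(mle_model, tokens)
nll_uniform = avg_nll(uniform_model, tokens)
print(nll_mle, nll_uniform)
```

The fitted model assigns a lower average negative log-likelihood to the data than the uniform one, which is the sense in which it is a better approximation of the data distribution.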

Optimization is not only the methodology for training all modern ML systems, but more broadly is the language for modeling and solving many problems across the sciences and engineering, independent of machine learning. For machine learning research, all modern neural nets induce non-convex loss landscapes (at least in their original formulation; Mert Pilanci's work on convex reformulations of neural networks is a notable caveat). However, many present-day optimizers can be interpreted as approximating convex optimization methods for which we have well-established theory (e.g., the per-coordinate step scaling in Adagrad and Adam can be interpreted as a cheap, diagonal approximation to the curvature information that second-order methods like Newton's method use to speed up convergence).
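The curvature intuition can be seen on a toy problem. The sketch below compares plain gradient descent with a textbook Adagrad update on an ill-conditioned quadratic whose two coordinates have curvatures 100 and 1. The problem, step sizes, and step count are assumptions chosen for illustration, and Adagrad here is only a diagonal rescaling, not Newton's method itself.

```python
import math

CURVATURE = (100.0, 1.0)  # diagonal Hessian of f(x) = 0.5 * sum(a_i * x_i**2)


def loss(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(CURVATURE, x))


def grad(x):
    return [a * xi for a, xi in zip(CURVATURE, x)]


def gradient_descent(x0, lr, steps):
    """Same step size for every coordinate, limited by the steepest one."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def adagrad(x0, lr, steps, eps=1e-8):
    """Per-coordinate steps scaled by the root of accumulated squared gradients."""
    x = list(x0)
    sq_sum = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x)
        sq_sum = [s + gi * gi for s, gi in zip(sq_sum, g)]
        x = [
            xi - lr * gi / (math.sqrt(s) + eps)
            for xi, gi, s in zip(x, g, sq_sum)
        ]
    return x


x_gd = gradient_descent([1.0, 1.0], lr=0.01, steps=100)
x_ada = adagrad([1.0, 1.0], lr=0.5, steps=100)
print(loss(x_gd), loss(x_ada))
```

Gradient descent has to pick a step size small enough for the steep coordinate, so it crawls along the flat one. Adagrad's update divides each gradient by a running measure of its own scale, so on this diagonal problem the curvature cancels out and both coordinates shrink at nearly the same rate.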

Finally, as ML models grow larger, parallel computation is the necessary backbone for making training and inference feasible. Familiarity with the basics of GPUs, tensor sharding, multithreading, etc., can go a long way.
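The idea behind tensor sharding can be sketched without any GPU: split a weight matrix by columns, let each worker compute its slice of the output, and concatenate the slices. The snippet below does this with Python threads and plain lists; the function names are made up for this example, and because of Python's global interpreter lock it shows only the data layout, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def matvec(x, w_cols):
    """Compute x @ W for a weight matrix stored as a list of columns."""
    return [sum(xi * wi for xi, wi in zip(x, col)) for col in w_cols]


def shard_columns(w_cols, num_shards):
    """Split the columns into num_shards contiguous, near-equal chunks."""
    size, extra = divmod(len(w_cols), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        end = start + size + (1 if i < extra else 0)
        shards.append(w_cols[start:end])
        start = end
    return shards


def sharded_matvec(x, w_cols, num_shards):
    """Each worker multiplies x by its own column shard; outputs are concatenated."""
    shards = shard_columns(w_cols, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = list(pool.map(lambda shard: matvec(x, shard), shards))
    return [y for part in partials for y in part]


x = [1.0, 2.0, 3.0]
w_cols = [[float(i + j) for i in range(3)] for j in range(5)]  # 3x5 toy weights

print(sharded_matvec(x, w_cols, num_shards=2))
```

Every worker needs the full input `x` but only its own slice of the weights, which is why this layout helps when the weights, rather than the activations, are too large for a single device.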

To these ends, the most important classes I've taken at Stanford have been:

  • EE263
  • EE278
  • EE364A
  • CS149
  • EE276
  • CS229
  • CS238
  • CS228
  • CS224N
  • CS231N