Algebraic & Geometric Methods in Statistics

Outline and some illustrative examples in nonlinear statistics

Goals

After this course, you will be able to:

  • list topics in algebraic statistics
  • recognize problems in statistics that are answerable by algebraic methods
  • assess which algebraic methods are suitable for solving a problem
  • apply basic algebraic tools to solve a problem

Tentative course outline:

  1. What is algebraic statistics? An invitation / introduction / overview

  2. Exponential families
       2.1. Statistical foundations
       2.2. Underlying algebra

  3. Conditional independence and graphical models
       3.1. Statistical foundations
       3.2. Underlying algebra

  4. Goodness-of-fit testing of models for discrete data
       4.1. Overview
       4.2. Chromosome clusters in cancer cells
       4.3. Network data
       4.4. Challenges of large, sparse data sets

  5. Parameter identifiability
       5.1. Overview
       5.2. Graphical models
       5.3. Phylogenetics and evolutionary biology
       5.4. Model selection: learning a causal graph

  6. Maximum likelihood estimation
       6.1. Introduction
       6.2. Deciding existence of ML estimators
       6.3. Algorithms for MLE: convex and non-convex optimization

Materials

Books and resources

Main textbook: Seth Sullivant, “Algebraic Statistics”. It is available in the bookstore. (I will check with the library for an e-copy.)

General course syllabus is here

Homework and grade

Approximately 6-7 assignments; expect a typical weekly workload.

Project

Options include reading a paper, working on a small research project, or applying algebraic methods to a data set, and in each case writing a report on it. The timeline will be determined soon; the project will take place during the second half of the semester. Groups of up to 2 students.

Another option: participate (as a team) in the Eric and Wendy Schmidt Center’s cancer immunotherapy data science challenge: https://go.topcoder.com/schmidtcentercancerchallenge/


Communication

We will not use Blackboard, and email is not efficient, so we need another platform. Options:

  • Campuswire
  • GitHub

Please give me your input as I decide; the decision will be made THIS WEEK.

Where to find this information:

The course homepage will be created at sonjapetrovicstats.com/teaching, and the syllabus will be posted there.

Student input time!

AhaSlides.com/NLASTAT1

Motivating example 1: Discrete Markov chain

See Section 1.1 of the textbook.

Lecture on board.
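As a minimal sketch of the algebra involved (not the textbook's own presentation), assume a two-state chain on {0, 1} observed at three time steps, with an initial distribution `pi` and a transition matrix `P` whose numbers are arbitrary illustrative choices. Each path probability p(i, j, k) = pi[i] P[i][j] P[j][k] is a monomial in the parameters, and because X1 and X3 are conditionally independent given X2, the quadratic binomials p(0,j,0) p(1,j,1) - p(0,j,1) p(1,j,0) vanish for j = 0, 1 whatever parameters are chosen. Exact `Fraction` arithmetic avoids floating-point round-off in that check.

```python
from fractions import Fraction
from itertools import product

# Illustrative parameters (arbitrary choices, not from the textbook):
# initial distribution pi and transition matrix P on states {0, 1}.
pi = [Fraction(1, 3), Fraction(2, 3)]
P = [[Fraction(1, 4), Fraction(3, 4)],
     [Fraction(3, 5), Fraction(2, 5)]]


def path_prob(i, j, k):
    """Probability of observing the path (X1, X2, X3) = (i, j, k)."""
    return pi[i] * P[i][j] * P[j][k]


# Joint distribution over all 2^3 = 8 paths.
p = {path: path_prob(*path) for path in product((0, 1), repeat=3)}


def ci_binomial(j):
    """2x2 minor encoding: X1 independent of X3 given X2 = j."""
    return p[(0, j, 0)] * p[(1, j, 1)] - p[(0, j, 1)] * p[(1, j, 0)]


if __name__ == "__main__":
    print("total probability:", sum(p.values()))  # exactly 1
    for j in (0, 1):
        print(f"binomial for X2 = {j}:", ci_binomial(j))  # exactly 0
```

Changing `pi` or `P` to any other probability vector and row-stochastic matrix leaves both binomials at zero, which is the sense in which they are polynomial equations satisfied by every distribution in the model.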

Motivating example 2: Graphical models

What is algebraic statistics?


Probability / statistics ↔ Algebra / geometry

  • Probability distribution ↔ Point
  • Statistical model ↔ (Semi)algebraic set
  • (Discrete) exponential family ↔ Toric variety / ideal
  • Conditional inference ↔ Lattice points in polytopes
  • Maximum likelihood estimation ↔ Polynomial optimization
  • Model selection ↔ Geometry of singularities
  • Multivariate Gaussian model ↔ Spectrahedral geometry
  • Phylogenetic model ↔ Tensor networks
  • MAP estimates ↔ Tropical geometry
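To make the pairing of conditional inference with lattice points in polytopes concrete, here is a small sketch, assuming 2 × 2 contingency tables with fixed row and column sums. Conditioning on the margins, as in Fisher's exact test, restricts attention to the nonnegative integer tables with those margins: the lattice points of a polytope called the fiber. For 2 × 2 tables the fiber is a line segment, and consecutive tables in it differ by the move [[1, -1], [-1, 1]]. The function names `fiber_2x2` and `difference` and the margins used are hypothetical, illustrative choices.

```python
def fiber_2x2(row_sums, col_sums):
    """All nonnegative integer 2x2 tables with the given row and column sums."""
    r1, r2 = row_sums
    c1, c2 = col_sums
    if r1 + r2 != c1 + c2:
        return []  # inconsistent margins: the fiber is empty
    tables = []
    # Once the (1,1) cell is chosen, the margins determine the other three cells.
    for u11 in range(min(r1, c1) + 1):
        u12 = r1 - u11
        u21 = c1 - u11
        u22 = r2 - u21
        if min(u12, u21, u22) >= 0:
            tables.append(((u11, u12), (u21, u22)))
    return tables


# The basic move: adding it to a table preserves all row and column sums.
MOVE = ((1, -1), (-1, 1))


def difference(a, b):
    """Cell-wise difference a - b of two 2x2 tables."""
    return tuple(tuple(x - y for x, y in zip(ra, rb)) for ra, rb in zip(a, b))


if __name__ == "__main__":
    fiber = fiber_2x2((3, 2), (2, 3))
    for table in fiber:
        print(table)
    # Consecutive lattice points differ by exactly the basic move.
    print(all(difference(b, a) == MOVE for a, b in zip(fiber, fiber[1:])))
```

For larger tables the fiber is a higher-dimensional polytope and enumerating it directly quickly becomes infeasible, which is where the algebraic tools from the course outline come in.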

Lecture plan

We now continue with the following topics:

  • Probability Primer (Chapter 2) and
  • Conditional Independence (Chapter 4)

Appendix

The following is a 3-slide “intro” to algebraic geometry, taken from a colloquium talk S. Sullivant gave a long time ago. The slides are meant only to give you a glimpse of the vocabulary, not to be digested immediately.

Introduction to algebraic geometry



Example: Hardy-Weinberg Equilibrium
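A minimal sketch of this example, assuming the standard setup of one gene with alleles A and a, where θ is the frequency of allele A. Under Hardy-Weinberg equilibrium the genotype probabilities (AA, Aa, aa) are (θ², 2θ(1 - θ), (1 - θ)²), so the model is a curve in the probability simplex. On the simplex, that curve is exactly the zero set of the polynomial p1² - 4 p0 p2, and the maximum likelihood estimate of θ from genotype counts is the allele-counting estimate (2 n_AA + n_Aa) / (2n). The function names and the example counts are hypothetical, for illustration only.

```python
from fractions import Fraction


def hardy_weinberg(theta):
    """Genotype probabilities (AA, Aa, aa) when allele A has frequency theta."""
    return (theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2)


def hw_polynomial(p):
    """Implicit equation: on the simplex it vanishes exactly on the model."""
    p0, p1, p2 = p
    return p1 ** 2 - 4 * p0 * p2


def theta_mle(counts):
    """Allele-counting MLE of theta from genotype counts (n_AA, n_Aa, n_aa)."""
    n0, n1, n2 = counts
    return Fraction(2 * n0 + n1, 2 * (n0 + n1 + n2))


if __name__ == "__main__":
    p = hardy_weinberg(Fraction(3, 10))
    print(p, sum(p))                    # a point in the simplex on the model
    print(hw_polynomial(p))             # 0: the implicit equation holds
    uniform = (Fraction(1, 3),) * 3
    print(hw_polynomial(uniform))       # nonzero: uniform is not in the model
    print(theta_mle((30, 50, 20)))      # estimate from illustrative counts
```

The parametric description (the map θ ↦ p) and the implicit description (the polynomial) are two views of the same algebraic set, a theme that recurs throughout the course.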

License

This document was created for Math/Stat 561, Spring 2023, at Illinois Tech.

While the course materials are generally not to be distributed outside the course without permission of the instructor, all materials posted on this page are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.