Marginalizing, or ‘what are all those integrals doing?’


Purpose of these notes:

  • Our lecture definition of marginal independence and conditional independence was example-driven.
  • Our book’s definition of these concepts starts from first principles and includes all of the technicalities needed to make it precise.

Let us reconcile what is going on in the background.

Marginals

Let \(X_1,\dots,X_m\) be a set of random variables. What is the notation for their joint distribution?

  • Discrete case: \(p_{i_1,\dots,i_m} = P(X_1=i_1,\dots,X_m=i_m)\). This is the probability of the event \(X_1=i_1\) and \(X_2=i_2\) and … and \(X_m=i_m\).
  • Continuous case: \(f_{X_1,\dots,X_m}\) is the joint density function for the random vector \((X_1,\dots,X_m)\). (A small R illustration of the discrete case follows right after this list.)
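As a quick illustration of the discrete case (a side note, using R’s built-in UCBAdmissions table, which reappears later in these notes), normalizing a table of counts gives an empirical version of the joint distribution:

## empirical joint distribution from a built-in 2x2x6 table of counts
p_hat <- UCBAdmissions / sum(UCBAdmissions)  # estimates p_{ijk} = P(X_1=i, X_2=j, X_3=k)
sum(p_hat)                                   # the joint probabilities sum to 1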

Now, what does the following mean (Definition 4.1.1 from the textbook):

Let \(A \subseteq [m]\). The marginal density \(f_A(x_A)\) of \(X_A\) is obtained by integrating out \(x_{[m]\backslash A}\): \[ f_A(x_A) := \int_{x_{[m]\backslash A}} f(x_A, x_{[m]\backslash A}) \, d x_{[m]\backslash A} \] for all \(x_A\).

First of all, \(A\subseteq [m]\) means that \(A\) is a set of indices, and \(X_A\) denotes the sub-collection of the random variables indexed by the set \(A\).

  • For example, if \([m]=[3]=\{1,2,3\}\) and \(A=\{1,2\}\), then \(X_A\) refers to \(X_1\) and \(X_2\). In this case, the notation \(\int_{x_{[m]\backslash A}}\) simply means \(\int_{x_3}\), that is, integrate over all the values \(X_3\) can take.

Next, the ‘marginal density \(f_A(x_A)\) of \(X_A\)’ is referring to the joint density of the variables in the set \(A\), but not taking into account the variables outside of \(A\).

  • In the example above, if \(X_1,X_2,X_3\) are discrete random variables, then \(f_A(x_A)\) denotes the probabilities \(p_{x_1,x_2}\), or \(p_{ij}:=P(X_1=i,X_2=j)\). What happened to \(X_3\)? It is still there; but given that the data will consist of joint observations of \(X_1,X_2,X_3\), the way to get to only the first two is to ‘marginalize’ over, or ‘integrate out’, \(X_3\). (Remember, summation is just the finite/discrete version of the integral.) So… since \(X_3\) exists in this example, we do not write \(p_{ij}\); rather, we write \(p_{ij+} = \sum_k p_{ijk}\) to indicate that this is the marginal probability (density) for the first two variables with the third one summed out. (See the short R sketch below.)
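Here is a minimal sketch of that computation in R (the 2x2x2 array of probabilities below is made up purely for illustration):

## a made-up 2x2x2 joint distribution p_{ijk}
p <- array(c(0.10, 0.05, 0.15, 0.20,
             0.10, 0.10, 0.05, 0.25), dim = c(2, 2, 2))
sum(p)                    # sanity check: the joint probabilities sum to 1
apply(p, c(1, 2), sum)    # p_{ij+}: sum out the third index
margin.table(p, c(1, 2))  # the same marginal, via margin.table()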

As you have seen in worksheet 1, part (ii) of the proof of the CI axioms (CI = conditional independence), marginalizing over a set of variables has the effect of making them ‘disappear’ from conditional independence statements: they are still there, just summed over.

Further examples

Please check out this lovely lesson “5.1 - Notation and Structure” from a Penn State course on discrete data analysis.

The link contains examples of the following relevant concepts: marginal data, joint distributions, and conditional tables. These notions and examples are very much related to the soccer/hair length/gender example from our lectures in Week 1.

# prerequisites to run this code chunk 
library(survival)
library(vcd)
Loading required package: grid
#### Berkeley Admissions Example: a 2x2x6 table
#### let X=sex, Y=admission status, Z=department

# data available as an array in R
admit <- UCBAdmissions
dimnames(admit) <- list(
  Admit=c("Admitted","Rejected"),
  Sex=c("Male","Female"),
  Dept=c("A","B","C","D","E","F"))
admit
, , Dept = A

          Sex
Admit      Male Female
  Admitted  512     89
  Rejected  313     19

, , Dept = B

          Sex
Admit      Male Female
  Admitted  353     17
  Rejected  207      8

, , Dept = C

          Sex
Admit      Male Female
  Admitted  120    202
  Rejected  205    391

, , Dept = D

          Sex
Admit      Male Female
  Admitted  138    131
  Rejected  279    244

, , Dept = E

          Sex
Admit      Male Female
  Admitted   53     94
  Rejected  138    299

, , Dept = F

          Sex
Admit      Male Female
  Admitted   22     24
  Rejected  351    317
# create a flat contingency table
ftable(admit, row.vars=c("Dept","Sex"), col.vars="Admit")
            Admit Admitted Rejected
Dept Sex                           
A    Male              512      313
     Female             89       19
B    Male              353      207
     Female             17        8
C    Male              120      205
     Female            202      391
D    Male              138      279
     Female            131      244
E    Male               53      138
     Female             94      299
F    Male               22      351
     Female             24      317

Question: Which specific marginal tables does each of the following tables represent?

### define marginal tables 
XY <- margin.table(admit, c(2,1))
ZY <- margin.table(admit, c(3,1))
XZ <- margin.table(admit, c(2,3))
### print marginal tables 
XY
        Admit
Sex      Admitted Rejected
  Male       1198     1493
  Female      557     1278
ZY
    Admit
Dept Admitted Rejected
   A      601      332
   B      370      215
   C      322      596
   D      269      523
   E      147      437
   F       46      668
XZ
        Dept
Sex        A   B   C   D   E   F
  Male   825 560 325 417 191 373
  Female 108  25 593 375 393 341

Question: If you were to model this last table, XZ, what would be the corresponding marginal probability distribution?
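As a hint (this code is not part of the original question), base R’s prop.table() turns a table of counts into observed proportions:

## observed proportions for the XZ margin: each count divided by the grand total
prop.table(XZ)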

Exercise: check understanding

For the Berkeley admissions example, what is the observed conditional distribution of sex and admission status, given Department B?

Note: the “observed” distribution in this case refers to the observed frequencies, like the sampling distribution from Lecture 3. For example, if \(X\) is a binary random variable and I observe it in state 0 five times and in state 1 two times, then the observed distribution of \(X\) is \((5/7,2/7)\), because state 0 appears with frequency \(5/7\), etc.
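If you want to check your answer in R, one possible approach (a sketch, not the official solution) is to take the Dept = "B" slice of the array and normalize it by its total:

## observed conditional distribution of (Admit, Sex) given Dept = B
deptB <- admit[ , , "B"]
deptB / sum(deptB)   # equivalently: prop.table(admit[ , , "B"])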

Appendix

So what about independence in that Berkeley admissions table? Here are some code nuggets that you may find useful; we will come back to them later in the course.

## the chi-squared test
chisq.test(XY, correct=FALSE)

    Pearson's Chi-squared test

data:  XY
X-squared = 92.205, df = 1, p-value < 2.2e-16
## Cochran-Mantel-Haenszel test of conditional independence
mantelhaen.test(admit)

    Mantel-Haenszel chi-squared test with continuity correction

data:  admit
Mantel-Haenszel X-squared = 1.4269, df = 1, p-value = 0.2323
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
 0.7719074 1.0603298
sample estimates:
common odds ratio 
        0.9046968 
mantelhaen.test(admit, correct=FALSE)

    Mantel-Haenszel chi-squared test without continuity correction

data:  admit
Mantel-Haenszel X-squared = 1.5246, df = 1, p-value = 0.2169
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
 0.7719074 1.0603298
sample estimates:
common odds ratio 
        0.9046968 

License

This document was created for Math 561, Spring 2023, at Illinois Tech. This set of notes is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.



  1. Sonja Petrović, Associate Professor of Applied Mathematics, College of Computing, Illinois Tech. Homepage, Email.