Marginalizing, or ‘what are all those integrals doing?’
Purpose of these notes:
- Our lecture definition of marginal independence and conditional independence was example-driven.
- Our book’s definition of these concepts starts from first principles and uses all of the technicalities needed to make the definition precise.
Let us reconcile what is going on in the background.
Marginals
Let \(X_1,\dots,X_m\) be a set of random variables. What is the notation for the joint distribution of them?
- Discrete case: \(P_{i_1,\dots,i_m} = P(X_1=i_1,\dots,X_m=i_m)\). This is the probability of the event \(X_1=i_1\) and \(X_2=i_2\) and … and \(X_m=i_m\).
- Continuous case: \(f_{X_1,\dots,X_m}\) is the joint density function for the random vector \(X_1,\dots,X_m\).
Now, what does the following mean (from Definition 4.1.1 in the textbook)?
Let \(A \subseteq [m]\). The marginal density \(f_A(x_A)\) of \(X_A\) is obtained by integrating out \(x_{[m]\backslash A}\): \[ f_A(x_A) := \int_{x_{[m]\backslash A}} f(x_A, x_{[m]\backslash A}) \, d x_{[m]\backslash A} \] for all \(x_A\).
First of all, \(A\subseteq [m]\) means that \(A\) is a set of indices, and \(X_A\) denotes the sub-collection of the random variables indexed by \(A\).
- For example, if \([m]=[3]=\{1,2,3\}\), maybe \(A=\{1,2\}\), so that \(X_A\) refers to \(X_1\) and \(X_2\). In this case \([m]\backslash A=\{3\}\), and the notation \(\int_{x_{[m]\backslash A}}\) simply means \(\int_{x_3}\), that is, integrate over all the values \(X_3\) can take.
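Written out for this example, the definition reads as follows. This is just Definition 4.1.1 specialized to \(m=3\) and \(A=\{1,2\}\); the subscript \(\{1,2\}\) on \(f\) is notation chosen here for illustration. \[ f_{\{1,2\}}(x_1,x_2) = \int_{x_3} f(x_1,x_2,x_3) \, d x_3 \quad \text{for all } (x_1,x_2). \]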
Next, the ‘marginal density \(f_A(x_A)\) of \(X_A\)’ refers to the joint density of the variables indexed by \(A\), without taking into account the variables outside of \(A\).
- In the example above, if \(X_1,\dots,X_m\) are discrete random variables, then \(f_A(x_A)\) denotes the probabilities \(p_{x_1,x_2}\) or \(p_{ij}:=P(X_1=i,X_2=j)\). What happened to \(X_3\)? It is still there, but given that the data will consist of joint observations of \(X_1,X_2,X_3\), the way to get to only the first two is to ‘marginalize’ over or ‘integrate out’ the \(X_3\). (Remember, summation is just a finite/discrete version of the integral.) So… since \(X_3\) exists in this example, we do not write \(p_{ij}\); rather, we write \(p_{ij+}\) to indicate that this is the marginal probability (density) for the first two variables and the third one is integrated out.
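The discrete case in the bullet above can be tried out numerically. Here is a minimal sketch in R, assuming a made-up joint distribution of three binary variables (the array `p` and its entries are hypothetical, chosen only so that they sum to 1). Summing over the third index is exactly the ‘integrating out’ of \(X_3\).

```r
# Hypothetical joint pmf p_{ijk} = P(X1 = i, X2 = j, X3 = k) for three
# binary variables; the numbers are invented for illustration only.
p <- array(
  c(0.10, 0.05, 0.15, 0.10,   # slice X3 = 0
    0.20, 0.10, 0.05, 0.25),  # slice X3 = 1
  dim = c(2, 2, 2),
  dimnames = list(X1 = c("0", "1"), X2 = c("0", "1"), X3 = c("0", "1"))
)

# A joint pmf must sum to 1
stopifnot(isTRUE(all.equal(sum(p), 1)))

# Marginalize over X3: p_{ij+} = sum over k of p_{ijk}
p_ij_plus <- apply(p, c(1, 2), sum)
p_ij_plus
```

The result is a 2x2 table indexed by \(X_1\) and \(X_2\) only; \(X_3\) no longer appears as an index because it has been summed over.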
As you have seen in worksheet 1, part (ii) of the proof of the CI (CI=conditional independence) axioms, marginalizing over a set of variables has the effect of making them ‘disappear’ from conditional independence statements – they are still there, just summed over.
Further examples
Please check out this lovely lesson “5.1 - Notation and Structure” from a Penn State course on discrete data analysis.
The link contains examples of the following relevant concepts: marginal data, joint distributions, conditional tables. These notions and examples are very much related to the soccer/hair length/gender example from our lectures in Week 1.
# prerequisites to run this code chunk
library(survival)
library(vcd)
Loading required package: grid
#### Berkeley Admissions Example: a 2x2x6 table
#### let X=sex, Y=admission status, Z=department
# data available as an array in R
admit <- UCBAdmissions
dimnames(admit) <- list(
Admit=c("Admitted","Rejected"),
Sex=c("Male","Female"),
Dept=c("A","B","C","D","E","F"))
admit
, , Dept = A

          Sex
Admit      Male Female
  Admitted  512     89
  Rejected  313     19

, , Dept = B

          Sex
Admit      Male Female
  Admitted  353     17
  Rejected  207      8

, , Dept = C

          Sex
Admit      Male Female
  Admitted  120    202
  Rejected  205    391

, , Dept = D

          Sex
Admit      Male Female
  Admitted  138    131
  Rejected  279    244

, , Dept = E

          Sex
Admit      Male Female
  Admitted   53     94
  Rejected  138    299

, , Dept = F

          Sex
Admit      Male Female
  Admitted   22     24
  Rejected  351    317

# create a flat contingency table
ftable(admit, row.vars=c("Dept","Sex"), col.vars="Admit")
            Admit Admitted Rejected
Dept Sex
A    Male              512      313
     Female             89       19
B    Male              353      207
     Female             17        8
C    Male              120      205
     Female            202      391
D    Male              138      279
     Female            131      244
E    Male               53      138
     Female             94      299
F    Male               22      351
     Female             24      317
Question: Which specific marginal tables does each of the following tables represent?
### define marginal tables
XY <- margin.table(admit, c(2,1))
ZY <- margin.table(admit, c(3,1))
XZ <- margin.table(admit, c(2,3))
### print marginal tables
XY
        Admit
Sex      Admitted Rejected
  Male       1198     1493
  Female      557     1278
ZY
    Admit
Dept Admitted Rejected
   A      601      332
   B      370      215
   C      322      596
   D      269      523
   E      147      437
   F       46      668
XZ
        Dept
Sex        A   B   C   D   E   F
  Male   825 560 325 417 191 373
  Female 108  25 593 375 393 341
Question: If you were to model this last table, XZ, what would be the corresponding marginal probability distribution?
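To connect `margin.table` back to the definition of a marginal, here is a small check that it does nothing more than sum over the index that is dropped. This is a sketch, assuming only the built-in `UCBAdmissions` array with its default dimension order (admission status, sex, department); the name `XY_by_hand` is introduced here for illustration.

```r
# UCBAdmissions ships with base R (package 'datasets').
# Its dimensions are, in order: Admit (1), Gender (2), Dept (3).
admit <- UCBAdmissions

# Sex-by-admission table, two ways:
XY_by_hand <- apply(admit, c(2, 1), sum)   # keep dims 2 and 1, sum over Dept
XY <- margin.table(admit, c(2, 1))         # the built-in shortcut

# Both give the same counts
all(XY_by_hand == XY)

# One cell written as an explicit sum over the six departments
sum(admit["Admitted", "Male", ])
```

The last line adds up the admitted men across departments A through F, which is the ‘sum over the third variable’ that the \(+\) subscript stands for.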
Exercise: check understanding
For the Berkeley admissions example, what is the observed conditional distribution of sex and admission status, given Department B?
Note: the “observed” distribution in this case refers to the observed frequencies, like the sampling distribution from Lecture 3. For example, if \(X\) is a binary random variable and I observe it in state 0 five times and in state 1 two times, then the observed distribution of \(X\) is \((5/7,2/7)\), because state 0 appears with frequency \(5/7\), etc.
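The small binary example in the note can be reproduced in R. This is a minimal sketch, assuming a hypothetical sample vector `x` that contains state 0 five times and state 1 twice; `prop.table` turns counts into relative frequencies.

```r
# Hypothetical sample: state 0 observed five times, state 1 observed twice
x <- c(0, 0, 0, 0, 0, 1, 1)

counts <- table(x)               # observed counts per state
observed <- prop.table(counts)   # counts divided by the sample size
observed
```

The same two steps (tabulate, then divide by the total) apply to any slice of a contingency table.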
Appendix
So what about independence in that Berkeley admissions table? Here are some code nuggets you may find useful that we will come back to later in the course.
## the chi-squared test
chisq.test(XY, correct=FALSE)
Pearson's Chi-squared test
data: XY
X-squared = 92.205, df = 1, p-value < 2.2e-16
## Cochran-Mantel-Haenszel test of conditional independence
mantelhaen.test(admit)
Mantel-Haenszel chi-squared test with continuity correction
data: admit
Mantel-Haenszel X-squared = 1.4269, df = 1, p-value = 0.2323
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
0.7719074 1.0603298
sample estimates:
common odds ratio
0.9046968
mantelhaen.test(admit, correct=FALSE)
Mantel-Haenszel chi-squared test without continuity correction
data: admit
Mantel-Haenszel X-squared = 1.5246, df = 1, p-value = 0.2169
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
0.7719074 1.0603298
sample estimates:
common odds ratio
0.9046968
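To see what the first test is computing, here is a sketch that rebuilds the Pearson statistic by hand for the sex-by-admission table and compares it with `chisq.test(XY, correct=FALSE)`. It assumes only the built-in `UCBAdmissions` data; the names `expected`, `X2`, `df` and `p_value` are introduced here for illustration.

```r
admit <- UCBAdmissions
XY <- margin.table(admit, c(2, 1))   # sex-by-admission counts

n <- sum(XY)
# Expected counts under independence: (row total * column total) / n
expected <- outer(rowSums(XY), colSums(XY)) / n

# Pearson statistic: sum over cells of (observed - expected)^2 / expected
X2 <- sum((XY - expected)^2 / expected)
df <- (nrow(XY) - 1) * (ncol(XY) - 1)
p_value <- pchisq(X2, df = df, lower.tail = FALSE)

c(X2 = X2, df = df)
```

The expected counts are the product of the two one-variable marginals, scaled by the sample size, so the statistic measures how far the observed table is from that product form.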
License
This document was created for Math 561, Spring 2023, at Illinois Tech. This set of notes is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.