Two-Way Contingency Tables
Joint, Marginal and Conditional Distributions
Consider two categorical response variables X and Y, with X having I levels and Y having J levels, and suppose we classify each item in a population using both of these variables.
Responses (X, Y) corresponding to a randomly chosen item from this population have a joint probability distribution. Let $\pi_{ij}$ denote the probability that X assumes its ith level and Y assumes its jth level. Consider the I x J contingency table whose cell in row i and column j contains $\pi_{ij}$. The set $\{\pi_{ij}\}$ is the joint distribution of X and Y; it defines the
(bivariate) relationship between the two variables.
The marginal distributions of X and Y are the row and column totals (respectively),
obtained by summing the joint probabilities. These are denoted by $\pi_{i+} = \sum_j \pi_{ij}$ (for X) and $\pi_{+j} = \sum_i \pi_{ij}$ (for Y).
The marginal distributions represent single-variable information; they do not refer to
association links between the two variables.
Usually the joint probabilities $\{\pi_{ij}\}$ are unknown but they can be estimated by sampling.
Example: Consider a sample of 3566 individuals cross-classified by smoking status and sleep
problems. This yields a 2 x 2 contingency table.
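Note: The overall sample size is fixed but row and column totals are not fixed. Thus this
study corresponds to a multinomial sample with 4 outcomes. The maximum likelihood
estimates (M.L.E.s) of the joint probabilities $\pi_{ij}$ are the sample proportions $p_{ij} = n_{ij}/n$, where $n_{ij}$ is the observed count in cell (i, j) and n is the overall sample size.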
Sometimes one variable can be thought of as a response variable and the other as an explanatory variable. (In this study, we might treat sleep problems as a response variable and smoking status as an explanatory variable.) For such cases, it may be useful to construct a separate probability distribution for Y at each level of X. Given that an item is classified in row i of X, let $\pi_{j|i}$ denote the probability of classification in column j of Y. The probabilities $\{\pi_{1|i}, \ldots, \pi_{J|i}\}$ represent the conditional distribution of Y at the ith
level of X. The conditional distribution of Y given X is related to the joint distribution of X and Y by $\pi_{j|i} = \pi_{ij}/\pi_{i+}$. Usually, these conditional probability distributions are also unknown but they can be estimated by sampling.

STAT5602
For our example, we estimate the conditional probability distribution for sleep problems at the ith level of smoking status using $p_{j|i} = n_{ij}/n_{i+}$, where $n_{i+}$ is the total count in row i.
When both variables are response variables, we can describe their association using:
- their joint distribution,
- the conditional distribution of Y given X
- the conditional distribution of X given Y.
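The variables X and Y are statistically independent if $\pi_{ij} = \pi_{i+}\,\pi_{+j}$ for all i and j.
Thus, for X and Y independent, $\pi_{j|i} = \pi_{ij}/\pi_{i+} = \pi_{+j}$ for every i. When Y is a response and X is an explanatory variable, the condition $\pi_{j|1} = \pi_{j|2} = \cdots = \pi_{j|I}$ for all j is a more natural definition of independence.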
Note: In some tables where Y is a response variable and X is an explanatory variable, X is
fixed rather than random. Then the notion of a joint distribution for X and Y is no longer
meaningful. However, for a fixed level of X, Y has a probability distribution. Thus we can
consider the conditional distribution of Y for different fixed levels of X.
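The estimates of the joint, marginal and conditional distributions described above can be sketched in a few lines of Python. The counts below are hypothetical (they are not the smoking and sleep data, which are not reproduced in these notes), and the variable names are illustrative only.

```python
# Hypothetical 2 x 2 table of counts n_ij (illustrative only, not data from these notes).
counts = [[30, 70],
          [20, 80]]

n = sum(sum(row) for row in counts)

# Estimated joint distribution: p_ij = n_ij / n
joint = [[n_ij / n for n_ij in row] for row in counts]

# Estimated marginal distributions: row totals p_i+ and column totals p_+j
row_marg = [sum(row) for row in joint]
col_marg = [sum(col) for col in zip(*joint)]

# Estimated conditional distribution of Y within row i: p_j|i = p_ij / p_i+
cond = [[p_ij / row_marg[i] for p_ij in row] for i, row in enumerate(joint)]

# Cell probabilities implied by independence: p_i+ * p_+j
indep = [[r * c for c in col_marg] for r in row_marg]

print("joint:", joint)
print("row marginal:", row_marg, "column marginal:", col_marg)
print("conditional of Y given X:", cond)
print("joint under independence:", indep)
```

Comparing `joint` with `indep`, or comparing the rows of `cond` with each other, gives an informal view of how far the table is from independence.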
Test for Homogeneity: Prospective Study
The Physicians’ Health Study was a 5 year study testing whether regular intake of aspirin
reduces mortality from cardiovascular disease. In this study, 22,071 physicians were
randomly assigned either to a group that was to take one aspirin tablet every other day or to a
group that was to take a placebo every other day. Of the 22,071 physicians, 11,034 were
assigned to receive the placebo and 11,037 were assigned to receive aspirin. The study was
blind - i.e. the physicians did not know which type of pill they were assigned to take. Of the
11,034 physicians taking the placebo, 189 suffered myocardial infarction (MI) over the
course of the study (18 of which were fatal) while of the 11,037 taking aspirin, 104 suffered MI (5
of which were fatal). The results are summarized in the following 2 x 2 contingency table:

Group          MI      No MI      Total
Placebo       189     10,845     11,034
Aspirin       104     10,933     11,037

Source: Preliminary report: Findings from the aspirin component of the ongoing Physicians’ Health Study, N. Engl. J. Med. 318.

Question:
Is the proportion of physicians taking a placebo who suffer MI the same as the proportion of physicians taking aspirin who suffer MI? This is an example of a prospective study. (Note: In a prospective study, the row totals are fixed.)
(NOTE: This study is a clinical trial, since physicians are assigned to the placebo and aspirin
groups by the investigators. Another type of prospective study is a cohort study, where the
researchers do not assign individuals to groups. e.g. to study the effect of smoking on MI, a
researcher might select a sample of smokers independently of a sample of nonsmokers, but
the researcher does not assign individuals to the smoking and nonsmoking groups.)
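Let
$\pi_{1|1}$ = probability of suffering MI, given that the physician takes the placebo
$\pi_{2|1}$ = probability of not suffering MI, given that the physician takes the placebo
$\pi_{1|2}$ = probability of suffering MI, given that the physician takes aspirin
$\pi_{2|2}$ = probability of not suffering MI, given that the physician takes aspirin

We test $H_0: \pi_{1|1} = \pi_{1|2}$ against $H_1: \pi_{1|1} \neq \pi_{1|2}$. To do so, we estimate the conditional probabilities $\pi_{j|i}$ under $H_0$. This allows us to determine estimated expected frequencies $\hat m_{ij}$. Pearson’s Chi-square test statistic can then be used here.

Write $n_1$ and $n_2$ for the (fixed) row totals and $n = n_1 + n_2$. Under $H_0$, $\pi_{1|1} = \pi_{1|2} = \pi$ (say), so the kernel of the likelihood is $\pi^{\,n_{11}+n_{21}}(1-\pi)^{\,n_{12}+n_{22}}$.
The log likelihood of the kernel is $(n_{11}+n_{21})\log\pi + (n_{12}+n_{22})\log(1-\pi)$. Thus, under $H_0$, the M.L.E. is $\hat\pi = (n_{11}+n_{21})/n = 293/22071 \approx 0.0133$.
Using this, the estimated frequencies are $\hat m_{i1} = n_i\hat\pi$ and $\hat m_{i2} = n_i(1-\hat\pi)$, i.e. $\hat m_{11} \approx 146.48$, $\hat m_{12} \approx 10887.52$, $\hat m_{21} \approx 146.52$ and $\hat m_{22} \approx 10890.48$.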
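We obtain Pearson’s $X^2 = \sum_{i,j} (n_{ij} - \hat m_{ij})^2/\hat m_{ij} \approx 25.01$. Recall that for large samples, $X^2$ is approximately $\chi^2_1$ distributed under $H_0$. The p-value is approximately 0, so there is strong evidence against $H_0$.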
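As a check on the arithmetic, the Pearson test above can be reproduced with a short Python sketch. It uses only the counts given for the Physicians' Health Study; the variable names are illustrative, and the p-value uses the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$, which holds for 1 degree of freedom only.

```python
import math

# Observed counts: rows = (placebo, aspirin), columns = (MI, no MI)
observed = [[189, 10845],
            [104, 10933]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Estimated expected frequencies under H0: m_ij = (row total i) * (column total j) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson's chi-square statistic
x2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
         for i in range(2) for j in range(2))

# Upper-tail probability for a chi-square variable with 1 degree of freedom
p_value = math.erfc(math.sqrt(x2 / 2))

print("expected frequencies:", expected)
print("X^2 =", round(x2, 2), " p-value =", p_value)
```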
A likelihood ratio Chi-square test can also be used here.
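First we maximize the likelihood under $H_0$; then we maximize the likelihood without restriction (i.e. under $H_0$ or $H_1$). The likelihood ratio $\Lambda$ is defined
as the ratio of these two maximized likelihoods, and the test statistic is $G^2 = -2\log\Lambda$.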
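For the test for homogeneity above, the kernel is $\pi_{1|1}^{\,n_{11}}(1-\pi_{1|1})^{\,n_{12}}\,\pi_{1|2}^{\,n_{21}}(1-\pi_{1|2})^{\,n_{22}}$. Recall that, when $H_0$ is assumed to be true, the kernel simply becomes $\pi^{\,n_{11}+n_{21}}(1-\pi)^{\,n_{12}+n_{22}}$, where $\pi$ is the common value of $\pi_{1|1}$ and $\pi_{1|2}$, and the log likelihood of this kernel is maximized at $\hat\pi = (n_{11}+n_{21})/n$. Consider now the kernel in the general context ($H_0$ or $H_1$): its log likelihood is maximized at $p_{1|1} = n_{11}/n_1$ and $p_{1|2} = n_{21}/n_2$.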
For our example above, $G^2$ (called Wilks’ statistic) is $G^2 = 2\sum_{i,j} n_{ij}\log(n_{ij}/\hat m_{ij}) \approx 25.37$, where the $\hat m_{ij}$ are the estimated expected frequencies under $H_0$; for large samples $G^2$ is approximately $\chi^2_1$ distributed under $H_0$ (same as for Pearson’s Chi-square test).
This gives a p-value of approximately 0, again concluding that there is strong evidence against $H_0$.

Now we try to understand the nature of this difference in proportions of physicians taking
aspirin who suffer MI and those physicians taking a placebo who suffer MI. We do this by
examining confidence intervals, relative risk, and odds ratios.
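The likelihood ratio statistic described above can be computed in the same way as Pearson's statistic. The following is a minimal Python sketch using the study counts; the variable names are illustrative, and natural logarithms are assumed.

```python
import math

# Observed counts: rows = (placebo, aspirin), columns = (MI, no MI)
observed = [[189, 10845],
            [104, 10933]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Estimated expected frequencies under H0 (same as for Pearson's test)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Likelihood ratio statistic: G^2 = 2 * sum of n_ij * log(n_ij / m_ij)
g2 = 2 * sum(observed[i][j] * math.log(observed[i][j] / expected[i][j])
             for i in range(2) for j in range(2))

# Upper-tail probability for a chi-square variable with 1 degree of freedom
p_value = math.erfc(math.sqrt(g2 / 2))

print("G^2 =", round(g2, 2), " p-value =", p_value)
```

The two statistics are close here, as expected when all estimated expected frequencies are large.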
Large Sample Confidence Interval for $\pi_{1|1} - \pi_{1|2}$
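We showed that the MLEs of $\pi_{1|1}$ and $\pi_{1|2}$ were $p_{1|1} = n_{11}/n_1$ and $p_{1|2} = n_{21}/n_2$, where $n_1$ and $n_2$ are fixed. Also, $n_{11}$ and $n_{21}$ are independent binomial random
variables with means and variances $E(n_{i1}) = n_i\pi_{1|i}$ and $\operatorname{Var}(n_{i1}) = n_i\pi_{1|i}(1-\pi_{1|i})$, $i = 1, 2$.
Consequently $p_{1|1}$ and $p_{1|2}$ are independent with means and variances $E(p_{1|i}) = \pi_{1|i}$ and $\operatorname{Var}(p_{1|i}) = \pi_{1|i}(1-\pi_{1|i})/n_i$.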
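For large samples, we can use the fact that $p_{1|1}$ and $p_{1|2}$ will be approximately normally distributed. Thus a $100(1-\alpha)$% confidence interval for $\pi_{1|1} - \pi_{1|2}$ is $(p_{1|1} - p_{1|2}) \pm z_{\alpha/2}\sqrt{p_{1|1}(1-p_{1|1})/n_1 + p_{1|2}(1-p_{1|2})/n_2}$. For our example, to obtain a 95% confidence interval for $\pi_{1|1} - \pi_{1|2}$ we compute $p_{1|1} = 189/11034 \approx 0.0171$, $p_{1|2} = 104/11037 \approx 0.0094$ and an estimated standard error of approximately 0.00154, and thus a 95% confidence interval for $\pi_{1|1} - \pi_{1|2}$ is $0.0077 \pm 1.96(0.00154)$, i.e. approximately $(0.0047, 0.0107)$. Since the interval does not contain 0, aspirin appears to diminish the risk of MI.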
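The interval above can be verified numerically. This is a minimal Python sketch of the large-sample (Wald) interval for a difference of two independent proportions, using the study counts; 1.96 is used as the approximate 97.5th percentile of the standard normal distribution.

```python
import math

n11, n1 = 189, 11034   # MI cases and group size, placebo
n21, n2 = 104, 11037   # MI cases and group size, aspirin

p1 = n11 / n1          # sample proportion of MI, placebo
p2 = n21 / n2          # sample proportion of MI, aspirin

diff = p1 - p2
# Estimated standard error of the difference of two independent proportions
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z = 1.96               # approximate normal quantile for 95% confidence
ci = (diff - z * se, diff + z * se)

print("difference =", round(diff, 4), " SE =", round(se, 5))
print("95% CI =", tuple(round(b, 4) for b in ci))
```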
A difference between two proportions may have greater importance when both proportions are near 0 or 1 than when they are near the middle. So, instead of studying the effect of aspirin on MI by considering the difference $\pi_{1|1} - \pi_{1|2}$, we could look at the relative risk, $\pi_{1|1}/\pi_{1|2}$, which is the ratio of the "success" probabilities for the 2 groups. In this case, "success" represents having MI.
Under $H_0$, $\pi_{1|1}/\pi_{1|2} = 1$ (i.e. the response is not affected by the group). We use the sample relative risk $p_{1|1}/p_{1|2}$ to estimate the population relative risk. For our data, we obtain $0.0171/0.0094 \approx 1.82$. This implies that the sample proportion of MI cases was 82% higher for the group taking the placebo.
Note that a relative risk of 1.0 corresponds to independence.
Obtaining a $100(1-\alpha)$% confidence interval for the (population) relative risk $\pi_{1|1}/\pi_{1|2}$ based on $p_{1|1}/p_{1|2}$:
The problem here is that the distribution of $p_{1|1}/p_{1|2}$ is highly skewed unless our sample sizes are extremely large. So instead, we obtain a confidence interval for $\log(\pi_{1|1}/\pi_{1|2})$. To derive the confidence interval, we use the delta method.
The delta method for a function of a random variable:
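Let $T_n$ be a statistic, depending on a sample of size n. For large samples, suppose $T_n$ is approximately normally distributed with mean $\mu$ and variance $\sigma^2/n$. Using a Taylor series expansion of $g(T_n)$ around $\mu$, we can write $g(T_n) = g(\mu) + (T_n - \mu)\,g'(\mu) + R_n$, where $\sqrt{n}\,R_n$ converges in probability to 0 as $n \to \infty$. Hence, for large samples, $g(T_n)$ is approximately normally distributed with mean $g(\mu)$ and variance $[g'(\mu)]^2\sigma^2/n$.

Now we want a confidence interval for $\log(\pi_{1|1}/\pi_{1|2})$. We start with the point estimator of $\log(\pi_{1|1}/\pi_{1|2})$, namely $\log(p_{1|1}/p_{1|2}) = \log p_{1|1} - \log p_{1|2}$.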
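As a sketch of how the delta method applies to the log relative risk (a reconstruction under the independent-binomial assumptions stated earlier, not necessarily the exact derivation given in the original notes), take $g(x) = \log x$, so that $g'(\pi) = 1/\pi$:

```latex
% Delta method with g(x) = \log x applied to each sample proportion p_{1|i}.
\begin{align*}
\operatorname{Var}(p_{1|i}) &= \frac{\pi_{1|i}(1-\pi_{1|i})}{n_i}, \\
\operatorname{Var}(\log p_{1|i}) &\approx \left(\frac{1}{\pi_{1|i}}\right)^{2}
    \frac{\pi_{1|i}(1-\pi_{1|i})}{n_i}
  = \frac{1-\pi_{1|i}}{n_i\,\pi_{1|i}}, \\
\operatorname{Var}\!\left(\log\frac{p_{1|1}}{p_{1|2}}\right) &\approx
    \frac{1-\pi_{1|1}}{n_1\,\pi_{1|1}} + \frac{1-\pi_{1|2}}{n_2\,\pi_{1|2}}
    \quad\text{(by independence of the two samples)}, \\
\widehat{\mathrm{SE}}\!\left(\log\frac{p_{1|1}}{p_{1|2}}\right) &=
    \sqrt{\frac{1-p_{1|1}}{n_1\,p_{1|1}} + \frac{1-p_{1|2}}{n_2\,p_{1|2}}}
  = \sqrt{\frac{1}{n_{11}} - \frac{1}{n_1} + \frac{1}{n_{21}} - \frac{1}{n_2}}.
\end{align*}
```

An approximate $100(1-\alpha)$% interval for $\log(\pi_{1|1}/\pi_{1|2})$ is then the point estimate plus or minus $z_{\alpha/2}$ times this estimated standard error.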
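For our example above, $\log(p_{1|1}/p_{1|2}) \approx 0.598$ with an estimated standard error of approximately 0.1213, so the 95% C.I. for $\log(\pi_{1|1}/\pi_{1|2})$ is $0.598 \pm 1.96(0.1213) = (0.360, 0.836)$. Now taking antilogs, a 95% C.I. for the relative risk is $(1.43, 2.31)$. This means that we are 95% confident in stating that, after 5 years, the proportion of MI cases for physicians taking a placebo every second day is between 1.43 and 2.31 times the proportion of MI cases for physicians taking a single aspirin every second day.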
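The relative risk interval can be reproduced with the following Python sketch. It assumes the usual large-sample standard error of the log relative risk for two independent binomial samples, and the variable names are illustrative.

```python
import math

n11, n1 = 189, 11034   # MI cases and group size, placebo
n21, n2 = 104, 11037   # MI cases and group size, aspirin

p1 = n11 / n1
p2 = n21 / n2

rr = p1 / p2           # sample relative risk

# Large-sample standard error of log(rr) from the delta method
se_log_rr = math.sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))

z = 1.96               # approximate normal quantile for 95% confidence
log_ci = (math.log(rr) - z * se_log_rr, math.log(rr) + z * se_log_rr)

# Back-transform the interval to the relative risk scale
rr_ci = tuple(math.exp(b) for b in log_ci)

print("relative risk =", round(rr, 2))
print("95% CI for log RR =", tuple(round(b, 3) for b in log_ci))
print("95% CI for RR =", tuple(round(b, 2) for b in rr_ci))
```

Working on the log scale and exponentiating keeps the interval inside the positive range and makes it asymmetric around the point estimate, which suits the skewed sampling distribution.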
Note: Sometimes we might want to estimate the ratio of the "failure" probabilities rather than the ratio of the "success" probabilities.
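Another measure of association in contingency tables is the odds ratio.
Consider again the physician example above. Within row 1, the odds that the response is in column 1 instead of column 2 is $\text{odds}_1 = \pi_{1|1}/\pi_{2|1}$. Similarly within row 2, the corresponding odds is $\text{odds}_2 = \pi_{1|2}/\pi_{2|2}$. If $\text{odds}_i > 1$ then response 1 is more likely than response 2 in row i.
Within-row conditional distributions are identical iff $\text{odds}_1 = \text{odds}_2$. The ratio $\theta = \text{odds}_1/\text{odds}_2$ is called the odds (or cross product) ratio.
$\theta = 1$ means that the response is not affected by group.
We estimate the population odds ratio by the sample odds ratio $\hat\theta = n_{11}n_{22}/(n_{12}n_{21}) = (189 \times 10933)/(10845 \times 104) \approx 1.83$, meaning the odds of MI are 83% higher for physicians in the placebo group.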
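A $100(1-\alpha)$% C.I. for the population odds ratio $\theta$: again, since the sampling distribution of the sample odds ratio $\hat\theta$ is highly skewed except for extremely large sample sizes, we first obtain a confidence interval for $\log\theta$. For our example, $\log\hat\theta \approx 0.605$ with estimated standard error $\sqrt{1/n_{11} + 1/n_{12} + 1/n_{21} + 1/n_{22}} \approx 0.1228$, so a 95% C.I. for $\log\theta$ is $0.605 \pm 1.96(0.1228) = (0.365, 0.846)$, and taking antilogs gives $(1.44, 2.33)$.
This means we are 95% confident that, after 5 years, the odds of MI for physicians taking a placebo every second day is between 1.44 and 2.33 times the odds of MI for physicians taking aspirin.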
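The odds ratio and its interval can be checked with a short Python sketch. It assumes the standard large-sample standard error of the log odds ratio, the square root of the sum of reciprocal cell counts; the variable names are illustrative.

```python
import math

# Cell counts: rows = (placebo, aspirin), columns = (MI, no MI)
n11, n12 = 189, 10845
n21, n22 = 104, 10933

# Sample odds (cross product) ratio
odds_ratio = (n11 * n22) / (n12 * n21)

# Large-sample standard error of the log odds ratio
se_log_or = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)

z = 1.96               # approximate normal quantile for 95% confidence
log_ci = (math.log(odds_ratio) - z * se_log_or,
          math.log(odds_ratio) + z * se_log_or)

# Back-transform the interval to the odds ratio scale
or_ci = tuple(math.exp(b) for b in log_ci)

print("odds ratio =", round(odds_ratio, 2))
print("95% CI for odds ratio =", tuple(round(b, 2) for b in or_ci))
```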
Relationship between Odds Ratio and Relative Risk
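The odds ratio can be written as $\dfrac{\pi_{1|1}/(1-\pi_{1|1})}{\pi_{1|2}/(1-\pi_{1|2})} = \dfrac{\pi_{1|1}}{\pi_{1|2}} \times \dfrac{1-\pi_{1|2}}{1-\pi_{1|1}}$, i.e. odds ratio = relative risk $\times\ (1-\pi_{1|2})/(1-\pi_{1|1})$.
So when the probabilities of "success" for both groups (i.e. $\pi_{1|1}$ and $\pi_{1|2}$) are close to zero, the odds ratio and the relative risk are similar. (This happens for our physician example and, in general, for a rare condition.)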
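A quick numerical illustration of this relationship for the physician data, as a minimal Python sketch with illustrative variable names: the correction factor $(1-p_{1|2})/(1-p_{1|1})$ is very close to 1 because both sample proportions are small.

```python
n11, n1 = 189, 11034   # MI cases and group size, placebo
n21, n2 = 104, 11037   # MI cases and group size, aspirin

p1 = n11 / n1
p2 = n21 / n2

relative_risk = p1 / p2
odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))

# Factor linking the two measures: odds ratio = relative risk * factor
factor = (1 - p2) / (1 - p1)

print("relative risk =", round(relative_risk, 3))
print("odds ratio    =", round(odds_ratio, 3))
print("factor        =", round(factor, 4))
```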
SAS program for the physician example.
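If the data is internal to the program:

data aspirin;
input Group $ MI $ count;
cards;
Placebo Yes 189
Placebo No 10845
Aspirin Yes 104
Aspirin No 10933
;
/* order=data is assumed here; the option value was lost in the source */
proc freq order=data;
tables Group*MI / chisq expected cellchi2 nocol nopct measures;
weight count;   /* each input line is a cell count, not one individual */
run;

If the data is external to the program:

data aspirin;
infile 'k:/STAT5602/aspirin.txt';
input Group $ MI $ count;
/* order=data is assumed here; the option value was lost in the source */
proc freq order=data;
tables Group*MI / chisq expected cellchi2 nocol nopct measures;
weight count;
run;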