 Presentation 3N1. Teaching Categorical Data Analysis

 Presenter Alan Agresti (USA) aa@stat.ufl.edu

 Presentation Abstract At the University of Florida, I have developed two courses in categorical data analysis. One is designed for master's students in statistics, and the other is designed for both undergraduate statistics majors and graduate students in other disciplines who have had some exposure to basic statistics, including regression. Regardless of the level, I unify the methods taught in the course by showing how they occur as special cases of generalized linear models for categorical responses. For instance, each inferential method results from a choice of distribution for the response (binomial, Poisson, ...), a link function for the mean of the response (logit, log, ...), and an inferential use of the likelihood function (Wald, score, likelihood-ratio). As much as possible, I use the same generalized linear modeling software throughout the course (e.g., PROC GENMOD in SAS or the glm function in S-PLUS). Over time I have placed more emphasis on logistic regression and less on loglinear models. This reflects the fact that most applications have a single response variable and possibly quantitative as well as qualitative predictors. Thus, the course is not simply one in "contingency table analysis." As part of the course, I always require students to obtain a data set (e.g., General Social Survey results from the WWW) and write a report showing a data analysis. Even when students do well on exams, it is humbling to see the rather naive errors they make in the modeling process. It seems worth putting less emphasis on exams and having students do at least two projects, even if the second only entails improving the analyses in an earlier project based on feedback from the instructor. Various challenges arise in teaching such a course.
For one, there is an increasing variety of methods for analyzing even the most basic categorical data (e.g., single proportions, 2-by-2 tables), and the simplest approaches to teach sometimes perform quite poorly in practice (e.g., the Wald confidence interval for a proportion or a difference of proportions). Second, it is difficult to provide general guidelines about when one can use large-sample inference, and yet teaching small-sample methods requires careful consideration of the complicating effects of possibly substantial discreteness. Third, in practice many problems have clustered data. Methods for clustered data, such as generalized estimating equations and random effects models, have been developed relatively recently and require enough sophistication that they are not easy to incorporate in a first course on this topic.
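
The remark about the Wald interval for a single proportion can be illustrated with a short calculation. The following is only a minimal sketch, not material from the course: it is written in Python rather than SAS or S-PLUS, the function names (`wald_interval`, `score_interval`, `exact_coverage`) are hypothetical, and the sample size and true proportions are arbitrary choices for illustration. It compares the Wald interval with the score (Wilson) interval by summing binomial probabilities over all outcomes whose interval contains the true proportion.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(y, n, level=0.95):
    """Wald interval: estimate plus or minus z times the estimated standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = y / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def score_interval(y, n, level=0.95):
    """Score (Wilson) interval: inverts the test that uses the null standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = y / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def exact_coverage(interval, n, p, level=0.95):
    """Actual coverage probability: total binomial probability of the
    outcomes y whose interval contains the true proportion p."""
    total = 0.0
    for y in range(n + 1):
        lo, hi = interval(y, n, level)
        if lo <= p <= hi:
            total += comb(n, y) * p**y * (1 - p) ** (n - y)
    return total


if __name__ == "__main__":
    n = 20  # illustrative sample size (an assumption, not from the abstract)
    for p in (0.05, 0.10, 0.50):
        wald = exact_coverage(wald_interval, n, p)
        score = exact_coverage(score_interval, n, p)
        print(f"p = {p:.2f}: Wald coverage {wald:.3f}, score coverage {score:.3f}")
```

With a small sample and a true proportion near 0, the Wald interval collapses to a single point whenever no successes are observed, so its actual coverage can fall well below the nominal 95%, whereas the score interval stays much closer to it.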