Problems with Pearson and deviance residuals

Cindy Feng; Longhai Li; Alireza Sadeghpour

This method is extracted from research article: BMC Med Res Methodol, Jan 2020

A comparison of residual diagnosis tools for diagnosing regression models for count data

DOI: 10.1186/s12874-020-01055-2

For a normal linear regression model, the Pearson and deviance residuals are identical and are approximately normally distributed under the true model. For count regression models, however, their distributions are often skewed and non-normal [8, 20]. It is argued that deviance residuals typically follow a normal distribution more closely than Pearson residuals do; nevertheless, as $\mu_i/\phi \to \infty$, both Pearson and deviance residuals from an exponential family model approach the normal distribution, because the distribution of the response variable itself converges to normality. This asymptotic normality therefore holds only when the mean of the response variable is relatively large. Further, residual plots often exhibit parallel curves, one for each distinct response value, which makes visual inspection very difficult. Hence, Pearson and deviance residuals are difficult to use for graphically assessing the goodness-of-fit (GOF) of count regression models.
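The parallel-curve pattern can be illustrated with a minimal sketch. The excerpt does not restate the residual definitions, so the code assumes the standard Poisson forms, $r_i^P = (y_i - \mu_i)/\sqrt{\mu_i}$ and $r_i^D = \mathrm{sign}(y_i - \mu_i)\sqrt{2[y_i \log(y_i/\mu_i) - (y_i - \mu_i)]}$; the function names and the grid of fitted means are illustrative and do not come from the article.

```python
import math


def pearson_residual(y, mu):
    """Pearson residual for a Poisson model (assumes Var(y) = mu)."""
    return (y - mu) / math.sqrt(mu)


def deviance_residual(y, mu):
    """Signed square root of the Poisson unit deviance."""
    # The y * log(y / mu) term is taken as 0 when y == 0.
    term = y * math.log(y / mu) if y > 0 else 0.0
    # Guard against tiny negative values caused by rounding.
    unit_dev = max(2.0 * (term - (y - mu)), 0.0)
    return math.copysign(math.sqrt(unit_dev), y - mu)


# Hypothetical grid of fitted means; small means are where the problem shows.
fitted_means = [0.5, 1.0, 1.5, 2.0]

# For each distinct response value, the residual is a smooth function of the
# fitted mean, so a residual-vs-fitted plot shows one curve per value of y.
for y in range(3):
    curve = [round(pearson_residual(y, mu), 3) for mu in fitted_means]
    print(f"y={y}: Pearson residuals along the grid: {curve}")
```

With small fitted means only a handful of response values occur, so almost all points fall on the first few of these curves, which is what makes the plot hard to read.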

Further, the overall GOF of a regression model is often assessed using the sums of squares of the Pearson and deviance residuals, i.e., $X^{2} = \sum_{i = 1}^{n} (r_{i}^{P})^{2}$ and $D^{2} = \sum_{i = 1}^{n} (r_{i}^{D})^{2}$, respectively. Asymptotically, under a correctly specified normal regression model, we can expect $X^{2}$ and $D^{2}$ to follow a chi-square distribution $\chi_{n - p}^{2}$, where $n$ is the sample size and $p$ is the number of parameters. In practice, samples are often not large enough for this asymptotic result to apply, so the chi-square reference distribution for these statistics cannot be relied upon. It is also recognized that this approximation can be very poor for count regression models even with large sample sizes [9, 21].
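As a rough sketch of these two statistics, the code below computes $X^2$, $D^2$ and the degrees of freedom $n - p$ for a Poisson model, again assuming the standard Poisson residual definitions (the excerpt does not restate them). The helper `expected_unit_deviance` is a hypothetical addition: it sums over the Poisson probability mass function to give the mean contribution of one observation to $D^2$ when the mean is known, which would be close to 1 if the chi-square approximation were accurate.

```python
import math


def unit_deviance(y, mu):
    """Poisson unit deviance; the y * log(y / mu) term is 0 when y == 0."""
    term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (term - (y - mu))


def gof_statistics(y, mu, n_params):
    """Return (X2, D2, df) for observed counts y and fitted means mu."""
    x2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    d2 = sum(unit_deviance(yi, mi) for yi, mi in zip(y, mu))
    return x2, d2, len(y) - n_params


def expected_unit_deviance(mu):
    """Mean of the unit deviance under Poisson(mu), by direct summation."""
    # Truncate the sum far into the upper tail, where the mass is negligible.
    upper = int(mu + 10 * math.sqrt(mu) + 20)
    total = 0.0
    for y in range(upper + 1):
        log_pmf = -mu + y * math.log(mu) - math.lgamma(y + 1)
        total += math.exp(log_pmf) * unit_deviance(y, mu)
    return total


# Hypothetical data: three counts with their fitted means, one parameter.
x2, d2, df = gof_statistics([0, 1, 3], [1.0, 1.0, 2.0], n_params=1)
print(f"X2={x2:.3f}, D2={d2:.3f}, df={df}")

# Per-observation mean of D2 for a small and a large Poisson mean.
for mu in (0.1, 50.0):
    print(f"mu={mu}: expected unit deviance = {expected_unit_deviance(mu):.3f}")
```

For a small mean the expected contribution per observation falls well below 1, and it does not improve as $n$ grows, which is consistent with the statement that the approximation can remain poor even for large samples.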

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
