Data analyzed in this study were extracted from the Information for Management, Planning, Analysis, and Coordination (IMPAC II) database, which is used by NIH staff to track and manage research grants and contracts. These data are publicly available through the NIH Commons, except for personal identifying information, including race, ethnicity, and sex of applicants, per NIH policy. The data used in this study include application texts, demographics of the principal investigator (PI), impact and percentile scores, whether an application was discussed by a study section, and the ultimate funding decision. We identified new investigators using the new investigator flag in IMPAC II and considered all other applicants to be established investigators. With the exception of the resubmission data in Fig. 2 and the multivariate regression analysis, which were limited to Type 1 (new) R01s, all analyses considered both Type 1 and Type 2 (renewal) R01s submitted between FY 2011 and FY 2015. For descriptive analyses, the race of the contact PI was used to group applications. For multivariate regression, multi-PI applications were excluded. To visually represent the differing proportions of applications submitted, discussed, and funded for AA/B and WH applicants, we produced a flow diagram that we termed a “rocket chart” because of its shape (see more details in the “Rocket charts” section below).
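
The inclusion rules described above can be illustrated with a minimal Python sketch. It is not the code used in this study: the `Application` record and all of its field names are hypothetical stand-ins rather than IMPAC II column names, and the sketch encodes only the criteria stated in the text (R01 activity, Type 1 or Type 2, FY 2011 to 2015, with the regression cohort further limited to single-PI Type 1 applications).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    # Hypothetical fields for illustration; not actual IMPAC II column names.
    fiscal_year: int
    activity_code: str      # e.g. "R01"
    application_type: int   # 1 = new, 2 = renewal
    is_multi_pi: bool
    new_investigator: bool  # stands in for the IMPAC II new investigator flag


def in_descriptive_cohort(app: Application) -> bool:
    """Type 1 or Type 2 R01s submitted between FY 2011 and FY 2015."""
    return (
        app.activity_code == "R01"
        and app.application_type in (1, 2)
        and 2011 <= app.fiscal_year <= 2015
    )


def in_regression_cohort(app: Application) -> bool:
    """Regression cohort: Type 1 R01s only, with multi-PI applications excluded."""
    return (
        in_descriptive_cohort(app)
        and app.application_type == 1
        and not app.is_multi_pi
    )


def investigator_status(app: Application) -> str:
    """Applicants without the new investigator flag are treated as established."""
    return "new" if app.new_investigator else "established"
```

Under these assumptions, a single-PI renewal from FY 2013 would enter the descriptive analyses but not the regression, whereas any application from FY 2010 would be excluded from both.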

To analyze the effect of the initial impact score on the funding gap, we extracted the overall impact score data from IMPAC II. Impact and percentile scores were assigned only to discussed applications and were available for 100% and 91%, respectively, of discussed R01 applications.
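
As a rough illustration of how such availability figures can be computed, the following Python sketch takes the share of discussed applications that carry each score. The dictionary keys (`discussed`, `impact_score`, `percentile`) and the function name are hypothetical and chosen only for this example; this is not the code used in the study.

```python
def score_coverage(applications):
    """Return the fraction of discussed applications that have each score.

    `applications` is a list of dicts with hypothetical keys:
    'discussed' (bool), 'impact_score' and 'percentile' (number or None).
    """
    discussed = [a for a in applications if a["discussed"]]
    if not discussed:
        # No discussed applications, so coverage is undefined.
        return {"impact": None, "percentile": None}
    n = len(discussed)
    return {
        "impact": sum(a["impact_score"] is not None for a in discussed) / n,
        "percentile": sum(a["percentile"] is not None for a in discussed) / n,
    }
```

Applications that were not discussed are dropped before the denominator is formed, which matches the statement that scores were assigned only to discussed applications.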

