Fine-tuning of Geneformer was accomplished by initializing the model with the pretrained Geneformer weights and adding a final task-specific transformer layer. The fine-tuning objective was either gene classification or cell classification, as indicated in Supplementary Table 2. The trainer from the Huggingface Transformers library159 was used for fine-tuning, with the substitution of a custom tokenizer as described above and a custom data collator for dynamically labeling gene or cell classes as indicated in Supplementary Table 2. To demonstrate the efficacy of the pretrained Geneformer in boosting the predictive potential of downstream fine-tuning applications, we intentionally used the same fine-tuning hyperparameters for all applications. Of note, hyperparameter tuning generally substantially enhances learning in deep learning applications, so the maximum predictive potential of Geneformer in these downstream applications is likely underestimated. Hyperparameters used for fine-tuning were as follows: max learning rate: 5e-5; learning rate scheduler: linear with warmup; optimizer: Adam with weight decay fix160; warmup steps: 500; weight decay: 0.001; batch size: 12. All fine-tuning in Supplementary Table 2 was performed with a single training epoch to avoid overfitting.
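As a hedged illustration of this configuration, the sketch below sets up the Hugging Face Trainer with the hyperparameters listed above. The checkpoint path, dataset, and collator names are placeholders rather than the authors' exact code, and the sequence-classification head shown stands in for the task-specific output layer; the Trainer's default optimizer is AdamW, i.e., Adam with weight decay fix.

```python
# Minimal sketch of the fine-tuning setup using the Hugging Face Trainer.
# The checkpoint path, dataset, and collator below are placeholders
# (assumptions), not the authors' exact objects; hyperparameters match the text.
from transformers import BertForSequenceClassification, Trainer, TrainingArguments

# Load the pretrained Geneformer weights with a task-specific classification head
# (a binary classification head is shown here purely as an example).
model = BertForSequenceClassification.from_pretrained(
    "path/to/pretrained_geneformer",  # placeholder checkpoint path
    num_labels=2,
)

training_args = TrainingArguments(
    output_dir="geneformer_finetuned",
    learning_rate=5e-5,               # max learning rate
    lr_scheduler_type="linear",       # linear schedule with warmup
    warmup_steps=500,
    weight_decay=0.001,
    per_device_train_batch_size=12,
    num_train_epochs=1,               # single epoch to avoid overfitting
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train_dataset,  # placeholder: rank-value-encoded cells
    data_collator=label_collator,           # placeholder: custom collator adding labels
)
trainer.train()
```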
The number of layers frozen during fine-tuning is indicated in Supplementary Table 2. Generally, in our experience, applications that are more relevant to the pretraining objective benefit from freezing more layers to prevent overfitting to the limited task-specific data, whereas applications that are more distant from the pretraining objective benefit from fine-tuning more layers to optimize performance on the new task. Fine-tuning results for gene classification applications were reported as AUC +/− standard deviation and F1 score, calculated with a 5-fold cross-validation strategy in which training was performed on 80% of the gene training labels and performance was tested on the held-out 20%, repeated across the 5 folds. Of note, because the fine-tuning applications are trained on classification objectives that are completely separate from the masked learning objective, whether or not task-specific data was included in the pretraining corpus is not relevant to the classification predictions, as demonstrated in Extended Data Fig. 1f.
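For illustration, a minimal sketch of freezing the embedding layer and the first n encoder layers before fine-tuning is shown below. It assumes a BERT-style module layout (model.bert.embeddings, model.bert.encoder.layer); the checkpoint path and the example value of n_frozen_layers are placeholders, as the actual number of frozen layers is application-specific (Supplementary Table 2).

```python
# Sketch of excluding the embeddings and the first n encoder layers from
# gradient updates, assuming a BERT-style module layout. The checkpoint path
# and the example n_frozen_layers value are placeholders only.
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "path/to/pretrained_geneformer", num_labels=2  # placeholder checkpoint path
)

def freeze_layers(model, n_frozen_layers: int):
    """Freeze the embedding layer and the first n encoder layers."""
    for param in model.bert.embeddings.parameters():
        param.requires_grad = False
    for layer in model.bert.encoder.layer[:n_frozen_layers]:
        for param in layer.parameters():
            param.requires_grad = False
    return model

model = freeze_layers(model, n_frozen_layers=2)  # example value only
```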
We then fully fine-tuned the dosage sensitivity and bivalency classification models using all gene training labels to test their ability to generalize to out-of-sample data. We tested whether, without any further training, the model fine-tuned to distinguish dosage-sensitive versus dosage-insensitive genes could predict the dosage sensitivity of a recently reported set of disease genes from Collins et al., which analyzed CNVs from 753,994 individuals to define genes whose deletion was associated primarily with neurodevelopmental disease with either high (>0.85 score) or moderate (0.15–0.85 score) confidence22. Predicted dosage sensitivity of these gene sets was tested in the context of 10,000 randomly sampled cells from Genecorpus-30M, neurons at any adult or developmental timepoint (defined as TUBB3-marked cells from Genecorpus-30M), or fetal cerebral cells from the Fetal Cell Atlas23. We also tested whether, without any further training, the model fine-tuned on the 56 highly conserved loci to distinguish bivalently marked versus Lys4-only-marked genes would generalize to the genome-wide setting30.
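As a rough sketch of how such out-of-sample performance can be summarized, the snippet below computes an AUC from per-gene predicted probabilities, assuming these have already been aggregated across the evaluated cells (e.g., the 10,000 randomly sampled Genecorpus-30M cells). All names and values are illustrative placeholders and do not correspond to the authors' code or data.

```python
# Sketch of scoring out-of-sample generalization as an AUC. `collins_labels`
# stands in for the binary Collins et al. labels (1 = high confidence,
# 0 = moderate confidence) and `aggregated_probs` for the classifier's
# per-gene probabilities aggregated across cells; both are placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

collins_labels = {"GENE_A": 1, "GENE_B": 1, "GENE_C": 0, "GENE_D": 0}                 # placeholder labels
aggregated_probs = {"GENE_A": 0.91, "GENE_B": 0.62, "GENE_C": 0.40, "GENE_D": 0.18}   # placeholder predictions

genes = list(collins_labels)
labels = np.array([collins_labels[g] for g in genes])
probs = np.array([aggregated_probs[g] for g in genes])

print(f"Out-of-sample AUC: {roc_auc_score(labels, probs):.3f}")
```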