Bioinformatics analysis

Nicola S. Meagher
Kylie L. Gorringe
Matthew Wakefield
Adelyn Bolithon
Chi Nam Ignatius Pang
Derek S. Chiu
Michael S. Anglesio
Kylie-Ann Mallitt
Jennifer A. Doherty
Holly R. Harris
Joellen M. Schildkraut
Andrew Berchuck
Kara L. Cushing-Haugen
Ksenia Chezar
Angela Chou
Adeline Tan
Jennifer Alsop
Ellen Barlow
Matthias W. Beckmann
Jessica Boros
David D.L. Bowtell
Alison H. Brand
James D. Brenton
Ian Campbell
Dane Cheasley
Joshua Cohen
Cezary Cybulski
Esther Elishaev
Ramona Erber
Rhonda Farrell
Anna Fischer
Zhuxuan Fu
Blake Gilks
Anthony J. Gill
Charlie Gourley
Marcel Grube
Paul R. Harnett
Arndt Hartmann
Anusha Hettiaratchi
Claus K. Høgdall
Tomasz Huzarski
Anna Jakubowska
Mercedes Jimenez-Linan
Catherine J. Kennedy
Byoung-Gie Kim
Jae-Weon Kim
Jae-Hoon Kim
Kayla Klett
Jennifer M. Koziak
Tiffany Lai

We grouped samples by unsupervised hierarchical clustering of their gene-expression profiles, using the Euclidean distance between samples and the “complete” agglomeration method. The heat maps were drawn using the iheatmapr package (v0.5.1) in R (31). Diagnosis groups in the clustering were MBOT, low-stage (I/II) MOC, advanced-stage (III/IV) MOC, pancreas, gastric, and lower GI (colorectal and appendiceal combined).

We used random forest analysis with stratified bootstrapping (32) to assess how well the gene-expression profiles predict the disease class (diagnosis group) of each sample. The cohort was divided into independent training and testing sets by stratified random subsampling, which maintained a balanced proportion of samples from each disease class. A random forest classifier (the randomForest package in R, version 4.6-14) was trained on the training set with default parameters and then benchmarked against the test set to obtain an error rate (Supplementary Methods). We repeated this procedure 100 times to obtain a distribution of error rates, the mean overall error rate, and the mean and standard deviation of each element of the confusion matrix, which tabulates the number of samples for each combination of actual and predicted class.
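The repeated train/test evaluation can be illustrated with a minimal sketch. It is written in Python with the standard library only, not in the R code the authors used, and several parts are assumptions made for illustration: the function names are hypothetical, the 0.7 training fraction is a placeholder because the split ratio is not stated here, and a simple nearest-centroid classifier stands in for the random forest so that the example stays self-contained. Only the overall structure (stratified split, fit, score on the held-out set, repeat, summarise) mirrors the description above.

```python
import math
import random
import statistics
from collections import defaultdict


def stratified_split(labels, train_fraction, rng):
    """Split sample indices so each class contributes the same fraction to training."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        # Keep at least one sample on each side of the split where possible.
        n_train = max(1, round(train_fraction * len(members)))
        n_train = min(n_train, max(1, len(members) - 1))
        train_idx.extend(members[:n_train])
        test_idx.extend(members[n_train:])
    return train_idx, test_idx


def fit_centroids(X, y):
    """Stand-in classifier (NOT a random forest): mean expression vector per class."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_centroid(centroids, row):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(sorted(centroids), key=lambda c: math.dist(centroids[c], row))


def repeated_evaluation(X, y, n_repeats=100, train_fraction=0.7, seed=0):
    """Repeat split/fit/score; return error rates, their mean, and per-cell mean/SD."""
    rng = random.Random(seed)
    classes = sorted(set(y))
    error_rates = []
    cells = defaultdict(list)  # (actual, predicted) -> one count per repeat
    for _ in range(n_repeats):
        train_idx, test_idx = stratified_split(y, train_fraction, rng)
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        counts = {(a, p): 0 for a in classes for p in classes}
        for i in test_idx:
            counts[(y[i], predict_centroid(model, X[i]))] += 1
        errors = sum(n for (a, p), n in counts.items() if a != p)
        error_rates.append(errors / len(test_idx))
        for cell, n in counts.items():
            cells[cell].append(n)
    # Mean and sample standard deviation of every confusion-matrix element.
    summary = {cell: (statistics.mean(v), statistics.stdev(v)) for cell, v in cells.items()}
    return error_rates, statistics.mean(error_rates), summary
```

Calling `repeated_evaluation(X, y)` with `X` as a list of per-sample expression vectors and `y` as the matching diagnosis labels returns the list of per-repeat error rates, their mean, and a dictionary keyed by `(actual, predicted)` class pairs. Swapping the stand-in classifier for a real random forest would only change the `fit_centroids` and `predict_centroid` steps.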
