Before settling on the optimization problem (Eqs. 1 and 2) in step I, we tried several other ways to formulate our clustering problem as an optimization problem. For each of them, we derived a solution, implemented it in code, and ran it on simulated data. All of them performed much worse than the optimization problem we finally adopted. Here we briefly describe these alternative forms of the optimization problem and explain why they do not perform as well. The algorithms that solve each of these problems are given in Additional file 1. We think this content may interest researchers who would like to improve on scSorter, and it may also help explain why scSorter works well.

The first alternative is: find $C=\{C_k\}_{k=1,\dots,K}$ and $\mu=\{\mu_{ik}\}_{i=1,\dots,g+h;\,k=1,\dots,K}$ that

The second alternative is: find $C=\{C_k\}_{k=1,\dots,K}$ and $\mu=\{\mu_{ik}\}_{i=1,\dots,g+h;\,k=1,\dots,K}$ that

The third alternative is: find $C=\{C_k\}_{k=1,\dots,K}$, $\mu=\{\mu_i\}_{i=1,\dots,g}\cup\{\mu_{ik}\}_{i=g+1,\dots,g+h;\,k=1,\dots,K}$, and $\delta=\{\delta_{ik}\}_{i=1,\dots,g;\,k=1,\dots,K}$ that

In the first alternative formulation, note that when $\gamma_{ik}=0$ (gene $i$ is not a marker gene of cell type $k$), the constraints (Eq. 5) are automatically satisfied. Thus, the constraints only take effect on marker genes, which have to satisfy $\mu_{ik} \ge \frac{1}{N}\sum_{j=1}^{N} x_{ij}$, i.e., the representative expression of a marker gene in its cell type should be no less than the gene's average expression over all cells.
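The effect of this lower bound can be illustrated with a small sketch. It assumes a k-means-style objective in which each $\mu_{ik}$ minimizes the within-cluster squared error; that objective is an assumption of this example, not a restatement of the paper's equations. Under that assumption, the constrained minimizer for a marker gene is simply the cluster mean raised to the overall mean when it falls below it. The function name `constrained_center` and the toy values are hypothetical.

```python
def constrained_center(cluster_values, overall_mean, is_marker):
    """Least-squares center for one gene in one cluster (assumed objective).

    For a marker gene, the center is bounded below by the overall mean,
    which mirrors the constraint mu_ik >= (1/N) * sum_j x_ij.
    """
    cluster_mean = sum(cluster_values) / len(cluster_values)
    if is_marker:
        # Minimizing a 1-D quadratic subject to a lower bound:
        # take the unconstrained minimizer, then clip it to the bound.
        return max(cluster_mean, overall_mean)
    # gamma_ik = 0: the constraint is inactive, so the plain mean is optimal.
    return cluster_mean


# Toy expression of one gene across N = 6 cells (hypothetical values).
expression = [0.0, 1.0, 2.0, 5.0, 6.0, 7.0]
overall_mean = sum(expression) / len(expression)

low_cluster = expression[:3]   # cells where the gene is weakly expressed
high_cluster = expression[3:]  # cells where the gene is strongly expressed

mu_low_marker = constrained_center(low_cluster, overall_mean, is_marker=True)
mu_low_nonmarker = constrained_center(low_cluster, overall_mean, is_marker=False)
mu_high_marker = constrained_center(high_cluster, overall_mean, is_marker=True)
```

In this toy case the bound only changes the result for the low-expression cluster treated as the marker's own type; the high-expression cluster is unaffected, which is why the constraint alone does little to keep out cells from other types that also express the marker.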

In the first two alternative formulations, a marker gene (i.e., $i=1,\dots,g$) has a different expression $\mu_{ik}$ in every cell type $k$. The constraints of the first formulation (Eq. 5) are relatively weak: although a marker gene's expression in cells of its corresponding cell type cannot be lower than its mean expression in all cells, there is no constraint on its expression in cells of other cell types. As a result, the threshold for assigning cells to a given cell type is not high enough, and many cells from other cell types in which the marker genes of this cell type are also relatively highly expressed may be incorrectly assigned to it.

The constraints in the second formulation (Eq. 7) are much stronger. However, it is hard for every marker gene to satisfy them, because its expression in cells of its corresponding cell type, although high, may not be higher than that in all the other clusters. As a result, cells that truly belong to a cell type but in which the marker genes are not highly expressed may be falsely excluded from it.

By assigning an elevated expression level $\mu_i+\delta_{ik}$ to cells in cell type $k$ and a common base expression level $\mu_i$ to all other cells, the third alternative formulation has constraints that are stronger than those of the first alternative but weaker than those of the second. However, it still does not allow each marker gene to choose freely between an elevated level and a base level, and thus does not fit the data as well as scSorter.
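The difference in strength between the first two sets of constraints can be made concrete with a short check. This is only a sketch: the first check follows the bound stated for Eq. 5, while the second is an assumed reading of Eq. 7 taken from the prose (the marker's expression in its own cell type must be at least as high as in every other cluster). The function names and the toy values are hypothetical.

```python
def satisfies_first(mu, marker_type, overall_mean):
    """Weak constraint: expression in the marker's own type >= overall mean.

    mu[k] is the representative expression of one marker gene in cluster k.
    """
    return mu[marker_type] >= overall_mean


def satisfies_second(mu, marker_type):
    """Strong constraint (assumed form): expression in the marker's own type
    is at least as high as in every other cluster."""
    own = mu[marker_type]
    return all(own >= mu[k] for k in range(len(mu)) if k != marker_type)


# Toy values: a marker gene of cell type 0 that is also highly
# expressed in cluster 1 (hypothetical numbers).
mu = [4.0, 5.0, 1.0]
overall_mean = 3.0

first_ok = satisfies_first(mu, marker_type=0, overall_mean=overall_mean)
second_ok = satisfies_second(mu, marker_type=0)
```

Here the weak constraint holds while the strong one fails, matching the two failure modes described: the first formulation says nothing about cluster 1, whereas the second rejects a configuration that a real marker gene could plausibly show.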
