Computational determination of neighborhoods enriched with deletions

Veronica V. Rezelj, Lucía Carrau, Fernando Merwaiss, Laura I. Levi, Diana Erazo, Quang Dinh Tran, Annabelle Henrion-Lacritick, Valérie Gausson, Yasutsugu Suzuki, Djoshkun Shengjuler, Bjoern Meyer, Thomas Vallet, James Weger-Lucarelli, Veronika Bernhauerová, Avi Titievsky, Vadim Sharov, Stefano Pietropaoli, Marco A. Diaz-Salinas, Vincent Legros, Nathalie Pardigon, Giovanna Barba-Spaeth, Leonid Brodsky, Maria-Carla Saleh, Marco Vignuzzi

This protocol is extracted from research article:

Defective viral genomes as therapeutic interfering particles against flavivirus infection in mammalian and mosquito hosts

**Nat Commun**, Apr 16, 2021; DOI: 10.1038/s41467-021-22341-7

Procedure

“Junction” reads (reads that align to the reference virus genome but not as a continuous alignment) were grouped into clusters of fragments with similar start and end deletion positions. Specifically, deleted fragments whose start and end positions diverged by fewer than 10 positions were grouped into a cluster (“elementary deletion cluster”). The size of the elementary deletion cluster is the “junction coverage” of the deletions’ narrow start and end position interval by reads with the corresponding junctions. The frequency of the elementary deletion cluster is the ratio of “junction coverage” to the sum of “junction coverage” and “continuous coverage” (the latter derived from reads with continuous alignment that cover these deletions’ start and end intervals).
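As an illustration, the grouping and the frequency calculation could be sketched in Python as follows. This is a minimal sketch and not the authors' implementation: the greedy, seed-based grouping, the function names `cluster_junctions` and `cluster_frequency`, and the example coordinates are all assumptions made here. The source specifies only the 10-position criterion and the frequency ratio.

```python
from typing import List, Tuple

Junction = Tuple[int, int]  # (deletion start, deletion end) of one junction read


def cluster_junctions(junctions: List[Junction], max_div: int = 10) -> List[List[Junction]]:
    """Greedy grouping (an assumption): a junction joins the first cluster whose
    seed (first member) differs by fewer than max_div positions in start and end."""
    clusters: List[List[Junction]] = []
    for start, end in sorted(junctions):
        for cluster in clusters:
            seed_start, seed_end = cluster[0]
            if abs(start - seed_start) < max_div and abs(end - seed_end) < max_div:
                cluster.append((start, end))
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append([(start, end)])
    return clusters


def cluster_frequency(junction_coverage: int, continuous_coverage: int) -> float:
    """Junction coverage divided by (junction coverage + continuous coverage)."""
    total = junction_coverage + continuous_coverage
    return junction_coverage / total if total else 0.0


# Hypothetical junction reads, for illustration only.
reads = [(100, 900), (103, 905), (98, 899), (400, 2000), (405, 2003), (5000, 7000)]
clusters = cluster_junctions(reads)
sizes = [len(c) for c in clusters]  # junction coverage of each elementary cluster
```

Here the cluster size plays the role of the junction coverage; the continuous coverage would come from the alignment and is passed to `cluster_frequency` separately.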

The most abundant deletions (with frequencies higher than 0.0085%) from all high MOI replicates were selected to determine regions in the viral genome in which deletions were more predominant. Since deletions above this threshold are not uniformly distributed, enriched areas can be found unequivocally by applying the nested neighborhood algorithm in a two-dimensional plane with the start and end positions of deletions as X and Y coordinates. This method detects the area (neighborhood) enriched by points around a certain center by sequentially extending the neighborhood’s border (the distance from the center to the next closest point) and calculating the neighborhood’s fractal dimension to obtain more accurate *p* values of enrichment at each step. For the calculation of the fractal dimension, refer to the “Fractal dimensions of neighborhoods” section.
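The selection step can be illustrated with a short Python sketch. It assumes that cluster frequencies are stored as fractions rather than percentages, so that 0.0085% becomes 8.5e-5; the dictionary layout, the name `select_points`, and the example values are hypothetical.

```python
# 0.0085% expressed as a fraction (assumes frequencies are fractions, not percentages).
MIN_FREQUENCY = 0.0085 / 100


def select_points(clusters, min_frequency=MIN_FREQUENCY):
    """Return (start, end) points for clusters whose frequency exceeds the threshold."""
    return [(c["start"], c["end"]) for c in clusters if c["frequency"] > min_frequency]


# Hypothetical elementary deletion clusters, for illustration only.
clusters = [
    {"start": 100, "end": 900, "frequency": 3.0e-3},
    {"start": 400, "end": 2000, "frequency": 5.0e-5},  # below the threshold, dropped
    {"start": 5000, "end": 7000, "frequency": 9.0e-5},
]
points = select_points(clusters)  # X = start, Y = end on the start/end plane
```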

Putative centers of the neighborhoods enriched by deletions were detected on the plane using a grid method. Namely, a number of deletions (“points”) were randomly selected as grid references, and the Euclidean distances from every point to these reference points were calculated. For every point, the product of the rounded logarithms of its distances to the reference points was used as a hashing index. The hashing indexes of all points were sorted, and sufficiently large islands of points sharing the same hash index (threshold ≥ 7 elementary deletion points) were considered to contain putative centers of significant enrichment. Any point of an island can be used as a center for the subsequent determination of the center’s neighborhood most significantly enriched by points/deletions.

Let the null hypothesis be that all points/deletions are uniformly distributed on the start/end plane (i.e., no enrichment). Then the probability for a given number of points to be in a neighborhood of radius *r* from the center with volume (or area, if the space is a two-dimensional plane) *V*_{r} can be calculated from the Poisson distribution. Indeed, if *n* points are uniformly distributed in the neighborhood of radius *R* (*R* > *r*) and volume *V*_{R}, then the number of points in a neighborhood with the same center and radius *r* (therefore, with volume *V*_{r}) is a random variable having a Poisson distribution with the parameter $\lambda =\alpha \cdot {r}^{k}$, where *k* is the dimension of the space and *α* is the density of the uniform distribution of the *n* points in *V*_{R} $\left(\alpha =\frac{n}{{R}^{k}}\right)$. Thus, the expected number of points $\lambda$ is proportional to *V*_{r}. The probability (*P*) of finding *m* or more points in a neighborhood of radius *r* (the *p* value) is the upper tail of this Poisson distribution:

$P\left(X\ge m\right)=1-\sum_{i=0}^{m-1}\frac{{\lambda }^{i}{e}^{-\lambda }}{i!}$
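The grid-hashing step used to locate putative centers could be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' code: the natural logarithm is assumed (the source does not name the base), a reference point's zero distance to itself is skipped, the reference points are fixed for reproducibility instead of being drawn at random, and the names `hash_index` and `find_islands` and all coordinates are hypothetical.

```python
import math
from collections import defaultdict


def hash_index(point, refs):
    """Product of the rounded natural logs of the distances from point to each reference."""
    product = 1
    for ref in refs:
        d = math.dist(point, ref)
        if d > 0:  # assumption: skip the zero distance of a reference point to itself
            product *= round(math.log(d))
    return product


def find_islands(points, refs, min_size=7):
    """Group points by hash index; keep groups with at least min_size points."""
    groups = defaultdict(list)
    for p in points:
        groups[hash_index(p, refs)].append(p)
    # Sorting by hash index mirrors the sorting step described in the text.
    return [group for _, group in sorted(groups.items()) if len(group) >= min_size]


# Hypothetical data: one tight group of deletions plus scattered background points.
cluster = [(2000 + i, 3000 + i) for i in range(8)]
background = [(200, 300), (8000, 9000), (500, 10000), (7000, 8000)]
points = cluster + background

# Fixed here for reproducibility; the source selects references randomly,
# which could be done with random.sample(points, 2).
refs = [(200, 300), (8000, 9000)]
islands = find_islands(points, refs)
```

Because the rounded logarithm is coarse, points that are far apart can share a hash index, so an island is best read as a set of candidate centers that the later neighborhood test confirms or rejects.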

All deletions/points were sorted according to closeness to the selected central point. The regions most enriched by points were then determined as follows. In the sorting, consider a transition from a neighborhood with radius *r*_{t}, equal to the distance from the center to point *t* in the sorting, to a neighborhood with radius *r*_{t+1}. The Poisson *p* value of the enrichment of the neighborhood of radius *r*_{t} containing *t* points is calculated from the perspective of the extended neighborhood with radius *r*_{t+1}, under the assumption that its *t* + 1 points are uniformly distributed in this volume of a space whose fractal dimension is defined by the sequence of distances from the *t* + 1 points to the center. For non-uniform dense areas, the Hausdorff fractal dimension is higher than the geometrical dimension of the plane, which equals two. This higher dimension makes the drop in *p* value at the transition from *t* to *t* + 1 sharper than in two-dimensional space. The *t*-neighborhood with the most significant Poisson *p* value was selected as the best neighborhood, i.e., the one that is most enriched by deletions/points.
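The nested scan can be sketched in Python as below. Several points are assumptions made for illustration: the fractal dimension estimate belongs to a separate section and is not reproduced, so a hypothetical hook `dimension_fn` returns the geometric dimension 2 by default; the center itself is excluded from the sorted neighbors; and the Poisson tail is summed directly over the upper tail so that very small *p* values are not lost to rounding. The names `poisson_tail` and `best_neighborhood` and the coordinates are hypothetical.

```python
import math


def poisson_tail(m, lam):
    """P(X >= m) for X ~ Poisson(lam), summed over the upper tail.
    Intended for lam not much larger than m, as is the case in the scan below."""
    if m <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    term = math.exp(-lam + m * math.log(lam) - math.lgamma(m + 1))  # P(X = m)
    total, i = 0.0, m
    while True:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
        if i > lam and term <= total * 1e-16:
            return min(total, 1.0)


def best_neighborhood(center, points, dimension_fn=lambda dists: 2.0):
    """Return (t, p) for the t-neighborhood with the smallest Poisson p value."""
    # Assumption: the center is not counted among its own neighbors.
    dists = sorted(math.dist(center, p) for p in points if p != center)
    best_t, best_p = None, 1.0
    for t in range(1, len(dists)):
        r_t, r_next = dists[t - 1], dists[t]
        k = dimension_fn(dists[: t + 1])  # placeholder for the fractal dimension
        lam = (t + 1) * (r_t / r_next) ** k  # alpha * r_t^k with alpha = (t + 1) / r_next^k
        p = poisson_tail(t, lam)
        if best_t is None or p < best_p:
            best_t, best_p = t, p
    return best_t, best_p


# Hypothetical data: seven neighbors close to the center, four far away.
center = (2000, 3000)
points = [(2000 + i, 3000 + i) for i in range(8)] + [
    (200, 300), (8000, 9000), (500, 10000), (7000, 8000)
]
best_t, best_p = best_neighborhood(center, points)
```

In this toy example the smallest *p* value occurs at the step where the radius jumps from the tight group to the first distant point, which is the behavior the text describes. Replacing `dimension_fn` with an actual fractal dimension estimate would change the exponent *k* at each step.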

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
