Gene Expression Data.

Ben D. Fulcher
Alex Fornito

Gene expression data measured using in situ hybridization (ISH) from the adult C57BL/6J male mouse at age P56 were obtained from the Allen Mouse Brain Atlas (26). Allen Mouse Brain Atlas gene expression data were retrieved for the same set of 213 anatomical brain regions as reported for the mesoscale mouse connectome (24) by querying the Allen API (api.brain-map.org/api/v2/data). All 22,157 section datasets were retrieved (in JSON format) using the following API query: api.brain-map.org/api/v2/data/query.json?criteria=model::SectionDataSet,rma::criteria,[failed$eq'false'][expression$eq'true'],products[id$eq1]. For each section dataset retrieved, gene metadata were obtained using a query of the following form: api.brain-map.org/api/v2/data/query.json?criteria=model::Gene,rma::criteria,data_sets[id$eqXXX] for each dataset identification (XXX). To get the identifications of structures (brain regions) used in the connectivity analysis, we first downloaded all structures in the Allen Mouse Brain Atlas using the following query: api.brain-map.org/api/v2/data/query.json?criteria=model::Structure,rma::criteria,[graph_id$eq1] and then matched them to the 213 structures used in the connectivity analysis [matching on region acronyms provided in the work by Oh et al. (24)]. We then iterated over these 213 structures and all of the section datasets retrieved above to retrieve measures of gene expression energy and density for each brain region using queries of the following form, api.brain-map.org/api/v2/data/query.json?criteria=model::StructureUnionize,rma::criteria,section_data_set[id$eqXXX],structure[id$eqYYY], for each section dataset identification (XXX) and each structure identification (YYY). In this way, we obtained measures of gene expression density and energy (defined below) for each of 213 brain regions and 22,157 section datasets.
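The four query forms described above can be assembled programmatically. The following is a minimal sketch, not the authors' retrieval code: the function names are hypothetical, the "http://" scheme is an assumption (the text gives only the host and path), and actually fetching the JSON, handling paging, and matching acronyms are left out.

```python
# Assumption: the "http://" scheme is added here; the text gives host and path only.
API_QUERY = "http://api.brain-map.org/api/v2/data/query.json"


def rma_url(model, *criteria):
    """Return a query URL for `model`, joining the criteria strings with commas."""
    return f"{API_QUERY}?criteria=model::{model},rma::criteria," + ",".join(criteria)


def section_datasets_url():
    """Query for all non-failed expression section datasets (product 1)."""
    return rma_url("SectionDataSet",
                   "[failed$eq'false'][expression$eq'true']",
                   "products[id$eq1]")


def genes_url(dataset_id):
    """Query for the gene metadata of one section dataset (XXX in the text)."""
    return rma_url("Gene", f"data_sets[id$eq{dataset_id}]")


def structures_url():
    """Query for all structures in the atlas structure graph (graph_id 1)."""
    return rma_url("Structure", "[graph_id$eq1]")


def unionize_url(dataset_id, structure_id):
    """Query for expression energy/density of one dataset (XXX) in one structure (YYY)."""
    return rma_url("StructureUnionize",
                   f"section_data_set[id$eq{dataset_id}]",
                   f"structure[id$eq{structure_id}]")


def all_unionize_urls(dataset_ids, structure_ids):
    """Yield one unionize query per (section dataset, structure) pair."""
    for dataset_id in dataset_ids:
        for structure_id in structure_ids:
            yield unionize_url(dataset_id, structure_id)


# Placeholder identifiers, for illustration only.
for url in all_unionize_urls([1, 2], [10]):
    print(url)
```

Iterating over every pair mirrors the description in the text (213 structures by 22,157 section datasets), which amounts to several million requests if issued one at a time.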

We analyzed the full set of 22,157 experimental section datasets spanning 17,642 unique genes. ISH data were obtained from either sagittal or coronal sections (intersection spacing of 200 μm), which are registered to the Allen Mouse Brain Atlas using an algorithm that results in ISH data in the atlas space at 100-μm³ resolution [see supplemental methods 2 in Lein et al. (26)]. Each 100-μm³ “quadrat” is labeled with the anatomical structures that it intersects, allowing quantification of expression statistics for a given brain region. Gene expression for a brain region was quantified in two ways: (i) expression density, which refers to the proportion of expressed voxels in an anatomical division, and (ii) expression energy, which measures the mean pixel intensity in a region (26, 49). We followed previous studies and used expression energy (20), but note that energy and density measurements are similar and that the main qualitative results of this paper are also reproduced using expression density. Genes measured in multiple experiments were represented by their average expression level in each region over those experiments, as per previous work (17). Because of potential differences in data quality between expression measurements derived from coronal and sagittal sections, we checked that the qualitative results of this paper were not sensitive to our use of both coronal and sagittal section data. Indeed, the main results were reproduced when computing coexpression values using only data from the 3,191 genes measured from coronal sections, including the exponential distance dependence of gene coexpression; the trends in coexpression across reciprocally connected, unidirectionally connected, and unconnected pairs of brain regions; and the trend across topological connection type, such that coexpression is highest for rich links followed by feeder links and then peripheral links within the topological rich club regime.
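The averaging of genes measured in multiple experiments can be sketched as follows. This is an illustration only: the function name and input layout are hypothetical, and skipping missing values (NaN) when averaging is an assumption, since the text does not say how missing entries were handled at this step.

```python
import math
from collections import defaultdict


def average_by_gene(dataset_gene, dataset_energy):
    """Average expression energy per region over all section datasets of a gene.

    dataset_gene:   {dataset_id: gene_symbol}
    dataset_energy: {dataset_id: [energy for each region]}, NaN where missing
    Returns {gene_symbol: [mean energy for each region]}.
    """
    grouped = defaultdict(list)
    for dataset_id, gene in dataset_gene.items():
        grouped[gene].append(dataset_energy[dataset_id])

    averaged = {}
    for gene, rows in grouped.items():
        means = []
        for region in range(len(rows[0])):
            # Assumption: missing values are ignored rather than propagated.
            present = [row[region] for row in rows if not math.isnan(row[region])]
            means.append(sum(present) / len(present) if present else math.nan)
        averaged[gene] = means
    return averaged


# Toy example: gene "A" has two experiments, gene "B" has one.
genes = {1: "A", 2: "A", 3: "B"}
energy = {1: [1.0, math.nan], 2: [3.0, 4.0], 3: [math.nan, 5.0]}
print(average_by_gene(genes, energy))
```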

The magnitudes of ISH-measured expression levels are not directly comparable across genes; rather, they reflect the relative amount of signal, owing to limitations of high-throughput, nonradioactive ISH (namely tyramide amplification for detecting low transcript concentrations, variations in probe permeability into the cell, variability in cell volume, and probe accessibility to mRNA) (43). To facilitate a meaningful comparison of ISH measurements across different genes, we required a transformation that puts all genes on a comparable scale and that also accounts for the presence of outliers in the data (which often represent artifacts). Accordingly, we normalized the expression levels across the brain for each gene using a sigmoidal transformation:

S(x) = 1 / (1 + exp(−(x − 〈x〉) / σx)),

where S(x) is the normalized expression value of a given gene, x is the raw expression value of that gene, and 〈x〉 and σx are the mean and SD of the expression values for that gene across the brain, respectively. After normalization, each gene was linearly rescaled to the unit interval, yielding a normalized set of expression values for each gene (shown for all 17,642 genes across all 213 brain regions in Fig. 1C). Normalized gene expression levels can be interpreted as the relative expression of that gene across the brain: from low values for that gene (blue in Fig. 1C) to high values for that gene (red in Fig. 1C).
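The per-gene normalization can be sketched as below, assuming the standard logistic form S(x) = 1 / (1 + exp(−(x − 〈x〉) / σx)) followed by a linear rescaling to the unit interval. The function name is hypothetical; using the sample SD (n − 1 denominator) and leaving missing values as NaN are assumptions, and the sketch expects at least two distinct non-missing values per gene.

```python
import math


def sigmoid_normalize(values):
    """Sigmoid-normalize one gene's expression across regions, then rescale to [0, 1].

    values: raw expression of a single gene in each region, NaN where missing.
    """
    present = [v for v in values if not math.isnan(v)]
    mean = sum(present) / len(present)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    sd = math.sqrt(sum((v - mean) ** 2 for v in present) / (len(present) - 1))

    # S(x) = 1 / (1 + exp(-(x - mean) / sd)); missing values stay missing.
    s = [math.nan if math.isnan(v) else 1.0 / (1.0 + math.exp(-(v - mean) / sd))
         for v in values]

    # Linear rescaling so the gene's lowest value maps to 0 and its highest to 1.
    finite = [x for x in s if not math.isnan(x)]
    lo, hi = min(finite), max(finite)
    return [x if math.isnan(x) else (x - lo) / (hi - lo) for x in s]


# Toy example with one outlying value and one missing value.
print(sigmoid_normalize([1.0, 2.0, 3.0, 100.0, math.nan]))
```

Because the sigmoid is monotonic, the ordering of regions for a gene is preserved; only the spacing changes, with the saturating tails compressing values far from the gene's mean relative to an unbounded z-score.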

Unnormalized expression values used in other work (17, 18, 26, 49) or monotonic transformations of these values, such as the logarithmic transformation (20), do not take into account the particular distribution of each gene’s expression across the brain, do not saturate outlying expression data (e.g., because of potential artifacts in these data), and allow genes with high overall expression to dominate computed coexpression values. Robust normalizing transformations, such as the Hampel hyperbolic tangent transformation, could more directly account for outliers in the data, but here we used the standard sigmoid for simplicity. We note, however, that the main results reported here are not a consequence of using sigmoidal normalization; we found similar differences in gene coexpression using unnormalized data, a linear rescaling to the unit interval, and the Hampel hyperbolic tangent transformation. The low values of coexpression reported here relative to other studies (26, 49) result from the normalization of gene expression and the spatial correction applied to gene coexpression values. As explained above, spatial correction of gene coexpression allows us to be confident that our results represent robust effects of connectivity and connection topology that cannot be explained simply by the spatial proximity of different pairs of brain regions. We note that our qualitative results are not caused by spatial correction; similar qualitative results were obtained when no spatial correction was applied (see Fig. S3A and Fig. S6A).

Expression data are relatively complete, with only 293 of 17,642 genes displaying more than 10% missing values across the 213 brain regions analyzed here (missing values are plotted in green in Fig. 1C). Only 6 of 213 brain regions had more than 10% gene expression data missing: perirhinal area (PERI, isocortex, 48.6% missing), primary auditory area (AUDp, isocortex, 35.0% missing), ventral auditory area (AUDv, isocortex, 34.1% missing), nucleus raphe magnus (RM, medulla, 24.7% missing), periventricular hypothalamic nucleus, preoptic part (PVpo, hypothalamus, 19.5% missing), and dorsal auditory area (AUDd, isocortex, 10.1% missing). The treatment of missing values in gene coexpression calculations is explained below.
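A completeness check of this kind can be sketched as follows, assuming a region-by-gene matrix with NaN marking missing values. The function names and the matrix layout are hypothetical; only the 10% threshold comes from the text.

```python
import math


def missing_fraction(values):
    """Fraction of entries in `values` that are missing (NaN)."""
    return sum(math.isnan(v) for v in values) / len(values)


def flag_incomplete(matrix, threshold=0.10):
    """Return (region indices, gene indices) whose missing fraction exceeds `threshold`.

    matrix: list of rows, one row per brain region and one column per gene.
    """
    n_regions, n_genes = len(matrix), len(matrix[0])
    regions = [i for i in range(n_regions)
               if missing_fraction(matrix[i]) > threshold]
    genes = [j for j in range(n_genes)
             if missing_fraction([matrix[i][j] for i in range(n_regions)]) > threshold]
    return regions, genes


# Toy example: 3 regions by 3 genes.
nan = math.nan
toy = [[1.0, nan, 3.0],
       [nan, nan, nan],
       [1.0, 2.0, 3.0]]
print(flag_incomplete(toy, threshold=0.5))
```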
