Discovery and Characterisation of miRNA Genes in Atlantic Salmon
26 August 2013
MicroRNAs (miRNAs) are an abundant class of endogenous small RNA molecules that downregulate gene expression at the post-transcriptional level. This study by Rune Andreassen, Oslo and Akershus University, et al., aims to identify and characterize miRNA genes in Atlantic salmon by deep sequencing of small RNA libraries from nine different tissues.
MicroRNAs (miRNAs) are an abundant class of endogenous small RNA molecules. As part of the miRNA-induced silencing complex (miRISC), they regulate gene expression at the post-transcriptional level by binding to the mRNAs of target genes in a sequence-specific manner. Binding of the miRISC to an mRNA downregulates gene expression either by inhibiting translation or by degrading the target mRNA. Most mature miRNAs are 20–24 nt long, while precursor miRNAs are usually 60–80 nt and have a hairpin secondary structure. Some miRNAs are highly conserved across species, while others seem to be species-specific. miRNAs play important roles in multiple biological processes by regulating genes that control developmental timing, growth, stem cell division and apoptosis. They are often expressed in a tissue-specific manner, and a large proportion of animal protein-coding genes, probably more than 30 per cent, may be regulated by miRNAs. Failure of miRNA expression or mutations in miRNA genes may cause genetic disease; for example, 163 diseases reported in the miR2Disease database are associated with dysfunction of miRNA genes or of miRNA/target-gene interactions. Dysfunctional miRNA/target-gene interactions may also contribute to the development of cancer, for example when miRNAs act as oncogenes. On the other hand, naturally occurring variation in miRNA genes or miRNA target sites may contribute to normal phenotypic differences. Some of these variants, such as the one affecting muscularity in sheep, may affect economically important traits.
Recent advances in sequencing technology have increased the sensitivity of sequencing analysis (deep sequencing), allowing even low-abundance small RNAs to be detected. Experimental data from such deep sequencing, together with bioinformatic tools that use current knowledge of the characteristic structure of miRNA precursors and of how they are processed into mature miRNAs in the cell, can be used in miRNA discovery projects. Combining sensitive deep sequencing with these tools makes it possible to discover both novel and evolutionarily conserved miRNAs.
Despite the economic importance of Atlantic salmon (Salmo salar) as a domesticated animal, and despite the focus on functional genomics in aquaculture, there has been little research on miRNAs in Salmo salar, and miRBase (http://www.mirbase.org/) currently contains no Salmo salar miRNAs. The regulatory role of miRNAs in growth, in the immune system and in other developmental and physiological processes in salmon is therefore unknown. However, heptamers identical to known miRNA binding sites are conserved in the 3′ UTRs of Salmo salar genes, and homology-based in silico studies indicate that the salmon genome contains many miRNA genes. Both observations suggest, as expected, that miRNAs are important regulators of a large proportion of protein-coding genes in Atlantic salmon as well.
Because of a relatively recent whole-genome duplication (WGD), believed to have occurred in the common salmonid ancestor between 25 and 120 million years ago, the salmon genome is complex. Present-day salmonids appear to have retained more than 50 per cent of loci as duplicates, also referred to as ohnologs, i.e. duplicate genes that originate from a WGD. Many of the miRNAs that are evolutionarily conserved across species would therefore be expected to exist as duplicate gene copies (ohnologs) in Salmo salar. Ohnologs may be deleted or develop into pseudogenes, but they also have the potential to gain new functions. It has been suggested that WGD could allow more rapid evolution of novel miRNA families, although evolutionary studies of ancient vertebrate WGDs have not supported this hypothesis. Since Atlantic salmon is a vertebrate species with an additional, recent WGD, studies of miRNAs in this species might add to our knowledge of the fate and evolution of miRNAs following whole-genome duplication.
Furthermore, knowledge of miRNAs and their target genes may in the future be used to manage health and improve economically important traits in farmed animals and aquaculture species. The aims of this study were therefore to identify and characterize both evolutionarily conserved and putative novel miRNA genes in Atlantic salmon by deep sequencing of small RNA libraries. Nine different tissues were analyzed independently to identify a large number of miRNAs with high confidence. This also allowed tissues to be compared to detect any obvious differences in the tissue-specific expression of particular miRNAs. All miRNAs discovered were mapped to genomic locations in the current version of the Salmo salar genome assembly, and the subsequent comparison of precursor locations allowed us to map miRNA clusters in the salmon genome.
Results and discussion
Discovery and characterization of miRNAs in Salmo salar
Total RNA was successfully extracted from liver (two samples), spleen (two samples), kidney, head kidney, heart, brain, intestine, white muscle and gills of individuals at the pre-smolt developmental stage, as well as from a one-day-old individual. Total RNA concentrations ranged from 58 to 900 ng/μl (total volume 100 μl). After size separation and library preparation, the twelve tissue- and developmental-stage-specific libraries were successfully sequenced on the Illumina Genome Analyzer IIx platform. The number of quality-filtered, adaptor-trimmed reads per sample ranged from 1.4 million to 18 million, while the number of unique reads ranged from 64,593 to 246,444. Table 1 gives an overview of total read numbers in the twelve samples together with the total RNA concentrations of the extracts.
All miRNAs were initially identified by miRDeep analysis. The idea behind the miRDeep software package is to detect miRNAs by comparing how deep-sequencing reads align to a candidate precursor with the pattern expected when miRNA precursors are processed in the cell. The final output of the miRDeep analysis was a list of putative miRNA precursors, each with its corresponding 5p and 3p mature reads and an assigned miRDeep score (estimated as described by the miRDeep authors). Precursors with a score above the lower threshold (see Methods) and with reads that perfectly matched the expected 5p and 3p mature miRNAs processed from that precursor were considered putative miRNA genes.
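The selection rule in the paragraph above can be written as a small filter. This is a minimal sketch, not miRDeep's own code: the `Candidate` record, its field names and the `min_score` parameter are hypothetical, the real threshold is the one given in the study's Methods, and "perfect match" is modelled simply as exact string identity between a read and the expected mature sequence.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One miRDeep-style prediction (hypothetical record layout)."""
    precursor_id: str
    score: float
    expected_5p: str  # mature 5p sequence expected from the hairpin
    expected_3p: str  # mature 3p sequence expected from the hairpin
    reads: list = field(default_factory=list)  # unique reads aligned to the precursor


def is_putative_mirna(c: Candidate, min_score: float) -> bool:
    """Pass only if the score exceeds the threshold and both expected
    mature arms are supported by at least one perfectly matching read."""
    if c.score <= min_score:
        return False
    observed = set(c.reads)
    return c.expected_5p in observed and c.expected_3p in observed


def filter_candidates(candidates, min_score):
    """Return the IDs of candidates kept as putative miRNA genes."""
    return [c.precursor_id for c in candidates if is_putative_mirna(c, min_score)]
```

For example, a candidate with a high score but no read matching its expected 3p sequence is rejected, just as one with both arms supported but a score below the threshold.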
Figure 1 shows an example of a putative Salmo salar miRNA precursor sequence as reported by the initial miRDeep analysis. The reads are shown as “stacks” below the precursor sequence; as the figure shows, they align to the precursor in a discrete manner, close to either the 5′ end (5p reads) or the 3′ end (3p reads) of the putative precursor. The figure gives the count of each unique 5p and 3p read aligned to the precursor. In this case the 5p reads were approximately 20× more frequent, suggesting that they were the predominant mature miRNA, while the 3p reads were the less frequent mature product of this precursor. Most reads (both 5p and 3p) start at the same 5′ nucleotide on the precursor. The experimental data thus supported this being a true miRNA precursor with corresponding processed mature miRNAs.
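The read-stack pattern described for Figure 1 can be summarised with two numbers per arm: the total read count and the fraction of reads sharing the most common 5′ start position. The sketch below assumes a hypothetical input of (start, count) pairs and a simple split of the hairpin at the terminal loop; the example counts are invented to mirror the roughly 20-fold 5p excess described in the text, not taken from the study.

```python
from collections import Counter


def arm_summary(read_stacks, loop_start):
    """Summarise read stacks on one putative hairpin precursor.

    read_stacks: iterable of (start, count) pairs, where start is the 0-based
        position of a unique read's 5' end on the precursor.
    loop_start: 0-based position of the first terminal-loop nucleotide; reads
        starting before it count as 5p reads, the rest as 3p reads
        (a deliberately simple, hypothetical arm assignment).
    """
    per_arm = {"5p": Counter(), "3p": Counter()}
    for start, count in read_stacks:
        arm = "5p" if start < loop_start else "3p"
        per_arm[arm][start] += count

    summary = {}
    for arm, starts in per_arm.items():
        total = sum(starts.values())
        # fraction of this arm's reads that share its most common 5' start
        top = max(starts.values(), default=0)
        summary[arm] = {"total": total,
                        "five_prime_homogeneity": top / total if total else 0.0}
    tot5, tot3 = summary["5p"]["total"], summary["3p"]["total"]
    summary["ratio_5p_to_3p"] = tot5 / tot3 if tot3 else float("inf")
    return summary


print(arm_summary([(2, 180), (3, 20), (40, 10)], loop_start=25))
```

High homogeneity on each arm corresponds to the "identical 5′ nucleotides" observation, and the 5p:3p ratio expresses which arm is predominant.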
All putative precursor sequences discovered by miRDeep were further analyzed by BLAST homology searches against the stem-loop sequences in miRBase (see Methods). Any precursor matching a miRBase stem-loop sequence with an expectation value below 1 × 10⁻⁷ was considered a true Salmo salar ortholog of an evolutionarily conserved miRNA gene. Salmo salar miRNAs belonging to miRNA families conserved across species were therefore first identified in an initial miRDeep analysis that was independent of cross-species comparison. The homology-based approach (BLAST analysis against miRBase) then provided further support that they were true miRNAs belonging to a particular evolutionarily conserved miRNA family.
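The E-value rule for calling a conserved ortholog could be applied to BLAST+ tabular output along these lines. This sketch assumes the searches were run with `-outfmt 6` and its default twelve columns, in which the E-value is the eleventh field; the function name and the query and subject identifiers are hypothetical.

```python
E_VALUE_CUTOFF = 1e-7  # threshold stated in the text


def conserved_orthologs(blast_lines, cutoff=E_VALUE_CUTOFF):
    """Return {query_id: (best_stem_loop_id, evalue)} for queries whose best
    miRBase hit has an E-value below cutoff.

    blast_lines: lines of BLAST+ -outfmt 6 output (tab-separated columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore).
    """
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        # keep only the most significant hit per query
        if evalue < cutoff and (query not in best or evalue < best[query][1]):
            best[query] = (subject, evalue)
    return best
```

The family name of the best stem-loop hit would then drive the annotation described in the next paragraphs.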
This two-step approach identified a total of 180 distinct evolutionarily conserved mature 5p-miRNA sequences and their corresponding 3p-miRNA sequences, together with 356 different putative precursor sequences (mirs) at unique positions within the preliminary assembly of the Salmo salar genome. In addition, 111 of these 356 precursors had one identical copy at another unique genome location, fifteen precursors had three identical copies and one precursor had four identical copies at unique genome locations. In total, 501 putative precursors were thus discovered at unique genome locations corresponding to the 180 conserved mature miRNAs. Additional file 1 gives an overview of all evolutionarily conserved precursor (mir) sequences identified, along with their corresponding mature 5p and 3p sequences.
Any precursor that significantly matched the stem-loop sequence of a given miRNA gene family in miRBase was assumed to be a Salmo salar ortholog of that miRNA gene and is annotated in Additional file 1 as a Salmo salar precursor belonging to that family. Many mature sequences within the same miRNA gene family showed small sequence differences. These were annotated with the same family number but distinguished by lettered suffixes (-a, -b, etc.), in accordance with the miRNA nomenclature rules.
Annotation showed that we had identified 106 evolutionarily conserved miRNA families in our material. The set of evolutionarily conserved miRNA genes present in a species is expected to reflect its taxonomic position. All 106 miRNA families identified are among those expected to be present in teleosts, while miRNAs suggested to be specific to Mammalia, Tetrapoda and Amniota were not observed.
The miRDeep analysis also revealed several additional putative precursor sequences with scores above the threshold and with reads aligning as expected for true 5p and 3p mature miRNAs. However, homology analysis showed that they did not belong to any of the miRNA gene families in miRBase. These putative novel miRNAs were further analyzed by BLAST searches against the preliminary salmon genome assembly. Putative precursor sequences with multiple significant hits against the salmon genome (more than five hits with E-values < 1 × 10⁻⁶) were removed (data not shown). These sequences had initially been identified as putative precursors because reads aligned to them discretely and they could form stable stem-loops. However, since they were present in multiple copies in the salmon genome, they were considered likely to be interspersed repeats and/or long tandem repeats. Because the reads matching these sequences did align in a discrete manner, and thus had the properties expected of processed small RNAs, we cannot rule out that they represent some kind of functional small RNA in Salmo salar.
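The repeat filter described above (more than five genome hits with E < 1 × 10⁻⁶) can be sketched in the same way, again assuming BLAST+ `-outfmt 6` output, this time searched against the genome assembly. Each tabular line is one alignment (HSP); counting lines as hits is a simplifying assumption, and the function name and identifiers are hypothetical.

```python
from collections import Counter


def repeat_like_queries(blast_lines, max_hits=5, cutoff=1e-6):
    """Return query IDs with more than max_hits genome alignments below cutoff.

    blast_lines: BLAST+ -outfmt 6 lines (E-value in the 11th column). Each
    line is counted as one hit, which is a simplification: several HSPs can
    come from the same genomic locus.
    """
    hits = Counter()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) < cutoff:
            hits[fields[0]] += 1
    return {query for query, n in hits.items() if n > max_hits}
```

Candidates returned by this function would be set aside as likely repeats rather than reported as novel miRNAs.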
The remaining putative precursors were compared with other small RNA databases and the RefSeq RNA database in GenBank (see Methods) but did not match other kinds of non-coding small RNAs or mRNAs. They did, however, share the following properties: the precursors and their reads were detected in at least two samples; in all cases, reads that perfectly matched the expected 5p and 3p mature miRNAs were detected; and most reads showed the properties expected of products of processed precursors, aligning from identical nucleotide positions at their 5′ ends. They therefore meet the consensus criteria used to recognise novel miRNAs and are likely to be true novel miRNAs. In total, 13 distinct novel mature 5p-miRNAs with their corresponding 3p-miRNAs, and 15 different putative precursors (some present as identical duplicates), were discovered in our material. Table 2 gives an overview of all putative novel miRNA precursors and their 5p and 3p mature miRNAs, and the novel miRNAs with their precursor and mature sequences and genome locations are also given in Additional file 2. All miRNAs discovered have been submitted to miRBase. The data from this study have also been submitted to the NCBI SRA database (accession # SRP022967); accession numbers for the individual samples are given in Table 1.
Taken together, we identified 180 distinct evolutionarily conserved miRNAs and 13 distinct novel miRNAs. The precursors of the conserved mature miRNAs were distributed across multiple genomic locations, corresponding to a total of 501 putative precursors at unique genome locations. Only 44 of the 180 distinct conserved mature sequences (about 25 per cent) corresponded to a single precursor at one unique genome location, while the others (about 75 per cent) corresponded to precursors at more than one location (either identical precursor copies or slightly different precursors encoding identical mature miRNAs). Thus, about three quarters of the distinct conserved mature sequences could be transcribed from multicopy miRNA genes, although many of these precursors may be transcriptionally inactive pseudogenes. Chen et al. reported the corresponding percentage of multicopy miRNA genes in zebrafish (Danio rerio) to be about 44 per cent (68 out of 153). This could indicate that a larger percentage of evolutionarily conserved miRNA genes exist as multicopy genes in Salmo salar. A higher copy number of miRNA genes conserved across species would agree with previous studies indicating that about 50 per cent of the Salmo salar genome consists of duplicate sequence originating from the salmonid-specific genome duplication.
However, assembling the salmon genome, with its large amount of highly similar duplicate sequence, is challenging, and the preliminary assembly may be of relatively poor quality. We therefore cannot rule out that some precursors now assigned to different contigs, i.e. that appear duplicated at different unique genome locations, are in fact not true duplicates but single-locus sequences incorrectly assigned to different contigs. The distinct novel mature miRNAs, on the other hand, showed a somewhat different precursor distribution: eight of the thirteen (about 62 per cent) corresponded to a single precursor at one unique genome location. miRNAs that evolved after the salmonid-specific genome duplication would be expected to be present in lower copy numbers than the evolutionarily conserved ones. This finding therefore agrees with expectations and supports the view that these could be true novel salmon-specific miRNAs.
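The copy-number proportions quoted in the two paragraphs above follow from simple ratios. The snippet below reproduces them as a quick check, using only counts stated in the text; the `percent` helper is illustrative.

```python
def percent(part, whole):
    """Percentage of part in whole."""
    return 100 * part / whole


# (part, whole) pairs taken from the text
counts = {
    "salmon conserved, single locus": (44, 180),
    "salmon conserved, multiple loci": (180 - 44, 180),
    "zebrafish multicopy (Chen et al.)": (68, 153),
    "salmon novel, single locus": (8, 13),
}
for label, (k, n) in counts.items():
    print(f"{label}: {k}/{n} = {percent(k, n):.1f}%")
```

Rounded to one decimal place, these give 24.4, 75.6, 44.4 and 61.5 per cent.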
Precursor processing usually shows arm selection, yielding a high copy number of mature products from one arm and far fewer from the other. While most miRNAs show arm selection, it has been reported that some miRNA genes show a less pronounced difference, with similar copy numbers of the 5p and 3p mature miRNAs. To assess arm preference among the mature miRNAs in our material, we compared read counts of mature miRNAs aligning to the 5p or 3p arm for all evolutionarily conserved miRNAs discovered. Only a few cases showed similar copy numbers from both arms, while 5p-arm dominance was found in approximately 60 per cent of all cases and 3p-arm dominance in the remainder. A similar distribution was observed among the distinct novel miRNAs, where again slightly more mature miRNAs were processed from the 5p arm (55 per cent of cases). These distributions agree well with observations from other studies, e.g. Li et al. The arm dominance (5p or 3p) of each miRNA is given in Additional files 1 and 2.
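The arm-dominance tally could be computed as below. The study does not state how "similar copy numbers" was defined, so the two-fold cutoff used here (`similar_fold`) is an assumption, the function names are hypothetical, and the example read counts are invented.

```python
def arm_dominance(count_5p, count_3p, similar_fold=2.0):
    """Classify arm preference as '5p', '3p' or 'similar'.

    'similar' means both arms have reads and the larger count is less than
    similar_fold times the smaller one (an assumed, illustrative cutoff).
    """
    if count_5p == 0 and count_3p == 0:
        raise ValueError("no reads on either arm")
    hi, lo = max(count_5p, count_3p), min(count_5p, count_3p)
    if lo > 0 and hi / lo < similar_fold:
        return "similar"
    return "5p" if count_5p > count_3p else "3p"


def dominance_shares(pairs, similar_fold=2.0):
    """Fraction of miRNAs in each dominance class for (5p, 3p) count pairs."""
    labels = [arm_dominance(a, b, similar_fold) for a, b in pairs]
    return {k: labels.count(k) / len(labels) for k in ("5p", "3p", "similar")}


print(dominance_shares([(200, 10), (5, 100), (50, 40)]))
```

Applied to the read counts of all conserved (or all novel) miRNAs, the returned shares correspond to the roughly 60 per cent (or 55 per cent) 5p dominance reported above.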
miRNA gene clusters discovered in Salmo salar are evolutionarily conserved
Clusters of miRNA genes have been reported in several species, including medaka (Oryzias latipes) and zebrafish (Danio rerio). According to miRBase, a miRNA gene cluster is defined as two or more miRNA genes located less than 10 kb apart and transcribed in the same direction. Applying this definition, 198 of the precursors discovered (approximately 40 per cent) were located in a total of 84 gene clusters. One of the novel miRNA genes (ssa-mir 2184) was located in a cluster with two conserved miRNA genes (ssa-mir 212a-2 and 132–2); all other clustered miRNA genes belonged to the group of evolutionarily conserved miRNAs. Most clusters consisted of two precursors, but clusters of three to six precursors were also observed. Altogether, 87 distinct mature miRNAs may be transcribed from clustered precursors. The distances between clustered precursors were in most cases small, most often less than 5 kb, i.e. half the 10 kb distance usually used to define gene clusters. However, a miRNA gene cluster can only be detected in the salmon genome when it lies within a single contig, and most contigs of the preliminary assembly are rather short (the contig N50 is 9.3 kb). This limits the ability to discover gene clusters in the Salmo salar genome, so the number of gene clusters in our material may be underestimated. A complete overview of all gene clusters, their locations within contigs and the GenBank references for these contigs is given in Additional file 3.
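The miRBase cluster definition (two or more miRNA genes less than 10 kb apart, transcribed in the same direction) can be applied to precursor coordinates as a single-linkage grouping per contig and strand. This is a sketch under assumptions: the `Precursor` record, the function name and the coordinates are hypothetical, and the gap is measured from the end of the growing cluster to the start of the next precursor.

```python
from dataclasses import dataclass
from itertools import groupby

MAX_GAP_BP = 10_000  # miRBase definition: genes less than 10 kb apart


@dataclass(frozen=True)
class Precursor:
    name: str
    contig: str
    start: int   # leftmost coordinate on the contig
    end: int     # rightmost coordinate on the contig
    strand: str  # "+" or "-"


def find_clusters(precursors, max_gap=MAX_GAP_BP):
    """Group precursors on the same contig and strand into clusters.

    Precursors sorted by start are linked while the gap between the current
    cluster's end and the next start is below max_gap; only groups with two
    or more precursors are returned.
    """
    by_locus = sorted(precursors, key=lambda p: (p.contig, p.strand, p.start))
    clusters = []
    for _, group in groupby(by_locus, key=lambda p: (p.contig, p.strand)):
        current, current_end = [], None
        for p in group:
            if current and p.start - current_end >= max_gap:
                if len(current) >= 2:
                    clusters.append(current)
                current, current_end = [], None
            current.append(p)
            current_end = p.end if current_end is None else max(current_end, p.end)
        if len(current) >= 2:
            clusters.append(current)
    return clusters
```

Because precursors are grouped per contig, a cluster split across two contigs is never joined, which is the limitation imposed by the short contigs (N50 of 9.3 kb) noted above.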
Apart from eight gene clusters found in only one contig each, the clustered miRNA genes could be subdivided into 20 groups, where each group consisted of clusters of the same miRNA gene families, in the same transcriptional direction, observed in different contigs. Each of the 20 groups comprised between two and nine contigs carrying such similar miRNA gene clusters. These groups are indicated by Roman numerals in Additional file 3.
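Grouping clusters that contain the same miRNA gene families in the same direction on different contigs amounts to building a signature for each cluster and collecting the contigs that share it. In this sketch (the input format, function name and family names are hypothetical), the family order of minus-strand clusters is reversed so that every signature lists families in transcriptional order, which makes clusters comparable regardless of contig orientation.

```python
from collections import defaultdict


def group_similar_clusters(clusters):
    """clusters: list of (contig_id, members), where members is a list of
    (family, strand) tuples in ascending genomic order and all members of a
    cluster share one strand.

    Returns (shared, single): signatures seen on two or more contigs, and
    signatures seen on one contig only.
    """
    by_signature = defaultdict(list)
    for contig, members in clusters:
        families = [family for family, _ in members]
        if members[0][1] == "-":
            families.reverse()  # minus strand: list families 5'->3' along the transcript
        by_signature[tuple(families)].append(contig)
    shared = {sig: ctgs for sig, ctgs in by_signature.items() if len(ctgs) >= 2}
    single = {sig: ctgs for sig, ctgs in by_signature.items() if len(ctgs) == 1}
    return shared, single
```

In the terms of the paragraph above, each entry in `shared` corresponds to one of the groups marked with Roman numerals, and each entry in `single` to a cluster found in one contig only.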
Multiple copies of certain miRNA gene clusters within a genome have also been observed in other species. However, a comparison with Danio rerio (see Table 3) showed that the number of duplicates was generally larger in Salmo salar. Again, given the unfinished state of the preliminary salmon genome sequence, some of these duplicate gene clusters may not be true duplicates but single loci incorrectly assigned to different contigs.
To determine whether the gene clusters discovered in Atlantic salmon are present in other vertebrate species, we compared the 28 different Salmo salar miRNA gene clusters with those reported in Danio rerio (miRBase and published data) and humans (data from miRBase). These comparisons showed that 26 of the 28 Salmo salar miRNA gene clusters were also observed in other vertebrates. Table 3 lists these 26 gene clusters together with the orthologous gene clusters in Danio rerio. Most of the gene clusters discovered in Atlantic salmon are thus conserved across vertebrate species. The fact that these miRNA precursors are located in clusters, as reported in other vertebrates, supports that they are true Salmo salar miRNA genes.
Tissue specific expression differences and functional predictions
The Illumina® TruSeq Small RNA sample preparation protocol is designed to provide data that can be used to profile miRNA expression levels (see Methods). To test the performance of the method in our material, we performed a linear regression analysis of normalized read counts (see Methods) of all miRNAs in the two liver samples. The Pearson correlation coefficient (r) was 0.97, indicating that the method reproduced the different expression levels of the individual miRNAs very well. Although expression differences could be measured more accurately, and should be confirmed, by additional quantitative analysis, we believe that large differences in individual miRNAs between tissue samples in our material suggest tissue-specific expression. A few such differences were observed when DESeq (see Methods) was used to compare miRNAs expressed in one tissue with all other tissues. One such miRNA, ssa-miR 736, had a normalized read count of 261 in the heart sample (sample 8, Table 1), and no ssa-miR 736 reads were detected in any other tissue sample. Further evaluation with the DESeq package (see Methods) showed that this difference in ssa-miR 736 expression was significant (P-adj = 0.02). Studies in other vertebrates have shown that miR 736, a gene conserved across species, belongs to the miR-208 family of miRNA genes. This family is specifically expressed in cardiac tissue and is therefore often referred to as a “myomiR”. Our observation suggests that ssa-miR 736 has a similar tissue-specific function in Salmo salar.
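The normalization behind the liver-to-liver comparison is described in the Methods rather than here. Since DESeq is used for the tissue comparisons, the sketch below assumes DESeq-style median-of-ratios size factors and then computes Pearson's r between two normalized profiles; whether the study correlated normalized values directly or after a log transform is not stated in this section. The function names and example counts are hypothetical.

```python
import math
from statistics import median


def size_factors(samples):
    """DESeq-style median-of-ratios size factors.

    samples: list of samples, each a list of raw read counts for the same
    miRNAs in the same order. miRNAs with a zero count in any sample are
    skipped when building the reference (geometric mean) profile.
    """
    reference = []  # (miRNA index, log geometric mean across samples)
    for i in range(len(samples[0])):
        values = [s[i] for s in samples]
        if all(v > 0 for v in values):
            reference.append((i, sum(math.log(v) for v in values) / len(values)))
    if not reference:
        raise ValueError("no miRNA is detected in every sample")
    return [math.exp(median([math.log(s[i]) - log_ref for i, log_ref in reference]))
            for s in samples]


def normalize(samples):
    """Divide each sample's counts by its size factor."""
    return [[c / f for c in s] for s, f in zip(samples, size_factors(samples))]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


liver1 = [100, 250, 40, 900, 10, 0]
liver2 = [2 * v for v in liver1]  # same profile sequenced twice as deeply
n1, n2 = normalize([liver1, liver2])
print(pearson_r(n1, n2))
```

A library sequenced twice as deeply gets a size factor twice as large, so after normalization the two profiles coincide and r is 1; real replicate data would give a value below 1, such as the 0.97 reported above.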
One may assume that miRNA gene clusters observed across species (same miRNA genes and transcriptional directions) have been conserved as clusters because they are key genes in regulatory gene networks that are essential to all vertebrates. The evolutionarily conserved Salmo salar miRNA gene clusters could therefore be expected to have similar regulatory functions in salmon as in other vertebrates. From this assumption, one could predict that the ssa-mir-15e, 16a-2 gene cluster regulates cell cycle progression, while the genes in the ssa-mir-144, 451a-1, 451a-1 gene cluster are likely to regulate erythropoiesis. Such predictions would be more robust if additional experimental evidence supported the assumed function of a given Salmo salar gene cluster. The miR-212/132 gene cluster is known to be important in neurological development and time perception, and because of these functions its mature miRNAs are enriched in neuronal cells. Interestingly, the largest number of reads perfectly matching the mature miRNAs of the clustered ssa-miR genes 212 and 132 was observed in brain tissue (sample 10, Table 1). Using DESeq, the normalized read counts in brain were 13860 and 5293 for miR 212 and 132, respectively, compared with 72 and 44 in the other tissue samples (log2 fold changes of approx. -7). A comparison of brain tissue with the other tissues showed that the differences for both miRNAs were significant (P-adj = 0.003). These results suggest that the salmon miRNAs belonging to the 212 and 132 families are more highly expressed in brain tissue. For these genes, both their clustering and their elevated expression in brain therefore indicate that they have developmental and regulatory functions in Salmo salar neuronal cells similar to those found in other vertebrates.
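The approximate log2 fold changes quoted above can be checked from the normalized read counts given in the text. This sketch assumes the fold change is expressed as other tissues relative to brain, which gives the negative sign; the helper name is illustrative, and values reported by DESeq itself could differ slightly from this plain ratio.

```python
import math


def log2_fold_change(treated, reference):
    """log2(treated / reference) for two normalized read counts (both > 0)."""
    return math.log2(treated / reference)


# Normalized read counts from the text: (brain, other tissues)
counts = {"ssa-miR 212": (13860, 72), "ssa-miR 132": (5293, 44)}
for name, (brain, other) in counts.items():
    # other tissues relative to brain, matching the sign used in the text
    print(f"{name}: {log2_fold_change(other, brain):.2f}")
# ssa-miR 212: -7.59
# ssa-miR 132: -6.91
```

Both values are consistent with the "approx. -7" reported for the brain comparison.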
One of the novel miRNAs, ssa-miR 8163, was observed in liver (samples 1 and 6, Table 1) and at a low level in the sample from the one-day-old individual (a mix of all tissues, sample 13, Table 1), while no ssa-miR 8163 reads were detected in any other sample. Using DESeq, the normalized read count in liver (two samples combined) was 241, whereas no reads were found in samples from other tissues. Further analysis showed that the difference was significant (P-adj = 0.049), suggesting that this novel miRNA may be more highly expressed in liver. The precursor of this miRNA (ssa-mir 8163) was present as a single-copy sequence in the salmon genome assembly. To obtain more information about the genomic location of this miRNA gene, we performed a BLAST analysis against the nt/nr database in GenBank. This revealed an almost perfect match (97 per cent identity) to intron 7 of the transferrin gene of Oncorhynchus tshawytscha (GenBank: AH008271, base pairs 2808–2868, E = 2 × 10⁻¹⁹). The gene is thus located within, and presumably co-transcribed with, the transferrin gene, which is known to be under positive selection among salmonids.
This study provides the mature miRNAs and precursor sequences of a large number of conserved Salmo salar miRNAs, and thirteen distinct novel mature miRNAs were also discovered. Comparison of precursor locations within the salmon genome revealed a large number of evolutionarily conserved Salmo salar miRNA gene clusters. Together, these results provide knowledge of miRNAs in Atlantic salmon that will greatly facilitate further functional studies in this economically important species.