Fast processing of environmental DNA metabarcoding sequence data using convolutional neural networks
bioRxiv, ISSN: 2692-8205
2021
- Citations: 1
Metrics Details
- Citations: 1
- Citation Indexes: 1
- CrossRef: 1
Article Description
The intensification of anthropogenic pressures has growing consequences for biodiversity and, ultimately, for the functioning of ecosystems. Novel high-throughput DNA sequencing is becoming a major tool for monitoring and better understanding biodiversity responses to environmental change with standardized and reproducible methods. Organisms shed DNA traces in their environment, and this "environmental DNA" (eDNA) can be collected and sequenced using eDNA metabarcoding. Processing large volumes of eDNA metabarcoding data remains challenging, especially transforming them into relevant taxonomic lists that experts can interpret. Speed and accuracy are two major bottlenecks in this critical step. Here, we investigate whether convolutional neural networks (CNNs) can optimize the processing of short eDNA sequences. We tested whether the speed and accuracy of a CNN are comparable to those of the frequently used OBITools bioinformatic pipeline. We applied the methodology to a massive eDNA dataset collected in tropical South America (French Guiana), where freshwater fishes were targeted using a small region (60 bp) of the 12S ribosomal RNA mitochondrial gene. We found that the taxonomic assignments from the CNN were comparable to those of OBITools, with high correlation levels and a similar match to the regional fish fauna. The CNN processed raw fastq files at a rate of approximately 1 million sequences per minute, which was 150 times faster than OBITools. Once trained, the CNN can be applied to new eDNA metabarcoding data automatically, which promises fast and easy deployment in the cloud for future eDNA analyses.
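To make the idea of feeding short eDNA sequences to a CNN more concrete, the sketch below one-hot encodes a read into a fixed-size matrix, a common input representation for convolutional networks on DNA. The abstract does not describe the authors' encoding, so everything here is an assumption for illustration: the function name `one_hot_encode`, the `ACGT` channel order, the default length of 60 (chosen only to echo the 60 bp marker), and the choice to represent ambiguous bases and padding as all-zero rows.

```python
from typing import List

# Assumed channel order; the abstract does not specify one.
ALPHABET = "ACGT"


def one_hot_encode(seq: str, length: int = 60) -> List[List[float]]:
    """Encode a DNA sequence as a `length` x 4 one-hot matrix.

    Hypothetical helper for illustration only. Sequences longer than
    `length` are truncated; shorter ones are padded with all-zero rows.
    Bases outside ACGT (for example N) also become all-zero rows.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    matrix = [[0.0] * len(ALPHABET) for _ in range(length)]
    for pos, base in enumerate(seq.upper()[:length]):
        col = index.get(base)
        if col is not None:
            matrix[pos][col] = 1.0
    return matrix


# Usage: a 5-base read padded to 8 positions.
encoded = one_hot_encode("ACGTN", length=8)
```

A fixed-size matrix of this kind can be batched and passed to one-dimensional convolution layers, which is one reason such encodings are popular for short, uniform-length markers.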