Parallel Lossy Compression for Large FASTQ Files
Communications in Computer and Information Science, ISSN: 1865-0937, Vol: 1814 CCIS, Page: 97-120
2023
- 3 Citations
Metrics Details
- Citations: 3
- Citation Indexes: 3
Conference Paper Description
In this paper we present a parallel version of the BFQzip algorithm, which we introduced in [Guerrini et al., BIOSTEC – BIOINFORMATICS 2022]. BFQzip modifies the bases and quality scores components of a FASTQ file, taking both kinds of information into account at the same time while preserving variant calling. The resulting FASTQ file achieves better compression than the original data. Here, we introduce a strategy that splits the FASTQ file into t blocks and processes them independently and in parallel using the BFQzip algorithm. The resulting blocks, with modified bases and smoothed qualities, are merged (in order) and compressed. We show that our strategy can improve the compression ratio of large FASTQ files by taking advantage of the redundancy of reads. When the file is split into blocks, reads belonging to the same portion of the genome can end up in different blocks. Therefore, we analyze how reordering reads before splitting the input FASTQ can improve the compression ratio as the number of threads increases. We also propose a paired-end mode that exploits paired-end information by processing blocks of FASTQ files in pairs. Availability: The software is freely available at https://github.com/veronicaguerrini/BFQzip
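The split, process, and merge-in-order strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`parse_fastq`, `split_blocks`, `placeholder_process`, `process_in_parallel`) are hypothetical, the per-block step is a stand-in that returns its input unchanged rather than running BFQzip, and the final compression step is omitted. It assumes well-formed FASTQ input with exactly four lines per read.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# One FASTQ read: header, bases, separator line, quality string.
Record = Tuple[str, ...]


def parse_fastq(text: str) -> List[Record]:
    """Group the lines of a FASTQ string into 4-line records."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of 4 lines")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def split_blocks(records: List[Record], t: int) -> List[List[Record]]:
    """Split records into t contiguous blocks of nearly equal size."""
    size, extra = divmod(len(records), t)
    blocks, start = [], 0
    for i in range(t):
        # The first `extra` blocks take one additional record each.
        end = start + size + (1 if i < extra else 0)
        blocks.append(records[start:end])
        start = end
    return blocks


def placeholder_process(block: List[Record]) -> List[Record]:
    """Hypothetical stand-in for the per-block step (NOT BFQzip).

    It returns the block unchanged; a real pipeline would modify
    bases and smooth qualities here.
    """
    return list(block)


def process_in_parallel(
    records: List[Record],
    t: int,
    process_block: Callable[[List[Record]], List[Record]] = placeholder_process,
) -> List[Record]:
    """Process t blocks independently, then merge them in input order."""
    blocks = split_blocks(records, t)
    with ThreadPoolExecutor(max_workers=t) as pool:
        # Executor.map yields results in the order of its inputs,
        # so the merged output keeps the original read order.
        processed = list(pool.map(process_block, blocks))
    return [record for block in processed for record in block]
```

Because the blocks are contiguous and the results are collected in input order, the merged output lists reads in the same order as the input whenever the per-block step keeps the order within its block. For CPU-bound per-block work in pure Python, a process pool would likely be a better fit than threads; threads are used here only to keep the sketch short.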
Bibliographic Details
http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85172206063&origin=inward; http://dx.doi.org/10.1007/978-3-031-38854-5_6; https://link.springer.com/10.1007/978-3-031-38854-5_6; https://dx.doi.org/10.1007/978-3-031-38854-5_6; https://link.springer.com/chapter/10.1007/978-3-031-38854-5_6
Springer Science and Business Media LLC