Genome compression: A novel approach for large collections | all4bioinformatics

Sunday, 1 September 2013

Genome compression: A novel approach for large collections

Abstract
Motivation: Genomic repositories are growing rapidly, as witnessed by the 1000 Genomes and UK10K projects. Hence, compression of multiple genomes of the same species has become an active research area in recent years. The well-known large redundancy in human sequences is not easy to exploit because of the huge memory requirements of traditional compression algorithms.
Results: We show how to obtain compression ratios several times higher than the best reported results on two large genome collections (1092 human and 775 plant genomes). Our inputs are VCF files restricted to their essential fields. More precisely, our novel LZ-style compression algorithm squeezes a single human genome to about 400 KB. The key to high compression is to look for similarities across the whole collection, not just against one reference sequence, as is typical of existing solutions.
Availability: http://sun.aei.polsl.pl/tgc (also as Supplementary material) under a free license.
Supplementary data: available at Bioinformatics online.

Contact: sebastian.deorowicz@polsl.pl
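
To make the central idea of the abstract more concrete, here is a minimal toy sketch of LZ-style matching across a whole collection. It is not the authors' algorithm or code. It assumes each genome has already been reduced to a list of integer variant IDs (for example, derived from the essential VCF fields), and all function names and the token format are hypothetical. Each genome is encoded greedily as copies from any previously processed genome, with literals where no sufficiently long match exists.

```python
# Toy illustration only: a greedy LZ-style parse whose "dictionary" is every
# previously processed genome in the collection, not a single reference.
# Assumption: a genome is a list of integer variant IDs. The token format
# ("copy", genome_index, start, length) / ("lit", value) is hypothetical.

def longest_match(target, i, history):
    """Return (length, genome_index, start) of the longest run in any
    history genome that equals target[i:i+length]. Brute force for clarity."""
    best = (0, -1, -1)
    for g_idx, seq in enumerate(history):
        for start in range(len(seq)):
            length = 0
            while (i + length < len(target)
                   and start + length < len(seq)
                   and seq[start + length] == target[i + length]):
                length += 1
            if length > best[0]:
                best = (length, g_idx, start)
    return best


def encode(target, history, min_len=2):
    """Encode one genome as copy/literal tokens against the given history."""
    tokens = []
    i = 0
    while i < len(target):
        length, g_idx, start = longest_match(target, i, history)
        if length >= min_len:
            tokens.append(("copy", g_idx, start, length))
            i += length
        else:
            tokens.append(("lit", target[i]))
            i += 1
    return tokens


def decode(tokens, history):
    """Rebuild one genome from its tokens and the already decoded genomes."""
    out = []
    for tok in tokens:
        if tok[0] == "copy":
            _, g_idx, start, length = tok
            out.extend(history[g_idx][start:start + length])
        else:
            out.append(tok[1])
    return out


def compress_collection(genomes, min_len=2):
    """Encode genome k against genomes 0..k-1 (the whole collection so far)."""
    return [encode(g, genomes[:k], min_len) for k, g in enumerate(genomes)]


def decompress_collection(encoded):
    """Decode genomes in order, so each one can reference its predecessors."""
    genomes = []
    for tokens in encoded:
        genomes.append(decode(tokens, genomes))
    return genomes
```

In this sketch, a genome that is a mosaic of two earlier genomes can be described with a couple of copy tokens, whereas matching against a single reference would leave the part inherited from the other genome as literals. A practical implementation would need an indexed match search and bounded memory rather than the brute-force scan used here.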
