TY - JOUR
T1 - FrangiPANe, a tool for creating a panreference using left behind reads
AU - Tranchant Dubreuil, Christine
AU - Chenal, Clothilde
AU - Blaison, Mathieu
AU - Albar, Laurence
AU - Klein, Valentin
AU - Mariac, Cédric
AU - Wing, Rod A.
AU - Vigouroux, Yves
AU - Sabot, Francois
N1 - Publisher Copyright: © 2023 The Author(s).
PY - 2023/3/1
Y1 - 2023/3/1
N2 - We present FrangiPANe, a pipeline developed to build a panreference from short reads through a map-then-assemble strategy. Applying it to 248 African rice genomes using an improved CG14 reference genome, we identified an average of 8 Mb of new sequences and 5290 new contigs per individual. In total, 1.4 Gb of new sequences, consisting of 1 306 676 contigs, were assembled. We validated 97.7% of the contigs from the short-read individual assembly of the TOG5681 cultivar against a new long-read genome assembly of the same cultivar. FrangiPANe also allowed the anchoring of 31.5% of the new contigs within the CG14 reference genome, with 92.5% accuracy at a 2 kb span. In addition, we annotated 3252 new genes absent from the reference. FrangiPANe was developed as a modular and interactive application to simplify the construction of a panreference using the map-then-assemble approach. It is available as a Docker image containing (i) a Jupyter notebook centralizing code, documentation and interactive visualization of results, (ii) Python scripts and (iii) all the software and libraries required for each step of the analysis. We foresee that our approach will help leverage large-scale Illumina datasets for pangenome studies in GWAS or detection of selection.
AB - We present FrangiPANe, a pipeline developed to build a panreference from short reads through a map-then-assemble strategy. Applying it to 248 African rice genomes using an improved CG14 reference genome, we identified an average of 8 Mb of new sequences and 5290 new contigs per individual. In total, 1.4 Gb of new sequences, consisting of 1 306 676 contigs, were assembled. We validated 97.7% of the contigs from the short-read individual assembly of the TOG5681 cultivar against a new long-read genome assembly of the same cultivar. FrangiPANe also allowed the anchoring of 31.5% of the new contigs within the CG14 reference genome, with 92.5% accuracy at a 2 kb span. In addition, we annotated 3252 new genes absent from the reference. FrangiPANe was developed as a modular and interactive application to simplify the construction of a panreference using the map-then-assemble approach. It is available as a Docker image containing (i) a Jupyter notebook centralizing code, documentation and interactive visualization of results, (ii) Python scripts and (iii) all the software and libraries required for each step of the analysis. We foresee that our approach will help leverage large-scale Illumina datasets for pangenome studies in GWAS or detection of selection.
UR - http://www.scopus.com/inward/record.url?scp=85160639387&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85160639387&partnerID=8YFLogxK
U2 - 10.1093/nargab/lqad013
DO - 10.1093/nargab/lqad013
M3 - Article
AN - SCOPUS:85160639387
SN - 2631-9268
VL - 5
JO - NAR Genomics and Bioinformatics
JF - NAR Genomics and Bioinformatics
IS - 1
M1 - lqad013
ER -