Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics.

  • Additional Information
    • Author-Supplied Keywords:
      Africa
      Bioinformatics
      Docker
      Genomics
      Pipeline
      Reproducibility
      Workflows
    • NAICS/Industry Codes:
      541711 Research and Development in Biotechnology
      541712 Research and Development in the Physical, Engineering, and Life Sciences (except Biotechnology)
    • Abstract:
      Background: The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Health and Heredity in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust, aimed at using genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure such as portable and reproducible bioinformatics workflows for use on heterogeneous African computing environments. Processing and analysis of genomic data is an example of a big data application requiring complex interdependent data analysis workflows. Such bioinformatics workflows take the primary and secondary input data through several computationally-intensive processing steps using different software packages, where some of the outputs form inputs for other steps. Implementing scalable, reproducible, portable and easy-to-use workflows is particularly challenging. Results: H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation. A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups, and US and European collaborators. Two of the workflows are built using the Common Workflow Language framework (CWL) and two using Nextflow. All the workflows are containerized for improved portability and reproducibility using Docker, and are publicly available for use by members of the H3Africa consortium and the international research community. 
      Conclusion: The H3ABioNet workflows have been implemented in view of offering ease of use for the end user and high levels of reproducibility and portability, all while following modern state of the art bioinformatics data processing protocols. The H3ABioNet workflows will service the H3Africa consortium projects and are currently in use. All four workflows are also publicly available for research scientists worldwide to use and adapt for their respective needs. The H3ABioNet workflows will help develop bioinformatics capacity and assist genomics research within Africa and serve to increase the scientific output of H3Africa and its Pan-African Bioinformatics Network. [ABSTRACT FROM AUTHOR]
    • Copyright:
      Copyright of BMC Bioinformatics is the property of BioMed Central and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
    • Author Affiliations:
      1. Department of Digital Technologies, University of Mauritius, Reduit, Mauritius
      2. Australian Centre for Ancient DNA, University of Adelaide, Adelaide, South Australia, Australia
      3. Computational Biology Division, Department of Integrative Medical Biosciences, IDM, University of Cape Town, Cape Town, South Africa
      4. School of Electrical & Information Engineering, University of the Witwatersrand, Johannesburg, South Africa
      5. Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
      6. South African National Bioinformatics Institute, University of the Western Cape, Bellville, Cape Town, South Africa
      7. Natural Sciences, University of the Western Cape, Bellville, Cape Town, South Africa
      8. Institut Pasteur De Tunis, University Tunis El Manar, Tunis, Tunisia
      9. Institut Superieur des Technologies Medicales de Tunis, University Tunis El Manar, Tunis, Tunisia
      10. Center for Bioinformatics & Systems Biology, Faculty of Science, University of Khartoum, Khartoum, Sudan
      11. Department of Electrical & Electronic Engineering, Faculty of Engineering, University of Khartoum, Khartoum, Sudan
      12. Genomics Institute, University of California, Santa Cruz, California, USA
      13. Common Workflow Language project, Software Freedom Conservancy, New York City, NY, USA
      14. Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
      15. Centre for Bioinformatics and Computational Biology, Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, South Africa
    • Full Text Word Count:
      7729
    • ISSN:
      1471-2105
    • DOI:
      10.1186/s12859-018-2446-1
    • Accession Number:
      133250877
  • Citations
    • ABNT:
      BAICHOO, S. et al. Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics. BMC Bioinformatics, [s. l.], v. 19, n. 1, p. 1–13, 2018. DOI 10.1186/s12859-018-2446-1. Disponível em: http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=asn&AN=133250877&custid=s8280428. Acesso em: 11 dez. 2019.
    • AMA:
      Baichoo S, Souilmi Y, Panji S, et al. Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics. BMC Bioinformatics. 2018;19(1):1-13. doi:10.1186/s12859-018-2446-1.
    • APA:
      Baichoo, S., Souilmi, Y., Panji, S., Botha, G., Meintjes, A., Hazelhurst, S., … Heusden, P. van. (2018). Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics. BMC Bioinformatics, 19(1), 1–13. https://doi.org/10.1186/s12859-018-2446-1
    • Chicago/Turabian: Author-Date:
      Baichoo, Shakuntala, Yassine Souilmi, Sumir Panji, Gerrit Botha, Ayton Meintjes, Scott Hazelhurst, Hocine Bendou, et al. 2018. “Developing Reproducible Bioinformatics Analysis Workflows for Heterogeneous Computing Environments to Support African Genomics.” BMC Bioinformatics 19 (1): 1–13. doi:10.1186/s12859-018-2446-1.
    • Harvard:
      Baichoo, S. et al. (2018) ‘Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics’, BMC Bioinformatics, 19(1), pp. 1–13. doi: 10.1186/s12859-018-2446-1.
    • Harvard: Australian:
      Baichoo, S, Souilmi, Y, Panji, S, Botha, G, Meintjes, A, Hazelhurst, S, Bendou, H, Beste, E de, Mpangase, PT, Souiai, O, Alghali, M, Yi, L, O’Connor, BD, Crusoe, M, Armstrong, D, Aron, S, Joubert, F, Ahmed, AE, Mbiyavanga, M & Heusden, P van 2018, ‘Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics’, BMC Bioinformatics, vol. 19, no. 1, pp. 1–13, viewed 11 December 2019, .
    • MLA:
      Baichoo, Shakuntala, et al. “Developing Reproducible Bioinformatics Analysis Workflows for Heterogeneous Computing Environments to Support African Genomics.” BMC Bioinformatics, vol. 19, no. 1, Nov. 2018, pp. 1–13. EBSCOhost, doi:10.1186/s12859-018-2446-1.
    • Chicago/Turabian: Humanities:
      Baichoo, Shakuntala, Yassine Souilmi, Sumir Panji, Gerrit Botha, Ayton Meintjes, Scott Hazelhurst, Hocine Bendou, et al. “Developing Reproducible Bioinformatics Analysis Workflows for Heterogeneous Computing Environments to Support African Genomics.” BMC Bioinformatics 19, no. 1 (November 29, 2018): 1–13. doi:10.1186/s12859-018-2446-1.
    • Vancouver/ICMJE:
      Baichoo S, Souilmi Y, Panji S, Botha G, Meintjes A, Hazelhurst S, et al. Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics. BMC Bioinformatics [Internet]. 2018 Nov 29 [cited 2019 Dec 11];19(1):1–13. Available from: http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=asn&AN=133250877&custid=s8280428