This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository.

Pangenome Analysis in Prokaryotes

Welcome to this lesson on the fundamental principles of pangenomics, a rapidly advancing field in bioinformatics. Throughout this lesson, you will learn the basic theory behind the study of pangenomes. Using command-line software, you will gain hands-on experience in downloading and annotating public bacterial genomes, essential skills for genomic analysis.

A key highlight of this lesson is the opportunity to work with specialized programs for pangenome analysis. You will learn how to cluster genes into families. You will build interactive pangenome graphs and plots, which are powerful visualization tools for studying the general structure of a pangenome and the gene families that compose it. Finally, you will explore how to apply Topological Data Analysis to the study of pangenomes.

The analyses presented here were curated to equip you with the tools needed to run a basic pangenomics pipeline. By practicing your bioinformatics skills, you will gain confidence in your abilities and be well prepared to explore other resources (see Other Resources). From there, you can develop your own workflow tailored to the specific objectives of your pangenomics research.

Get ready to embark on this exciting journey into the world of pangenomics, where you will unlock new insights and unravel the complexities of genomic variation!


Before diving into this lesson on pangenomics, it is essential to have a working understanding of the Bash shell and the Python language. If you are not already familiar with them, we recommend completing the Introduction to the Command Line for Pangenomics and Introduction to Python for Pangenomics lessons before starting this one.

Additionally, some familiarity with biological concepts is assumed for this lesson. It is beneficial to have a basic understanding of prokaryotes, genomes, genes, and orthology. If you are new to these concepts, we encourage you to review relevant materials to ensure a solid foundation for this lesson.

Throughout this lesson, we will use data hosted on a cloud instance launched from an Amazon Machine Image (AMI). Workshop participants will receive information on how to log in to the instance during the workshop. If you are studying independently, you will need to set up your own instance from the AMI or install the necessary programs on your personal computer. Detailed instructions on setting up an AMI and accessing the required data can be found on the Pangenomics Workshop Setup page.
If you are taking this workshop at UNAM-CCM, you will use the shell and Python, and access all the bioinformatics programs, through a JupyterHub server.

This lesson is the third part of the Pangenomics Workshop, which also includes Introduction to the Command Line for Pangenomics and Introduction to Python for Pangenomics.


       Setup                             Download files required for the lesson
00:00  1. Introduction to Pangenomics    What is a pangenome?
                                         What are the components of a pangenome?
00:15  2. Downloading Genomic Data       How can I download public genomes using the command line?
01:00  3. Annotating Genomic Data        How can I identify the genes in a genome?
01:45  4. Measuring Sequence Similarity  How can we measure differences in gene sequences?
02:30  5. Clustering with BLAST Results  How can we use the BLAST results to form families?
03:15  6. Clustering Protein Sequences   Can I cluster my sequences automatically?
03:55  7. Exploring Pangenome Graphs     How can I build a pangenome of thousands of genomes?
                                         How can I visualize the spatial relationship between gene families?
04:45  8. Interactive Pangenome Plots    How can I obtain an interactive pangenome plot?
                                         How can I measure the homogeneity of the gene families?
                                         How can I obtain an enrichment analysis of the gene families?
                                         How can I compute the ANI values between the genomes of the pangenome?
05:40  9. Topological Data Analysis      What is topological data analysis?
07:10  10. Computational Tools for TDA   How can I computationally manipulate a simplex?
07:55  11. TDA in Pangenomes             How can I apply TDA to describe pangenomes?
08:40  12. Examples TDA in genomics      How can I apply TDA to describe pangenomes?
09:25  13. Examples TDA in genomics      How can I apply TDA to describe pangenomes?
10:10  14. Other Resources               What can I do after I have built a pangenome?
                                         What bioinformatic tools are available for downstream analysis of pangenomes?
10:30  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.