Nontuberculous mycobacteria
A Genetic Analysis Using Online Bioinformatics Tools





Strong Lab

Honda Lab



Overview

This lesson asks you to compare gene sequences between a wild-type strain and one of several mutant nontuberculous mycobacterial (NTM) strains. You will identify mutations as single-nucleotide polymorphisms (SNPs) and then infer whether the mutation in your variant strain is likely to affect the activity of the encoded protein or of antibiotics that target that protein.

Objectives

By the end of this activity you will be able to:

1) infer mycobacterial species identity, using sequence search tools including BLAST, https://blast.ncbi.nlm.nih.gov/Blast.cgi

2) compare environmental NTM sequences to clinical sequences, using multiple sequence alignment tools, including CLUSTAL, https://www.ebi.ac.uk/Tools/msa/clustalo/

3) identify and explain what a single-nucleotide polymorphism (SNP) is when comparing two gene sequences.

4) navigate online scientific tools to translate DNA into polypeptide sequences, using tools such as GeneMarkS, http://exon.gatech.edu/GeneMark/genemarks.cgi, and compare and contrast wild-type and variant polypeptide sequences using CLUSTAL, https://www.ebi.ac.uk/Tools/msa/clustalo/.

5) determine whether your given SNP will result in a silent (synonymous), missense, or nonsense mutation in the resulting amino acid sequence.

6) hypothesize whether a SNP will impact antibiotic activity, using information derived from protein databases including the Protein Data Bank (PDB), https://www.rcsb.org
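Objectives 3 and 5 can be illustrated with a short Python sketch. It is not part of the lesson's online tools; the function names `find_snps` and `classify_snp` and the example sequences are made up for illustration. It assumes the two sequences are already aligned, have the same length, and start at the first base of a codon. It uses the standard genetic code, whose codon assignments are shared with the bacterial translation table.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_snps(wild_type, variant):
    """Return (position, wild-type base, variant base) for each mismatch.

    Positions are 1-based. Assumes the sequences are already aligned.
    """
    wild_type, variant = wild_type.upper(), variant.upper()
    if len(wild_type) != len(variant):
        raise ValueError("sequences must be the same length (align them first)")
    return [
        (i + 1, w, v)
        for i, (w, v) in enumerate(zip(wild_type, variant))
        if w != v
    ]


def classify_snp(wild_type, variant, position):
    """Classify the SNP at a 1-based position as silent, missense, or nonsense.

    Assumes both sequences begin at the first base of a codon.
    A change that removes a stop codon is reported as missense here.
    """
    start = (position - 1) // 3 * 3  # first base of the affected codon
    wt_aa = CODON_TABLE[wild_type[start:start + 3].upper()]
    var_aa = CODON_TABLE[variant[start:start + 3].upper()]
    if wt_aa == var_aa:
        return "silent"
    if var_aa == "*":
        return "nonsense"
    return "missense"


# Made-up 12-base example: ATG GAA CAG TTA encodes M E Q L.
wild_type = "ATGGAACAGTTA"
variant = "ATGGAATAGTTA"  # CAG -> TAG introduces a stop codon
for position, wt_base, var_base in find_snps(wild_type, variant):
    print(position, wt_base, var_base, classify_snp(wild_type, variant, position))
```

A silent SNP leaves the amino acid unchanged, a missense SNP substitutes a different amino acid, and a nonsense SNP creates a premature stop codon.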



Lesson:

Part 1
A patient has just been diagnosed with a mycobacterial infection. To treat the infection effectively, you need to know the species. You decide that single-gene sequencing is adequate for this.

Please use BLAST to search DNA databases and determine the species of mycobacteria from the DNA sequence.

>unknown species
atgcgcggcaacactggaggaccgatcttggcagtctctcgccagactaagaccgataac
gcaactactaactccgtacctggggcccctagccgactttccttcgccaagctgcgtgaa
ccgcttgcggttcccggcctgctcgatgtgcagacggagtcctttgaatggctggttgga
tcgccgcgctggcgtgaggttgcgactgcacgcggtgaggtgaacccgaccggcggcctt
gaggagatcctcacggagctttcgccgatcgaagacttctccggctcgatgtcgctgtcg
ttcagcgacccgcgcttcgacgaggtcaaggcgcccgtcgacgagtgcaaagacaaggac
atgacgtacgcggccccgctgttcgtcacggccgagttcatcaacaacaacaccggtgag
atcaagagccagacggtcttcatgggtgatttcccgatgatgaccgatatgggcaccttc
atcatcaacggcaccgagcgcgtggtcgtgtcgcagctcgtccgttcgccgggtgtctac
ttcgacgagagcatcgacaagtcgaccgagaagaccctgcatagcgtcaaggtcatcccc
ggccgcggtgcctggctcgagttcgacgtcgacaagcgcgacaccgtcggcgtccgcatc
gaccgcaagcgccgccagccggtcaccgtgctgctgaaggcgctgggctggaccaacgag
cagatcgtcgagcgcttcgggttctccgagatcatgatgggcaccctggagaaggacaac
atcgccggtcccgacgaggcgttgctggacatctaccgcaagctgcgcccgggcgagccg
ccgaccaaggagtcggcgcaggccctgctggagaacctgttcttcaaggagaagcgttac
gacctggcccgcgtgggtcggtacaaggtgaacaagaagctgggcctgggcggcaccaat
ccggctcaggtgaccaccaccaccctcaccgaggaagacgtcgtcgccaccatcgagtac
ctggtgcgcctgcacgagggccagaccacgatgaccgcccccggtggcgtcgaggtgccg
gtggatgtggacgacatcgaccacttcggtaaccgtcgcctgcgtaccgtcggcgagctg
attcagaaccagatccgggtcggcctgtcccgtatggagcgcgtcgtgcgtgagcgcatg
accacgcaggacgtcgaggcgatcaccccgcagaccctgatcaacatccgtcccgtcgtg
gcggcgatcaaggagttcttcggaaccagccagctgtcgcagttcatggaccagaacaac
ccgctgtcgggcctgacccacaagcgtcgt


Limit the search to:
Mycobacteria (taxid:85007)
and
Sequences from Type Material

Part 2

You asked the patient to sample their home to see whether any swabs came up NTM culture-positive. Some locations came up positive, and you sequenced the isolates from those locations using the Sanger method. BLAST each of the following sequences to determine the potential species.

>Environmental_Location_1
1 acgacgcttg tgggtcagac ccgacagcgg gttgttctgg tccatgaact gcgacagctg 61 gctggttccg aagaactcct tgatcgccgc caccacggga cggatgttga tcagggtctg 121 cggggtgatc gcctcgacgt cctgcgtggt catgcgctcg cgcacgacac gttccatacg 181 ggacaggccg acccggatct ggttctggat cagctcgccg acggtacgca ggcgacggtt 241 accgaagtgg tcgatgtcgt ccacgtccac cggcacctcg acgccgccgg gggcggtcat 301 cgtggtctgg ccctcgtgca ggcgcaccag atactcgatg gtggcgacga catcttcctc 361 ggtgagggtg gtggtggtca cctgagccgg attggtgccg cccaggccca gcttcttgtt 421 caccttgtac cgacccacgc gggccaggtc gtaacgcttc tccttgaaga acaggttctc 481 cagcagggcc tgcgccgact ccttggtcgg cggctcgccc gggcgcagct tgcggtagat 541 gtccagcaac gcctcgtcgg gaccggcgat gttgtccttc tccagggtac ccatcatgat 601 ctcggagaac ccgaagcgct cgacgatctg ctcgttggtc cagcccagcg ccttcagcag 661 cacggtgacc ggctgacggc gcttgcggtc gatgcggacg ccgacggtgt cgcgcttgtc 721 gacgtcgaac tcgagccagg caccgcggcc ggggatgacc ttgacgctgt gcagggtctt 781 ctcggtcgac ttgtcgatgc tctcgtcgaa gtagacaccc ggcgaacgga cgagctgcga 841 gacgaccacg cgctcggtgc cgttgatgat gaaggtgccc atgtcggtca tcatcgggaa 901 atcacccatg aagaccgtct ggctcttgat ctcaccggtg ttgttgttga tgaactcggc 961 cgtgacgaac agcggggccg cgtacgtcat gtccttgtct ttgcactcgt cgacgggcgc 1021 cttgacctcg tcgaagcgcg ggtcgctgaa cgacagcgac atcgagccgg agaagtcttc 1081 gatcggcgaa agctccgtga ggatctcctc aaggccgccg gtcgggttca cctcaccgcg 1141 tgcagtcgca acctcacgcc agcgcggcga tccaaccagc cattcaaagg actccgtctg 1201 cacatcgagc aggccgggaa ccgcaagcgg ttcacgcagc ttggcgaagg aaagtcggct 1261 aggggcccca ggtacggagt tagtagttgc gttatcggtc ttagtctggc gagagactgc 1321 caagatcggt cctccagtgt tgccgcgcat

>Environmental_Location_2
1 acgacgcttg tgggtcagac ccgacagcgg gttgttctgg tccatgaact gcgacagctg 61 gctggttccg aagaactcct tgatcgccgc cacgacggga cggatgttga tcagggtctg 121 cggagtgatc gcctcgacgt cctgagtggt catgcgctcg cgcacgacgc gctccatacg 181 cgacaggccg acccggatct ggttctggat cagctcgccg acagtacgca gacgacggtt 241 accgaagtga tcgatgtcgt cgacctcgac ggggacctcg aggccgccgg gggcggtcat 301 cgtggtctgg ccctcgtgca gacgcaccag gtactcgatg gtggcgacga cgtcttcctc 361 ggtgagcgtg gtggcagtca ccagagccgg gttggcaccg ccaagaccca gcttcttgtt 421 caccttgtac cgacccacgc gggccaggtc gtaacgcttc tccttgaaga acaggttctc 481 cagcagggcc tgcgcggact ccttggtcgg cggctcgccc ggacgcagct tgcggtagat 541 gtccagcagc gcctcgtcgg gaccggcgat gttgtccttc tccagggtcc ccatcatgat 601 ctcggagaac ccgaaacgct cgacgatctg ctcgttggtc cagccgagtg ccttcagcag 661 cacggtgacg ggctgacgac gcttgcgatc gatgcgcaca cccacggtgt cgcgcttgtc 721 gacatcgaac tcgagccatg caccacggcc ggggatgacc ttgacgctgt gcagggtctt 781 ctcggtcgac ttatcgatgt tctcgtcgaa gtagacaccc ggcgaacgga cgagctgcga 841 cacgaccacg cgctcggtgc cgttgatgat gaaggtgccc atctcggtca tcatcgggaa 901 atcacccatg aagaccgtct ggctcttgat ctcgccggtg ttgttgttga tgaactcggc 961 cgtgacgaac agcggagccg cgtacgtcat gtccttgtct ttgcactcgt cgacgggcgc 1021 cttgacctcg tcgaagcgcg ggtcgctgaa agacagcgac atcgagcccg agaagtcctc 1081 gatcggcgaa agctccgtga ggatctcctc aaggccgccg gtcgggttca cctcaccgcg 1141 agcagttgca acttcacgcc accggggtga gccaaccagc cattcaaagg aatccgtctg 1201 cacatccagc aggccgggaa ccgcgagcgg ttcacgcagc ttggcgaagg aaagtcggct 1261 aggggcccca ggtacggagt tagtagttgc gttatcggtc ttagtctggc gagagactgc 1321 caagatcggt cctccagtgt tgccgcgcat

>Environmental_Location_3
1 atcttggcag tctctagcca gagcacagcg aacgctaaca ccaataactc cgtcccagga 61 gcaccaaacc gagtttcctt tgccaagctc cgcgaaccgc ttgaggttcc ggggctgctc 121 gacgttcaga cggattcttt cgactggctc gtgggttcgg atgaatggcg gcagaaggcc 181 gtcgatcgcg gtgagaccga ccccaagggc ggcctcgaag aggtgctcga agagctctcg 241 ccgatcgagg atttctcggg ctcgatgtcg ctgagcttct ccgacccgcg cttcgacgag 301 gtcaaggcgc cggtcgacga gtgcaaagac aaggacatga cgtacgcggc cccgctgttc 361 gtcacggccg agttcatcaa caacaacacc ggtgagatca agagccagac ggtcttcatg 421 ggtgacttcc cgatgatgac cgagaagggc accttcatca tcaacggcac cgagcgtgtc 481 gtggtgagcc agctcgtgcg ctctcccggt gtgtacttcg acgagagcat cgacaagtcc 541 accgagaaga cgctgcacag cgtcaaggtg atccccggcc gcggtgcgtg gctcgagttc 601 gacgtcgaca agcgcgacac cgtcggtgtg cgtatcgacc gcaagcgccg ccagccggtc 661 accgtgctgc tcaaggcgct cggttggacc aacgagcaga tcaccgagcg cttcggcttc 721 tccgagatca tgatgggcac cctggagaag gacagcaccg ccggtcccga cgaggcgctg 781 ctggacatct accgcaagct gcgtccgggc gagccgccga ccaaggagtc cgcgcagacc 841 ctgctggaga acctgttctt caaggagaag cgctacgacc tggcccgcgt cggccgctac 901 aaggtcaaca agaagctggg cctgaacgcc ggccagccga tcacgtcgtc gaccctcacc 961 gaggaagacg tcgtcgccac catcgagtac ctggtgcgcc tgcacgaggg ccagaccacg 1021 atgaccgtcc ccggcggcgt cgaggtcccg gtcgaggtgg acgacatcga ccacttcggt 1081 aaccgtcgtc tgcgtaccgt gggtgagctg atccagaacc agatccgggt cggcctgtcc 1141 cgcatggagc gcgtcgtgcg tgagcgcatg accacccagg acgtcgaggc gatcacgccg 1201 cagaccctga tcaacatccg tcccgtcgtg gcggcgatca aggagttctt cggcaccagc 1261 cagctgtcgc agttcatgga ccagaacaac ccgctgtcgg gtctgaccca caagcgtcg

>Environmental_Location_4
1 atcttggcag tctctagcca gagcaagtca gcgaacgcta tcaccaataa ctccgtccca 61 ggagcaccga accgagtttc atttgccaag ctccgtgaac cgcttgaggt tccggggcta 121 ctcgacgttc agaccgattc cttcgactgg ctcgtcggtg cggatgaatg gcggcagaag 181 gccgtcgatc gcggcgagac cgaccccaag ggcggcctcg aagaggtgct cgaagagctc 241 tccccgatcg aggatttctc gggctcgatg tcgctgagct tctccgaccc gcgcttcgac 301 gaggtcaaag ctccggtcga cgagtgcaaa gacaaggaca tgacgtacgc agccccgctg 361 ttcgtcacgg ccgagttcat caacaacaac accggtgaga tcaagagcca gacggtcttc 421 atgggtgact tcccgatgat gaccgagaag ggcaccttca tcatcaacgg caccgagcgt 481 gtcgtggtga gccagctcgt gcgctctccc ggtgtgtact tcgacgagag catcgacaag 541 tccaccgaga agacgctgca cagcgtcaag gtgatccccg gccgcggtgc gtggctggag 601 ttcgacgtcg acaagcgcga caccgtcggt gtgcgtatcg accgcaagcg tcgtcagccg 661 gtcaccgtgc tgctgaaggc gctgggctgg accaacgagc agatcaccga gcgcttcggc 721 ttctccgaga tcatgatggg caccctggag aaggacagca ccgccggtcc cgacgaggcg 781 ctgctggaca tctaccgcaa gctgcgtccg ggcgagccgc cgaccaagga gtccgcgcag 841 accctgctgg agaacctgtt cttcaaggag aagcgctacg acctggcccg cgtgggccgc 901 tacaaggtca acaagaagct gggcctgaac gccggccagc cgatcacgtc gtcgactctg 961 accgaggaag acgtcgtcgc caccatcgag tacctggtgc gcctgcacga gggccagacc 1021 acgatgaccg tccccggcgg cgtcgaggtc ccggtcgagg tggacgacat cgaccacttc 1081 ggtaaccgtc gtctgcgcac cgtgggcgag ctgatccaga accagatccg cgtcggcctg 1141 tcccgcatgg agcgcgtcgt gcgtgagcgc atgaccaccc aggacgtcga ggcgatcacc 1201 ccgcagaccc tgatcaacat ccgtcccgtc gtggcggcga tcaaggagtt cttcggaacg 1261 tcgcagctgt cgcagttcat ggatcagaac aacccgctgt cgggtctgac ccacaagcgt 1321 cgt

>Environmental_Location_5
acgacgcttg tgggtcaggc ccgacagcgg gttgttctgg tccatgaact gcgacagctg 61 gctggttccg aagaactcct tgatcgccgc cacgacggga cggatgttga tcagggtctg 121 cggggtgatc gcctcgacgt cctgcgtggt catgcgctca cgcacgacgc gctccatacg 181 ggacaggccg acccggatct ggttctgaat cagctcgccg acggtacgca ggcgacggtt 241 accgaagtgg tcgatgtcgt ccacatccac cggcacctcg acgccaccgg gggcggtcat 301 cgtggtctgg ccctcgtgca ggcgcaccag gtactcgatg gtggcgacga cgtcttcctc 361 ggtgagggtg gtggtggtca cctgagccgg attggtgccg cccaggccca gcttcttgtt 421 caccttgtac cgacccacgc gggccaggtc gtaacgcttc tccttgaaga acaggttctc 481 cagcagggcc tgcgccgact ccttggtcgg cggctcgccc gggcgcagct tgcggtagat 541 gtccagcaac gcctcgtcgg gaccggcgat gttgtccttc tccagggtgc ccatcatgat 601 ctcggagaac ccgaagcgct cgacgatctg ctcgttggtc cagcccagcg ccttcagcag 661 cacggtgacc ggctggcggc gcttgcggtc gatgcggacg ccgacggtgt cgcgcttgtc 721 gacgtcgaac tcgagccagg caccgcggcc ggggatgacc ttgacgctat gcagggtctt 781 ctcggtcgac ttgtcgatgc tctcgtcgaa gtagacaccc ggcgaacgga cgagctgcga 841 cacgaccacg cgctcggtgc cgttgatgat gaaggtgccc atatcggtca tcatcgggaa 901 atcacccatg aagaccgtct ggctcttgat ctcaccggtg ttgttgttga tgaactcggc 961 cgtgacgaac agcggggccg cgtacgtcat gtccttgtct ttgcactcgt cgacgggcgc 1021 cttgacctcg tcgaagcgcg ggtcgctgaa cgacagcgac atcgagccgg agaagtcttc 1081 gatcggcgaa agctccgtga ggatctcctc aaggccgccg gtcgggttca cctcaccgcg 1141 tgcagtcgca acctcacgcc agcgcggcga tccaaccagc cattcaaagg actccgtctg 1201 cacatcgagc aggccgggaa ccgcaagcgg ttcacgcagc ttggcgaagg aaagtcggct 1261 aggggcccca ggtacggagt tagtagttgc gttatcggtc ttagtctggc gagagactgc 1321 caagatcggt cctccagtgt tgccgcgcat

Based on this information, which location warrants further investigation by whole-genome sequencing?
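The environmental sequences above are written in a GenBank-style layout, with position numbers and spaces between blocks of bases. BLAST generally accepts pasted sequences in this form, but other tools may not. The following Python sketch shows one way to strip the numbering, reformat a sequence as plain FASTA, and produce its reverse complement. The function names are hypothetical and the sketch is only an illustration, not part of the lesson's required workflow.

```python
import re

# Translation map for complementing uppercase DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean_sequence(text):
    """Drop FASTA header lines, position numbers, and whitespace.

    Expects any header to be on its own line starting with '>'.
    Returns the remaining letters in uppercase.
    """
    lines = [line for line in text.splitlines() if not line.startswith(">")]
    return re.sub(r"[^A-Za-z]", "", "".join(lines)).upper()


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def to_fasta(name, seq, width=60):
    """Format a sequence as FASTA text with fixed-width lines."""
    rows = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(rows) + "\n"


# Short made-up fragment in the same numbered layout as the isolates.
raw = ">example\n1 acgacgcttg tgggtcagac 21 gctgg"
sequence = clean_sequence(raw)
print(to_fasta("example", sequence))
print(to_fasta("example_revcomp", reverse_complement(sequence)))
```

A nucleotide BLAST search compares both strands, so the orientation of a Sanger read should not change the species call. Orientation does matter when a sequence is translated, because only one strand carries the coding sequence.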

Part 3
You are also interested in variation in the protein sequences of these strains. The sequenced gene encodes the RNA polymerase beta subunit. Translate the environmental isolate sequences into amino acid sequences using GeneMarkS.
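To see what translation involves, the Python sketch below translates a DNA sequence in all six reading frames (three on each strand) and picks the frame with the longest stretch free of stop codons. This is a naive heuristic chosen for illustration. It is not the statistical gene-finding method that GeneMarkS uses, and the function names are hypothetical. It assumes the standard genetic code and a sequence containing only A, C, G, and T.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate codon by codon; '*' marks a stop, 'X' an unrecognized codon."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Return {frame label: protein} for frames +1..+3 and -1..-3."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(forward[offset:])
        frames["-%d" % (offset + 1)] = translate(reverse[offset:])
    return frames


def longest_open_stretch(protein):
    """Return the longest segment of a translation that contains no stop."""
    return max(protein.split("*"), key=len)


def best_frame(dna):
    """Pick the frame whose longest stop-free segment is longest."""
    frames = six_frame_translations(dna)
    label = max(frames, key=lambda name: len(longest_open_stretch(frames[name])))
    return label, longest_open_stretch(frames[label])


# Made-up example: the reverse complement of ATGGAACAGTTA (M E Q L).
print(best_frame("TAACTGTTCCAT"))
```

A frame label beginning with "-" means the coding sequence lies on the reverse strand of the sequence as written. On a short fragment several frames can tie, so the result is worth checking against the GeneMarkS output.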

Part 4
Perform a multiple sequence alignment on the translated protein sequences using CLUSTAL to examine amino acid variation in the RpoB protein among the environmental NTM isolates.
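Once the alignment is done, the columns of interest are those where the isolates do not all share the same residue. CLUSTAL marks fully conserved columns with an asterisk. The Python sketch below lists the remaining, variable columns from sequences that have already been aligned. The function name `variable_columns` and the example sequences are made up for illustration, and gaps are assumed to be written as "-".

```python
def variable_columns(aligned):
    """List alignment columns where the sequences differ.

    `aligned` maps a sequence name to its aligned string (gaps as '-').
    Returns (1-based column, {name: residue}) tuples.
    """
    names = list(aligned)
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    differences = []
    for column, residues in enumerate(zip(*aligned.values()), start=1):
        if len(set(residues)) > 1:
            differences.append((column, dict(zip(names, residues))))
    return differences


# Made-up aligned fragments, not taken from the lesson's isolates.
alignment = {
    "isolate_A": "MTDKGTF",
    "isolate_B": "MTEKGTF",
    "isolate_C": "MTEK-TF",
}
for column, residues in variable_columns(alignment):
    print(column, residues)
```

Each reported column is a candidate amino acid substitution, or a gap, to follow up when you consider drug binding in Part 5.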

Part 5
In addition to being an essential protein involved in transcription, the RNA polymerase beta subunit is an important drug target in both NTM and M. tuberculosis.

Search the Protein Data Bank (PDB) using the translated Environmental_Location_5 sequence to see whether structures of proteins with similar sequences have been determined.