5.1 Genome Mapping: Genetic and Physical
The genome of an organism is its complete genetic information, located in the nucleoid region and plasmids for prokaryotes, and in the chromosomes and organelles for eukaryotes.
Genome mapping is essential for identifying conserved genes or DNA sequences and for understanding the relationships between organisms.
Genome maps can be created by genetic or physical mapping: genetic maps are based on recombination frequencies, whereas physical maps are based on direct measurement of DNA.
A genetic map arranges genetic loci according to genetic distance, while a physical map shows the actual distance between markers along the DNA.
Ultimately, the complete DNA sequences of different samples can be compared directly to reveal their similarities and differences.
======
5.1.1 Genetic Mapping
Genetic mapping provides an estimate of distances between genetic loci associated with known phenotypes.
A genetic map is constructed by crossover analysis: one map unit corresponds to a recombination frequency of one percent, that is, 1% of the offspring being recombinants.
This unit is called the centimorgan (cM); it expresses the genetic distance between genes as calculated from the frequency of recombinant phenotypes among the offspring.
A lower recombination frequency indicates that two loci lie closer together.
Because relatively few genetic loci have known phenotypes, genetic maps are sparsely populated, which limits their usefulness for fine-mapping new phenotypes to the genome.
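The arithmetic behind map units can be sketched in a few lines. This is a minimal illustration, assuming a test cross in which recombinant offspring can be counted directly; the function names and the offspring counts are hypothetical. It converts recombinant counts to centimorgans and orders three loci by placing the most distant pair on the outside.

```python
def map_distance_cm(recombinants: int, total_offspring: int) -> float:
    """Genetic distance in centimorgans: 1 cM = 1% recombinant offspring."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinants <= total_offspring:
        raise ValueError("recombinants must be between 0 and total_offspring")
    return 100.0 * recombinants / total_offspring


def order_three_loci(distances: dict[frozenset[str], float]) -> list[str]:
    """Order three loci: the pair with the largest distance flanks the middle locus."""
    outer = max(distances, key=distances.get)
    loci = set().union(*distances)
    (middle,) = loci - outer
    left, right = sorted(outer)
    return [left, middle, right]


# Hypothetical counts out of 1000 offspring for each pair of loci.
distances = {
    frozenset({"A", "B"}): map_distance_cm(180, 1000),  # 18 cM
    frozenset({"B", "C"}): map_distance_cm(70, 1000),   # 7 cM
    frozenset({"A", "C"}): map_distance_cm(250, 1000),  # 25 cM
}
print(order_three_loci(distances))  # ['A', 'B', 'C']
```

Distances add up neatly here (18 + 7 = 25) because the data are invented; for widely separated loci, undetected double crossovers make the observed recombination frequency an underestimate, so map distances are most reliable for closely linked loci.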
======
5.1.2 Physical Mapping
Physical mapping of the genome is an alternative to genetic mapping, where specific locations on the genome are identified using DNA-based map features.
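Restriction Fragment Length Polymorphism (RFLP) mapping uses restriction enzymes, DNA endonucleases that cut DNA at specific recognition sequences, to fragment DNA. Sequence variation that creates or destroys a recognition site changes the lengths of the resulting fragments, and these length differences serve as map markers.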
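A toy in-silico digest shows how a single base change produces a fragment length polymorphism. This sketch assumes linear DNA and models EcoRI (recognition site GAATTC, cut between G and A); the two allele sequences are hypothetical.

```python
def restriction_fragments(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Fragment lengths after cutting linear DNA at every occurrence of `site`.

    The defaults model EcoRI, which recognises GAATTC and cuts after the first G.
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    boundaries = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(boundaries, boundaries[1:])]


allele_1 = "AAAAGAATTCTTTTTTTTGAATTCCCCC"  # hypothetical allele with two EcoRI sites
allele_2 = "AAAAGAATTCTTTTTTTTGAGTTCCCCC"  # second site destroyed by an A->G change
print(restriction_fragments(allele_1))  # [5, 14, 9]
print(restriction_fragments(allele_2))  # [5, 23]
```

On a gel the two alleles would give different band patterns, which is what makes the site usable as a marker.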
Simple Sequence Length Polymorphisms (SSLPs) are arrays of repeat sequences, such as microsatellites and minisatellites, in which the number of repeat units varies between individuals; these length variants can be used as markers for physical mapping of the genome.
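The length variation that SSLP markers rely on can be illustrated by counting tandem copies of a repeat unit. This is a minimal sketch, assuming a hypothetical CA microsatellite between hypothetical flanking sequences; it reports the longest uninterrupted run of the unit and the overall allele length.

```python
def tandem_copies(seq: str, unit: str) -> int:
    """Longest uninterrupted run of `unit` repeated head to tail in `seq`."""
    k = len(unit)
    best = 0
    for start in range(len(seq) - k + 1):
        copies, pos = 0, start
        while seq.startswith(unit, pos):
            copies += 1
            pos += k
        best = max(best, copies)
    return best


flank_left, flank_right = "GGT", "TTG"  # hypothetical flanking sequence
allele_a = flank_left + "CA" * 12 + flank_right
allele_b = flank_left + "CA" * 15 + flank_right
for allele in (allele_a, allele_b):
    print(tandem_copies(allele, "CA"), len(allele))
# 12 30
# 15 36
```

The three extra repeat units make allele_b 6 bp longer, which is the kind of difference detected when the region is amplified and sized.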
Sequence Tagged Sites (STSs) are short DNA sequences (typically 200-500 bp) that occur only once in a genome and whose location is known. Because each STS can be detected easily by polymerase chain reaction (PCR), STSs serve as useful landmarks when constructing physical maps of a genome.
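Testing a clone for an STS amounts to asking whether a primer pair would give a PCR product. The sketch below is a simplified in-silico PCR, assuming exact primer matches on one strand only; the primer and clone sequences are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def sts_product_size(template: str, forward: str, reverse: str) -> Optional[int]:
    """Size of the PCR product for an STS primer pair, or None if the STS is absent."""
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer's binding site appears on this strand as its reverse complement,
    # downstream of the forward primer.
    end = template.find(reverse_complement(reverse), start + len(forward))
    if end == -1:
        return None
    return end + len(reverse) - start


forward, reverse = "GATTACAGAT", "CCGGTTAACG"  # hypothetical STS primer pair
clone_1 = "TTTT" + forward + "ACGTACGTAC" + reverse_complement(reverse) + "GGGG"
clone_2 = "ACGT" * 10
print(sts_product_size(clone_1, forward, reverse))  # 30
print(sts_product_size(clone_2, forward, reverse))  # None
```

Scoring many clones against many STSs this way gives a presence/absence table; clones that share STSs must overlap, which is how such landmarks order clones on a physical map.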
Combining restriction maps with STS maps improves resolution and yields a high-resolution physical map of the genome.
======
5.2 High-Throughput DNA Sequencing
High-throughput DNA sequencing makes it possible to obtain the complete DNA sequence of an organism.
Previously, sequencing complete genomes was expensive, so other techniques were first used to narrow down the region of interest for sequencing.
Advances in sequencing technology made it possible to sequence the complete DNA of any organism, although the cost was initially prohibitive.
DNA sequencing technology has evolved significantly over the past three decades.
Sequencing is now far more affordable, making it possible to sequence the genomes of eukaryotic organisms, including human beings.
======
5.2.1 First generation DNA sequencing technology
The first generation of DNA sequencing technology was a multi-step procedure: separation of chromosomes on agarose gels by pulsed-field gel electrophoresis (PFGE), restriction digestion, ligation into high-capacity cloning vectors, physical mapping, and subcloning into sequencing plasmids.
The DNA segments from these clones were sequenced by chain-termination (Sanger) sequencing. This method generates a ladder of single-stranded DNA fragments, each ending at a terminating nucleotide that carries a base-specific fluorescent tag.
The fragments are then separated by size using capillary gel electrophoresis, and a fluorescence detector reads each single-stranded fragment as it passes. The output is a fluorescence chromatogram in which each peak represents a base position and the color of the peak indicates which base is present.
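Why sorting the ladder by size reads out the sequence can be shown with a small simulation. This is a sketch under simplifying assumptions: every possible chain-terminated fragment is present, size separation is perfect, and the dye colour assigned to each base (held in the hypothetical `DYE` table) is illustrative only.

```python
import random

# Assumed dye colour for each terminating base; illustrative only.
DYE = {"A": "green", "C": "blue", "G": "black", "T": "red"}
BASE_FROM_DYE = {colour: base for base, colour in DYE.items()}


def chain_terminated_fragments(new_strand: str) -> list[tuple[int, str]]:
    """One fragment per position: (length, dye colour of its terminating base)."""
    return [(i + 1, DYE[base]) for i, base in enumerate(new_strand)]


def read_electropherogram(fragments: list[tuple[int, str]]) -> str:
    """Shorter fragments reach the detector first; each peak's colour names one base."""
    peaks = sorted(fragments, key=lambda frag: frag[0])
    return "".join(BASE_FROM_DYE[colour] for _, colour in peaks)


ladder = chain_terminated_fragments("ATGCCGTA")
random.Random(0).shuffle(ladder)  # the reaction produces fragments in no particular order
print(read_electropherogram(ladder))  # ATGCCGTA
```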
Each capillary can read about 800-1000 bases per run, and a single sequencer can operate up to 96 capillaries in parallel, giving up to about 96,000 bases per run per machine.
However, this technology is time-consuming and labor-intensive, which makes genome sequencing costly. Next generation DNA sequencing technology was developed to overcome this challenge.
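A quick throughput estimate makes the cost problem concrete. This sketch assumes the figure of about 96,000 bases per run given above, plus a hypothetical 1,000,000 bp genome that must be read 10 times over; it ignores the cloning and subcloning work, which adds further effort.

```python
import math


def runs_needed(genome_size_bp: int, coverage: float, bases_per_run: int = 96 * 1000) -> int:
    """Capillary runs required to read `genome_size_bp` bases `coverage` times over."""
    total_bases = genome_size_bp * coverage
    return math.ceil(total_bases / bases_per_run)


print(runs_needed(1_000_000, 10))  # 105
```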
======
5.2.2 Next generation (second) DNA sequencing technology
Next generation DNA sequencing uses a massively parallel approach, allowing millions of sequencing reactions to occur at once.
This technology eliminates the need for time-consuming cloning and subcloning steps, reducing the cost of genome sequencing projects.
Illumina sequencing is a popular example of next generation sequencing: DNA fragments are attached to a flow cell and amplified in place by bridge PCR amplification, producing clusters of identical copies.
Sequencing is carried out with fluorescently tagged dNTPs, each base carrying a different fluorescent tag. In each cycle one tagged base is incorporated and its fluorescence is recorded, so the series of signals reveals the sequence of the DNA fragment.
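The cycle-by-cycle logic, in which every cluster is read in parallel one base at a time, can be sketched as follows. This is a simplified model, assuming four distinct dye colours (the `DYE` mapping is hypothetical) and error-free incorporation; cluster names and strands are invented.

```python
DYE = {"A": "green", "C": "blue", "G": "black", "T": "red"}  # assumed dye per base
BASE_FROM_DYE = {colour: base for base, colour in DYE.items()}


def sequence_by_synthesis(clusters: dict[str, str], cycles: int) -> dict[str, str]:
    """Simulate reading many clusters in parallel, one base per cycle.

    `clusters` maps a cluster ID to the strand being synthesised in that cluster.
    """
    reads = {cluster_id: [] for cluster_id in clusters}
    for cycle in range(cycles):
        # One imaging step per cycle: each cluster shows the colour of the base it just incorporated.
        image = {cid: DYE[strand[cycle]] for cid, strand in clusters.items() if cycle < len(strand)}
        for cid, colour in image.items():
            reads[cid].append(BASE_FROM_DYE[colour])
    return {cid: "".join(bases) for cid, bases in reads.items()}


print(sequence_by_synthesis({"cluster_1": "ACGTAC", "cluster_2": "TTGACA"}, cycles=4))
# {'cluster_1': 'ACGT', 'cluster_2': 'TTGA'}
```

The read length equals the number of cycles run, which is one reason these reads are short compared with capillary reads.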
Next generation sequencing can sequence millions of DNA fragments in parallel, but it produces shorter reads (75 to 300 bases long) than first generation technology.
======
5.2.3 Some recent advances in DNA sequencing technology
A recent advance in DNA sequencing technology is the use of nanopores, also known as third generation sequencing technology.
The method measures the characteristic change in electric current across a nanopore as single-stranded DNA passes through it.
A DNA helicase captures and unwinds double-stranded DNA and pushes one strand through the nanopore, while an ionic current across the pore is maintained and monitored continuously.
The pore-forming, porin-like molecule interacts differently with each base of the single-stranded DNA, disrupting the ionic current in a characteristic way and allowing the corresponding bases to be identified.
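The idea of turning a current trace into bases can be sketched with a toy model: split the trace wherever the current jumps, average each segment, and assign the base whose reference level is closest. The current levels in `LEVELS_PA` and the trace are hypothetical, and this is not how production base callers work.

```python
# Hypothetical mean current (picoamperes) for each base; real values are chemistry-specific.
LEVELS_PA = {"A": 80.0, "C": 95.0, "G": 70.0, "T": 105.0}


def segment(samples: list[float], jump: float = 5.0) -> list[float]:
    """Split the trace wherever it jumps by more than `jump` and average each segment."""
    if not samples:
        return []
    events, current = [], [samples[0]]
    for prev, value in zip(samples, samples[1:]):
        if abs(value - prev) > jump:
            events.append(sum(current) / len(current))
            current = [value]
        else:
            current.append(value)
    events.append(sum(current) / len(current))
    return events


def call_bases(events: list[float]) -> str:
    """Assign each segment to the base whose reference level is closest."""
    return "".join(min(LEVELS_PA, key=lambda b: abs(LEVELS_PA[b] - level)) for level in events)


trace = [80.5, 79.8, 80.1, 95.2, 94.6, 70.3, 69.8, 70.1, 104.9, 105.3, 80.2]
print(call_bases(segment(trace)))  # ACGTA
```

In a real pore the current reflects several bases inside the pore at once, and this toy cannot separate repeated identical bases, so practical base calling relies on far more elaborate statistical models.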
Advantages of nanopore based sequencing technology include rapid and simple sample processing, real-time sequencing results, and production of very long sequencing reads at a relatively low cost.
======
5.3 Other Genome-related Technologies
Genome sequencing has various forms, including whole genome sequencing (WGS), targeted sequencing, and metagenomics.
WGS determines the DNA sequence of an entire genome at once.
Targeted sequencing focuses on specific regions of interest in a genome.
Metagenomics is the genomic analysis of microbial communities found in different environments.
These different sequencing methods cater to various research and application needs in genomics.
======
5.3.1 Whole Genome Sequencing (WGS)
Whole Genome Sequencing (WGS) is a method of determining the DNA sequence of an organism’s entire genome.
It was first applied to a free-living organism, Haemophilus influenzae, and has since been used to understand genetic regulation and to identify genetic disorders.
The Human Genome Project began in 1990 and was completed in 2003; with today's sequencing technologies, an entire genome can be sequenced in a matter of days.
WGS can help create personalized treatment plans for human diseases and can help physicians select the most suitable chemotherapy by revealing genetic variations in cancer cells.
There are two types of WGS: reference-based genome sequencing, which uses a reference genome to assemble individual sequencing reads, and de novo genome sequencing, which does not use a reference.
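The difference between the two strategies can be sketched with toy reads. This is a minimal illustration, assuming error-free reads and exact matching: the reference-based function simply looks each read up in a known reference, and the de novo function uses a simple greedy overlap merge (a textbook teaching method, not the algorithm of any particular assembler). Reads and reference are hypothetical.

```python
def map_to_reference(reads: list[str], reference: str) -> list[tuple[str, int]]:
    """Reference-based: place each read where it matches the reference exactly (-1 if nowhere)."""
    return [(read, reference.find(read)) for read in reads]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0 if below `min_len`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """De novo: repeatedly merge the two contigs with the largest suffix-prefix overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: keep separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)] + [merged]
    return contigs


reads = ["ACGTTG", "TTGCAT", "CATGTC"]
print(map_to_reference(reads, "ACGTTGCATGTC"))  # [('ACGTTG', 0), ('TTGCAT', 3), ('CATGTC', 6)]
print(greedy_assemble(reads))                    # ['ACGTTGCATGTC']
```

Greedy merging is easily misled by repeated sequences, which is one reason de novo assembly is harder than aligning reads to an existing reference.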
======
5.3.2 Targeted sequencing
Targeted sequencing focuses on sequencing and analyzing selected genes or genomic regions to identify variations such as point mutations, insertions, or deletions.
The new reads are assembled against a reference genome sequence, and the approach can be used to compare a smaller set of genomic regions among different genomes.
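Detecting a point mutation against a reference can be sketched as a simple pileup count. This assumes the reads have already been aligned to the reference without insertions or deletions, and the thresholds (`min_depth`, `min_fraction`) and sequences are hypothetical choices for illustration, not settings of any real variant caller.

```python
from collections import Counter


def call_snvs(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Report (position, reference base, alternative base) for well-supported substitutions.

    `aligned_reads` is a list of (start, read) pairs already placed on the reference without indels.
    """
    pileup = [Counter() for _ in reference]
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1
    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        base, support = counts.most_common(1)[0]
        if base != reference[pos] and support / depth >= min_fraction:
            variants.append((pos, reference[pos], base))
    return variants


reference = "ACGTACGTAC"
reads = [(0, "ACGTACT"), (2, "GTACTTA"), (3, "TACTTAC"), (4, "ACTTAC")]
print(call_snvs(reference, reads))  # [(6, 'G', 'T')]
```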
Clinical exome sequencing, a type of targeted sequencing, covers genes known to be associated with disease and is more cost-effective than whole genome sequencing for genetic diagnostic tests.
ChIP-Seq (chromatin immunoprecipitation followed by sequencing) projects map the genome-wide DNA binding sites of a transcription regulator to understand the biology of an organism at the genome level.
RNA-Seq projects study the global gene expression profile (transcriptome) of an organism, tissue, or sample by sequencing cDNA and mapping the reads to genes in the genome; the number of reads mapped to a gene reflects how strongly it is expressed.
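Raw read counts are usually normalized before expression levels are compared. The sketch below shows two common normalizations, counts per million (CPM, correcting for sequencing depth) and transcripts per million (TPM, also correcting for gene length); the gene names, counts, and lengths are hypothetical.

```python
def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Counts per million: scale each gene's count by the total number of mapped reads."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def tpm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million: divide by gene length (kb) first, then scale to one million."""
    rates = {gene: n / (lengths_bp[gene] / 1000) for gene, n in counts.items()}
    total = sum(rates.values())
    return {gene: 1e6 * r / total for gene, r in rates.items()}


counts = {"geneA": 100, "geneB": 300, "geneC": 600}       # hypothetical read counts
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}   # hypothetical gene lengths (bp)
print(cpm(counts))           # {'geneA': 100000.0, 'geneB': 300000.0, 'geneC': 600000.0}
print(tpm(counts, lengths))  # {'geneA': 200000.0, 'geneB': 200000.0, 'geneC': 600000.0}
```

geneB collects three times as many reads as geneA, but it is also three times longer, so after length normalization the two genes appear equally expressed.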
======
5.3.3 Metagenomics: Sequencing of DNA or cDNA present in a microbial community
Metagenomics is the study of total genetic material obtained directly from microorganisms in a specific environment.
It has applications in medical microbiology, agriculture, and environmental microbiology, aiding in studying microbial community diversity and environmental changes.
Metagenomics can help identify novel genes or enzymes with significant industrial applications, such as enzymes from extreme environments that remain stable and functional at high temperatures.
The approach provides insights into the genomes of microbes from diverse habitats, such as the gut, the throat, and even toilet seats, helping to reveal correlations among them.
Studying the diverse virus genomes in metagenomic samples can provide information on virus-host interactions, epidemiology, and evolution. However, because each sample contains many genomes mixed together, data analysis is challenging and requires specialized computing algorithms.
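One family of such algorithms assigns each read to the reference genome with which it shares the most short subsequences (k-mers). This is a minimal sketch of that idea, assuming tiny hypothetical reference genomes and k = 4; real classifiers use large reference databases, longer k-mers, and taxonomic trees.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_reads(reads: list[str], references: dict[str, str], k: int = 4) -> list[str]:
    """Assign each read to the reference sharing the most k-mers with it."""
    index = {name: kmers(genome, k) for name, genome in references.items()}
    labels = []
    for read in reads:
        read_kmers = kmers(read, k)
        shared = {name: len(read_kmers & genome_kmers) for name, genome_kmers in index.items()}
        best = max(shared, key=shared.get)
        labels.append(best if shared[best] > 0 else "unclassified")
    return labels


references = {"taxon_A": "ATGCGTACGTTAGC", "taxon_B": "GGCCTTAAGGCCAA"}  # hypothetical genomes
print(classify_reads(["GCGTACG", "CTTAAGG", "TTTTTTT"], references))
# ['taxon_A', 'taxon_B', 'unclassified']
```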
======
5.4 Genome Engineering
Genome engineering is a technology used to modify an organism’s genome.
It is done to introduce or remove one or more genes to provide new functionality or modify/remove existing functionality.
One approach uses transposons, or jumping genes, to inactivate or delete genes in a genome.
Another approach, genome editing, modifies a chosen DNA sequence without causing additional, unwanted changes.
The main goal is to precisely modify the genome to suit specific needs or applications.
======
5.4.1 Knock-out and knock-in of a gene by transposon insertion
Transposons, also known as “jumping genes,” are DNA sequences that can move from one location to another on the genome, found in both prokaryotes and eukaryotes.
Transposons typically move via a “cut and paste” mechanism, where they excise from one location and integrate elsewhere in the genome, potentially causing the insertional inactivation of genes at the site of insertion.
This behavior can be exploited to knock-out an existing gene by engineering the transposon to recognize a specific sequence of a target gene, disrupting the coding frame and preventing the production of the original transcript.
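How an insertion knocks out a gene can be shown by translating a coding sequence before and after the event. This sketch uses the standard genetic code; the toy gene, the insertion position, and the short inserted sequence (standing in for a transposon, which in reality is much longer) are all hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate from the first codon until a stop codon (or the end of the sequence)."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def insert_element(cds: str, position: int, element: str) -> str:
    """Insert a (hypothetical) transposon sequence at `position` in the coding sequence."""
    return cds[:position] + element + cds[position:]


gene = "ATGGCTGAAAAACTGGGTTGA"                   # toy gene: M A E K L G, then stop
disrupted = insert_element(gene, 9, "TAGGAT")  # toy insert carrying an in-frame stop codon
print(translate(gene))       # MAEKLG
print(translate(disrupted))  # MAE
```

An insert whose length is not a multiple of three would instead shift the reading frame, which also scrambles the downstream protein.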
Transposons can also alter the genetic locus of interest by adding DNA sequences that were not previously present. This property can be used to knock in a functional gene, helping to generate animal and plant models for understanding the molecular basis of diseases and for developing new drugs.
Genome engineering through transposition is a valuable tool in molecular biology and biotechnology, enabling the manipulation of genes and genetic material in various organisms.
======
5.4.2 Genome editing using Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)
Genome editing is a set of techniques used to change the DNA of an organism by adding, removing, or altering specific DNA sequences.
CRISPR-Cas9 is a genome editing technology adapted from a naturally occurring bacterial system that stores a DNA record of past viral attacks.
In CRISPR-Cas9, a guide RNA directs the Cas9 protein to a specific region of genomic DNA, where Cas9 makes a double-strand break. Repair of the break can then knock out the gene (through error-prone non-homologous end joining) or introduce a precise edit (through homologous recombination with a supplied DNA template).
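Choosing where a guide RNA can direct Cas9 starts with a sequence scan. For the widely used Streptococcus pyogenes Cas9, a target is a 20-nucleotide protospacer immediately followed by an NGG PAM, and the cut falls about 3 bp upstream of the PAM. This sketch scans only the forward strand of a hypothetical sequence; real guide design also scans the reverse strand and checks for off-target matches.

```python
import re


def find_cas9_targets(seq: str, protospacer_len: int = 20) -> list[tuple[int, str, str, int]]:
    """Forward-strand sites of the form <protospacer><N><G><G>.

    Returns (start, protospacer, PAM, cut), where the expected cut lies between
    seq[cut - 1] and seq[cut], 3 bp upstream of the PAM.
    """
    # The lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    targets = []
    for match in pattern.finditer(seq):
        start = match.start()
        protospacer, pam = match.group(1), match.group(2)
        targets.append((start, protospacer, pam, start + protospacer_len - 3))
    return targets


seq = "TTTT" + "ACGTACGTACGTACGTACGT" + "AGG" + "TTTT"  # hypothetical target region
print(find_cas9_targets(seq))  # [(4, 'ACGTACGTACGTACGTACGT', 'AGG', 21)]
```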
The CRISPR-Cas9 genetic scissors, developed by Emmanuelle Charpentier and Jennifer A. Doudna, are a gene editing method that can modify the DNA of living organisms with high precision.
Bacteria use CRISPR-Cas systems to recognize the DNA of invading viruses and cut it into inactive segments, an immune 'memory' of past infections. Charpentier and Doudna reprogrammed the system so that the Cas9 protein cuts DNA at any desired target sequence.
======
5.5 Structural, Functional and Comparative Genomics
Computational genomics involves the use of high performance computing clusters and workstations to analyze genomic data.
It aids in understanding genome functions through computational and statistical analysis of DNA and RNA sequences and other experimental data.
A key aspect is identifying common genes across different organisms through homology search and gene annotation.
Sequence comparisons can be carried out with tools such as BLAST, ClustalW, and PHYLIP to understand evolutionary relationships.
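The core operation behind such comparisons is sequence alignment. The sketch below implements classic Needleman-Wunsch global alignment with a simple scoring scheme (match +1, mismatch -1, gap -1, all assumed values); it illustrates the principle only, since BLAST uses fast heuristic local alignment and ClustalW builds multiple alignments. The two sequences are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of `a` and `b`; returns (score, aligned_a, aligned_b)."""
    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


score, top, bottom = needleman_wunsch("GATTACA", "GATCACA")
print(score, top, bottom)                       # 5 GATTACA GATCACA
print(round(percent_identity(top, bottom), 1))  # 85.7
```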
Advances in structural genomics, driven by powerful graphics processing units (GPUs) and general-purpose GPU (GPGPU) programming with CUDA, provide 3D information on proteins encoded by the genome.
The Department of Biotechnology, Government of India, has established three National Genomics Core facilities to support genomic projects for Indian scientists and entrepreneurs.
======
5.5.1 Structural genomics
Structural genomics has two interpretations: 3D structure of proteins encoded by a genome and physical nature of the whole genome.
Originally, structural genomics focused on finding new protein folds or structures in newly sequenced genomic DNA.
The expanded view of structural genomics includes studying structural organization of DNA regions in chromosomes and nucleosome status of the genome.
This expanded view also covers large structural changes in genome organization of related species.
For example, genes shared by mice and humans are distributed differently across the chromosomes of the two species.
======
5.5.2 Functional genomics
Functional genomics is a field that aims to understand how the information encoded in the genome is executed physiologically.
It uses various genomics-based tools and techniques to achieve this, such as:
RNA-Seq: used to investigate the transcriptome.
ChIP-Seq: used to detect and map DNA-protein interactions.
Metagenomics: used to study the genetic material from environmental samples.
These tools and techniques help in understanding both physiological and pathological functions related to the state of a cell.
In metagenomics, the focus is on the functional capabilities of microbial communities.
======
5.5.3 Comparative genomics
Comparative genomics involves comparing genes and genomes of multiple species or individuals of the same species.
It aids in genome annotation of newly sequenced genomes by comparing them with well-known genomes.
It helps identify common genes (core genome) and unique genes specific to a species, which may influence unique behaviors or functions.
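Given a list of genes for each genome, the core and species-specific genes reduce to set operations. This sketch assumes gene presence has already been determined (for example by homology search); the strain names and gene names are hypothetical.

```python
def core_and_unique_genes(genomes: dict[str, set[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Core genes are shared by every genome; unique genes occur in exactly one genome."""
    core = set.intersection(*genomes.values())
    unique = {}
    for name, genes in genomes.items():
        others = set().union(*(g for other, g in genomes.items() if other != name))
        unique[name] = genes - others
    return core, unique


genomes = {  # hypothetical gene content
    "strain_1": {"gene_A", "gene_B", "gene_C", "gene_X"},
    "strain_2": {"gene_A", "gene_B", "gene_C", "gene_Y"},
    "strain_3": {"gene_A", "gene_B", "gene_Z"},
}
core, unique = core_and_unique_genes(genomes)
print(sorted(core))  # ['gene_A', 'gene_B']
print({name: sorted(genes) for name, genes in unique.items()})
# {'strain_1': ['gene_X'], 'strain_2': ['gene_Y'], 'strain_3': ['gene_Z']}
```

gene_C belongs to neither group: it is shared by two strains but missing from the third.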
Comparative genomics can determine the presence or absence of functional molecules in a genome.
It serves as a foundation for genome-based taxonomy and phylogenetic lineage studies.
======
5.6 Protein Engineering
Protein engineering involves creating and producing engineered proteins with additional or extended properties.
Engineered proteins can be made more stable when exposed to harsh conditions such as elevated temperature, changes in pH, or the presence of salts or organic solvents.
This process can lead to the development of novel reagents for research, diagnostics, and therapeutics.
Protein engineering is an advanced application of recombinant DNA (rDNA) technology.
The goal is to produce proteins with improved properties compared to the original ones.
======
5.6.1 Applications of protein engineering
Protein engineering can be used to identify critical amino acids and alter their codons by point mutation, replacing specific amino acids to modulate a desired activity.
An example of this is the engineering of subtilisin, a proteolytic enzyme used in detergents, to improve its stability in the presence of bleach by replacing the codon for Met 222 with that of Ala through site-directed mutagenesis.
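At the sequence level, site-directed mutagenesis of this kind swaps one codon for another. This sketch uses a hypothetical toy gene in which residue 4 is Met, rather than the real subtilisin gene, and numbers residues from the start codon; it checks that the expected codon is present before replacing it with an Ala codon.

```python
def substitute_codon(cds: str, residue: int, expected_codon: str, new_codon: str) -> str:
    """Replace the codon of `residue` (1-based, counted from the start codon)."""
    start = (residue - 1) * 3
    old_codon = cds[start:start + 3]
    if old_codon != expected_codon:
        raise ValueError(f"residue {residue} is encoded by {old_codon}, not {expected_codon}")
    return cds[:start] + new_codon + cds[start + 3:]


toy_cds = "ATGAAAGCTATGGGTTAA"                       # toy gene: M K A M G, then stop
mutant = substitute_codon(toy_cds, 4, "ATG", "GCT")  # Met4 -> Ala (GCT encodes Ala)
print(mutant)  # ATGAAAGCTGCTGGTTAA
```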
Another application is the 6-His tag, a peptide of six consecutive histidine residues, whose coding sequence is fused in frame with the gene of a recombinant protein of interest so that the protein can be purified by affinity chromatography.
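The key requirement of such a fusion is that the tag stays in the reading frame. This sketch adds six His codons directly after the start codon of a hypothetical toy gene and translates the result with the standard genetic code; real expression constructs often also include a linker or protease cleavage site, which is omitted here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate from the first codon until a stop codon (or the end of the sequence)."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def add_n_terminal_his_tag(cds: str, his_codons: str = "CACCATCACCATCACCAT") -> str:
    """Insert six His codons right after the ATG start codon, keeping the reading frame."""
    if not cds.startswith("ATG"):
        raise ValueError("coding sequence must start with ATG")
    if len(his_codons) % 3 != 0:
        raise ValueError("tag length must be a multiple of 3 to stay in frame")
    return cds[:3] + his_codons + cds[3:]


tagged = add_n_terminal_his_tag("ATGAAAGCTGGTTAA")  # toy gene: M K A G, then stop
print(translate(tagged))  # MHHHHHHKAG
```

The same in-frame rule applies to the GFP and immunotoxin fusions described below: the joined coding sequences must share one reading frame with no stop codon between them.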
Protein engineering can also be used to track the location of proteins in a cell by fusing the gene encoding green fluorescent protein (GFP) in frame with the DNA sequence of interest; the resulting fusion protein fluoresces green when exposed to UV light.
Recombinant immunotoxins, rationally engineered protein agents, can be generated by fusing the DNA coding region of an antibody that recognizes a cancer cell marker with the DNA code of a toxin, combining the selectivity of the antibody with the cell-killing potency of the toxin.
======