Each nucleated cell in a multicellular organism contains copies of the same DNA. Similarly, all cells in two pure bacterial cultures inoculated from the same starting colony contain the same DNA, with the exception of changes that arise from spontaneous mutations. If each cell in a multicellular organism has the same DNA, then how is it that cells in different parts of the organism’s body exhibit different characteristics? Similarly, how is it that the same bacterial cells within two pure cultures exposed to different environmental conditions can exhibit different phenotypes? In both cases, each genetically identical cell does not turn on, or express, the same set of genes. Only a subset of proteins in a cell at a given time is expressed.

Genomic DNA contains both structural genes, which encode products that serve as cellular structures or enzymes, and regulatory genes, which encode products that regulate gene expression. The expression of a gene is a highly regulated process. Whereas regulating gene expression in multicellular organisms allows for cellular differentiation, in single-celled organisms like prokaryotes, it primarily ensures that a cell’s resources are not wasted making proteins that the cell does not need at that time.

Elucidating the mechanisms controlling gene expression is important to the understanding of human health. Malfunctions in this process in humans lead to the development of cancer and other diseases. Understanding the interaction between the gene expression of a pathogen and that of its human host is important for the understanding of a particular infectious disease. Gene regulation involves a complex web of interactions within a given cell among signals from the cell’s environment, signaling molecules within the cell, and the cell’s DNA. These interactions lead to the expression of some genes and the suppression of others, depending on circumstances.

Prokaryotes and eukaryotes share some similarities in their mechanisms to regulate gene expression; however, gene expression in eukaryotes is more complicated because of the temporal and spatial separation between the processes of transcription and translation. Thus, although most regulation of gene expression occurs through transcriptional control in prokaryotes, regulation of gene expression in eukaryotes occurs at the transcriptional level and post-transcriptionally (after the primary transcript has been made).

Back to the Top

13.1 Prokaryotic Gene Regulation

In bacteria and archaea, structural proteins with related functions are usually encoded together within the genome in a block called an operon and are transcribed together under the control of a single promoter, resulting in the formation of a polycistronic transcript (Figure 13.1). In this way, regulation of the transcription of all of the structural genes encoding the enzymes that catalyze the many steps in a single biochemical pathway can be controlled simultaneously, because they will either all be needed at the same time, or none will be needed. For example, in E. coli, all of the structural genes that encode enzymes needed to use lactose as an energy source are encoded next to each other in the lactose (or lac) operon under the control of a single promoter, the lac promoter. French scientists François Jacob (1920–2013) and Jacques Monod at the Pasteur Institute were the first to show the organization of bacterial genes into operons, through their studies on the lac operon of E. coli. For this work, they won the Nobel Prize in Physiology or Medicine in 1965.

Diagram of an operon. At one end is a regulatory gene; the operon proper begins further down. The operon is composed of a promoter, an operator, and structural genes (in this case 4, labeled A – D). Transcription produces a single mRNA strand that contains all the structural genes. Translation of this single mRNA produces 4 different proteins (A, B, C, D).


Figure 13.1 Schematic Representation of an Operon. In prokaryotes, structural genes of related function are often organized together on the genome and transcribed together under the control of a single promoter. The operon’s regulatory region includes both the promoter and the operator. If a repressor binds to the operator, then the structural genes will not be transcribed. Alternatively, activators may bind to the regulatory region, enhancing transcription.

Figure from: Parker, N., et. al. (2019) Microbiology. Openstax

Each operon includes DNA sequences that influence its own transcription; these are located in a region called the regulatory region. The regulatory region includes the promoter and the region surrounding the promoter, to which transcription factors, proteins encoded by regulatory genes, can bind. Transcription factors influence the binding of RNA polymerase to the promoter and allow its progression to transcribe structural genes. A repressor is a transcription factor that suppresses transcription of a gene in response to an external stimulus by binding to a DNA sequence within the regulatory region called the operator, which is located between the RNA polymerase binding site of the promoter and the transcriptional start site of the first structural gene. Repressor binding physically blocks RNA polymerase from transcribing structural genes. Conversely, an activator is a transcription factor that increases the transcription of a gene in response to an external stimulus by facilitating RNA polymerase binding to the promoter. An inducer, a third type of regulatory molecule, is a small molecule that either activates or represses transcription by interacting with a repressor or an activator.

In prokaryotes, there are examples of operons whose gene products are required rather consistently and whose expression, therefore, is unregulated. Such operons are constitutively expressed, meaning they are transcribed and translated continuously to provide the cell with constant intermediate levels of the protein products. Such genes encode enzymes involved in housekeeping functions required for cellular maintenance, including DNA replication, repair, and expression, as well as enzymes involved in core metabolism. In contrast, there are other prokaryotic operons that are expressed only when needed and are regulated by repressors, activators, and inducers.

Prokaryotic operons are commonly controlled by the binding of repressors to operator regions, thereby preventing the transcription of the structural genes. Such operons are classified as either repressible operons or inducible operons. Repressible operons, like the tryptophan (trp) operon, typically contain genes encoding enzymes required for a biosynthetic pathway. As long as the product of the pathway, like tryptophan, continues to be required by the cell, a repressible operon will continue to be expressed. However, when the product of the biosynthetic pathway begins to accumulate in the cell, removing the need for the cell to continue to make more, the expression of the operon is repressed. Conversely, inducible operons, like the lac operon of E. coli, often contain genes encoding enzymes in a pathway involved in the metabolism of a specific substrate like lactose. These enzymes are only required when that substrate is available, thus expression of the operons is typically induced only in the presence of the substrate.

The trp Operon: A Repressible Operon

E. coli can synthesize tryptophan using enzymes that are encoded by five structural genes located next to each other in the trp operon (Figure 13.2). When environmental tryptophan is low, the operon is turned on. This means that transcription is initiated, the genes are expressed, and tryptophan is synthesized. However, if tryptophan is present in the environment, the trp operon is turned off. Transcription does not occur and tryptophan is not synthesized.

When tryptophan is not present in the cell, the repressor by itself does not bind to the operator; therefore, the operon is active and tryptophan is synthesized. However, when tryptophan accumulates in the cell, two tryptophan molecules bind to the trp repressor molecule, which changes its shape, allowing it to bind to the trp operator. This binding of the active form of the trp repressor to the operator blocks RNA polymerase from transcribing the structural genes, stopping expression of the operon. Thus, the actual product of the biosynthetic pathway controlled by the operon regulates the expression of the operon.

Diagram of the trp operon. The top image shows the operon in the absence of tryptophan. The trp repressor dissociates from the operator and RNA synthesis proceeds. RNA polymerase is bound to the promoter and an arrow indicates that transcription will occur. The repressor is not bound ot anything. The bottom image shows the operon in the presence of tryprophan. When tryptophan is present, the trp repressor binds to the operator and RNA synthesis is blocked. Tryptophan is shown bound to the repressor which is bound to the operator. RNA polymerase is bound to the promoter but is blocked from moving forward by the repressor.Figure 13.2 The Trp Operon. The five structural genes needed to synthesize tryptophan in E. coli are located next to each other in the trp operon. When tryptophan is absent, the repressor protein does not bind to the operator, and the genes are transcribed. When tryptophan is plentiful, tryptophan binds the repressor protein at the operator sequence. This physically blocks the RNA polymerase from transcribing the tryptophan biosynthesis genes.

Figure from: Parker, N., et. al. (2019) Microbiology. Openstax

The Lac Operon: An Inducible Operon

The lac operon is an example of an inducible operon that is also subject to activation in the absence of glucose. The lac operon encodes three structural genes, lacZ, lacY, and lacA, necessary to acquire and process the disaccharide lactose from the environment (Fig 13.3A).

Figure 13.3 Biological Activity of the lac Operon. (A) Schematic representation of the lac operon in E. coli. The lac operon has three structural genes, lacZ, lacY, and lacA that encode for β-galactosidase, permease, and galactoside acetyltransferase, respectively. The promoter (p) and operator (o) sequences that control the expression of the operon are shown. Upstream of the lac operon is the lac repressor gene, lacI, controlled by the lacI promoter (p). (B) Shows the lac repressor inhibition of the lac operon gene expression in the absence of lactose. The lac repressor binds with the operator sequence of the operon and prevents the RNA polymerase enzyme which is bound to the promoter (p) from initiating transcription. (C) In the presence of lactose, some of the lactose is converted into allolactose, which binds and inhibits the activity of the lac repressor. The lac repressor-allolactose complex cannot bind with the operator region of the operon, freeing the RNA polymerase and causing the initiation of transcription. Expression of the lac operon genes enables the breakdown and utilization of lactose as a food source within the organism.

Figure modified from: Esmaeili, A., et. al. (2015) BMC Bioinformatics 16:311

The lacZ gene encodes the β-galactosidase (β-gal) enzyme responsible for the hydrolysis of lactose into simple sugars glucose and galactose (Fig. 13.4A). The β-gal enzyme can also mediate the breakdown of the alternate substrate 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (Xgal) (Fig. 13.4B). The breakdown product, 5-bromo-4-chloro-3-hydroxyindole – 1, spontaneously dimerizes to form the intensely blue blue product, 5,5′-dibromo-4,4′-dichloro-indigo – 2. Thus, Xgal has been a valuable research tool, not only in the study of the enzymatic activity of β-gal, but also in the development of the commonly used blue-white DNA cloning system that utilizes the β-gal enzyme as a marker in molecular cloning experiments.

The lac operon contains two more genes, in addition to lacZ (Fig. 13.3A). The lacY gene encodes a permease that increases the uptake of lactose into the cell and lacA encodes a galactoside acetyltransferase (GAT) enzyme. The exact function of GAT during lactose metabolism has not been conclusively elucidated but acetylation is thought to play a role in the transport of the modified sugars.

Figure 13.4 Reactions Controlled by the Expression of the Lac Operon. (A) Expression of the β-galactosidase enzyme enables the breakdown of lactose into the simple sugars, glucose and galactose for E. coli to use as a food resource. (B) The β-galactosidase enzyme also mediates the breakdown of the non-native substrate 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (Xgal).  Breakdown product (1) 5-bromo-4-chloro-3-hydroxyindole quickly dimerizes into the intensely blue product (2) 5,5′-dibromo-4,4′-dichloro-indigo making it a useful tool for molecular biology. (C) β-D-1-thiogalactopyranoside (IPTG) can serve as a non-native inducer of the lac operon. It mimics the structure of lactose and binds with the Lac Repressor.

Figure modified from: Andreas Piehler, Yikrazuul, and NUROtiker

For the lac operon to be expressed, lactose must be present. This makes sense for the cell because it would be energetically wasteful to create the enzymes to process lactose if lactose was not available.

In the absence of lactose, the lacI gene is constituitively expressed, expressing the lac repressor protein (Fig. 13.3 B). The lac repressor binds with an operator region of the lac operon and physically prevents RNA polymerase from transcribing the structural genes (Fig. 13.3 B). However, when lactose is present, the lactose inside the cell is converted to allolactose. Allolactose serves as an inducer molecule, binding to the repressor and changing its shape so that it is no longer able to bind to the operator DNA (Fig. 13.3 C). Removal of the repressor in the presence of lactose allows RNA polymerase to move through the operator region and begin transcription of the lac structural genes. In addition to lactose, laboratory experiments have revealed that the non-natural compound Isopropyl β-D-1-thiogalactopyranoside (IPTG) can also bind with the lac repressor and cause the expression of lac operon (Figure 13.4 C). Similar to Xgal, this compound has also been used as a research tool for molecular cloning.

The Lac Operon: Activation by Catabolite Activator Protein

Bacteria typically have the ability to use a variety of substrates as carbon sources. However, because glucose is usually preferable to other substrates, bacteria have mechanisms to ensure that alternative substrates are only used when glucose has been depleted. Additionally, bacteria have mechanisms to ensure that the genes encoding enzymes for using alternative substrates are expressed only when the alternative substrate is available. In the 1940s, Jacques Monod was the first to demonstrate the preference for certain substrates over others through his studies of E. coli’s growth when cultured in the presence of two different substrates simultaneously. Such studies generated diauxic growth curves, like the one shown in Figure 13.5. Although the preferred substrate glucose is used first, E. coli grows quickly and the enzymes for lactose metabolism are absent. However, once glucose levels are depleted, growth rates slow, inducing the expression of the enzymes needed for the metabolism of the second substrate, lactose. Notice how the growth rate in lactose is slower, as indicated by the lower steepness of the growth curve.

Graph with time (hours) on the X axis and Log of E. coli cells on the Y axis. For the first hour the graph is relatively flat but then becomes quite steep for the next 3 hours. The graph increases from 0.3 to 1 in 3 hours. This part of the graph is labeled E. coli uses glucose. The next part of the graph begins with another flat region of about an hout and then there is another increase. This increase goes from 1.2 to 1.9 in 4 hours. This part of the graph is labeled E. coli uses lactose.

Figure 13.5. Utilization of Glucose in E. Coli. When grown in the presence of two substrates, E. coli uses the preferred substrate (in this case glucose) until it is depleted. Then, enzymes needed for the metabolism of the second substrate are expressed and growth resumes, although at a slower rate.

Figure from: Parker, N., et. al. (2019) Microbiology. Openstax

The ability to switch from glucose use to another substrate like lactose is a consequence of the activity of an enzyme called Enzyme IIA (EIIA). When glucose levels drop, cells produce less ATP from catabolism and EIIA becomes phosphorylated. Phosphorylated EIIA activates adenylyl cyclase, an enzyme that converts some of the remaining ATP to cyclic AMP (cAMP), a cyclic derivative of AMP and important signaling molecule involved in glucose and energy metabolism in E. coli (Fig. 13.6). As a result, cAMP levels begin to rise in the cell. This is an indicator to the cell, that overall energy levels are low and that ATP is being depleted.

ATP contains 3 phosphate groups. Adenylyl cyclase removes two of these phosphate groups. The remaining phosphate group is linked into the sugar to make cAMP. Cyclic AMP is made of a ribose sugar with oxygens at both carbons 2 and 3 (the carbons at the bottom of the pentagon). The oxygen bound to carbon 3 is also bound to the phosphorus. Similarly, the oxygen bound at carbon 5 was already bound to the phosphorus. This forms a ring where the phosphorus is linked with an oxygen to both carbons 3 and 5.Figure 13.6. Conversion of ATP to cAMP. When ATP levels decrease due to depletion of glucose, some remaining ATP is converted to cAMP by adenylyl cyclase. Thus, increased cAMP levels signal glucose depletion.

Figure from: Parker, N., et. al. (2019) Microbiology. Openstax

The lac operon also plays a role in this switch from using glucose to using lactose. When glucose is scarce, the accumulating cAMP caused by increased adenylyl cyclase activity binds to catabolite activator protein (CAP), also known as cAMP receptor protein (CRP). The complex binds to the promoter region of the lac operon (Figure 13.7). In the regulatory regions of these operons, a CAP binding site is located upstream of the RNA polymerase binding site in the promoter. Binding of the CAP-cAMP complex to this site increases the binding ability of RNA polymerase to the promoter region to initiate the transcription of the structural genes. Thus, in the case of the lac operon, for transcription to occur, lactose must be present (removing the lac repressor protein) and glucose levels must be depleted (allowing binding of an activating protein). When glucose levels are high, there is catabolite repression of operons encoding enzymes for the metabolism of alternative substrates. Because of low cAMP levels under these conditions, there is an insufficient amount of the CAP-cAMP complex to activate transcription of these operons.

Diagram of the lac operon with and without cAMP. A) In the absence of cAMP, CAP does not bind the promoter. RNA polymerase does bind to the promoter and transcription occurs at a low rate. In the presence of cAMP, CAP binds the promoter and increases RNA polymerase activity. This is shown with a circle labeled cAMP + CAP bound to the promoter. RNA polymerase is also bound to the promoter and a thick arrow indicates faster transcription. B) cAMO-CAP complex stimulates RNA polymerase activity and increases RNA synthesis. However, even in the presence of cAMP-CAP complex, RNA synthesis is blocked when repressor is bound ot he operator. This is shows as the cAMP + CAP circle as well as the RNA polymerase bound to the promoter. The repressor is bound to the operator and this blocks RNA polymerase from moving forward.Figure 13.7 Effect of CAP on the Lac Operon. (a) In the presence of cAMP, CAP binds to the promoters of operons, like the lac operon, that encode genes for enzymes for the use of alternate substrates. (b) For the lac operon to be expressed, there must be activation by cAMP-CAP as well as removal of the lac repressor from the operator.

Figure from: Parker, N., et. al. (2019) Microbiology. Openstax

Global Responses of Prokaryotes

In prokaryotes, there are also several higher levels of gene regulation that have the ability to control the transcription of many related operons simultaneously in response to an environmental signal. A group of operons all controlled simultaneously is called a regulon.


When sensing impending stress, prokaryotes alter the expression of a wide variety of operons to respond in coordination. They do this through the production of alarmones, which are small intracellular nucleotide derivatives, such as guanosine pentaphosphate (pppGpp) (Fig. 13.8).

Guanosine pentaphosphate - Wikipedia

Figure 13.8 Structure of Guanosine Pentaphosphate (pppGpp)

Figure from: Yikrazuul

Alarmones change which genes are expressed and stimulate the expression of specific stress-response genes. For example, pppGpp signaling is involved in the stringent response in bacteria, causing the inhibition of RNA synthesis when there is a shortage of amino acids present. This causes translation to decrease and the amino acids present are therefore conserved. Furthermore, pppGpp causes the up-regulation of many other genes involved in stress response such as the genes for amino acid uptake (from surrounding media) and biosynthesis.

The use of alarmones to alter gene expression in response to stress appears to be important in pathogenic bacteria, as well. On encountering host defense mechanisms and other harsh conditions during infection, many operons encoding virulence genes are upregulated in response to alarmone signaling. Knowledge of these responses is key to being able to fully understand the infection process of many pathogens and to the development of therapies to counter this process.

Back to the Top

Quorum Sensing

Quorum sensing (QS) is an intercellular communication mechanism of bacteria used to coordinate the activities of individual cells in population level in response to surroundings through production and perception of diffusible signal molecules such as Acyl Homoserine Lactones or small singaling peptides (Fig. 13.9). The signal synthase, signal receptor, and signal molecules are three essential elements of the basic QS circuit machinery (Fig. 13.9). Genes encoding signal generating proteins are also included among the QS target genes. This forms an autoinduction feedback loop to modulate generation of signal molecules. Several bacterial behaviors including virulence factors expression, secondary metabolites production, biofilm formation, motility, and luminescence are regulated by QS. Through complex regulatory networks bacteria are capable of expressing corresponding genes according to their own population size and of behaving in a coordinated manner.

Figure 13.9 Examples of Quorum Sensing Pathways. (Left panel) Typical Gram-negative quorum sensing mechanism. Acyl homoserine lactone molecules, synthesized by LuxI, passively pass the bacterial cell membrane and when a sufficient concentration is reached (threshold level) activate the intracellular LuxR which subsequently activates target gene expression in a coordinated way. Note that a single cell is shown for simplicity. However, acyl homoserine lactones will commonly diffuse and target neighboring cells within the colony to mediate a communal or population response within the bacterial colony. (Right panel) Quorum sensing peptides are synthesized by the bacterial ribosomes as pro-peptidic proteins and undergo posttranslational modifications during excretion by active transport. The quorum sensing peptides bind membrane associated receptors which get autophosphorylated and activate intracellular response regulators via phosphor-transfer. These phosphorylated response regulators induce increased target gene expression.

Figure from: Verbeke, F., et.al. (2017) Frontiers in Neuroscience 11:183.

For example, some microbial species, such as Staphylococcus aureus, can encase their community within a self-produced matrix of hydrated extracellular polymeric substances that include polysaccharides, proteins, nucleic acids, and lipid molecules. These encasements are known as biofilms. The formation of the biofilm on solid surfaces is a step-wise process comprising several stages (Fig. 13.10). It starts with the conditioning of the surface through the coating with macromolecules from the aqueous surrounding, which enables initial reversible adhesion of microorganisms. The next step is a formation of stronger, irreversible attachments to the surface, followed by the proliferation and aggregation of microorganisms into multicellular and multilayered clusters, which actively produce extracellular matrix. Some cells in the mature biofilms continuously detach and separate from the aggregates, representing a continuous source of planktonic bacteria that can subsequently spread and form new microcolonies.

Pharmaceutics 08 00018 g001 1024

Figure 13.10 Schematic drawing of biofilm formation.

Figure from: Rukavina, Z., and Vanic, Z. (2016) Pharmaceutics 8(2):18.

Biofilms are a common cause of chronic, nosocomial and medical device-related infections, due to the fact that they can develop either on vital or necrotic tissue as well as on the inert surfaces of different implanted materials. Moreover, biofilms are linked with high-level resistance to antimicrobials, frequent treatment failures, increased morbidity and mortality. As a consequence, biofilm infections and accompanying diseases have become a major health concern and a serious challenge for both modern medicine and pharmacy. The rough estimation shows that more than 60% of hospital-associated infections are attributable to the biofilms formed on indwelling medical devices, which result in more than one million cases of infected patients annually and more than $1 billion of hospitalization costs per year in the USA.

Biofilm infections share some common characteristics: slow development in one or more hot-spots, delayed clinical manifestation, persistency for months or years, usually with interchanging periods of acute exacerbations and absence of clinical symptoms. Even though they are less aggressive than acute infections, their treatment is challenging to a greater extent. The main reason for the aforesaid is up to 1000-fold decrease in susceptibility of biofilms to antimicrobial agents and disinfectants as well as resistance to host immune response. Thus, ways to reduce or inhibit biofilm formation are highly sought.  The majority of the proposed biofilm-control methods focuses on: (i) prevention and minimization of biofilm formation by selection and surface modifications of anti-adhesive materials; (ii) debridement techniques including ultrasound and surgical procedures; (iii) disruption of biofilm QS-signaling system; or (iv) achieving proper drug penetration and delivery to formed biofilms by the use of electromagnetic field, ultrasound waves, photodynamic activation or specific drug delivery systems.

Alternate σ Factors

Since the σ subunit of bacterial RNA polymerase confers specificity as to which promoters should be transcribed, altering the σ factor used is another way for bacteria to quickly and globally change what regulons are transcribed at a given time. The σ factor recognizes sequences within a bacterial promoter, so different σ factors will each recognize slightly different promoter sequences. In this way, when the cell senses specific environmental conditions, it may respond by changing which σ factor it expresses, degrading the old one and producing a new one to transcribe the operons encoding genes whose products will be useful under the new environmental condition. For example, in sporulating bacteria of the genera Bacillus and Clostridium (which include many pathogens), a group of σ factors controls the expression of the many genes needed for sporulation in response to sporulation-stimulating signals.

Prokaryotic Attenuation and Riboswitches

Although most gene expression is regulated at the level of transcription initiation in prokaryotes, there are also mechanisms to control both the completion of transcription, as well as translation, concurrently. Since their discovery, these mechanisms have been shown to control the completion of transcription and translation of many prokaryotic operons. Because these mechanisms link the regulation of transcription and translation directly, they are specific to prokaryotes, because these processes are physically separated in eukaryotes.

One such regulatory system is attenuation, whereby secondary stem-loop structures formed within the 5’ end of an mRNA being transcribed determine if transcription to complete the synthesis of this mRNA will occur and if this mRNA will be used for translation. Beyond the transcriptional repression mechanism already discussed, attenuation also controls expression of the trp operon in E. coli (Fig. 13.11). The trp operon regulatory region contains a leader sequence called trpL between the operator and the first structural gene, which has four stretches of RNA that can base pair with each other in different combinations. When a terminator stem-loop forms, transcription terminates, releasing RNA polymerase from the mRNA. However, when an antiterminator stem-loop forms, this prevents the formation of the terminator stem-loop, so RNA polymerase can transcribe the structural genes.

a) At high level of tryptophan a complete polypeptide is made from mRNA regions 1 and 2. Regions 3 and 4 form a terminator stem loop and RNA polymerase stops transcription. B) At low levels of tryptophan an incomplete polypeptide is produced from region 1. Regions 2 and 3 form an antiterminator stem loop which causes the ribosome to stall. This allows RNA polymerase to transcribe the trp regulated genes.

Figure 13.11. Attenuation of Transcription and Translation. When tryptophan is plentiful, translation of the short leader peptide encoded by trpL proceeds, the terminator loop between regions 3 and 4 forms, and transcription terminates. When tryptophan levels are depleted, translation of the short leader peptide stalls at region 1, allowing regions 2 and 3 to form an antiterminator loop, and RNA polymerase can transcribe the structural genes of the trp operon.

Figure from: Parker, N., et. al. (2019) Microbiology. Openstax

A related mechanism of concurrent regulation of transcription and translation in prokaryotes is the use of a riboswitch, a small region of noncoding RNA found within the 5’ end of some prokaryotic mRNA molecules (Figure 13.12). A riboswitch may bind to a small intracellular molecule to stabilize certain secondary structures of the mRNA molecule. The binding of the small molecule determines which stem-loop structure forms, thus influencing the completion of mRNA synthesis and protein synthesis.

a) The mRNA forms a loop called a riboswitch and a loop called an antiterminator stem loop. RNA polymerase can proceed transcription. This is labeled

Figure 13.12. Riboswitch Form and Function. Riboswitches found within prokaryotic mRNA molecules can bind to small intracellular molecules, stabilizing certain RNA structures, influencing either the completion of the synthesis of the mRNA molecule itself (left) or the protein made using that mRNA (right).

Figure from: Parker, N., et. al. (2019) Microbiology. Openstax

Back to the Top

13.2 Eukaryotic Gene Regulation

As seen in Chapter 10, the initiation of transcription requires the assembly of a multitude of transcription factors (TF) localized at the promoter region. Transcription can also utilize far reaching interactions of enhancers, that bind at a distant DNA site and loop back around to stabilize the RNA polymerase at the promoter. Control of transcriptional initiation is dependent on TF factor activation, TF binding with specific DNA recognition sequences, and chromatin remodeling.

Transcription Factor (TF) Activation

Many TF are expressed within cells and held in an inactive conformation until the right environmental stimulus is present within the cell. Cellular signaling pathways can cause post-translational protein modifications leading to TF activation or small molecules may physically bind and allosterically modify the protein structure to mediate activation. Here we will use examples from the cell cycle signaling cascade and steroid hormone receptor pathways to highlight some mechanisms of TF activation. A key element to take away from this section is that transcription factor activation is often highly pleiotropic and has many cellular affects. Depending on the cell type and the environmental conditions, different combinations of downstream target genes may be activated or inactivated. Teasing apart these intricacies and the physiological effects that they have within an organism is a major goal of ongoing research.

Cell Cycle Regulation by p53

p53 is one of the most studied proteins in science. To date, over 68,000 papers appear in PubMed containing p53 or TP53 in the title and/or abstract. Originally described as an oncogene (since a mutated, functionally altered form of the protein was first characterized), p53 is now recognized as the most frequently inactivated tumor suppressors in human cancers. It is a transcription factor that controls the expression of genes and miRNAs affecting many important cellular processes including proliferation, DNA repair, programmed cell death (apoptosis), autophagy, metabolism, and cell migration (Fig. 13.13). Many of those processes are critical to a variety of human pathologies and conditions extending beyond cancer, including ischemia, neurodegenerative diseases, stem cell renewal, aging, and fertility. Notably, p53 also has non-transcriptional functions, ranging from intrinsic nuclease activity to activation of mitochondrial Bak (Bcl-2 homologous antagonist killer) and caspase-independent apoptosis.

As a transcription factor, p53 responds to various genotoxic insults and cellular stresses (e.g., DNA damage or oncogene activation) by inducing or repressing the expression of over a hundred different genes. p53 transcriptional regulation plays a dominant role in causing the arrest of damaged cells, facilitating their repair and survival, or inducing cell death when DNA is damaged irreparably. p53 can also cause cells to become permanently growth arrested, and there is compelling in vivo evidence that these “senescent” cells secrete factors that enhance their clearance by the immune system, leading to tumor regression. Through these mechanisms, p53 helps maintain genomic stability within an organism, justifying its long-held nickname “guardian of the genome”. Other p53 gene targets are involved in inhibiting tumor cell angiogenesis, migration, metastasis and other important processes (such as metabolic reprogramming) that normally promote tumor formation and progression

Cancers 07 00030 g001

Figure 13.13. Cellular stress leads to p53 transcriptional activation of downstream targets. Normally, p53 levels are kept low by its major antagonist, Mdm2, an E3 ubiquitin ligase that is itself a transcriptional target of p53. Stress signals, such as DNA damage, oncogene activation and hypoxia, promote p53 stability and activity by inducing post-translational modifications (PTMs) and tetramerization of p53. p53 functions as a transcription factor that binds to specific p53 response elements upstream of its target genes. p53 affects many important cellular processes linked to tumor suppression, including the induction (green) of senescence, apoptosis, and DNA repair as well as inhibition (red) of metabolism, angiogenesis, and cell migration. These functions are largely mediated through transcriptional regulation of its targets (examples given).

Figure from: Reed, S.M., and Quelle, D.E. (2015) 7(1):30-69.

p53 protein function is regulated post-translationally by coordinated interaction with signaling proteins including protein kinases, acetyltransferases, methyl-transferses, and ubiquitin-like modifying enzymes (Figure 13.14). The majority of the sites of covalent modification occur at intrinsically unstructured linear peptide docking motifs that flank the DNA-binding domain of p53 which play a role in anchoring or in allosterically activating the enzymes that mediate covalent modification of p53. In undamaged cells, p53 protein has a relatively short half-life and is degraded by a ubiquitin-proteasome dependent pathway through the action of E3 ubiquitin ligases, such as MDM2 (Fig 13.13). Following stress, p53 is phosphorylated at multiple residues, thereby modifying its biochemical functions required for increased activity as a transcription factor. Post-translational modifications help to stabilize the tetramer formation of the protein and enhance the translocation of the protein from the cytoplasm into the nucleus. The tetrameric form of p53 is then functional to bind to DNA in a sequence-specific manner and either activate or repress transcription, depending on the target sequence. Some post-translational modifications, such as acetylation, are DNA-dependent and can play a role in chromatin remodeling and activation of p53 target gene expression.

An external file that holds a picture, illustration, etc. Object name is aging-01-490-g001.jpg

Figure 13.14 Sites of Post-Translational Modification on p53. Schematic representation of the 393 amino acid domain structure of human p53 showing the sites of post-translational modification including phosphorylation, acetylation, ubiquitination, methylation, neddylation, and sumoylation. Abbreviations: N-terminal transactivation domain (TAD); proline-rich domain (PRD); tetramerisation domain (TET); C-terminal regulatory domain (REG); arginine (R); lysine (K); serine (S); threonine (T).

Figure from: Maclaine, N.J., and Hupp T.R. (2009) Aging 1(5):490-502

It should be noted that single point mutations that modify the ability of the protein to be phosphorylated in one position, typically do not show a decrease in the stabilization or activation of the protein following a damage or stress event. Thus, multiple modifications likely allow for redundancy within this pathway and ensure the activation of the protein following a stress event. Furthermore, the environment within the cell can lead to different p53 phenotypes, such as the activation of growth arrest and DNA repair processes (ie if there is not a lot of damage) or it can lead to the activation of apoptosis or programmed cell death pathways (ie if damage is too extensive to be repaired).

Steroid Hormone Receptors

Steroid hormone receptors (SHRs) belong to the superfamily of nuclear receptors (NRs), which are one of the essential classes of transcriptional factors. NRs play a critical role in all aspects of human development, metabolism and physiology. Since they generally act as ligand-activated transcription factors, they are an essential component of cell signaling. NRs form an ancient and conserved family that arose early in the metazoan lineage. NR molecular evolution is characterized by major events of gene duplication and gene losses. Phylogenetic analysis revealed a distinct separation of NR ligand binding domains (LBDs) into 4 monophyletic branches, the steroid hormone receptor-like cluster, the thyroid hormone-like receptors cluster, the retinoid X-like and steroidogenic factor-like receptor cluster and the nerve growth factor-like/HNF4 receptor cluster (Fig. 13.15).

Figure 13.15 Phylogenetic tree of the nuclear receptors’ ligand binding domain. Four distinct monophyletic branches are visible. Those monophyletic branches are divided into subcategories. The phylogenetic trees confidently separate the steroid hormone-like (branch colored green), the retinoid X-like and steroidogenic factor-like receptors cluster (branch colored orange), the thyroid hormone-like receptors cluster (branch colored blue) and the nerve growth factor-like/hepatocyte nuclear factor-4 receptors cluster (branch colored yellow).

Figure from: Mitsis, T., et. al. (2020)  World Acad Sci J 1: 264-274, 2019

Here we will focus on the Steroid Hormone-Like Receptors branch (SHRs). SHRs plays a key role in many important physiological processes like organ development, metabolite homeostasis, and response to external stimuli. The estrogen receptor comes in two major forms, ERα and ERβ. Other members of this subgroup include the cortisol binding glucocorticoid receptor (GR), the aldosterone binding mineralocorticoid receptor (MR), the progesterone receptor (PR), and the dihydrotestosterone (DHT) binding androgen receptor (AR) (Fig. 13.16). 

An external file that holds a picture, illustration, etc. Object name is nrs05003.f1.jpg

Figure 13.16 Overview of Steroid Hormone Receptor Family (SHR). A. Phylogenetic tree of the Steroid Hormone Receptor (SHR) family showing the evolutionary interrelationships and distance between the various receptors. Based on alignments available at The NucleaRDB [Horn et al., 2001]. B. All steroid receptors are composed of a variable N-terminal domain (A/B) containing the AF-1 transactivation region, a highly conserved DNA Binding Domain (DBD), a flexible hinge region (D), and a C-terminal Ligand Binding Domain (LBD, E) containing the AF-2 transactivation region. The estrogen receptor α is unique in that it contains an additional C-terminal F domain. Numbers represent the length of the receptor in amino acids.

Figure from: Griekspoor, A., et. al. (2007) Nucl Recept Signal. 5:e003

The members of the Steroid Hormone Receptor family share a similar, modular architecture, consisting of a number of independent functional domains (Fig. 13.16B). Most conserved is the centrally located DNA binding domain (DBD) containing the characteristic zinc-finger motifs. The DBD is followed by a flexible hinge region and a moderately conserved Ligand Binding Domain (LBD), located at the carboxy-terminal end of the receptor. The estrogen receptor α is unique in that it contains an additional F domain of which the exact function is unclear. The LBD is composed of twelve α-helices (H1-H12) that together fold into a canonical α-helical sandwich. Besides its ligand binding capability, the LBD also plays an important role in nuclear translocation, chaperone binding, receptor dimerization, and coregulator recruitment through its potent ligand-dependent transactivation domain, referred to as AF-2. A second, ligand independent, transactivation domain is located in the more variable N-terminal part of the receptor, designated as AF-1. To date, no crystal structure of a full-length SHR exists, though structures of the DBD and LBD regions of most SHRs are available. These have helped significantly in understanding the molecular aspects of DNA and ligand binding, but have to some extent also led to biased attention to these parts of the receptor only. For example, many coregulator interaction studies are still performed with the LBD only, while numerous studies have demonstrated that the AF-2 domain often tells only part of the story. With the help of biophysical techniques, however, it is feasible to study the full-length receptor in its native environment (Figure 13.16).

Most SHRs remain in the cytoplasm of the cell until they are bound with the appropriate steroid (Fig 13.17). Steroid binding causes the dimerization of SHRs and localization to the cell nucleus, where the SHRs interact with the DNA at sequence specific motifs known as Hormone Response Elements (HREs) (Fig. 13.17, Step 5). Many SHRs can also interact with membrane-bound receptors and affect cellular signaling pathways, in addition to the activation of gene expression (Fig. 13.17, step 6).

An external file that holds a picture, illustration, etc. Object name is nrs05003.f2.jpg

Figure 13.17 Steroid Hormone Receptors (SHR) act as hormone dependent nuclear transcription factors. Upon entering the cell by passive diffusion, the hormone (H) binds the receptor, which is subsequently released from heat shock proteins, and translocates to the nucleus. There, the receptor dimerizes, binds specific sequences in the DNA, called Hormone Responsive Elements or HREs, and recruits a number of coregulators that facilitate gene transcription.

Figure from: Griekspoor, A., et. al. (2007) Nucl Recept Signal. 5:e003

Steroid Hormones, such as the estrogens, reach their target cells via the blood, where they are bound to carrier proteins. Naturally occurring estrogens include estradiol, estrone, estriol, and estretrol and differ primarily in structure on the presence of hydroxyl-groups (Fig. 13.18). Estradiol is the predominant estrogen during reproductive years both in terms of absolute serum levels as well as in terms of estrogenic activity. During menopause, estrone is the predominant circulating estrogen and during pregnancy estriol is the predominant circulating estrogen in terms of serum levels. Another type of estrogen called estetrol (E4) is produced also produced predominantly during pregnancy (Fig 13.18). Estrogens function in many physiological processes, including the regulation of the menstrual cycle and reproduction, maintaining bone density, brain function, cholesterol mobilization, maturation of reproductive organs during development, and they play a role in controlling inflammation.

Figure 13.18 Naturally Occurring Estrogens.

Figure from: Wikipedia (2020) Estrogen.

Because of their lipophilic nature it is thought that steroid hormones, such as estrogen, pass the cell membrane by simple diffusion, although some evidence exists that they can also be actively taken up by endocytosis of carrier protein bound hormones. For a long time it has been assumed that binding of the ligand resulted in a simple on/off switch of the receptor (Fig. 13.17, step 1). While this is likely the case for typical agonists like estrogen and progesterone, this is not always correct for receptor antagonists, used in drug therapy. These antagonists come in two kinds, so-called partial antagonists (for the estrogen receptors known as SERMs for Selective Estrogen Receptor Modulators) and full antagonists. The partial antagonist can, depending on cell type, act as a SHR agonist or antagonist. In contrast, full antagonists (for ER known as SERDs for Selective Estrogen Receptor Downregulators) always inhibit the receptor, independent of cell type, in part by targeting the receptor for degradation. Binding of either type of antagonist results in major conformational changes within the LBD and in release from heat shock proteins that thus far had protected the unliganded receptor from unfolding and aggregation (Fig. 13.17 step 2).

Back to the Top

Trancription Factor (TF) Recognition and Binding to DNA

TF control gene expression by binding to their target DNA site to recruit, or block, the transcription machinery onto the promoter region of the gene of interest. Their function relies on the ability to find their target site quickly and selectively. In living cells TFs are present in nM concentrations and bind the target site with comparable affinity, but they also bind any DNA sequence (nonspecific binding), resulting in millions of low affinity (i.e., >10−6 M) competing sites. Nonspecific binding facilitates the search for the target site by three major mechanisms (Fig. 13.19). One of the main scenarios involves a ‘sliding’ mechanism, in which the protein moves from its initial non-specific site to its actual target site by sliding along the DNA (also known as 1-dimensional (1D) sliding) (Fig. 13.19). When the TF starts to move and shift counterions from the phosphate backbone, the same number of counterions binds to the site left free by the protein. The sliding rate is also dependent on the hydrodynamic radius of the protein; the required rotational movement over the DNA backbone is greater for larger proteins, that tend to slide slowly. The second scenario is a ‘hopping’ mechanism, in which a TF might hop from one site to another in 3D space by dissociating from its original site and subsequently binding to the new site. This may happen within the same chain and re-association occurs adjacent to the former dissociated site. A third search mechanism is described as ‘intersegmental transfer’. In this scenario, the protein moves between two sites via an intermediate ‘loop’ formed by the DNA and subsequently bind at two different DNA sites. This mechanism is applicable to TFs with two DNA-binding sites. Proteins with two DNA-binding sites can occasionally bind non-specifically to two locations situated far apart within the DNA strand, that are brought into close contact through the formation of these loops. Such TFs transfer across a point of close contact without dissociating from the DNA.


Figure 13.19 Protein-DNA recognition mechanisms. The main three protein-DNA recognition mechanisms are shown. When the transcription factor (pink ring) moves from one site to another by means of sliding along the DNA and is transferred from one base pair to another without dissociating from the DNA, this mechanism is called sliding (top). Hopping occurs when the transcription factor moves on the DNA by dissociating from one site and re-associating with another site (center). Intersegmental transfer describes the mechanism by which the transcription factor gets transferred through DNA bending or the formation of a DNA loop, resulting in the protein being bound transiently to both sides and subsequently moving from on site to the other (bottom).

Figure from: Yesudhas, D., et.al. (2017) Genes 8(8):192

Each eukaryotic TF controls tens to hundreds of genes scattered throughout the genome, and expressing each gene needs various TFs simultaneously binding to their sites to form the transcription complex, an extremely rare event in probabilistic terms. As result, the in vivo site occupancy patterns of eukaryotic TFs are more complex than predicted by their in vitro site-specific binding profiles and do not strongly correlate with the actual levels of gene expression. An interesting feature highlighted by genome analysis is an accumulation of potential TF binding sites in regions flanking eukaryotic genes. Such clusters of degenerate recognition sites are assumed to be key for transcription control, and thus are generally classified as gene regulatory regions (RR). For example, the affinity of the Drosophila TF Engrailed to the RRs of its target genes is strongly amplified by long tracts of degenerate consensus repeats that are present in such regions.

Histone Modification and Chromatin Remodeling

Regulation of transcription involves dynamic rearrangements of chromatin structure. Recall that eukaryotic DNA is complexed with histone octamers, which are composed of dimers of the core histones H2A, H2B, H3 and H4. 147 bp of DNA are wrapped 1.65 times around each octamer forming nucleosomes, the basic packaging units of chromatin. Nucleosomes, connected by linker DNA of variable length as “beads on a string”, generate the 11 nm linear structure. The linker histone H1 is positioned at the top of the core histone octamer and enables higher organized compaction of DNA into transcriptionally inactive 30 nm fibres.

To understand the role of chromatin for regulation of transcription it is important to know where nucleosomes are positioned and how positioning is achieved. Basically there are four groups of activities which change chromatin structure during transcription: (1) histone modifications, (2) eviction and repositioning of histones, (3) chromatin remodeling and (4) histone variant exchange. Histone modifiers introduce post-translational, covalent modifications to histone tails and thereby change the contact between DNA and histones. These modifications govern access of regulatory factors.  Histone chaperones aid eviction and positioning of histones. A third class of chromatin restructuring factors are ATP dependent chromatin remodelers. These multi-subunit complexes utilize energy from ATP hydrolysis for various chromatin remodeling activities including nucleosome sliding, nucleosome displacement and the incorporation and exchange of histone variants.

Post-translational modifications (PTMs) of histone proteins is a primary mechanism that controls chromatin architencture. Over 20 distinct types of histone PTMs have been described, among which the most abundant ones are acetylation and methylation of lysine residues. Histone PTMs can be deposited on and removed from chromatin by different enzymes, known as histone PTM ‘writers’ and ‘erasers’. Histone PTMs exert their regulatory effects via two main mechanisms. First, histone PTMs serve as docking sites for various nuclear proteins––histone PTM ‘readers’––that specifically recognize modified histone residues through their modification-binding domains. Recruitment of these proteins at specific genomic loci promotes key chromatin processes, such as transcriptional regulation and DNA damage repair. Second, some histone PTMs, such as acetylation, directly affect chromatin higher-order structure and compaction, thereby controlling chromatin accessibility to protein machineries such as those involved in transcriptiion. Chromatin may adopt one of two major states in an interchangeable manner. These states are heterochromatin and euchromatin. Heterochromatin is a compact form that is resistant to the binding of various proteins, such as transcriptional machinery. In contrast, euchromatin is a relaxed form of chromatin that is open to modifications and transcriptional processes (Fig. 13.20).  Histone methylation promotes the formation of Heterochromatin whereas, histone acetylation promotes euchromatin.

Figure 1

Figure 13.20 Schematic drawing of histone methylation and acetylation in relation to chromatin remodeling. Addition of methyl groups to the tails of histone core proteins leads to histone methylation, which in turn leads to the adoption of a condensed state of chromatin called ‘heterochromatin.’ Heterochromatin blocks transcription machinery from binding to DNA and results in transcriptional repression. The addition of acetyl groups to lysine residues in the N-terminal tails of histones causes histone acetylation, which leads to the adoption of a relaxed state of chromatin called ‘euchromatin.’ In this state, transcription factors and other proteins can bind to their DNA binding sites and proceed with active transcription.

Figure from: Kim, S., and Kaang, B-K. (2017) Exp & Mol. Med. 49:e281

Chromatin remodeling can also be an ATP-dependent process and involve histone dimer ejection, full nucleosome ejection, nucleosome sliding, and histone variant exchange (Fig 13.21). ATP-dependent chromatin remodeling complexes bind to nucleosome cores and the surrounding DNA, and, using energy from ATP hydrolysis, they disrupt the DNA-histone interactions, slide or eject nucleosomes, alter nucleosome structures, and modulate the access of transcription factors to the DNA (Figure 13.21). In addition to modulating gene expression, some of the complexes are involved in nucleosome assembly and organization, following transcription at locations in which nucleosomes have been ejected, packing of DNA, following replication and DNA repair.

Overview of the functions of ATP-dependent chromatin remodeling complexes. (a) A subset of ISWI and CHD complexes are involved in nucleosome assembly, maturation, and spacing. (b) SWI/SNF complexes are primarily involved in histone dimer ejection, nucleosome ejection, and nucleosome repositioning through sliding, thus modulating chromatin access. (c) INO80 complexes are involved in histone exchange. It should be noted that the complexes might be involved in other chromatin remodeling functions (figure adapted from [52]).

Figure 13.21 Overview of the functions of ATP-dependent chromatin remodeling complexes. (a) A subset of ISWI and CHD complexes are involved in nucleosome assembly, maturation, and spacing. (b) SWI/SNF complexes are primarily involved in histone dimer ejection, nucleosome ejection, and nucleosome repositioning through sliding, thus modulating chromatin access. (c) INO80 complexes are involved in histone exchange. It should be noted that the complexes might be involved in other chromatin remodeling functions.

Figure from: Hasan, N., and Ahuja, N. (2019) Cancers 11(12):1859

Another level of chromatin regulation is accomplished by a dynamic exchange of canonical histones with specific histone variants. Histone variants are non-allelic isoforms of canonical histones that differ in their primary sequence and functional properties. For example, the histone variant H3.3 has been found to progressively accumulate in various mouse somatic tissues with age, resulting in near complete replacement of the canonical H3.1/2iso-forms by the age of 18 months. Deletion of H3.3 in mice is lethal and in the fruit fly, Drosophila, causes sterility. Within the nematode, C. elegans, loss of H3.3 exhibit a significant ‘bagging’ phenotype which involves eggs hatching inside the animal body. Furthermore, in organisms that had deficient insulin signaling, loss of H3.3 caused a reduction in lifespan (although this phenotype is not observed in animals with a wildtype insulin signaling pathway) (Fig. 13.22). H3.3 also appears to acculumate with age in humans, and its accumulation is often absent in tumor cells. Overall, histone variant replacement is associated with changes in post translational modifications (such as methylation), and has multiple effects on overall chromosome structure.

Figure 13.22 The Effects of Histone Variant H3.3 on C. elegans Lifespan. H3.3 expression increases over time in C. elegans during their normal lifespan. In organisms with impaired Inulin/IGF-1 signaling, germline deficiency of H3.3 resulted in significant decreases in lifespan.

Figure from: Piazzesi, A., et. al. (2016) Cell Rep 17(4):987-996.

Back to the Top

13.3 Protein-DNA Interactions

Proteins use a wide range of DNA-binding structural motifs, such as homeodomain (HD), helix-turn-helix (HTH), and high-mobility group box (HMG) to recognize DNA. HTH is the most common binding motif and can be found in several repressor and activator proteins (Fig. 13.23). Despite their structural diversity, these domains participate in a variety of functions that include acting as substrate interaction mediators, enzymes to operate DNA, and transcriptional regulators. Several proteins also contain flexible segments outside the DNA-binding domain to facilitate specific and non-specific interactions. For example, many HD proteins use N-terminal arms and a linker region to interact with DNA. The Encyclopedia of DNA Elements (ENCODE) data suggest that about 99.8% of putative binding motifs of TFs are not bound by their respective TFs in the genome. It is, therefore, clear that the presence of a single binding motif per TF is not adequate for TF binding.

An external file that holds a picture, illustration, etc. Object name is genes-08-00192-g002.jpg

Figure 13.23 Representative figures of the transcription factor binding domains. The figure shows the crystal structures of different types of TF domains (3l1p, 4m9e, 5d5v, 1lbg, 1gt0, and 1nkp). The structures were obtained from the Protein Data Bank (PDB) and redrawn using chimera. The respective domains and important regions have been labeled. HTH stands for helix-turn-helix domain. bHLH stands for basic helix-loop-helix motif. HD and HMG stand for homeodomain and high-mobility group box domain, respectively.

Figure from: Yesudhas, D., et.al. (2017) Genes 8(8):192

Most of the searching mechanism studies that try to determine how TFs find their binding sites are limited to naked DNA-protein complexes, which do not reflect the actual crowded environment of a cell. Studies with naked DNA and transcription factors have shown that many DNA-binding proteins travel a long distance by 1D diffusion. However, the search process for eukaryotes must occur in the presence of chromatin, which has the ability to hinder protein mobility. In this case, the protein must dissociate from the DNA, enter a 3D mode of diffusion state, and continue the target site searching process.

The sliding and intersegmental transfer mechanisms can be explained through the example of the lac repressor. The lac repressor contains 4 identical monomers (a dimer of dimers) for its DNA-binding. The binding sequence of these dimers is symmetric or pseudo-symmetric, and each half is identified by these identical monomers. The HTH domain of the lac repressor is the DNA-binding domain that facilitates the interaction with its target site on DNA (Fig. 13.24). As a result of a rapid search (sliding) along the DNA molecule and intersegmental transfer between distant DNA sequences, the lactose repressor finds its target sites faster than the diffusion limit. The section comprised between residues 1–46 of the HTH protein domain, characterized by three α-helices, maintains its secondary structure through specific and non-specific binding (Fig 13.24). When the repressor binds to a non-specific site, the HTH domain interacts with the DNA backbone and maintains the interaction with its helix region in the major groove juxtaposition. This arrangement facilitates the interaction of the recognition helix with the edges of the DNA bases, enabling the repressor to walk or search for its specific site on the DNA. The C-terminal residues of the DNA-binding domain, residues 47–62, form the hinge region, and are normally disordered during non-specific recognition; however, during specific site recognition, residues 50–58 acquire an α-helix configuration (hinge helix) (Fig. 13.24). The disordered hinge region and the flexibility of the HTH domain allow the protein to move freely along the DNA to search for its target site. In specific binding complexes, the hinge helix of each monomer is located at the symmetrical center of the binding site, thereby causing the hinge helices to interact with each other (intersegmental transfer) to allow better stability. Moreover, DNA bends at the symmetrical center of the specific binding site (37° angle), thereby supporting monomer-monomer interactions (Fig 13.24).

Figure 13.24. The Helix-Turn-Helix Motif of the Lac Repressor. Lac repressor binds to DNA non-specifically, enabling it to slide rapidly along the DNA double helix until it encounters the lac operator sequence. The DNA-binding domain employs a helix-turn-helix (HTH) motif (Alpha Helices, Turns). During non-specific binding, the hinge region is disordered. The DNA double helix is depicted as straight in the model when the Lac Repressor binds non-specifically. Upon recognizing the specific operator sequence, the non-specific binding converts to specific binding. During this conversion, the hinge region changes from disordered loops to Alpha Helices, which bind to the minor groove of the DNA. As explained below, this binding stabilizes a kinked (“bent”) DNA double helix conformation.

Figure from: Protopedia – Life in 3D

In addition to the helix-turn-helix structure, the zinc finger motif is also very common, especially in eukaryotic TFs (Fig. 13.25). Proteins that contain zinc fingers (zinc finger proteins) are classified into several different structural families. Unlike many other clearly defined supersecondary structures such as Greek keys or β hairpins, there are a number of types of zinc fingers, each with a unique three-dimensional architecture. A particular zinc finger protein’s class is determined by this three-dimensional structure, but it can also be recognized based on the primary structure of the protein or the identity of the ligands coordinating the zinc ion. In spite of the large variety of these proteins, however, the vast majority typically function as interaction modules that bind DNA, RNA, proteins, or other small, useful molecules, and variations in structure serve primarily to alter the binding specificity of a particular protein. The most common type of zinc finger motif utilizes two Cys and two His residues (CCHH) coordinating the Zn(II) ion to adopt a ββα fold with three hydrophobic residues responsible for the formation of a small hydrophobic core which offers additional stabilization of the zinc finger domain (Fig. 13.25).


Figure 13.25 Sequence alignments of the CCHH zinc fingers and a representative structure. (a) Alignment of the TFIIIA-like zinc finger domains from different organisms. Green color denotes residues that are responsible for the hydrophobic core formation in most CCHH zinc fingers (L17, F11 and L2). Yellow and blue indicate the coordinating Cys and His residues, respectively. (b) The 3D NMR structure of 15-th ZF from zinc finger protein 478 [PDB: 2YRH].

Figure from: Kluska, K., Adamczyk, J., and Krezel, A. (2018) Coord Chem Rev 367:18-64

Overall, zinc finger motifs display considerable versatility in binding modes, even between members of the same class (e.g., some bind DNA, others protein), suggesting that they are stable scaffolds that have evolved specialised functions. For example, zinc finger-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organization, epithelial development, cell adhesion, protein folding, chromatin remodeling, and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

The last binding domain that we will consider in detail here is the helix-loop-helix domains found in Leucine zipper-containing proteins. Specifically, bZIPs (Basic-region leucine zippers) are a class of eukaryotic transcription factors. The bZIP domain is 60 to 80 amino acids in length with a highly conserved DNA binding basic region and a more diversified leucine zipper dimerization region. The two regions form α-helical structures that are connected together via a looped region.  This forms a core helix-loop-helix (HLH) structure within each monomer of the protein. Two monomers then join through the fomation of a leucine zipper junction forming a heterodimeric protein structure. The resulting heterodimer can bind with DNA in a sequence-specific manner through the basic α-helices (Fig. 13.26).

Specifically, basic residues, such as lysines and arginines, interact in the major groove of the DNA, forming sequence-specific interactions (Fig 13.26). Most bZIP proteins show high binding affinity for the ACGT motifs. The bZIP heterodimers exist in a variety of eukaryotes and are more common in organisms with higher evolution complexity.

Figure 13.26 Leucine Zipper Transcription Factors from the bZIP family. The monomer subunits of a heterodimeric bZIP protien contain a Helix-loop-Helix (HLH) core structure, where one helix forms the leucine zipper with the other monomer, and the basic helices of each monomer interact with the major groove of the target DNA. The helices are held together by a flexible loop region. (One monomer is shown in blue and one monomer is shown in green).

Figure from: Latacca

Back to the Top

13.4 Epigenetics and Transgenerational Inheritence

Even though all somatic cells of a multicellular organism have the same genome, different cell types have different transcriptomes (set of all expressed RNA molecules), different proteomes (set of all proteins) and, hence, different functions. Cell differentiation during embryonic development requires the activation and repression of specific sets of genes by the action of cell lineage defining transcription factors. Within a cell lineage, gene activity states are often maintained over several rounds of cell divisions (a phenomenon called “cellular memory” or “cellular inheritance”). Since the rediscovery of epigenetics some 30 years ago (it was originally proposed by Conrad Hal Waddington in the early 1940s), cellular inheritance has been attributed to gene regulatory feedback loops, chromatin modifications (DNA methylation and histone modifications) as well as long-lived non-coding RNA molecules, which collectively are called the “epigenome”. Among the different chromatin modifications, DNA methylation and polycomb-mediated silencing are probably the most stable ones and endow genomes with the ability to impose silencing of transcription of specific sequences even in the presence of all of the factors required for their expression.

Defining Transgenerational Epigenetic Inheritance

The metastability of the epigenome explains why development is both plastic and canalized, as originally proposed by Waddington. Although epigenetics deals only with the cellular inheritance of chromatin and gene expression states, it has been proposed that epigenetic features could also be transmitted through the germline and persist in subsequent generations. The widespread interest in “transgenerational epigenetic inheritance” is nourished by the hope that epigenetic mechanisms might provide a basis for the inheritance of acquired traits. Yes, Lamarck has never been dead and every so often raises his head, this time with the help of epigenetics.

Although acquired traits concerning body or brain functions can be written down in the epigenome of a cell, they cannot easily be transmitted from one generation to the next. For this to occur, these epigenetic changes would have to manifest in the germ cells as well, which in mammals are separated from somatic cells by the so-called Weismann barrier. Further, the chromatin is extensively reshaped during germ cell differentiation as well as during the development of totipotent cells after fertilization, even though some loci appear to escape epigenetic reprogramming in the germline. Long-lived RNA molecules appear to be less affected by these barriers and therefore more likely to carry epigenetic information across generations, although the mechanisms are largely unsolved.

Evidence for Transgenerational Epigenetic Inheritance

In the past 10 years, numerous reports on transgenerational responses to environmental or metabolic factors in mice and rats have been published. The factors include endocrine disruptors, high fat diet, obesity, diabetes, undernourishment as well as trauma. These studies investigated DNA methylation, sperm RNA or both. For example, when male mice are made prediabetic by treatment with streptozotocin it affects the DNA methylation patterns in their resulting sperm, as well as the pancreatic islets of F1 and F2 of the resulting offspring. Furthermore, studies have shown that traumatic stress in early life altered behavioral and metabolic processes in the progeny and that injection of sperm RNAs from traumatized males into fertilized wild-type oocytes reproduced the alterations in the resulting offspring.

In humans, epidemiological studies have linked food supply in the grandparental generation to health outcomes in the grandchildren. An indirect study based on DNA methylation and polymorphism analyses has suggested that sporadic imprinting defects in Prader–Willi syndrome are due to the inheritance of a grandmaternal methylation imprint through the male germline. Because of the uniqueness of these human cohorts these findings still await independent replication. Most cases of segregation of abnormal DNA methylation patterns in families with rare diseases, however, turned out to be caused by an underlying genetic variant. Thus, it is important that studies of this nature rule out the effects of traditional genetic inheritence as being a factor of the observed phenotypes.

Genetic inheritance alone cannot fully explain why we resemble our parents. In addition to genes, we inherited from our parents the environment and culture, which in parts have been constructed by the previous generations (Fig. 13.27). A specific form of the environment is our mother’s womb, to which we were exposed during the first 9 months of our life. The maternal environment can have long-lasting effects on our health. In the Dutch hunger winter, for example, severe undernourishment affected pregnant women, their unborn offspring and the offspring’s fetal germ cells. The increased incidence of cardiovascular and metabolic disease observed in F1 adults, is not due to the transmission of epigenetic information through the maternal germline, but a direct consequence of the exposure in utero, a phenomenon called “fetal programming” or—if fetal germ cells and F2 offspring are affected—“intergenerational inheritance”.


Figure 13.27. Transgenerational inheritance systems. a Offspring inherit from their parents genes (black), the environment (green) and culture (blue). Genes and the environment affect the epigenome (magenta) and the phenotype22. Culture also affects the phenotype, but at present there is no evidence for a direct effect of culture on the epigenome (broken blue lines). It is a matter of debate, how much epigenetic information is inherited through the germline (broken magenta lines). G genetic variant, E epigenetic variant. b An epimutation (promoter methylation and silencing of gene B in this example) often results from aberrant read-through transcription from a mutant neighboring gene, either in sense orientation as shown here or in antisense orientation. The presence of such a secondary epimutation in several generations of a family mimics transgenerational epigenetic inheritance, although it in fact represents genetic inheritance. Black arrow, transcription; black vertical bar, transcription termination signal; broken arrow, read-through transcription

Figure from: Horsthemke, B. (2018) Nat Comm. 9:2973

Roadmap to Proving Transgenerational Epigenetic Inheritance

  • Rule out genetic, ecological and cultural inheritance. For studies in mice and rats, inbred strains and strictly controlled environments need to be used. When a pregnant female animal is exposed to a specific environmental stimulus, F3 offspring and subsequent generations must be studied in order to exclude a direct effect of the stimulus on the embryos’ somatic cells and germ cells. Even more desirable is the use of in vitro fertilization (IVF), embryo transfer and foster mothers. When a male animal is exposed to an environmental stimulus, F2 offspring must be studied in order to exclude transient effects on germ cells. To ensure that any phenotype is exclusively transmitted via gametes, IVF must be used, controlling for possible artifacts relating to IVF. In contrast with laboratory animals, it is impossible to rule out ecological and cultural inheritance in humans, but genetic effects should and can be excluded. If an epimutation apparently follows Mendelian inheritance patterns, be cautious: you are more likely looking at a secondary epimutation and genetic inheritance. Study the haplotype background of the epimutation: if in a given family it is always on the same haplotype, you are again most likely dealing with a secondary epimutation. Do whole genome sequencing to search for a genetic variant that might have caused the epimutation and be aware that this variant might be distantly located. Good spots to start looking are the two neighboring genes, where a mutation might cause transcriptional read-through in sense or antisense orientation into the locus under investigation. Unfortunately, if you don’t find anything, you still cannot be 100% sure that a genetic variant does not exist.
  • Identify the responsible epigenetic factor in the germ cells. Admittedly, this is easier said than done, especially in female germ cells, which are scarse or unavailable. Be aware that germ cell preparations may be contaminated with somatic cells or somatic DNA. Use swim-up (sperm) or micromanipulation techniques to purify germ cells to the highest purity. Exclude the presence of somatic cells and somatic DNA by molecular testing, for example by methylation analysis of imprinted genes, which are fully methylated or fully unmethylated only in germ cells.
  • Demonstrate that the epigenetic factor in the germ cells is responsible for the phenotypic effect in the next generation. If possible, remove the factor from the affected germ cells and demonstrate that the effect is lost. Add the factor to control germ cells and demonstrate that the effect is gained. While RNA molecules can and have been extracted from sperm of exposed animals and injected into control zygotes, DNA methylation and histone modifications cannot easily be manipulated (although CRISPR/Cas9-based epigenome editors are being developed and used for this purpose), and all of these experiments can hardly be done in humans. In light of these problems, this might currently be too much to ask for to prove transgenerational epigenetic inheritance in humans, but should, nevertheless, be kept in mind and discussed.

13.5 References

  1. Parker, N., Schneegurt, M., Thi Tu, A-H., Lister, P., Forster, B.M. (2019) Microbiology. Openstax. Available at: https://opentextbc.ca/microbiologyopenstax/
  2. Chan, K-G., Liu, Y-C., and Chang C-Y. (2015) Inhibiting N-acyl-homoserine lactone synthesis and quenching Pseudomonas quinolone quorum sensing to attenuate virulence. Front. Microbiol. 6:1173. Available at:  https://www.frontiersin.org/articles/10.3389/fmicb.2015.01173/full
  3. Rukavina, Z., and Vanic Zeljka. (2016) Current trends in development of liposomes targeting bacterial biofilms. Pharmaceutics 8(2):18. Available at: https://www.mdpi.com/1999-4923/8/2/18/htm
  4. Wikipedia contributors. (2020, April 18). Guanosine pentaphosphate. In Wikipedia, The Free Encyclopedia. Retrieved 16:26, August 23, 2020, from https://en.wikipedia.org/w/index.php?title=Guanosine_pentaphosphate&oldid=951778776
  5. Verbeke, F., De Craemer, S., Debunne, N., Janssens, Y., Wynendaele, E., Van de Wiele, C., and De Spiegeleer. B. (2017) Peptides as quorum sensing molecules: measurement techniques and obtained levels in vitro and in vivo. Frontiers in Neuroscience 11:183. Available at: https://www.researchgate.net/publication/316055402_Peptides_as_Quorum_Sensing_Molecules_Measurement_Techniques_and_Obtained_Levels_In_vitro_and_In_vivo
  6. Yesudhas, D., Batool, M., Anwar, M.A., Panneerselvam, S., and Choi, S. (2017) Proteins recognizing DNA: Structural uniqueness and versatility of DNA-binding domains in Stem Cell Transcription Factors. Genes 8(8):192. Available at: https://www.mdpi.com/2073-4425/8/8/192/htm
  7. Castellanos, M., Mothi, N., and Muñoz, V. (2020) Eukaryotic transcription factors can track and control their target genes using DNA antennas. Nature Comm. 11:540. Available at: https://www.nature.com/articles/s41467-019-14217-8
  8. Neideracher, G., Klopf, E., and Schüller, C. (2011) Interplay of dynamic transcription and chromatin remodelling: Lessons from yeast. Int J Mol Sci 12(8):4758-4769. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3179130/
  9. Kim, S., and Kaang, B-K. (2017) Epigenetic regulation and chromatin remodeling in learning and memory. Exp. & Mol. Med. 49:e281. Available at: https://www.nature.com/articles/emm2016140#Fig1
  10. Tvardovskly, A., Schwämmle, V., Kempf, S., Rogowska-Wrzesinka, A., and Jensen, O.N. (2016) Accumulation of histone variant H3.3 with age is assocaiated with profound changes in the histone methylation landscape. Nuc. Acids Res. 45(16):1093. Available at: https://www.researchgate.net/publication/318987684_Accumulation_of_histone_variant_H33_with_age_is_associated_with_profound_changes_in_the_histone_methylation_landscape
  11. Cipolletti, M., Fernandez, V.S., Montalesi, E., Marino, M., Fiochetti, M. (2018) Beyond the antioxidant activity of dietary polyphenols in cancer: The modulation of estrogen receptors (ERs) signaling. Int J. Mol Sci 19(9)2624. Available at: https://www.mdpi.com/1422-0067/19/9/2624/htm
  12. Griekspoor, A., Zward, W., Neefjes, J., and Michalides, R. (2007) Visualizing the action of steroid hormone receptors in living cells. Nucl. Recept. Signal. 5:e003 Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1853070/
  13. Mitsis, T., Papargeorgiou, L., Efthimiadou, A., Bacopoulou, F., Vlachakis, D., Chrousos, G.P., Eliopoulos, E. (2020) A comprehensive structural and functional analysis of the ligand binding domain of the nuclear receptor superfamily reveals highly conserved signaling motifs and two distinct canoncial forms through evolution. World Acad Sci J 1: 264-274, 2019. Available at: https://www.spandidos-publications.com/10.3892/wasj.2020.30
  14. Reed, S.M., and Quelle, D.E. (2015) p53 Acetylation: Regulation and consequences. Cancers 7(1):30-69. Available at: https://www.mdpi.com/2072-6694/7/1/30/htm.
  15. Maclaine, N.J., and Hupp, T.R. (2009) The regulation of p53 by phosphorylation: a model for how distinct signals integrate into the p53 pathway. Aging 1(5):490-502. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2806026/
  16. Wikipedia contributors. (2020, August 1). Estrogen. In Wikipedia, The Free Encyclopedia. Retrieved 01:28, September 6, 2020, from https://en.wikipedia.org/w/index.php?title=Estrogen&oldid=970560042
  17. Kluska, K., Adamczyk, J., and Krezel, A. (2018) Metal binding properties, stability, and reactivity of zinc fingers. Coord. Chem Rev. 367:18-64. Available at: https://www.sciencedirect.com/science/article/pii/S0010854517305441
  18. Wikipedia contributors. (2020, July 4). Leucine zipper. In Wikipedia, The Free Encyclopedia. Retrieved 07:00, September 7, 2020, from https://en.wikipedia.org/w/index.php?title=Leucine_zipper&oldid=965962667
  19. Wikipedia contributors. (2020, April 15). Zinc finger. In Wikipedia, The Free Encyclopedia. Retrieved 18:28, September 7, 2020, from https://en.wikipedia.org/w/index.php?title=Zinc_finger&oldid=951116840
  20. Horsthemke, B. (2018) A critical view on transgenrational epigenetic inheritence in humans. Nat. Comm. 9:2973. Available at: https://www.nature.com/articles/s41467-018-05445-5#rightslink

Back to the Top