Mem Inst Oswaldo Cruz, Rio de Janeiro, VOLUME 115 | JANUARY 2020
A new gene inventory of the ubiquitin and ubiquitin-like conjugation pathways in Giardia intestinalis
1Universidad EAN, Departamento de Ciencias Básicas, Bogotá, Colombia
2Universidad El Bosque, Laboratorio de Virología, Bogotá, Colombia
3Universidad Nacional de Colombia, Laboratorio de Investigaciones Básicas en Bioquímica, Bogotá, Colombia
BACKGROUND Ubiquitin (Ub) and Ub-like proteins (Ub-Ls) are critical regulators of complex cellular processes such as the cell cycle, DNA repair, transcription, chromatin remodeling, signal transduction, and protein degradation. Giardia intestinalis possesses an experimentally proven Ub-conjugation system; however, only a limited number of the enzymes involved in this process have been identified using the basic local alignment search tool (BLAST). This is because BLAST has limited ability to identify homologous functional regions when sequence similarity falls below 30%. In addition, Ub-Ls and their conjugating enzymes have not been fully elucidated in Giardia.
OBJECTIVE To identify the enzymes involved in the Ub and Ub-L conjugation processes using intelligent systems based on hidden Markov models (HMMs).
METHODS We performed an HMM search of functional Pfam domains found in the key enzymes of these pathways in Giardia’s proteome. Each open reading frame identified was analysed by sequence homology, domain architecture, and transcription levels.
FINDINGS We identified 118 genes, 106 of which corresponded to the ubiquitination process (Ub, E1, E2, E3, and DUB enzymes). The E3 ligase group was the largest group with 82 members, 71 of which harbored a characteristic RING domain. Four Ub-Ls were identified, and the conjugation enzymes for NEDD8 and URM1 were described for the first time. The 3D models for Ub-Ls displayed the typical β-grasp fold. Furthermore, our sequence analysis for the corresponding activating enzymes detected the essential motifs required for conjugation.
MAIN CONCLUSIONS Our findings highlight the complexity of Giardia’s Ub-conjugation system, which is drastically different from that previously reported, and provide evidence for the presence of NEDDylation and URMylation enzymes in the genome and transcriptome of G. intestinalis.
Ubiquitin (Ub) and ubiquitin-like modifiers (Ub-Ls) are small proteins that covalently attach to protein substrates and regulate various cellular processes such as the cell cycle, endocytosis, signaling pathways, intracellular trafficking, DNA repair and transcription, among others.(1)
Ub and Ub-Ls share two common features: a β-grasp fold composed of a five-stranded β-sheet and a C-terminal diglycine motif (GG) used for conjugation to target proteins. Currently, the Ub-L family includes 10 members: small ubiquitin modifier (SUMO), neural precursor cell expressed developmentally downregulated 8 (NEDD8) or Related to Ubiquitin 1 (RUB1) in yeast, Ubiquitin-Related Modifier-1 (URM1), Ubiquitin-fold Modifier 1 (UFM1), autophagy-related proteins 8 and 12 (ATG8 and ATG12), interferon-stimulated gene 15 (ISG15), human leukocyte antigen (HLA)-F adjacent transcript 10 (FAT10), Fau ubiquitin-like protein 1 (FUB1), and histone mono-ubiquitination 1 (HUB1).(2)
The first step in the Ub-conjugation cascade is activation, which is mediated by the E1 protein (UBA1 in budding yeast). Next, Ub is transferred to a Ub-conjugating enzyme or E2 (UBC) through a transesterification reaction. Finally, a Ub ligase or E3 directly or indirectly transfers Ub to the substrate.(3)
E3 enzymes are a wide and diverse group of proteins that can be classified into three groups according to conserved structural domains and the transfer mechanism of Ub to the substrate. The family of Homologous to E6-associated protein carboxyl terminus (HECT) uses an indirect or two-step mechanism in which Ub is transferred from E2 to E3 and then to the substrate. The family of Really Interesting New Gene (RING) and RING-related E3s have a domain containing short motifs of cysteines and histidines, which coordinate two zinc ions (Zf-C3HC4, Zf-UBR, Zf-B Box, PHD, and Zf-Mynd domains) and act as scaffolds for conjugation, thus promoting direct transfer of Ub from E2 to the substrate.(3)
Ubiquitination is a reversible process in which the deubiquitinating enzymes (DUBs) hydrolyze poly-Ub chains or remove Ub molecules. The human genome codifies approximately 90 DUBs that are classified into six families: ubiquitin C-terminal hydrolases (UCHs), ubiquitin-specific proteases (USPs), Machado-Joseph Disease (MJD), Permuted Papain fold Peptidases of DsRNA viruses and Eukaryotes (PPPDE), ovarian tumor (OTU), metalloproteases with a JAMM/MPN motif, and the recently described motif interacting with Ub-containing protein (MINDY-4).(4)
Giardia intestinalis is a protozoan parasite that is considered to be an early divergent eukaryote; it lacks typical eukaryotic organelles such as mitochondria, peroxisomes, and Golgi apparatus. Giardia is an important eukaryotic model because it could have only the key components of the principal regulation systems that characterize higher eukaryotes.(5) Our laboratory previously reported that a large number of ubiquitinated proteins exist during the motile, active metabolic, and replicative stage of Giardia (trophozoite); 151 proteins distributed over 14 functional categories were identified. However, in the infective stage (cyst), only 55 ubiquitinated substrates were observed. Despite this marked decrease, ubiquitination of enzymes involved in cyst wall biogenesis suggested that Ub modification plays a crucial role in this stage of the cell cycle.(6) Therefore, Giardia might be a suitable biological model to define the fundamental elements of the Ub-conjugation pathway. The proteasome components have recently been analysed using bioinformatics, confirming findings reported earlier where a remarkable conservation was observed.(7)
Previous studies have identified three genes for Ub, one E1 enzyme, 11 E2 enzymes, four E3 ligases, and nine DUBs;(8, 9, 10, 11, 12, 13) however, the most divergent genes may have been overlooked. Herein, we performed an exhaustive search using an intelligent systems approach based on hidden Markov models (HMMs) with profiles from the Pfam and Superfamily databases. Approximately 120 genes were identified, 88 of which correspond to new findings; among these genes, 76 were E3 ligases. Furthermore, we identified NEDD8 and URM1 conjugation pathways.
MATERIALS AND METHODS
Inventory building - The full proteome database from G. intestinalis was downloaded from Eupath database version 5.0 (available at http://giardiadb.org). In addition, 66 Pfam HMM profiles associated with Ub and Ub-L conjugation systems were selected and downloaded from the Pfam database version 31.0 (http://pfam.xfam.org/) (Table I). The HMMER package version 3.1 (http://hmmer.org) was used to search each Pfam profile against the entire proteome dataset using the hmmsearch tool and a threshold E-value ≤ 0.1. The repetitive tasks were automated using a Perl script.
Each result was analysed for the respective domain, and other structural features were verified by basic local alignment search tool (BLAST) searches in the EMBL Pfam database (http://pfam.xfam.org/) and SMART analysis program (http://smart.embl-heidelberg.de/smart/set_mode.cgi). Finally, each sequence identified in the Giardia proteome was used as a query in the BLASTp tool from www.giardiadb.org against the UniProtKB/Swiss-Prot database to identify orthologs.
Expression analysis - To analyse gene expression profiles, RNA-seq and microarray datasets were employed. The data were downloaded from NCBI’s Gene Expression Omnibus (GEO), with accession numbers GSE36490 and GSE25460, respectively, and parsed using in-house Perl scripts.
Protein model building and evaluation - 3D models for Ub and Ub-Ls were obtained using Phyre2 (www.sbg.bio.ic.ac.uk/phyre2/). 2LRW, 2QJL, 1YX5, and 1A5R PDB structures were used to model Ub, URM1, NEDD8, and SUMO, respectively. The predicted models were subjected to energy minimisation using YASARA (http://www.yasara.org/), and the stereochemical stability was verified using PROCHECK and ProSA analysis (http://www.ebi.ac.uk/thornton-srv/databases/pdbsum; https://prosa.services.came.sbg.ac.at/prosa.php). Ramachandran plots were computed using Rampage to determine the stereochemical quality and predicted accuracy of the structures.
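The search step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Perl script: the helper names build_hmmsearch_command and parse_tblout and all file paths are hypothetical. The -E and --tblout options are standard hmmsearch options, and the parser assumes HMMER3's per-target tabular output, in which the full-sequence E-value is the fifth whitespace-separated column.

```python
from typing import List, Tuple

E_VALUE_CUTOFF = 0.1  # threshold stated in the text


def build_hmmsearch_command(profile: str, proteome: str, tblout: str) -> List[str]:
    """Return an hmmsearch command line for one Pfam profile (paths are hypothetical)."""
    return ["hmmsearch", "-E", str(E_VALUE_CUTOFF), "--tblout", tblout, profile, proteome]


def parse_tblout(text: str, cutoff: float = E_VALUE_CUTOFF) -> List[Tuple[str, str, float]]:
    """Extract (target, query profile, full-sequence E-value) rows that pass the cutoff."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, query, evalue = fields[0], fields[2], float(fields[4])
        if evalue <= cutoff:
            hits.append((target, query, evalue))
    return hits
```

Each command returned by build_hmmsearch_command could be executed with subprocess.run(command, check=True), once per profile, and the resulting table files passed to parse_tblout to assemble the inventory.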
Phylogenetic analysis - Protein sequences were retrieved from Uniprot and aligned using CLUSTAL Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/). A neighbor-joining phylogenetic tree was constructed using the MEGA 7 program. Bootstrap values were obtained from 1000 replicates.
RESULTS
Our methodology for searching proteins that compose Ub and Ub-L conjugation systems in G. intestinalis identified 118 sequences that were classified into five groups: Ub and Ub-like, E1 and E1-like, E2, E3, and DUB enzymes (Table II).
The first group contains seven sequences: one for free Ub, two for fused Ub (Ub-L40 and Ub-S27), and four for Ub-like proteins: SUMO, URM1, RUB1 (an ortholog of mammalian NEDD8), and UFM1. Among the proteins identified, RUB1 is the closest to Ub (41% identical), whereas UFM1 is the least similar, with 16% identity. To characterise these sequences structurally, three-dimensional structure predictions for each protein were performed. The predicted structures for Ub, SUMO, NEDD8, and URM1 were similar to those of Ub-Ls, as they possessed a β-grasp fold; the characteristic diglycine motif at the C-terminus and the hydrophobic core (Ile-66, Leu-65, and Leu-74), which is conserved in URM1 orthologs, were also identified.(2) Ramachandran plots were then built to assess the quality of the structures. The plots indicated that the distribution of residues in the allowed and disallowed regions, as well as the plot analysis predicting the stability of the models, are within reliable ranges (Fig. 1).
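Identity values such as those quoted above are computed over a pairwise alignment. The sketch below shows one common convention, counting only columns in which neither sequence has a gap; the text does not state which convention or tool was used, so this denominator is an assumption, and the function name percent_identity is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0   # columns where both sequences have a residue
    identical = 0  # compared columns where the residues match
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore columns containing a gap (assumed convention)
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which yields lower values for gapped alignments.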
The second group includes proteins identified with the ThiF domain, which is characteristic of Ub-activating enzymes and members of the bacterial ThiF/MoeB/HesA family. E1 enzymes for Ub (UBA1), SUMO (UBA2), NEDD8 (UBA3), URM1 (UBA4), and one ortholog of MoeB were identified (Table II). These enzymes harbor two catalytic activities required for activation: adenylation and thioester bond formation. The presence of the nucleotide-binding motif, GXGXXGCE, and the catalytic cysteine motif, PZCTXXXXP, which are conserved among canonical E1s (UBA1, 2, and 3), in sequences GL50803_4083, GL50803_10661, and GL50803_6288 confirmed our findings (Fig. 2). The sequences GL50803_12853 and GL50803_11436 exhibited some degree of similarity with the noncanonical E1 UBA4 and the prokaryotic protein MoeB, respectively. UBA4 (MOCS3 in human) has dual functions in both protein urmylation and sulfur transport within the tRNA thiolation pathways. MoeB activates MoaD through its C-terminal end during the first step of sulfur incorporation in molybdenum cofactor biosynthesis.(14) The comparison of Giardia sequences with their orthologs demonstrated that UBA4 is more closely related to MoeB than to canonical E1s (Fig. 2).
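A motif scan of the kind described above can be sketched with regular expressions. This is an illustrative sketch, not the authors' procedure: the function names are hypothetical, X is treated as any residue, and because the text does not define Z, it is also treated here as any residue by default (an assumption that can be overridden through the wildcards argument).

```python
import re
from typing import Dict, List, Optional, Tuple


def motif_to_regex(motif: str, wildcards: Optional[Dict[str, str]] = None) -> str:
    """Translate a motif such as GXGXXGCE into a regular expression."""
    if wildcards is None:
        # Assumption: both X and Z match any single residue.
        wildcards = {"X": ".", "Z": "."}
    return "".join(wildcards.get(ch, re.escape(ch)) for ch in motif)


def find_motif(sequence: str, motif: str,
               wildcards: Optional[Dict[str, str]] = None) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched segment) for each non-overlapping match."""
    pattern = re.compile(motif_to_regex(motif, wildcards))
    return [(m.start() + 1, m.group()) for m in pattern.finditer(sequence.upper())]
```

For example, find_motif(seq, "PZCTXXXXP", {"X": ".", "Z": "[EQ]"}) would restrict Z to Glu or Gln, if that reading of Z were the intended one.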
E2 enzymes are characterised by a highly conserved domain of approximately 150 amino acids, the UBC domain, which contains a conserved catalytic cysteine and interacts with E1. E2s are classified into four groups based on the existence of additional extensions flanking the UBC domain that confer functional differences. Class I has only the UBC domain, classes II and III have one extension (N- or C-terminal), and class IV has both. In Giardia, we found 11 class I conjugating enzymes and one class III enzyme. GL50803_5921 has a C-terminal extension of 170 amino acids and does not present any similarities within the protein database. One of the 12 genes classified in this group is a new finding; GL50803_8638 shares 43% identity with UBC6 from Schizosaccharomyces pombe.
The fourth group includes 82 sequences corresponding to E3 ligases. Five are HECT enzymes: GL50803_137754 and GL50803_17386 showed 53% and 54% similarity to human E6-AP, which is the founding member of the HECT family.(15) GL50803_16321 is similar to Pub1 from S. pombe, which ubiquitinates cdc25 (the mitotic phosphatase) in vivo. Furthermore, GL50803_32730 and GL50803_3117 are 31% identical to E3 ligases from Caenorhabditis elegans and Eimeria maxima, respectively (Table II).
The remaining sequences were classified as RING type, of which 71 displayed a typical C3HC4 domain. Within this class, we could distinguish three subgroups of metazoan E3 ligases. The first subgroup included 30 sequences containing ankyrin repeats and a single C-terminal RING domain; this arrangement of domains has been reported in the XB3 family in plants.(16) Although 14 of these proteins show some degree of similarity with Mind bomb (MIB) proteins, they do not conserve the typical modular architecture, which includes two substrate recognition modules at the N-terminus, a series of ankyrin repeats, and multiple RING domains at the C-terminus.(17) Phylogenetic analysis was performed to determine whether Giardia ankyrin-RING proteins are members of the XB3 family. Our results revealed that the ankyrin-RING proteins were distributed into four distinct clusters, one of which appears to be related to XB3 (Fig. 3).
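The four-class scheme described above can be expressed as a small decision rule on the position of the UBC domain within the protein. The sketch below follows the numbering used in the text, where a C-terminal extension corresponds to class III (as for GL50803_5921) and class II is therefore taken to be the N-terminal case. The function name classify_e2 and the min_extension cutoff of 30 residues are hypothetical; the text does not state how long a flanking region must be to count as an extension.

```python
def classify_e2(protein_length: int, ubc_start: int, ubc_end: int,
                min_extension: int = 30) -> str:
    """Assign an E2 class (I-IV) from 1-based UBC domain coordinates."""
    if not (1 <= ubc_start <= ubc_end <= protein_length):
        raise ValueError("domain coordinates must lie within the protein")
    n_extension = ubc_start - 1             # residues before the UBC domain
    c_extension = protein_length - ubc_end  # residues after the UBC domain
    has_n = n_extension >= min_extension    # hypothetical cutoff
    has_c = c_extension >= min_extension
    if has_n and has_c:
        return "IV"
    if has_c:
        return "III"
    if has_n:
        return "II"
    return "I"
```

With these assumptions, a protein consisting of a 150-residue UBC domain followed by a 170-residue tail, classify_e2(320, 1, 150), is assigned to class III.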
The second subgroup of RING ligases contains proteins with some degree of similarity to TRIM proteins, which are characterised by a tripartite motif composed of one RING domain, one or two B-Box domains, and a coiled-coil domain. TRIM proteins play important roles in various processes including cell growth, DNA damage signaling, senescence, tumor suppression, and innate antiviral response;(15) the sequences identified share similarity with RING domains exclusively (Table II, Fig. 4).
In the last subgroup, there are several orthologs of ring finger and CHY zinc finger domain-containing protein 1 (RCHY1) also known as p53-induced protein with a RING-H2 domain (Pirh2); these proteins regulate cell-cycle progression, cell proliferation, and cell death through the ubiquitination and degradation of diverse substrates such as p53, p27Kip1, p63, p73, c-Myc, and Chk2.(15)
Furthermore, we identified E3 ligases involved in nuclear functions: orthologs of Bre1, Pep5, and Hel2, which are required for the degradation of histones in yeast, and orthologs of RAD18, Pirh2, and the endonuclease SLX1, which are involved in DNA damage repair. RNF12, an E3 ligase for c-Myc, and orthologs of E3s that engage in apoptosis and cellular signaling (e.g., IAP and XIAP) were also identified.
In contrast to the high number of sequences with a RING domain, only six harbor RING-related domains: one protein containing the Zf-UBR domain, which is involved in ubiquitination/degradation through the N-end rule; two proteins containing a B-Box domain; and two proteins with a PHD domain (C4HC3).
The fifth group corresponds to deubiquitinating enzymes; three new findings were reported: one deNEDDylase (GL50803_10218), an OTU enzyme (GL50803_88556), and a member of the MINDY-4 family (GL50803_7349). Other OTU enzymes have been reported in protozoans such as Plasmodium falciparum, Cryptosporidium parvum, Toxoplasma gondii and Eimeria acervulina,(18, 19) while orthologs of MINDY have not yet been reported in parasites.
Finally, to verify the expression of the genes identified during the trophozoite stage, transcriptomics data from Franzen et al.(20) were used. These data were obtained from Illumina transcript analysis of WB strain assemblage A; genes with transcript levels, expressed as fragments per kilobase per million fragments mapped (FPKM), < 0.5 were regarded as not expressed. From our analysis, transcription was detected in all of the genes but two (GL50803_16687 and GL50803_4430). FPKM ranged from 0.7 to 26.493, and a wide variation of expression levels within the five groups of the ubiquitination pathway genes was observed, with ubiquitin, sumo, nedd8, and urm1 showing the highest transcript levels (Fig. 5, Supplementary data). Although we analysed the data available from encystation,(21) none of the genes were overexpressed during this differentiation process (data not shown).
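The expression criterion above can be illustrated with the standard FPKM definition (fragments mapped to a transcript, normalised per kilobase of transcript length and per million mapped fragments). The sketch below is only an illustration: the function names are hypothetical, and the published FPKM values were taken from the cited dataset rather than recomputed this way.

```python
def fpkm(fragments: float, transcript_length_bp: float,
         total_mapped_fragments: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def is_expressed(fpkm_value: float, threshold: float = 0.5) -> bool:
    """Apply the cutoff from the text: FPKM < 0.5 is regarded as not expressed."""
    return fpkm_value >= threshold
```

For instance, 100 fragments on a 1,000 bp transcript in a library of 10 million mapped fragments give an FPKM of 10, which is well above the cutoff.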
DISCUSSION
Approximately 30 genes associated with the Ub-conjugation pathway have been identified in G. intestinalis, 12 of which were previously identified at our laboratory using BLAST.(10) This number is considerably lower than the hundreds of genes associated with this system reported in other eukaryotes. In humans, for example, there are two Ub activating enzymes (UBA1 and UBA6), approximately 40 conjugating proteins, and hundreds of E3 ligases.(1, 3) In this study, after an exhaustive search using HMMs, > 100 genes were identified.
In most organisms, Ub is codified in three forms: as a monomer, as polyUb, or as fusion proteins with ribosomal proteins. The Ub sequence is extensively conserved among all eukaryotes with similarities > 98% among humans, yeast, and apicomplexan parasites.(18) However, Giardia’s Ub gene is one of the most divergent sequences reported; it shares only 70% similarity with the human sequence.(8, 10)
Currently, the Ub-L family includes 10 members; some of them, such as FUB1, ISG15, and FAT10, are exclusive to metazoans and are involved in the immune response, T cell activation, and antiviral and antimicrobial defense, respectively.(2) Here, we identified SUMO, NEDD8, URM1, and UFM1 homologs; however, we did not identify ATG8 or HUB1. Orthologs of HUB1 have been identified from yeasts to humans, in apicomplexan parasites, and in Entamoeba (a deep-branching eukaryote, like Giardia).(11, 18, 22) In S. cerevisiae, S. pombe, and humans, HUB1 is mainly involved in alternative splicing of pre-mRNA.(23) Notably, in Giardia, few introns have been reported, and splicing is not essential for the parasite’s viability.(24)
ATG8 plays a central role in the autophagy network, and its conjugation requires the activities of the ATG7 (activating) and ATG3 (conjugating) enzymes. In related organisms such as Entamoeba, Cryptosporidium, and Trichomonas vaginalis and in other protozoan parasites, the ATG8 system has been identified, and autophagy plays a principal role in the parasites’ survival.(11, 18, 22) In this study, ATG8, ATG7, and ATG3 enzymes were not identified, in agreement with Bagchi et al.,(25) who suggested that in Giardia, autophagy does not operate by the classical mechanism based on these proteins.
UFM1 is a highly conserved protein in metazoa and plants but not in yeasts. Initially, this modification was considered to be metazoa-specific and was associated with the endoplasmic reticulum stress response; however, recently, Gannavaram et al.(26) demonstrated the existence of UFM1 and its conjugation enzymes, UBA5 (E1), UFC1 (E2), and UFL1 (E3), in Trypanosomatidae parasite proteomes. In Leishmania donovani, UFM1 is associated with the mitochondria and plays an important role in pathogenesis.(26) UFM1 homologs have not been identified in other unicellular parasites; although we identified a UFM1 domain-containing protein (GL50803_104982) with a highly significant E-value (2.27e-30), we did not find the conjugation enzymes, and the predicted structure was not typical for a Ub-L (Fig. 1).
URM1 acts as a protein modifier (urmylation) and belongs to the superfamily of MoaD and ThiS in prokaryotes; these are small sulfur carrier proteins involved in molybdenum cofactor (MoaD) and thiamin (ThiS) biosynthesis, respectively. The URM1 sequence and structure are more similar to those of bacterial proteins than to those of Ub-Ls, and its conjugation process depends entirely on UBA4.(14) We identified URM1, UBA4, and bacterial MoeB homologs; however, we did not identify a MoaD or ThiS protein (Table II). The sequences for UBA4 and MoeB share approximately 24% sequence identity, and they are differentially expressed during encystation and temperature or redox stress response (data available at www.giardiadb.org), suggesting that they are involved in two different processes. However, whether URM1 can be activated by two enzymes, or whether another member of the MoaD/ThiS family remains to be identified, is still unclear.
Among Ub-Ls, NEDD8 has the highest identity with Ub (approximately 60%) and is highly conserved from yeast to humans. NEDDylation is catalysed by specific enzymes (UBA3, UBE2F, UBC12, and the RING-box proteins Rbx1 and Rbx2) and is reversed by specific proteases, such as DEN1/SENP8 and UCH-L3.(2) This conjugation cascade has been identified in other protozoans, such as Plasmodium spp., T. gondii, C. parvum, Entamoeba spp., and Trypanosoma brucei.(18, 27) Although cullins are the most abundant substrates, proteins such as DNA damage binding protein 1 (DDB1), translation elongation factor α1, and the chaperone DnaJ, as well as the NEDDylation enzymes themselves, have also been reported as targets.(27) In this study, we found the NEDD8 ortholog and enzymes involved in the conjugation and de-NEDDylation processes (Table II).
The covalent modification of proteins by SUMO is the only Ub-like conjugation process that was described previously in Giardia; a single gene for SUMO, SUMO activating Enzyme subunit 2 (SAE2), SUMO conjugating enzyme (Ubc9), and one deSUMOylase were reported by Vranych et al.(13) The same sequences were identified here; however, this finding differs from that of higher eukaryotes, in which multiple members of the SUMO family exist and the SUMO-activating enzyme is a heterodimer of UBA2 (SAE2) and AOS1 (SAE1) subunits.(2) Although only two SUMOylated proteins, arginine deiminase and α-tubulin, have been fully identified in Giardia, SUMOylation of other proteins and the participation of SUMOylation in encystation, cell-cycle progression, cell growth, and morphology maintenance were recently demonstrated.(28)
Regarding the ubiquitination process, the extensive conservation exhibited by the E2 enzymes enabled the identification of most of Giardia’s enzymes using basic local alignment tools in the past;(10) however, a novel noncanonical ubiquitin-conjugating enzyme (NCUBE) was identified here (GL50803_8638); NCUBE enzymes are localised in the lumen of endoplasmic reticulum (ER) and participate in ER-associated degradation. Nevertheless, Giardia’s protein and one ortholog of Entamoeba lack the hydrophobic C-terminal tail required for ER localisation; the function of this “truncated” Ubc6-like protein has not yet been established.(29)
The E3 Ub ligase family is the largest family of proteins involved in ubiquitination because it is required for specific substrate recognition. In Giardia, we identified approximately 80 enzymes, five of which contain the HECT domain; our results agree with those reported for Entamoeba, apicomplexans, and yeast, which are organisms that codify for five or six HECT ligases.(22)
Considering the RING E3 ligases, our results largely differ from those previously reported by Gallego et al.;(10) they identified only one putative protein with the RING domain, whereas we identified approximately 70. Similar to other eukaryotes, in Giardia, ubiquitination is involved in numerous cellular processes such as protein quality control, metabolic pathways, endocytosis, cell signaling, DNA/RNA metabolism, and differentiation.(6) The large and diverse group of RING ligases reported here corresponds to multiple processes regulated by ubiquitination.
The over-representation of members of the XB3 family (Ankyrin repeat C3HC4 RING finger) (Figs 3, 4) suggests an important role in parasite growth; this type of protein has not been reported in other parasites or in yeast. Our data provide the first reference for the XB3 family in a single-cell eukaryote. In plants, these proteins play important roles in development, stress responses, cell death induction, and pathogen response.(16)
Another interesting result is the absence of multisubunit E3 enzymes, which are important cell-cycle regulators in higher eukaryotes. The anaphase-promoting complex (APC) and SCF are E3 ligases that ubiquitinate cyclins involved in the M and G1/S phases, respectively. The APC/C holoenzyme comprises at least 14 different proteins distributed into three subcomplexes: a scaffolding subcomplex, a catalytic subcomplex (containing APC11 RING ligase, cullin-like subunit APC2, and APC10), and the substrate recognition subcomplex.(30) Our search identified only one small protein (GL50803_8432; 78 amino acids) that is 44% similar to the APC11 from Mus musculus (84 amino acids). The SCF complex consists of three subunits: Skp1/Cullin1 (scaffolding protein), Rbx1 (RING Ligase), and an interchangeable F-box-protein that determines substrate specificity. This E3 complex ubiquitinates the S phase CDK inhibitor (Sic1p/p27kip1) involved in G1/S phase progression.(30) Our search failed to identify any sequence for Skp1 and Cullin1 homologs, although two Rbx proteins were identified (GL50803_8432 and GL50803_8241). These results agree with those reported by Eme et al.(31) and Gourguechon et al.,(32) neither of whom found any of the highly conserved components of these Ub ligases in Giardia. In addition, Gourguechon’s study showed that APC substrates such as mitotic cyclin B, aurora, and polo-like kinases were not ubiquitinated and that proteasome inhibition did not cause cell-cycle arrest, suggesting that ubiquitination and proteasomal protein degradation are not involved in Giardia cell-cycle progression.
In 1994, Krebber suggested that the ubiquitination system in Giardia was a basal acquired system;(8) nonetheless, the set of genes reported here demonstrates that the Giardia genome codifies members of the Ub and Ub-L conjugation system, similar to that described for higher eukaryotes.
ICC participated in the study design, accomplished gene inventory and 3D structure modeling; EPC participated in the study design, gene inventory validation, analysed the data, and wrote the manuscript; MW participated in the study design and reviewed the manuscript. All authors reviewed the results and approved the final version of the manuscript. The authors declare that there is no conflict of interest regarding the publication of this paper.