Mem Inst Oswaldo Cruz, Rio de Janeiro, VOLUME 117 | 2022
An updated gene regulatory network reconstruction of multidrug-resistant Pseudomonas aeruginosa CCBH4851
1Fundação Oswaldo Cruz, Programa de Computação Científica, Rio de Janeiro, RJ, Brasil
2Fundação Oswaldo Cruz, Instituto Nacional de Infectologia, Laboratório de Pesquisa Clínica em Doenças Febris Agudas, Rio de Janeiro, RJ, Brasil
3Laboratório Nacional de Computação Científica, Petrópolis, RJ, Brasil
4Instituto de Física, Universidade Federal Fluminense, Niterói, RJ, Brasil
5Fundação Oswaldo Cruz, Instituto Oswaldo Cruz, Laboratório de Pesquisa em Infecção Hospitalar Rio de Janeiro, RJ, Brasil
BACKGROUND Healthcare-associated infections due to multidrug-resistant (MDR) bacteria such as Pseudomonas aeruginosa are significant public health issues worldwide. A system biology approach can help understand bacterial behaviour and provide novel ways to identify potential therapeutic targets and develop new drugs. Gene regulatory networks (GRN) are examples of in silico representation of interaction between regulatory genes and their targets.
OBJECTIVES In this work, we update the MDR P. aeruginosa CCBH4851 GRN reconstruction and analyse and discuss its structural properties.
METHODS We based this study on the gene orthology inference methodology using the reciprocal best hit method. The P. aeruginosa CCBH4851 genome and GRN, published in 2019, and the P. aeruginosa PAO1 GRN, published in 2020, were used for this update reconstruction process.
FINDINGS Our result is a GRN with a greater number of regulatory genes, target genes, and interactions compared to the previous networks, and its structural properties are consistent with the complexity of biological networks and the biological features of P. aeruginosa.
MAIN CONCLUSIONS Here, we present the largest and most complete version of P. aeruginosa GRN published to this date, to the best of our knowledge.
Pseudomonas aeruginosa is a ubiquitous and opportunistic pathogen of which infections can affect the lower respiratory tract, skin, urinary tract, eyes, soft tissues, surgical wound, and gastrointestinal system, among others, leading to bacteraemia, endocarditis, and other complications, particularly in health care settings and in immunocompromised patients.(1, 2, 3) This Gram-negative bacteria is one of the most difficult to treat,(4) due to its intrinsic resistance, acquisition of resistance through chromosomal gene mutations, and horizontally acquired resistance mechanisms to multiple drugs.(3) Multidrug resistance (MDR) imposes the central difficulty in the selection of appropriate antibiotic treatment and reduces treatment options, especially in nosocomial settings such as healthcare-associated infections (HAI).(5, 6)
HAI is a severe public health issue related to high morbidity and mortality rates in hospitalised patients and high healthcare costs.(7) Worldwide, P. aeruginosa is one of the most prevalent agents of HAI.(8)
In Brazil, the Brazilian Health Surveillance Agency(9) ranked P. aeruginosa as the third most common causative agent of HAI in hospitalised patients in adult intensive care units (ICU) and the second in paediatric ICU, being nearly 40% of the reported strains resistant to carbapenems.(9) This class of beta-lactam antibiotics has been widely administered worldwide for treating P. aeruginosa infections and other MDR Gram-negative bacterial infections.(10) Indeed, a significantly higher mortality rate was observed among patients infected with MDR P. aeruginosa clones (44.6%) compared to those infected with non-MDR (24.8%).(6)
The most epidemiologically important mechanism of carbapenem resistance is the production of carbapenemases. Among MDR P. aeruginosa clinical isolates in Brazil, the most prevalent carbapenemase is the São Paulo metallo-β -lactamase (SPM-1).(11) This enzyme is encoded by the gene blaSPM-1, located on the P. aeruginosa chromosome,(12) and it confers resistance to almost all classes of beta-lactams. The first register of an MDR P. aeruginosa strain carrying the blaSPM-1gene found in Brazil is from 2003.(13) Widely disseminated in distinct Brazilian geographic regions, SPM-1-producing P. aeruginosa is associated with the clone SP/ST277 and has been isolated from hospital sewage systems, rivers, and microbiota of migratory birds.(11, 12) The strain P. aeruginosa CCBH4851 belongs to clone SP/ST277, and was involved in an endemic outbreak in Brazil in 2008.(14) This strain is resistant to most antimicrobials of clinical importance, such as aztreonam, amikacin, gentamicin, ceftazidime, cefepime, ciprofloxacin, imipenem, meropenem, and piperacillin-tazobactam, being susceptible only to polymyxin B, and has several mechanisms of mobile genetic elements.(2, 14)
To better understand P. aeruginosa’s behaviour, more comprehensive knowledge of gene expression patterns predicted by analysing its gene regulatory network (GRN) is of great value. A GRN consists of a set of transcription factors (TF) that interact selectively and nonlinearly with each other and with other molecules in the cell to regulate mRNA and protein expression levels.(15)
Mathematical modelling and computational simulations are approaches for analysing the GRN and other complex cellular systems influenced by numerous factors. These models allow the construction of biological networks, predict its behaviour under unusual conditions, identify how a disease might develop, and intervene in such development to prohibit cells from reaching undesirable states.(16) In addition, due to their lower cost and high accuracy, such approaches contribute to developing new drugs.(17)
The P. aeruginosa PAO1 strain had its genome sequence published in 2000, providing information regarding genome size, genetic complexity, and ecological versatility.(18) It has been extensively studied since then.
Published in 2011 by Galán-Vasquez et al.,(19) the first P. aeruginosa GRN was based on the PAO1 strain (PAO1-2011). Then, in 2019, Medeiros et al.(2) described a GRN reconstruction of CCBH4851 strain (CCBH-2019). Finally, in 2020, Galán-Vasquez et al.(20) published the updated GRN of P. aeruginosa with the PAO1 strain (PAO1-2020), which was much larger than the previous ones, containing new interactions. All works analysed the GRNs main structural properties and regulatory interactions.
This manuscript describes CCBH-2022, an updated GRN of the MDR P. aeruginosa based on the CCBH4851 strain, using as references both CCBH-2019 and PAO1-2020. We characterise regulators, target genes (TGs), transcription factors (TFs), auto-activation interactions, and influential genes of the network.
We analyse the main structural properties of the network, such as degree distribution, clustering coefficient, and relative abundance of network motifs. Finally, we compare the results of our analyses with those from previous GRNs.
MATERIALS AND METHODS
In this work we study the P. aeruginosa CCBH4851 strain, which is deposited at the Culture Collection of Hospital-Acquired Bacteria (CCBH) located at the Laboratório de Pesquisa em Infecção Hospitalar, Instituto Oswaldo Cruz/Fundação Oswaldo Cruz (Fiocruz) (WDCM947; 39 CGEN022/2010). The genome sequence is available in the GenBank database (Accession CP021380.2).(14)
CCBH-2019 and PAO1-2020 models were the bases for the reconstruction of this GRN. CCBH-2022 GRN results from the orthology analysis between the P. aeruginosa PAO1 and CCBH4851 gene sequences. CCBH-2022 model also inherits the orthologs between CCBH4851 and P. aeruginosa PA7(21) and P. aeruginosa PA14(22) strains, which were already present in CCBH-2019 GRN. The evolutionary histories of genes and species reconstruction are based critically on the accurate identification of orthologs.(23) Orthology refers to a specific relationship between homologous characters that arose by speciation at their most recent point of origin,(24, 25) a common ancestor. One of the most common approaches to determining orthology in comparative genomics is the Reciprocal Best Hits (RBH), which relies on BLAST.(26) An RBH occurs when two genes from different genomes find themselves the best scoring match in the opposite genome.(27, 28) Regulatory interactions between TFs and TGs in the PAO1 GRN were propagated to CCBH-2022 GRN if the TF and the TG formed an RBH. Medeiros et al.(2) designed and implemented an algorithm using the Python programming language to automate and generate a list of RBHs in a tabular format (available as Supplementary data). All the protein sequences from P. aeruginosa CCBH4851 (P1) and P. aeruginosa PAO1 (P2) were considered. BLAST+(29) was used to query the proteins from P1 against those from P2 (forward results) and P2 against P1 (reverse results). Each P1 query sequence was considered in turns, and its best match from P2 was identified from forwarding results (x). Likewise, each P2’s query sequence was considered from the reverse results, with its best match in P1 (x’). If x = x’, then they are RBH. Local BLASTP searches of each protein set against the other were executed, with the following cut-off parameters: identity ≥ 90%, coverage ≥ 90%, and E-value ≤ 1 e-5, showing the results in tabular format. If the search returned no hits, the gene was considered to have no ortholog within the opposite genome. Manual BLASTP was used to prevent false negatives, aligning these gene sequences with the opposite genome, considering the above parameters. If they still returned no hits but were present in either PAO1-2020 or CCBH-2019 models, the results were evaluated with a literature search to determine if they were accurate and whether they should be part of CCBH-2022.
The final GRN table is available as Supplementary data and is organised into six columns: Regulatory gene, Ortholog of the regulatory gene, Target gene, Ortholog of the target gene, Mode of regulation, and Reference. The first column lists the regulatory genes of P. aeruginosa CCBH4851, while the second column contains orthologs of regulatory genes in the reference strain (PAO1 and PA7 or PA14 from the exclusive interactions in CCBH-2019; the same applies to TG’s orthologs). The third column refers to the target gene in P. aeruginosa CCBH4851, while the fourth column lists orthologs of TGs in the reference strain. Finally, the fifth column describes the mode of regulation, and the sixth column indicates the corresponding data source.
The interactions between transcription factor proteins and the genes they regulate in an organism define a directed graph. For the computational analysis, the structure of GRN can be represented as a directed graph, formed by a set of vertices (or nodes) connected by a set of directed edges (or links). Basic network measurements are related to vertex connectivity, the occurrence of cycles, and the distances between pairs of nodes, among other possibilities.(30)
The degree of vertices is the most elementary characterisation of node i, and k(i) is defined as its number of edges. In directed networks, there are incoming (k-in degree) edges and outgoing (k-out degree) edges.(31) The degree distribution can follow a functional form P(k) = Ak -γ , called power-law distribution, where P(k) is the likelihood that a randomly chosen node from the network has k direct interactions, A is a constant that ensures that the P(k) values add up to one, and γ is the degree exponent.(32, 33, 34, 35, 36) According to Albert,(37) this function indicates high diversity of node degrees, with the P(k) value decaying as a power law that is free of a characteristic scale, resulting in the absence of a typical node in the network that could be used to characterise the rest of the nodes. Most real networks with structural information available exhibit this scale-free behaviour, deviating from a Poisson distribution expected in a classical random network.(38, 39)
Studies have shown a scale-free structure in cellular metabolic networks,(32, 40) protein interaction networks, including in cancer,(41, 42) transcription regulatory networks, and GRN.(20, 43, 44, 45) Following the literature,(36, 37, 46, 47, 48) there are some qualitative and quantitative characteristics to ensure that a network is scale-free: the power-law distribution appears as a straight line on a log-log plot; the γ value usually is in the range 2<γ <3; and the presence of high-degree nodes, called hubs, the most highly connected nodes,(47) with most nodes clustered around them. The hubs demonstrate the absence of a uniform connectivity distribution in the network, presenting the 80-20 rule (also referred to as the Pareto principle), with small-degree nodes being the most abundant. However, the frequency of high-degree nodes decreases slowly.(37) Hubs are fundamental for determining therapeutic targets against an infectious agent.(2) Scale-free networks are heterogeneous,(49) so random node disruptions generally do not lead to a significant loss of connectivity. However, the loss of the hubs causes the breakdown of the network into isolated clusters.(50) Some studies validate these general conclusions for cellular networks.(51, 52, 53)
In the GRN, determining the vertices with the highest k-out degrees is a method for identifying a hub,(2) The degree threshold is the exact number of interactions that characterise a hub, and this criterion differs from one study to another.(54) The degree threshold adopted in this work was the average number of connections of all nodes having at least two edges, resulting in a cut-off value of 16 connections.
Motifs are connectivity patterns, a small set of recurring regulation patterns from which the networks are built(55, 56) that are associated with specific functions.(57) A triangle, i.e., three fully connected vertices, is the simplest type of motif.(47) These genes are a regulator, X, which regulates Y, and gene Z, which is regulated by both X and Y.(58) Triangles can be closed (three connections within the set) or open (two edges).(38) This 3-genes motif is the feedforward loop (FFL) and the most common in GRN, appearing in gene systems in bacteria and other organisms,(59, 60) with the possibility of either activation or repression in each of the three regulatory interactions.(61) The coherent type-1 FFL and the incoherent type-2 FFL occur more frequently in transcriptional networks.(58)
The clustering coefficient is the probability that two genes with a common neighbour in a graph are also interconnected.(19) This measure has two popular definitions: the local and global clustering coefficient. The local clustering coefficient of vertex i, Ci, is defined as Ci = 2ei / ki (ki - 1), where ei is the number of edges connecting node i with degree ki, and ki (ki - 1) / 2 is the maximum number of edges in the neighbourhood of node i.(36) In GRNs, the local clustering coefficient C(i) is interpreted as the interaction between genes forming the regulatory groups.(2) The clustering coefficient of a network, C, is calculated by the average of Ci over all vertices.(19, 62) Not considering the directionality of the edges, the global clustering coefficient is the ratio between the number of closed triangles and the total number of triangles (open or closed) in the network.(2) C(k) represents the mean clustering coefficient over the vertices with degree k.(36) Some biological networks tend to present high clustering coefficient values, e.g., in the protein-protein interaction network of S. cerevisiae, <C> ≈ 0.18.(30, 47)
The network density measure is the number of edges of the network over the maximum possible number of edges, measuring the interconnectivity between vertices, and is strongly correlated to the potential to generate gene expression heterogeneity.(63) The network diameter is the path length between the two most distant nodes.(36) The average path length is the measure that indicates the distances between pairs of vertices (the average of the shortest path length over all pairs of nodes in the network).(46)
Several genes are connected in the GRN. When the nodes interact through a direct or an indirect link (intermediate connections), they are considered part of a connected component. These associations are the concept of network connectivity, and for this analysis in the present work, network interactions were considered undirected.(2)
Analysing the structural characteristics (connected components, hubs, and motifs) can help determine the best approach to disturb a network to promote a desired phenotype in the cell.(64)
For CCBH-2022 structural analyses, the R programming language and RStudio were used.(65) Scales, dplyr, tibble, readr and igraph packages were used for data manipulation and plotting the structural analyses.(2, 20, 66) The igraph library was used to compute most properties described above: the in and out degrees, centrality, clustering coefficients, feed-forward loop motifs, connectivity, cycles, paths, and hierarchical levels analyses.(67)
The illustrations of the GRN, the hub’s network, and the connectivity analysis were made in Cytoscape.(68) All figures are presented with higher resolution in the Supplementary data.
The codes for the structural analysis in R and for finding RBH in python, implemented by Medeiros et al.,(2) and the CCBH-2022 file in CSV format are available as Supplementary data in our Github repository (https://github.com/FioSysBio/CCBH2022).
CCBH-2022 consists of 5452 regulatory interactions among 3186 gene products, of which 218 were identified as regulatory genes and 2968 as target genes. Of the 218 regulatory proteins,87 are TFs, 19 are sigma factors (SF), and 13 are RNAs.Of these 13 RNAs, 11 are SF as well. The tables containing their relations are presented in the Supplementary data.
Given the 6577 predicted protein-coding genes of P. aeruginosa CCBH4851, the model organism in this study, the current network represents roughly 50% of the genome, against 16.52% from CCBH-2019.
Specific regulatory genes and their interactions were kept as described in CCBH-2019, such as the ones resulting from the P. aeruginosa PA7 and P. aeruginosa PA14 orthology, and in dedicated biological databases and scientific literature, e.g., IHF(integration host factor). This bacterial DNA-bending protein, essential in gene expression regulation, is absent in the CCBH4851 genome. However, Delic-Attree et al.(69) demonstrated that P. aeruginosa contains the IHF protein composed of the products of the himA and himD genes. These genes act in combination as a TF for several TGs,(2) and all were listed as regulatory genes in CCBH-2019. Consequently, equivalent annotations to the previous CCBH4851 GRN were maintained.
CCBH-2022 has 5452 edges, and these interactions were classified into activation (“+”), repression (“-”), dual (“d”, when, depending on the conditions, the regulatory gene act as an activator or a repressor), and unknown (“?”), as described in biological databases and scientific literature. An illustration of CCBH-2022 is presented in Fig. 1.
Regarding the structural measurements of the updated network, the summarised statistical results are presented in Table I. It contains the standard measures (the number of nodes and edges, number of autoregulatory motifs, network diameter, and average path length), the number of feed-forward motifs, and clustering coefficients. Also, Table I presents a comparison with data from PAO1-2011, CCBH-2019, PAO1-2020, and CCBH-2022.
Since both CCBH-2022 and PAO1-2020 contain significant updates from their previous counterparts, the comparison between CCBH-2022 and PAO1-2020 is most relevant. CCBH-2022 had a density of 5.99e-04, slightly lower than the density of PAO1-2020 (6.07e-04) but showed the same order of magnitude. The diameter was 12, the same as CCBH-2019 and PAO1-2020 and higher than PAO1-2011, which was 9. The average shortest path distance was 4.67, higher than PAO1-2020 (4.01) but slightly lower than CCBH-2019 (4.80). Similar to the previous GRN, CCBH-2022 was disconnected, showing one large connected component (3102 genes) and 20 small connected components.
The degree distributions of the four networks can be seen in Fig. 2A-D, with A and B being the incoming and C and D the outcoming degree distribution. Fig. 2B, D is on a log-log axis, and the straight line is consistent with a power-law distribution. For k-in, the estimated value for γ was 2.79, within the range 2<γ <3, consistent with a power law distribution. For PAO1-2020, the corresponding value was 2.67, 2.89 for CCBH-2019 and 2.71 for PAO1-2011.
The distribution of local clustering coefficients can be seen in Fig. 2E. CCBH-2022 had a global clustering coefficient equal to 4.42e-03, higher than PAO1-2020 (3.03e-03). The scatter plot in Fig. 2F shows the correlation between the local clustering coefficient C(i) and the degree k(i).
The most frequent mode of regulation in CCBH-2022 is activation, occurring in 70,2% of the total interactions in the network, followed by roughly 12% of repression mode and 17.8% of dual or unknown mode. Autoregulation occurs when a gene regulates its expression, and the prevalence in CCBH-2022 is of negative autoregulatory motifs.
The most abundant motif in all four networks was the coherent type I FFL, with 239 in CCBH-2022 (PAO1-2011: 82; CCBH-2019: 79; PAO1-2020: 226). In addition, there were 10 incoherent type II FFL motifs in CCBH-2022 (PAO1-2011: 3; CCBH-2019: 4; PAO1-2020: 8).
Table II shows the 30 most influential hubs in CCBH-2022 and PAO1-2020.
An analysis was performed to determine whether the hubs are interconnected through direct interactions (Fig. 3).
The reconstruction and analysis of the P. aeruginosa GRN contribute to a better understanding of its antibiotic resistance mechanisms. It also contributes to a greater knowledge of related cellular behaviours, such as adaptation and pathogenicity, mainly based on an MDR strain such as CCBH4851.
In this work, we have good coverage of roughly 50% of the genome on this updated network. The genome of reference strain PAO1 has 6.2Mbp, and PAO1-2020 has a coverage of 50% as well, with 5040 interactions and 3006 genes.(20) However, considering that the CCBH4851 genome has 6.8Mbp and has 5452 edges, and 3186 nodes, we can affirm that, to the best of our knowledge, this study presents the largest GRN of P. aeruginosa that has been assembled to date.
On the structural aspects, the charts in Fig. 2 and data in Table I make clear that CCBH-2022 represents a substantial improvement in terms of network completeness and complexity when compared with the previous P. aeruginosa GRNs since it includes more nodes, edges, and network motifs, and when comparing clustering coefficients (Fig. 2E-F). For the in silico approach, the network structural analysis is essential to understand the network architecture and performance.
The structural differences between CCBH-2022 and PA01-2020 results from additional information available due to the new version of PAO1 and recent experimental work on characterising the complete closed genome of P. aeruginosa CCBH4851.(104)
The structural measures of CCBH-2022, such as node degree distribution and clustering coefficient, are consistent with a qualitative description of a scale-free network type. Indeed, the degree distribution followed the power-law distribution (Fig. 2B, D): a small number of nodes had many connections (the hubs) and many nodes had few connections.
The local clustering coefficient and node degree correlation (Fig. 2F) showed that nodes with lower degrees had greater local clustering coefficients than nodes with higher degrees. These characteristics are representative of several biological processes, e.g., RNA binding.(105, 106)
CCBH-2022 showed a lower density value than PAO1-2020. The density of both GRNs was low due to the dynamic and structural flexibility of the networks, a characteristic typical of natural phenomena-based networks,(107) and because the nodes were not all interconnected.(2) However, CCBH-2022 density was lower probably because it has 20 small connected components disconnected from the larger one (Fig. 1), while PAO1-2020 had 12 separated components. The variation in the number of connected components is plausible due to their size difference and the biological information about interactions available for the reconstruction.
All the previous P. aeruginosa GRNs are disconnected graphs, showing one large connected component and a separated few small connected components, and there may be several reasons for this disconnection in specific points. According to Medeiros et al.,(2) interactions among all genes are not expected since some genes in an organism are independent of each other, compartmentalised or global, constitutive or growth phase-dependent, and are triggered in different growth phases, thus resulting in a disconnected network, which corroborates with the observed low density. The reason can also be from loss of existing interactions or a gain of interactions still not fully described from additional strain-specific blocks of genes acquired by horizontal gene transfer.(108) The large number of connected components found in CCBH-2022 results from connectivity parameters and the global clustering coefficient. Both structural measures are affected by the same biological behaviours.(107)
CCBH-2019 presented more negative regulations than PAO1-2011, a trend that continued between CCBH-2022 and PAO1-2020. Also, the most frequent regulatory activity in CCBH-2022 is activation, but ~50% of the autoregulation was negative, which may be a consequence of the increase in negative autoregulation in the overall network interactions compared to the previous ones. Negative auto-regulation in biological systems is commonly observed.(109) The Escherichia coli GRN exhibited the same pattern, with negative autoregulation prevailing concurrently with the positive regulation in the overall network.(110) The continuity of biological processes is ensured by positive autoregulation.(111) For example, quorum sensing, biofilm formation, secretion of toxins, virulence, and resistance factors production, once initiated, must reach a final stage to have the expected effect.(2) In CCBH-2022, genes involved in these processes, such as lasR,(80) rhlR,(90) pvdS,(83) algU,(72) dnr(102) and anr,(88) have positive autoregulation (and are amongst the 30 principal hubs).
Negative cycles are also crucial for life-sustaining cyclic processes such as metabolic processes(112) and cellular homeostasis.(113) In CCBH-2022, genes involved in arginine metabolism (iscR, desT, lexA, hutC, and mvat)(110) showed a predominance of negative mode of autoregulation. Negative autoregulation is associated with cellular stability.(114) It rapidly responds to variations in concentrations of proteins, toxins, and (or) metabolites to avoid undesired effects such as the energy cost of unneeded synthesis.(115) In CCBH-2022, algZ (transcriptional activator of AlgD, involved in alginate production),(116) lexA (involved in the SOS response),(117) metR (involved in swarming motility and methionine synthesis),(118, 119) ptxR (affects exotoxin A production)(120) and rsaL (quorum-sensing repressor)(121) presented negative autoregulatory interactions. Autoregulation is common among genes positioned upstream in GRN with crucial developmental functions.(122, 123)
The FFL motifs are essential for the modulation of cellular processes according to environmental conditions.(124) CCBH-2022 has 968 FFL motifs, which are patterns of structural structures, while PAO1-2020 has 702. There are 239 coherent type I FFL motifs in CCBH-2022, an abundant presence. According to Mangan and Alon(61) these motifs act as sign-sensitive delay elements, i.e., a circuit that responds rapidly to step-like stimuli in one direction (ON to OFF) and as a delay to steps in the opposite direction (OFF to ON); the temporary removal of the stimulus ceases transcription, so the activation of expression requires a persistent signal to carry on. The incoherent type II FFL motif was less represented but also found in all the GRNs, with a total of 10 in CCBH-2022. Contrastingly with the coherent FFL, the type II FFL acts as a sign-sensitive accelerator, i.e., a circuit that responds rapidly to step-like stimuli in one direction but not in the other.(61)
One last characteristic revealed by the structural analysis was the presence of hubs. The hub’s network (Fig. 3) shows the connection among their interactions; they are all interconnected and belong to the largest connected component of the GRN (Fig. 1A). This connectivity reflects the importance of the influential genes. The hubs can be considered the basis of the GRN. They are crucial in searching for potential drug targets for developing new drugs, as in direct interaction with their specific targets or for an indirect interaction with the subsequent process regulation triggered by them. CCBH-2022 hubs are mainly associated with efflux pump mechanisms (mexT, pmrA, soxR),(91, 95, 98) alginate biosynthesis (algU, algR, rpoN),(125) and biofilm formation (rpoN, rpoS, gacA, amrZ).(126)
Table II shows the 30 hubs of PAO1-2020. They are very similar to CCBH-2022 hubs, with some changes in the k-out connections. However, two CCBH-2022 hubs were not in the 30 most influential hubs of PAO1-2020: vfr, a global virulence factor regulator(127) that directly regulates 37 genes, and rsaL, associated with bacterial tolerance to antibiotics, including ciprofloxacin and carbenicillin(128) which directly regulates 34 genes. In PAO1-2020, vfr directly regulates only 12 target genes, while rsaL regulates 19, being one of these an exclusive PAO1-2020 interaction. There are 25 exclusive vfr interactions in CCBH-2022 and 16 exclusive rsaL interactions compared with PAO1-2020. These distinctions can be explained by the fact that the P. aeruginosa CCBH4851 strain is more virulent and multidrug-resistant and also because CCBH-2022 is 7.6% larger in the number of regulatory interactions (5452) than PAO1-2020 (5040), and 20,7% larger in the number of regulatory genes (218) than PAO1-2020 (173). The tables containing these exclusive interactions are in the Supplementary data. These facts strongly indicate that the operation of the main network hubs is not identical. The functioning of CCBH4851 is different, probably due to the greater influence of these two critical genes associated with multi-drug resistance and antibiotic tolerance mechanisms.
The Vfr gene’s role in regulating virulence factor production is related to the production of exotoxin A, a toxin that modifies specific target proteins within mammalian cells and induces necrosis in different tissues and organs in MDR P. aeruginosa infections.(129, 130) The Vfr gene also regulates the las and consequently the rhl quorum-sensing system, two systems that together control the expression of several genes associated with virulence factor production,(131) including alkaline protease, exotoxin A, pyocyanin, and rhamnolipid, as well as critical genes such as rpoS (the 5th most influential gene in CCBH-2022).(132) The signal receptor (R gene) is one of the essential components of the las and rhl QS systems. It is necessary for coding the transcriptional activator protein (R protein).(133) The lasR and rhlR genes are among the 20 principal hubs. In interactions present only in CCBH-2022, vfr regulates genes associated with virulence and alginate production.(134)
The rsaL gene has an important role in P. aeruginosa as a global regulator of quorum sensing, virulence, and biofilm formation.(103, 135) Fan et al.(128) showed that the mutation of rsaL increased bacterial tolerance to ciprofloxacin and carbenicillin. In interactions present only in CCBH-2022, rsaL regulates mostly genes of the phz1 and phz2 clusters, showing the control that rsaL has on phenazine expression in P. aeruginosa, driving the production of phenazine-1-carboxylic acid (PCA) which is further converted in the virulence factor pyocyanin.(136, 137) The pyocyanin production contributes to bacterial tolerance to ciprofloxacin and carbenicillin.(128)
Pseudomonas aeruginosa evades antimicrobial activity during treatment and exerts antimicrobial resistance by mainly intrinsic resistance mechanisms. Examples of resistance mechanisms are multi-drug efflux pumps, biofilm synthesis, enzymatic inactivation/degradation, drug permeability restriction, production of beta-lactamases, acquired resistance by a mutation in drug targets, and acquisition of resistance genes via horizontal gene transfer.(138)
There is a directed regulatory connection from alginate biosynthesis to iron metabolism and some antibiotic resistance mechanisms.(139) The algU, algR, rpoN, pvdS, and fecI genes are related to these processes(140, 141) and are among the most influential hubs.
Pseudomonas aeruginosa has multiple efflux pump systems that prevent the antimicrobial agents from accumulating in adequate concentration to cause an effect in the cell, extruding the drug out.(138) Efflux pump systems are associated with resistance to beta-lactams, fluoroquinolones, tetracycline, chloramphenicol, macrolides, and aminoglycosides.(142) Differential expression or mutations of efflux system genes are also contributing factors for carbapenem and aminoglycoside resistance.(143) The mexT, pmrA, soxR genes, related to multidrug antibiotic efflux pumps, are also amongst the most influential hubs.
The fleQ gene is also among the hubs and affects psl (polysaccharide synthesis locus) genes and regulates the efflux pump genes, mexA, mexE, and oprH, by brlR.(2, 144) The psl cluster comprises 15 exopolysaccharide biosynthesis-related genes organised in tandem that are important for biofilm formation.(145)
The mexT and soxR genes positively regulate an efflux pump system, and several virulence factors,(146, 147) and pmrA regulate efflux pumps and the polymyxin B and colistin resistance.(95, 148, 149)
Efflux pumps also help biofilm formation.(150) Biofilms are also related to protection from the host immune system and antibiotic penetration and tolerance, preventing them from entering the microbial population and inhibiting its action as a first-line defense mechanism.(123, 151, 152) The rpoN, rpoS, gacA, algR and amrZ hubs participate in the regulation of P. aeruginosa biofilm.
This system biology approach to characterise the MDR P. aeruginosa CCBH4851 regulatory network may lead to the development of strategies to disrupt the connectivity of these essential processes, thus, possibly decreasing the pathogenicity and suppressing the resistance of this bacterium.
In conclusion - This manuscript reports the reconstruction and structural analysis of the largest P. aeruginosa regulatory network available in the literature to date. This work can give new insights into identifying novel candidate antibiotic targets and contributes to an increase in our understanding of the behaviour of this bacterium.
This network’s dynamic model construction is one of our future studies, intending to help researchers working on experimental drug design and screening. The goal is to predict the dynamic behaviour better and improve the understanding of P. aeruginosa, allowing the simulation of normal and stress conditions to discover potential therapeutic targets and help develop new drugs against P. aeruginosa bacterial infection.
MSC - Performed the GRN reconstruction, its visualisation and drafted the manuscript; FMF - performed the structural analysis; FABS - supervised this study; FMF, MTS, MAM, APDCA and FABS - provided scientific advice and contributed to revision of the text. All authors read and approved the final manuscript. The authors declare that they have no competing interests.