Mem Inst Oswaldo Cruz, Rio de Janeiro, 113(1) January 2018
Simple Protocol for population (Sanger) sequencing for Zika virus genomic regions
1NDSS, Retrovirus Laboratory, Virology Center, Adolfo Lutz Institute, São Paulo, Brazil
2NDTV, Virology Center, Adolfo Lutz Institute, São Paulo, Brazil
3NDR, Virology Center, Adolfo Lutz Institute, São Paulo, Brazil
BACKGROUND A number of Zika virus (ZIKV) sequences have been obtained using next-generation sequencing (NGS), a methodology widely applied in genetic diversity studies and virome discovery. However, the Sanger method is still a robust, affordable, rapid and specific tool to obtain valuable sequences.
OBJECTIVE The aim of this study was to develop a simple and robust Sanger sequencing protocol targeting relevant ZIKV genetic regions, such as the envelope protein and nonstructural protein 5 (NS5). In addition, phylogenetic analysis of the ZIKV strains obtained using the present protocol and their comparison with previously published NGS sequences were also carried out.
METHODS Six Vero cell isolates from serum and one urine sample were available to develop the procedure. Primer sets were designed to conduct nested RT-PCR and Sanger sequencing protocols. Bayesian analysis was used to infer phylogenetic relationships.
FINDINGS Seven complete ZIKV envelope protein sequences (1.571 kb) and six partial NS5 sequences (0.798 kb) were obtained using the protocol, with no amplification of the NS5 gene from the urine sample. Two NS5 sequences presented ambiguities at positions 495 and 196. Nucleotide comparison of a Sanger sequence with the consensus sequence from a previous NGS study revealed 100% identity. The ZIKV strains described here clustered within the Asian lineage.
MAIN CONCLUSIONS The present study provided a simple and low-cost Sanger protocol to sequence relevant genes of the ZIKV genome. The identity of Sanger-generated sequences with published consensus NGS sequences supports the use of the Sanger method for ZIKV population studies. The regions evaluated provided robust phylogenetic signals and may be used to conduct molecular epidemiological studies and monitor viral evolution.
Zika virus (ZIKV) is a member of the genus Flavivirus, of the Flaviviridae family. ZIKV was first isolated in Uganda in 1947 from a sentinel rhesus monkey. Since then, only sporadic cases of human infection and isolation from mosquitoes of the genus Aedes have been reported in Africa and Asia. ZIKV has been considered an emergent pathogen since 2007, when an epidemic was reported in Micronesia (Faria et al. 2016).
ZIKV was first identified in the Americas in March 2015 during an outbreak of an exanthematic disease in the state of Bahia, Brazil (Campos et al. 2015). In September 2015, an increase in the number of infants born with microcephaly was observed in areas where ZIKV had been previously reported, and by mid-February 2016, more than 4300 cases of microcephaly had been notified in the country. Due to technical limitations in serological tests, biological confirmation of ZIKV infection is based mostly on detection of viral RNA in serum/plasma or urine by real-time reverse transcription-polymerase chain reaction (qRT-PCR). It is well known that ZIKV RNA is detectable in urine at a higher load and for a longer duration than in serum (Gourinat et al. 2015). Specific antibody detection is mostly hampered by serological cross-reactivity with other circulating flaviviruses such as dengue virus or yellow fever virus (Lanciotti et al. 2008, Tappe et al. 2014, 2015).
Recent studies on molecular epidemiology support the hypothesis that the Brazilian ZIKV strains belong to the Asian lineage (Faria et al. 2016). Genetic and genomic evaluation is important for knowledge of viral evolution, vaccine development and improvement of diagnostic assays, and contributes to the understanding of non-vectorial transmission pathways, including sexual transmission (Barjas-Castro et al. 2016, Bonaldo et al. 2016). Next-generation sequencing (NGS) is commonly used in quasispecies diversity studies, especially of RNA viruses, and is a powerful tool for phylogenetic studies, generating consensus sequences of the major variant. ZIKV sequences have been obtained using this methodology (Behura et al. 2016, Gu et al. 2017). Nevertheless, major variant sequences can also be obtained using conventional Sanger sequencing. The aim of this study was to develop a simple and robust Sanger sequencing protocol targeting relevant genetic regions of ZIKV, such as the envelope protein and nonstructural protein 5 (NS5). In addition, phylogenetic analysis of the ZIKV strains obtained using the present protocol and their comparison with previously published NGS sequences were also carried out.
Samples - Six cell culture supernatants obtained from serum samples (see below) and one urine sample were available to develop the protocol. All samples had previously been tested by qRT-PCR (Lanciotti et al. 2008), which confirmed ZIKV infection. Two of the six samples were obtained from a donor (isolate BR17829) and recipient (isolate BR22482) pair in a reported case of ZIKV transmission through blood transfusion (Barjas-Castro et al. 2016). One serum sample and the urine sample were a pair from the same patient (isolate BR31016), and three serum samples were from other, unrelated patients (isolates BR18147/ZH100, BR19147/23101702 and BR2716). The serum samples (20 µL) were first inoculated in the C6/36 cell line (Aedes albopictus cells, ATCC-CRL-1660) in order to replicate flaviviruses to high titers (Ciota et al. 2007). Cell cultures were incubated for nine days at 28ºC. Indirect immunofluorescent antibody (IFA) tests were performed using flavivirus polyclonal antibodies as described by Gubler et al. (1984) in order to confirm ZIKV infection. These isolates were stored at -70ºC. The isolates obtained from C6/36 cells were then inoculated in Vero cells (African green monkey kidney cells, ATCC-CRL-81) and incubated at 37ºC with 5% CO2. The tubes were observed daily, and when a cytopathogenic effect was observed, the supernatants were used to conduct the molecular assays.
Primer design - To design primer sets suitable for PCR amplification of the entire ZIKV envelope protein gene and the partial NS5 gene, 15 ZIKV sequences (KU647676, KU509998, KU681082, KJ776791, KR815990, KR815989, KR816336, KU497555, KU365778, KU365777, KU365779, KU365780, KU232301, KU232300 and KU232298) were obtained from NCBI and imported into the BioEdit sequence alignment editor (version 188.8.131.52). Primers were designed manually, and no automated software packages were used.
The primers were designed to conduct a nested reverse transcription-polymerase chain reaction (RT-PCR) protocol. For ZIKV envelope protein amplification the primer sets used were: (i) first round (one-step RT-PCR) Zika1_out_Forward AGCAGCAGCTGCCATCGCTTG (777-797 bp) and Zika2_out_Reverse GTACCTGTCCCTCCAGGCTTC (2478-2458 bp), resulting in a 1.701 kb product; and (ii) second round (nested PCR) Zika3_Inner_Foward GATACTGCTGATTGCCCCGGCATA (843-866 bp) and Zika4_Inner_Reverse TTCTTTGAGAAGTCCACCGAGCAC (2414-2391 bp), generating a fragment of 1.571 kb. This primer pair allowed the amplification of the entire ZIKV envelope protein gene, comprising nucleotide positions 873-2370 based on the MR-766 strain (accession number NC_012532) (Kuno & Chang 2007).
ZIKV NS5 amplification used (i) an outer primer pair (one-step RT-PCR) Zika1_out_foward TGAGAGGAGAGTGCCAGAGT (8891-8910 bp) and Zika2_out_reverse ATAAAGGAGCTGCCACATTTG (9843-9864 bp), producing a 0.973 kb fragment; and (ii) an inner pair (nested PCR) Zika3_inner_foward TGGAAAGGCCAAGGGCAGC (8958-8976 bp) and Zika4_inner_Reverse GTGGCGGCAGGGAACCACAAT (9736-9756 bp), generating a fragment of 0.798 kb. These primer pairs permitted the partial amplification of the NS5 gene, comprising nucleotide positions 8958-9756 based on the MR-766 strain (accession number NC_012532) (Kuno & Chang 2007).
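As a quick consistency check on the primer coordinates listed above, the expected product sizes can be recomputed from the outermost primer positions. The following is a minimal Python sketch; the dictionary layout and the function name `product_size` are illustrative assumptions and not part of the published protocol. The reported sizes correspond to the difference between the outermost coordinates; an inclusive base count would be one nucleotide larger.

```python
# Outermost primer coordinates (MR-766 numbering, NC_012532) as reported in the text.
# The keys and the data layout are illustrative assumptions.
AMPLICONS = {
    "envelope_outer": (777, 2478),
    "envelope_inner": (843, 2414),
    "NS5_outer": (8891, 9864),
    "NS5_inner": (8958, 9756),
}


def product_size(start: int, end: int) -> int:
    """Return end - start, the convention that matches the reported sizes."""
    return end - start


sizes = {name: product_size(*coords) for name, coords in AMPLICONS.items()}

for name, size in sizes.items():
    print(f"{name}: {size} bp ({size / 1000:.3f} kb)")
```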
Nucleic acid extraction - ZIKV RNA was extracted from both Vero cell culture and urine using the QIAamp® Viral RNA Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer's protocol. Urine was extracted in duplicate: directly from the sample and after concentration of 1 mL of urine by centrifugation (21,000 x g) for 1 h at 4ºC.
RT-PCR and nested PCR protocols - In the ZIKV envelope protein one-step RT-PCR, extracted RNA was reverse-transcribed and amplified using the SuperScript® III One-step RT-PCR system with Platinum Taq High Fidelity (Life Technologies, USA). The total reaction mixture volume of 50 µL contained the following: 2x reaction mix (25 µL), 10 µM primers (1 µL each), enzyme mix (reverse transcriptase and Taq polymerase, 1 µL), extracted viral RNA template (10 µL), and RNase-free water (12 µL). RT-PCR conditions for envelope amplification were as follows: reverse transcription at 55ºC for 30 min, initial PCR activation at 94ºC for 5 min, 18 amplification cycles of denaturation at 94ºC for 30 s, annealing at 56ºC for 30 s and extension at 68ºC for 2 min 30 s, followed by 17 amplification cycles of denaturation at 94ºC for 30 s, annealing at 60ºC for 30 s and extension at 68ºC for 2 min 30 s (a total of 35 amplification cycles), and a final extension at 68ºC for 10 min. For nested PCR, the RT-PCR product (2.5 µL), 10 µM primers (1 µL each), and RNase-free water (8 µL) were added to GoTaq® Green Master Mix 2X (12.5 µL) (Promega Biosciences, CA). PCR conditions were as follows: initial denaturation at 94ºC for 3 min, 35 cycles of denaturation at 94ºC for 30 s, annealing at 55ºC for 30 s and extension at 72ºC for 2 min, and a final extension at 72ºC for 10 min.
In the ZIKV NS5 genomic region one-step RT-PCR, extracted RNA was reverse-transcribed and amplified using the SuperScript® III One-step RT-PCR system with Platinum Taq High Fidelity (Life Technologies, USA). The total reaction mixture volume of 50 µL contained the following: 2x reaction mix (25 µL), 10 µM primers (1 µL each), enzyme mix (reverse transcriptase and Taq polymerase, 1 µL), extracted viral RNA template (10 µL), and RNase-free water (12 µL). RT-PCR conditions for NS5 amplification were as follows: reverse transcription at 55ºC for 30 min, initial PCR activation at 94ºC for 5 min, 35 amplification cycles of denaturation at 94ºC for 30 s, annealing at 53ºC for 30 s and extension at 68ºC for 1 min 30 s, and a final extension at 68ºC for 10 min. For nested PCR, the RT-PCR product (2.5 µL), 10 µM primers (1 µL each), and RNase-free water (8 µL) were added to GoTaq® Green Master Mix 2X (12.5 µL) (Promega Biosciences, CA). PCR conditions were as follows: initial denaturation at 94ºC for 3 min, 35 cycles of denaturation at 94ºC for 30 s, annealing at 58ºC for 30 s and extension at 72ºC for 2 min, and a final extension at 72ºC for 10 min.
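The reaction set-ups described above can be checked arithmetically before pipetting. The sketch below, in Python, sums the listed component volumes and the two cycling blocks of the envelope one-step RT-PCR. The dictionary keys and the helper `total_volume` are illustrative assumptions; the 25 µL total for the nested reaction is derived here from the listed components and is not stated explicitly in the text.

```python
# Component volumes in µL, as listed in the text. Key names are illustrative only.
one_step_rt_pcr = {
    "2x reaction mix": 25.0,
    "forward primer (10 µM)": 1.0,
    "reverse primer (10 µM)": 1.0,
    "enzyme mix": 1.0,
    "viral RNA template": 10.0,
    "RNase-free water": 12.0,
}

nested_pcr = {
    "GoTaq Green Master Mix 2X": 12.5,
    "RT-PCR product": 2.5,
    "forward primer (10 µM)": 1.0,
    "reverse primer (10 µM)": 1.0,
    "RNase-free water": 8.0,
}


def total_volume(mix: dict) -> float:
    """Sum the component volumes of a reaction mix (µL)."""
    return sum(mix.values())


# Envelope one-step RT-PCR: two cycling blocks that differ in annealing temperature.
# Each tuple is (number of cycles, annealing temperature in ºC).
envelope_cycling_blocks = [(18, 56), (17, 60)]
total_cycles = sum(cycles for cycles, _ in envelope_cycling_blocks)

print(total_volume(one_step_rt_pcr), total_volume(nested_pcr), total_cycles)
```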
The products of RT-PCR and nested PCR were loaded onto a 1.5% agarose gel and visualised under ultraviolet light.
Sequencing - The 1.571 kb PCR product (complete protein) of the ZIKV envelope protein amplification was sequenced using eight primers. Four primers were designed to sequence the ~800 bp fragment obtained from partial NS5 region amplification (Table). Each sequencing reaction was performed using 0.5 µL of BigDye Terminator v3.1 cycle sequencing kit (Applied Biosystems) and 1.6 µL of each primer (1 µM) in a final volume of 10 µL per reaction. Dye-labelled products were sequenced using an ABI 3130 sequencer (Applied Biosystems). Sequencing chromatograms were edited manually using Sequencher 4.7 software (Gene Codes, USA).
Phylogenetic analysis - Sequences were aligned using Clustal W multiple alignment and edited manually in BioEdit. Phylogenetic relationships were inferred by Bayesian analysis using Markov chain Monte Carlo (MCMC) with BEAST v.1.8.0 under the GTR + G + I model. The MCMC chain was run for 10,000,000 generations, sampling every 1,000 generations, with a constant coalescent tree prior. The maximum clade credibility tree (MCCT) was chosen from the posterior distribution of 10,001 sampled trees with the program TreeAnnotator v1.8.0. Statistical support for the inferred Bayesian trees was assessed by posterior probabilities.
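The number of sampled trees follows directly from the chain length and the sampling interval. A minimal Python sketch of that arithmetic is given here, assuming that the initial state (generation 0) is logged along with every 1,000th generation, which is consistent with the 10,001 trees reported; the function name `sampled_states` is hypothetical, and no burn-in is modelled because none is stated in the text.

```python
def sampled_states(chain_length: int, sample_every: int) -> int:
    """Number of logged states, counting the initial state at generation 0."""
    return chain_length // sample_every + 1


# Chain length and sampling interval as reported in the text.
n_trees = sampled_states(10_000_000, 1_000)
print(n_trees)
```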
Nucleotide sequence accession numbers - The nucleotide sequences were deposited in GenBank under the following accession numbers: MF048802-MF048807 for envelope genes, and MF077458-MF07763 for NS5 genes.
Ethical approval - This study was carried out in accordance with the Declaration of Helsinki as revised in 2000, and approved by the Ethics Committee of the Adolfo Lutz Institute, São Paulo, Brazil. Study participants were not required to provide informed consent as this study was considered by the Ethics Committee to be part of routine surveillance activities.
Six cell culture samples and one clinical sample (urine) were analysed by amplification of both the ZIKV envelope and NS5 genes, followed by Sanger sequencing. The presence of inhibitors of ZIKV detection was not evaluated, and no method was used to remove them from the urine sample. However, a viral concentration method was employed for this sample. The ZIKV envelope and NS5 genes were successfully amplified in cell culture samples, yielding specific amplification products of 1701 bp and 798 bp, respectively. The ZIKV envelope gene was also effectively amplified from the urine samples (concentrated and non-concentrated). However, despite several attempts, the NS5 fragment could not be obtained from these samples.
A total of seven sequences of the complete envelope protein (1515 bp) and six sequences of the partial NS5 region (667 bp) were obtained in this study. The sequences were aligned using the BioEdit sequence alignment editor (version 184.108.40.206). The ZIKV envelope region was more conserved than the NS5 gene, with a significantly lower percentage of nucleotide substitutions (0.26% vs. 0.95%, respectively; p = 0.042, two-tailed Fisher's exact test). Two NS5 sequences (isolates BR18147/ZH100 and BR31016) presented ambiguities at positions 495 and 196, respectively. The ambiguity found in isolate BR18147/ZH100 (MF077463) is synonymous (R = A or G), with both nucleotides coding for a lysine at position 165 (Fig. 1A), whereas the ambiguity in sample BR31016 (MF077459) (Y = C or T) leads to a non-synonymous amino acid substitution at position 66 (coding for histidine or tyrosine) (Fig. 1B).
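The relationship between the ambiguous nucleotide positions and the affected codons can be illustrated with a short Python sketch. It maps a nucleotide position to its codon and expands an IUPAC ambiguity code into the possible amino acids. The actual codons are not given in the text, so `AAR` and `YAC` below are hypothetical examples chosen to be consistent with the reported lysine and histidine/tyrosine outcomes; the function names are likewise assumptions, and numbering is assumed to start in frame at the first base of the sequenced fragment.

```python
from itertools import product

# IUPAC codes needed for this illustration only.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}

# Subset of the standard genetic code covering the hypothetical codons below.
CODON_TABLE = {
    "AAA": "K", "AAG": "K",  # lysine
    "CAC": "H", "CAT": "H",  # histidine
    "TAC": "Y", "TAT": "Y",  # tyrosine
}


def codon_number(nt_position: int) -> int:
    """1-based codon index for a 1-based, in-frame nucleotide position."""
    return (nt_position - 1) // 3 + 1


def position_in_codon(nt_position: int) -> int:
    """1, 2 or 3: where the nucleotide sits within its codon."""
    return (nt_position - 1) % 3 + 1


def possible_amino_acids(codon: str) -> set:
    """Expand ambiguity codes and translate every resulting codon."""
    expanded = ("".join(bases) for bases in product(*(IUPAC[b] for b in codon)))
    return {CODON_TABLE[c] for c in expanded}


print(codon_number(495), position_in_codon(495), possible_amino_acids("AAR"))
print(codon_number(196), position_in_codon(196), possible_amino_acids("YAC"))
```

Under these assumptions, position 495 falls on the third base of codon 165, where R leaves the amino acid unchanged, while position 196 falls on the first base of codon 66, where Y yields two different amino acids.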
Two strains belonging to the African lineage (HQ234500 and djLC002500), six strains from the Asian lineage (HQ234499, EU545988, KU681082, KU509998, KJ776791 and KU647676), and nine Brazilian strains (KU707826, KU365778, KU365780, KU365779, KU365777, KU926309, KU497555, KU321639, KU527068) were used to infer genetic relationships between the worldwide ZIKV samples and the strains characterised in the present study. The newly identified ZIKV strains in countries of the Americas, as well as the samples characterised in the present study, are all close to Asian and Pacific strains. In addition, both the envelope and NS5 genes of the Brazilian ZIKV strains detected here could be separated into three phylogenetically distinct clusters, designated A, B and C. Group A is formed by the donor and recipient samples (isolates BR22482 and BR17829), Group B is composed of cell culture isolate BR2716, and Group C comprises the cell culture and urine pair samples (isolate BR31016) and two other distinct cell culture isolates (BR18147/ZH100 and BR19147/23101702) (Fig. 2A-B).
According to the identity score provided by BLAST/NCBI, Group A showed 100% nucleotide identity to two samples previously detected in Recife, Brazil in 2015 (KR872956 and KX197192) (Donald et al. 2016) for both envelope and NS5 genes. Group B exhibited 99% nucleotide identity (envelope and NS5) to more than 20 ZIKV strains, including samples from French Guiana detected in 2015 (KU758871 and KU758870), and from Puerto Rico identified in 2015 (KX087101 and KX601168) and 2016 (KY075934). Group C displayed 100% identity in the envelope gene to 17 ZIKV sequences, comprising four strains detected in Nicaragua in 2016 (KY765327, KY765326, KY765325 and KY765324), two strains isolated in Honduras in 2016 (KX262887 and KY785414), six strains from French Polynesia detected in 2013 (KX447519, KX447518, KX447513, KX447510 and KX369547) and 2014 (KX447520), two strains identified in the United States in 2016 (KY325479 and KY325465) (Grubaugh et al. 2017), two strains reported in Paraiba, Brazil in 2015 (KX576684 and KX280026), and one strain detected in Rio de Janeiro, Brazil in 2016 (KY014313) (Metsky et al. 2017). Finally, Group C presented 99% similarity in the NS5 region to more than 30 strains, notably strains isolated in Latin America: Peru in 2016 (KY693679), Honduras in 2016 (KY785452), Mexico in 2016 (KY606272), and Ribeirão Preto, Brazil (KY559015).
To evaluate the informative potential of Sanger population sequences compared to NGS sequences, and to verify whether the regions sequenced here provide phylogenetic information suitable for molecular epidemiology studies, we compared the recipient sequence generated by Barjas-Castro et al. (2016) using NGS (KU321639) with the recipient sequence generated here using Sanger sequencing (MF48805). Analysis of 1515 nucleotides revealed that the two sequences were 100% identical and, as expected, clustered together in a monophyletic branch with high posterior probability (i.e. ≥ 0.95). The partial ZIKV sequences generated by Sanger sequencing in this study were evaluated among well-described African and Asian lineage reference sequences, showing good support for discriminating these lineages (Fig. 3).
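A comparison of this kind reduces to counting matching positions between two aligned sequences. The Python sketch below shows one way to compute percent identity for a pair of pre-aligned, equal-length sequences. The function name `percent_identity` and the toy sequences are illustrative assumptions, not the sequences analysed in the study, and gap characters are treated like any other symbol.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Toy sequences for illustration only.
print(percent_identity("ATGCATGC", "ATGCATGC"))
print(percent_identity("ATGC", "ATGA"))
```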
One of the potential threats to public health microbiology in the 21st century is the morbidity caused by ZIKV. The severity of ZIKV infection prompted the World Health Organization (WHO) to declare this virus a global concern (Shankar et al. 2017). The rapid geographic expansion of ZIKV, its genetic diversity, multiple transmission pathways, adaptability to infect distinct vectors, and its association with severe neurological diseases have highlighted a need for robust molecular tools that can be used to efficiently and quickly detect and characterise ZIKV genomes (Leguia et al. 2017). Here we described a targeted RT-PCR amplification and Sanger sequencing strategy developed for the complete envelope and partial NS5 genes of ZIKV, considered potential drug and vaccine targets (Shankar et al. 2017). The main purpose of this strategy was to introduce easy laboratory procedures to handle ZIKV samples and reliably generate genome sequence data in a quick and cost-effective manner.
Our primers were designed based on an alignment of 15 ZIKV reference sequences. The references were selected to include the diversity of ZIKV strains and, to avoid bias, ZIKV strains that had undergone multiple passages in viral culture or were isolated from non-human hosts were not included. The primers have been validated with a small set of cell culture and clinical (urine) samples collected during 2016 from a restricted area in Brazil. Although samples from other lineages were not tested, the primer sets were designed using global references, so they are likely to successfully amplify a wide range of samples.
Conventional one-step RT-PCR has been successfully used for amplification of ZIKV genes (Faye et al. 2008). We developed here a nested RT-PCR. This approach was chosen because (i) sensitivity can be further increased, (ii) further amplification of primer-dimer artifacts or nonspecific products generated in the first round is avoided, and (iii) a different set of primers can be employed in the second round (Goode et al. 2002). This protocol was effective for ZIKV amplification from cell culture supernatants, but only partially effective for clinical samples. The nested RT-PCR method did not allow NS5 amplification from the urine sample (both concentrated and non-concentrated). The presence of possible PCR inhibitors in urine was not evaluated in the present study, and they might have hindered the NS5 amplification. Internal controls are necessary for future optimisation of RNA extraction and amplification procedures. Failure to amplify NS5 could also indicate degradation of the nucleic acid in the urine sample or an unusual sequence. Mismatches in the primer-binding region are known to affect amplification. However, the NS5 genomic sequence is highly conserved among Asian lineage strains worldwide (Metsky et al. 2017), suggesting that mismatches were not the probable cause of non-amplification. It is worth mentioning that only one clinical sample was tested here, and it is important to further evaluate the performance of the present protocol using additional clinical samples, including serum, saliva and urine.
The majority of ZIKV sequences available were obtained using NGS (Barjas-Castro et al. 2016, Leguia et al. 2017, Metsky et al. 2017), and this methodology provides valuable information on viral diversity, being pivotal in the analysis of viral quasispecies (van Boheemen et al. 2017). However, this tool may be cost-effective in specialised core laboratories working with high-quality samples and bioinformatics support, a situation not commonly available in clinical and public health laboratories, especially in resource-constrained settings. The use of cell culture isolates obtained from small serum samples and the nested RT-PCR followed by Sanger sequencing presented here was a suitable low-cost methodology to sequence relevant regions of the ZIKV genome. Moreover, in the context of outbreaks, where high numbers of samples need to be processed quickly and accurately, these types of tailored strategies can significantly impact operations (Leguia et al. 2017).
It is well known that Sanger sequencing may not have the sensitivity to detect minor variants of RNA virus quasispecies; nevertheless, it is an alternative tool that is easy to use, robust, affordable, rapid and specific for obtaining sequences from the major variant. Thus, it may be an important alternative methodology to NGS. In the present study we were able to demonstrate that the major variant of the ZIKV envelope gene identified in the transfusion recipient (Barjas-Castro et al. 2016) could be recognised by both NGS and Sanger methodologies.
Similarly to the sequences described in the recent widespread epidemic of ZIKV in the Americas, the partial genome sequences characterised in this study clustered with the Asian clade, covering sequences from New World, Pacific, Micronesian and Malaysian strains (Faria et al. 2016, Metsky et al. 2017). The ZIKV envelope protein is responsible for virus entry and represents a major target for neutralising antibodies. On the other hand, NS5 is critical for ZIKV replication. Therefore, the envelope glycoprotein and NS5 polymerase are major targets for ZIKV antiviral and vaccine development (Shankar et al. 2017). Nucleotide ambiguities were identified in the NS5 region in two sequences analysed here (isolates BR18147/ZH100 and BR31016). The precise impact of amino acid changes cannot be predicted from sequence information alone, and studies attempting to correlate nucleotide differences with antigenic differences are extremely important (Dai et al. 2016).
In conclusion, the present study provided a simple and low-cost Sanger protocol to sequence relevant genes of the ZIKV genome, able to provide robust phylogenetic signals that allow molecular epidemiological studies.
Sequence data - Sequences are available at GenBank with accession numbers: MF048802 to MF048807 and MF077458 to MF07763.
GBC, JLPF, RPS and LFMB conceived the study; GBC, JLPF and LFMB designed the study protocol; GBC, JLPF, RPS, MSC and CAF participated in the conduct of the study; RPS, MSC and CAF acquired the data; MSC and CAF performed the cell culture assays; GBC and JLPF conducted molecular assays; GBC, JLPF and AL analysed and interpreted the data; GBC, JLPF and AL drafted the manuscript; RPS, MSC and LFMB critically revised the manuscript for intellectual content. All authors read and approved the final version.