Until the mid-1990s there was no doubt about how a biomedical researcher should set out to solve a problem or explain a natural phenomenon: use the “scientific method”. As taught in undergraduate science programs, the “scientific method” consists of observing a fact or identifying a relevant question, formulating a hypothesis, collecting data under a rigorous experimental set-up, analysing these data with appropriate statistical methods, discussing them in view of current knowledge, drawing conclusions and finally (ideally) deciding whether or not to reject the hypothesis. From the viewpoint of a bench researcher, the steps described in the traditional “scientific method” were so clear that the final result of tackling any biological problem was as sure as the cycle of day and night: a published “paper” describing research results that tested a new hypothesis and expanded the frontiers of science. It seems, however, that many published papers in the biomedical sciences do not follow this formal pathway to expand scientific knowledge. Although this observation is not unique to the biomedical sciences, it is a fact that research in this area still depends on descriptive methods and on equipment that gathers information from collected, immobilised or processed samples or from field trips. Squeezing descriptive results and high-tech data into the traditional article format, with or without a hypothesis, is common practice for a large part of the biomedical research community. Nevertheless, it is unquestionable that the biomedical and biological sciences have forged a singular research path leading to important discoveries that have improved many aspects of human society, including public health, disease control, vaccines, new medicines, agriculture, food production, biotech innovation, wildlife conservation and biodiversity.
The contrast between “discovery research” and “hypothesis-driven” research in the biomedical sciences gained traction at the dawn of “omics” technology, i.e., genomics, transcriptomics, proteomics, metabolomics, etc. After the torrent of “scientific articles” describing new genome sequences, thousands of variant mRNAs in transcriptomes, and countless “protein sets” generated under the most diverse experimental conditions, it seemed that biological research was reaching a new height. The covers of scientific journals in the late 1990s and mid-2000s often referred to progress in genome and transcriptome sequencing of the “Grand Old” model organisms (e.g., Saccharomyces cerevisiae, Escherichia coli, Caenorhabditis elegans, Arabidopsis thaliana, Rattus norvegicus). This coverage reached its apex with the announcement of the complete sequence of the most awaited species: the Homo sapiens genome (in 1999, still with gaps). These breakthroughs were soon followed by sequences from hard-to-culture microorganisms and endangered (or exotic) species. Within a decade, however, molecular data were being poured into databases at a high rate and in massive amounts, nucleic acid databanks grew enormously, and journals started refusing “papers” that presented a single “omics” dataset. The reason? There was no longer any novelty or challenge in getting a single genome, transcriptome or proteome sequenced and analysed. From then on, publishing a “paper” on any “omics” required hundreds of them to be sequenced and analysed. We learned from this period in biomedical publishing that if no hypothesis is explicitly formulated, the alternative is to deploy more omics data to improve the prospects of the message in that scientific article.
Dividing biological research today into “discovery-driven” and “hypothesis-driven” seems irrelevant, as the routines and operational procedures used to investigate a biological phenomenon or to solve a problem in the life sciences depend both on gathering a large amount of molecular data (omics data, for example) and on formulating a hypothesis. The development of both molecular techniques and sophisticated tools for biomedical research has made it easy for researchers to obtain huge amounts of data, provided that some technical training and adequate financial resources are available to the research group. But more data does not mean better science. After all, most hardware and software is designed to produce information (plenty of data) regardless of the quality of the input sample. As the popular saying goes, for the tools available today it is “garbage in, garbage out”. The consequence is that many “scientific papers” describing non-rigorous (or irrelevant) data will inevitably get published, because the current publishing system is under pressure from the “conflict of interest” inherent in the “author-pays publishing” business model (a.k.a. Article Processing Charge, or APC, journals).
Whatever the “knowledge generation method” used by biological and biomedical researchers nowadays, this community should envision a new path. “Data Gathering, Analysis and Announcement” accounts for most of the time-consuming work in research, and will certainly gain a relevant position in the framework for assessing and recognising individual achievement. Thus, the standard (traditional) scientific article format should not be the only option for publicly presenting biological research results and data. Indeed, initiatives like the DNA repositories created at the beginning of the 1980s might be seen as the first step towards the independence of “molecular data”, since every researcher is expected (and most of the time required) to deposit his or her nucleic acid sequences in these repositories before formal publication. It is worth mentioning that a large part of the DNA sequences publicly available in these databases have never been linked to a “published” article. But this does not mean that these sequences have not been useful! It is highly probable that someone, somewhere, has at least seen the accession number of any of these sequences in the results of a BLAST search.
One solution to the problem of an “alternative format” for data communication is now within reach of researchers: sharing and opening data through public or institutional repositories, followed by some sort of “data announcement” and “call to peers” to scrutinise, reuse, reanalyse and distribute the open data. It would be something like “Res ipsa loquitur”, here equivalent to “The Data Speak for Themselves”. There would thus be no need to write a formal article, i.e., Introduction, Materials and Methods, Results and Discussion (the IMRAD paper), plus conclusions, references and acknowledgements, or to deal with the myriad of journal formats and styles every time one needs to submit a paper. What is more relevant at this point is the reduction in the burden currently placed on the peer review system, which is expected to guarantee the quality of scientific articles in the public domain. To our disappointment, peer review is failing in this mission: we have been watching many fraudulent and bad-science articles reach public view, contributing to the corruption of the scientific record (and also to the loss of confidence in science).
Let us give Scientific Data independence from the “standard, conventional Article/Paper IMRAD” format. This is a three-and-a-half-century-old message package that now has to be modified to adapt to technological achievements expected to transform science communication and human society in general: knowledge and innovation produced by algorithms with the capability (in the near future…) to make their own decisions!
Adeilton Brandão
Editor in chief, Memórias do IOC