Yesterday I stumbled upon a very interesting article with the provocative (yet misleading) title: “The scientific paper is obsolete”. The writer, James Somers, makes excellent points about the current form of scientific writing and presents more modern alternatives like Mathematica and Jupyter (formerly IPython) notebooks. I mostly agree with its content, but felt the need to elaborate on some of its main aspects.
Below are some of my (biased) thoughts on the theme of the article. As a disclaimer, I should state that all my opinions below concern STEM-related papers, as this is my domain of expertise. Some points probably even focus more than needed on software and machine learning. My sincere apologies for that to any reader who does not work in these fields.
So, here it goes:
The PDF, oh the PDF!
The scientific paper is not dead, but it’s seriously outdated. Its static form, a (most often only favorable) snapshot of its contribution, is neither modern nor appealing. Reproducibility of research is by no means facilitated by PDF documents. In the best case, papers demand a lot of effort from anyone who wants to reproduce the experiments or methods described in them. In the worst case, the descriptions in the paper are simply not sufficient to reproduce them. This may be because of the limited length of the article (when it’s written for a conference), a hasty and limited review process, or even (in some cases) a deliberate attempt by the authors to present biased data without being scrutinized.
Opening the windows to science
To avoid some of the defects described above, the scientific community has lately moved to more open channels for communicating its findings. These include publishing papers early, even in their premature stages, through e-print services like arXiv, and providing links to open source code for reproducing experiments, mostly on GitHub. These are very important steps forward and have definitely opened new windows to science.
What we are still missing, in my opinion, are doors to science: free channels of data distribution, i.e. a GitHub equivalent for sharing open access data. This would speed up scientific progress in fields such as machine learning and would encourage full reproducibility of research.
Make science less boring
The scientific community has to adopt new, more interesting ways of presenting scientific findings, not only regarding the format, but also the language and style used to convey the message.
PDF documents are a nice way to read articles on your e-reader or to print them out (not the ecological way, though), but they shouldn’t be the norm in the era of smart devices. We should encourage the use of tools that let readers reproduce the experiments while reading, and visualize the data, the results and the math behind the methods. Reading a scientific document in the 21st century should offer the option of a more vivid experience than the one we’re used to by now.
Language and Style
I know many people will probably disagree on this, so I’ll try to be as concise and precise as possible… Did I say concise? Well, as long as we stay concise and precise, I see no reason not to enrich our text by breaking some rules and guidelines that are mere formalities from the past and offer nothing in the present. I don’t claim it has to be like the following excellent example by Jorge Cham, or even like the text of YOLOv3, which is a very amusing read about a very interesting state-of-the-art method. However, I feel scientists should somehow start conveying their work in a more interesting manner.
Tools for research in the 21st century
Here are some of my biased aphorisms on the use of tools that facilitate research and its presentation:
Tools matter. Mathematica notebooks were a revelation and helped many scientists progress faster. Unfortunately, Mathematica is not free. Neither is MATLAB, but this is another (only slightly related) story.
Open source tools matter more. IPython notebooks were a ground-breaking tool for the scientific community. They allowed rich text to be combined with code that can be manipulated and run on the spot, so that you can tweak the experiments and re-run them. Deep learning knowledge transfer, for one, wouldn’t be the same without them. Also, zero cost for using them means the world to researchers from poor countries, or from poor universities, or even working from a home office where they have no means to do their work other than a computer with access to the internet. This is what research should be all about: being inclusive to everyone.
Open source tools that are not tied to a single programming language matter even more. Jupyter notebooks were the step needed to allow scientists not comfortable with Python to use notebooks with R, Julia, Octave etc. They build on top of the IPython notebook functionality, thus allowing the use of markdown syntax for text enrichment, MathJax for equation writing and of course image embedding in the text. Now all the data scientists who cannot see themselves programming in any language other than R have no excuse not to use notebooks.
BONUS: With add-on functionality like the excellent RISE, written by Damián Avila, scientists can even give presentations with their Jupyter notebooks. No more static PowerPoint slides at conferences! There is still a lot of work needed, but it’s a very cool prospect indeed!
Open source IDEs compatible with Jupyter notebooks would revolutionize scientific research. When we find proper ways to combine all the functionality described above with a proper, full-fledged IDE that is free as in beer, the community should pause for a minute and celebrate. This is what all the people in the Jupyter project are aiming for. Their next step is JupyterLab, which is in beta right now. It makes contributing extensions easier, which has already led to a very nice diagram tool, draw.io, being added to JupyterLab. A file browser and multiple editors/viewers are also in place.
Those who can contribute to this project should definitely do so. Those who can’t can root for the success of JupyterLab and also spread the word!
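To make the “rich text combined with runnable code” idea above a bit more concrete, here is a minimal sketch of what a notebook looks like on disk. It assumes the version 4 notebook JSON layout (a list of markdown and code cells wrapped in a small envelope) and uses only the Python standard library. The helper names `make_notebook`, `markdown_cell` and `code_cell` are made up for this illustration and are not part of any Jupyter API.

```python
import json


def markdown_cell(text):
    """A rich-text cell: markdown, optionally with MathJax-style math."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """A code cell that has not been executed yet (no outputs stored)."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


def make_notebook(cells):
    """Wrap the cells in the envelope assumed for notebook format 4."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": cells,
    }


# Hypothetical toy content: one explanation cell, one experiment cell.
notebook = make_notebook([
    markdown_cell(
        "# Toy experiment\n"
        "The sample mean is $\\bar{x} = \\frac{1}{n}\\sum_i x_i$."
    ),
    code_cell(
        "data = [1.0, 2.0, 3.0, 4.0]\n"
        "mean = sum(data) / len(data)\n"
        "print(mean)"
    ),
])

# Serialize to the JSON text that would be stored in an .ipynb file.
notebook_json = json.dumps(notebook, indent=1)
```

Saving `notebook_json` to a file with an `.ipynb` extension should give something a notebook front end can open, with the explanation and the tweakable experiment living side by side. In practice you would let the notebook interface (or a dedicated library) write this file for you; the point here is only that the text and the code travel together in one plain, open format.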
Tools for researchers… by researchers!
Unfortunately, tools like the ones mentioned above have to be written by the scientific community that needs them. Third parties will not get it right quickly enough, and/or will charge lots of money for it. Then they will tie you up with maintenance contracts and extra fees.
Pushing researchers who are active on the tool-making front out of research is, to me, like shooting the community in the foot. They are essential to the community and they should be highly incentivized to keep building tools. At the end of the day, most of them are also talented at conducting great research and amazing at writing papers.
Are we there yet? No, by no means. However, there is too much happening on the research dissemination tools front to just continue doing things “the way we always did”. In my personal opinion, rethinking the way we present research and its results is a necessary step towards mending many of the broken parts of the research ecosystem.
And to answer the question in the provocative title: yes, I believe that the STEM research paper in its current form is obsolete, in the sense of being very out of date. Even if the current tools are not mature enough, we have to invest in getting them to a state mature enough to give us the STEM research paper v2.0.