This document is my attempt to keep a thematic list of all the problems that affect academic research, as far as I can tell from my point of view (theoretical computer science research in France). It was written after having worked in academia for 6 years, which I hope strikes the right balance between being too naive and having become oblivious to problems. This list is not a call to action directed at anyone (except maybe myself), but I hope it can lead to discussion and maybe to change.
Of course, despite these flaws, there are many things to like about academia: complete freedom, fascinating questions, passionate people, limited short-term pressure, etc. Many of the problems in this list also exist in other environments, and some of them are only more noticeable in academia because of its high standards (e.g., having to support your claims) and noble goals (e.g., acting in the public interest). Overall, I am still enthusiastic about academic research: it wouldn't be my job otherwise! Yet, I think it would be worthwhile for academics to spend more time thinking about these problems and discussing possible solutions to them.
As this page is rather long, each problem is in a separate section: I have put a star (*) in the title of those that I consider to be the most important. I have also put the main contents of each section in a collapsible box. If you want, you can:
If you are in a hurry, here is a 280-char summary of the core issue:
The competition for positions and grants has led academia to focus on papers and citations as the primary indicators of success. This influences how research is conducted and presented, and makes it hard to complement traditional publishing with new ways to share knowledge.
As for other pages on this website, my disclaimer applies: while I am definitely relying on my research experience to write this essay, it only reflects my personal opinion, and is not endorsed by my current or past employers. I am interested in your opinion about these problems and their possible solutions, and in any other problems that you would add to this list. Feel free to reach out by email at a3nm @a3nm.net. I'm grateful to many people who wrote to suggest improvements to this page.
This section presents problems related to the distribution and formatting of traditional academic papers.
The goal of research is to develop and distribute new knowledge. Yet, its output is often difficult to access because it is stuck behind paywalls: potential readers need to pay to access research articles.
Of course, many other kinds of creative works have a price: books, music, movies, video games, etc. However, for such works, a part of the price usually supports the creators, i.e., the musicians, writers, etc. By contrast, subscriptions and payments for research articles do not support researchers at all. The salary of researchers comes from their home university or other institutions, not from these fees. Further, as these institutions are often publicly funded, one would think that members of the public should have a right to access the results of their research.
So what happens to the money paid to access scientific papers? All of it goes to the academic publisher, which is often a private company, or sometimes a university or scientific society. This is counter-intuitive because most of the "cost" of research is not incurred by the publisher: the research is done by researchers, it is written up in a paper by these same researchers, the paper is typeset by the researchers themselves (at least, in computer science), the paper is peer reviewed by other researchers, and the whole process is usually overseen by researchers. Yet, the publication venues (conferences and journals) belong to a publisher, which takes copyright ownership of the papers at the very end of the process, and sells them. The only contribution of the publisher is to reformat the paper slightly1, and host it online behind a paywall. The publisher then makes money from the paywall, so they forbid other people from sharing or re-hosting the article elsewhere, and thus end up working against the distribution of articles.
The justification for this broken system is historical. Before the Internet and computers, it was difficult to typeset papers, distributing them in paper form was a heavy and costly investment, and editors were often small companies with scientific expertise in their subject area. Nowadays, typesetting is mostly done directly by the authors, online distribution is much easier, and publishers have merged into publicly-traded multinational conglomerates who minimize their scientific involvement and maximize their profits.
There are many problems with the current system:
To work around these problems, many researchers put copies of their own work online. However, these articles often have an unclear copyright status, so they cannot be mirrored or redistributed, or licensed, e.g., under a Creative Commons license.2 Further, they are often hosted only on the author's website, and not in a stable repository: hence, when the authors retire or change institutions, the webpages and the papers disappear. Last, this author copy is often slightly different from the publisher version, in terms of formatting, numbering, and sometimes content: this causes much confusion about which version should be cited, which version is more up-to-date, etc., and researchers waste time formatting, typesetting, and proofreading multiple versions of their work.
Some publishers are happy to propose a solution to these problems: the author-pays open access model, aka "gold open access". In this model, the author can choose to have their research article freely available on the publisher's website, with no paywall or fees for readers; instead, the author has to pay an "article processing charge" to cover the publisher's costs. So, how much do publishers charge to host a PDF file online? The ACM charges 700 USD per article3. Other publishers charge more: Nature charges 9750 EUR.4
Compare this to the open repository arXiv which offers to host PDF files at no cost to authors or readers: it hosted over 110k new articles in 2016 for a budget of 1.1 MUSD, amounting to around 10 USD per article5; or to LIPIcs, a rare example of an ethical publisher, who charges at most 60 EUR per article6. As research is underfunded, 700 USD is a lot of money, all the more so in developing countries and poorer universities. Further, with this optional author-pays model, libraries still need to pay subscriptions to access the articles that are not open-access, in addition to paying article processing charges when they publish, i.e., they pay twice.
So, why don't scientists get rid of these useless scientific publishers? This is trickier than one might expect, because the main indicator used to evaluate researchers is the number of articles that they publish in traditional publishing venues (conferences and journals). Hence, researchers would harm their careers if they stopped publishing their results with traditional venues. This makes it very difficult to bootstrap new publication venues, as they would be perceived as less prestigious than the established ones. Thus, publishers can continue to milk the prestige of the venues that they own, without any risk of a fair competition taking place to drive down the costs. They can make heaps of money7 while working against the dissemination of science8 and without contributing anything of value to the scientific process.
Most scientific articles are distributed as PDF documents. (Some are distributed as Word documents, but this is even worse, and I won't comment further.) This is again for historical reasons: articles were first distributed on paper, and then they were distributed electronically but designed to be printed and read on paper. Of course, people still print articles, but nowadays, this is just one way among others to read articles; in many cases readers only skim articles on a computer screen and don't print them. Sadly, the exclusive use of PDF has multiple disadvantages.
First, PDF is not the Web's native format. This is a problem, because the Web is how people access articles today, but PDF documents are second-class Web citizens. Support for PDF in Web browsers is not great, especially on mobile, which is where most Web browsing happens nowadays9. Further, PDFs are not indexed efficiently by search engines -- which, in addition to paywalls, often makes scientific articles difficult to find.
Second, PDF imposes a specific paginated layout. This is great if you want to print the article, but it sucks if you want to read it on a screen, especially on smaller screens like phones, tablets, or ebook readers. Some solutions can reflow PDF documents, but they are ugly hacks and often do not work. Further, PDF pagination also makes some other important features fragile, e.g., searching for text in the paper (because of line breaks, hyphenation, etc.).
Third, you cannot navigate a PDF document as easily as a webpage: pages display with a noticeable rendering lag, there is little hierarchy, hyperlinks are crappy, and PDF readers often have no tabs. It is not possible for other documents to link to a specific point in a PDF (e.g., Theorem 4 page 2), the way that you can link to a specific part of an HTML page using anchors (e.g., to a section of a Wikipedia article).
Fourth, PDF makes interactive content much harder. You cannot, e.g., expand/collapse a proof, or have a floating diagram or table of notations in the margin, or have interactive plots where the reader can zoom, or change which dataset is plotted, or access the raw data, etc. And of course, except for dirty hacks, no videos, no animations, even though sometimes an animation (e.g., this one) is worth a thousand words... It is true that PDF may in fact support some of these features, but many PDF viewers do not.
Fifth, PDF poses some accessibility challenges. These are particularly difficult to overcome when producing files using LaTeX, as is common in STEM.
So why is academia still using only PDF, rather than HTML or ePub or other formats? Mostly because of the network effect: venues cannot require a format that is completely different from what other venues expect; and authors cannot push their co-authors to write papers in a new unfamiliar format. In STEM, the use of LaTeX is partly justified by the need for features (maths, diagrams, etc.) that historically had no HTML equivalent. However, modern solutions have appeared since then: MathJax works reasonably for maths, SVG is a great way to draw diagrams, and I hear that plotting data in JavaScript also works fine (e.g., with D3.js). It is true that the typography of browser-rendered text is still inferior to that of LaTeX, but hopefully this will eventually change...
Academia is obsessed with the traditional scientific paper, but much of the supplementary material that goes with a paper is badly distributed or not distributed at all.
First, papers often do not distribute the code used to prepare, process, and analyze the data that they study. This makes it hard for other researchers to build upon the code in another paper, or to reproduce the results of the paper, which is a pity because scientific errors have been caused by buggy code. Besides, it also prevents engineers and other non-academics from testing the methods in the paper without reimplementing them from scratch. The common defense13 is that academic code is usually crappy, but crappy code is better than no code at all: see the CRAPL for a humorous take on this.
Second, the raw datasets used in papers are often not shared, even though they would be useful for researchers who want to compare their approach to yours. The obvious explanation is that researchers don't like to be beaten by competitors, but this doesn't even make sense: in many fields, publishing a reference dataset is a sure way to attract many citations and increment the indicators that matter. Sometimes the datasets cannot be shared because of licensing reasons, privacy, confidentiality, etc., or because of their size. Most of the time, though, it is just laziness on the part of the authors.
Third, the raw results of the paper are usually not shared either. The paper contains plots and tables, but not the underlying data. This is terrible if you want to analyse the results differently, reuse them, compare to them, etc. For tables, you need to re-type them by hand, and for plots you are stuck. For instance, in this recent note that studies the data of another paper, the author had to zoom in on a bitmap chart because the underlying data was not provided.
Fourth, in theoretical research, the proofs of results are not always provided with the main paper, often because of page limitations. Sometimes the proofs can be found in an extended version or technical report, but these can be difficult to locate. In other cases the proofs cannot be found online, and sometimes the authors are not even able to provide them upon request...
Fifth, the LaTeX source of the paper is usually not published. Some repositories, such as arXiv, do encourage authors to upload their source and then distribute it, but in most cases the source is not publicly available. This is a shame because converting a paper to HTML, or to plain text, or reflowing it, is much easier with the source than without. This also means that papers cannot be read by machines. Besides, the source for pictures, diagrams, plots, etc., is not distributed either, so it is hard to reuse them. If you want to quote a plot from another paper, you have to take screenshots or have a hard time exporting it as a vector drawing. If you want to change the font, the colors, the background, you are out of luck.
Last, even in cases where these additional materials are shared, they are usually put online by the authors on some temporary webpage, which usually disappears when the authors retire or move to different institutions.
Papers are optimized to be read by humans, but nowadays it makes sense to process the scientific corpus using computers: consider, e.g., information extraction projects such as PaleoDeepDive or IBM Watson, or recommendation systems such as Google Scholar, etc. Yet, processing papers automatically is more difficult than it should be, because the scientific corpus is not machine readable.
First, it is already complicated enough to obtain a dataset describing all scientific papers. In computer science, we have DBLP, but it does not cover other research areas... The only general source that I know is CrossRef: they do provide snapshots of their data but you have to pay for them, otherwise you need to make do with non-official dumps or crawl their API yourself.
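To make "crawl their API yourself" concrete, here is a minimal sketch of what such a crawl can look like. It assumes the public CrossRef REST API at api.crossref.org with its query and rows parameters and its usual JSON layout (which may change); the contact address in the User-Agent header is a placeholder, not a real one:

    import requests

    # Minimal sketch: querying the public CrossRef REST API for paper metadata.
    # Assumes the https://api.crossref.org/works endpoint and its "query" and
    # "rows" parameters; the mailto address below is a placeholder.

    def search_crossref(query, rows=5):
        """Return (DOI, title) pairs for works matching a free-text query."""
        resp = requests.get(
            "https://api.crossref.org/works",
            params={"query": query, "rows": rows},
            headers={"User-Agent": "metadata-crawler (mailto:you@example.org)"},
            timeout=30,
        )
        resp.raise_for_status()
        items = resp.json()["message"]["items"]
        return [(item.get("DOI"), (item.get("title") or ["(no title)"])[0])
                for item in items]

    if __name__ == "__main__":
        for doi, title in search_crossref("tuple-generating dependencies"):
            print(doi, "-", title)

Of course, getting a complete and clean dataset this way is far more painful than downloading an official dump would be.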
Second, you cannot easily obtain the actual contents of the articles. Of course, the main obstacle to this is paywalls, but even open-access articles are not easy to download in bulk. For arXiv articles, there is a dump, but it cannot be redistributed, again because of copyright; there is a non-official freely downloadable mirror of this data on Archive.org, but its copyright status is unclear. Another issue is that articles are in PDF and often you do not have their source, so it is complicated to extract text, or to understand the structure. (For most arXiv articles, fortunately, the source is available.) There are some copyright exceptions for text and data mining, e.g., in the UK, but they are narrow and complicated to use in practice.
Third, there is no public copy of the citation graph that indicates which articles are cited by each article. This data is difficult to obtain because, when articles are not paywalled, they are often only available in PDF without their LaTeX source. Further, citations usually do not include an unambiguous machine-readable identifier of the resource being cited (i.e., a canonical URL, or a DOI), so citations have to be disambiguated, which is complex and error-prone. So, to obtain citation data, you need cooperation from publishers (who often have the source), or you need to run fragile information extraction tools on the PDF like GROBID or Science Parse or others. For these reasons, DBLP does not have the citation graph and the arXiv dump does not provide it either. Google Scholar has this data, but they are not sharing it. The best source of citation information seems to be the Initiative for Open Citations: but their data is only available via CrossRef (which as I explained does not provide dumps), they only seem to cover around 20M papers (whereas CrossRef knows nearly 100M papers), and the citations themselves are often not provided in a structured format (see this paper for instance). There is the Microsoft Academic Graph (no public version available), the OpenCitations dataset (only contains around 300k papers), and the Citation network dataset (only 5M papers and 45M citations, only seems to cover computer science, and an unclear license). More recently, Semantic Scholar released S2ORC, a dataset of citations covering 80M papers and 380M citations (still not perfect but somewhat better than alternatives), but under a nonfree license.
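As an illustration of the disambiguation step, one common workaround is to send the raw reference string to CrossRef and keep the best hit. Here is a minimal sketch, assuming CrossRef's query.bibliographic field query (the mailto address is again a placeholder); the top hit is only a guess and needs to be verified, which is exactly why this process is error-prone:

    import requests

    # Minimal sketch: matching a raw reference string to a DOI with CrossRef's
    # "query.bibliographic" field query. The best hit is only a guess and must
    # be checked by hand; the mailto address below is a placeholder.

    def guess_doi(reference_string):
        """Return (DOI, title) of the best CrossRef match for a citation string."""
        resp = requests.get(
            "https://api.crossref.org/works",
            params={"query.bibliographic": reference_string, "rows": 1},
            headers={"User-Agent": "citation-matcher (mailto:you@example.org)"},
            timeout=30,
        )
        resp.raise_for_status()
        items = resp.json()["message"]["items"]
        if not items:
            return None
        best = items[0]
        return best.get("DOI"), (best.get("title") or ["(no title)"])[0]

    if __name__ == "__main__":
        print(guess_doi("A. K. Chandra and P. M. Merlin, Optimal implementation"
                        " of conjunctive queries in relational data bases, 1977"))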
This is a very sad situation. In principle, there ought to be a large public dataset containing an entry for each paper ever published, including metadata, the full content in a usable textual form, and disambiguated citation links. If this existed, it would probably have research applications of a magnitude comparable to that of the Wikimedia dumps. Yet it doesn't exist, with the first main reason being copyright, and the second being the weight of traditions...
The main way for research articles to refer to other publications is through citations, i.e., an identifier in brackets that points to a complete bibliographic entry in the bibliography. However, these citations are not always used in ways that benefit the reader of the article.
One first problem is that bibliographic entries are often inconvenient for readers to follow: in most cases they include historical information (e.g., issue number, page number), but they almost never contain a DOI or a hyperlink to the paper being cited, and no one seems to know why.
A second problem is that references are often vague: they usually point to an entire article, or even an entire book. This is annoying for readers when the reference is about a specific definition, result, or theorem; but it can be much worse, e.g., when the claim is a non-immediate consequence of some side comment in the source being cited. The reason for these short citations is sometimes to save space, but often it's just to save effort. The worst kind of imprecise citations are hollow citations that are added to make subjective statements sound more definite (e.g., "approach X performs poorly for problem Z [42]") but where the source is only very tangentially related (e.g., [42] talks about solving Problem Z with method Y, and only indirectly implies that X would not have worked either).
A third problem is that references are one-way, i.e., when you read a paper, you do not know which other works have cited this paper since it was published. This information is often critical: Has anyone improved on the results of this paper, or used them since it was published? Are there recent surveys presenting the same results in a more legible way? Has someone reproduced or refuted this result? To find which works cite a given paper, the only tool I know is again Google Scholar, but this is only a Google side project that could join the Google Graveyard at any time. Further, Google Scholar contains many errors, e.g., bogus citations where the citing article is in fact much older than the cited article.
A fourth problem is that references are instrumentalized as an indicator to estimate the "impact" of a scientific work. Because of this, researchers have a perverse incentive to cite their own works or those of their colleagues. Further, when writing about a concept or result, academics have the custom of citing the original reference for it, instead of citing, e.g., a more modern presentation or a usable textbook on the topic. This is probably intended to boost the citation count of the original authors, but it is not helpful for readers.
Once an article has been published, you can no longer update it. To some extent, this makes sense: the article will hopefully be cited and improved by future articles, so it would be confusing if its contents could change without warning. However, it is sometimes necessary to modify an article, for instance, to correct mistakes.
In my opinion, the right way to support updates is the way arXiv does it: new versions of an article do not overwrite the initial versions, so you can still cite a specific version of the article, but the authors can publish a new version when necessary. However, most traditional publishers don't allow such updates. Some publishers have a complicated "errata" procedure where you can publish some document pointing out an error (e.g., this). However, in all these cases (even on arXiv), readers who directly download the PDF of the old version will not learn about the new version or erratum (e.g., look at the PDF here).
The lack of updates leads to very bad outcomes. For instance, in my field, some published results are "widely known" to be wrong, but this information has never been put in writing, and the erroneous article is still distributed as-is. Anyone from outside the community could fall into the trap of taking the incorrect results at face value.
Papers are written to be read from start to finish, because this is how reviewers read them when evaluating them. However, once articles are published, their readers will often only skim them, even though they are not optimized for this mode of consumption.
Usually, you come to an article looking for a specific bit of information, e.g., what is the definition of X, what is the complexity of Y, what is the proof of claim Z, what is the difference between articles A and B, and so on. Sadly, most articles are not optimized for such tasks: the contents are presented linearly, the sections are not self-contained, technical terms do not point back to where they are defined, and useful summaries like tables of results are often omitted.
Another example from theoretical research: it's very helpful when the main theorems of a paper are self-contained, so you can reuse them easily in other works as a black box, without caring about the proof details. Yet, many theoretical papers are designed to be read linearly, so their result statements depend on definitions (or sometimes hypotheses) that were introduced earlier; or the statement anticipates proof constructions that come later.
This point is somewhat specific to computer science, where most papers are published in peer-reviewed conferences. Historically, the papers presented at the conference were arranged in a "proceedings volume" (a book) that was shipped to university libraries. Nowadays, of course, articles are downloaded individually on the Web; but the historical tradition of proceedings volumes has survived.
Because of this tradition, editors waste their time writing prefaces to such volumes, even though no one will ever read them. Authors waste time following complicated instructions15 so that the papers can be arranged in a neat table of contents, with consistent typesetting and page numbering, even though no one will ever read the articles sequentially.
For many conferences, the preparation of this useless proceedings volume is the only task that the scientific publisher does, so it is the main obstacle that prevents conferences from ditching their publisher. Even ethical publishers such as LIPIcs have to charge money for the thankless work of preparing the proceedings volume16. It seems that the only function of this "proceedings volume" may be to make the publications look more "genuine" so that they will be taken into account by citation indices, which is obviously a problem of its own.
Another problem with proceedings volumes is that they were historically books, so pages used to be scarce. This is why some fields still use ridiculously dense article stylesheets such as sig-alternate, to cram as much information as possible on one single page, at the expense of legibility. This is also the reason why there are still severe page limits on the final versions of articles, which makes researchers lose lots of time shortening sentences one by one to fit an article to the limit, and often makes articles harder to understand. Some unethical publishers even go so far as to sell extra proceedings pages at the bargain price of a few hundred dollars per extra page17. Of course, none of this makes sense now that articles are hosted online.
A variation on the page limit problem is that conferences, even similar conferences in the same field (e.g., ICDT and PODS), have often not agreed on the same page limit and stylesheet. As articles rejected from one conference are often revised and resubmitted to the other, researchers waste time removing or adding back content and refitting the material to the right length.
This section presents problems with the peer reviewing process, which is one of the hallmarks of academia.
Peer reviewing is a crucial part of the scientific process: submitted papers are scrutinized by impartial experts from the community, who check that they are understandable, correct, novel, and interesting. Given the importance of this task, it is a major problem that reviewers have little incentive to do a good job. Of course, many reviewers want to do their job seriously, but as research is becoming very competitive (for positions and funding), the tasks that suffer are often the ones where you have little incentive and face few consequences for sloppy work. Hence, reviewing is often badly done.
It is true that reviewing carries some small side benefits, and these can serve as indirect incentives: e.g., getting acquainted with recent papers, influencing the direction of your field, ensuring that no errors creep in, etc. However, the main incentive to review nowadays is that it is somewhat prestigious to be part of a program committee (PC). This prestige, however, is not very strong, even for the most selective venues: a senior résumé without any PC membership may raise some eyebrows, but reviewing is definitely negligible when compared to other indicators such as the number of publications. Further, you only gain prestige if you are actually a PC member, but that's not always the case when you review: many papers to review are "outsourced" by PC members to external reviewers or subreviewers, who do the dirty work without any recognition. In addition, for venues which are useful but not selective (e.g., workshops, PhD symposia), being a PC member isn't prestigious in the slightest.
What is more, the prestige of PC membership is only attached to the function of being a reviewer, and not to the actual quality of your reviews. This is because of the Juvenalian problem of reviewing: while reviewers verify the submissions, essentially no one verifies the reviews. The PC chair will notice obviously nonsensical (or nonexistent) reviews, but the only "punishment" for unreliable reviewers is that they will no longer be asked to review for that venue -- not a bad deal as this frees up more time for them to do research. And how bad does a review have to be before the PC chair complains? As every researcher knows from experience, the bar is really low: one often receives reviews where it is obvious that the reviewer did not really try to read the paper. The PC chair usually does not care much about such shallow reviews, and even when they do, it is hard to push reviewers to a higher standard when bad reviews are so commonplace. Hence, even in top venues, a review will often only consist of a few paragraphs with general comments about the work that are vague if not misguided. This is not a lot when the average paper consists of 10 pages of highly dense and technical material: if you read it closely, you can always find mistakes, unclear explanations, awkward phrasing, and (if you think enough) general directions for improvement. Even more infuriating is the case where the reviewers have obviously not even taken the time to discuss among themselves after having written their reviews: Why is reviewer 1 asking for one thing and reviewer 2 asking for the opposite? Why did no one tell reviewer 3 about an obvious misunderstanding?
The low quality of reviews has many consequences. One of them is that the decision to accept or reject a paper becomes more random. This was famously highlighted by the experiment by the NIPS'14 conference: submissions to the conference were evaluated by two independent PCs, and it turned out that most papers accepted by one PC had been rejected by the other. A similar experiment was performed at ESA'18 (see details). Of course, reviewing is always subjective, so the outcome is always somewhat uncertain, but the extent of this variability is quite surprising, and probably has something to do with the low signal-to-noise ratio in reviews. Another problem with bad reviews is that errors in papers can go unnoticed because the reviewers have not actually checked the results. Last, receiving shallow reviews is depressing for paper authors, and getting a paper published often feels like a pointless game of trying to address unclear and superficial comments from pretentious but clueless reviewers: the futility of this exercise has driven many good people away from research. The same goes for abusive reviews, which are surprisingly common: while it can't be helped that reviews may contain negative criticism about the work, it is unacceptable that so many reviewers misuse their anonymity to adopt an extremely condescending or aggressive tone.
In all venues of my field, reviewing is at least single-blind: authors do not know who the reviewers are. I think this is a good idea, because reviewers can be honest without having to worry about possible adverse consequences to their career: as a reviewer, if I were not anonymous, I would be afraid when rejecting bad submissions written by prominent authors.
However, fewer conferences use double-blind reviewing, where reviewers do not know who the authors are. I think this is a shame, because reviewers are inevitably biased when they know the authors, their universities, nationalities, gender (from their first names), etc. This effect has been measured on the reviews of the WSDM 2017 conference, with the results showing a statistically significant bias in single-blind reviewing in favor of famous authors, companies, and universities. Hence, it is important to adopt double-blind reviewing, so that this irrelevant and bias-inducing information is hidden from reviewers.
There are many objections to double-blind reviewing, but most of them are about flawed implementations of the idea. For instance, some double-blind venues insist that it must be impossible for reviewers to deanonymize the authors of a submission, and some lazy reviewers use this as a cheap way to reject papers without reading them. This can be very annoying for authors, because deanonymizing them can be easy: they may have posted a previous version of the work online, or given a talk about it, etc. However, I do not think that venues need this kind of strong anonymity requirement: the point of double-blind reviewing should be to ensure that well-meaning reviewers are not biased by unwanted clues in the submitted material, and also to give paranoid authors the possibility to completely hide their identities if they so choose. Other objections to double-blind reviewing come from people who believe that their research community is honest and that reviewers in their community are not biased, so double-blind reviewing is unnecessary and makes everyone feel suspicious. However, we all know that even good people can be biased, and that safeguards can be useful even when there are no specific suspicions of fraud. The most unbelievable objection I have heard from my colleagues against double-blind reviewing is that submissions from unknown authors should be scrutinized more closely for errors than submissions from established researchers -- but this is precisely the kind of unfairness that double-blind reviewing should prevent!
It's true that there are some minor issues with double-blind reviewing: e.g., all possible conflicts of interest must be declared in advance (they cannot be fixed as they happen), uploading supplementary material anonymously is more complicated (but it's often the fault of the conference management system), authors must avoid obvious giveaways in the submitted paper (e.g., "our earlier work [42]"), and there is always the risk of having seen the work on arXiv before. However, I think that these small inconveniences are worth the increase in fairness.
A related question is the move to triple-blind reviewing, where reviewers do not know the identities of other reviewers. See this paper for a discussion of the benefits of anonymous discussions for peer review.
Experimental science is concerned with the reproducibility of papers: a paper is only valid if anyone who performs the experiment can obtain similar results. In most areas of science, reviewers cannot reasonably reproduce the experiment, as they usually do not have access to the materials, equipment, test subjects, etc. But in experimental computer science, you can often run the experiments on commodity hardware when you have the code and datasets, so it would often be reasonable for reviewers to check that they can reproduce the experimental results. Yet, usually this does not happen, and in fact the code and data are often not published at all.
There are some happy exceptions, for instance, SIGMOD has a so-called reproducibility committee, which checks if they manage to run the experiments in some submitted papers. These laudable efforts, however, do not always survive for long: the SIGMOD committee went on hiatus19 from 2013 to 2015, and VLDB apparently stopped having one20 in 2014, and only resumed with the pVLDB Reproducibility initiative in 2018. The explanation is simple: these efforts require additional work from reviewers, who are already not incentivized to do their core job.
In addition, though they are better than nothing, these reproducibility committees can easily miss critical mistakes (not to mention deliberate fraud): they only check whether they can re-run the code and obtain the same result, but they do not check whether the code is actually computing the right thing (i.e., that there is no bug). To be sure about the results, it would be necessary to check the code, but this would require even more effort. (Theoretical research has a similar problem: reviewers should check the proofs of results, but they often don't.)
When a non-reproducible paper gets published, the only hope is that other researchers will publish about their inability to reproduce it. Sadly, this is hard to do, because the code and data of the flawed paper are not published, and the authors may not even be willing to provide them upon request... What is more, there is little appeal in papers about the failure to reproduce another paper: this contributes to the replication crisis. Hence, when a paper with great but flawed results gets published, it can become a big obstacle to research in that area, because no one can beat its results, but it is difficult to publish a follow-up arguing that the results cannot be reproduced...
When you submit a paper to a conference in computer science, the answer comes after one or two months: not super fast, but acceptable. For journals, however, you have to wait at least for several months, or sometimes for a year or more.
Of course, it doesn't actually take three months of work to review a journal paper. The reason why it gets delayed is that journal reviews usually don't have a firm deadline, and missing the deadline carries little penalty, again because of the lack of incentives.
The long delay is obviously a problem because it is inefficient: when the reviews arrive, you don't remember at all what the paper was about, so you incur context-switching costs. If the judgment is "revise", you produce a reply to all the points made by all reviewers, which is extremely tedious; and then you wait for months before having any news, so you forget everything about the paper again! As a reviewer, this non-interactive operation is also problematic, because you also forget about the paper between each iteration, and because any misunderstanding about the paper (e.g., in an early definition or proof) can only be addressed by the authors in the next iteration, several months later.
In this section, I outline problems with the kind of research that is done, how it gets published, and the contents of academic papers.
Academia only encourages researchers to produce highly technical papers. Hence, it does not sufficiently encourage researchers to take care of reviewing or of other tasks. But even in terms of publication activities, this means that there is not enough incentive for academics to publish works that are not traditional academic papers.
This last sentence is already very confusing to academics, because the problem is so deeply ingrained that it perverts the vocabulary: how could a "publication" be anything other than a traditional academic paper? Indeed, academia only cares about "papers": academics measure their research productivity by looking at publication lists which include only "papers" (even very bad ones, even unreviewed ones) and neglect other scholarly output, which somehow doesn't "count". The number of published papers is the only indicator, and we forget that 12-page PDF files in conferences and journals are not the only adequate way to publish our work. Because of this exclusive focus on academic papers, academics have very little time and energy left to publish in other ways.
One way to publish beyond traditional papers is online datasets: for instance, wikis such as the Complexity Zoo or the open problem garden; databases such as the OEIS in mathematics; databases in biology, in computer security, or in other fields. These are all very legitimate ways to publish information for which academic papers would be unsuitable. I also believe that many research communities would find it useful to have a wiki where community members would maintain a consistent list of definitions, problems, a synopsis of the various papers and what they are trying to achieve, etc.: editing the wiki would incur less overhead than publishing a paper (which must be self-contained, recap all definitions, etc.), and the wiki could be updated at will, unlike papers. Yet, such wikis are rare, and often outdated when they exist, because researchers have no incentive to work on them.
Speaking of wikis, let me mention Wikipedia. Writing Wikipedia articles about your research area is an example of a task which is immensely useful and yet laughably undervalued. The Wikipedia page on first-order logic, for instance, received over 330k pageviews in 2017. This is orders of magnitude more than the number of people who will ever read all my research papers. Yet, I spend little time editing Wikipedia, because I am not incentivized to do so, and writing papers seems more important. This is why researchers can spend years or decades working on arcane concepts for which no one ever bothered to write a Wikipedia page, even though it makes it very hard for anyone outside their field to understand what they are doing.
There are lots of other types of non-paper content that should be published by academics but isn't. For instance, blogposts: you can use them to present your research to a wider audience, or to present material that wouldn't be suitable for a paper (e.g., because it's not sufficiently long, because it's not completely new, etc.), like this blogpost or that blogpost by a colleague (in French), or this survey blogpost of mine. Other kinds of publications include lists of open questions, online collaborations like the Polymath project, or Q&A sites like (in my field) TCS.SE. I have posted questions on TCS.SE and received immensely valuable answers from generous people, but their reputation on TCS.SE will probably never help their career as a researcher. Another useful kind of "publications" are lecture notes for classes that you give, slides, etc., which are sometimes the only material that readers can skim to understand something.
One last kind of publication which is important but undervalued, in computer science (and other fields), is code. Sadly, in many fields21, publishing software will typically not make you a successful academic, even if it is very technical or widely used. The main reason why researchers publish code is to accompany a paper: computer science conferences have even invented "demo papers", i.e., a short useless PDF (the decoy) that describes a piece of code (the payload), so the authors can turn software into a traditional academic publication. The main problem when publishing code as a paper is that it does not reward the time spent to clean up code, document it, or make it usable by nonspecialists: what matters, again, is only the paper, even though probably no one will read it. Most importantly, there is no encouragement to maintain the code once the paper has been presented: any time allocated to this would be better spent writing new papers. Of course, you can try to get multiple papers out of the same code if you can slice its features into a publishable form; but good luck publishing Reports on One Additional Year of Refactoring, Bug Fixes, Improvements, and Documentation to Useful Software XYZ. For software projects that do not fit in one single paper, the usual mode of operation is via grants, e.g., a five-year grant to create some large piece of code, which has to be provided as a "deliverable" at the end. Yet, often, the funding agency does not care at all about the status of the deliverables (i.e., is it well-documented? is it available online?), and again maintenance on the project will grind to a halt as soon as the grant is over.
All of this is the reason why academic code, if available online at all, is often not provided in a form that encourages people to run it, e.g., with an appealing webpage featuring documentation and examples. This is why academic code appears out of nowhere at some point in time (because it usually isn't developed in the open) and is left to rot afterwards (when the corresponding paper is accepted). Academic code often has few outside contributors, because academic customs in terms of paper authorship haven't caught up with pull requests yet. Last, academics seldom contribute to existing open-source codebases, because starting a completely new toy system is both easier to do and easier to publish because it seems more novel; even though improving popular software obviously has much more impact than starting yet another narrow research prototype. Of course, some successful software projects have come out of academia nevertheless, but this usually happens despite the system, not thanks to it.
An important and surprising lesson when you learn how to do science is that the structure of a scientific paper is completely disconnected from how the research was performed23.
Usually, a research direction starts from a vague or seemingly unimportant question, which evolves a lot while you investigate it. The main driving force is that the question is natural, fun, or intriguing; but in the introduction to the final paper you will often make up some a-posteriori motivation for the question. Inventing practical applications out of thin air is especially common in computer science (unlike, say, mathematics), which leads to uncomfortable questions: are we doing this to fool outsiders? society? funding agencies? or to fool ourselves about the usefulness of our research?
Once the introduction is done, the contents of the paper are ordered in a way that has nothing to do with how the results were found. Newcomers to academia imagine that the process would be the following: find some questions, find out all the answers, and then write a paper to present the answers in order. In reality, you do the research, you solve some questions and leave others open, and then you come up with a "story" for the paper, i.e., a compelling way to present the results that you have, which is usually completely independent from how you found them. You then write the paper following that story, saying nothing of your initial goals and motivations, and conveniently avoiding the problems where you don't know what to say. For instance, this paper of mine was initially about probabilistic databases, but then we observed a connection to provenance and we eventually rewrote it as a paper on provenance with applications to probabilistic databases, essentially turning an earlier draft inside-out, with little trace of the original motivation. This other work of mine started as a database theory work, but we later decided to write it in knowledge representation terms, to present it to a different community. However, these papers say nothing about the somewhat oblique path by which they came into being. In short, papers are like works of fiction because they tell a consistent story, whereas the actual research, like real life, is more messy and chaotic.
Of course, this departure from the truth is not entirely a problem. For instance, papers cannot reasonably present their results chronologically: it would be impossible to extract information from a meandering narrative "we tried doing X, it failed, we tried Y instead, then we thought about Z, we had a discussion with A, and somehow we gave up on Y entirely...". It's also inevitable that papers mostly talk about what the authors managed to do, and don't say much about the questions that could not be answered. And of course, e.g., if you believed something nonsensical for a long time and only realized your error much later, it's reasonable to omit this detour from the paper. Researchers often arrive at the right answer for entirely wrong reasons, but often the nonsensical path would be hard to express in writing and would not make sense for anyone else.
Yet, there are genuine problems caused by this departure from the truth. In particular, scientific papers give a misleading view of what is science and how it works, which can confuse students and laypersons. Like many works of art24, scientific papers make it very hard to reconstruct how and why they were actually created. The fake motivations are one example, and so is the carefully chosen paper structure, which sometimes lies by omission about what the authors do not know. Here is a common example: you think about problem A, then it turns out to be too difficult, so you settle for an easier case, problem B. In the paper, however, you will only focus on problem B, and then present problem A as future work in the conclusion, or as a late section "Extensions to Problem A" if you have some half-baked things to say about it. Not technically wrong, but not entirely honest either: to the question "why is the paper about B?", the paper does not give the candid answer "we wanted to do A, but it was too hard".
Hence, while most papers are honest in what they say, their structure and their way of selling their results is sometimes dishonest in that sense. This is problematic because it turns science into an adversarial game: authors find a smooth way to sell their results, and reviewers try to call their bluff. Of course, it is much better when reviewing can be cooperative, and authors are able to trust the reviewers and be honest with them... Another problem with smooth and deceptive papers is that they make it more difficult to publish works that are important but do not tell such a good story. In particular, it is hard to publish negative results that disprove some reasonable hypothesis (e.g., "the study found no correlation between X and Y"), which leads to publication bias.
A critical part of writing a research paper is positioning your work relative to existing work, and presenting its broader significance. However, this is hard to do right, and often not done properly.
Yes, papers usually have a "related work" section, but it is often just a useless list to justify that the current work is new, e.g., "A does X, B does Y, we do Z". The authors generally did not take the time to understand in depth what X and Y really are, and how their own work compares, as long as they could find some superficial difference. This is only logical: studying related work is hard, but you can afford not to do it, because reviewers won't do it either. Although reviewers often complain about the comparison to related work, their requests are usually only about adding a shallow comparison to some vaguely related line of work that they happen to be familiar with, or about citing their own papers... No reviewer will bother surveying the overall landscape, so why should you?
A related problem is that it is hard to publish survey papers, which investigate a general line of work, summarize the existing approaches, compare them, etc., often without making new technical contributions. This is a pity, because survey papers such as this one or that one are nevertheless very precious: they require considerable amounts of work, connect together strands of research that are not aware of each other, and provide an introduction and compass for readers unfamiliar with the area. Yet, in my field, publishing survey papers tends to be harder than publishing technical papers22, even if these technical papers are incremental and will not have impact beyond an extremely narrow community.
There may be a historical reason to explain why highly technical papers are overrated, and survey papers are underrated: a few decades ago, research communities were much smaller, and the total output of research was easier to follow. Hence, everyone could more easily keep track of what was going on, and the need for surveys was much less pressing. However, nowadays, many more people are doing research, and it has become much more important to coordinate its efforts by summarizing areas of work, but academia's prestige function has not yet changed accordingly.
Some theoretical research works, especially in mathematics or theoretical computer science, are very elegant and compelling even though they are completely abstract and not motivated at all by practice.
At the other end of the spectrum, e.g., in practical computer science, there are papers which are very important but somewhat ugly: e.g., they want to do some practically relevant task like image processing, efficient numerical computations, etc., and so they have to handle lots of unpleasant practical details about the precise task, the computing architecture, etc. Writing such papers often requires deep familiarity with the task and with what is actually done by practitioners (e.g., in industry).
Of course, the best situation is when it turns out that a highly practical and important problem can be re-cast in abstract terms as a purely theoretical and interesting question, so that the paper can make significant contributions in both areas. Sadly, at least in computer science, I see a lot of papers that shoot for both goals and end up failing at both.
Many computer science papers study a problem vaguely motivated by practice, so they take into account some practical aspects of the problem that make it ugly to describe and unappealing beyond their specific field. Yet, at the same time, the authors have no real connection to practice: they do not know the concrete problems faced by practitioners or users (provided they even exist). The experiments are on unrealistic synthetic data, or they do not compute anything that someone actually needs. Such papers do not make a useful theoretical contribution, because their problem is not natural, being entirely motivated by a specific application domain; and at the same time they are not practically useful for applications: because no one is actually interested in that problem, because the model neglects some important aspect of the problem, or because practitioners are all using existing systems on which no one will actually bother to implement the proposed approach. This leaves one wondering what the paper is good for...
Academic papers often have a recognizable flavor of bad style. This is partly out of conservatism, partly out of negligence, partly because they are often written by non-native speakers, and partly because they try to sound impressive. Mostly, though, it is mimicry: you pick up these bad habits by reading papers and imitating them so that your work looks like a paper and is more likely to be accepted. Even when you realize that this style has problems, it's very difficult to avoid it.
Giving a complete course on writing style would be beyond the scope of this work (ah, did you notice how academese is becoming increasingly ubiquitous in my writing?). Anyway, examples include: abuse of the passive voice, preference for complicated words ("we construct" rather than "we build"; or "we utilize" for "we use"), weasel words especially followed by hollow citations, etc. See some related complaints on Matt Might's blog.
It often happens that different communities come up independently with the same notion, and give different names to it.
An example from database theory: tuple-generating dependencies, existential rules, and Datalog+/−. These three names all refer to exactly the same underlying concept. Everyone knows this, but no one bothers writing it down. The main difference is that these terms will be used by different communities (respectively: database theory; knowledge representation; semantic Web and data integration), with slightly different connotations. Of course, variants of these languages carry different names too, e.g., what Datalog+/−-people call "plain Datalog" would be called a "full TGD" by TGD people, and I don't know what the name would be in existential rules parlance.
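For readers outside these communities, here is the shape of the rules that all three names denote, in standard textbook notation (my own generic statement and example, not a quotation from any particular paper):

    % A tuple-generating dependency / existential rule / Datalog+/- rule:
    % the body (a conjunction of atoms) implies the head (a conjunction of
    % atoms), possibly with existentially quantified variables in the head.
    \[
      \forall \vec{x}\, \forall \vec{y}\;
      \bigl( \varphi(\vec{x}, \vec{y}) \rightarrow \exists \vec{z}\; \psi(\vec{x}, \vec{z}) \bigr)
    \]
    % Example: every department in which someone works has a manager.
    \[
      \forall x\, \forall y\; \bigl( \mathrm{WorksIn}(x, y) \rightarrow \exists z\; \mathrm{Manages}(z, y) \bigr)
    \]
    % The "full TGD" / "plain Datalog" special case is when \vec{z} is empty,
    % i.e., the head has no existential quantifier.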
Another example is the constraint satisfaction problem (CSP), which has its own community, but is also studied as graph homomorphism or conjunctive query evaluation in database theory. Each of these contexts uses different names: the left-hand-side and right-hand-side structures of the homomorphism problem are called the query and the instance in database theory, and, in CSP, the variables (with their constraints) and the domain of possible values. Each context carries different assumptions: in CSP the structures are usually hypergraphs whereas in graph homomorphism they are graphs; in database theory the problem is often studied assuming that the left-hand-side structure is fixed whereas in CSP it is the right-hand-side which is often fixed; etc.
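To spell out the folklore correspondence, here is my own phrasing of the classical observation (usually credited to Chandra and Merlin), not a quotation from any of these communities:

    % An instance of CSP / the homomorphism problem is a pair of finite
    % relational structures (A, B); the question is whether there is a map
    % h from the domain of A to the domain of B such that, for every
    % relation symbol R:
    \[
      (a_1, \dots, a_k) \in R^{\mathcal{A}}
      \;\Longrightarrow\;
      (h(a_1), \dots, h(a_k)) \in R^{\mathcal{B}}.
    \]
    % Database phrasing: B is the database, and A is read as its canonical
    % conjunctive query, with one existential variable x_a per element a of A:
    \[
      Q_{\mathcal{A}} = \exists \vec{x} \;
        \bigwedge_{R} \; \bigwedge_{(a_1, \dots, a_k) \in R^{\mathcal{A}}}
          R(x_{a_1}, \dots, x_{a_k}),
      \qquad
      \mathcal{B} \models Q_{\mathcal{A}}
      \iff \text{there is a homomorphism from } \mathcal{A} \text{ to } \mathcal{B}.
    \]
    % CSP phrasing: the elements of A are the variables, the elements of B are
    % the possible values, and the relations of B encode the allowed assignments.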
Yet, the relationship between these terms is hard to figure out for the uninitiated25. The reason why no one writes a Rosetta stone between them is that this is well known to everyone in these circles; it is part of the oral tradition of these communities, so spelling it out would just be stating the obvious. The reason why these communities do not get together and agree on one single term is that there is no one to coordinate them. Beyond the confusion, these multiple vocabularies have another disadvantage: if you write a paper for community X, but decide to present it to neighboring community Y, then you need to do all of the rephrasing, i.e., what database people would call "a database", or sometimes "a relational instance", becomes a "set of facts" in knowledge representation, an "A-box" in description logics, a "structure" or a "model" in logics...
Some research communities work better than others. However, when you get started in research through an internship or PhD, you enter some random community and you have no idea what it's like in the others, so it can be hard to tell if your community is dysfunctional. Further, for policymakers who decide which research should be funded, it is hard to figure out from the outside which research communities are working well and which are not.
The main failure mode I have seen for scientific communities is that they grow complacent, and evolve towards a mutual admiration society of people who know each other and promote each other's work, even if it has little intrinsic value and no impact outside of that community. This is hard to identify, because scientific papers are often completely unintelligible to outsiders; hence, it is difficult to prove that they are worthless, short of pulling out a hoax like Alan Sokal did. The value of a community's work can sometimes be measured indirectly: do all people in this research community know each other well, are there outsiders (e.g., students) who enter the community, are the results of the community used by people from the outside (from other scientific fields, or from the industry)? Another test is how "welcoming" the community is: in particular, do they happily welcome people from other communities who manage to solve some of their problems, or do they jealously protect their turf?
Even if your research community does not seem too dysfunctional, it can be interesting to wonder if what you are doing is really of value to anyone outside of the community; and if not, whether it nevertheless has some intrinsic value. I have spent hundreds of working hours on papers that I expect will have zero impact outside of their communities. How normal, how valuable is this? We should be worrying more about these questions, but often we don't, because we just care about getting papers accepted, and because it is uncomfortable to face the possibility that what we are doing is useless.
A second failure mode for communities is that they may become too competitive. For instance, they expand too fast (e.g., because of hype), and suddenly too many people are competing to publish in their venues, which become excessively selective. Hence, perfectly fine work gets rejected, leading to a waste of reviewer and author effort because of repeated resubmissions. Further, it may encourage desperate researchers to commit various kinds of fraud.
One particular flavor of competition is the obsession of some communities with particular benchmarks and metrics. Once many people have tried to optimize performance on one specific benchmark, measured with one specific metric, the result is a series of incremental papers that try to gain every last percentage point. For instance, I have heard that the machine translation community uses the BLEU score for evaluation, even though everyone has known for a long time how flawed it is: can one really measure the quality of a translation simply by looking at its set of n-grams? I have also heard that lots of research effort in computer vision goes into optimizing performance on the ImageNet dataset, at the expense of other datasets. In database research, whenever one needs a benchmark, the usual ones are those from TPC, but few people stop to think about how sensible they are. When all papers focus on a specific benchmark, it becomes harder to publish other kinds of research: What if you want to propose a new benchmark or a new metric? What if your method optimizes a different criterion than performance, e.g., it is much faster, or much simpler to implement? This will not look good to reviewers: they will only care about your score on the usual benchmark, because quantifying performance relative to a benchmark is easier than quantifying the relevance of the benchmark.
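To make the n-gram point concrete, here is a minimal sketch (my own toy code, not any official BLEU implementation) of the clipped n-gram precision at the core of BLEU. Real BLEU combines such precisions for n = 1 to 4 with a brevity penalty, which mitigates, but does not eliminate, this kind of blind spot.

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision, the quantity at the core of BLEU (toy sketch)."""
    cand = candidate.split()
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    overlap = sum(min(c, ref_ngrams[ng]) for ng, c in cand_ngrams.items())
    total = sum(cand_ngrams.values())
    return overlap / total if total else 0.0

reference = "the cat sat on the mat"
print(ngram_precision("the cat sat on the mat", reference, 1))  # 1.0
print(ngram_precision("mat the on sat cat the", reference, 1))  # also 1.0, despite being word salad
```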
Deliberate academic misconduct sometimes happens. I'm not especially familiar with the issue and have fortunately not seen this kind of problem happen in my community, so I'll just list pointers to the most obvious problems.
Most of these problems are specific to experimental sciences:
In theoretical research, which is a much smaller world, cases of deliberate misconduct seem rarer in my experience (or at least, they are less widely publicized). However, mistakes sometimes creep in, so it is not uncommon for published results to turn out to be wrong, sometimes spectacularly so: see for instance this paper which refutes26 what appear to be severely erroneous claims from earlier papers. Deliberate misconduct would probably be quite difficult to detect: one could imagine researchers claiming that a statement is proved when the proof actually doesn't exist, or when the authors know of a hidden gap in a proof... See "Is there such a thing as fraud in mathematics?".
Not all kinds of research investigations are ethical. I am less familiar with this issue because I work on theoretical research, but the need for research ethics is well-understood in many practical fields, especially those that involve experimentation on humans or on animals.
However, in other fields, the importance of ethics is not as well understood.
For instance, artificial intelligence researchers have been criticized for studying, e.g., how to determine sexual orientation from a photograph, or how to guess nationality and ethnicity from a name, or how to build lethal autonomous weapons...
Conferences are an important mode of scholarly communication. In computer science, they are the main venues where papers are published. This section outlines some problems with them.
Many of these problems are also covered in this study that surveys scientific conferences for best practices in multiple fields, and estimates how widespread these practices are: see also their data.
Conference talks, when they are good, are a nice way for people to get acquainted with a paper, and slides are often pretty skimmable, so both are useful complements to the paper. Yet, the talks at conferences are second-class citizens.
While conference papers are carefully preserved, the associated talks are usually not recorded, and the slides are also lost unless the authors put them online, and then they only survive until the author's webpage disappears. This is very annoying; I sometimes want to look again at the slides of a talk which I remember from a past conference, and then I don't manage to find them. Or say I'd like to watch the presentation of a paper given at a conference that I didn't attend, but it wasn't recorded. This is also important for people who skip conferences because of the environmental impact or because their university cannot afford to pay for it: the total cost of a conference trip often adds up to 2000 USD or more.
Even the program of a conference is an important piece of information, because you will sometimes refer back to it years later to remember what was going on at that conference: however, the websites of conferences are usually not archived and eventually disappear.
Many talks at scientific conferences are not interesting to the audience, who often end up doing something else instead of paying attention.
There are several common issues with conference talks. One is that they are often not at the right level of detail. Some talks are very high-level, others are very technical. The first kind of talk is boring for people already familiar with the topic, and the second is boring for those who aren't.
Another problem is that the talks are not always relevant to your interests. The solution for this is to organize them into thematically consistent sessions, but it is often difficult to group papers by theme and schedule the sessions without having related themes overlap. Further, when running sessions in parallel, you often make the conference less socially compelling, because everyone's experience of the conference becomes different.
Last, some talks are simply low-quality. I won't go into the question of how to give good talks because there are other tutorials about this, see e.g. here. Another common reason for bad talks is the language barrier. The underlying problem is that submissions are selected based on the submitted paper, not on the talk; so once the paper is accepted there's no way to tell how good the talk will be.
The low quality of the talks is part of the reason why conference attendees routinely spend entire sessions replying to email or doing something else on their laptop. I remember that this puzzled me a lot initially: if your university pays for your trip to a remote country, of course you are going to listen carefully to the talks, right? Wrong. As it turns out, academics usually have too many things going on, so they don't spend time extracting meaning out of a talk that is not relevant to them. I'm not saying that this is a bad thing, just that the quality of conference talks is not always well-adjusted to the needs of the audience.
Good conferences are often international and draw a worldwide audience to a different place on Earth every year. Attendees usually travel to the conference venue by plane. However, travelling by plane has a high footprint in terms of carbon dioxide and other greenhouse gases, and contributes significantly to climate change.
Is it reasonable for researchers to travel to multiple conferences every year, each time covering ten thousand kilometers to stay only a few days at the conference and give a talk? Sometimes the talk is as short as 10 minutes, or there is no talk at all. Sometimes the stay is just for 1-2 days, and it's a necessity rather than a choice: I have seen busy academics complain about having to go to a conference at the other end of the world, give a talk, and leave the next day because they needed to be somewhere else.
This is a difficult question to answer. Of course, talks could be given remotely using videoconferencing, as could most other formal conference functions. But it is hard to replicate the social aspect, which is a crucial point. Academics go to conferences to network, to meet new people in person, to start collaborations with them, and to have informal conversations at coffee breaks. For such purposes, it's hard to replace the real world... This being said, if travelling to a conference only brings the intangible benefits of having met a few people and chatted a bit with them, is this really worth the greenhouse gas emissions that it generates?
Conferences have other kinds of environmental impact beyond air travel. My pet peeve is the incredibly silly tradition of the conference bag and conference goodies. When you arrive at essentially any conference, you are offered a crappy bag bearing the conference logo. These are cheap plastic imitations of a laptop bag, which will survive for a few days at best. Everything about them reeks of utter crappiness. Further, in addition to the badge and conference program, the bag is often filled with lousy stuff that you don't need or want, including such improbable gifts as plastic water bottles, suitcase labels, stapleless staplers, etc. The only useful materials for a conference are pens, paper, badges, small printed programs, and maybe sponsor materials, handed out in a cheap tote bag, say. Another area where the environmental impact of conferences could be reduced is catering, as food production has a significant environmental footprint.
Conferences have firm deadlines, because they need to take place at a given date. If you miss the deadline, you have to wait for another deadline; at worst you wait for one year until the next occurrence of the conference. In computer science, these deadlines are the main incentive that gets researchers to finish and publish things, but there are some valid questions to ask about whether this is the right model.
The main question is: does having a firm deadline encourage you to be more productive, or does it push you to do sloppier research? I think it may be a bit of both. I do believe that deadlines often give a useful kick to finish writing papers that have stalled because the authors have been busy with other stuff; but sometimes this goes too far. Consider for instance the "abstract submission" deadline of many conferences, where you have to submit a title and one-paragraph abstract (for reviewer allocation), and where the actual deadline for the paper is one week later. In most conferences, some proportion of abstract submissions fail to materialize into a paper. This means that the authors were not sure, one week in advance, of whether they would manage to finish the paper in time or not...
This haste probably causes bugs to creep into papers. In experimental computer science, it is common for papers to be submitted at the last minute, just after the experiments finished running. In such situations, finding a bug in the experimental code will mean that you miss the deadline and waste several months, so it certainly doesn't encourage researchers to look carefully for bugs...
Another dangerous side effect of conference deadlines is that, when people are deadline-driven, they never get around to doing important things that have no attached deadline, e.g., journal submissions, journal reviews, longer-term research, etc.
Academics rely on Web applications to manage conferences and journals, to submit papers, allocate them to reviewers, etc., and the crappiness of these tools is legendary.
For computer science conferences, the situation is not so bad, as most conferences use EasyChair, which is not great but not too awful either. However, journals are much worse: in most cases, journal management tools are incomprehensible and unmaintained, and people interact with them as little as necessary. I have seen journal editors who would only use the journal management system once everything had been arranged by email beforehand.
Another subtler problem is that these tools are proprietary: the code of the system is secret, and the database of the system is owned by the corporation that runs it, e.g., EasyChair Ltd for EasyChair. At least these corporations are not as bad as scientific publishers, because they don't assert copyright over submitted papers; and I'm not claiming that academia shouldn't enlist the help of companies to host its computing infrastructure. However, as EasyChair is not free software, it means that you cannot easily change the way that conferences are run (e.g., to fix some of the problems outlined in this document). Code is law27 and whoever owns the code of scientific conferences decides on their laws. In addition, no one except EasyChair has access to the EasyChair database, even though it would be a very interesting dataset for meta-research: to study publishing trends, keywords, reviewing, etc. In principle, once suitably anonymized, this data should belong to everyone as a supplementary output of research.
In this section, I point out problems about how research is organized as a social community, and also about the relationship between society and research.
The overall number of research positions has significantly increased over the last decade, but it has not kept pace with the increase in the number of university students and in the number of prospective researchers. As a consequence, obtaining a research position is now ridiculously competitive.
In France, in 2014, there were 7014 defended PhD theses28 in STEM topics, but only about 660 tenured researchers in these areas were recruited that year29: i.e., fewer than 10% of STEM PhD graduates (660/7014 ≈ 9.4%) can hope to obtain an academic position somewhere in France30. This percentage varies across fields; it is probably somewhat higher in computer science, and much lower in the humanities.
Of course, it would not be reasonable for all PhD students to obtain an academic position: researchers usually supervise more than one PhD student, so, in a stable system, only one of these students on average can take such a position and replace their advisor. It is true also that PhD graduates can go work in the industry afterwards: yet, supervisors do not always do a good job at encouraging PhD students to take industry positions, often because they are not themselves familiar with the industry.
In any case, 10% is very low, even though it is fairly clear that there is no shortage of important questions to study, nor is there a shortage of competent PhDs who would make great researchers. The insufficient number of academic positions has dire consequences:
A related problem is that the salaries of academic positions are usually low relative to the level of study required to reach them. In France, a newly recruited maître de conférences (tenured associate professor) can hope to earn around 2000 EUR/month after tax. In computer science, at least, this is way below what you would earn in the industry.
A secondary effect of this exceedingly competitive job market is that it also makes research more competitive. The pressure to publish (dubbed publish or perish) is very strong on PhDs and postdocs. Competition is not necessarily bad in itself, but pushing competition to unreasonable levels results in bad outcomes, like low-quality and short-term research, academic fraud, and burn-out.
Academics spend a considerable proportion of their time doing things that are not related to research.
The most obvious such occupation is teaching. Of course, teaching is a very important activity, and it often makes sense to do it alongside research. However, the trap is that teaching yields immediate external benefits (ensuring that classes are taught), whereas most research only brings long-term benefits. Hence, both universities and the state will push as much teaching on researchers as possible, and increase the teaching load rather than hiring more people, which comes at the expense of research.
Beyond teaching, researchers spend lots of time on random administrative tasks like applying for funding, organizing their trips, managing seminars, booking rooms, filling in reimbursement forms, managing grant money, organizing timetables for classes, etc. For many of these tasks, researchers would appreciate having administrative support from their universities, and it would be cost-efficient to let them delegate such tasks to secretaries. However, as with teaching, it is cheaper in the short term for universities and the state to pay for fewer secretaries (an immediate saving) even if it means that researchers can devote less time to research (a long-term loss).
Tenured researcher positions provide a salary (which is not always generous), but researchers have a hard time securing funding for everything else.
In particular, researchers need to raise money to do the following:
Some university departments provide money for this, but the amount is often insufficient. Hence, researchers have to apply for funding from other sources, e.g., the ANR in France, the ERC in Europe, private companies, etc. Of course, it makes sense for researchers to apply for exceptional funding for specific projects, but all too often you need grants just to cover basic costs not actually related to a specific project. Further, the funding system has a lot of drawbacks:
One could think that writing grant applications is not completely useless, because it encourages researchers to think about what their goals are. Sadly, at least in my field, there is a large gap between scientific goals (the ones that researchers actually pursue) and the purported "societal" goals that are put forward in grant applications. If scientific papers can be misleading, grant applications are usually farcical. You have to fit into pretentious social goals like "Innovative, integrative, and adaptive societies"34, concoct precise schedules over several years to describe the research that you will be pursuing and the results that you will find, and maybe promise "deliverables" that no one will ever use. The situation has gotten so absurd that public universities are now paying private companies (e.g., Yellow Research) to train their researchers to apply for public grants!
Another complaint about research funding is that, beyond bureaucratic inefficiency, there is a lot of funding which is supposedly designed for "research" but actually finances many other things. A well-known French example is the Crédit d'impôt recherche (CIR), a tax break for companies that claim to be doing research. I have met people working for private companies who advise other companies on how to do tax optimization and present their expenses in a way that allows them to save money thanks to the CIR... While the outcome of the CIR in terms of research is very debatable, the money poured into it every year is almost twice35 the yearly budget of CNRS, which is Europe's largest fundamental science agency...
Because research positions and research funding are now extremely competitive, the pressure to evaluate researchers has increased too. However, it is extremely difficult to evaluate researchers seriously: very few people are qualified to understand what a researcher does, and most of these people already collaborate with the researcher (or are in their research community), so they are probably biased. What is more, the impact of research is usually very long-term. Faced with such daunting difficulties, hiring committees and evaluation committees prefer to use measurable indicators which (they hope) correlate well with research performance (whatever this means). Individual researchers are evaluated by hiring committees and evaluation committees, and research laboratories are themselves evaluated, e.g., in France, by the HCERES, using similar indicators: all of this evaluation means that researchers spend time producing reports, inventing hierarchies of teams and projects that look superficially consistent, and evaluating other labs...
Of course, the use of meaningless indicators is a widespread social problem nowadays, which is not limited to academia. However, it is impressive how much academia has been influenced by this cult of indicators, and how deeply those indicators are ingrained in the mind of academics.
The main number that summarizes the performance of a researcher is their publication count. As it turns out, counting papers is much simpler than actually reading them or evaluating their quality. This has a number of unfortunate consequences:
Because of this shallow evaluation and the competitiveness of academia, some very mundane questions start to get an unexpected importance. This is the case of the order of authors on papers for instance: see also this post. For this reason there is a push to randomize the order of authors of papers, e.g., this tool. See more generally the question of author-level metrics.
To prevent the most obvious forms of abuse, beyond the publication count, committees also look at publication quality, but again estimated through an indicator, namely, the "rating" of the venue (conference or journal) where the work was published. Hence, conferences and journals are classified into rank A, rank A∗, rank B, etc. This ensures that researchers, when they want to publish something, will spend time submitting it to the top venue, then to a slightly less prestigious venue, etc., until it is accepted somewhere. Some researchers will only submit to certain venues because their institution has a closed internal list of which venues are "good". These rankings also mean that venues, to keep their high ratings, feel like they should maintain a low acceptance rate, no matter the proportion of submitted papers that are interesting. Of course, not all venues are equal, but there is too much focus on formalizing the pecking order among venues, just because of its disproportionate importance in evaluation. It is true that there is a legitimate need to discuss which research is important and which is not, and which communities are doing more important work than others, but the shallow discussion of conference ranks is only a distraction from this crucial and difficult debate.
Another indicator supposed to reflect paper quality is the h-index, which summarizes citation counts: a researcher's h-index is the largest h such that h of their papers have each been cited at least h times. The accounting is vague, however (do self-citations count?), and the resulting number doesn't mean much: the system has been gamed for fun and lampooned with the ha-index. What's more, it tends to favor larger communities; in particular, theoreticians often have a low h-index simply because theory communities are smaller. Looking at citations is not an unreasonable way to estimate the impact of a researcher's output, but it shouldn't be the only way, and the analysis should be qualitative rather than quantitative, because not all citations mean the same thing.
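For concreteness, here is a minimal sketch (my own code, assuming a plain list of per-paper citation counts) of how the h-index is computed; the second example shows why one blockbuster paper barely moves the number, which is part of why the indicator flattens out qualitative differences between citations.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    citations = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(citations, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))    # 4: four papers have at least 4 citations each
print(h_index([500, 8, 5, 4, 3]))   # still 4: the highly cited paper changes nothing
```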
As researchers are evaluated on publication-based indicators, they are not incentivized to do other activities. I have already mentioned peer review. Another activity is teaching: when researchers are evaluated only on their research but not on their teaching, they tend to think of teaching as a waste of time, with predictable results in terms of teaching quality. But in addition to teaching, there are two other activities that are not encouraged at all, even though they are a crucial part of a researcher's job: communicating your results to a wider audience (e.g., popular science), and communicating your results to the industry so that they can be applied (technology transfer).
In principle, these activities are encouraged, and everyone agrees that they are very important. Yet, they do not matter much in evaluations. It's nice if you can popularize what you do; it's nice if you can have an impact on industry; but if you have fewer papers than the other candidate, tough luck. Researchers who do not produce academic papers are not held in high esteem by their colleagues, but no one blames researchers who never communicate outside of academia, even though this is obviously unsustainable.
Academia is not a very diverse environment, especially in terms of skin color, country of origin, and gender (especially in computer science). Some of the underlying issues do not originate from academia, e.g., the lack of women in STEM; but academics have a responsibility to fix these issues when they teach in higher education. Furthermore, academia certainly introduces some bias of its own, as it can be actively hostile towards under-represented groups, e.g., towards women.
This is a difficult point for me to elaborate upon, as I am not myself part of any under-represented group, so I don't have first-hand experience of the problems that academia introduces. Nevertheless, I can try to echo what members of such groups are advocating, to help academia correct for the surrounding biases and avoid introducing more bias.
While tenured researchers often have the freedom to be their own bosses, non-permanent researchers are usually supervised by other researchers: interns and PhD students have advisors, and postdocs often have a principal investigator or some other kind of mentor, depending on their status. However, not all researchers are doing a good job as supervisors.
The root of the problem is that researchers are recruited for their ability to conduct research; but this ability does not necessarily correlate with the human capacity to act as a good manager (to put it mildly). What is more, tenured researchers are usually expected to start supervising students without having ever received any kind of training on how to do this: their only asset is their own experience as a student, which can lead them to reproduce the same mistakes that their own advisor made with them. The general cluelessness of many advisors is bad news for prospective PhD students, who often have limited opportunities to screen their potential supervisor before committing to a multiple-year working relationship with them.
The main failure mode for supervisors, in my experience, is that they do not help their students enough, or that they are not reliable when the student needs them. This can be because they want the student to find their own path and succeed on their own, or because they are overloaded and do not have enough time for their students, or because they do not care much about their students' needs. This kind of distant supervision gives the student complete freedom, but it can be very dangerous: the student may waste their time in misguided research efforts, they may become depressed because they do not know what to do and how to move forward, or they may simply get stuck because their advisor is not even able to take care of the bare minimum that is needed for them to defend (paperwork, defense scheduling, etc.). It is far too common for PhD students to complain about not seeing their advisor enough, waiting too long to get feedback on their work, or generally not knowing what to do next; from my experience this is especially common in the humanities, though there are considerable variations across fields and especially across advisors. All told, the effects on students can be catastrophic, especially if they are taking the competitive academic game too seriously.
The second failure mode seems to be rarer (at least in France), but more common in other places: advisors who push their students too much and ask them to work unreasonable hours, to squeeze every possible drop of productivity out of them. This is often done for the advisor's personal gain, as they get co-authorship of all the papers produced by their students, which helps further their own career. Such abuse often goes hand-in-hand with harsh behavior from the advisor, with predictable consequences for their students' mental health: see this article for instance, or this question, or this article...
Other more extreme failure modes are possible, including harassment by the supervisor (e.g., here or here), or plagiarism from senior researchers (e.g., this horrifying story).
Of course, inadequate supervision can also be found outside of academia, but academia is notorious for being unable to address such problems: there are few ways to fix inadequate supervision, and little accountability when an advisor is doing a bad job. For students, the only recourse is usually to appeal to some ombudsperson at their department, university, or graduate school, and ask them to find a different advisor. However, this can be tricky, as the new advisor needs to be knowledgeable about the student's ongoing work, yet not be too close to the previous advisor (or risk putting themselves in an uncomfortable situation). In any case, for the incompetent or malicious advisor, there is usually no career consequence beyond losing their student to someone else: in particular, there is nothing preventing them from trying their luck again with new students.
This problem is not specific to academia, but it ought to be mentioned: in fact, it is so obvious that I almost missed it. Here it is: except for conversations between people who happen to share some other language, absolutely everything in academia happens in English.
This fact causes many issues:
Of course, we would like to believe that academia evaluates ideas and not their expression, and that a paper with profound ideas will be well-received even if it is written in bad English. Sadly, this is not true: well-written papers are much more compelling, and papers which are completely incomprehensible will get rejected even if the obstacle is only linguistic. Likewise, talks given by native speakers of English tend to give a better impression than those delivered with a thick accent and minimal vocabulary.
I will not say much more about this point, because when people complain about the role of English in academia, they often believe that the problem has a definite solution, and usually one that involves restricting the use of English somehow. I personally don't think that this problem has a clear solution; and if there is one, it probably involves more English rather than less.
In this section, I conclude the essay by pointing out some meta-problems in academia, i.e., figuring out why it is so difficult for academia to solve its own problems.
Why is it so hard for academia to address the issues pointed out in this document? I think that the main cause for academic inertia is that no one has the power to push for change.
Of course, advisors can influence their students, but they will usually not encourage them to fix problems that they are not already trying to fix themselves: besides, when you look for tenure, you usually have to settle for conservatism at first. Funding agencies, evaluation agencies, department heads, all have some impact on researchers, but they do not have real authority: it is hard for them to impose anything new, because researchers still enjoy considerable freedom, and tend to be suspicious of external interference.
Senior academics probably have some influence on their communities, e.g., those who are on executive committees for important conferences, on hiring committees, etc. Yet, as they are the most established people in the system, they are usually not the ones campaigning for change: and even those who would like to change things are usually too busy with other tasks to be able to push very hard. Further, the influence of senior academics over their fellow researchers is nothing comparable to, e.g., the power of a manager over their employees.
Do not get me wrong: academia's decentralized nature is a feature, not a bug. The lack of leaders ensures that researchers enjoy academic freedom, and that no one is telling them how they should be spending their time. However, this organization also has drawbacks, because no one has the power to make things move in the right direction. Compare this to companies where a boss can take action against what is broken in the company organization. Of course, these efforts may be ill-advised, in which case the company suffers (so employees leave and competitors triumph), but sometimes a benevolent dictatorship can do wonders. In academia, no one has the authority to point out the bugs in the system and ask everyone to do things differently.
Hence, traditions are mostly fixed, everything moves very slowly according to consensus, and getting people to agree on anything is cat herding at its best.
The conservatism of academia is still something of a surprise, even when you take its decentralized nature into account. Tenured academics enjoy considerable freedom, indefinite employment, and few obligations. Why aren't they working harder to fix the bugs in the system? In a company, even if you have legitimate concerns about the way things work, you will probably have to continue doing your job, because you risk being fired otherwise. If you are a tenured academic and you have concerns about some task, in most cases you can just refuse to do it until your concerns are addressed, and it won't put your job in jeopardy. Given that tenured academics thus have almost total freedom to fight for the causes in which they believe, why are so few of them taking a stand?
When you put it like this, it is very strange to see how academics choose to put up with tasks that they could refuse to do, even when they perceive some obvious problematic implications, e.g., why are so many researchers publishing in closed-access journals and reviewing for them even when they believe that open access is the way forward? I can see several explanations. The most obvious one is that, while firing a tenured academic is complicated, you can still pressure them because they need to secure funding. You can also pressure them in terms of career evolution, and with some weaker forms of hierarchical control: e.g., the director of a lab may be able to control how researchers in the lab spend their grant money, how offices are allocated, etc.
The second explanation is social pressure: academics feel like they are part of a community, and are terrified by the prospect of being disapproved of or patronized by their peers, also because being ostracized would make it much harder to move forward in their career. Hence, researchers don't like to express disagreement with members of their community, even when these community members have no genuine hierarchical power over them. In other words, researchers do not want to take a stand on their own and risk being the odd one out.
The third reason is recognition. Academics are often ego-driven to some extent, and academia has many carrots for the ambitious: having a submission accepted, being cited, being invited to a conference, being awarded a prize, etc. Unfortunately, as prizes are usually bestowed by the establishment, having strong ethical opinions will probably prevent you from getting them. Even when you know that this recognition is very superficial and essentially local (i.e., no one outside academia cares), it is easy to develop a taste for it and become addicted to the rules of the academic game.
The fourth issue, which may be the most fundamental, is that the freedom of academics comes with great responsibility. If you have the right to do what you want, then it is you who must decide what you want to do, and this is extremely challenging if there are no constraints whatsoever to guide you. Should you be working on this collaboration, or on that one, or some other kind of project? Should you completely give up on your current field of research and move to some other field? Academic freedom is dizzying, and it is somewhat stressful because you still feel a sense of duty towards the institution and society that pay your salary. Besides, you have very little feedback: the importance or insignificance of your work will not become apparent before years or decades. Which task is more useful: writing a paper about some topic or other, teaching a new class, or writing a long rant about academia like the one you are currently reading? Like for life's toughest decisions, only time will tell what the right choice was. In summary: researchers have complete freedom, some form of accountability, but unclear objectives, and no feedback.
I believe that this unpalatable cocktail is why meaningless indicators have taken such a hold of the minds of academics. Indicators allow researchers to replace their unclear objectives with a very concrete goal: getting papers published. Indicators provide feedback by creating a measure of success: the number of published papers, the number of citations, the h-index. Accountability is also solved, because the function that you optimize neatly matches the one used by authority figures to evaluate you. The existential anxiety of freedom also vanishes, because producing papers keeps you very busy and ensures that you can't spend any time thinking about how you should be using your time. All that researchers need to do is to follow the academic dogma: publishing PDF files in traditional conferences and journals is somehow good in and of itself. From that point on, they can just optimize for this indicator, and avoid the hard question of choosing the right things to prioritize.
This section lists similar initiatives and further reading about questions and problems in academia:
I am grateful to the following people, in roughly chronological order:
The general uselessness of publishers seems to be true especially in STEM. In some other fields (especially the humanities), publishers seem to contribute more to the process: they are still small companies with some expertise, they really help with editing and promotion to a non-specialist audience, and they still distribute articles on paper. Even in such cases where the publisher contributes to the article, I find it problematic that they assert copyright and that the public cannot access the results of the research, but the argument is less obvious than in cases where the publisher is not helpful at all. ↩
It would appear that the best solution for authors is to put copies of their work online as early as possible under a permissive Creative Commons license, as this ensures that the author copy of the work can be freely circulated even if a restrictive agreement is signed with a publisher afterwards. Indeed, this is the so-called rights retention strategy approach advocated by cOAlition S. It remains to be seen whether publishers will accept this -- currently there are only two journals that do not accept it. That said, even when the author version is available under Creative Commons, the existence of the publisher version often causes confusion about which article is up-to-date (e.g., integrates reviewer feedback). ↩
Source: ACM Author Rights. Because of how their website is designed, it is not possible to link directly to the right page: to see the prices, you should click on "FAQ", then click on "What are ACM’s Open Access charges?" ↩
Of course, there is no apparent relationship between these prices and the actual cost of publishing the articles. Prestigious venues usually charge more just because they can. ↩
The number of papers is from here, summing from Jan 2016 to Dec 2016. The budget for 2016 is here; I don't know what it means that this is a "midyear" budget, but the costs are in line with the other years. ↩
Source: LIPIcs: Article-Processing Charge. I write "at most 60 EUR" because they are transitioning to this: currently, the fee is lower. ↩
Scientific publishers generate unusually high profit margins; see, e.g., this article, or this answer. ↩
To understand why closed-access publishers are working against the dissemination of science, a good argument is the garbage strike test. Specifically, if all closed-access publishers suddenly disappeared, in all probability they would swiftly be replaced by an open-access system that would work much better to do the same thing, e.g., a legal Sci-Hub. In my opinion, the negative effects of closed-access publishers on the dissemination of science (e.g., making copyright claims to shut down Sci-Hub, and more generally limiting the free redistribution of articles) outweigh the positive ones (i.e., the paywalled system they are currently operating). ↩
Source: ISWC'20 call for Research Track Papers and HTML Submission Guide ↩
Source: WWW'95 call for papers; search for "syntactically correct HTML 2.0" ↩
Source: WWW'20 call for papers; search for "must be in PDF format" ↩
Another problem is when the research depends on proprietary code, although this is ethically problematic on its own, especially in math: can you trust mathematical results that you cannot reproduce without opaque proprietary software? ↩
Source: Statistics - Records in DBLP ↩
See for instance the LIPIcs instructions (look for "Typesetting instructions"): in particular, notice the request to remove all unused macros, fit your article into one file, remove comments... Traditional publishers have similar rules, or worse. ↩
See for instance the call for papers of IJCAI 2016, where the cost is 275 USD/page. ↩
Not everyone agrees that the identities of reviewers should be kept secret, and some advocates of open peer review would argue that all reviewing should be done in public. I do not think I agree with this myself, but it is certainly an interesting suggestion. ↩
The page for the 2015 committee only links to committees from 2008 to 2012. ↩
The 2013 committee has a page, but the 2014 committee is a placeholder, and the committee is not mentioned on the website of subsequent editions. ↩
Fortunately, there are some exceptions. In particular, I hear that there are effective SAT solvers that are developed and maintained by the community around the SAT conference, thanks to a healthy culture of organizing competitions and publishing their code. ↩
See here for another article making a similar point (in French), paragraphs 3 sqq. ↩
The idea for this point partly came from attending a talk by Raghavendra Gadagkar and reading Is the scientific paper a fraud?. This latter article deals with biology, but the general notion applies more broadly, and these words really resonated with me: "The scientific paper [...] misrepresents the processes of thought that accompanied or gave rise to the work that is described in the paper. [It embodies] a totally mistaken conception, even a travesty, of the nature of scientific thought." ↩
A catchy expression of a related idea is in the quote by Ovid: "So art lies hid by its own artifice." ("Ars adeo latet arte sua"). There is also this Paul Graham essay, which says: "In writing, as in math and science, they only show you the finished product. You don't see all the false starts. This gives students a misleading view of how things get made." ↩
See here (point "Primo") for a similar complaint (in French). ↩
I did not verify the paper or the original articles myself. ↩
This quote is by Lawrence Lessig. ↩
See Repères et références statistiques 2016, Ministère de l’enseignement supérieur et de la recherche, p253. ↩
See L’état de l’emploi scientifique en France 2016, Ministère de l’enseignement supérieur et de la recherche. Specifically, there were 482 maîtres de conférences recruited (p84, first table, summing the first 7 cells of the third column with numbers), and around 180 chargés de recherche (p101, summing the first 8 cells of the second column with numbers gives 308 for chargés de recherche (CR) and directeurs de recherche (DR); I estimated that the number of CR was 179 by estimating the CR/DR ratio as 9874/16951 based on table 1 p96). ↩
It is true that many PhD students do not wish to pursue an academic career, so it would be better to measure the ratio between the number of postdocs recruited every year, and the number of tenured researchers recruited every year. However, I have not managed to find usable statistics for the number of recruited postdocs in France, probably because there are many different statuses and because postdocs are recruited in a decentralized way (i.e., directly by universities or by projects); by contrast, tenured researchers are state civil servants and PhD diplomas are overseen by the state. I was able to find some statistics about the number of currently employed non-permanent researchers in L’état de l’emploi scientifique en France 2016, Ministère de l’enseignement supérieur et de la recherche, but I couldn't find the numbers of new recruits per year, and it seems complicated to compare these numbers from one year to the next because of changes in accounting. ↩
This study purports to observe a Matthew effect (aka "rich get richer") where scientists who receive grant funding become more likely to receive more grant funding later. ↩
For instance, in 2016, the computer science committee of the ANR resigned in protest over political issues of this kind. ↩
For the 2016 ANR campaign, you had to argue how your project fitted with one of 9 "défis" (challenges) and 41 "orientations". See the call for projects. There is also an excellent parody of the ANR: the ANES. It's often hard to tell them apart. ↩
CNRS's budget in 2015 was 3.3 GEUR (source), and the CIR spending in 2013 was 5.567 GEUR (source, table p10, last line, 5th column of numbers). There are no reports about CIR in 2014 or later... ↩