a3nm's blog

Double-blind reviewing

More and more conferences in theoretical computer science (TCS) are moving to double-blind peer review. This is the implementation of peer review where the reviewers who evaluate submitted articles do not know the identity of paper authors, at least initially. The two major database theory conferences have adopted double-blind peer review (at least in an "experimental" way): PODS since 2023 and ICDT since 2024. Among general theoretical computer science conferences, we have ESA since 2019, STACS since 2021, ICALP since 2022, FOCS since 2022, SODA since 2022, STOC since 2024, and ITCS since 2024. An up-to-date list is maintained on double-blind.org. However, double-blind reviewing is not yet used in all conferences in all subareas of TCS. Further, it is not commonly used in journals: in fact, I do not know of TCS journals that use double-blind peer review, except TODS, which has used it since 2006 with a detailed rationale. See also the discussion of this issue at SoCG, which has used double-blind reviewing since 2023.

I think that this move to double-blind reviewing is a welcome change, and thought I'd try to summarize some of my thoughts about it.

First: should conferences and journals adopt double-blind reviewing? The implicit premise behind double-blind reviewing is that reviewers should evaluate articles based on their intrinsic merits, so that the identity and affiliations of the authors are not relevant for evaluation. When discussing double-blind reviewing, I think it is important to check first whether all parties agree on this point. Indeed, this view is not universal: some researchers insist that it would be normal for conferences to evaluate submissions by newcomers with higher standards (see, for instance, this answer on academia.SE); or, to the contrary, conferences could be more welcoming towards outsiders [1]. However, if you believe, like TODS, that "every submission should be judged on its own merits", then information about the authors and their affiliations is indeed irrelevant to reviewers (assuming they have no conflicts of interest — see below). The question becomes: does hiding this information make a difference, and is the difference sufficient to justify the change?

Second: how much of a difference does double-blinding make? This is not, in fact, straightforward to evaluate. There have been decades of debate on this topic, and policies have been adopted by scholarly associations in many fields, given that double-blind peer review has been around since the 1950s (according to Wikipedia). Further, many scientific studies have attempted to quantify its effectiveness. The TODS rationale contains some pointers to this literature, and after 2006 one can find many other articles arguing for or against double-blind peer review (see for instance this one and the many articles that it cites or that cite it). Of course, the effectiveness of double-blind reviewing may depend on the community, or on how specifically it is implemented. In computer science, one influential data point was the 2017 WSDM experiment, in which submissions were scored by double-blind reviewers and by single-blind reviewers (i.e., who knew the identity and affiliation of authors). The experiment found that "single-blind reviewing confers a significant advantage to papers with famous authors and authors from high-prestige institutions". Here I must confess that I have not myself read all these works: my point is just that the issue is not simple, and that you cannot dismiss double-blind reviewing simply because you are unfamiliar with it or because you are personally convinced that it does not work.

Third: how should double-blind reviewing be implemented? Here, there is an idea I'd really like to get across: making reviewing "double-blind" does not necessarily mean that it should be impossible for reviewers to deanonymize authors. Indeed, essentially all [2] conferences in the list above are using one specific implementation, called lightweight double-blind reviewing. This means that, while papers should not contain identifying details, and while reviewers are advised not to try to identify authors, it is OK if reviewers happen to find out who the authors are. In particular, authors are encouraged to post their work, e.g., on preprint servers, even if this means a reviewer may find it (deliberately, or accidentally, e.g., by stumbling upon the work before reviewing). Lightweight double-blind reviewing still offers two important benefits:

  • Reviewers are not immediately biased by reading the author names and affiliations on the first page of the paper. In my personal experience as a reviewer, seeing familiar names or affiliations on a paper will immediately affect my impression of the paper and my expectation about the outcome ("probably accept unless the contents are a bad surprise" vs "probably reject unless the contents are a good surprise"). I would like not to be influenced by this, but I doubt I can avoid it, so I prefer not to see the information [3].
  • Authors who worry about discrimination and do not trust the reviewers can choose to take steps to be completely anonymous, e.g., they can choose not to post preprints of their work.

By contrast, I do not like double-blind policies that try to guarantee complete anonymity of submissions at all costs, e.g., by prohibiting or discouraging the posting of papers as preprints, informal talks, etc. I believe this is harmful, because it means that papers are only available when they are finally published [4] — by contrast, preprints are immediately accessible to everyone. Unfortunately, there are conferences (especially practical conferences) that follow this route, e.g., SIGMOD 2024 [5]. Disallowing preprints can have a chilling effect on an entire field: as papers often end up rejected from a conference and resubmitted to another, authors may eschew the posting of preprints because of the risk that some later, unspecified conference may disqualify their work on these grounds.

Fourth: what are the real problems with double-blind reviewing? Much of the criticism I have heard does not make sense to me:

  • "Anonymizing papers is tedious work." This I really don't understand: removing author names is trivial, writing self-citations in the third person (e.g., "The previous work of Myself et al." vs "Our previous work") feels weird but takes a few seconds... Altogether the impact seems to be minimal6.
  • "It it complicated to host/submit supplementary data, e.g., source code or experiments, in an anonymous fashion." But there are tools like Anonymous GitHub, and guides on what you can do.
  • "Double-blind reviewing is not perfect, and reviewers can guess who the authors are." Well, first, guessing is not the same as being sure; second, it's still useful if you can remove bias in some cases. An improvement can be valuable even if it is not a perfect solution.
  • "It's unnecessary, everyone is honest and exempt of bias." Even assuming that reviewers try to be fair, this misses the point that bias is often unconscious.

To me, the real (minor) shortcomings of double-blind reviewing are:

  • Some journals (and conferences?) are "epijournals", where papers are first submitted to a preprint server, and then reviewed by the journal (and endorsed if they pass peer review). I think this is a great practice that neatly separates the hosting of papers (which is done for free by preprint servers) from the evaluation process. Unfortunately the interaction with double-blind peer review is not perfect: you cannot send the preprint to reviewers, because of course it includes author information. The fix is obvious, just a bit inelegant: simply ask authors for an extra blinded version of the paper when they submit it for evaluation by the journal.
  • The management of conflicts of interest (COIs) is more complicated. Many conferences and journals have policies to ensure that papers are not reviewed by colleagues, collaborators, supervisors, or personal friends of the authors — or, of course, the authors themselves! When reviewers know who the authors are, they can easily detect COIs and recuse themselves from reviewing the paper. With double-blind reviewing, this is more complicated. Typical solutions include asking authors upon submission to disclose which members of the program committee (PC) they are in COI with, and/or asking PC members to disclose which authors they are in COI with (during the bidding phase). But these solutions are not easy to adapt to journals, where papers are typically sent to outside reviewers not affiliated with the journal. Also, for conferences, they do not address subreviewing, where PC members delegate a paper to an outside expert: this expert could end up being in COI with the authors, or even be one of the authors, which is especially embarrassing. This is typically handled either by unblinding papers for PC members if they decide to subreview a paper, or by making subreview invitations pass through an unblinded party (e.g., the PC chair, or a specific organizer not involved in reviewing) who can check for the absence of COI [7].

I hope this post can help clarify some points about double-blind reviewing, and maybe encourage more conferences and journals to consider it!


  1. To put it another way: there are many academic events that are invitation-only, and many public conferences include some invited talks whose speakers were selected by name. Whether this is OK or not, and what is the right balance, is a different question. But conferences that make an open call for contributions should be clear about whether they commit to treating submissions from everyone in the same way, or whether some authors are more equal than others.

  2. The exceptions are ICDT, which follows the same concept but does not use the name; and FOCS, which does not give specific details about its implementation but which I would expect to follow the lead of other conferences.

  3. I don't know if all reviewers approach their job in the same way, but personally, I'm always worried about misjudging a paper — e.g., being the lone reviewer to advocate for rejection when the other reviewers have better reasons to accept it, or vice-versa. (On reviewing platforms I have used, reviewers typically cannot see the other reviews on a submission before inputting their own — thus forcing them to come up with their own independent assessment.) Of course, it doesn't all come down to the grade, but an important question remains as I read the paper: how will I evaluate it in the end? And I observe myself looking for clues that could give away how other reviewers are likely to evaluate it, e.g., the formatting, the writing quality — or, indeed, the identity of the authors. The same goes for originality: I suspect that a paper that looks very unconventional may end up being rejected just because the reviewers think that others will not take it seriously (a kind of reverse Schelling point). Unfortunately, I further suspect that this phenomenon encourages papers to be more conformist and to avoid standing out, in a kind of herd-like behavior.

  4. Of course, the final published version will often not actually be available to everyone. So it is especially important to encourage authors to post preprints (and postprints!) of their work. 

  5. The specific wording is: "we request that the authors refrain from publicizing and uploading versions of their submitted manuscripts to pre-publication servers, such as arXiv". That said, later, the call for papers grudgingly concedes that papers can still be submitted if there is an online preprint.

  6. If we're trying to streamline the publication process, one more immediate area for improvement would be to fix the duplication of effort between submitting preprints (e.g., on arXiv) and submitting "camera-ready versions" of papers, which often have limited page counts and other inessential differences. Or: harmonizing paper stylesheets (e.g., using LIPIcs) and page limits (or removing strict page limits altogether), so as to avoid tedious reformatting work when resubmitting a paper to a different conference. I have spent orders of magnitude more time on busywork of this kind than I have ever spent on the anonymous-vs-nonanonymous question.

  7. About COIs, by the way: I wish we would eventually get a platform to handle these more automatically. Given that researchers are often (alas, not always) disambiguated with a unique ORCID identifier, and that there are bibliographic databases (e.g., DBLP for computer science, or Crossref or BASE) with co-authorship information, many COIs nowadays can be detected from public, structured data. This can be refined with affiliation information (sometimes from email addresses, or from ORCID records, or sometimes from DBLP), and supervision information (e.g., the Mathematics Genealogy Project). Sure, there are also COIs arising from close personal friendships or other undetectable causes... but it would be nice if manual handling were limited to these — or they could also be declared by authors and stored by some trusted third party. Such a system would be useful, similar to how the Toronto Paper Matching System (TPMS) (cf reference) streamlines the paper bidding phase for large AI conferences.
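  To make this concrete, here is a minimal hypothetical sketch of the core check, assuming co-author sets have already been extracted from a bibliographic database such as DBLP (the identifiers are dummies; this is not an existing platform):

  // Hypothetical sketch, not an existing platform: flag co-authorship COIs
  // between the authors of a submission and the program committee, assuming
  // co-author sets were already extracted from a bibliographic database.
  #include <iostream>
  #include <map>
  #include <set>
  #include <string>
  #include <vector>

  using Person = std::string;  // e.g., an ORCID identifier (dummy values below)

  int main() {
      std::map<Person, std::set<Person>> coauthors = {
          {"author-A", {"pc-member-1", "author-B"}},
          {"author-B", {"author-A"}},
      };
      std::vector<Person> submissionAuthors = {"author-A", "author-B"};
      std::vector<Person> programCommittee = {"pc-member-1", "pc-member-2"};

      // A PC member is in COI with the submission if they are one of its
      // authors or a co-author of one of its authors.
      for (const Person& pc : programCommittee)
          for (const Person& author : submissionAuthors) {
              auto it = coauthors.find(author);
              bool coauthored = it != coauthors.end() && it->second.count(pc);
              if (pc == author || coauthored)
                  std::cout << "COI between " << pc << " and " << author << "\n";
          }
  }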

Should conferences still require mandatory attendance?

This post is a reprint of the "Viewpoints" column by Theoretical Computer Scientists for Future (TCS4F), published in the bulletin of the EATCS. You can read it in PDF format in issue 139 of the Bulletin.

The issue of climate change has been on our collective mind for decades. Each passing year improves our scientific understanding of the problem, and narrows down our uncertainty about the need to drastically reduce worldwide greenhouse gas emissions. As the window of opportunity is closing, and concrete action is slow to materialize, more and more groups from seemingly unrelated areas find themselves advocating for change.

TCS4F is one such initiative: it is led by computer scientists, and aims at making research in theoretical computer science environmentally sustainable. It started in 2020 with a manifesto that can be signed by researchers, conferences, and research groups. The pledge taken by signers is to follow a sustainable emissions trajectory: reduce emissions by at least 50% before 2030 relative to pre-2020 levels. The TCS4F manifesto was signed by 199 individual researchers (and counting!), 3 research groups, the 2022 edition of the ICALP conference, and the 3 conferences CSL, STACS, and Highlights of Logic, Games, and Automata.

The contribution of theoretical computer science research to the climate crisis is two-fold. On the one hand, we may be able to improve the situation through our research. For instance, we can improve the efficiency of algorithms and hope to reduce the footprint of the ICT sector — though our efforts may well have the opposite effect because of the Jevons paradox! On the other hand, we should also think about the present impact of our research activities on the environment, and try to adapt our practices to be more sustainable.

It may be unclear at first how theoretical research harms the environment — is it about the consumption of draft paper? Whiteboard markers? In fact, while our activities can emit greenhouse gases in many ways, the main factor in our climate impact is probably long-haul plane trips. Indeed, our research field is structured around international conferences. Their stated aim is to give the community a place to meet, discuss, and exchange new ideas. Prestigious conferences are also the most important means of recognition in our community: they are a must-have on one's CV when applying for research positions. For PhD students and researchers on short-term positions, publishing there is not a choice: it has become a vital professional necessity. And, until recently, publishing at international conferences naturally meant that you had to fly across the world and be there.

It is in this context that we launched TCS4F in early 2020. This coincidentally followed Vardi's “Publish and Perish” CACM column, which advocated for optional attendance to conferences in the name of environmental sustainability. As we all know, shortly afterwards, the COVID-19 pandemic moved all conferences online almost overnight. This forced experiment gave us a taste of what could be the closest online replacement for traditional conferences — if organized on short notice and by necessity rather than choice. The situation left us yearning for the golden days of in-person conferences and lively bar discussions in exotic locations, and the question of flight-induced climate change was not very pressing while we were stuck at home during lockdowns.

Once the COVID situation improved, many conferences adopted some kind of hybrid format, pragmatically acknowledging the fact that travel was not possible for everyone. These experiments revealed that it is comparatively easy to accommodate remote speakers, and to stream talks to a remote audience, which some conferences already had experience with. However, integrating the in-person and remote worlds proved challenging, especially for coffee breaks and social events. Based on this, some conferences are now back to firm requirements for in-person attendance, and are making explicit what used to be an implicit rule: “all talks are in-person” at ICALP'23; online talks reserved for “travel restrictions or other exceptional situations” at ICDT'24... The intent may be to encourage participants to travel so everyone can enjoy a better conference... or to ensure that universities will continue to reimburse trips. Also, a fully in-person conference is of course simpler to organize, and closer to what we are used to.

These rules arguably reveal an inconvenient truth: many conferences are now attracting participants whose main goal is to have their paper published (at a prestigious venue, and on a predictable timeline), and not necessarily to attend the event. Of course, the general will to travel and meet is still very much alive — as can be seen at events without formal proceedings, such as the Highlights workshop series. But coupling formal publications with an in-person gathering no longer makes sense for everyone.

We argue at TCS4F that decoupling the two is necessary, because plane travel is unsustainable at the scale at which we practice it. Flying across the world to a conference can amount to several tons of CO2-equivalent emissions, exceeding sustainable targets for individual yearly footprints in 2030 (source), and there are no plausible technological pathways for low-carbon intercontinental travel by then. Thus, our position at TCS4F is that, if everyone is to do their part to mitigate climate change, we must fly less — and attend fewer international conferences in person.
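To give a rough order of magnitude (the figures below are my own illustrative assumptions, not taken from the manifesto or from the source above):

% Illustrative only: exact figures vary with route, aircraft, and methodology.
\underbrace{20\,000\ \mathrm{km}}_{\text{long-haul round trip}}
\times \underbrace{0.1\ \mathrm{kg\,CO_2/km}}_{\text{per passenger}}
\times \underbrace{2.5}_{\text{non-CO}_2\text{ effects}}
\approx 5\ \mathrm{t\,CO_2e}
\;>\; \underbrace{2\ \mathrm{t\,CO_2e}}_{\text{sustainable yearly budget}}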

However, I believe that mandatory travel is also a question of diversity and inclusion. In-person conferences are an exclusive club for frequent travelers: they exclude people with insufficient funding to travel, people from countries whose citizens cannot easily obtain visas, people with disabilities, and people with caretaking obligations (which disproportionately affect women). For instance, the relative proportion of women participants at the 2020 International Conference on Learning Representations (online) was 20%, versus 15% at ICLR'2019 (in-person) — a 33% increase (source). Our focus on in-person conferences thus overlooks a silent majority of people for whom online attendance is the only feasible way to participate. Further, if prestigious conferences are in-person only, then recognition in our community is reserved for the privileged few who can meet this obligation.

Of course, my point is not that in-person conferences should be eliminated altogether. As we all know from the COVID era, online events are not perfect, and in-person socializing has no known replacement. Traveling to conferences is still important, and can be done responsibly — going there by train if possible, picking geographically closer locations, or simply going there less often. Online and hybrid events can also play a role, as do other forms of online research: online videos (for instance on ScienceCast), online seminars (for instance on researchseminars.org), the Theoretical Computer Science Stack Exchange, etc. These new formats are especially promising when they do not try to mimic what already exists, but instead leverage features specific to the Internet: asynchronicity, low friction, low cost, machine interpretability, long-term archival... Overall, it is very challenging to balance the scientific value of international in-person meetings with their environmental impact. But every member of our community should have a say in this choice, and it should be guided by careful deliberation — not simply by reverting to the default 20th-century-style conference culture.

It is not yet clear how the conference landscape will evolve after COVID: which conferences will settle on a new format in the long run, and which ones will revert to the pre-COVID rule of mandatory participation barring extenuating circumstances. We have tried to survey this at TCS4F. For conferences with optional in-person attendance, it is not clear how much organizers will encourage or discourage participants to travel, and how researchers will respond. These questions should probably be debated in our community, so the system can achieve the best compromise between scientific value, inclusivity, and environmental sustainability. But, specifically for prestigious conferences with formal proceedings, our short-term hope is that future calls for papers will allow publication without in-person attendance.

At TCS4F, we are interested in hearing the views of the community on this important issue. Should conference publication be conditioned on onsite participation? How should our conference culture change to be sustainable and inclusive? You can reach us at: contact<REMOVETHIS>@tcs4f.org.


This post features minor corrections relative to the edition published in the bulletin of the EATCS. This post was written with help from the TCS4F team: Thomas Colcombet, Thomas Schwentick, and Tijn de Vos. Thanks to Louis Jachiet for proofreading and suggestions.

Can you be sure to clear a line at Tetris?

Summary: it is possible to play Tetris and guarantee that you will score at least one line, no matter which pieces are given to you, i.e., even assuming they are chosen adversarially.

There are many things that can be found online about the game Tetris: high scores (see for instance this great documentary), implementation details (did you know there was a specification for Tetris games?), intellectual property surprises (this post is not related to or endorsed by The Tetris Company)... But for many years I have not been able to find on the Internet the answer to this question: can you be sure to clear a line at Tetris?

Of course, in practice, most Tetris players seem to be able to clear lines at least some of the time[citation needed]. But maybe they are just being lucky! Maybe an evil computer or extreme bad luck could prevent you from ever clearing a line, no matter how you played. For instance, consider the game Hatetris, which is programmed to give you unpleasant pieces. Clearing a line in Hatetris is much more challenging, though good players can do it. Could a worse version of Hatetris, with a more clever opponent, give you pieces that will always make you lose without scoring a single line?

Mathematically, we can formulate this as a sequential zero-sum game with perfect information. We have a standard Tetris board of 10 columns by 20 rows. The computer and the human play alternately: the computer gives a piece to the human (one of the seven tetrominoes), and the human responds by positioning it somewhere according to the usual Tetris rules. If the screen is filled up then the human has lost, and if the human completes a line then the human has won. Mathematically, there are then only two possibilities:

  • either the computer has a strategy to give pieces (depending on how the human has played so far) that guarantees that the human will lose without completing a single line, no matter what they do;
  • or the human has a strategy to place pieces that guarantees that they will score a line before losing, no matter which pieces they receive from the computer.

My question was to figure out which one of the two is true. The same question was asked by qntm in 2011, but it was left open for the standard Tetris board size. After some coding and much computation, it turns out that the human wins: it is possible to guarantee a non-zero score when playing Tetris. If you want to see an example strategy, you can test it with the Tetris board below. You play the computer, i.e., you select which piece to give, and the chosen piece will be placed according to a winning strategy. Try preventing your opponent from scoring a line! (The point of this post is that this is impossible.)

An alternative challenge: try preventing the computer from scoring a line in the 5 bottom rows! This is possible for precisely 759 piece sequences out of the 427513 winning sequences. Can you find them?

In the rest of the post, I explain some details about the exact problem statement, and give more details about the strategy and how to check it. Then, I explain how I computed the strategy, possible improvements to the computation, and then present related work and remaining open problems.

Technical details

Here are some details about the problem and strategy:

  • I restrict the human player to only rotate and then drop pieces vertically, i.e., playing a piece means choosing a rotation and a column where to drop it. (You cannot slide a block under another block, no t-spins, etc.) Of course, if the human can win under this restriction, they can win even when we allow more moves.
  • I do not require the computer to show the next piece, i.e., commit to what the next piece will be before the human has dropped their current piece. Of course, if the human can win in such conditions, they can also win with a next piece display.
  • Under such circumstances, the human can play so as to guarantee that they can complete one of the lowest 6 rows of the board. The exploration also shows that the computer can prevent the human from scoring one of the bottom 5 lines, subject to the above restrictions, and assuming that the human does not play "above" the fifth line, i.e., dropped pieces cannot touch a full column which has height at least 5. (See below to know more about this limitation [1].)
  • In the worst case, the human needs to position 13 pieces to complete their first line among the bottom 6 rows. This is optimal under the above restrictions: when only considering completion of the first 6 rows, dropping pieces without sliding them, and prohibiting dropped pieces from touching a full column. I think it is very likely that the human can win faster if we allow them to slide pieces; and possible that the human can win faster by going above the 6th row.
  • Where several moves achieve the same distance to a win, they are chosen arbitrarily, following the order in which the possible choices of where to put a piece are considered (first by rotation, then by column). This is why the strategy above is not symmetric, e.g., between L-shaped and J-shaped tetrominoes.

Checking the strategy

I provide the strategy as a file describing how the human can play and guarantee a win. Each line of this file corresponds to a possible "game state", numbered from 0 to 5249 (the file has 5250 lines [2]). Each line consists of 21 integers, indexed from 0 to 20 as 3p+q with 0 ≤ p < 7 and 0 ≤ q < 3. The triple of indices 3p, 3p+1, 3p+2 explains what to do when receiving piece p: the number at index 3p indicates the rotation 0 ≤ r < 4 to use, the number at index 3p+1 indicates the column −2 ≤ c < 10 where to drop the piece, and the number at index 3p+2 indicates the game state from which we should follow the rest of the winning strategy (the special value "-1" indicates that the human has won). The pieces, rotations, and column offsets are according to the following description of the pieces:

== piece 0 rotation 0 ==
#...
#...
#...
#...
== piece 0 rotation 1 ==
....
....
....
####
== piece 0 rotation 2 ==
...#
...#
...#
...#
== piece 0 rotation 3 ==
####
....
....
....
== piece 1 rotation 0 ==
.#..
##..
#...
....
== piece 1 rotation 1 ==
....
....
##..
.##.
== piece 1 rotation 2 ==
....
...#
..##
..#.
== piece 1 rotation 3 ==
.##.
..##
....
....
== piece 2 rotation 0 ==
.##.
##..
....
....
== piece 2 rotation 1 ==
....
#...
##..
.#..
== piece 2 rotation 2 ==
....
....
..##
.##.
== piece 2 rotation 3 ==
..#.
..##
...#
....
== piece 3 rotation 0 ==
###.
.#..
....
....
== piece 3 rotation 1 ==
....
#...
##..
#...
== piece 3 rotation 2 ==
....
....
..#.
.###
== piece 3 rotation 3 ==
...#
..##
...#
....
== piece 4 rotation 0 ==
###.
#...
....
....
== piece 4 rotation 1 ==
....
#...
#...
##..
== piece 4 rotation 2 ==
....
....
...#
.###
== piece 4 rotation 3 ==
..##
...#
...#
....
== piece 5 rotation 0 ==
###.
..#.
....
....
== piece 5 rotation 1 ==
....
##..
#...
#...
== piece 5 rotation 2 ==
....
....
.#..
.###
== piece 5 rotation 3 ==
...#
...#
..##
....
== piece 6 rotation 0 ==
##..
##..
....
....
== piece 6 rotation 1 ==
....
....
##..
##..
== piece 6 rotation 2 ==
....
....
..##
..##
== piece 6 rotation 3 ==
..##
..##
....
....

The strategy thus describes a DAG (with labeled edges). In particular, we can get to the same game state by different paths; indeed, the same game state can correspond to different states of the board from which the same winning strategy can be used.

The strategy in this file is the one used in the Javascript game above. I have written a program (with helper file) that checks the strategy by systematically going over all possible moves by the computer, playing according to the strategy, and checking that the human indeed wins when the strategy says they do. Hence, the mathematical "proof" that the human has a winning strategy would consist of this file and of the verification program.
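To make the format concrete, here is a minimal sketch (my own illustration, not the verification program) of how to read the file and follow the strategy, assuming the layout above and assuming that play starts at the game state of line 0:

// Sketch (not the actual verification program): read the strategy file and
// follow it for a hard-coded sequence of pieces chosen by the computer.
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main() {
    std::ifstream in("strategy.txt");  // hypothetical name for the strategy file
    std::vector<std::vector<int>> states;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        std::vector<int> vals(21);
        for (int& v : vals) ss >> v;
        states.push_back(vals);
    }
    int state = 0;                          // assumed initial game state: line 0
    std::vector<int> pieces = {6, 0, 3};    // example pieces given by the computer
    for (int p : pieces) {
        int rotation = states[state][3 * p];
        int column = states[state][3 * p + 1];
        int next = states[state][3 * p + 2];
        std::cout << "piece " << p << ": rotation " << rotation
                  << ", column " << column << "\n";
        if (next == -1) { std::cout << "a line is completed: H wins\n"; break; }
        state = next;
    }
}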

Finding the strategy

I could stop here and say that the strategy and the verification program are an answer to the question I had posed :) but let me say a few words about how I found it. This was done using the minimax algorithm, i.e., by systematically exploring the game tree. Formally, we define inductively who wins on a given board state, among the human (H) and the computer (C):

  • Base 1: H wins on board states where one of the bottom 6 rows is filled
  • Base 2: C wins on board states where we cannot place a piece without touching a full column, i.e., one where there are already blocks above the 6th row.
  • Induction 1: In a given board state b where H is given some piece p, if H can drop it and get to a board state which is winning for H, then (b,p) is winning for H; if all moves by H get to a board state which is losing for H, then (b,p) is losing for H.
  • Induction 2: In a given board state b, if C can give H a piece p such that (b,p) is losing for H, then b is losing for H; if all pieces p that C can give are such that (b,p) is winning for H, then b is winning for H.

This is just a simple recursive exploration. The naive way to implement this would be to compute who wins on every board state, remembering in the board state if each cell of the bottom 6 rows is filled or not: this would be 2^(6×10) possible board states, where 10 is the number of columns. This amounts to 1153 peta-states (about 1.15 × 10^18), which is too much.

Since we are only allowing pieces to be dropped (without sliding them), we can do better: in each column, we only need to remember the height of the highest completed block [3]. The status of the cells below the topmost filled cell of a column has no influence on which moves are possible. The only extra thing to remember is the set of rows having a "hole", i.e., an empty cell above which there is a filled cell. As we cannot slide pieces, these holes can never be filled, so we must remember that the rows that have a hole cannot be completed. We say that these lines are "sacrificed".

For an example, consider the state of the board after dropping a vertical I-tetromino in the second column, pictured below. We represent this as the sequence of heights written at the bottom.

Imagine now that we drop a vertically oriented z-block, getting to the configuration below:

Columns 2 and 3 are now full, i.e., the blocks stack up to the 6th row. Hence, we store their height as 6, and forget what happens above the 6th row (i.e., we forget the block in column 3 with a black cross) -- because of this, we will no longer allow blocks to touch one of these full columns. Further, the new block has now created holes -- the gray cells with red crosses. The sequence of heights does not account for these holes; to remember that they are there, we store that the bottom 5 rows have been "sacrificed", materialized by the red crosses to the left of the board. This is sufficient, because our way to drop pieces will never allow H to slide a piece and fill these holes. Now, the 6th row is the only row where H can ever hope to complete a line. (In fact, the configuration is now clearly losing for H: if C no longer gives H any I-tetromino, H can never fill the cell at column 1 and row 6, so H can never complete the only non-sacrificed line.)
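In code, this drop-and-sacrifice update could look as follows (a sketch of the idea under this representation, not the actual program; in particular, it does not enforce the cap of height 6 on full columns):

// Sketch (not the actual program): drop a rotated piece onto the compressed
// board state, updating column heights and sacrificed rows.
#include <algorithm>
#include <cstdint>
#include <vector>

struct PieceColumn {
    int offset;  // horizontal offset from the piece's leftmost occupied column
    int low;     // lowest filled cell in this column (from the piece's bottom)
    int high;    // highest filled cell in this column (cells are contiguous)
};

struct Board {
    int height[10] = {0};     // topmost filled cell of each column (0 = empty)
    uint16_t sacrificed = 0;  // bitmask of rows containing a hole
};

// Drop the piece so that its leftmost occupied column lands in board column c.
void drop(Board& b, const std::vector<PieceColumn>& piece, int c) {
    // The piece falls as low as possible: every occupied column must rest
    // on the floor or on top of the existing stack.
    int bottom = 0;
    for (const PieceColumn& pc : piece)
        bottom = std::max(bottom, b.height[c + pc.offset] - pc.low);
    for (const PieceColumn& pc : piece) {
        // Cells left empty between the old stack top and the piece's lowest
        // cell become holes: their rows are sacrificed forever.
        for (int row = b.height[c + pc.offset]; row < bottom + pc.low; row++)
            b.sacrificed |= uint16_t(1) << row;
        b.height[c + pc.offset] = bottom + pc.high + 1;
    }
}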

To summarize, the board state consists of the height of each column (between 0 and 6 inclusive) and the set of sacrificed lines, for a total of 7^10 × 2^6 board states. This is much better: there are now about 18 billion states. There are other small savings, e.g., board states where all lines are sacrificed are immediately losing. In practice, a complete exploration considers around 2 billion states, i.e., only around 11% of the possible board states are reached.

For efficient implementation, a board state is represented as a 64-bit integer consisting of a 16-bit mask describing the sacrificed lines (only the 6 lowest bits are useful), and 10 4-bit integers describing the heights (only the 3 lowest bits are useful).
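The packing could look like this (a sketch assuming this exact bit layout; the actual program may lay the bits out differently):

// Sketch of the 64-bit board state encoding: a 16-bit mask of sacrificed
// rows in the low bits, then ten 4-bit column heights.
#include <cstdint>

uint64_t pack(const int height[10], uint16_t sacrificed) {
    uint64_t state = sacrificed;
    for (int c = 0; c < 10; c++)
        state |= uint64_t(height[c]) << (16 + 4 * c);
    return state;
}

int heightOf(uint64_t state, int c) { return (state >> (16 + 4 * c)) & 0xF; }
uint16_t sacrificedOf(uint64_t state) { return state & 0xFFFF; }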

As the same board state can be reached by many different paths in the game tree, we use memoization: when we are done exploring the tree from some board state, we remember whether this state is losing or winning, and if it is winning we remember in how many moves. This is done with a hash table (a C++ unordered set). This means that the program requires a large quantity of RAM to run (around 20-30 GB): for this, I thank the INFRES department at Télécom Paris for giving us access to suitable computing resources.
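Putting these pieces together, the core of the memoized exploration could look as follows (a simplified sketch using the bit layout above, not the actual program; the real search also tracks the number of moves needed to win, and the enumeration of drops, moves() below, is left undefined here):

// Simplified sketch of the memoized minimax exploration: does the human
// have a winning strategy from board state s?
#include <cstdint>
#include <unordered_map>
#include <vector>

// Board states reachable by dropping the given piece, over all rotations and
// columns (as in the drop sketch above); empty if the piece cannot be placed.
std::vector<uint64_t> moves(uint64_t state, int piece);

bool lineCompleted(uint64_t s) {      // Base 1: some non-sacrificed row among
    for (int r = 0; r < 6; r++) {     // the bottom 6 is full
        if ((s >> r) & 1) continue;   // row r is sacrificed
        bool full = true;
        for (int c = 0; c < 10; c++)
            full = full && ((s >> (16 + 4 * c)) & 0xF) > (uint64_t)r;
        if (full) return true;
    }
    return false;
}

std::unordered_map<uint64_t, bool> memo;  // board state -> does H win?

bool humanWins(uint64_t s) {
    if (lineCompleted(s)) return true;
    auto it = memo.find(s);
    if (it != memo.end()) return it->second;
    bool win = true;
    for (int p = 0; p < 7 && win; p++) {   // C picks the worst piece...
        bool pieceWins = false;
        for (uint64_t next : moves(s, p))  // ...H answers with the best drop
            pieceWins = pieceWins || humanWins(next);
        win = pieceWins;  // no winning drop (or no drop at all, Base 2): C wins
    }
    return memo[s] = win;
}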

This explains how the winning strategy is found. It is fast (one hour) to check that a winning strategy exists, by terminating the search for a board state and piece as soon as a winning move is found. It takes longer (18 hours) to find a strategy that wins as fast as possible, because this requires us to explore the full tree. To speed things up, we use a form of alpha-beta pruning: when trying to place a piece, once we have found a winning option, we still consider the other options, but only explore them up to a depth that would give a strictly shorter strategy than the best option found so far. This lowers the running time to 10 hours and lowers the number of explored states from 2 billion to around 750 million, of which 9 million are winning. The resulting program is here. (Sorry, it is not very clean...)

Once the program has produced a strategy, we need to "compress" it. To do so, we first "trim" it by removing unreachable board states: we go from 9 million states to just 21 thousand states; this is done here. Then we minimize it by merging states from which the strategy is the same: this is done here. As the graph is acyclic, we can perform this in linear time, simply by processing the DAG bottom-up and hashing configurations. We get to the 5250 states of the strategy file.
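This bottom-up merge could look as follows (a sketch assuming states are given as their 21-integer tables; not the actual scripts linked above):

// Sketch (not the actual scripts): merge game states with identical
// strategies by canonicalizing tables bottom-up, giving the minimized DAG.
#include <map>
#include <vector>

using Table = std::vector<int>;   // the 21 integers of one game state

std::vector<Table> states;        // the trimmed strategy, indexed by state id
std::map<Table, int> canonicalId; // canonicalized table -> merged state id
std::vector<int> renamed;         // old id -> merged id (-2 = not yet visited)

int canonicalize(int s) {
    if (s == -1) return -1;       // "-1" marks a won game, keep it as-is
    if (renamed[s] != -2) return renamed[s];
    Table sig = states[s];
    for (int p = 0; p < 7; p++)   // rewrite next-state pointers recursively
        sig[3 * p + 2] = canonicalize(sig[3 * p + 2]);
    auto it = canonicalId.try_emplace(sig, (int)canonicalId.size()).first;
    return renamed[s] = it->second;
}

int minimize() {                  // returns the merged id of the initial state
    renamed.assign(states.size(), -2);
    return canonicalize(0);
}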

Other possible optimizations

The following optimizations would have been possible, but I did not implement them:

  • There would be an easy saving of a factor of 2 by breaking the left-right symmetry, which would probably also make the winning strategy more concise (but it would be less convenient to reconstruct).
  • When checking the cache of possible board states, we could check "shifted" configurations where we remove any one of the lowest d lines, provided each column has a height of at least d. We could also check configurations with a subset, or superset, of sacrificed rows: if a configuration with identical column heights and with a subset of sacrificed rows is known to be losing then the current configuration also is, and if a configuration with identical column heights and a superset of sacrificed rows is known to be winning then the current configuration also is.
  • Rather than exploring all possibilities to the maximal depth before giving up, a better solution is to use iterative deepening. I did this in an earlier version of the code that followed a different approach. Likewise, ordering the possible moves for a piece using some heuristic (e.g., putting the piece as low as possible, or sacrificing as few new lines as possible) would possibly make the search faster (thanks to alpha-beta pruning).

Related work

The most related work is the 2011 study by qntm, which set out to solve the same question. His work did not conclude that the human had a winning strategy for the standard board size of 10 columns, though the cases with fewer columns were solved. The analysis showed that a height of 6 was not sufficient to have a winning strategy. For qntm's analysis, playing up to height 6 means not allowing a piece to protrude above line 6, whereas for me it means not allowing a piece to land on a column that goes to height 6 and not considering filled lines strictly above line 6. This would imply that qntm should find a solution for height 9 or less. A difference is also that qntm considers more moves than simply rotating and then dropping pieces as I do; this may be part of the reason why I was able to conclude the analysis and qntm was not.

For the related question of whether the human can win at Tetris in the usual sense, i.e., play indefinitely, it is known that this is not the case. The computer can force the player to lose, even with a strategy which is oblivious to how the human plays and simply alternates S-shaped and Z-shaped pieces: see Burgiel's 1997 paper "How to Lose at Tetris" or this page. This does not contradict the result presented here, which says we can clear a line (but eventually lose). Optimizing the number of lines cleared (in the worst case or in expectation) and guaranteeing that you make one line are different goals, which may be at odds with each other.

On the other hand, with stronger assumptions on how the computer can propose pieces, it is possible to play forever: see this page.

There are other results of this kind in Brzustowski's 1992 master's thesis Can you win at Tetris?.

There is a 2002 study by Demaine et al. of the computational complexity of Tetris play, which does not seem related to the results here. For more related work on Tetris, there is a great literature review in this FUN'2022 paper by Dallant and Iacono.

There are many Tetris implementations that try to give the worst possible pieces to the player: at least Hatetris, bastet, and LTris in "expert mode".

A relevant question seems to be the study of "adversarial Tetris", introduced at a reinforcement learning competition in 2009 (see here), and follow-up works, e.g., Beatris. However, I wasn't able to find works in this area which considered the question of whether the human could guarantee that they score at least one line assuming perfect play.

Open problems

I do not know what is the minimal number of rows and/or maximal necessary height to score a line if the human can slide or rotate pieces as they fall; the latter would depend on the exact rules implemented for rotation, which is somewhat unpleasant (similar subtleties are considered in Demaine et al.'s paper).

I do not know what happens for a different number of columns. An odd number of columns clearly allows the computer to win by giving only square tetrominoes, and two columns clearly allow the human to win (every piece can score a line by itself in every configuration, except I-tetrominoes, for which this can be achieved in two moves). Of course one can generalize the problem to different polyomino sets, etc.

I do not know if the human can guarantee scoring multiple lines at once: if the computer only gives O-shaped blocks then the human cannot hope to score three or four lines at once [4]; I don't know if guaranteeing that you can score two lines at once is possible.

I do not know if the result implies that the human can score any arbitrary number of lines provided that the board is sufficiently high. The strategy I presented does not ensure this, because it only works from the empty board; it could be the case (although unlikely) that the board states reached after scoring one line in this strategy no longer themselves have a winning strategy (no matter the height). More generally, one can ask what is the behavior of the function that maps the number of rows of the board to the number of lines that can be guaranteed by the human.


  1. Thanks to Louis for clarifying this point. 

  2. The number 5250 is not optimized by the program; it is possible that a more concise strategy exists -- and I am not sure of how to find it. In any case, the strategy could probably be compressed further if it were encoded in a more clever way, e.g., by breaking symmetries. 

  3. Thanks to Louis for a discussion where he proposed this bruteforce approach, which worked better than what I was attempting to do. 

  4. Thanks to Ted for this remark. 

New home page

After at least 10 years, I thought it was time to revamp my home page. The main goal was to make it more focused on my current activities. I also wanted to make it less frightening, by having text instead of bullet points.

I have also tried to avoid presenting my research as a list of publications: I think that publication lists are hard to understand outside a specific area and do not show the big picture. Accordingly, I have written a high-level presentation of my research to make it more understandable, and will try to keep it up-to-date. I am also considering announcing new results (e.g., posted papers) on this blog, which I think is more interesting than simply updating a list. Of course I still have a list of publications on a dedicated page for those interested.

Another novelty is that I now have a list of students I supervised, and a list of internship and PhD offers for prospective students.

Announcing TheoretiCS, a diamond open-access journal

I'm honored to be part of the team behind the new open-access journal in theoretical computer science, TheoretiCS, which just opened earlier this month.

The goal of TheoretiCS is to publish the best papers in all areas of theoretical computer science, while guaranteeing a fast turnaround time (3 months for a first opinion). It is a diamond open-access journal, which is free for readers and authors: the articles are hosted on arXiv following the overlay journal model, and the reviewing platform [1] is Episciences, which is open-source software hosted by the CNRS CCSD. The journal belongs to an independent nonprofit, which avoids any interference from for-profit publishers and also from the large scholarly societies that are currently failing to ensure the free dissemination of our work.

I am involved in TheoretiCS as a "managing editor", together with Nathanaël Fijalkow. We are in charge of handling technical aspects of running the journal and making sure everything is going smoothly and steadily. :) For now, our main work has been to study how to adapt the Episciences platform to the needs of the journal, to file issues, and to document how to use the platform.

I am impressed by the journal's editorial board, its distinguished endorsers, and its representatives for many of the major computer science conferences. The team has been working for several years to make all of this happen; I only joined rather late, at a point where the editorial board was mostly complete.

I hope that this journal will be a success and an example of how research should be published in the 21st century, i.e., entirely online and at no cost. My involvement in this effort is in line with my commitments towards open access, and I hope it can help make things move in the right direction.


  1. Episciences is the same platform which is used by DMTCS, FI, and LMCS (where I am also a layout editor).