Research Publications

This page lists all my research publications. Those marked by (*) have not been refereed. This list is also available as a PDF file. Cette page est également disponible en français.

For a friendlier presentation of my research, here is an introduction.

The definitive version of my papers is always the one listed on this page, which includes all reviewer feedback and all corrections and errata (if any). All my publications are freely available. Never buy my papers from scientific publishers: you would be wasting your money, and you risk ending up with an older or otherwise inferior version.

2025 ¶

Antoine Amarilli, Corentin Barloy, Louis Jachiet, Charles Paperman.
Dynamic Membership for Regular Tree Languages.
MFCS 2025.
Antoine Amarilli, Florin Manea, Tina Ringleb, Markus L. Schmid.
Linear Time Subsequence and Supersequence Regex Matching.
MFCS 2025.
Antoine Amarilli, Benoît Groz.
Cutwidth Bounds via Vertex Partitions.
arXiv:2504.01574v2, 2025. (*)
Antoine Amarilli, Wolfgang Gatterbauer, Neha Makhija, Mikaël Monet.
Resilience for Regular Path Queries: Towards a Complexity Classification.
PODS 2025. [minor errata]
Antoine Amarilli, Benoît Groz, Nicole Wein.
Edge-Minimum Walk of Modular Length in Polynomial Time.
ITCS 2025. [slides by Nicole Wein, video by Nicole Wein on Youtube]
Antoine Amarilli, Timothy van Bremen, Octave Gaspard, Kuldeep S. Meel.
Approximating Queries on Probabilistic Graphs.
Under review. [conference version] (*)

2024 ¶

Antoine Amarilli, Michael Benedikt.
Tighter Bounds for Query Answering with Guarded TGDs.
Under review. (*)
Antoine Amarilli.
Survey of Results on the ModPath and ModCycle Problems.
arXiv:2409.00770v1, 2024. [code] (*)
Antoine Amarilli, Florent Capelli.
Tractable Circuits in Database Theory.
SIGMOD Record Database Principles Column. (*)
Antoine Amarilli, Marcelo Arenas, YooJung Choi, Mikaël Monet, Guy Van den Broeck, Benjie Wang.
A Circus of Circuits: Connections Between Decision Diagrams, Circuits, and Automata.
arXiv:2404.09674v1, 2024. (*)
Antoine Amarilli, Mikaël Monet, Dan Suciu.
The Non-Cancelling Intersections Conjecture.
arXiv:2401.16210v1, 2024. (*)
Antoine Amarilli, Timothy van Bremen, Kuldeep S. Meel.
Conjunctive Queries on Probabilistic Graphs: The Limits of Approximability.
ICDT 2024. [journal version, slides by Timothy van Bremen, minor errata]
Antoine Amarilli, Pierre Bourhis, Florent Capelli, Mikaël Monet.
Ranked Enumeration for MSO on Trees via Knowledge Compilation.
ICDT 2024. [slides by Pierre Bourhis]
Antoine Amarilli, Benny Kimelfeld, Sébastien Labbé, Stefan Mengel.
Skyline Operators for Document Spanners.
ICDT 2024. [slides]

2023 ¶

Antoine Amarilli, Charles Paperman.
Locality and Centrality: The Variety ZG.
LMCS.
Osnat Drien, Matanya Freiman, Antoine Amarilli, Yael Amsterdamer.
Query-Driven Resolution in Uncertain Databases.
SIGMOD 2023.
Antoine Amarilli.
Degree-3 Planar Graphs as Topological Minors of Wall Graphs in Polynomial Time.
arXiv:2302.03461v3, 2023. (*)
Antoine Amarilli.
Query Evaluation: Enumeration, Maintenance, Reliability.
Habilitation thesis. (*)
Antoine Amarilli, Mikaël Monet.
Enumerating Regular Languages with Bounded Delay.
STACS 2023. [slides by Mikaël Monet, minor errata]
Antoine Amarilli.
Uniform Reliability for Unbounded Homomorphism-Closed Graph Queries.
ICDT 2023. [slides, video]

2022 ¶

2021 ¶

Antoine Amarilli, Louis Jachiet, Charles Paperman.
Dynamic Membership for Regular Languages.
ICALP 2021. [slides, video on Youtube or in direct download]. Best paper award of ICALP'21 track B.
Osnat Drien, Antoine Amarilli, Yael Amsterdamer.
Managing Consent for Data Access in Shared Databases.
ICDE 2021. Short paper. [errata]
Antoine Amarilli, Benny Kimelfeld.
Uniform Reliability of Self-Join-Free Conjunctive Queries.
ICDT 2021. [journal version, slides, video on PeerTube INFRES or in direct download]
Antoine Amarilli, Pierre Bourhis, Stefan Mengel, Matthias Niewerth.
Constant-Delay Enumeration for Nondeterministic Document Spanners.
TODS, 2021. [conference version, code by Rémi Dupré and Matthias Niewerth]

2020 ¶

Julien Romero, Nicoleta Preda, Antoine Amarilli, Fabian M. Suchanek.
Computing and Illustrating Query Rewritings on Path Views with Binding Patterns.
CIKM 2020. Demo paper. [video by Julien Romero on ACM Digital Library or in direct download, code by Julien Romero]
Antoine Amarilli, İsmail İlkan Ceylan.
A Dichotomy for Homomorphism-Closed Queries on Probabilistic Graphs.
ICDT 2020. [journal version, slides, poster, video on TIB AV-Portal or in direct download]. Best paper award of ICDT'20.
Julien Romero, Nicoleta Preda, Antoine Amarilli, Fabian M. Suchanek.
Equivalent Rewritings on Path Views with Binding Patterns.
ESWC 2020. [slides, video by Julien Romero on Videolectures.net or on Youtube or in direct download]
Antoine Amarilli, Michael Benedikt.
Finite Open-World Query Answering with Number Restrictions.
ToCL, 2020. [conference version]
Antoine Amarilli, Florent Capelli, Mikaël Monet, Pierre Senellart.
Connecting Knowledge Compilation Classes and Width Parameters.
ToCS, 2020. [conference version]

2019 ¶

Andy Shih, Guy Van den Broeck, Paul Beame, Antoine Amarilli.
Smoothing Structured Decomposable Circuits.
NeurIPS 2019. Spotlight presentation. [slides by Andy Shih, poster by Andy Shih]
Antoine Amarilli, M. Lamine Ba, Daniel Deutch, Pierre Senellart.
Computing Possible and Certain Answers over Order-Incomplete Data.
TCS, 2019. [conference version]
Antoine Amarilli, Pierre Bourhis, Stefan Mengel, Matthias Niewerth.
Enumeration on Trees with Tractable Combined Complexity and Efficient Updates.
PODS 2019. [slides by Matthias Niewerth, poster by Matthias Niewerth, video by Matthias Niewerth on TIB AV-Portal or in direct download, errata]
Antoine Amarilli, Pierre Bourhis, Stefan Mengel, Matthias Niewerth.
Constant-Delay Enumeration for Nondeterministic Document Spanners.
ICDT 2019. [journal version, slides by Matthias Niewerth, code by Rémi Dupré and Matthias Niewerth]. Featured in ACM SIGMOD Research Highlights.
Antoine Amarilli, Pierre Bourhis, Mikaël Monet, Pierre Senellart.
Evaluating Datalog via Tree Automata and Cycluits.
ToCS, 2019. [conference version]

2018 ¶

2017 ¶

2016 ¶

2015 ¶

Antoine Amarilli, Mehryar Mohri, Cyril Allauzen.
Minimum Bayesian Risk Methods for Automatic Speech Recognition.
US Patent 9123333. [not an endorsement of software patents]
Antoine Amarilli.
Possibility in Probabilistic XML.
ISI, 2015. [conference version, the publisher version is no longer available]
Antoine Amarilli, Silviu Maniu, Pierre Senellart.
Intensional Data on the Web.
ACM SIGWEB Newsletter, Summer 2015. (*)
Antoine Amarilli, Michael Benedikt.
Combining Existential Rules and Description Logics.
IJCAI 2015. [slides, longer slides, poster]
Aliaksandr Talaika, Joanna Biega, Antoine Amarilli, Fabian M. Suchanek.
IBEX: Harvesting Entities from the Web Using Unique Identifiers.
WebDB 2015. [slides]
Antoine Amarilli, Michael Benedikt.
Finite Open-World Query Answering with Number Restrictions.
LICS 2015. [journal version, slides]
Antoine Amarilli, Pierre Bourhis, Pierre Senellart.
Provenance Circuits for Trees and Treelike Instances.
ICALP 2015. [slides]
Antoine Amarilli.
Structurally Tractable Uncertain Data.
SIGMOD/PODS Ph.D. Symposium 2015. [slides]

2014 ¶

2013 ¶

Antoine Amarilli, Pierre Senellart.
On the Connections between Relational and XML Probabilistic Data Models.
BNCOD 2013. [slides]

2012 ¶

Antoine Amarilli, Fabrice Ben Hamouda, Florian Bourse, Robin Morisset, David Naccache, Pablo Rauzy.
From Rational Number Reconstruction to Set Reconciliation and File Synchronization.
TGC 2012. (*)
Antoine Amarilli.
Advances in Holistic Ontology Alignment.
Master's thesis (internship report). [slides] (*)
Marilena Oita, Antoine Amarilli, Pierre Senellart.
Cross-Fertilizing Deep Web Analysis and Ontology Enrichment.
VLDS 2012.
Antoine Amarilli, Marc Jeanmougin.
A Proof of the Pumping Lemma for Context-Free Languages Through Pushdown Automata.
arXiv:1207.2819v1, 2012. (*)

2011 ¶

Errata ¶

I try hard to avoid making mistakes in my work, but I'm sorry that they sometimes seem to happen despite my best efforts. When a mistake is found in one of my papers, I usually update it with a corrected version, and also update the links on this page to point to the latest version. However, when the error is major, there is always a risk that older versions of the paper will still confuse or mislead readers, especially in cases where the publisher version (in conference proceedings and journals) cannot be updated.

For this reason, in an attempt to minimize the impact of mistakes, here is a list of the major errors that I know about. Here are my criteria for when an error must be included in this list (some more minor errors may also be covered):

The error concerns the technical claims of the paper (not the surrounding discussion or other material).
The error is major, e.g., a result is not true, or a proof is not correct, and cannot easily be fixed by an expert reader and requires substantial new arguments.
The error occurs in a version of the work that had been published at a peer-reviewed conference or journal. In other words, I may not cover errors which we were able to correct before the paper is accepted for publication, even when preliminary drafts of the work had been made public (e.g., on arXiv). If these happen, I may only update the drafts without listing the error here.

Here is the list of errata:

Resilience for Regular Path Queries: Towards a Complexity Classification:
- In arXiv version 2 and arXiv version 1 and the publisher version. In Fig. 1, two lines above "Prp. 7.8", the query "abc|be|ef" should instead be "abcd|be|ef". Thanks to Martín Muñoz for identifying this issue.
Conjunctive Queries on Probabilistic Graphs: The Limits of Approximability:
- In arXiv version 1. Notational errors and a minor issue were reported by Octave Gaspard in the proof of Proposition 3.5 and in the proof of Theorem 6.3. These problems are fixed in the latest version of the article.
- In arXiv version 1, arXiv version 2, arXiv version 3, conference proceedings version. An error was identified by Timothy van Bremen in the proof of Proposition 4.1 in Appendix C and is fixed in the latest version of the article.
Enumerating Regular Languages with Bounded Delay (conference proceedings version, arXiv version 3 and earlier). On Figure 1e page 8, state 0 of automaton $A_{5}$ should be non-final instead of final. The rest of the discussion in Example 3.4 is unchanged.
Tractable Lineages on Treelike Instances (conference proceedings version (closed access), arXiv version 1). The result stated as Theorem 5.5 in this work is proved in my PhD thesis, but the proof given there is incorrect. See this erratum for details about this issue. A corrected proof is given in the next version of the article (arXiv version 2).
Leveraging the Structure of Uncertain Data (on Thèses en ligne). I have found an error in the proof of Theorem 6.2.5 of my PhD thesis, on page numbered 108 of the manuscript. The result in question also appears (without proof) as Theorem 5.5 of Tractable Lineages on Treelike Instances: Limits and Extensions. The error is that the definition of the advice function in this proof does not only depend on the size of the instance but also on the instance itself. A corrected proof is given in the appendix of arXiv version 2 of Tractable Lineages on Treelike Instances: Limits and Extensions
Query Answering with Transitive and Linear-Ordered Data (arXiv version 1, conference proceedings version). An error was reported to us by Andreas Pieris and Andrew Bailey in the proof of Theorem 9 (Appendix H of the arXiv version). The same error exists in the journal version of the work: please refer to this erratum where we discuss the error in more detail and explain how the error was fixed. In the conference version, the error is in the step "From UCQ to CQ" in appendix H of the arXiv version, which increases the arity.
Possible and Certain Answers for Queries over Order-Incomplete Data (arXiv version 1, conference proceedings version). I have found a issue in the proof of Theorem 22 (both versions), which also affects Theorem 19 and Theorem 30 (both versions). We no longer claim that these results hold. The specific error is in the proof of Proposition 66 (in arXiv version 1) for the case of the product operator. The latest version of the article (arXiv version 2) removes the affected results and discusses these issues in Appendix G. The journal version of the article (arXiv) contains neither the affected results nor the discussion in this appendix.
Query Answering with Transitive and Linear-Ordered Data (journal version). An error was reported to us by Andreas Pieris and Andrew Bailey in the proof of Theorem 6.2. The same error exists in the conference version of the work: see this erratum. The given proof works if the query Q is allowed be a UCQ, or if the signature is not required to be arity-two, but not both. The error is that the last step of the proof, given in Appendix E.5.1, used to cover the case of a CQ rather than a UCQ, increases the signature arity (with Lemma E.6), which we did not realize. The arXiv version of this work, posted in 2022, gives a corrected proof of the claimed result.
When Can We Answer Queries Using Result-Bounded Data Interfaces? (arXiv version 1, arXiv version 2, conference proceedings version (closed access)).
- An error was reported by reviewers of the upcoming journal version of this work concerning the case of finite answerability. The error is in Proposition 2.2, which does not apply to schemas with result bounds. We do not know if Proposition 2.2 is correct in the setting with result bounds studied in the paper, or how it could be proved. This error affects all results concerning finite answerability, so the general claims at the beginning of Sections 5 and 7 that all results also extend to that case, and the specific Corollary 7.3. We do not know if our specific results in the finite case are also true (no matter the status of Proposition 2.2). We also do not have knowledge about any reasons why they would false. All results related to finite answerability claimed in the paper should instead be considered as open. This is clarified in the journal version.
- There are several imprecisions in the proof of Theorem G.3 in appendix G.2.3, which is used to prove the EXPTIME bound of Theorem 7.2 in the main text of the paper. A revised and corrected proof is given in this work and should be read instead of Appendix G.2.3.
Enumeration on Trees with Tractable Combined Complexity and Efficient Updates (arXiv versions 1 and 2, conference proceedings version (closed access). An issue was found in the LICS'18 paper MSO Queries on Trees: Enumerating Answers under Updates Using Forest Algebras, on which some of our own results crucially depend. It is currently open whether the results of the LICS'18 paper can indeed be proved as stated. In our own paper, this affects Lemma 7.4 (for the part asserting that the encoding is linear-time computable, and for the part asserting that updates can be efficiently performed), and thus affects Theorem 8.1, Corollary 8.2, Corollary 8.3, Corollary 8.4, and Theorem 8.5 (in what concerns the preprocessing time and the update time). All these results from our paper are thus only proven conditionally to the fact that one can indeed prove the results claimed in Section 3 of the LICS'18 paper. This is currently under investigation by the author of the paper (as of February 2022), and we will update the arXiv version of our paper depending on how this develops.
Managing Consent for Data Access in Shared Databases (conference proceedings version, full version). I have found an issue in Proposition III.4 (conference proceedings version), also present in the corresponding results of the full version. The complexity and approximation ratios claimed in the proposition are incorrect. The correct complexity should be the following: $O (| ¯ D |^{| Q |} \times | Q |^{2} + | Q (¯ D) | \times | Q |^{2 | Q | + 1} \times | ¯ D |^{2})$ . Specifically, the first term is the complexity of computing the result of the query along with its provenance as a DNF formula (the number of possible assignments is the number of facts of the database to the power of the number of query atoms, and the time to check an assignment is in time proportional to the query size), and then minimizing the DNF formula of each tuple in time proportional to the maximal number of tuples ( $| ¯ D |^{| Q |}$ ) times the maximal size of a provenance formula ( $| Q |^{2}$ ). The second term is the complexity of computing the provenance as a CNF formula (expanding to a conjunction of at most $| Q |^{| Q |}$ clauses of size $| Q |$ and then minimizing them by considering each pair of clauses and testing subsumption), then multiplied by the complexity of performing the Q-value algorithm (which is in the number of variables squared). The correct approximation ratio should be $log (| Q |^{| Q | + 1} \times | Q (¯ D) |)$ . The first factor in the logarithm is because, for each tuple, the number of terms in the DNF is $| Q |$ and the number of clauses in the CNF is $| Q |^{| Q |}$ . The second factor in the logarithm is the number of tuples. Please note that, while these are the only errors currently known in the conference proceedings version, the full version of the paper is still a "work in progress" and contains (as of November 2021) some other errors not affecting the conference proceedings version.