This page lists all my research publications.
Those marked by (*) have not been refereed. This list is also available as a
PDF file. Cette page est également disponible en
français.
For a friendlier presentation of my research, here is an
introduction.
The definitive version of my papers is always the one listed on this page,
which includes all reviewer feedback and all corrections and errata (if any).
All my publications are freely
available. Never buy my papers from
scientific publishers: you would be wasting your money, and you risk ending up with an
older or otherwise inferior version.

Antoine Amarilli,
Charles Paperman.
Locality and Centrality: The Variety ZG.
Under review. (*)

Antoine Amarilli,
Michael Benedikt.
When Can We Answer Queries Using ResultBounded Data Interfaces?
LMCS.
[conference version]

Antoine Amarilli,
Louis Jachiet,
Charles Paperman.
Dynamic Membership for Regular Languages.
ICALP 2021.
[slides, video on Youtube or in direct download]. Best paper award of ICALP'21 track B.

Antoine Amarilli,
İsmail İlkan Ceylan.
The Dichotomy of Evaluating HomomorphismClosed Queries on Probabilistic Graphs.
LMCS.
[conference version]

Osnat Drien,
Antoine Amarilli,
Yael Amsterdamer.
Managing Consent for Data Access in Shared Databases.
ICDE 2021. Short paper.
[errata]

Antoine Amarilli,
Benny Kimelfeld.
Uniform Reliability of SelfJoinFree Conjunctive Queries.
ICDT 2021.
[journal version, slides, video on PeerTube INFRES or in direct download]

Antoine Amarilli,
Pierre Bourhis,
Stefan Mengel,
Matthias Niewerth.
ConstantDelay Enumeration for Nondeterministic Document Spanners.
TODS.

Julien Romero,
Nicoleta Preda,
Antoine Amarilli,
Fabian M. Suchanek.
Computing and Illustrating Query Rewritings on Path Views with Binding Patterns.
CIKM 2020. Demo paper.
[video by Julien Romero on ACM Digital Library or in direct download, code by Julien Romero]

Antoine Amarilli,
İsmail İlkan Ceylan.
A Dichotomy for HomomorphismClosed Queries on Probabilistic Graphs.
ICDT 2020.
[slides, poster, video on TIB AVPortal or in direct download]. Best paper award of ICDT'20.

Julien Romero,
Nicoleta Preda,
Antoine Amarilli,
Fabian M. Suchanek.
Equivalent Rewritings on Path Views with Binding Patterns.
ESWC 2020.
[slides, video by Julien Romero on Videolectures.net or on Youtube or in direct download]

Antoine Amarilli,
Michael Benedikt.
Finite OpenWorld Query Answering with Number Restrictions.
ToCL, 2020.
[conference version]

Andy Shih,
Guy Van den Broeck,
Paul Beame,
Antoine Amarilli.
Smoothing Structured Decomposable Circuits.
NeurIPS 2019. Spotlight presentation.
[slides by Andy Shih, poster by Andy Shih]

Antoine Amarilli,
Florent Capelli,
Mikaël Monet,
Pierre Senellart.
Connecting Knowledge Compilation Classes and Width Parameters.
ToCS, 2019.
[conference version]

Antoine Amarilli,
M. Lamine Ba,
Daniel Deutch,
Pierre Senellart.
Computing Possible and Certain Answers over OrderIncomplete Data.
TCS, 2019.
[conference version]

Antoine Amarilli,
Pierre Bourhis,
Stefan Mengel,
Matthias Niewerth.
Enumeration on Trees with Tractable Combined Complexity and Efficient Updates.
PODS 2019.
[slides by Matthias Niewerth, poster by Matthias Niewerth, video by Matthias Niewerth on TIB AVPortal or in direct download, errata]

Antoine Amarilli,
Pierre Bourhis,
Stefan Mengel,
Matthias Niewerth.
ConstantDelay Enumeration for Nondeterministic Document Spanners.
ICDT 2019.
[slides by Matthias Niewerth, code by Rémi Dupré and Matthias Niewerth]. Featured in ACM SIGMOD Research Highlights.

Antoine Amarilli,
Pierre Bourhis,
Mikaël Monet,
Pierre Senellart.
Evaluating Datalog via Tree Automata and Cycluits.
ToCS, 2019.
[conference version]

Antoine Amarilli,
Michael Benedikt,
Pierre Bourhis,
Michael Vanden Boom.
Query Answering with Transitive and LinearOrdered Data.
JAIR, 2018.
[conference version, errata]

Antoine Amarilli,
Charles Paperman.
Topological Sorting under Regular Constraints.
ICALP 2018.
[slides]

Antoine Amarilli,
Michael Benedikt.
When Can We Answer Queries Using ResultBounded Data Interfaces?
PODS 2018.
[journal version, slides, poster, video on Youtube or in direct download, errata]

Antoine Amarilli,
Mikaël Monet,
Pierre Senellart.
Connecting Width and Structure in Knowledge Compilation.
ICDT 2018.
[journal version, slides by Mikaël Monet]

Antoine Amarilli,
Pierre Bourhis,
Stefan Mengel.
Enumeration on Trees under Relabelings.
ICDT 2018.
[slides, poster]

Antoine Amarilli,
M. Lamine Ba,
Daniel Deutch,
Pierre Senellart.
Possible and Certain Answers for Queries over OrderIncomplete Data.
TIME 2017.
[journal version, slides, errata]

Antoine Amarilli,
Pierre Bourhis,
Louis Jachiet,
Stefan Mengel.
A CircuitBased Approach to Efficient Enumeration.
ICALP 2017.
[slides]

Antoine Amarilli,
Mikaël Monet,
Pierre Senellart.
Conjunctive Queries on Probabilistic Graphs: Combined Complexity.
PODS 2017.
[slides by Mikaël Monet, poster by Mikaël Monet]

Antoine Amarilli,
Pierre Bourhis,
Mikaël Monet,
Pierre Senellart.
Combined Tractability of Query Evaluation via Tree Automata and Cycluits.
ICDT 2017.
[journal version, slides by Mikaël Monet, poster by Mikaël Monet]

Antoine Amarilli,
Yael Amsterdamer,
Tova Milo,
Pierre Senellart.
Topk Queries on Unknown Values under Order Constraints.
ICDT 2017.
[slides by Yael Amsterdamer, poster by Yael Amsterdamer]

Luis Galárraga,
Simon Razniewski,
Antoine Amarilli,
Fabian M. Suchanek.
Predicting Completeness in Knowledge Bases.
WSDM 2017.
[slides by Luis Galárraga, poster by Luis Galárraga]

Antoine Amarilli,
Silviu Maniu,
Mikaël Monet.
Challenges for Efficient Query Evaluation on Structured Probabilistic Data.
SUM 2016.
[slides by Mikaël Monet]

Antoine Amarilli,
Marc Beunardeau,
Rémi Géraud,
David Naccache.
Failure is Also an Option.
The New Codebreakers: Essays Dedicated to David Kahn on the Occasion of His 85th Birthday. (*)

Antoine Amarilli,
Michael Benedikt,
Pierre Bourhis,
Michael Vanden Boom.
Query Answering with Transitive and LinearOrdered Data.
IJCAI 2016.
[journal version, slides by Michael Vanden Boom, longer slides, poster by Michael Vanden Boom, errata]

Antoine Amarilli.
Leveraging the Structure of Uncertain Data.
PhD thesis.
[slides] (*)

Antoine Amarilli,
Pierre Bourhis,
Pierre Senellart.
Tractable Lineages on Treelike Instances: Limits and Extensions.
PODS 2016.
[slides, poster]

Antoine Amarilli,
Mehryar Mohri,
Cyril Allauzen.
Minimum Bayesian Risk Methods for Automatic Speech Recognition.
US Patent 9123333.
[not an endorsement of software patents]

Antoine Amarilli.
Possibility for Probabilistic XML.
ISI.
[conference version, the publisher version is no longer available]

Antoine Amarilli,
Silviu Maniu,
Pierre Senellart.
Intensional Data on the Web.
ACM SIGWEB Newsletter, Summer 2015. (*)

Antoine Amarilli,
Michael Benedikt.
Combining Existential Rules and Description Logics.
IJCAI 2015.
[slides, longer slides, poster]

Aliaksandr Talaika,
Joanna Biega,
Antoine Amarilli,
Fabian M. Suchanek.
IBEX: Harvesting Entities from the Web Using Unique Identifiers.
WebDB 2015.
[slides]

Ruiming Tang,
Antoine Amarilli,
Pierre Senellart,
Stéphane Bressan.
A Framework for SamplingBased XML Data Pricing.
TLDKS.
[conference version]

Antoine Amarilli,
Michael Benedikt.
Finite OpenWorld Query Answering with Number Restrictions.
LICS 2015.
[journal version, slides]

Antoine Amarilli,
Pierre Bourhis,
Pierre Senellart.
Provenance Circuits for Trees and Treelike Instances.
ICALP 2015.
[slides]

Antoine Amarilli.
Structurally Tractable Uncertain Data.
SIGMOD/PODS Ph.D. Symposium 2015.
[slides]

Antoine Amarilli,
Luis Galárraga,
Nicoleta Preda,
Fabian M. Suchanek.
Recent Topics of Research around the YAGO Knowledge Base.
APWEB 2014. (*)

Ruiming Tang,
Antoine Amarilli,
Pierre Senellart,
Stéphane Bressan.
Get a Sample for a Discount: SamplingBased XML Data Pricing.
DEXA 2014.
[journal version, slides by Ruiming Tang]

Antoine Amarilli,
Yael Amsterdamer,
Tova Milo.
Uncertainty in Crowd Data Sourcing under Structural Constraints.
UnCrowd 2014.
[slides]

Antoine Amarilli.
The Possibility Problem for Probabilistic XML.
AMW 2014.
[journal version, slides]

Antoine Amarilli,
Yael Amsterdamer,
Tova Milo.
On the Complexity of Mining Itemsets from the Crowd Using Taxonomies.
ICDT 2014.
[slides]
I try hard to avoid making mistakes in my work, but I'm sorry that they
sometimes seem to happen despite my best efforts. When a mistake is found in one
of my papers, I usually update it with a corrected version, and also update the
links on this page to point to the latest version. However, when the error is
major, there is always a risk that older versions of the paper will still
confuse or mislead readers, especially in cases where the publisher version (in
conference proceedings and journals) cannot be updated.
For this reason, in an attempt to minimize the impact of mistakes, here is a
list of the major errors that I know about. Here are my criteria for when an
error must be included in this list (some more minor errors may also be covered):
 The error concerns the technical claims of the paper (not the surrounding
discussion or other material).
 The error is major, e.g., a result is not true, or a proof is not correct, and
cannot easily be fixed by an expert reader and requires substantial new
arguments.
 The error occurs in a version of the work that had been published at a
peerreviewed conference or journal. In other words, I may not cover errors which we were
able to correct before the paper is accepted for publication, even when
preliminary drafts of the work had been made public (e.g., on arXiv). If
these happen, I may only update the drafts without listing the error here.
Here is the list of errata:
 Query Answering with Transitive and LinearOrdered Data
(arXiv version 1, conference
proceedings version).
An error was reported to us by Andreas
Pieris and Andrew Bailey in the
proof of Theorem 9 (Appendix H of the arXiv version). The same error exists in
the journal version of the work: please refer to this
erratum where we discuss the error in more detail
and explain how the error was fixed.
In the conference version, the error is in the step "From UCQ to CQ" in
appendix H of the arXiv version, which increases the arity.
 Possible and Certain Answers for Queries over OrderIncomplete Data (arXiv version 1, conference proceedings
version).
I have found a issue in the proof of Theorem 22 (both versions), which also affects Theorem 19
and Theorem 30 (both versions). We no longer claim that these results hold.
The specific error is in the proof of Proposition 66 (in arXiv version 1) for
the case of the product operator. The latest version of the article (arXiv
version 2) removes the affected results
and discusses these issues in Appendix G. The journal version of the
article (arXiv) contains neither the
affected results nor the discussion in this appendix.
 Query Answering with Transitive and LinearOrdered Data
(journal version).
An error was reported to us by Andreas
Pieris and Andrew Bailey in the
proof of Theorem 6.2. The same error exists in the conference version of the
work: see this erratum.
The given proof works
if the query Q is allowed be a UCQ, or if the signature is not required to
be aritytwo, but not both. The error is that the last step of the proof,
given in Appendix E.5.1, used to cover the case of a CQ rather than a UCQ,
increases the signature arity (with Lemma E.6), which we did not realize. The
arXiv version of this work, posted in
2022, gives a corrected proof of the claimed result.
 When Can We Answer Queries Using ResultBounded Data Interfaces?
(arXiv version 1,
arXiv version 2,
conference proceedings version (closed
access)).
An error was reported by reviewers of the upcoming journal version of this
work concerning the case of finite answerability. The error is in Proposition
2.2, which does not apply to schemas with result bounds. We do not know if
Proposition 2.2 is correct in the setting with result bounds studied in the
paper, or how it could be proved. This error affects all results concerning
finite answerability, so the general claims at the beginning of Sections 5 and
7 that all results also extend to that case, and the specific Corollary 7.3.
We do not know if our specific results in the finite case are also true (no
matter the status of Proposition 2.2). We also do not have knowledge about any
reasons why they would false. All results related to finite answerability
claimed in the paper should instead be considered as open. This will be
pointed out in the upcoming journal version: see
this erratum
 When Can We Answer Queries Using ResultBounded Data Interfaces?
(arXiv version 1).
An error was reported during the review process, concerning the case of
finite answerability. This error already existed in the conference version
of this work: see this erratum. The affected
results were removed in arXiv version
2 and subsequent versions.
 Enumeration on Trees with Tractable Combined Complexity and Efficient Updates (arXiv versions 1 and 2,
conference proceedings version (closed access).
An issue was found in the LICS'18 paper MSO Queries on Trees: Enumerating
Answers under Updates Using Forest
Algebras,
on which some of our own results crucially depend. It is currently
open whether the results of the LICS'18 paper can indeed be proved as
stated. In our own paper, this affects Lemma 7.4 (for the part
asserting that the encoding is lineartime computable, and for the
part asserting that updates can be efficiently performed), and thus
affects Theorem 8.1, Corollary 8.2, Corollary 8.3, Corollary 8.4, and
Theorem 8.5 (in what concerns the preprocessing time and the update
time). All these results from our paper are thus only proven
conditionally to the fact that one can indeed prove the results
claimed in Section 3 of the LICS'18 paper. This is currently under
investigation by the author of the paper, and we will update the arXiv version of
our paper depending on how this develops.
 Managing Consent for Data Access in
Shared Databases (conference proceedings
version, full
version).
I have found an issue in Proposition III.4 (conference
proceedings version), also present in the corresponding results of the
full version. The complexity and approximation ratios claimed in the
proposition are incorrect. The correct complexity should be the following:
O(¯DQ×Q2+Q(¯D)×Q2Q+1×¯D2).
Specifically, the first term is the complexity of computing the result of
the query along with its provenance as a DNF formula (the number of possible
assignments is the number of facts of the database to the power of
the number of query atoms, and the time to check an assignment is in
time proportional to the query size), and then minimizing the DNF formula of
each tuple in time proportional to the maximal number of tuples
(¯DQ) times the maximal size of a provenance formula (Q2).
The second term is the complexity of computing the
provenance as a CNF formula (expanding to a conjunction of at most
QQ clauses of size Q and then minimizing them by considering
each pair of clauses and testing subsumption), then multiplied by the
complexity of performing the Qvalue algorithm (which is in the number of
variables squared). The correct approximation ratio should be log(QQ+1×Q(¯D)). The first factor in the logarithm is
because, for each tuple, the number of terms in the DNF is Q and the
number of clauses in the CNF is QQ. The second factor in the logarithm is the number of tuples.
Please note that, while these are the only errors currently known in the
conference proceedings version, the full version of the paper is still a
"work in progress" and contains (as of November 2021) some other errors not
affecting the conference proceedings version.