Finding the members of the theoretical database community with DBLP
The DBLP service is a great bibliographical tool for computer science research. In this post, I explain how to use it to prepare the list of members of a research community. I will be using the theoretical database community, whose two conferences are PODS and ICDT.
The list of publications for one edition of a conference can be found on DBLP as XML, e.g., for ICDT'18. It is then easy to use xmlstarlet to find the list of people who have published at that conference:
curl -s 'https://dblp.uni-trier.de/db/conf/icdt/icdt2018.xml' |
  xmlstarlet sel -T -t -m "//inproceedings/author" -v . -n |
  sort -u
For each person in the list, we can obtain detailed XML information, including their homepage, ORCID, etc., using the DBLP API again. (This also gives us a canonical form for the name, which may appear in different ways in various inproceedings entries.) This is a bit more complicated than it should be, because of a limitation of the DBLP search API: when queried with a name, the API sometimes inexplicably favors non-exact matches even when an exact match exists. So we must filter the matches ourselves, using an exact match if one exists and the first non-exact match otherwise. Of course, independently of this problem, you may still get the wrong author, in particular because of homonyms, so these results should be taken with a grain of salt.
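NAME="Antoine Amarilli"
# Query the author search API; curl URL-encodes the name (spaces, accents).
curl -s -G "https://dblp.org/search/author/api" \
  --data-urlencode "h=1000" --data-urlencode "q=$NAME" > matches.xml
# Prefer a hit whose author name matches exactly.
URL=$(xmlstarlet sel -T -t -m "/result/hits/hit/info[author='$NAME']" \
  -c url -n < matches.xml | head -n 1)
# Otherwise fall back to the first hit returned.
if [[ -z "$URL" ]]
then
  URL=$(xmlstarlet sel -T -t -m /result/hits/hit/info/url \
    -c . -n < matches.xml | head -n 1)
fi
# Fetch the person record as XML.
curl -L "${URL}.xml"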
From there, we can prepare a list of community members. Of course, any criterion for inclusion is completely arbitrary... My criterion for a list of "active community members" is to select those who have published in at least three different years, with at least one publication in 2015 or later. This gives:
Corrected one error in this list caused by the DBLP search API limitation
- Adrian Onet [dblp]
- Alin Deutsch [dblp] [webpage]
- Alon Y. Halevy [dblp] [webpage] [webpage] [wikidata] [wikipedia]
- Andreas Pieris [dblp]
- André Hernich [dblp]
- Andrew McGregor [dblp] [webpage]
- Antoine Amarilli [dblp] [orcid] [webpage] [wikidata]
- Atri Rudra [dblp] [webpage]
- Balder ten Cate [dblp] [webpage]
- Bas Ketsman [dblp]
- Benny Kimelfeld [dblp] [webpage]
- Benoît Groz [dblp]
- Carsten Lutz [dblp] [webpage]
- Christopher Ré [dblp] [webpage] [webpage] [wikidata] [wikipedia]
- Christoph Koch [dblp] [webpage]
- Claire David [dblp]
- Cristian Riveros [dblp]
- Daniel Deutch [dblp]
- Daniel Zinn [dblp]
- Dan Olteanu [dblp] [webpage]
- Dan Suciu [dblp] [webpage] [wikidata] [wikipedia]
- David P. Woodruff [dblp] [webpage]
- Diego Calvanese [dblp] [orcid] [webpage]
- Dirk Van Gucht [dblp] [webpage]
- Domagoj Vrgoc [dblp]
- Dominik D. Freydenberger [dblp] [webpage]
- Egor V. Kostylev [dblp]
- Emanuel Sallinger [dblp] [orcid] [webpage] [webpage]
- Filip Murlak [dblp]
- Floris Geerts [dblp] [webpage]
- Foto N. Afrati [dblp] [webpage]
- Francesco Scarcello [dblp] [orcid] [webpage]
- Francesco Silvestri [dblp] [orcid] [webpage]
- Frank Neven [dblp] [webpage]
- Frank Wolter [dblp] [webpage]
- Georg Gottlob [dblp] [orcid] [webpage] [webpage] [wikidata] [wikipedia]
- Gianluigi Greco [dblp]
- Giansalvatore Mecca [dblp] [webpage]
- Gösta Grahne [dblp] [webpage]
- Graham Cormode [dblp] [webpage]
- Hubie Chen [dblp] [webpage]
- Hung Quoc Ngo [dblp] [webpage]
- Jan Hidders [dblp] [webpage] [webpage] [webpage]
- Jan Paredaens [dblp] [webpage]
- Jan Van den Bussche [dblp] [webpage]
- Jeffrey D. Ullman [dblp] [webpage] [wikidata] [wikipedia]
- Jeffrey F. Naughton [dblp] [webpage] [wikidata] [wikipedia]
- Jeffrey Scott Vitter [dblp] [webpage] [webpage] [webpage] [webpage] [wikidata] [wikipedia]
- Jef Wijsen [dblp] [webpage]
- Jelani Nelson [dblp]
- Juan L. Reutter [dblp] [webpage] [webpage]
- Kamesh Munagala [dblp] [webpage]
- Ke Yi [dblp]
- Kobbi Nissim [dblp] [webpage]
- Leonid Libkin [dblp] [webpage] [wikidata] [wikipedia]
- Leopoldo E. Bertossi [dblp] [orcid] [webpage]
- Lucian Popa [dblp] [webpage]
- Luc Segoufin [dblp] [webpage]
- Mahmoud Abo Khamis [dblp]
- Mantas Simkus [dblp]
- Marcelo Arenas [dblp] [webpage]
- Martin Farach-Colton [dblp] [webpage] [wikidata] [wikipedia]
- Martin Grohe [dblp] [orcid] [webpage]
- Martín Ugarte [dblp]
- Matthias Niewerth [dblp] [webpage]
- Maurizio Lenzerini [dblp] [webpage] [wikidata] [wikipedia]
- Meghyn Bienvenu [dblp]
- Michael A. Bender [dblp] [webpage]
- Michael Benedikt [dblp] [webpage]
- Michael Mitzenmacher [dblp] [wikidata] [wikipedia]
- Miguel Romero [dblp]
- Minos N. Garofalakis [dblp] [webpage]
- Moshe Y. Vardi [dblp] [webpage] [wikidata] [wikipedia]
- Nadime Francis [dblp]
- Nicola Leone [dblp] [orcid] [webpage]
- Nicole Schweikardt [dblp] [webpage] [wikidata]
- Pablo Barceló [dblp] [webpage]
- Pankaj K. Agarwal [dblp] [webpage] [wikidata] [wikipedia]
- Paraschos Koutris [dblp] [webpage]
- Paul Beame [dblp] [webpage]
- Pawel Parys [dblp]
- Peter Buneman [dblp] [webpage] [wikidata] [wikipedia]
- Phokion G. Kolaitis [dblp] [webpage]
- Pierre Bourhis [dblp] [webpage]
- Pierre Senellart [dblp] [webpage]
- Ping Lu [dblp]
- Piotr Indyk [dblp] [webpage] [wikidata] [wikipedia]
- Piotr Wieczorek [dblp]
- Qin Zhang [dblp] [webpage]
- Rahul Shah [dblp] [webpage]
- Rasmus Pagh [dblp] [orcid] [webpage]
- Reinhard Pichler [dblp] [webpage]
- Ronald Fagin [dblp] [webpage] [wikidata] [wikipedia]
- R. Ryan Williams [dblp] [webpage] [webpage] [webpage] [wikidata] [wikipedia]
- Sanjeev Khanna [dblp] [webpage] [wikidata] [wikipedia]
- Sara Cohen [dblp] [webpage]
- Sebastian Maneth [dblp] [webpage]
- Sebastian Rudolph [dblp] [webpage] [webpage]
- Sebastian Skritek [dblp]
- Serge Abiteboul [dblp] [webpage] [webpage] [wikidata] [wikipedia]
- Slawomir Staworko [dblp] [webpage]
- S. Muthukrishnan [dblp] [webpage]
- S. Sudarshan [dblp] [webpage]
- Stefan Mengel [dblp] [webpage] [webpage]
- Stijn Vansummeren [dblp] [wikidata]
- Subhash Suri [dblp] [webpage] [wikidata] [wikipedia]
- Sudeepa Roy [dblp] [webpage]
- Sudipto Guha [dblp] [webpage]
- Susan B. Davidson [dblp] [webpage] [wikidata] [wikipedia]
- Thomas Schwentick [dblp] [webpage]
- Thomas Zeume [dblp] [webpage]
- Ting Deng [dblp]
- Todd J. Green [dblp] [webpage]
- Tomasz Gogacz [dblp]
- Tom J. Ameloot [dblp]
- Tony Tan [dblp]
- Tova Milo [dblp] [webpage] [wikidata] [wikipedia]
- Vadim Savenkov [dblp]
- Val Tannen [dblp] [webpage]
- Vasilis Samoladas [dblp]
- Victor Vianu [dblp] [webpage] [wikidata] [wikipedia]
- Vladimir Braverman [dblp] [webpage]
- Wang Chiew Tan [dblp] [webpage] [wikidata]
- Wenfei Fan [dblp] [orcid] [wikidata] [wikipedia]
- Werner Nutt [dblp] [webpage] [wikidata]
- Wim Martens [dblp]
- Yaacov Y. Weiss [dblp]
- Yael Amsterdamer [dblp] [webpage]
- Yakov Nekrich [dblp] [webpage]
- Yehoshua Sagiv [dblp] [wikidata] [wikipedia]
- Yufei Tao [dblp] [orcid] [webpage]
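The "active community member" criterion above can be sketched as a small filter. This is only an illustration under assumptions: the function name `active_members` and the input format (one `name<TAB>year` line per author of each ICDT or PODS paper) are hypothetical, not the script actually used for the list.

```shell
# Sketch only: the name<TAB>year input format is an assumption.
# Reads one "name<TAB>year" line per (author, paper) pair on stdin and
# prints the names seen in at least 3 distinct years, with at least one
# of those years being 2015 or later.
active_members() {
  sort -u |   # keep one line per distinct (name, year) pair
  awk -F'\t' -v min_years=3 -v since=2015 '
    { years[$1]++; if ($2 + 0 >= since) recent[$1] = 1 }
    END {
      for (n in years)
        if (years[n] >= min_years && (n in recent)) print n
    }
  ' | sort
}
```

For instance, `active_members < pairs.tsv` would print the selected names, where `pairs.tsv` is a hypothetical file built by running the xmlstarlet extraction on every edition and appending the year to each name.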
Another inclusion criterion, for a "historical" list, is to select people who are not necessarily still active but have published over a long period, say, in 10 different (not necessarily contiguous) years. Here is the resulting list, sorted by the year in which each person last published at ICDT or PODS.
- 2018: Balder ten Cate [dblp] [webpage]
- 2018: Benny Kimelfeld [dblp] [webpage]
- 2018: David P. Woodruff [dblp] [webpage]
- 2018: Diego Calvanese [dblp] [orcid] [webpage]
- 2018: Floris Geerts [dblp] [webpage]
- 2018: Frank Neven [dblp] [webpage]
- 2018: Georg Gottlob [dblp] [orcid] [webpage] [webpage] [wikidata] [wikipedia]
- 2018: Jan Van den Bussche [dblp] [webpage]
- 2018: Leonid Libkin [dblp] [webpage] [wikidata] [wikipedia]
- 2018: Luc Segoufin [dblp] [webpage]
- 2018: Maurizio Lenzerini [dblp] [webpage] [wikidata] [wikipedia]
- 2018: Michael Benedikt [dblp] [webpage]
- 2018: Pablo Barceló [dblp] [webpage]
- 2018: Pankaj K. Agarwal [dblp] [webpage] [wikidata] [wikipedia]
- 2018: Phokion G. Kolaitis [dblp] [webpage]
- 2018: Serge Abiteboul [dblp] [webpage] [webpage] [wikidata] [wikipedia]
- 2018: Victor Vianu [dblp] [webpage] [wikidata] [wikipedia]
- 2018: Wang Chiew Tan [dblp] [webpage] [wikidata]
- 2018: Wim Martens [dblp]
- 2017: Alon Y. Halevy [dblp] [webpage] [webpage] [wikidata] [wikipedia]
- 2017: Dan Suciu [dblp] [webpage] [wikidata] [wikipedia]
- 2017: Foto N. Afrati [dblp] [webpage]
- 2017: Jan Paredaens [dblp] [webpage]
- 2017: Jeffrey D. Ullman [dblp] [webpage] [wikidata] [wikipedia]
- 2017: Jeffrey F. Naughton [dblp] [webpage] [wikidata] [wikipedia]
- 2017: Moshe Y. Vardi [dblp] [webpage] [wikidata] [wikipedia]
- 2017: Peter Buneman [dblp] [webpage] [wikidata] [wikipedia]
- 2017: Ronald Fagin [dblp] [webpage] [wikidata] [wikipedia]
- 2017: Sara Cohen [dblp] [webpage]
- 2017: S. Muthukrishnan [dblp] [webpage]
- 2017: Thomas Schwentick [dblp] [webpage]
- 2017: Tova Milo [dblp] [webpage] [wikidata] [wikipedia]
- 2017: Val Tannen [dblp] [webpage]
- 2017: Wenfei Fan [dblp] [orcid] [wikidata] [wikipedia]
- 2016: Alin Deutsch [dblp] [webpage]
- 2016: Christoph Koch [dblp] [webpage]
- 2016: Marcelo Arenas [dblp] [webpage]
- 2016: Yehoshua Sagiv [dblp] [wikidata] [wikipedia]
- 2015: Dirk Van Gucht [dblp] [webpage]
- 2015: Gösta Grahne [dblp] [webpage]
- 2013: Abraham Silberschatz [dblp] [webpage] [wikidata] [wikipedia]
- 2013: Christos H. Papadimitriou [dblp] [webpage] [wikidata] [wikipedia]
- 2013: Giuseppe De Giacomo [dblp] [webpage]
- 2012: Richard Hull [dblp] [webpage]
- 2010: Jianwen Su [dblp] [webpage]
- 2009: Catriel Beeri [dblp] [webpage]
- 2009: Raghu Ramakrishnan [dblp] [webpage] [wikidata] [wikipedia]
- 2005: Alberto O. Mendelzon [dblp] [webpage] [wikidata] [wikipedia]
- 2001: Eljas Soisalon-Soininen [dblp]
- 1995: Paris C. Kanellakis [dblp] [webpage] [wikidata] [wikipedia]
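The "historical" criterion can be sketched in the same way. Again, this is an illustration under assumptions: the function name `historical_members` and the `name<TAB>year` input format are hypothetical, and the threshold of 10 years is simply the default value of an optional argument.

```shell
# Sketch only: the name<TAB>year input format is an assumption.
# Prints "last_year: name" for every name seen in at least N distinct
# years (N defaults to 10), most recent last year first.
historical_members() {
  sort -u |   # keep one line per distinct (name, year) pair
  awk -F'\t' -v min_years="${1:-10}" '
    { years[$1]++; y = $2 + 0; if (y > last[$1]) last[$1] = y }
    END {
      for (n in years)
        if (years[n] >= min_years) print last[n] ": " n
    }
  ' | sort -t: -k1,1nr -k2
}
```

Ties on the last year are broken by name, which matches the ordering of the list above.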
Another kind of statistic that can be computed in this way concerns the "neighboring" conferences, i.e., the other conferences where members of the community have published. Here is the list of the top neighboring conferences of PODS and ICDT, sorted by the number of active community members who have published at least once there since 2015 (with hyperlinks and descriptions added manually):
- 38: SIGMOD Conference, the practical database conference held jointly with PODS
- 34: AMW, the database theory workshop held in honor of Alberto O. Mendelzon (whom you may remember from the previous list)
- 29: IJCAI, an AI conference
- 28: ICALP, a theoretical CS conference on logics and automata
- 26: LICS, another theoretical CS conference about logics
- 20: SODA, a theoretical CS conference on algorithms
- 19: AAAI, another AI conference
- 19: EDBT, the practical database conference held jointly with ICDT
- 18: WWW, a conference about the World Wide Web
- 15: Description Logics, the workshop on description logics
- 15: ICDE, a practical data management conference
- 13: CIKM, an information and knowledge management conference
- 12: SEBD, the Italian conference on databases
- 12: STOC, a general-purpose theoretical computer science conference
- 11: KR, a conference on knowledge representation and reasoning
- 11: FOCS, another general-purpose theoretical computer science conference
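The counting behind this list can be sketched as follows. It assumes, hypothetically, that the publications of the active members have already been flattened into `name<TAB>venue<TAB>year` lines (for example from each person's XML record); the function name `neighbor_venues` and this format are illustrative choices, not part of DBLP.

```shell
# Sketch only: the name<TAB>venue<TAB>year input format is an assumption.
# Counts, for each venue other than PODS and ICDT, how many distinct
# members published there in 2015 or later, and prints "count: venue".
neighbor_venues() {
  awk -F'\t' -v since=2015 '
    $3 + 0 >= since && $2 != "PODS" && $2 != "ICDT" { print $1 "\t" $2 }
  ' |
  sort -u |                       # one line per distinct (member, venue)
  cut -f2 |                       # keep only the venue
  sort | uniq -c |                # number of members per venue
  sort -k1,1nr -k2 |              # largest counts first
  awk '{ c = $1; sub(/^ *[0-9]+ /, ""); print c ": " $0 }'
}
```

Deduplicating on the (member, venue) pair before counting is what makes each member count at most once per venue, no matter how many papers they have there.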
It would be interesting to visualize this data differently, e.g., as a world map of the community members, but sadly the affiliation information in DBLP is too sparse for this to work.