Welcome to the homepage of the DIG seminar, which is the regular seminar of the DIG team of LTCI at Télécom ParisTech. The seminar features talks by members of the team and guests from other research groups, as well as discussions on topics of relevance to the team. Attendance is generally open to the public; feel free to contact us if you are interested in coming. Talks are held at Télécom ParisTech, 46 rue Barrault, Paris, France, métro Corvisart. You can contact Antoine Amarilli for any inquiry about the seminar.

The seminar was formerly called "DBWeb seminar" and "IC2 seminar". You may also be interested in the LTCI Data Science Seminar, which is co-organized by DIG and S2A.

14 June 2018, 12:00, C47

Lucie-Aimée Kaffee, University of Southampton
Multilinguality of Wikidata (slides)

Abstract: The web in general shows a lack of support for non-English languages. One way of overcoming this lack of information is using multilingual linked data. Wikidata supports over 400 languages in theory. In practice, however, not all languages are equally supported. As a first step, we want to explore the language distribution of a collaboratively edited knowledge base such as Wikidata and the label coverage of the web of data in general. Labels are the access point for humans to the web of data, and a lack thereof means limited reusability. Wikipedia is an ideal candidate for reuse of the multilingual data: the project has instances in over 280 languages, but the number of articles differs drastically. For many readers it could be a first starting point to get information. However, with a lack of information the project is unlikely to attract new community members that could create new articles. We investigate the possibility of neural natural language generation for underserved Wikipedia communities, using Wikidata’s facts, and evaluate this approach with the help of the Arabic and Esperanto Wikipedia communities. This approach can only be as good as the amount of multilingual data we have at our disposal. Therefore, we discuss future ways of improving the coverage of under-resourced languages’ information in Wikidata.

Biography: Lucie is a PhD student at the School of Electronics and Computer Science, University of Southampton, as part of the Web and Internet Science (WAIS) research group. Additionally, she is part of the Marie Skłodowska-Curie ITN Aqua. Generally, she is working on how to support underserved languages on the web by means of linked data. Her research interests therefore include linked data, multilinguality, Wikidata, underserved languages on the web and, most recently, natural language generation and relation extraction. Before getting involved with research, she worked as a software developer at Wikimedia Deutschland in the Wikidata team. There she was already involved in the previously mentioned topics, developing the ArticlePlaceholder extension, which includes Wikidata’s structured knowledge on Wikipedias of small languages, a project she continued research on. She is still involved in Open Source projects, mainly Wikimedia related, where she is currently part of the Code of Conduct Committee for technical spaces.

23 May 2018, 12:05, C47

Viktor Losing, University of Bielefeld, HONDA Research Institute Europe
Memory Models for Incremental Learning Architectures (slides)

Abstract: There are more and more products available with automated functions for human assistance or autonomous services in home or outdoor environments. A common problem is the inadequate match between user expectations, which are highly individual, and the assistant system function, which is typically rather standardized. Incremental learning methods offer a way to adapt the parameters and behavior of an assistant system according to user needs and preferences. In this talk, I will illustrate the benefits of personalization and incremental learning using the task of driver maneuver prediction at intersections. The study is based on a collection of commuting drivers who recorded their daily routes with a standard smartphone and GPS receiver. The personalized prediction based on at least one experience of a certain intersection already improves the prediction performance over an average prediction model trained.

A closely related topic is incremental learning in non-stationary data streams, which is highly challenging since the types of drift that may occur are fundamentally different and undermine classical assumptions such as data independence or stationary distributions. Here, I will introduce the Self Adjusting Memory (SAM) model for the k Nearest Neighbor (kNN) algorithm. The basic idea is to construct dedicated models for the current and former concepts and apply them according to the demands of the given situation. In an extensive evaluation, SAM-kNN achieves highly competitive results throughout all experiments, underlining its robustness and capability to handle heterogeneous concept drift.

Biography: Viktor Losing received his M.Sc. in Intelligent Systems at the University of Bielefeld in 2014. Since 2015 he has been a PhD student at the CoR-Lab of the University of Bielefeld in cooperation with the HONDA Research Institute Europe. His research interests comprise incremental and online learning, learning under concept drift, as well as corresponding real-world applications.

28 March 2018, 12:05, C47

Romain Giot, IUT Bordeaux and LaBRI
Biometric performance evaluation with novel visualization (slides)

Abstract: Biometric authentication verifies the identity of individuals based on what they are. However, biometric authentication systems are error prone and can reject genuine individuals or accept impostors. Researchers on biometric authentication quantify the quality of their algorithm by benchmarking it on several databases. However, although the standard evaluation metrics state the performance of a system, they are not able to explain the reasons for these errors.

After presenting the existing evaluation procedures of biometric authentication systems as well as visualisation properties, this talk presents a novel visual evaluation of the results of a biometric authentication system, which helps to find which individuals or samples are sources of errors and could help to fix the algorithms. Two variants are proposed: one where the individuals of the database are modelled as a directed graph, and another one where the biometric database of scores is modelled as a partitioned power-graph where nodes represent biometric samples and power-nodes represent individuals. A novel recursive edge bundling method is also applied to reduce clutter. This proposal has been successfully applied to several biometric databases and has proved its usefulness.

Biography: I am an associate professor at the IUT de Bordeaux and the LaBRI and head of the team “Back to Bench and Beyond” of the group “Bench to Knowledge and Beyond”. I have research experience in biometric authentication (as a PhD student at the University of Caen, where I worked on template update and multibiometrics for keystroke dynamics), anomaly detection (as a postdoctoral researcher at Orange Labs, where I worked on fraud detection in mobile payment), and large graph visualisation (since becoming an associate professor at Bordeaux).

5 March 2018, 12:05, C46

Fadi Badra, LIMICS
Analogical Transfer: a Form of Similarity-Based Inference? (slides)

Abstract: Making an analogical transfer consists in assuming that if two situations are alike in some ways, they may be alike in others. Such a cognitive process is the inspiration for different machine learning approaches like analogical classification, the k-nearest neighbors algorithm, or case-based reasoning. This talk explores the role of similarity in the transfer phase of analogy, by taking a qualitative reasoning viewpoint. We first show that there exists an intimate link between the qualitative measurement of similarity and computational analogy. Essential notions of formal models of analogy, such as analogical equalities/inequalities, or analogical dissimilarity, and the related inferences (mapping and transfer) can be formulated as operations on ordinal similarity relations. In the light of these observations, we will defend the idea that analogical transfer is a form of similarity-based inference.

Biography: Fadi Badra is an assistant professor at Paris 13 University, and is a member of the Medical Informatics and Knowledge Engineering Research Group (LIMICS) in Paris, France. He completed his PhD in the Orpailleur Research Group at the LORIA Lab in Nancy, France. His current research interests are in the area of computational analogy and case-based reasoning, with a particular focus on its adaptation phase.

22 November 2017, 12:00, C47

Vwani Roychowdhury, UCLA
The Unreasonable Effectiveness of Data: A Scalable framework for "Understanding" Social Forums and Online Discussions (no slides provided)

Abstract: As humans we interpret and react to the world around us in terms of narratives. At a basic level, a narrative is comprised of principal actors and entities, their interactions, and finally the decisions they make to reinforce and protect their interests. The primary question we address in this talk is whether a computer can automatically distill and create such narrative maps from millions of posts and discussions that happen in the online world. How much and which parts of the underlying narratives can be extracted via unsupervised statistical methods, and how much "humanness" needs to be coded into a computer? We provide a framework that uses statistical techniques to generate automated summaries, and show that when augmented with a small-size dictionary that encodes "humanness," the framework can generate effective narratives from a number of domains. We will present several sets of empirical results where millions of posts are processed to generate story graphs and plots of the underlying discussions.

Biography: Vwani Roychowdhury is a Professor of Electrical and Computer Engineering at the University of California, Los Angeles (UCLA). He specializes in interdisciplinary work that deals with the modeling and design of information and computing systems, ranging from physical to biological and engineered systems. He has done pioneering work in Quantum Computing, Nanoelectronics, Peer-to-Peer (P2P), social and complex networks, machine learning, text mining, artificial neural networks, computer vision, and Internet-Scale data processing. He has published more than 200 peer-reviewed journal and conference papers, and co-authored several books. He has also cofounded several Silicon Valley startups, including www.netseer.com and www.stieleeye.com.

18 October 2017, 12:00, C47

Yun Sing Koh, University of Auckland
Using Volatility in Concept Drift Detection and Capturing Recurrent Concept Drift in Data Streams (slides)

Abstract: Much of scientific research involves the generation and testing of hypotheses that can facilitate the development of accurate models for a system. In machine learning, the automated building of accurate models is desired. However, traditional machine learning often assumes that the underlying models are static and unchanging over time. In reality, there are many applications that analyse data streams where the underlying model or system changes over time. This may be caused by changes in the conditions of the system, or a fundamental change in how the system behaves. In this talk, I will present a change detector called SEED, and how we capture stream volatility. We coin the term stream volatility to describe the rate of changes in a stream. A stream has a high volatility if changes are detected frequently and has a low volatility if changes are detected infrequently. I will also present a drift prediction algorithm to predict the location of future drift points based on historical drift trends, which we model as transitions between stream volatility patterns. Our method uses a probabilistic network to learn drift trends and is independent of the drift detection technique. I will then present a meta-learner, the Concept Profiling Framework (CPF), that uses a concept drift detector and a collection of classification models to perform effective classification on data streams with recurrent concept drifts, by relating models through the similarity of their classifying behaviour.

Biography: Yun Sing Koh is a Senior Lecturer at the Department of Computer Science, The University of Auckland, New Zealand. She completed her PhD at the Department of Computer Science, University of Otago, New Zealand in 2007. Her current research interest is in the area of data mining and machine learning, specifically data stream mining and pattern mining.

12 September 2017, 12:00, C47

Bob Durrant, University of Waikato
Random Projections for Dimensionality Reduction (slides)

12 July 2017, 12:00, C47

Amin Mantrach, Criteo Research
Deep Character-Level Click-Through Rate Prediction for Sponsored Search (slides)

31 May 2017, 12:00, C48

Quentin Lobbé, Télécom ParisTech
An exploration of web archives beyond the pages: Introducing web fragments (slides)
Mikaël Monet, Télécom ParisTech
Probabilistic query evaluation: towards tractable combined complexity (slides)

26 April 2017, 12:00, C47

Themis Palpanas, LIPADE, Paris Descartes University
Riding the Big IoT Data Wave: Complex Analytics for IoT Data Series (slides)

8 March 2017, 12:00, C47

Thomas Bonald, Télécom ParisTech
Community detection in graphs (slides)

27 February 2017, 12:00, C46

Laurent Decreusefond, Télécom ParisTech
Stochastic geometry, random hypergraphs, random walks (slides)

26 January 2017, 12:00, C47

Nofar Carmeli, Technion
Efficiently Enumerating Tree Decompositions (slides)

11 January 2017, 12:00, C47

Simon Razniewski, Free University of Bozen-Bolzano
Query-driven Data Completeness Assessment (slides)

14 December 2016, 12:00, C47

Fabian M. Suchanek, Télécom ParisTech
A hitchhiker’s guide to Ontology (slides)

23 November 2016, 12:00, C47

Ngurah Agus Sanjaya ER, Télécom ParisTech
Set of T-uples Expansion by Example (slides)
Qing Liu, National University of Singapore
Top-k Queries over Uncertain Scores (slides)

26 October 2016, 12:00, C46

Maria Koutraki, Université Paris-Saclay
Approaches towards unified models for integrating Web knowledge bases (slides)

From November 2013 to September 2016

During this time, the DBWeb seminar was held as part of the IC2 group seminar. These seminars are listed on the IC2 seminar Web page.

10 September 2013, 14:00, C49

Antoine Amarilli
Taxonomy-Based Crowd Mining (slides)
Jean-Louis Dessalles
Relevance (slides)

14 January 2013, 10:00, B549

Vincent Lepage, Cinequant
Cinequant, data mining for the real world
Jean Marc Vanel, Déductions SARL
EulerGUI, a free tool for the Semantic Web and inference

4 December 2012, 10:00, C017

Jean-Louis Dessalles
Why spend (so much) time on the social Web? A model of investment in communication
François Rousseau
Short talk and brainstorming on graph based text representation and mining

20 November 2012, 10:00, C017

Mohamed-Amine Baazizi
Static analysis for optimizing the update of large temporal XML documents
Christos Giatsidis
S-cores and degeneracy based graph clustering

6 November 2012, 10:00, C49

Jonathan Michaux, Télécom ParisTech
Interaction safety in Web service orchestrations (slides)
Georges Gouriten
Brainstorming on knowledge-based content suggestions on the social Web

16 October 2012, 10:00, C49

Clémence Magnien, Université Pierre et Marie Curie
Measuring, studying, and modelling the dynamics of Internet topology
Imen Ben Dhia
Evaluating reachability queries over large social graphs (slides)

2 October 2012, 10:00, C017

Idrissa Sarr, Université Cheikh Anta Diop
Dealing with the disappearance of nodes in social networks (slides)
Damien Munch
“Eating cake during a scientific talk:” Can we reverse-engineer natural language aspectual processing? (slides)

18 September 2012, 10:00, C017

Silviu Maniu
Context-Aware Top-k Processing using Views
Asma Souihli
Optimizing Approximations of DNF Query Lineage in Probabilistic XML (slides)

4 September 2012, 10:00, C017

Antoine Amarilli
Advances in holistic ontology alignment (slides)
Yannis Papakonstantinou, University of California, San Diego
Declarative, optimizable data-driven specifications of web and mobile applications