In this series of seminars dedicated to the memory of Mauriana Pesaresi, a doctoral student of the Computer Science Department of the University of Pisa, first-year PhD students in Computer Science will present an open research problem related to their field of study. Each seminar will be followed by a panel discussion.
You can find the programme of the 2022 edition here, and the programme of the 2023 edition here.
For any further information, you can reach us via email.
14:00-15:00
Conventional methodologies face difficulties in striking a harmonious balance between data-driven learning and adherence to physical constraints, particularly when dealing with noisy datasets. To tackle this challenge, hybrid approaches are being explored, aiming to combine the advantages of both model-based and data-driven methodologies to develop more robust models. Physics-informed neural networks (PINNs) are one such hybrid approach, presenting a promising solution by integrating fundamental physical principles into model training, thereby enhancing generalization and interpretability. Despite the potential advantages of PINNs in reconciling physics-based constraints with data-driven learning, concerns persist regarding their tendency to generate unrealistic predictions. In particular, a challenge arises in validating neural networks that represent a physical system characterized by unknown or partially known dynamics. Consequently, further research is required to refine the validation of AI-based systems, minimizing deviations from physical solutions. This presentation delves into the verification process of neural network models, with an emphasis on discerning the physical properties of real dynamical systems.
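As a hedged illustration of the core mechanism (a minimal sketch, assuming PyTorch and a toy ODE du/dt = -u chosen purely for illustration, not the systems studied in the talk), a PINN augments the usual data-fitting loss with a physics-residual loss evaluated at collocation points:

```python
import torch

# Toy PINN sketch (illustrative only): fit u(t) to noisy data while
# penalising violations of the assumed physics du/dt = -u.
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

t_data = torch.rand(64, 1)                                       # observation times
u_data = torch.exp(-t_data) + 0.05 * torch.randn_like(t_data)    # noisy observations
t_phys = torch.rand(256, 1, requires_grad=True)                  # collocation points

for step in range(2000):
    opt.zero_grad()
    # Data loss: match the noisy observations.
    loss_data = ((net(t_data) - u_data) ** 2).mean()
    # Physics loss: residual of du/dt + u = 0 at the collocation points.
    u = net(t_phys)
    du_dt = torch.autograd.grad(u, t_phys, torch.ones_like(u), create_graph=True)[0]
    loss_phys = ((du_dt + u) ** 2).mean()
    (loss_data + loss_phys).backward()
    opt.step()
```

Weighting the two loss terms, and validating that the trained network actually respects the physics away from the data, is exactly where the open questions discussed in the talk arise.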
15:00-16:00
Periods and deadlines are the key concepts sustaining the design of most real-time systems. At the same time, networking, security, and artificial intelligence are usually seen as add-ons or secondary design requirements. This strategy has proved effective and has been reinforced since the 1960s. On the other hand, safety-critical cyber-physical systems, such as autonomous vehicles and automated factories, must handle gigantic flows of data that must be processed in real time to guarantee safety in decision loops. Nonetheless, the safety of such systems is not limited to timing and result correctness. Instead, it must also consider the plausibility of data originating from a sensor or AI, the temporal validity of the data for the system, the security protocols in place, and often the communication between distributed systems, possibly facing the challenges of wireless communication. In this sense, we must learn to design and implement such systems around the data they handle instead of the tasks they run. This talk will cover the main challenges faced while designing modern safety-critical cyber-physical systems and how a data-driven design might address such challenges.
14:00-16:00
In this seminar, we will discuss how to index sorted string dictionaries. We will present the state-of-the-art indexing data structures based on different kinds of tries, also describing the current open problems related to dynamic string dictionaries. In the second part, we will discuss how to augment indexing data structures with machine learning. While the literature provides learned indexes for numerical values that are more effective than their classical counterparts, the problem is still open for strings.
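As a minimal point of reference for the first part (a plain character trie in Python, a deliberately naive sketch rather than any of the compressed or succinct variants covered in the seminar), this shows the basic insert and prefix-lookup operations a string dictionary index must support:

```python
# Minimal character trie for a string dictionary (illustrative sketch only).
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def words_with_prefix(self, prefix):
        # Walk down to the node for the prefix, then collect all words below it.
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        out, stack = [], [(node, prefix)]
        while stack:
            n, s = stack.pop()
            if n.is_word:
                out.append(s)
            for ch, child in n.children.items():
                stack.append((child, s + ch))
        return out

trie = Trie()
for w in ["car", "card", "care", "dog"]:
    trie.insert(w)
print(sorted(trie.words_with_prefix("car")))  # ['car', 'card', 'care']
```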
14:00-15:00
UNESCO data from 2017 shows a staggering “learning crisis” affecting millions of people, especially students with disabilities. To combat this, Universal Design for Learning (UDL) emerged in the 1990s, advocating for a tailored curriculum. Studies emphasize how appropriate use of Information and Communication Technology (ICT) can accelerate the adoption and application of UDL in various educational systems and benefit students with disabilities. Among ICTs, Virtual and Augmented Reality (VR and AR) appear to have great potential for UDL, as they are attention-grabbing and create immersive, personalized experiences that are difficult to replicate in person. Despite their potential, VR and AR are still rarely used in real educational contexts, and most proposals in the literature relate to children. The seminar will focus on a systematic literature review that identifies research gaps, such as the lack of guidelines that inform and guide the design and development of AR apps, and the lack of attention given to the orientation and university education of students with disabilities. It identifies a number of open problems holding back the use of immersive technologies in the university environment and concludes with suggestions for an inclusive university.
15:00-16:00
Unlike traditional machine learning (ML) paradigms, graph neural networks (GNNs) are a class of ML models that can process graph-structured data natively. Such models have proven successful in many diverse applications such as computational chemistry, biology, drug design, and social network analysis; however, theoretical frameworks for understanding their expressivity, that is, their power, are scarce. Analysis of the expressivity of graph-based models is either done empirically using standard datasets (as determined by the literature) or by comparison with the Weisfeiler-Lehman (WL) heuristics for graph isomorphism testing. The latter has proven successful both in understanding the sufficient conditions for optimising the expressivity of a standard GNN and in the development of more expressive models such as k-GNNs. However, several seminal publications have demonstrated that GNNs deemed expressive in the WL framework perform poorly on substructure identification tasks. Such tasks are crucial for the aforementioned applications and, consequently, the scope of expressivity must encompass them. In this presentation we explore the fundamentals of GNNs and their importance as a model class in machine learning. We discuss the Weisfeiler-Lehman framework for expressivity, providing examples of how these heuristics compare to the computational capabilities of standard GNNs and hence allow us to better understand their behaviour. Furthermore, we consider why GNNs struggle on tasks that involve substructure identification and hence motivate the application of concepts from algebraic topology for the development of new topologically informed graph neural networks.
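For concreteness, a minimal sketch of the 1-WL colour refinement heuristic (plain Python over adjacency lists; the toy graphs and number of rounds are assumptions for illustration) also exhibits the classic case where substructures go undetected:

```python
# 1-WL colour refinement (sketch): iteratively refine node colours by
# combining each node's colour with the multiset of its neighbours' colours.
# Different final colour histograms certify non-isomorphism; identical
# histograms are inconclusive, and the same limitation bounds the
# expressivity of standard message-passing GNNs.
def wl_refine(adj, rounds=3):
    colours = {v: 0 for v in adj}                 # uniform initial colouring
    for _ in range(rounds):
        sig = {v: (colours[v], tuple(sorted(colours[u] for u in adj[v]))) for v in adj}
        relabel = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        colours = {v: relabel[sig[v]] for v in adj}
    return colours

# Disjoint union: nodes 0-5 form two triangles, nodes 6-11 form a 6-cycle.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
adj.update({6 + i: [6 + (i - 1) % 6, 6 + (i + 1) % 6] for i in range(6)})

colours = wl_refine(adj)
hist_triangles = sorted(colours[v] for v in range(6))
hist_hexagon = sorted(colours[v] for v in range(6, 12))
# 1-WL cannot tell two triangles apart from one 6-cycle, even though only
# the former contains the triangle substructure.
print(hist_triangles == hist_hexagon)  # True
```

This is precisely the kind of substructure-identification failure that motivates the topologically informed models discussed in the talk.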
14:00-15:00
End-User Development (EUD) is a research topic that aims to provide people without programming experience with concepts, methods, and tools enabling them to create or modify their own applications. Recent mainstream technological trends associated with the Internet of Things and the widespread presence of robots in professional, social, and domestic settings have increased interest in this approach. In this presentation, we highlight the main aspects and open problems identified in surveys related to the topic. Additionally, we discuss potential research directions to develop an enhanced approach to EUD, allowing the robot itself to autonomously contribute to solving previously unseen tasks through activities programmed by users.
14:00-15:00
The use of large language models (LLMs) for question answering has grown considerably in recent years, especially since the introduction of ChatGPT. Although these models are trained on datasets spanning many domains, not all topics can be covered. In some specific cases, we may want a model to be able to answer questions that refer to private documents, which are therefore not included in the datasets on which the model may have been trained, and to do so in a reliable and content-controlled way. Private texts can be semantically structured in databases or knowledge graphs. Fine-tuning a model would require the creation of ad hoc annotated datasets for these documents, which would be time-consuming and resource-intensive. On the other hand, directly prompting LLMs without any tuning can lead to incorrect or inaccurate answers, and thus hallucinations. We therefore want an LLM to be able to answer questions by generating text grounded in the documents stored in a structured way. The aim is to extend current work focused on text generation, which currently only takes simple knowledge graphs into account.
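As a hedged sketch of the general idea (the tiny knowledge graph, the naive retrieval heuristic, and the prompt format below are all assumptions for illustration, and the actual LLM call is omitted), a question can be grounded by serialising retrieved triples into the prompt:

```python
# Grounding a question in a small knowledge graph (illustrative sketch only).
knowledge_graph = [
    ("ProjectAlpha", "led_by", "Dr. Rossi"),      # hypothetical private facts
    ("ProjectAlpha", "started_in", "2021"),
    ("Dr. Rossi", "member_of", "AI Lab"),
]

def retrieve(question, triples):
    # Naive retrieval: keep triples whose subject or object appears in the question.
    q = question.lower()
    return [t for t in triples if t[0].lower() in q or t[2].lower() in q]

def build_prompt(question, triples):
    facts = "\n".join(f"- {s} {p.replace('_', ' ')} {o}" for s, p, o in triples)
    return (f"Answer using only the facts below.\n\nFacts:\n{facts}\n\n"
            f"Question: {question}\nAnswer:")

question = "Who leads ProjectAlpha?"
print(build_prompt(question, retrieve(question, knowledge_graph)))
```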
15:00-16:00
Quantum computing is a recent computational framework that exploits unique properties of quantum mechanics. These properties enable some algorithms to achieve exponential speedups over their classical counterparts. This seminar will explore the application of this framework to image manipulation and discuss open problems, such as loading matrices into the quantum framework and implementing quantum convolutions.
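As a small, hedged illustration of the loading problem (amplitude encoding is only one of several possible encodings, and the 4x4 image below is an arbitrary toy example), classical data must first be packed into the amplitudes of a normalised quantum state:

```python
import numpy as np

# Amplitude encoding sketch: a 2^n-element vector of pixel intensities
# becomes the amplitude vector of an n-qubit state, so it must be
# L2-normalised. Preparing such a state efficiently on hardware is part of
# the open matrix-loading problem mentioned above.
image = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 grayscale image
amplitudes = image.flatten()
state = amplitudes / np.linalg.norm(amplitudes)    # ||state|| = 1
n_qubits = int(np.log2(state.size))                # 16 amplitudes -> 4 qubits
print(n_qubits, np.isclose(np.sum(state ** 2), 1.0))
```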
14:00-15:00
The Neural Networks (NN) field is rapidly advancing, achieving significant milestones annually. However, NN models are often overparameterized, with billions of parameters, requiring substantial memory and computational resources. These models consist of layers, such as fully connected or convolutional layers, containing large matrices due to overparameterization. Predictions involve intensive matrix operations, adding to the computational load. In this seminar, we will present the current state-of-the-art compression techniques, which can be divided into two families: lossy and lossless, that is, matrices can be compressed with or without loss of information. We will then discuss the open problems in this area and possible directions for future research.
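As one hedged example from the lossy family (truncated-SVD low-rank factorisation, chosen here only for illustration; the synthetic weight matrix below is an assumption built to have low effective rank, as overparameterised layers are often observed to have):

```python
import numpy as np

# Lossy compression sketch: replace a dense layer's weight matrix W (m x n)
# with two thin factors A (m x r) and B (r x n) via truncated SVD, trading
# a small approximation error for far fewer parameters.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 32)) @ rng.standard_normal((32, 256))
W += 0.01 * rng.standard_normal((512, 256))        # small perturbation

r = 32                                    # target rank
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]                      # shape (512, r)
B = Vt[:r, :]                             # shape (r, 256)

error = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"parameters: {W.size} -> {A.size + B.size}, relative error: {error:.3f}")
```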
15:00-16:00
Recent years have witnessed significant advancement in Artificial Intelligence (AI), particularly with the rise of Deep Neural Networks (DNNs) fueled by large datasets and increased model complexity. However, the demand for substantial computational resources poses challenges, especially in decentralized data scenarios. Edge Intelligence (EI), combining Edge Computing (EC) and AI, emerges as a transformative solution for decentralized learning, crucial in the era of IoT proliferation. Federated Learning (FL) and Knowledge Distillation (KD) represent prominent paradigms in decentralized learning, each with its unique challenges and opportunities. This seminar delves into the development of KD as a decentralized learning method, exploring its principles, unresolved challenges, and promising research avenues.
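As a minimal, hedged sketch of the distillation mechanism itself (the standard soft-target loss in PyTorch; the temperature, mixing weight, and random tensors below are illustrative assumptions, and the decentralized orchestration discussed in the seminar is not shown):

```python
import torch
import torch.nn.functional as F

# Knowledge distillation loss sketch: the student matches the teacher's
# softened output distribution (KL term) in addition to the ground-truth
# labels (cross-entropy term).
def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    soft_student = F.log_softmax(student_logits / T, dim=1)
    kd = F.kl_div(soft_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

student_logits = torch.randn(8, 10)          # batch of 8 examples, 10 classes
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels).item())
```

In decentralized settings the interesting questions are what plays the role of the teacher and how its outputs are shared, which is where the open challenges of the seminar lie.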
14:00-16:00
Distributed computing has a long history, spanning from the introduction of time-sharing techniques in the 1960s to the rise of cloud computing in the last two decades. However, the recent introduction of advanced technologies in areas such as IoT, autonomous vehicles, and Industry 4.0 demands a rethinking of cloud paradigms. In this seminar, we introduce two cutting-edge technologies in Cloud Computing: Serverless Computing and the Cloud Continuum. Serverless Computing allows developers to write and deploy code without managing the underlying infrastructure, enhancing flexibility and scalability. The Cloud Continuum extends cloud computing beyond traditional data centers, integrating computational resources distributed across the network, including cloud, fog, edge, and IoT. The main goal of this seminar is to provide insights into the potential implications and limitations of these two technologies in real-world scenarios. Furthermore, we explore key research directions and present current open challenges.
14:00-15:00
Adaptive BitRate (ABR) video streaming has significantly enhanced the landscape of video delivery by prioritizing user experience. The success of video transmission hinges on minimizing interruptions while ensuring compatibility with the user's device. Consequently, the development of novel technologies in this realm has become imperative, with a growing emphasis on performance measurement from a user-centric perspective. Techniques rooted in machine learning, cloud, fog, and edge computing have emerged as crucial enablers for maximizing bandwidth efficiency and reducing unnecessary waste. In this presentation, we aim to explore the multifaceted challenges and promising research avenues in adaptive video transmission, with a specific focus on enhancing user experience. We will showcase state-of-the-art techniques and research endeavors geared towards optimizing adaptive streaming. These efforts aim to minimize bandwidth waste and enhance transmission quality by continuously evaluating feedback from both network conditions and user interactions. Through this exploration, we endeavor to shed light on the evolving landscape of adaptive video transmission and its pivotal role in delivering seamless, high-quality viewing experiences to users.
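As a hedged illustration of such a feedback loop (a deliberately simple throughput-based bitrate selection rule; the bitrate ladder and safety factor below are assumptions, and real players combine throughput with buffer occupancy and other signals):

```python
# Throughput-based ABR selection sketch: pick the highest bitrate that fits
# within a conservative estimate of recent network throughput.
BITRATES_KBPS = [300, 750, 1500, 3000, 6000]   # hypothetical encoding ladder

def select_bitrate(throughput_samples_kbps, safety=0.8):
    # Harmonic mean is robust to short throughput spikes.
    n = len(throughput_samples_kbps)
    estimate = n / sum(1.0 / s for s in throughput_samples_kbps)
    budget = safety * estimate
    feasible = [b for b in BITRATES_KBPS if b <= budget]
    return feasible[-1] if feasible else BITRATES_KBPS[0]

print(select_bitrate([2500, 3200, 2800]))  # -> 1500
```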
15:00-16:00
In the era of Big Data, Similarity Search has become a fundamental paradigm for identifying relevant items, especially in fields like search engines and recommendation systems, where pinpointing similarities is key. Advances in neural networks have enabled the representation of objects as numerical vectors, simplifying Similarity Search into calculations of distances within vector spaces. Among the various techniques developed to address this challenge, graph-based methods stand out. These methods, which connect vectors through edges to facilitate the search process, have proven to be the most efficient and promising approach.
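As a hedged sketch of the greedy routing step that graph-based methods build on (a toy k-nearest-neighbour proximity graph in NumPy; real systems such as HNSW add hierarchical layers and more careful graph construction, omitted here):

```python
import numpy as np

# Greedy graph-based nearest-neighbour search (sketch): starting from an
# entry point, repeatedly move to the neighbour closest to the query until
# no neighbour improves the distance.
rng = np.random.default_rng(1)
vectors = rng.standard_normal((200, 16))

# Toy proximity graph: connect each vector to its 8 exact nearest neighbours.
dists = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=-1)
neighbours = np.argsort(dists, axis=1)[:, 1:9]

def greedy_search(query, entry=0):
    current = entry
    current_dist = np.linalg.norm(vectors[current] - query)
    while True:
        cand = neighbours[current]
        cand_dists = np.linalg.norm(vectors[cand] - query, axis=1)
        best = int(np.argmin(cand_dists))
        if cand_dists[best] >= current_dist:
            return current, current_dist          # local minimum reached
        current, current_dist = int(cand[best]), float(cand_dists[best])

query = rng.standard_normal(16)
idx, dist = greedy_search(query)
exact = int(np.argmin(np.linalg.norm(vectors - query, axis=1)))
print(idx, round(dist, 3), exact)   # greedy result vs. exact nearest neighbour
```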
14:00-15:00
Since approximately 80% of global trade is carried by sea, efficient monitoring of maritime traffic is crucial. Synthetic Aperture Radar (SAR) data, captured as satellite images, offers a promising approach for maritime applications like ship classification and speed estimation. However, SAR images often contain noise (such as speckle and sea clutter) and have low object resolution, hindering classification and detection. Traditional machine learning algorithms depend on manually crafted features, making it challenging to identify relevant features for low-resolution objects. Deep learning provides a powerful alternative by automatically learning these features, leading to improved model performance. This seminar focuses on the challenges associated with SAR data for marine traffic monitoring and explores potential solutions for image and model enhancement.
15:00-16:00
The tasks of finding particular common subsequences over a set of strings have drawn much interest over the years, with practical applications spanning from sequence alignment in bioinformatics to file comparison in tools such as git and diff. One of the most studied problems is finding the Longest Common Subsequence, which has been shown to be NP-hard in the general case of k strings. In this seminar we shall delve into a far less studied generalization of this problem, called Maximal Common Subsequences; we will play with some approaches to solving it and uncover many pitfalls that emerge from intuitive solutions. Finally, we will discuss together possible new practical applications that could make use of this relatively unknown problem.
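For reference during the discussion, a minimal sketch of the classical dynamic program for the two-string Longest Common Subsequence (the tractable k = 2 case; the hardness mentioned above concerns general k, and Maximal Common Subsequences require different machinery):

```python
# Classic O(|a| * |b|) dynamic program for the Longest Common Subsequence
# of two strings: dp[i][j] = length of the LCS of a[:i] and b[:j].
def lcs(a, b):
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ca == cb else max(dp[i - 1][j], dp[i][j - 1])
    # Backtrack to recover one longest common subsequence.
    out, i, j = [], len(a), len(b)
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("ABCBDAB", "BDCABA"))  # "BCBA", one LCS of length 4
```

Note that this recovers a single longest common subsequence, whereas the seminar is concerned with enumerating all maximal ones, which is where the intuitive approaches start to break down.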