Workshop Program “The Way Forward: Future Challenges in Software Engineering”
Thank you for joining us. You can now access and download the speakers' presentations here.
08:30 - 09:00 - Welcome and Registration
09:00 - 09:30 – Introduction – Chair: Juncal Alonso, TECNALIA
09:30 - 10:30 – Security in the Computing Continuum – Chair: Mark Miller, Conceptivity
- Cyber-Physical Systems: Attack and Defence – Martin Higgins, University of Oxford
The objective of this talk is to inform the audience about the risks associated with cyber-physical systems. Cyber-physical systems are systems where computing networks interact directly with the real world. A good example is the power system, in which distribution, transmission and physical assets interact with software, computing and IT through the SCADA network. Of particular interest are deception attacks, in which the attack vector is hidden. These attacks are gaining prominence and can feature in a number of cyber-physical systems, including power systems, water networks, self-driving vehicles and industrial control systems. The talk might garner significant industrial interest, as attacks like Stuxnet have caused significant damage to cyber-physical systems in the past.
- Security-by-Design methodologies and security metrics – Valentina Casola, University of Napoli Federico II
This talk concerns modeling and assessing security through Service Level Agreements (SLAs) with a quantitative approach.
- Usage of AI to automate human-related cyber attacks – Francesco Morano, CEFRIEL
This talk presents a Proof of Concept (PoC) for a semi-automatic phishing attack that uses Artificial Intelligence (AI) and discusses the usage of AIs to automate human-related cyber attacks. The PoC uses different network types to automatically compose highly targeted phishing emails with information gathered from the initial OSINT analysis of potential victims. We aim to explore the potential of AI in a full OpSec attack stack, simulating an all-out attack from a cyber criminal's perspective. Using AI tools, we implement a complete attack chain that includes collecting victims' data through OSINT, generating the phishing email body using GPT-2, and creating a graphic that mimics the actual organization's brand identity using other models. Our study helps penetration testers and red teams build targeted phishing simulations more rapidly and provides a methodological approach to social engineering attacks. We also discuss the effectiveness of tools like ChatGPT in the current era of automated phishing using generative AIs. According to the Verizon 2022 Data Breach Investigations Report, phishing remains the most common human element in data breaches, accounting for 82% of cases. By approaching the problem from a cyber criminal's point of view, we aim to understand the feasibility of such an attack tactic and prepare countermeasures.
10:30 - 11:00 – Coffee break
11:00 - 13:00 – Software Engineering and AI – Chair: John Favaro, Trust-IT
- Evaluating and validating hybrid AI algorithms and asserting their adherence to an ethical and legal overarching approach to trustworthiness – Nuria Quintano Fernández, TECNALIA
Context and objectives: Development of hybrid and explainable AI models for industrial use. Analysis and development of tool-chains and metrics for explaining, evaluating and validating hybrid AI algorithms and asserting their adherence to an ethical and legal overarching approach to trustworthiness. This will be exemplified using real-world industrial use cases in the Robotics domain (collaboration between human and robots for logistics activities) and Space domain (failure detection for satellites) to promote the widespread adoption of hybrid AI in industrial settings.
The overarching ethical and legal approach is based on the Value Sensitive Design (VSD) methodology: the end users and the technology developers work together from the beginning of the engineering cycle to understand and define the ethical and trustworthiness characteristics. For the engineering team and other relevant stakeholders to understand whether those characteristics will be present as expected or required when the system is operating, they need to quantify their current and future presence. For this, we aim to generate a statistical model that provides, with a sufficient degree of confidence, information about the cause-effect relationship between ethics and trustworthiness lead and lag KPIs throughout the engineering lifecycle. Moreover, providing general evaluation mechanisms to understand and quantify confidence in AI models will increase the adoption of AI technology in industrial environments, specifically in safety-critical industries.
- Hybrid AI
- Quantifying what matters, at low (e.g. algorithm) and high (e.g. sub-system/system) levels.
- Confidence & explainable AI models to be used in industrial contexts
Importance: Focus on what matters regarding expected or required ethical and trustworthiness performance values, based on the market context as well as on the AI system's expected use, expected confidence, and model understandability.
Potential impact: Begin to pave the way for understanding the performance of AI and AI-based systems from different viewpoints, and for approaching the conditions of the future EU AI Act regulation, throughout the AI software engineering lifecycle.
- Software Engineering for Machine Learning, some first experiences – Luciano Baresi, Politecnico di Milano
While we are all convinced that Machine Learning can greatly help many different software engineering activities, from requirements elicitation to software testing and maintenance, there are probably some doubts about whether software engineering can help improve machine learning-based systems. This talk tries to answer this question without asking ChatGPT, by presenting some first experiences carried out in our research group. The presentation will discuss some first attempts at healing machine learning models, at improving the deployment and operation of federated learning systems, and at provisioning heterogeneous resources to ease and optimise the training and inference phases of machine learning-based systems. The final goal is to identify some recurring problems and some open issues for a plausible research agenda.
- MLOps the hard way: a journey on integrating AI and software engineering – Michele Ciavotta, University of Milano – Bicocca
Machine learning (ML) projects are challenging to automate and operationalize, especially when they involve collaboration between software engineers and data scientists. I will share my personal experience of working on a complex ML project that required integrating AI and software engineering principles and tools. I will discuss the main challenges and lessons learned from applying MLOps practices to manage the ML life cycle from experimentation to production.
- On the evaluation of binary classifiers for Software Engineering – Luigi Lavazza, Università degli Studi dell'Insubria
AI techniques (mainly Machine Learning and Neural Networks) support the construction of prediction models.
Among these are binary classifiers (BCs), i.e., models that, given a phenomenon described via a set of features, classify it in one of two classes (e.g., "positive" or "negative").
Binary classifiers are widely used in SE to predict interesting characteristics of code modules (classes, methods, functions, etc.). For instance, several models have been proposed for predicting defectiveness or maintenance difficulty. BCs are also widely used in other fields related to software construction, e.g., for evaluating the vulnerability of code.
For both researchers and practitioners, a recurring problem is the evaluation of BCs' accuracy. Evaluation can be absolute (is the BC "good enough"?) or relative (which of the available BCs is most accurate?). To support these kinds of evaluations, many accuracy indexes (aka "performance metrics") have been proposed and are widely used.
Accuracy indexes have specificities (and sometimes flaws) that are largely ignored. Unfortunately, using accuracy indexes without considering their characteristics can lead to incorrect conclusions concerning the actual accuracy of BCs, with possibly serious consequences.
In this talk, we will see the most widely used accuracy indexes and their characteristics. We will also examine the consequences of the uninformed use of accuracy indexes. We can expect that research aiming at creating better BCs will be very active in the near future; hence, proper evaluation of BCs will be needed. In this respect, it is important to raise awareness of the limitations of the current proposals and possibly suggest new, better ways of representing classification accuracy. To this end, the use of "cost" indicators, directly linked to the use of BCs in the real world, is promising. Therefore, the talk will provide some recommendations on how to approach BC evaluation, also with reference to the consequent costs.
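As an illustration of how an accuracy index can mislead, the following minimal Python sketch computes a few widely used indexes and a simple cost indicator from a confusion matrix. It is not taken from the talk: the data set (100 modules, 5 of them defective), the always-negative classifier, and the cost weights are hypothetical values chosen only for the example.

```python
import math


def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for boolean labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return tp, fp, tn, fn


def accuracy_indexes(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, F1 and MCC (0.0 when undefined)."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


def misclassification_cost(fp, fn, cost_fp, cost_fn):
    """Simple cost indicator: weighted count of the two kinds of error."""
    return fp * cost_fp + fn * cost_fn


# Hypothetical data: 100 modules, 5 defective (True), 95 not defective.
actual = [True] * 5 + [False] * 95
# A trivial classifier that labels every module as not defective.
always_negative = [False] * 100

tp, fp, tn, fn = confusion_counts(actual, always_negative)
indexes = accuracy_indexes(tp, fp, tn, fn)
# Assumed weights: a missed defect costs ten times a false alarm.
cost = misclassification_cost(fp, fn, cost_fp=1, cost_fn=10)

for name, value in indexes.items():
    print(f"{name}: {value:.2f}")
print(f"cost: {cost}")
```

On this imbalanced data the trivial classifier reaches an accuracy of 0.95 while its recall, F1 and MCC are all 0 and every defective module is missed, which is the kind of discrepancy that a single index hides and a cost indicator exposes.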
- Development of an IoT2cloud Operating System (ICOS) – Marina Giordanino, Stellantis
The unstoppable proliferation of novel computing and sensing device technologies and the ever-growing demand for data-intensive applications in the edge and cloud are driving a paradigm shift in computing around the dynamic, intelligent, and yet seamless interconnection of IoT, edge, and cloud resources in one single computing system to form a continuum. Many research initiatives have focused on deploying a sort of management plan intended to properly manage the continuum. Simultaneously, several solutions exist aimed at managing edge and cloud systems, though they do not suitably address the whole set of continuum challenges. The next step is, without doubt, the design of an extended, open, secure, trustable, adaptable, technology-agnostic, and much more complete management strategy, covering the full continuum, i.e., IoT-to-edge-to-cloud, with a clear focus on the network connecting the whole stack, leveraging off-the-shelf technologies (e.g., AI, data, etc.), but also being open to accommodate novel services as technological progress goes on. The ICOS project aims at covering the set of challenges that come up when addressing this continuum paradigm by proposing an approach embedding a well-defined set of functionalities, ending up in the definition of an IoT2cloud Operating System (ICOS). Indeed, the main objective of the ICOS project is to design, develop, and validate a meta-operating system for a continuum by addressing the challenges of: i) device volatility and heterogeneity, continuum infrastructure virtualisation, and diverse network connectivity; ii) optimized and scalable service execution and performance, as well as resource consumption, including power consumption; iii) guaranteed trust, security, and privacy; and iv) reduction of integration costs and effective mitigation of cloud provider lock-in effects, in a data-driven system built upon the principles of openness, adaptability, data sharing, and a future edge market scenario for services and data.
- AI-SPRINT: Design and Runtime Framework for Accelerating the Development of AI Applications in the Computing Continuum - Francesco Lattari, Politecnico di Milano
Artificial Intelligence (AI) and edge computing have recently emerged as major trends in the ICT industry. Enterprise applications increasingly make intensive use of AI technologies and are often based on multiple components running across a computing continuum. However, the technologies and software development solutions in use are heterogeneous, evolve quickly, and are still a challenge for researchers and practitioners. Indeed, a lack of solutions tailored for AI applications is observed in the areas of application placement and design space exploration with performance guarantees, both under-developed. The aim of the AI-SPRINT “Artificial intelligence in Secure PRIvacy-preserving computing coNTinuum” project is to develop a framework composed of design and runtime management tools to seamlessly design, partition and operate AI applications among the current plethora of cloud-based solutions and AI-based sensor devices (i.e., devices with intelligence and data processing capabilities), providing resource efficiency, performance, data privacy, and security guarantees. AI-SPRINT is intended to accelerate the development of AI applications whose components are spread across the edge-cloud computing continuum, while allowing application performance to be traded off against AI model accuracy. This is accomplished by the thorough suite of design tools provided by the AI-SPRINT framework, which exposes a set of programming abstractions with the goal of hiding as much of the computing continuum complexity as possible, while further providing a simple interface to define the desired constraints that guide the application design. The communication across components is hidden and the parallelization of the compute-intensive part of the application is transparently implemented.
AI applications are enriched by quality of service (QoS) annotations to express performance, accuracy, and security constraints that can be evaluated to drive the application deployment. The suite of annotations enables setting requirements in terms of execution times, for single or multiple application components, data-flow rate, and secured executions. Specialized annotations are dedicated to AI components exploiting deep neural networks (DNN) for the automatic partitioning of the models across edge-cloud computational layers, through the exploitation of the Open Neural Network Exchange (ONNX) format, and for splitting the computation based on the existence of early exits in the networks. AI-SPRINT design annotations additionally enable defining components with multiple implementations to automatically design alternative deployments with degraded performance, which can be selected based on the requirements. Furthermore, a specific QoS annotation enables the automatic detection of data drift at runtime by triggering the deployment of an ad-hoc tool, running in parallel to the application workflow, whose task is to detect changes in the time series of user-defined metrics, monitored by the AI-SPRINT Monitoring Subsystem (AMS). The drift detection enables the collection of new data to retrain the DNN-based models, thus adapting to the new distribution. Abstractions are the very first step of the AI-SPRINT design workflow, which involves key tools for the automatic generation of Docker images for the containerized execution of application components (TOSCARIZER), for the automatic profiling of application performance (OSCAR-P), and for the automated design space exploration to provide the optimal production deployments (SPACE4AI-D).
Design tools, further enriched with tools for neural architecture search (NAS), federated learning, and privacy-preserving model execution, make AI-SPRINT an advanced framework that fills known technology gaps by providing solutions to efficiently develop and deploy AI enterprise applications across the computing continuum.
13:00 - 14:10 – Lunch
14:10 - 15:30 – Software Engineering for Quantum Computing – Chair: Elisabetta Di Nitto, Politecnico di Milano
- The Quantum Frontier of Software Engineering and its Adoption – Dario Di Nucci, University of Salerno
Quantum computing is no longer only a scientific interest but is rapidly becoming an industrially available technology that can potentially overcome the limits of classical computation. Over the last few years, all major companies have provided frameworks and programming languages that allow developers to create quantum applications. This shift has led to the definition of a new discipline called quantum software engineering, which is required to define novel methods for engineering large-scale quantum applications.
This talk presents the challenges developers experience when dealing with quantum technology and how academia is trying to solve them.
On the one hand, I will showcase a taxonomy of the purposes for which quantum technologies are used and the results of our interviews to elicit developers' opinions on the current adoption and challenges of quantum programming.
On the other hand, I will report on the results of a systematic mapping study we conducted to understand the current state of quantum software engineering research, aiming to identify the most investigated topics, the main reported results, and the most studied quantum computing tools and frameworks.
- The Role of Formal Probabilistic Models in Quantum Computing – Vlado Stankovski, University of Ljubljana
In many potential applications of quantum computing, there is the promise of being able to solve complex problems that cannot be efficiently tackled by existing supercomputers. This talk will explore the possibility of information management in quantum computing via formal probabilistic models. The whole idea is to enable two new concepts in quantum computing: sincerity and sensitivity.
- Quantum Optimization: current trends and open opportunities – Eneko Osaba, TECNALIA
Context: Quantum optimization has generated a profound impact in recent years. How to implement novel quantum solvers or how to introduce quantum methods into already-existing classical pipelines or algorithms is currently attracting great interest. In this regard, the fast advances in hardware technology and the democratization of its access have made research take off, especially in the optimization branch. Regarding application fields, transportation, finance, energy, and medicine are some examples of areas where quantum optimization can contribute to notable scientific advancements. Even so, research in the application of quantum techniques to industrial use cases cannot ignore the state of the hardware. Current quantum computers suffer from certain limitations that directly affect their capability and performance. The current state of quantum computing is known as the "noisy intermediate scale quantum" (NISQ) era. Quantum devices available in this NISQ era are characterized by not being reliable enough to deal with large problems.
Goal: The goal of our talk is to provide an overview of the current trends in quantum optimization. We will talk about quantum-gate-based approaches as well as those related to quantum annealing. We will highlight the main strengths and weaknesses of each paradigm, spotlighting current open challenges. We will align these concerns with the principles and commitments described in the "Talavera Manifesto for Quantum Software Engineering and Programming". In this regard, it should be noted that the quantum computing research community holds complementary interests, presumably as a result of individual areas of knowledge or interest: i) practitioners coming from industrial and applied research groups, who are mostly concerned with quantum computing-based formulations and experiments over more realistic scenarios; ii) researchers coming from quantum physics, usually interested in analyzing the hardware performance and reliability; and iii) researchers with backgrounds in traditional artificial intelligence, involved in testing the limits of QC, comparing results, and leveraging fundamentals, heuristics, or shareable-across-platform knowledge. So, one of the main challenges is to build a united community and establish a strong research avenue, which will lay the foundations that will guide research in the coming years.
Besides that, we will focus especially on quantum-classical hybrid approaches. In the NISQ era, hybrid solving pipelines are receiving outstanding attention from the community since they are arguably the near-term future in the context of quantum optimization and its application in real-world use cases. A lot of research is being conducted in this direction, presenting novel methods and avant-garde applications. In this sense, although hybrid approaches have undeniable potential, they also raise some crucial open questions that must guide the related research in the following years. As a final reflection, we should be aware that hybrid approaches are not a way to circumvent the limited quantum resources but a way of boosting performance in real-world use cases. Hybrid approaches will be here forever, and quantum procedures will play a role as well as classical methods. Learning how to get the most out of each of them will be the key to success.
- Quantum Annealing as the analog version of Quantum Computing - Paolo Cremonesi, Politecnico di Milano
Context: The quantum computing community has shown significant interest in quantum annealing due to its status as commercially available hardware with the largest number of qubits, far surpassing "digital" hardware based on the gate model.
In quantum annealing, the solution to a computational problem is encoded as a vast array of physical parameters that define the initial configuration and dynamic evolution of the quantum annealer. Consequently, the final state of the system represents the solution to the given problem. Despite their potential computational power, quantum annealers bypass the traditional notion of algorithms and require programmers to understand the principles of quantum mechanics that drive the behavior of the quantum annealer.
Goal: We will explore the architecture of quantum annealers, highlighting the main open challenges in their usage. We will describe the programming frameworks and software tools specifically tailored to support quantum annealers. Additionally, considerations related to performance analysis, algorithmic design, and integrating quantum annealing into existing computing infrastructure will be discussed. Finally, we will provide examples of how quantum annealers can be used as hardware accelerators in some machine learning problems.
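To make the idea of encoding a problem as parameters rather than as an algorithm more concrete, here is a minimal Python sketch, not taken from the talk. It assumes the common QUBO (quadratic unconstrained binary optimization) formulation, writes a small max-cut instance as a table of coefficients, and then finds the lowest-energy assignment by classical brute force, which merely stands in for the annealer. The function names and the four-node graph are hypothetical.

```python
from itertools import product


def maxcut_qubo(edges):
    """Build QUBO coefficients whose energy equals minus the cut size.

    Each edge (i, j) contributes 2*x_i*x_j - x_i - x_j, which is -1 when
    the edge is cut (x_i != x_j) and 0 otherwise.
    """
    q = {}
    for i, j in edges:
        i, j = min(i, j), max(i, j)
        q[(i, i)] = q.get((i, i), 0) - 1
        q[(j, j)] = q.get((j, j), 0) - 1
        q[(i, j)] = q.get((i, j), 0) + 2
    return q


def energy(q, x):
    """Evaluate the QUBO objective for a binary assignment x."""
    return sum(coeff * x[i] * x[j] for (i, j), coeff in q.items())


def brute_force_minimum(q, num_vars):
    """Classical stand-in for the annealer: try all 2**num_vars assignments."""
    return min(product((0, 1), repeat=num_vars), key=lambda x: energy(q, x))


# Hypothetical instance: a four-node cycle 0-1-2-3-0.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
qubo = maxcut_qubo(edges)
best = brute_force_minimum(qubo, num_vars=4)
best_energy = energy(qubo, best)

print("coefficients:", qubo)
print("lowest-energy assignment:", best, "energy:", best_energy)
```

The programmer only supplies the coefficient table; the optimal partition (alternating nodes, cutting all four edges) emerges as the minimum-energy state. Exhaustive search is feasible here only because the instance has four variables.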
15:30 - 16:00 – Coffee break
16:00 - 17:30 – Sustainable Software Engineering and final discussion – Chair: David Wallom, University of Oxford
- Extreme and Sustainable Graph Processing for Urgent Societal Challenges in Europe - Radu Prodan, University of Klagenfurt
The use, interoperability, and analytical exploitation of graph data are essential for the European data strategy. Graphs or linked data are crucial to innovation, competition, and prosperity and establish a strategic investment in technical processing and ecosystem enablers. Graphs are universal abstractions that capture, combine, model, analyze, and process knowledge about real and digital worlds into actionable insights through item representation and interconnectedness. For societally relevant problems, graphs are extreme data that require further technological innovations to meet the needs of the European data economy. For example, a study by IBM revealed that the world generates nearly 2.5 quintillion bytes of financial data daily, posing extreme analytics challenges. The complexity, diversity, and data multilingualism lead to highly complex graph operations when encoding media, news, and social-networking messages. Digital graphs help pursue the United Nations Sustainable Development Goals by enabling better value chains, products, and services for more profitable or green investments in the financial sector and deriving trustworthy insight for creating sustainable communities. All science, engineering, industry, economy, and society-at-large domains can leverage graph data for unique analysis and insight, but only if graph processing becomes easy to use, fast, scalable, and sustainable.
The presentation discusses the software engineering challenges in the research and development of a next-generation high-performance, scalable, gender-neutral, secure, and sustainable platform for multilingual information processing and reasoning based on the massive graph representation of extreme data. Massive graphs, unifying general graphs, knowledge graphs, and property graphs, integrate patterns and store interlinked descriptions of objects, events, situations, concepts, and semantics. The graph processing platform must support the “volume” graph challenge by supporting up to billions of vertices and trillions of edges. It must tackle the “velocity” graph challenge of dynamically changing topologies and introduce a novel “viridescence” graph challenge for sustainable processing at scale. Its support for extreme data must extend existing graph processing technological capabilities by orders of magnitude for at least one “v”-characteristic.
The presentation describes the software engineering design of an open-source graph processing toolkit of five open-source software tools and FAIR graph datasets covering the sustainable lifecycle of processing extreme data as massive graphs. The tools focus on holistic usability (from extreme multilingual data ingestion and massive graph creation), automated intelligence (through analytics and reasoning), performance modeling, and environmental sustainability tradeoffs supported by credible data-driven evidence across HPC systems and computing continuum. The automated operation based on the emerging serverless computing paradigm protected by state-of-the-art cybersecurity measures must support experienced and novice stakeholders from a broad group of large and small organizations to capitalize on extreme data through massive programming and processing.
The large-scale graph analytics tools market is still in a developing phase, hampered by the lack of technology research and industrial adoption. The presentation concludes by introducing several complementary use cases that require and benefit from massive graph processing, considering their extreme data properties and coverage of the three sustainability pillars: economy, society, and environment. It discusses how powerful graph-based analytics allows the European green financial investment, automotive, and media industries to accelerate at supercomputing speed and gain a competitive advantage, with evidence of improved performance and sustainability.
- Commonalities, Themes, and Take-Home Messages - David Wallom, University of Oxford
In this final wrap-up, the participants will collectively take the measure of the workshop and its common emerging threads and results in a highly interactive session.
Thank you for joining us. All the presentations are now available here.