Giuseppe Manco is Director of Research at the Institute of High Performance Computing and Networks of the National Research Council of Italy. His research interests include User Profiling and Behavioral Modeling, Social Network Analysis, Information Propagation and Diffusion, Recommender Systems, and Machine Learning for Cybersecurity.
Expert in data science, data analytics and enabling technologies for data analytics. Interested in new frontiers of Computer Science and Technology aimed at analyzing complex Big Data. Co-founder of Open Knowledge Technologies (OKT), a spin-off company of the University of Calabria aimed at bringing innovation from academia to industry on the specific topics of Artificial Intelligence and Cybersecurity.
PhD in Computer Science, 2001
University of Pisa
MSc in Computer Science
University of Pisa
I am the scientific coordinator of the ICAR research group Behavioral Modeling and Scalable Analytics (formerly ADALab: Laboratory of Advanced Analytics on Complex Data).
The main focus of the research group is Behavior Computing and Analytics: that is, computationally efficient mathematical models for analysing complex systems and the entities which interact within them. Examples include individuals, IoT devices and sensors, Smart Objects, etc. Behavior analytics is an important topic in several contexts, including consumer profiling, social computing, computational advertising and group decision making, cybersecurity, opinion modeling, and smart industry. The term “Behavior” refers to the actions and reactions of any individual in response to various stimuli or inputs. The recent advent of technologies for collecting and tracking behavioral data at large scale has made it possible to devise new mathematical models for analysing, understanding, and predicting actions. These include models for event streams, social network connections, purchasing habits and opinion formation. The main challenge is hence to understand the structure and evolution dynamics of events, in a way that discloses the latent mechanisms which govern them and enables prediction over both the short and long term.
The research agenda includes the study of probabilistic generative models and statistical inference, deep representation learning, and constraint-based modeling (which encompasses the integration of symbolic and subsymbolic learning). In particular, our focus is on the following themes.
Sharing threat events and Indicators of Compromise (IoCs) enables quick and crucial decision making about effective countermeasures against cyberattacks. However, current threat information sharing solutions do not allow easy communication and knowledge sharing among threat detection systems, in particular Intrusion Detection Systems (IDSs) exploiting Machine Learning (ML) techniques. Moreover, the interaction with the expert, an important component for gathering verified and reliable input data for the ML algorithms, is weakly supported. To address these issues, we propose ORISHA, a platform for ORchestrated Information SHaring and Awareness that enables cooperation among threat detection systems and other information awareness components. ORISHA is backed by a distributed Threat Intelligence Platform based on a network of interconnected Malware Information Sharing Platform instances, which enables communication with several Threat Detection layers belonging to different organizations. Within this ecosystem, Threat Detection Systems mutually benefit from sharing knowledge that allows them to refine their underlying predictive accuracy. Uncertain cases, i.e. examples with low anomaly scores, are proposed to the expert, who acts as the oracle in an Active Learning scheme. By interfacing with a honeynet, ORISHA enriches the knowledge base with further positive attack instances, thereby yielding more robust detection models. Experiments conducted on a well-known Intrusion Detection benchmark demonstrate the validity of the proposed architecture.
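The Active Learning scheme described above can be rendered as a minimal sketch: examples with low anomaly scores (the uncertain cases, per the abstract) are routed to the expert, whose labels are folded back into the training set. All names and the threshold are illustrative, not ORISHA's actual API.

```python
# Hypothetical sketch of the Active Learning loop: low-score examples are
# sent to the expert oracle; labeled examples refine the training set.
# select_for_expert, incorporate_labels and the threshold are illustrative.

def select_for_expert(scores, threshold=0.5):
    """Return indices of examples whose anomaly score falls below the
    threshold; these are the uncertain cases proposed to the expert."""
    return [i for i, s in enumerate(scores) if s < threshold]

def incorporate_labels(training_set, examples, expert_labels):
    """Add the expert-labeled examples back into the training set, so the
    detection model can be retrained on verified data."""
    training_set.extend(zip(examples, expert_labels))
    return training_set

queries = select_for_expert([0.9, 0.3, 0.8, 0.1])  # indices 1 and 3
```

In a full loop, the retrained detector would produce new scores and the selection step would repeat, with honeynet-sourced attack instances added as further positives.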
In this paper we propose a survival factorization framework that models information cascades by tying together social influence patterns, topical structure and temporal dynamics. This is achieved through the introduction of a latent space which encodes: (a) the relevance of an information cascade to a topic; (b) the topical authoritativeness and the susceptibility of each individual involved in the information cascade; and (c) temporal topical patterns. By exploiting the cumulative properties of the survival function and of the likelihood of the model on a given adoption log, which records the observed activation times of users and side information for each cascade, we show that the inference phase is linear in the number of users and in the number of adoptions. The evaluation on both synthetic and real-world data shows the effectiveness of the model in detecting the interplay between topics and social influence patterns, which ultimately provides high accuracy in predicting users' activation times.
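As a toy illustration of the survival-analysis machinery the framework builds on, the sketch below evaluates a survival function and a log-likelihood of activation times under a constant (exponential) hazard; in the actual model, the hazard is factorized through the latent user/topic factors rather than being a single constant.

```python
import math

# Toy exponential-hazard version of the survival machinery; the paper's
# model replaces the constant rate with a latent-factor parameterization.

def survival(t, rate):
    """S(t) = P(activation time > t) under an exponential hazard."""
    return math.exp(-rate * t)

def log_likelihood(activation_times, rate):
    """Log-likelihood of observed activation times, summing
    log f(t) = log(rate) - rate * t over the adoption log."""
    return sum(math.log(rate) - rate * t for t in activation_times)
```

The single pass over the activation times in `log_likelihood` mirrors the linearity in the number of adoptions claimed in the abstract.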
We propose ARN, a semisupervised anomaly detection and generation method based on adversarial reconstruction. ARN exploits a regularized autoencoder to optimize the reconstruction of variants of normal examples with minimal differences, which are recognized as outliers. The combination of regularization and adversarial reconstruction helps to stabilize the learning process, which results in both realistic outlier generation and substantial detection capability. Experiments on several benchmark datasets show that our model improves on the current state-of-the-art by substantial margins because of its ability to model the true boundaries of the data manifold.
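The detection side of ARN rests on reconstruction-based scoring: examples the model reconstructs poorly receive high anomaly scores. The sketch below illustrates only that principle, using a trivial stand-in "model" that reconstructs every point as the training mean; ARN replaces this with a regularized, adversarially trained autoencoder.

```python
# Minimal reconstruction-error scoring sketch. The mean-reconstruction
# "model" is a deliberate stand-in for ARN's autoencoder.

def fit_mean(train):
    """Per-dimension mean of the training data (the stand-in model)."""
    dims = len(train[0])
    return [sum(x[d] for x in train) / len(train) for d in range(dims)]

def anomaly_score(x, mean):
    """Squared reconstruction error of x under the stand-in model."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mean))

train = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
mean = fit_mean(train)                      # [1.0, 1.0]
score_normal = anomaly_score([1.0, 1.0], mean)   # 0.0
score_outlier = anomaly_score([9.0, 9.0], mean)  # 128.0
```

Points near the data manifold score low, points far from it score high; ARN's contribution is learning that manifold boundary tightly while also generating realistic outliers near it.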
The development of platforms and techniques for emerging Big Data and Machine Learning applications requires the availability of real-life datasets. A possible solution is to synthesize datasets that reflect the patterns of real ones using a two-step approach: first, a real dataset X is analyzed to derive relevant patterns Z; then, such patterns are used to reconstruct a new dataset X′ that preserves the main characteristics of X. This survey explores two possible approaches: (1) constraint-based generation and (2) probabilistic generative modeling. The former is devised using inverse frequent itemset mining (IFM) techniques, and consists of generating a dataset satisfying given support constraints on the itemsets of an input set, typically the frequent ones. For the latter approach, recent developments in probabilistic generative modeling (PGM) are explored that model the generation as a sampling process from a parametric distribution, typically encoded as a neural network. The two approaches are compared by providing an overview of their instantiations for the case of discrete data and discussing their pros and cons.
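A drastically simplified rendering of the probabilistic-generative route for discrete (transactional) data: estimate per-item Bernoulli parameters from the real dataset X, then sample a synthetic X′ from them. A real PGM would encode a far richer joint distribution, typically via a neural network; the independent-Bernoulli model here is purely illustrative.

```python
import random

# Illustrative two-step synthesis: derive patterns (here, item supports)
# from X, then sample X' from the resulting parametric distribution.

def estimate_item_probs(X, n_items):
    """Fraction of transactions containing each item (marginal supports)."""
    return [sum(t[i] for t in X) / len(X) for i in range(n_items)]

def sample_dataset(probs, n_rows, rng):
    """Sample a synthetic 0/1 dataset from independent Bernoulli items."""
    return [[1 if rng.random() < p else 0 for p in probs]
            for _ in range(n_rows)]

X = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 0, 1]]
probs = estimate_item_probs(X, 3)          # [1.0, 0.25, 0.5]
X_prime = sample_dataset(probs, 100, random.Random(0))
```

By construction X′ preserves the expected item supports of X; the constraint-based (IFM) alternative would instead enforce support constraints on given itemsets exactly, which is what makes the two approaches interesting to compare.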
A Bayesian generative model is presented for recommending interesting items and trustworthy users to targeted users in social rating networks with asymmetric and directed trust relationships. The proposed model is the first unified approach to the combination of the two recommendation tasks. Within the devised model, each user is associated with two latent-factor vectors, i.e., her susceptibility and expertise. Items are also associated with corresponding latent-factor vector representations. The probabilistic factorization of the rating data and trust relationships is exploited to infer user susceptibility and expertise. Statistical social-network modeling is instead used to constrain the trust relationships from one user to another to be governed by their respective susceptibility and expertise. The inherently ambiguous meaning of unobserved trust relationships between users is suitably disambiguated. Intensive comparative experiments on real-world social rating networks with trust relationships demonstrate the superior predictive performance of the presented model in terms of RMSE and AUC.
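The latent-factor mechanics can be sketched in a few lines: ratings and directed trust are both scored through inner products of the vectors named in the abstract (susceptibility, expertise, item factors). Dimensions, values, and the use of a plain inner product as the scoring rule are illustrative; the actual model is a full Bayesian factorization.

```python
# Illustrative latent-factor scoring. Vector names follow the abstract;
# the 2-dimensional values and the plain dot-product rule are made up.

def dot(u, v):
    """Inner product of two latent-factor vectors."""
    return sum(a * b for a, b in zip(u, v))

susceptibility = {"u1": [0.9, 0.1], "u2": [0.2, 0.8]}
expertise = {"u1": [0.3, 0.7], "u2": [0.6, 0.4]}
item = [0.8, 0.3]

# A rating couples the user's susceptibility with the item's factors.
rating_u1 = dot(susceptibility["u1"], item)
# Directed trust u1 -> u2 couples u1's susceptibility with u2's expertise,
# mirroring the constraint described in the abstract.
trust_u1_to_u2 = dot(susceptibility["u1"], expertise["u2"])
```

The asymmetry of trust falls out naturally: swapping the roles of susceptibility and expertise gives a different score for u2 → u1 than for u1 → u2.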
Modeling information cascades in a social network through the lens of the ideological leaning of its users can help in understanding phenomena such as misinformation propagation and confirmation bias, and in devising techniques for mitigating their toxic effects. In this paper we propose a stochastic model to learn the ideological leaning of each user in a multidimensional ideological space, by analyzing the way politically salient content propagates. In particular, our model assumes that information propagates from one user to another if both users are interested in the topic and ideologically aligned with each other. To infer the parameters of our model, we devise a gradient-based optimization procedure maximizing the likelihood of an observed set of information cascades. Our experiments on real-world political discussions on Twitter and Reddit confirm that our model is able to learn the political stance of social media users in a multidimensional ideological space.
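The core modeling assumption above can be made concrete with a toy propagation probability: content spreads from u to v when both are interested in the topic and close in the ideological space. The exponential-decay alignment and the product form below are illustrative choices, not the paper's exact parameterization.

```python
import math

# Toy rendering of "propagates if interested and aligned": alignment
# decays with Euclidean distance in the multidimensional ideological
# space; the functional forms are illustrative.

def alignment(pos_u, pos_v):
    """Ideological alignment, decaying with distance between users."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(pos_u, pos_v)))
    return math.exp(-d)

def propagation_prob(interest_u, interest_v, pos_u, pos_v):
    """Probability that content on this topic spreads from u to v."""
    return interest_u * interest_v * alignment(pos_u, pos_v)

p_close = propagation_prob(1.0, 1.0, [0.0, 0.0], [0.0, 0.0])  # aligned pair
p_far = propagation_prob(1.0, 1.0, [0.0, 0.0], [5.0, 0.0])    # distant pair
```

Inference then amounts to adjusting the user positions and interests so that observed cascades (where propagation did happen) become likely, which is what the gradient-based likelihood maximization in the paper does.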
According to the smart manufacturing paradigm, the analysis of assets' time series with a machine learning approach can effectively prevent unplanned production downtimes by detecting assets' anomalous operational conditions. To support smart manufacturing operators with no data science background, we propose an anomaly detection approach based on deep learning and aimed at providing a manageable machine learning pipeline and easy-to-interpret outcomes. To do so, we combine (i) an autoencoder, a deep neural network able to produce an anomaly score for each provided time series, and (ii) a discriminator based on a general heuristic, to automatically discern anomalies from regular instances. We demonstrate the convenience of the proposed approach by comparing its performance against Isolation Forest in different case studies addressing industrial laundry assets' power consumption and bearing vibrations.
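The heuristic discriminator in step (ii) can be sketched as a data-driven threshold over the autoencoder's anomaly scores. The mean-plus-k-standard-deviations rule below is one common such heuristic, used here only as an assumption; the paper's exact rule may differ.

```python
import statistics

# Hypothetical discriminator over autoencoder anomaly scores: flag as
# anomalous any instance whose score exceeds mean + k * std of the batch.
# The rule and k=2.0 are illustrative assumptions.

def flag_anomalies(scores, k=2.0):
    """Return indices of instances whose score exceeds the threshold."""
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)
    threshold = mu + k * sigma
    return [i for i, s in enumerate(scores) if s > threshold]

scores = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 5.00]
anomalies = flag_anomalies(scores)   # only the last instance stands out
```

The appeal for operators without a data science background is that the threshold is computed automatically from the scores themselves, so no per-asset tuning is required.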
Variational autoencoders have proven successful in domains such as computer vision and speech processing. Their adoption for modeling user preferences is still largely unexplored, although it has recently started to gain attention in the literature. In this work, we propose a model which extends variational autoencoders by exploiting the rich information present in the past preference history. We introduce a recurrent version of the VAE where, instead of passing a subset of the whole history regardless of temporal dependencies, we pass the consumption sequence through a recurrent neural network. At each time step of the RNN, the sequence is fed through a series of fully connected layers, the output of which models the probability distribution of the most likely future preferences. We show that handling temporal information is crucial for improving the accuracy of the VAE: our model beats the current state-of-the-art by substantial margins because of its ability to capture temporal dependencies in the user consumption sequence using the recurrent encoder, while keeping the fundamentals of variational autoencoders intact.
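The decoding step described above can be illustrated in isolation: the recurrent encoder's hidden state at a time step is mapped through a linear layer and a softmax to a probability distribution over the likely next items. The weights and dimensions below are illustrative stand-ins, not the trained model.

```python
import math

# Illustrative decoding step of the recurrent VAE: hidden state -> linear
# layer -> softmax over the item catalog. All values are made up.

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def next_item_probs(hidden, W, b):
    """Distribution over the next items given the RNN hidden state."""
    logits = [sum(h * w for h, w in zip(hidden, row)) + bi
              for row, bi in zip(W, b)]
    return softmax(logits)

hidden = [0.5, -0.2]                       # state after the sequence so far
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # one row per catalog item
b = [0.0, 0.0, 0.0]
probs = next_item_probs(hidden, W, b)      # distribution over 3 items
```

In the full model this step sits inside the variational framework: the hidden state parameterizes an approximate posterior, and decoding is trained against the standard VAE objective.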