2022 – Machine Learning + Security

[UPDATE]

2022 Cohort

Alex Adams @ CMU (Mentor: Hadi)
Connie Yun @ UIUC (Mentor: Kelley)
Eric Xing @ West Kentucky U. (Mentor: Dongwon)
Hafsa Sultana @ U. Michigan (Mentor: Taegyu)
Isaac Wasserman @ Haverford U. (Mentor: Sharon)
Marcos A. Castro Trossi @ U. Puerto Rico (Mentor: Aiping)
Qing Hu @ Villanova U. (Mentor: Daniel)
Spencer Renenger @ Cornell U. (Mentor: Hadi)
Theodore Steiner @ U. North Carolina (Mentor: Rui)
Thomas Foltz @ Penn State U. (Mentor: Huijuan)
Zachariah Farahany @ Marquette U. (Mentor: Jinghui)

The 2022 program, running fully VIRTUAL due to Covid-19, plans to recruit 10-12 undergraduate students and engage them in research projects on the topic of “Machine Learning in Cybersecurity.” Selected REU students will receive a stipend of $8,000 for 10 weeks of participation. Because the program runs virtually this year, there will be NO support for lodging, meals, or travel. Each invited REU student is matched to one project, is mentored by one or two faculty members, and works alone or as part of a team. The 2022 program includes the following project ideas:

1. Fairness in Human-AI Interaction (Hadi Hosseini): Fairness is a fundamental requirement in many AI technologies developed for collective decision making (e.g., resource allocation). A variety of fairness concepts, rooted in philosophy and economics, have been at the center of attention over the past few decades. The literature in AI and economics has extensively studied these fairness notions along with other socially desirable properties (e.g., social welfare and truthfulness). Yet, despite advances in the theoretical and algorithmic understanding of these concepts, little is known about what individuals perceive as fair. This project aims to investigate the perception of fair solutions and the interaction between humans and fair AI systems.

2. Privacy Ramifications of Police Radio Communications (Shomir Wilson): Police radio communications provide a novel window for understanding law enforcement activity. Transcripts of these communications lend themselves to studying sociodemographic issues in policing, including disparities in privacy, as these communications routinely contain personal information of bystanders and suspects. In this project, we will use natural language processing and machine learning to identify mentions of personal information in transcripts of police radio communications to determine how often they occur, the significance of the information shared, and disparities in information sharing based upon individuals’ race, gender, and other demographics.

3. Gathering and Integrating Textual Information About Privacy (Shomir Wilson): Privacy policies are the standard method for organizations to communicate their privacy practices on the web and for mobile apps. However, privacy-related text can be found in other documents too, such as terms of service documents and cookie policies. In this project, we will use natural language processing and machine learning to identify privacy-related text in documents of a variety of types collected from the web, with the goal of integrating privacy information from disparate sources and finding ways to share it with privacy researchers and consumers.

4. Robot Controller State Protector Generation Framework using Machine Learning (Taegyu Kim): Robots, such as quadruped walking robots, are increasingly deployed in domains such as entertainment, the military, and industry. However, they expose many attack surfaces, for example sensor spoofing and manipulation of controller states (e.g., a z-axis velocity). Researchers have proposed controller state defense techniques to protect against such attacks, but these techniques apply only to a particular robot model (e.g., a quadcopter, not a quadruped walking robot). To overcome this limitation, this project aims to develop a framework that uses machine learning to automatically generate a controller state protector for each specific robot model.

5. Learning to Fix Programs through Natural Language Processing Techniques (Rui Zhang): In this project, we will build a coding assistant system that detects vulnerabilities in computer programs and secures the software by fixing the bugs. To this end, we will use NLP-derived tools based on large pretrained language models fine-tuned on code data. We will use them to build a classifier that locates the code segment containing a bug, and then a generator that rewrites the code to fix it. Such a system could save substantial human effort and fix bugs before the code is compiled and run, helping programs operate correctly and securely.

6. Defending Against Malicious Attacks in Federated Learning (Jinghui Chen): Federated learning (FL) is a popular distributed machine learning paradigm that collaboratively trains a global model without sharing clients’ data. It has been widely applied in real-world applications, including keyword spotting, activity prediction on mobile devices, and smart sensing on edge devices. However, its repeated server-client communication leaves room for malicious attacks, such as backdoor or poisoning attacks, which aim to lower the model’s accuracy or mislead the model into a targeted misprediction. This project aims to find novel defense mechanisms for mitigating backdoors injected by one or more malicious clients during training. A basic understanding of machine learning and familiarity with Python and deep learning libraries would be helpful for carrying out this research.

7. An Ethical Framework for Human-Machine Decision-Making (Daniel Susser): Important decisions are increasingly automated, delegated to algorithms susceptible to bias and other flaws. To guard against problems associated with automated decisions, it is often suggested that there should be a “human-in-the-loop” (i.e., some form of human review), at least in the case of high-stakes decisions. But human decision-making is susceptible to its own biases and flaws, as well as to external influence. Today this includes a range of automated influences, such as targeted advertising, recommender systems, AI assistants, and digital nudges. While existing discussions tend to frame questions about these decision processes in binary terms (automated or not), this project aims to understand the normative implications of increasingly blended forms of human-machine decision-making. What ethics and policy questions are raised by decision-making systems that benefit from the strengths of both human and machine deciders, but are also subject to the weaknesses of each? How can ethics and policy guide a world in which individually and socially important decisions are reached by blended human-machine deciders?

8. Adversarial Images and Attacks (Sharon Huang): Adversarial images are images whose pixels have been intentionally perturbed to deceive deep-neural-network-based image classification models, so that the models make incorrect predictions, potentially with harmful consequences. In this project, we will investigate how adversarial image attacks work and how we can defend against them. We will implement several attacks using Python and a deep learning library such as TensorFlow, as well as an algorithm for detecting such attacks.

9. AI and Magical Thinking (Kelley Cotter): The proliferation of algorithms in everyday life coincides with increasing levels of awareness of and meaning-making around them. Yet, knowledge about algorithms remains limited and is not universally distributed. When people do not fully understand a technology, they often turn to “magical thinking,” ascribing magical qualities to a technology in order to explain its functioning and impacts. This has the potential to divorce people from the reality of a system’s imperfections and any potential for harm, repositioning it instead as a utopian source of hope and optimism. This project aims to investigate how magical thinking shapes people’s attitudes towards, reliance on, trust in, and, ultimately, behavior around algorithmic systems. The project explores what societal consequences people’s fetishization of algorithms might carry, particularly as mainstream society begins to seriously grapple with algorithm-related concerns like data privacy, algorithmic bias and discrimination, filter/fringe bubbles, and the spread of mis/disinformation online.

10. Video Anomaly Detection (Huijuan Xu): Video anomaly detection aims to temporally localize and identify activities that deviate from normal behavior in a sequence of video frames. It has received growing research interest due to its potential applications in autonomous surveillance systems (e.g., violence alerting). A typical solution is the frame-reconstruction-based approach, which first trains an unsupervised model on normal data and then flags test activities as anomalies when the trained model cannot reconstruct them well. In this project, we will experiment with this typical solution and explore a new few-shot setting for video anomaly detection that is closer to real-world applications. We will investigate the effects of pre-trained model weights and self-supervised auxiliary tasks on video anomaly detection in the few-shot data regime.

11. Effect of Counterfactual Explanations in Mitigating Misinformation on Social Media (Aiping Xiong): With the growth of machine learning (ML) usage in everyday settings, understanding ML models’ behavior and underlying decision-making is critical to increasing people’s trust in and acceptance of ML models. Explanations have been proposed to help users understand the labels of fake news articles detected by ML algorithms, thus mitigating the spread of misinformation on social media platforms. While those explanations reveal some details of the model-specific features, it is unclear whether users can understand those features and their impacts on the labeling of misinformation. Empirical evidence in the psychology literature indicates that humans prefer contrastive explanations in their everyday explanations. This project aims to examine the effect of different types of counterfactual explanations on the veracity evaluation of news headlines by comparing them to other methods. Students will contribute to designing counterfactual explanations and conducting human-subject experiments to evaluate the proposed explanations.

12. Neural Authorship Obfuscation (Dongwon Lee): Recent advances in deep learning have enabled the synthetic generation of artifacts of astonishing quality in different modalities (e.g., text, image, video). Deepfakes are one such example. As more methods become able to generate realistic-looking text, a natural question emerges: “can we identify the authorship of machine-generated texts among k possible language models as authors?” This is the so-called Authorship Attribution (AA) problem. Further, “is it possible to hide the authorship of machine-generated texts via masking part of texts?” This is the Authorship Obfuscation (AO) problem. In this summer research, we will devise algorithms and run experiments to answer both research questions. A basic understanding of machine learning and familiarity with Python would be helpful but are not required.