2021 – Machine Learning + Security

[UPDATE]

2021 Program Schedule

Final Presentation (15 REU Students)

Melanie McCord @ New College of Florida (Mentor: Suhang): Covid Dashboard: A Website to Detect and Explain Misinformation
David Persaud @ New York Institute of Technology (Mentor: Aiping): The Effects of AI Explanations on the Human Mitigation of Misinformation => NYU, MS Student
Jasmine Mangat @ University of Massachusetts Amherst (Mentor: Anna): Data Collection for Personalized and Non-personalized Privacy-Sensitive Image Identification
Trysten Hess @ Oregon State University (Mentor: Ting): Simple Private Ride Sharing Protocol => UCSB, MS Student
Paul Nguyen @ Reed College (Mentor: Kenneth): Anonymity in the Ubuntu Dialogue Corpus => University of Wisconsin, PhD Student
Tanner Waltz @ University of Notre Dame (Mentor: Dongwon): Examining the Similarities between Neural Text Generation Models for Authorship Attribution => Purdue University, PhD Student
Joseph McCalmon @ Wake Forest University (Mentor: Dongwon): Comprehensible Abstract Policy Graphs for Policy-Level Explanations of Reinforcement Learning Agents, AAMAS 2022 (full paper) => Start-Up
Edward Burke @ Penn State University (Mentor: Linhai): Utilizing Machine Learning to Identify Unsafe Rust Code
Maggie Wu @ Amherst College (Mentor: Sarah): Unveiling Narratives with Unsupervised Learning Approaches => Fulbright Fellow, Taiwan
Vishweshwar Ramanakumar @ University of Florida (Mentor: Dinghao): Comparing LSTM and GNN models for attribute inferencing
Laura Gonzalez @ University of Puerto Rico Rio (Mentor: Sencun): Mobile App Store Rating Manipulation Detection with Machine Learning
Austin Hayes @ University of Maryland, College Park (Mentor: Hadi): Envy-freeness in Many-to-Many Matching Markets
Jordyn Dennis @ Penn State University (Mentor: Sarah): Social Implications of Cyber-Physical Threats to Critical Infrastructure in Megacities
Maria Rodriquez @ Arizona State University (Mentor: Ben): Human Direct Manipulation of Algorithms
Nathan Ganger @ University of Mount Union (Mentor: Hadi): Refugee Matching

The 2021 program, running fully VIRTUAL due to Covid-19, plans to recruit 10-12 undergraduate students and engage them in research projects on the topic of “Machine Learning in Cybersecurity.” Selected REU students will receive a stipend of ~~$6,000~~ $8,000 for the 10 weeks of participation. This year, as the program runs virtually, there will be NO support for lodging, meals, or travel.

Each selected REU student may choose one project, mentored by 1-2 faculty members, working alone or as part of a team. The 2021 program, for instance, includes the following project ideas:

#1: Strategic Manipulation in Multi-Agent Systems (Hadi Hosseini): With the ever-increasing deployment of distributed AI technologies and online marketplaces, much attention is being drawn to decision making in multi-agent settings to promote societal values such as fairness and truthfulness. This area has gained popularity with the rise of a variety of applications in ride-sharing platforms, gig economies, and online charitable organizations that require fair distribution of resources. Yet, most of these applications are prone to strategic manipulation by the participating agents. This project aims to investigate the strategic aspect of decision making in multi-agent settings with the goal of devising algorithmic solutions that are fair and incentivize truthful behavior.

#2: Social Media Indicators of Attack on Cyber-Physical Infrastructure (Sarah Rajtmajer): Cyberattacks on increasingly connected cyber-physical critical infrastructure have become a key concern for the national security community. Recent incidents have heightened these worries, as key adversaries have gained unauthorized access to US networks. Early work suggests the promise of social media indicators as correlates of cyberattacks on cyber-physical infrastructure. This project will explore approaches that use social media to forecast cyberattacks.

#3: Crowdsourcing Detection of Disinformation in Social Media (Sarah Rajtmajer): As disinformation campaigns associated with state and non-state actors gain sophistication, timely detection is increasingly difficult. Work is emerging to understand where human intuition succeeds and fails to recognize these campaigns. This project will explore the potential of creative assembly of user feedback (e.g., markets, gamification) for detection of disinformation campaigns on social media.

#4: Effect of Counterfactual Explanations in Mitigating Misinformation on Social Media (Aiping Xiong): With the growth of machine learning (ML) usage in everyday settings, understanding ML models’ behavior and underlying decision-making is critical to increasing people’s trust in and acceptance of ML models. Explanations have been proposed to help users understand the labels of fake news articles detected by ML algorithms, thus mitigating the spread of misinformation on social media platforms. While those explanations reveal some details of the model-specific features, it is unclear whether users can understand those features and their impacts on the labeling of misinformation. Empirical evidence in the psychology literature indicates that humans prefer contrastive explanations in everyday settings. This project aims to examine the effect of counterfactual explanations in the context of veracity evaluation of news headlines by comparing them to other methods, e.g., fact-checking warning tags. Students will contribute to designing counterfactual explanations and human-subject experiments to evaluate the proposed explanations.

#5: Identifying Unsafe Rust Code Using Machine Learning (Linhai Song): Rust is a relatively new programming language that has recently been widely adopted for building safety-critical software. Rust separates its source code into safe code and unsafe code and conducts strict compiler checks to rule out memory bugs in safe code, leaving the correctness of unsafe code solely to programmers. A recent empirical study shows that all memory bugs in Rust involve unsafe code, and Rust’s safe code is really safe. Unfortunately, the safe/unsafe information is discarded by the Rust compiler when it compiles Rust programs into binary executables. However, knowing which parts of the binary code are unsafe has many security implications (e.g., guiding the detection of vulnerabilities in Rust executables). This project explores how to use machine learning to identify unsafe code in Rust binaries and improve bug/vulnerability detection for Rust.

#6: Direct Manipulation of Algorithms (Ben Hanrahan): Although users are encountering increasingly complex algorithms that impact the security and usability of systems, they are often unaware that a system is using an algorithm and rarely understand how algorithms function. Making algorithms a more visible, central part of user interactions is important, as users who do become aware of the presence of algorithms experience an increased feeling of control. However, aside from recognizing the current societal impacts of these algorithms, there is a gap in this discussion around the mechanics of exactly how users will understand and exercise control over these algorithms.

#7: Human Perception of Machine-Generated Narratives (Kenneth Huang): Disinformation is just one of many techniques for manipulating audiences online; there are other, more subtle techniques and campaigns that target the moral psychology of the audience, manipulate the social structure of online communities, and undertake a variety of social engineering measures to achieve strategic effects. The use of trolls, bot armies, and “cyborgs” (humans whose influence and reach are amplified by technical means) has reached epidemic proportions on social media platforms. In this project, we will study human perceptions of machine-generated stories and image captions to understand the potential malicious use of such technologies and possible defensive strategies against it.

#8: Privacy in Conversational Assistants (Anna Squicciarini): This research focuses on crowd-powered conversational assistants, which leverage human workers to collectively serve as personal assistants for users. One apparent concern is user privacy. Although users were explicitly informed that the system was operated by human workers, some users mentioned their sensitive personal information (e.g., phone number or address) to workers. This project proposes two approaches to protect users’ privacy: sensitive information detection using machine learning methods, and hiding content from workers.
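As a rough illustration of the "detect, then hide" idea in project #8, the sketch below masks phone numbers in a user message before it would be shown to a worker. It swaps in a simple regular expression for the machine learning detector the project proposes, and the names `PHONE_PATTERN` and `redact` are hypothetical, not part of any project code.

```python
import re

# Assumption: US-style 10-digit phone numbers such as 555-123-4567.
# A real detector would be a trained model covering many more kinds
# of sensitive information (addresses, names, account numbers, ...).
PHONE_PATTERN = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")


def redact(message: str) -> str:
    """Return the message with detected phone numbers replaced by a tag."""
    return PHONE_PATTERN.sub("[PHONE]", message)


if __name__ == "__main__":
    print(redact("Please call me back at 555-123-4567 after 5pm."))
```

The worker would then see only the redacted text, while the original message stays with the system.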

#9: Smart Contract Fraud (Dinghao Wu): As blockchains and smart contracts become more popular, it becomes important to detect potential cyber vulnerabilities and fraud in the execution of smart contracts. Real-world attacks and huge financial losses have been reported. This project investigates machine-learning-based methods to detect potential vulnerabilities in smart contracts, and to mitigate the problem with binary code analysis and formal methods.

#10: Astroturfing (Sencun Zhu): We live in a world where reputation has become an economy, featuring the ubiquitous rating of everything, from e-commerce to mobile ecosystems (apps), from Uber drivers and their passengers to physicians to teachers. However, is reputation accurately reflected by ratings, reviews, and popularity? Oftentimes it is not. Besides ratings being subjective or biased, fake ratings/reviews are also prevalent these days. To reduce the harm caused by such manipulated information (e.g., installing low-quality apps or malware), this research will focus on detecting manipulated ratings/reviews/raters (so-called “astroturfing”) on mobile app stores using machine learning techniques and introducing market intervention.

#11: Fake News Mitigation (Suhang Wang): We currently witness an unprecedented proliferation of “fake news,” i.e., false stories meant to deceive or mislead people. Despite the urgency and importance of the problem, however, our understanding of fake news remains insufficient. We do not clearly understand who produces and publishes it, what characteristics distinguish fake news from legitimate news, why some people fall for fake news (and others do not), or how to best present detected fake news to users in a convincing manner. This project aims to answer some of these questions using machine learning models, especially supervised learning approaches.

#12: Ridesharing in Whisper (Ting Wang): Real-time ride sharing (a.k.a. dynamic carpooling) is a service that arranges one-time shared rides on very short notice. Like carpooling, real-time ride sharing is promoted to better utilize the empty seats in most passenger cars, thus lowering fuel usage and transport costs. Yet, to receive such services, the passengers inevitably have to expose their desired travel routes to others, which incurs significant privacy risks. In this project, we are building a privacy-preserving ride sharing system, in which the passengers do not reveal their travel routes unless a sufficient number of other passengers plan to take the same or similar trajectories. Specifically, we plan to build the system upon Private Set Intersection, a primitive enabling the computation of set intersection in a privacy-preserving manner. While designing and building the system, we will address a set of non-trivial challenges, including providing rigorous privacy protection, scaling the system to a large number of users, and integrating the system with a social network platform.
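To make the Private Set Intersection primitive in project #12 more concrete, here is a minimal sketch of one classic construction, a two-party Diffie-Hellman-style PSI. It is not the project's protocol: the threshold condition ("enough passengers") is not modeled, both parties are simulated in one process, the modulus is a toy-sized prime that is not secure, and all names (`hash_to_group`, `new_secret`, `private_intersection`, the route strings) are hypothetical.

```python
import hashlib
import math
import secrets

# Assumption: toy modulus, the Mersenne prime 2**127 - 1.
# It is far too small for real use and chosen only for readability.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Map an item (e.g. a route label) to an integer in [1, P-1]."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def new_secret() -> int:
    """Pick a random exponent coprime to P-1, so blinding is one-to-one."""
    while True:
        k = secrets.randbelow(P - 3) + 2  # uniform in [2, P-2]
        if math.gcd(k, P - 1) == 1:
            return k


def blind(items, key):
    """First pass: compute H(x)^key mod P for each item."""
    return [pow(hash_to_group(x), key, P) for x in items]


def reblind(values, key):
    """Second pass: raise already-blinded values to another secret key."""
    return [pow(v, key, P) for v in values]


def private_intersection(alice_routes, bob_routes):
    """Simulate both parties; return Alice's routes that Bob also holds."""
    a, b = new_secret(), new_secret()
    # Each side sends only blinded values, never the routes themselves.
    alice_blinded = blind(alice_routes, a)      # Alice -> Bob
    bob_blinded = blind(bob_routes, b)          # Bob -> Alice
    # Exponentiation commutes: (H(x)^a)^b == (H(x)^b)^a mod P.
    alice_double = reblind(alice_blinded, b)    # done by Bob, order kept
    bob_double = set(reblind(bob_blinded, a))   # done by Alice
    return [r for r, d in zip(alice_routes, alice_double) if d in bob_double]


if __name__ == "__main__":
    alice = ["campus->airport", "downtown->mall", "campus->stadium"]
    bob = ["downtown->mall", "suburb->campus", "campus->airport"]
    print(private_intersection(alice, bob))
```

Each party learns which of its own routes match, while non-matching routes stay hidden behind the other party's secret exponent. A deployed system would need a properly sized group, a hardened hash-to-group step, and an extension for the multi-passenger threshold described above.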

#13: Attacking Fake/Fraud Detection Models (Dongwon Lee): The security research community has developed many state-of-the-art machine learning models that can accurately detect diverse types of cyber frauds and fakes (e.g., fake news detector, social-bot detector, phishing email classifier). However, recently, a new type of attack, adversarial learning, has emerged, successfully demonstrating that it is in fact possible (and fairly easy) to fool such detection models into flipping their verdicts (e.g., predicting fake as true or vice versa). This summer project investigates this adversarial learning problem in depth. A basic understanding of machine learning and familiarity with Python would be helpful to carry out this research.

#14: Machine vs. Human: Turing Test (Dongwon Lee): The recent advancement in Deep Learning has enabled the synthetic generation of various artifacts with astonishing quality in different modalities (e.g., text, image, video). Deepfakes are one such example. As more methods become able to generate realistic-looking artifacts, a natural question emerges: “can we differentiate machine-generated vs. human-generated artifacts?”, the so-called Turing Test (TT). In this summer research, therefore, we plan to investigate the problems, issues, and algorithms involved in accurately differentiating machine-generated artifacts from human-generated ones. A basic understanding of machine learning and familiarity with Python would be helpful to carry out this research.