2020 – Machine Learning + Security

The 2020 program plans to recruit 9 undergraduate students and engage them in research projects (some examples shown below). Selected REU students will receive the following support for the 10-week-long participation (May 25–July 31, 2020):

Stipend: $6,000
Pre-paid meal plan ($1,700 worth)
Free lodging at PSU’s dormitory (double occupancy)
Round-trip transportation (pre-approval required)
- Driving: up to $250
- Flying: up to $650

[UPDATE] In 2020, due to COVID-19, we ran a VIRTUAL REU Site program with 3 students as a pilot:

Rachael He @ U. Rochester, “Effects of Counterfactual Explanations on Detecting News Misinformation,” mentored by Aiping Xiong
Liam Boston @ Penn State, “Upskilling Rural Workers,” mentored by Ben Hanrahan => Penn State, Web Developer
Sophia Hager @ Smith College, “Detecting Medical Misinformation using Knowledge Graph,” mentored by Dongwon Lee => Johns Hopkins University, PhD Student

Each selected REU student may choose one project, mentored by 1-2 faculty members, working alone or as part of a team. The 2020 program, for instance, includes the following project ideas:

Fake News Mitigation (Dongwon Lee, Suhang Wang, Aiping Xiong): We currently witness an unprecedented proliferation of “fake news,” i.e., false stories meant to deceive or mislead people. Despite the urgency and importance of the problem, however, our understanding of fake news remains insufficient. We do not clearly understand who produces and publishes it, what characteristics distinguish fake news from legitimate news, why some people fall for fake news (and others do not), or how to best present detected fake news to users in a convincing manner. This project aims to answer some of these questions using machine learning models, especially supervised learning approaches.
Cognitive Models to Predict Frauds (Aiping Xiong): Many attacks exploit the limitations of human attention and memory. Cognitive science has long developed computational frameworks to describe such limitations. How can cognitive models be used to predict and simulate attacks, given a user interface or script that describes a human-human or human-computer interaction? This project will collect behavioral data in controlled settings, such as e-mail phishing attacks or post-transaction marketing scenarios, and develop cognitive models that can explain and predict what a computer user might pay attention to, what they will and will not recall, and how they will react, paving the way for predicting and preventing fraud. Students will contribute to building a prediction model using statistical inference.
Smart Contract Fraud (Dinghao Wu): As blockchain and smart contracts become more and more popular, it becomes important to detect potential cyber vulnerabilities and fraud in the execution of smart contracts. Real-world attacks and huge financial losses have been reported. This project investigates machine-learning-based methods to detect potential vulnerabilities in smart contracts, and to mitigate the problem with binary code analysis and formal methods.
Legitimate Domain Names ∩ Legitimate Friends (Aiping Xiong): Phishing attacks against social media are on the rise and keep evolving. Traditional security detection and protection mechanisms cannot provide users full protection, requiring that users make the final decisions. This multi-disciplinary research project aims to integrate phishing warning and training to address phishing scams on social media. One goal is to understand how human information-processing and conceptions of trust influence users’ clicks on URLs delivered by phishing messages via social media apps. Another is to develop scalable embedded-training warning notices to increase users’ knowledge and skills for counteracting social engineering attacks and reducing their online information disclosure. The project also proposes mechanisms to promote users’ sharing what they learned from security training with friends, to improve the cybersecurity and privacy of the online community.
Crowdsourcing and Misbehavior (Anna Squicciarini, Kenneth Huang, Ben Hanrahan): This project studies scenarios where misinformation is not generated by intentional deviant actors (e.g., trolls), but by loosely-informed crowdsource workers, who add “noise” to commonly reliable information channels. The research hypothesis is that misinformation from the “crowd” is highly dangerous and potentially more impactful than carefully crafted fake news sources. Using statistical and machine learning methods, the project plans to develop computational models to detect potential crowdsourcing workers and misinformation generated by such workers.
Astroturfing (Sencun Zhu, Suhang Wang): We are living in a world where reputation has become an economy, featuring the ubiquitous rating of everything, from e-commerce to the mobile ecosystem (apps), from Uber drivers and their passengers to physicians to teachers. However, is reputation accurately reflected by ratings, reviews, and popularity? Oftentimes it is not. Besides genuine ratings being subjective or biased, fake ratings/reviews are also prevalent these days. To reduce the harm caused by such manipulated information (e.g., installing low-quality apps or malware), this research will focus on detecting manipulated ratings/reviews/raters (so-called “astroturfing”) on mobile app stores using machine learning techniques and introducing market intervention.
Direct Manipulation of Algorithms (Ben Hanrahan): Although users are encountering increasingly complex algorithms that impact the security and usability of the systems, they are not well aware that a system is utilizing an algorithm and rarely understand how algorithms function. Making algorithms a more visible, central part of user interactions is important as users who do become aware of the presence of algorithms experience an increased feeling of control. However, aside from recognizing the current societal impacts of these algorithms, there is a gap in this discussion around the mechanics of exactly how users will understand and exercise control over these algorithms.
Privacy in Conversational Assistants (Anna Squicciarini): This research focuses on crowd-powered conversational assistants, which leverage human workers to collectively serve as personal assistants for users. One apparent concern is user privacy. Although users were explicitly informed that the system was operated by human workers, some users mentioned their sensitive personal information (e.g., phone number or address) to workers. This project proposes two approaches to protect users’ privacy: sensitive information detection using machine learning methods, and hiding content from workers.
Chatbot-Based Deception (Dongwon Lee, Kenneth Huang): As chatbots such as Apple’s Siri or Amazon’s Alexa are quickly gaining popularity, many people view them as one of the top consumer applications for AI. The gist of such chatbot-related technologies is an AI engine that simulates human-like conversation. If a human cannot tell whether she is conversing with another human or a chatbot, then such a chatbot has succeeded in simulating human-like conversation, and can rightfully claim that it has passed the Turing Test. In a reverse setting, then, we ask whether a machine can tell if one is conversing with another human or a chatbot. Such a setting may occur, for instance, in social engineering attacks, where naïve users converse with a chatbot that pretends to be a human. If one can detect whether the other party in a conversation is a human or a chatbot using machine learning techniques, one can alert such naïve users about potential dangers. This project will build a benchmark dataset for the reverse Turing test and develop machine learning models to classify human vs. machine artifacts.
Human Perceptions of Machine-Generated Narratives (Kenneth Huang): Disinformation is just one of many techniques for manipulating audiences online; there are other, more subtle techniques and campaigns that target the moral psychology of the audience, manipulate the social structure of online communities, and undertake a variety of social engineering measures to achieve strategic effects. The use of trolls, bot armies, and “cyborgs” (humans whose influence and reach are amplified by technical means) has reached epidemic proportions on social media platforms. In this project, we will study human perceptions of machine-generated stories and image captions to understand the potential malicious use of such technologies and possible defensive strategies against them.