Security and Privacy Undergraduate Research (SPUR) Scholars Program
Carnegie Mellon's Security and Privacy Undergraduate Research (SPUR) Scholars program is an opportunity for undergraduate students to spend a summer working with some of the world's leading Security and Privacy faculty researchers. A number of projects are available in diverse areas such as hardware security, usable privacy and security, software testing, program analysis, cryptography, and verification. Accepted students will work closely with CMU faculty and researchers on research problems with the potential for publication and significant impact on the future practice of software engineering and related areas.
Summer 2026 will be the first year of SPUR, but the program is modeled after our successful and long-running Research Experiences for Undergraduates in Software Engineering (REUSE) program. Past REUSE participants have joined top Ph.D. programs (including CMU, Berkeley, and UW), published in major conferences, and won prestigious honors like the NSF Graduate Research Fellowship. We are excited to see what you will accomplish with SPUR!
What will you do?
- Conduct cutting-edge research in security and privacy. You will be matched with an appropriate research project and work with an expert mentoring team to bring the project to fruition. For more information, see the list of eligible Research projects below.
- Spend 10 weeks doing research with Carnegie Mellon University's #1 ranked School of Computer Science.
- Receive mentoring from world leaders in their fields.
- Learn research skills in undergraduate seminars throughout the summer.
SPUR Projects
The following REUSE research projects planned for summer 2026 fall under the SPUR Scholars program. For a full list of REUSE projects, please visit the Research page.
A Conversational Privacy Assistant for the Internet of Things
Description and Significance
The objective of this project is to empower people to better understand and control what information about them is being collected by Internet of Things (IoT) technologies commonly found in cities today. The project revolves around a collaboration between Carnegie Mellon University (CMU), the California State University, Long Beach (CSULB) and the City of Long Beach. It builds on and extends CMU's IoT Privacy Infrastructure [DDS+18] to increase people's awareness of IoT technologies around them and their data practices, and also provides them with interfaces that enable them to exercise privacy rights such as those mandated by the California Consumer Privacy Act (CCPA). As part of this research, Carnegie Mellon University's team has adopted a user-centric design approach to developing, evaluating and refining new functionality designed to increase people's awareness of and control over data collected about them. This includes the adoption of Authorized Agent functionality intended to significantly reduce user burden when it comes to exercising privacy rights such as opting out of the sale of one's data, requesting a copy of data collected about oneself or requesting the deletion of one's data. This new functionality will be made available to users through an IoT Assistant app available both in the iOS app store and the Google PlayStore.
While some elements of the functionality offered in our IoT Assistant app lend themselves best to visual presentation, we believe there are also significant opportunities to enhance the app with dialog functionality where, using speech, users can explore what is being collected about them, for what purpose, and what rights they might have available to restrict some of these practices. Here again, we propose to take a user-centric approach to prototyping, evaluating and refining GenAI functionality designed to explore this area. The REU student working on this part of the project will work alongside the CMU PI, Norman Sadeh, and other members of the team to contribute to the design, evaluation and refinement of this GenAI functionality. This will include designing study protocols aimed at informing and refining the design of this functionality, as well as developing APIs that can be accessed by the GenAI to answer relevant questions the user might have.
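As a rough illustration of the kind of API such a GenAI assistant might call, the sketch below exposes one tool that answers questions about nearby data collection. The function name, the device registry, and the schema are all hypothetical assumptions for illustration, not the project's actual interface:

```python
import json
from typing import Optional

# Hypothetical backend data; a real implementation would query the
# IoT Privacy Infrastructure's registry of nearby data-collecting devices.
DEVICE_REGISTRY = [
    {"name": "intersection camera", "data": "video", "purpose": "traffic analysis"},
    {"name": "parking sensor", "data": "occupancy", "purpose": "parking management"},
]

def list_nearby_data_practices(data_type: Optional[str] = None) -> str:
    """Tool endpoint the assistant can call: returns JSON describing what
    nearby IoT devices collect, optionally filtered by the kind of data."""
    matches = [d for d in DEVICE_REGISTRY
               if data_type is None or d["data"] == data_type]
    return json.dumps(matches)

# Function-calling schema advertised to the language model so it can
# decide when to invoke the tool during a spoken dialog.
TOOL_SCHEMA = {
    "name": "list_nearby_data_practices",
    "description": "List IoT data collection practices near the user.",
    "parameters": {
        "type": "object",
        "properties": {"data_type": {"type": "string"}},
    },
}
```

In a full system, answers returned by such endpoints would be grounded in the actual infrastructure rather than a static list, and the dialog layer would decide when to surface privacy-rights options to the user.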
Accelerated Software Testing
An Internet of Things Privacy Assistant
Description and Significance
The objective of this project is to empower people to better understand and control what information about them is being collected by Internet of Things (IoT) technologies commonly found in cities today. The project revolves around a collaboration between Carnegie Mellon University (CMU), the California State University, Long Beach (CSULB) and the City of Long Beach. It builds on and extends CMU's IoT Privacy Infrastructure to increase people's awareness of IoT technologies around them and their data practices, and also provides them with interfaces that enable them to exercise privacy rights such as those mandated by the California Consumer Privacy Act (CCPA). As part of this research, Carnegie Mellon University's team has adopted a user-centric design approach to developing, evaluating and refining new functionality designed to increase people's awareness of and control over data collected about them. This includes the adoption of Authorized Agent functionality intended to significantly reduce user burden when it comes to exercising privacy rights such as opting out of the sale of one's data, requesting a copy of data collected about oneself or requesting the deletion of their data. This new functionality will be made available to users through an IoT Assistant app available both in the iOS app store and the Google PlayStore.
Our research has shown that different people have different expectations and preferences about the data collection and use practices they want to be notified about, including the frequency of these notifications. Our research has also shown that selective notifications and nudges can go a long way in motivating users to engage with the privacy choices available to them and to take advantage of privacy rights made available to them by different vendors. As part of the research we plan to conduct over the summer, we propose to implement and evaluate different configurations of notification and nudging functionality. The REU student working on this part of the project will work alongside the CMU PI, Norman Sadeh, and other members of the team to contribute to the design, evaluation and refinement of notification and nudging functionality. This could also include functionality to review what data about oneself is likely to have been collected by different entities over the past hour or the past 24 hours, using different filters to zoom in on data practices one is particularly interested in or concerned about.
Applied cryptography for ID checking, aggregating sensitive data, and image processing
BorrowSanitizer: Finding Ownership Bugs in Large-Scale Rust and C/C++ Applications
Generative AI's risks to identity verification
Improving the Usability of Formal Verification
Large-Scale Analysis of Telegram Bot Source Code
Description and Significance
Telegram, initially a messaging app, has grown into a software infrastructure supporting many web services and roughly 1 billion users. In particular, any Telegram account can deploy programmable apps, Telegram bots, to help accelerate its services. For instance, bots can process payments, authenticate users, and serve as customer service agents. Deploying bots is simple: it’s just a matter of writing a Telegram bot script and running it on one’s machine or in the cloud – there are barely any checks or moderation. This convenience lowers barriers to entry for software development, but comes at a cost. First, while most bots are benign, cybercriminals also use bots to automate and scale up their operations. For example, they can process payments for illegally obtained goods or host malicious AI endpoints (e.g., producing non-consensual images) [1]. Second, we suspect that some Telegram bots are poorly implemented, leaving them vulnerable to security flaws. Indeed, some security vulnerabilities have already been reported, such as missing authorization checks [2] and malicious code injection [3].
In this project, we will leverage the fact that some bots make their code public (for transparency and reusability purposes) to perform a large-scale analysis of Telegram bot source code. First, we will compile a list of open-source Telegram bots and download their GitHub repositories. Given that there is no central bot directory, we will 1) utilize our dataset of 800 million Telegram messages to extract GitHub URLs (which we expect to yield at least a few hundred repositories), 2) perform keyword searches via the GitHub API, and 3) find third-party websites that feature Telegram bot repositories. Second, we will identify bot interactions. For example, we can extract URLs in each repository to identify the types of services bots often interact with (e.g., third-party payment providers, AI endpoints).
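As a rough illustration of the first data-collection step, GitHub repository URLs could be pulled out of raw message text with a simple pattern match. The regular expression and input format below are illustrative, not the project's actual pipeline:

```python
import re

# Match URLs of the form https://github.com/<owner>/<repo>;
# owner and repo names may contain letters, digits, '_', '-', and '.'.
GITHUB_REPO_RE = re.compile(
    r"https?://github\.com/([\w.-]+)/([\w.-]+)", re.IGNORECASE
)

def extract_repos(messages):
    """Return the deduplicated set of (owner, repo) pairs mentioned
    across a collection of message texts."""
    repos = set()
    for text in messages:
        for owner, repo in GITHUB_REPO_RE.findall(text):
            # Drop trailing sentence punctuation captured by the pattern.
            repos.add((owner, repo.rstrip(".")))
    return repos
```

Deduplication matters at this scale: the same repository is typically advertised in many messages, so the set of distinct repositories is far smaller than the number of URL mentions.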
Third, we will perform static analysis to detect security vulnerabilities, defects, and low-quality or inefficient code. We may use LLMs to scale up our program analysis across different programming languages. We expect this research to inform users of the potential risks associated with Telegram bots and to establish better guidelines for bot developers. Our findings will also contribute to the understanding of how bots can be exploited for malicious purposes (e.g., through forking existing benign repositories).
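To give a flavor of the kind of static checks involved, here is a toy Python sketch that flags two classic bot issues: calls to eval/exec (a code-injection risk) and hardcoded bot tokens (which hand over the bot if the repository is public). The token pattern only approximates Telegram's token format, and real analysis would be far broader and span multiple languages:

```python
import ast
import re

# Approximate shape of a Telegram bot token: numeric bot id, a colon,
# then a 35-character secret. This is an illustrative heuristic only.
TOKEN_RE = re.compile(r"\b\d{8,10}:[A-Za-z0-9_-]{35}\b")

def audit_source(source: str):
    """Return a list of human-readable findings for a Python source file."""
    findings = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # eval()/exec() on attacker-controllable input enables code injection.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) \
                and node.func.id in {"eval", "exec"}:
            findings.append(f"line {node.lineno}: call to {node.func.id}")
        # String constants that look like bot tokens leak credentials.
        if isinstance(node, ast.Constant) and isinstance(node.value, str) \
                and TOKEN_RE.search(node.value):
            findings.append(f"line {node.lineno}: hardcoded bot token")
    return findings
```

Checks like these are cheap to run across hundreds of repositories; LLM-assisted analysis would then be reserved for the subtler, language-spanning defects that simple patterns miss.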
Student Involvement
This project is multidisciplinary, spanning data science, computer security, and software engineering. The first phase involves data collection: students will leverage our 800-million-message Telegram dataset and collect data using the GitHub API. The second phase focuses on large-scale source code analysis, using techniques ranging from basic data science to program analysis (potentially through LLMs). The ultimate goal is to submit a paper to a computer security, measurement, or software engineering conference, so students will experience the whole research process, from literature review and experimentation to paper writing. An introductory level of computer security knowledge is preferred.
References
[1] Taro Tsuchiya, Haoxiang Yu, Tina Marjanov, Alice Hutchings, Nicolas Christin and Alejandro Cuevas. Bots as Infrastructure: A Large-Scale Study of Benign and Malicious Uses on Telegram. 2025. In submission.
[2] TeploBot - Telegram Bot for WP <= 1.3 - Telegram Bot Token Disclosure https://www.cve.org/CVERecord?id=CVE-2024-9627
[3] unzip-bot Allows Remote Code Execution (RCE) via archive extraction, password prompt, or video upload https://www.cve.org/CVERecord?id=CVE-2024-53992
Microarchitectural Attacks and Defenses
Modern piracy channels
Description and Significance
Back in the 1990s, USENET newsgroups were a hotbed of free-flowing discussion – think of them as the decentralized predecessors of Reddit. Essentially, anybody could run a news server and participate in a distributed forum infrastructure. (Among modern platforms, Mastodon is perhaps the closest to this.) Today, regrettably, text-based newsgroups are moribund and appear to be mostly used for spam. On the other hand, a somewhat late innovation – the ability for newsgroups to support binary files – seems to have given a second life to USENET as a semi-decentralized way to offer pirated content (movies, TV series, and video games in particular). This phenomenon appears to have been studied little, if at all, in the academic literature. The goal of this project is to remedy this situation by performing a quantitative analysis of how USENET is being used in 2025: we want to find ways to characterize the data available from binary newsgroups in terms of both storage and transit (i.e., what content is available and in what quantity, how much is transferred every day, and by what means). A secondary objective is to verify the hypothesis that text-based newsgroups are a ghost town.
Student Involvement
This project combines network security and measurement. The first phase will involve a complete qualitative description of how USENET binary groups are used to deliver pirated content: which newsgroups are involved, and what infrastructure supports this content delivery. The second phase will involve setting up a measurement apparatus to get as exhaustive a view as possible of what is available on the network, potentially sampling some of the offerings to determine whether they are in fact what is advertised, or whether they include malware. The third phase – a stretch goal – will be to think about how to measure how much traffic flows to these newsgroups every day. Experience in Python and SQL is desirable but not required.
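Binary USENET content is commonly indexed by NZB files: XML documents listing the article segments that make up each posted file, with per-segment byte counts. Summing those counts is one minimal building block for the storage estimate the project describes. A sketch, assuming standard NZB files (real NZBs carry more metadata, such as subjects, groups, and posting dates):

```python
import xml.etree.ElementTree as ET

# Namespace used by the standard NZB file format.
NZB_NS = "{http://www.newzbin.com/DTD/2003/nzb}"

def nzb_total_bytes(nzb_xml: str) -> int:
    """Sum the 'bytes' attribute over all segments in an NZB document,
    giving the total size of the indexed binary content."""
    root = ET.fromstring(nzb_xml)
    return sum(int(seg.get("bytes", 0))
               for seg in root.iter(f"{NZB_NS}segment"))
```

Aggregating such totals over crawled NZB indexes would yield a lower bound on storage; measuring transit volume (the stretch goal) would require a different apparatus altogether.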