Carnegie Mellon University

Research

Carnegie Mellon’s Software and Societal Systems Department (S3D) hosts an active research group with a highly interdisciplinary approach to software engineering. Indeed, we believe that interdisciplinary work is inherent to software engineering. The field of software engineering (SE) is built on computer science fundamentals, drawing from areas such as algorithms, programming languages, compilers, and machine learning. At the same time, SE is an engineering discipline: both the practice of SE and SE research problems revolve around technical solutions that successfully resolve conflicting constraints. As such, trade-offs between costs and benefits are an integral part of evaluating the effectiveness of methods and tools.

Emerging problems in privacy, security, and mobility underlie many of the challenges faced by today's software engineers, motivating new solutions in SE research. Because software is built by people, SE is also a human discipline, and research in the field accordingly draws on psychology and other social sciences. Carnegie Mellon faculty bring expertise from all of these disciplines to bear on their research, and we emphasize this interdisciplinary approach in our REU Site.

Below, you'll find projects we are planning for summer 2026. Please check back as we continue to add new projects to the list.

Mentor:  Norman Sadeh

Description and Significance
The objective of this project is to empower people to better understand and control what information about them is being collected by Internet of Things (IoT) technologies commonly found in cities today. The project revolves around a collaboration between Carnegie Mellon University (CMU), the California State University, Long Beach (CSULB) and the City of Long Beach. It builds on and extends CMU's IoT Privacy Infrastructure [DDS+18] to increase people's awareness of IoT technologies around them and their data practices, and also provides them with interfaces that enable them to exercise privacy rights such as those mandated by the California Consumer Privacy Act (CCPA). As part of this research, Carnegie Mellon University's team has adopted a user-centric design approach to developing, evaluating and refining new functionality designed to increase people's awareness of and control over data collected about them. This includes the adoption of Authorized Agent functionality intended to significantly reduce user burden when it comes to exercising privacy rights such as opting out of the sale of one's data, requesting a copy of data collected about oneself or requesting the deletion of their data. This new functionality will be made available to users through an IoT Assistant app available both in the iOS app store and the Google PlayStore.

While some elements of functionality offered in our IoT Assistant app best lend themselves to visual presentation, we believe that there are also significant opportunities to enhance the app with dialog functionality, where, using speech, users explore what is being collected about them, for what purpose, and what rights they might have available to restrict some of these practices. Here again, we propose to take a user-centric approach to prototyping, evaluating and refining GenAI functionality designed to explore this area. The REU student working on this part of the project will work alongside the CMU PI, Norman Sadeh, and other members of the team to contribute to the design, evaluation and refinement of GenAI functionality. This will include designing study protocols aimed at informing and refining the design of this functionality and developing APIs that can be accessed by the GenAI to answer relevant questions the user might have.

Mentors: Joshua Sunshine and Brad A. Myers

Description and Significance
Generating robust software tests has been called one of the most difficult tasks in software engineering, and software testing labor is estimated to cost at least $48 billion USD per year in the United States. Despite the high cost and the widespread automation elsewhere in software engineering, practitioners continue to write tests manually.

Since the 1970s, Automatic Test Suite Generators (ATUGs) have offered a fully automated approach to test generation. But practitioners find that such tools can generate incorrect and hard-to-understand tests that take more time to find and fix than it would have taken to manually write the tests in the first place. The Accelerated Testing Project seeks to explore the large middle ground between "full automation" and "no automation" to identify how strategic tool support might help software engineers write tests that find more bugs faster than is possible with either the fully-manual or fully-automated approaches.
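To make that middle ground concrete, the core of a naive generator can fit in a few lines: call the function under test with random inputs and surface suspicious results for a human to review. The Python sketch below is only an illustration of the general idea (NaNofuzz itself targets TypeScript and is considerably more sophisticated):

```python
import math
import random

def fuzz_numeric(fn, n_trials=1000, lo=-1e6, hi=1e6):
    """Call fn with random float inputs and collect any that produce
    NaN, infinity, or an exception -- candidate test cases for a
    human to review and turn into regression tests."""
    suspicious = []
    for _ in range(n_trials):
        x = random.uniform(lo, hi)
        try:
            y = fn(x)
            if isinstance(y, float) and (math.isnan(y) or math.isinf(y)):
                suspicious.append((x, y))
        except Exception as exc:
            suspicious.append((x, exc))
    return suspicious

# sqrt(x - 1) is undefined for x < 1, so roughly half the inputs fail.
print(fuzz_numeric(lambda x: math.sqrt(x - 1))[:3])
```

The research question is what a tool should do with such results: how to rank, explain, and present them so the engineer converges on a good test suite faster than writing it by hand.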

Student Involvement
Student researchers on this project contribute to a state-of-the-art open-source automated testing tool, called NaNofuzz, that is developed at Carnegie Mellon University and has been featured on Hacker News and in several publications. With the support of experts, student researchers can expect to collaboratively investigate, design, build, and evaluate new aspects of NaNofuzz that aim to solve complex, multi-faceted testing problems affecting the daily work of millions of software engineers worldwide. In addition to contributing to a software product that others may download and use, students may gain experience in collecting and analyzing empirical data, experimental design, and scientific writing, and develop a diverse set of research skills in areas such as programming languages, human-computer interaction, algorithms, artificial intelligence, and software engineering. Learn more about NaNofuzz here: https://github.com/nanofuzz/nanofuzz

Mentor: Justin Chan

Description and Significance
Imagine watching an ad or movie on an augmented reality display (e.g., Meta Orion Glasses, HoloLens). Playing on top of the video is an imperceptible visual overlay that flickers subtly to convey information, similar to a QR code; it could be used to record your attention and gaze as you watch, without the use of any cameras.

Imagine navigating a sophisticated user interface on smart glasses while on the go using only your eyes and attention, without the use of tiring or awkward hand gestures and voice commands.

SSVEPs (steady-state visual evoked potentials) are a robust brain-computer interaction technique that enables screen interaction using only visual attention and minimal user training. They have been used successfully in high-precision applications, including virtual keyboards and robotic control, with 97-99% accuracy.

When a user focuses on a flickering target, the visual cortex automatically generates detectable brain waves (EEG) at the same frequency, allowing real-time, hands-free selection.
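The detection principle can be illustrated with a short frequency-domain sketch on a synthetic EEG trace. Real systems typically use more robust decoders (e.g., canonical correlation analysis), and the sampling rate and target frequencies below are illustrative:

```python
import numpy as np

fs = 250.0                    # EEG sampling rate in Hz (illustrative)
t = np.arange(0, 4, 1 / fs)   # a 4-second analysis window
targets = [8.0, 10.0, 12.0]   # flicker frequencies of three on-screen icons

# Synthetic EEG: the user attends the 10 Hz target, plus background noise.
eeg = np.sin(2 * np.pi * 10.0 * t) + np.random.randn(t.size)

# Power spectrum via FFT; choose the candidate frequency with the most power.
power = np.abs(np.fft.rfft(eeg)) ** 2
freqs = np.fft.rfftfreq(t.size, 1 / fs)
power_at = {f: power[np.argmin(np.abs(freqs - f))] for f in targets}
print("attended target:", max(power_at, key=power_at.get), "Hz")
```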

However, conventional SSVEP systems use flicker frequencies that can cause substantial visual fatigue.

Here, we propose an SSVEP-based BCI tailored for extended reality (XR) devices like smart glasses and headsets, leveraging AI-generated flicker patterns that are imperceptible to the eye.

This project proposes to exploit the high refresh rate of modern augmented reality displays (conventionally used to render the world without causing cyber sickness) to flash visual icons at frequencies faster than the eye can perceive, and to use these flashes to encode information. Through carefully designed flashes, we can induce EEG brain signals at the flashing frequency.

Given recent patents from Apple and other technology companies to integrate flexible electrodes for physiological sensing around the face in their augmented reality devices, this work has the potential to lay important foundations for new interaction mechanisms with visual media in everyday environments.

Student Involvement
The student will have the opportunity to lead or join a project that can lead to publication at a venue in mobile systems, ubiquitous computing or HCI.

The student will engage in hardware prototyping with microcontrollers and sensors, and in the design of applied signal processing and machine learning pipelines. The student's exact responsibilities will be tailored to their background and skill set.

The student will benefit from being part of a vibrant lab community that is highly engaged with its work and supportive of each member's growth and development. The student can expect a faculty and senior PhD team that provides hands-on research and career mentorship throughout the internship and, if successful, beyond it through to publication of the project.
Mentor: Norman Sadeh

Description and Significance
The objective of this project is to empower people to better understand and control what information about them is being collected by Internet of Things (IoT) technologies commonly found in cities today. The project revolves around a collaboration between Carnegie Mellon University (CMU), the California State University, Long Beach (CSULB) and the City of Long Beach. It builds on and extends CMU's IoT Privacy Infrastructure to increase people's awareness of IoT technologies around them and their data practices, and also provides them with interfaces that enable them to exercise privacy rights such as those mandated by the California Consumer Privacy Act (CCPA). As part of this research, Carnegie Mellon University's team has adopted a user-centric design approach to developing, evaluating and refining new functionality designed to increase people's awareness of and control over data collected about them. This includes the adoption of Authorized Agent functionality intended to significantly reduce user burden when it comes to exercising privacy rights such as opting out of the sale of one's data, requesting a copy of data collected about oneself or requesting the deletion of their data. This new functionality will be made available to users through an IoT Assistant app available both in the iOS app store and the Google PlayStore.

Our research has shown that different people have different expectations and preferences about the data collection and use practices they want to be notified about, including the frequency of these notifications. Our research has also shown that selective notifications and nudges can go a long way in motivating users to engage with privacy choices available to them and to take advantage of privacy rights made available to them by different vendors. As part of the research we plan to conduct over the summer, we propose to implement and evaluate different configurations of notification and nudging functionality. The REU student working on this part of the project will work alongside the CMU PI, Norman Sadeh, and other members of the team to contribute to the design, evaluation and refinement of notification and nudging functionality. This could also include functionality to review what data about oneself is likely to have been collected by different entities over the past hour or the past 24 hours, using different filters to zoom in on data practices one is particularly interested in or concerned about.
Mentor: Sarah Scheffler

Description and Significance
Recent applied cryptography research has made it practical to engage in privacy-preserving but still-verifiable protocols -- for example, analyzing aggregate statistics without revealing individual microdata, checking a credential/ID without revealing one's name, moderating content without revealing the content, analyzing aggregations of text messages without revealing individual messages, or proving one can execute a transaction without revealing one's full assets.
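To give a flavor of how such protocols avoid revealing individual data, below is a minimal Python sketch of additive secret sharing, one classic building block behind private aggregate statistics. It is a simplified illustration, not a protocol from the references below:

```python
import secrets

P = 2**61 - 1  # all arithmetic is modulo a large prime

def share(value):
    """Split a private value into two additive shares, one per server."""
    r = secrets.randbelow(P)
    return r, (value - r) % P

# Each user splits their private value between server A and server B.
values = [23, 42, 17]
shares = [share(v) for v in values]

# Each server sums only the shares it holds; neither learns any value.
sum_a = sum(s[0] for s in shares) % P
sum_b = sum(s[1] for s in shares) % P

# Only the recombined totals reveal the aggregate statistic.
print((sum_a + sum_b) % P)  # -> 82
```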

Our research group has been investigating several research questions on these topics at the intersection of applied cryptography and policy/law. Research questions include: Would these tools make tasks like age verification sufficiently private? How can applied cryptography help a user verify that a server is adhering to promises it made, for example that it is moderating only the content it claims? Can we tie proofs on data to real images to combat AI misinformation? Can we use these tools to improve privacy for survey participants in research on sensitive topics?

Project goals include applied cryptography contributions (building new kinds of applied crypto tools to be more efficient or expressive), new implementations and tools (to make it easier to use privacy-preserving computation in particular settings), and policy-socio-technical contributions (how should policy or law treat a particular technology?).

Student Involvement
Students will work with a mentor to answer research questions on the above topics. Most students will (learn how to) write "crypto code", such as code or circuits that can be used in Secure Multi-Party Computation, or Zero-Knowledge Proofs. Some students will also work with mentors to write proofs of security or privacy for cryptographic protocols. Some projects might involve reading data from blockchains, or writing data to blockchains, or working with blockchain wallets. Students are not required to have any cryptography experience in advance, but hopefully find these topics interesting to learn about.

References
- "Synopsis: Secure and private trend inference from encrypted semantic embeddings." Xiao, Jain, Gorelick, Scheffler (2025 preprint): https://arxiv.org/pdf/2505.23880

- "Public verification for private hash matching". Scheffler, Kulshrestha, Mayer (2023 IEEE S&P): https://eprint.iacr.org/2023/029

- "Privacy-preserving Age verification based on Improved Verifiable Credentials Framework" Liu, Scheffler (2025 workshop paper): https://conpro25.ieee-security.org/papers/liu-conpro25.pdf
Mentor: Joshua Sunshine

Description and Significance
Geometric proof is a foundational topic in mathematics, but it remains incredibly difficult for students to learn. Students must simultaneously develop conceptual understanding of geometry, learn the conventions of deductive reasoning, manage multi-step argument structure, and apply theorems appropriately. Unfortunately, most classroom settings provide limited opportunities for students to practice these individual skills with timely, targeted feedback. Recent advancements in educational technology and AI offer a promising path forward: software tools that can automatically analyze student-produced proofs and provide actionable feedback can help students build foundational proof skills more effectively.
We are currently exploring several approaches for lowering the barrier for students to learn geometric proof through interactive tooling and intelligent feedback:

1. Designing and implementing a lightweight solver capable of checking the correctness of high-school-level geometric proofs and diagnosing common student errors (a minimal sketch follows this list).
2. Building web-based interactive environments that support proof-solving activities and provide immediate, structured guidance as students work.
3. Investigating how large language models can be used to generate high-quality, useful feedback.
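To illustrate the first direction, a lightweight checker might represent a proof as a sequence of steps and accept a step only when its cited rule is applied with the right shape and all of its premises are already justified. The rule base and claim encoding below are hypothetical and much simplified:

```python
# Hypothetical, much-simplified rule base: rule name -> number of premises.
RULES = {"vertical-angles": 1, "transitivity": 2}

def check_proof(steps, givens):
    """Each step is (claim, rule, premises). A claim is accepted only if
    its cited rule exists with the right number of premises and every
    premise is a given or a previously accepted claim. (A real checker
    would also unify the claim against the rule's conclusion pattern.)"""
    known = set(givens)
    for claim, rule, premises in steps:
        if RULES.get(rule) != len(premises):
            return f"rule misapplied at: {claim}"
        missing = [p for p in premises if p not in known]
        if missing:
            return f"unjustified premise {missing[0]!r} for: {claim}"
        known.add(claim)
    return "proof accepted"

print(check_proof(
    steps=[("x == z", "transitivity", ["x == y", "y == z"])],
    givens=["x == y", "y == z"],
))  # -> proof accepted
```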

Student Involvement
This is an interdisciplinary project at the intersection of software engineering, human–computer interaction, mathematics education, and AI. Students will have the opportunity to engage with techniques in automated reasoning, intelligent tutoring systems, and interaction design for learning tools. They will also have the opportunity to contribute directly to an ongoing research project and research publication. Possible student projects include:

1. Designing and developing user interfaces for interactive proof construction tools, including visualizations of geometric diagrams and step-by-step proof structure.
2. Prototyping and evaluating lightweight proof-checking or proof-generating components targeted specifically at high-school geometry.
3. Exploring LLM-based methods for analyzing student proofs and generating feedback.

Students should have strong programming skills in at least one programming language; knowledge of web application development using JavaScript/TypeScript or React is strongly recommended. No prior experience with educational technology or mathematics education research is required—only an interest in building tools that help students learn complex STEM topics more effectively.
Mentor: Andrew Begel

Description and Significance
The Problem: Building software, particularly software that includes AI features, requires groups with diverse communication styles to negotiate risk. Without psychological safety – the belief that critique is welcomed – some voices stay silent, critique becomes shallow, learning narrows, and software quality suffers.

Our Solution: We are building an AI-assisted digital collaboration space that encourages users to practice psychologically safe communication and minimizes interpersonal risk during online, group software critiques.

What you’ll do. Generate software requirements through co-design workshops with software design and engineering students and instructors; run a Wizard-of-Oz session where facilitators intervene with communication tips during a break in a studio session; and prototype a conversational AI interaction and user interface using Zoom, Google Speech, and OpenAI APIs.

Skills you’ll build. Rapid prototyping (TypeScript/React or Python), UX evaluation, and prompt/Large Language Model orchestration (optional).

Outputs. A working plug-in or scripted prototype, a design guideline “cookbook” for instructors, and evaluation figures suitable for a submission to an academic conference.

Who should apply? Self-motivated designers/engineers who enjoy building practical tools, running user studies, and translating findings into clear design recommendations.
Mentors: Joshua Sunshine and Jonathan Aldrich

Description and Significance
Memory safety errors have been the most common source of severe security vulnerabilities. The Rust programming language has helped to eliminate this threat by placing static restrictions on aliasing and mutability, providing safety guarantees and competitive performance. However, many security critical applications rely on memory unsafe languages. Replacing these applications with memory safe equivalents will take years, if not decades. Instead, new Rust components are being integrated within existing codebases written in C and C++.

To interoperate with these languages, developers need to use a set of "unsafe" features that bypass Rust's safety restrictions. When misused, these features can reintroduce the types of safety issues that Rust was designed to prevent and trigger new forms of undefined behavior related to Rust's aliasing rules. Developers can find these errors using Miri, a bug-finding tool that has become the de-facto solution for the Rust community. However, Miri has limited support for foreign functions, preventing it from being useful for many of the security-critical applications where Rust is being adopted.

We are creating BorrowSanitizer: a dynamic analysis tool for finding Rust-specific safety violations in multi-language applications. Our tool injects run-time checks at compile time to detect when unsafe operations break Rust's aliasing rules, even across foreign function boundaries. Our work is funded by Google through the Rust Foundation, and we aspire for BorrowSanitizer to become a production-ready tool.

Student Involvement
Students will join the BorrowSanitizer Team and contribute to the design, implementation, and evaluation of the tool. Potential projects could involve implementing and formalizing static and dynamic optimizations and investigating errors in production applications. This will be an exciting time to join our project. Today, we're close to having our first end-to-end implementation with support for real-world libraries. By Summer 2026, we anticipate that students will be contributing to a well-tested, fully-featured version of the tool.

Mentor: Christian Kästner

Description and Significance
Large Language Models (LLMs) are increasingly used as judges to evaluate model outputs—ranking responses, scoring summaries, and driving automatic benchmarks like MT-Bench and Chatbot Arena. However, these LLM judges are rarely engineered with rigor: they rely on ad hoc rubrics, show inconsistent agreement with humans, and are sensitive to minor prompt changes. This poses a reliability risk for model assessment pipelines, where evaluation outcomes directly influence research conclusions and deployment decisions.
From a software engineering (SE) perspective, LLM judges are part of the evaluation infrastructure and thus should be engineered, tested, and maintained as critical system components. We aim to introduce principles of requirements engineering, test automation, and safety assurance into the design of evaluation pipelines. The goal is to build methods and tooling that make LLM judge behavior observable, testable, and improvable in systematic ways.

Student Involvement
Students will explore responsible engineering practices for developing and maintaining reliable LLM-based evaluation systems. Possible directions include:

- Verification and Validation of Alignment: Develop lightweight, semi-automated checks to assess judge–human alignment (e.g., ensembles, meta-evaluation, reliability metrics). Operationalize “evaluation properties” (consistency, monotonicity, invariance) and design tests to flag violations.

- Tooling and Process Integration: Build prototype tools for CI/CD pipelines that run property checks, detect regressions, and version LLM judges – ideally a pytest-style framework for defining and reporting these properties.

This project emphasizes software engineering rigor in AI evaluation infrastructure. Students will gain experience with evaluation and experimenting with LLMs, including building reproducible pipelines, defining testable evaluation properties, and empirically studying alignment and reliability. Familiarity with Python is expected; prior ML / LLM experience is helpful but not required.
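As one concrete example of an operationalized evaluation property, an invariance check can assert that a judge's score is stable under semantically neutral edits. The pytest-style sketch below is hypothetical; judge_score is a toy stand-in for whatever judge API a real pipeline would wrap:

```python
import pytest

def judge_score(response: str) -> float:
    # Toy deterministic stand-in so the sketch runs; a real pipeline
    # would call the LLM judge here (this function is hypothetical).
    return min(len(response.strip()) / 100.0, 1.0)

# Hypothetical invariance property: semantically neutral edits should
# not change the judge's verdict by more than a small tolerance.
@pytest.mark.parametrize("neutral_edit", [
    lambda s: s + "  ",
    lambda s: s.replace("\n", " "),
])
def test_judge_invariance(neutral_edit):
    response = "The mitochondria is the powerhouse of the cell."
    base = judge_score(response)
    edited = judge_score(neutral_edit(response))
    assert abs(base - edited) < 0.05, "judge is sensitive to a neutral edit"
```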

Mentor: Nicolas Christin

Description and Significance
In many blockchains, the primary representation used for wallet addresses is not human-readable, and thus hardly memorable. As a result, users often copy and paste addresses from their recent transaction history, which enables blockchain address poisoning. The adversary first generates lookalike addresses similar to the victim’s recipient and then sends a tiny amount to the victim to sneak into the victim’s transaction history (“poisoning”). The goal is to have the victim mistakenly send tokens to the lookalike address instead of the intended recipient. Our prior research [1] shows that this is one of the largest and most widely targeted attacks on blockchains such as Ethereum and Binance Smart Chain (BSC). However, anecdotal evidence suggests that this attack is prevalent across a broader ecosystem: Ethereum’s L2 (sub)chains, such as Polygon, and even structurally different ledgers, like Tron or Solana, driven by the low attack costs (i.e., transaction fees). In this project, we will first extend the detection method from Tsuchiya et al. [1] to various chains. Second, we will integrate the detector into our live system [2] that warns users of the attack severity in real time. Third, we will analyze attackers' strategies and compare them with our earlier findings [1] (e.g., attacking Ethereum users on Polygon).
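For intuition, a lookalike address typically matches the legitimate one in exactly the leading and trailing hex digits that wallet interfaces display. A naive pairwise check along those lines might look like the following sketch (the addresses and thresholds are illustrative; this is not the detection method of [1]):

```python
def looks_alike(addr_a: str, addr_b: str, prefix=4, suffix=4) -> bool:
    """Flag two distinct addresses that agree on the first and last
    hex digits -- the only parts most wallet UIs show by default."""
    a, b = addr_a.lower(), addr_b.lower()
    return a != b and a[:2 + prefix] == b[:2 + prefix] and a[-suffix:] == b[-suffix:]

legit = "0x52908400098527886E0F7030069857D2E4169EE7"
fake = "0x5290ffffffffffffffffffffffffffffffff9ee7"  # hypothetical lookalike
print(looks_alike(legit, fake))  # -> True
```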

Student Involvement
Students will first learn the blockchain basics and how to interact with a blockchain node to extract data, guided by the graduate student(s); no prior knowledge of blockchains is required. They will also learn to process large-scale (terabytes) data and present it in a digestible manner. The goal of the project is to integrate the detector into our live website, so experience with Golang (or any language with concurrency) and web development would be ideal.

References
[1] Blockchain Address Poisoning. Taro Tsuchiya, Jin-Dong Dong, Kyle Soska, and Nicolas Christin. In Proceedings of the 34th USENIX Security Symposium (USENIX Security'25). Seattle, WA. August 2025.

[2] Website: https://cryptotrade.cylab.cmu.edu/poisoning/ethereum, Social media bot: https://x.com/toxin_tagger

Mentors: Joseph Calandrino and Sarah Scheffler

Description and Significance
From name and age to student status or business ownership, we routinely verify aspects of our identity when doing things online. Online services may require evidence of our attributes in the form of digital media, such as a live selfie or a scan of a driver's license. Traditionally, forging this evidence required non-trivial effort and skill. These forgery challenges may be implicit in the design of verification processes. For example, a website might offer a modest student discount and simply accept a scan of a student ID as proof of student status. The site's developers might have assumed that the effort to forge a student ID scan outweighs the benefits for most unscrupulous users.

Advances in generative AI dramatically lower the skill and effort necessary to generate realistic synthetic media. As a consequence, the assumptions underlying some services' identity verification processes may no longer hold true. We want to measure the practical impact of this change. In particular, we want to identify services that may be vulnerable to advances in generative AI, assess the present and future risk, and weigh remediation measures.

Student Involvement
The specific design of the project will depend on project status as well as student skill and experience. We expect students to do some combination of the following: survey forms of digital evidence that online services require to verify aspects of identity, test the ability of generative AI tools to produce plausible versions of that evidence, assess the robustness of verification processes against synthesized media, and evaluate possible remediation measures.

Mentor: Jonathan Aldrich

Description and Significance
In the past two decades, researchers have made remarkable advances in our ability to formally specify and verify the correctness of computer programs. However, it may cost an order of magnitude more effort and money to fully verify code than to simply write it in the first place, and thus formal verification has not seen wide adoption in the software industry. A major challenge of formal verification involves reasoning about memory aliasing of dynamically allocated data structures. Type systems that feature ownership and uniqueness to control memory aliasing are an important piece of the puzzle. To lower the barrier for formal verification, we are adapting ownership and uniqueness types to work with a verification debugger that visualizes all possible execution paths through the program, displays the current state of the verifier at each point in the program, and helps the programmer localize the verification errors if there are any.

Student Involvement
This is a novel and exciting project that lies at the intersection of formal methods, programming languages, and HCI. Students will have the opportunity to learn about the state of the art in formal verification and the potential research directions in which formal verification can be improved. Several possible paths for student work include:

1. User interface design and programming for the debugger, which is currently built as a plugin for the JetBrains IntelliJ platform.
2. Performing an empirical study of the usability of the debugger.
3. Formalizing aspects of the type system or verifier to prove their soundness.

Students should have strong programming skills in Java; knowledge of Rust or a functional programming language such as Scala is recommended. No prior experience of formal verification is required, only an interest in the engineering of more reliable and secure programs.

References
1. Jenna DiVincenzo et al. 2024. Gradual C0: Symbolic Execution for Gradual Verification. https://dl.acm.org/doi/full/10.1145/3704808

2. Catarina Gamboa et al. 2023. Usability-Oriented Design of Liquid Types for Java. https://chris.timperley.info/papers/icse23liquidjava.pdf

3. Conrad Zimmerman et al. 2023. Latte: Lightweight Aliasing Tracking for Java. https://arxiv.org/pdf/2309.05637

Mentors: Nicolas Christin and Rohan Padhye

Description and Significance
Telegram, initially a messaging app, has grown into a software infrastructure supporting many web services and roughly 1 billion users. In particular, any Telegram account can deploy programmable apps, Telegram bots, to help accelerate its services. For instance, bots can process payments, authenticate users, and serve as customer service agents. Deploying bots is simple: it’s just a matter of writing a Telegram bot script and running it on one’s machine or in the cloud – there are barely any checks or moderation. This convenience lowers barriers to entry for software development, but comes at a cost. First, while most bots are benign, cybercriminals also use bots to automate and scale up their operations. For example, they can process payments for illegally obtained goods or host malicious AI endpoints (e.g., producing non-consensual images) [1]. Second, we suspect that some Telegram bots are poorly implemented, leaving them vulnerable to security flaws. Indeed, some security vulnerabilities have already been reported, such as missing authorization checks [2] and malicious code injection [3].

In this project, we will leverage the fact that some bots make their code public (for transparency and reusability purposes) to perform large-scale analysis of Telegram bot source code. First, we will compile a list of open-sourced Telegram bots and download their GitHub repositories. Given that there is no central bot directory, we will 1) utilize our 800 million Telegram messages dataset to extract their GitHub URLs (which we expect to yield at least a few hundred), 2) perform keyword searches on the GitHub API, and 3) find third-party websites that feature Telegram bot repositories. Second, we will identify bot interactions. For example, we can extract URLs in a repository to identify the types of services bots often interact with (e.g., third-party payment providers, AI endpoints). Third, we will perform static analysis to detect security vulnerabilities, defects, and low-quality or inefficient code. We may use LLMs to scale up our program analysis across different programming languages.

We expect this research to inform users of the potential risks associated with Telegram bots and to establish better guidelines for bot developers. Our findings will also contribute to the understanding of how bots can be exploited for malicious purposes (e.g., through forking existing benign repositories).
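As a flavor of the first data-collection step, extracting candidate repositories can start with a simple scan of message text for GitHub URLs. The sketch below uses made-up messages in place of the real dataset:

```python
import re

GITHUB_RE = re.compile(r"https?://github\.com/([\w.-]+)/([\w.-]+)", re.IGNORECASE)

messages = [  # hypothetical message texts standing in for the real dataset
    "source code here: https://github.com/example-user/payment-bot",
    "try my bot, no source sorry",
]

repos = {m.group(0) for msg in messages for m in GITHUB_RE.finditer(msg)}
print(sorted(repos))  # candidate repositories to fetch via the GitHub API
```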

Student Involvement
This project is multidisciplinary, spanning data science, computer security, and software engineering. The first phase involves data collection: students will leverage our 800-million-message Telegram dataset and collect data using GitHub APIs. The second phase focuses on large-scale source code analysis, ranging from basic data science techniques to program analysis (potentially through LLMs). The ultimate goal is to submit a paper to computer security, measurement, or software engineering conferences, so students will experience the whole research process, from literature review and experimentation to paper writing. An introductory level of computer security knowledge is preferred.

References
[1] Taro Tsuchiya, Haoxiang Yu, Tina Marjanov, Alice Hutchings, Nicolas Christin and Alejandro Cuevas. Bots as Infrastructure: A Large-Scale Study of Benign and Malicious Uses on Telegram. 2025. In submission.

[2] TeploBot - Telegram Bot for WP <= 1.3 - Telegram Bot Token Disclosure https://www.cve.org/CVERecord?id=CVE-2024-9627

[3] unzip-bot Allows Remote Code Execution (RCE) via archive extraction, password prompt, or video upload https://www.cve.org/CVERecord?id=CVE-2024-53992 
Mentor: Jonathan Aldrich

Description and Significance
We are developing a new programming language, Meerkat, for distributed web apps. Meerkat is an end-to-end reactive programming language, meaning that it provides seamless reactivity not just on the front end but responds to changes in back end data as well. A type system allows Meerkat programs to be updated safely while they are running. Starting with a client-server concept developed by our collaborators [1], we are exploring the theory and implementation of making the approach scalable by keeping reactive data on a distributed data store [2]. The end goal is to provide programmers with a safer, more scalable, and more usable programming model compared to the state of the art.

Student Involvement
We anticipate that students will contribute to the design and implementation of Meerkat. Students should have taken a class (or gathered experience outside the classroom) in compilers. The project will involve some distributed systems work, but students can learn what they need to know as part of the internship. Depending on the interests and background of the student, there may be opportunities to extend the theoretical foundations of the system as well.

References
[1] Type Safe Evolution of Live Systems. Miguel Domingues and João Costa Seco. REBLS'15. https://docentes.fct.unl.pt/jrcs/files/rebls15.pdf

[2] The Meerkat Vision: Language Support for Live, Scalable, Reactive Web Apps. João Costa Seco and Jonathan Aldrich. Proc. Onward!, 2024. https://dl.acm.org/doi/10.1145/3689492.3690048
Mentor: Riccardo Paccagnella

Description and Significance
Over the past two decades, numerous attacks have emerged in which an adversary exploits properties of the hardware to breach the security of software, even in the absence of software vulnerabilities. These attacks stem from the increasing gap between the abstraction hardware is intended to expose and the one its implementation concretely exposes. This project aims to develop a deeper understanding of this abstraction gap, with a particular focus on microarchitectural attacks (e.g., Spectre, Meltdown, Hertzbleed). Project goals include developing a deeper understanding of the microarchitectural optimizations enabling these attacks, assessing the vulnerability of software (e.g., cryptographic software) against these attacks, and developing practical mitigations against these attacks.
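As a purely software-level illustration of the broader theme that implementation details leak through observable behavior, consider a comparison routine that returns early at the first mismatch; timing it reveals how many leading characters of a guess are correct. This toy sketch is far simpler than the microarchitectural channels the project studies:

```python
import time

SECRET = "hunter2"

def insecure_equals(a: str, b: str) -> bool:
    # Returns early at the first mismatch, so running time depends on
    # how many leading characters match -- a timing side channel.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def time_guess(guess: str, trials=20000) -> float:
    start = time.perf_counter()
    for _ in range(trials):
        insecure_equals(SECRET, guess)
    return time.perf_counter() - start

# A guess sharing a longer correct prefix takes measurably longer.
print(f'correct prefix: {time_guess("hunterX"):.4f}s')
print(f'wrong prefix:   {time_guess("Xunter2"):.4f}s (faster)')
```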

Student Involvement
Students will work with a team to answer research questions related to the above project description. In the process, students will learn how computer systems (really) work behind abstraction layers, how to attack real software by exploiting real hardware, and how to make software and hardware more secure and efficient. No prior experience in this area is required -- just curiosity and a desire to learn about these topics.
Mentor: Nicolas Christin

Description and Significance
Back in the 1990s, USENET newsgroups were a hotbed of free-flowing discussion – think of them as the decentralized predecessors of Reddit. Essentially, anybody could run a news server and participate in a distributed forum infrastructure. (Among modern platforms, Mastodon is perhaps the closest to this.) Today, regrettably, text-based newsgroups are moribund and appear to be mostly used for spam. On the other hand, a somewhat late innovation – the ability for newsgroups to support binary files – seems to have given a second life to USENET as a semi-decentralized way to offer pirated content (movies, TV series, and video games in particular). This phenomenon appears to have been studied very little, if at all, in the academic literature. The goal of this project is to remedy this situation by performing a quantitative analysis of how USENET is being used in 2025: we want to find ways to characterize the amount of data available from binary newsgroups both in terms of storage and transit (i.e., what and how much content is available, how much is being transferred every day, and how). A secondary objective is to verify the hypothesis that text-based newsgroups are a ghost town.

Student Involvement
This project combines network security and measurement. The first phase will involve a complete qualitative description of how USENET binary groups are used to deliver pirated content: which newsgroups support this, and what infrastructure supports this content delivery. The second phase will involve setting up a measurement apparatus to get as exhaustive a view as possible of what is available on the network, potentially sampling some of the offerings to determine whether they are in fact what is advertised, or whether they include malware. The third phase – a stretch goal – will be to think about how to implement ways to measure how much traffic is going to these newsgroups every day. Experience in Python and SQL is desirable but not required.
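As a possible starting point for the measurement apparatus, Python's nntplib (in the standard library through Python 3.12; removed in 3.13) can query a news server for group sizes. The server name below is a placeholder; a real study would use one or more commercial USENET providers:

```python
import nntplib  # stdlib through Python 3.12; removed in Python 3.13

# Placeholder host -- substitute a real USENET provider.
with nntplib.NNTP("news.example.com") as server:
    # Compare article volume in a binary group and a text group.
    for name in ["alt.binaries.multimedia", "comp.lang.python"]:
        resp, count, first, last, group = server.group(name)
        print(f"{group}: ~{count} articles (ids {first}-{last})")
```
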
Mentor: Jonathan Aldrich

Description and Significance
Recent work [1,2,3] has made significant advances to the state of the art in type systems for guaranteeing security invariants of programs, in particular _information flow control_. Information flow is an essential security technique which guarantees confidentiality, the property that secrets do not leak, and integrity, the property that critical program logic is protected, regardless of the actions of an unprivileged attacker. Confidentiality and integrity together provide noninterference, which states that an untrusted portion of a program cannot maliciously affect the execution of, or glean information from, a disjoint and trusted portion. Establishing that a program satisfies noninterference is the pinnacle of security reasoning because it renders the specified fragment nearly impervious to security flaws.

Despite the utility of noninterference specifications in avoiding security bugs, prior work has seen little adoption owing to usability and scalability issues. This has resulted in tools which cannot be effectively deployed, either by software engineers who often lack the background to understand them or on problems of realistic complexity. Lastly, all approaches to information flow to date operate under the conventional wisdom that practical programs must violate noninterference. Consider the case of a password checker, which must leak at least a boolean amount of information about the password to fulfill its purpose. To accommodate this need, most approaches to information flow introduce unsafe casts, known as _declassification_, into the language. This often stands in the way of modular reasoning.
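The password checker makes a good minimal example. In the toy sketch below (not drawn from the referenced papers), the first function necessarily leaks one bit of the secret through its result, and the second shows an implicit flow, where control flow rather than direct assignment copies secret information into public state:

```python
# Toy illustration (not from the referenced papers) of why practical
# programs bump up against strict noninterference.

def check_password(secret_pw: str, guess: str) -> bool:
    # The boolean result necessarily reveals one bit about secret_pw,
    # so a sound information-flow type system must either reject this
    # function or demand an explicit declassification of the result.
    return secret_pw == guess

def implicit_leak(secret_bit: bool) -> int:
    # An "implicit flow": nothing is assigned *from* the secret, yet
    # the branch copies the secret bit into publicly visible state.
    public = 0
    if secret_bit:
        public = 1
    return public
```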

Student Involvement
We have recently proposed and theoretically validated [2, 3] a system which solves the declassification problem. A further development [1] vastly extends the domain of information flow to encompass _capabilities_ (in the sense of capability-based security), _quantitative leakage_, and _protocol reasoning_, leveraging insights from substructural type systems like that found in Rust. Students working on this project might get involved in one or more of the following activities, based on interest:

- Working on implementing the above, developing higher-level syntax, type inference procedures, and other tasks to this end
- Extending the system to encompass reasoning about concurrency and state
- Setting up user studies for the system and evaluating end users' mental models of information flow, plus contrasting to prior systems
- Secure compilation for information flow-typed languages
- Anything else that piques your interest; feel free to propose! :)

References
[1] https://hgouni.com/files/popl26.pdf

[2] https://hgouni.com/files/oopsla25.pdf

[3] https://hgouni.com/files/oopsla25-slides.pdf
Mentor: Hanan Hibshi

Description and Significance
picoCTF is a free online Capture-the-Flag (CTF) style game implemented and maintained at Carnegie Mellon University to promote and facilitate cybersecurity education. The target audience of our service has been middle and high school students since 2013, but the platform has also become increasingly popular with other age groups, including college students and adult learners. Each picoCTF release features ~100 cybersecurity challenge problems of increasing difficulty, which are revealed over a storyline to guide players to hone their skills one step at a time. These security challenges cover concepts such as cryptography, binary exploitation, web exploitation, reverse engineering, forensics, and other topics related to security and privacy. As a year-round education platform, the picoCTF team conducts constant research in areas including cybersecurity education methodologies, CTF problem development, and education platform development. This summer, we have multiple research projects for motivated students to get involved in various missions in picoCTF:

(1) Collect empirical data through surveys, user studies, and classroom observations
To improve the design of the picoCTF platform to reach a larger number of students, especially in under-resourced communities, we need to collect and analyze empirical data to inform our design enhancement and platform scalability process. This project includes a research plan to obtain the needed data through conducting user studies, focus groups, and usability and scalability tests that examine picoCTF in a classroom setting. We are interested in understanding how we can enhance our platform to better serve K-12 teachers and students, college students, and independent adult learners in under-resourced communities. The empirical data collected by closely observing participants in focus groups and from surveys and questionnaires will provide informed feedback to the picoCTF research team about possible technical challenges and key design improvements that would enhance the students' experience.

(2) Analyze player data from previous years and develop visualization tools
In addition to the surveys and questionnaires, the current picoCTF platform is rich with player data from previous years. This data could be a valuable resource for in-depth analysis that would help us understand how to improve the game to reach a wider audience, and what new challenges to include and where. The analysis would help reveal patterns that could be mapped to educational goals and help investigate where players fail to solve challenges or where the game becomes less interesting. These findings could ultimately help improve user experience and retention. Other areas of analysis include performance by gender, team diversity, age, educational background, etc. We envision students will learn to use modern data analysis toolkits to analyze our data and build an interactive web-based exploration tool for presenting the findings from these analyses.
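As a small illustration of this kind of analysis, the sketch below computes per-challenge solve rates over a hypothetical export of the solve log; the schema is made up for illustration:

```python
import pandas as pd

# Hypothetical schema for a picoCTF solve-log export.
df = pd.DataFrame({
    "player":    ["a", "b", "a", "c", "b"],
    "challenge": ["caesar", "caesar", "bufover", "bufover", "bufover"],
    "solved":    [True, True, False, False, True],
})

# Solve rate per challenge: low rates may flag challenges where players
# get stuck or lose interest -- candidates for hints or re-ordering.
solve_rate = df.groupby("challenge")["solved"].mean().sort_values()
print(solve_rate)
```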

(3) Write new CTF challenges for the game and test current CTF challenges
Writing and testing CTF challenges are ongoing tasks in picoCTF. Testing current challenges helps identify errors and bugs before a future competition goes live, and writing new challenges helps grow our challenge pool. For the latter, we are especially interested in new challenges in areas where we have minimal or no existing coverage on our platform. These include privacy, mobile security (Android / iOS), IoT security (e.g., embedded Linux-based devices), and AI. Students will engage in learning about security problems that arise in these areas and develop new CTF challenges of gradually increasing complexity to cater to players at different stages of technical capability.

Student Involvement
We are looking for multiple students who are interested in cybersecurity and who enjoy working on projects that have a global impact on youth and workforce development. Dr. Hanan Hibshi will be the faculty mentor overseeing students' research activities, and the picoCTF team consists of software engineers and graduate students who work together with student research assistants. The picoCTF project is interdisciplinary in nature and can be attractive to students with different backgrounds. Students with a Human-Computer Interaction background can enjoy conducting user studies, collecting empirical data, or examining the picoCTF interface and proposing design changes that can improve user experience. Students with an interest in CS education and/or cognitive psychology could help analyze the data from existing players to investigate ways to improve learning outcomes. Students who enjoy software development can help the technical team improve the current gaming platform and migrate to the new 2021 picoCTF platform, which has advanced features. Finally, students with a cybersecurity background can join our team and enjoy testing the current challenges (by playing the game!) and helping create new challenges or add a new category of challenges.

References
Owens, Kentrell, Alexander Fulton, Luke Jones, and Martin Carlisle. "pico-boo!: How to avoid scaring students away in a CTF competition." In Journal of The Colloquium for Information System Security Education. 2019.

Emma Hogan, Alejandro Cuevas Villalba, Hanan Hibshi, and Nicolas Christin. 2025. Observations of Highly Successful, Distributed Teams in a K-12 Online Security Competition. In Proceedings of the 25th Koli Calling International Conference on Computing Education Research (Koli Calling '25).

Opelo Tshekiso, Lenah Chacha, Assane Gueye, and Hanan Hibshi. 2025. Scaling Cybersecurity Education in Africa: Insights from a Capacity-Building Initiative. In Proceedings of the ACM Global on Computing Education Conference 2025 Vol 1 (CompEd 2025).
Mentor: Jonathan Aldrich

Description and Significance
Language migration poses many more problems than first meets the eye, for example: What can be done automatically? What help can developers expect to receive? What tooling is possible? This problem spans programming language design, compiler design, editors, and other tools. In 2017, I published a book detailing the differences between Python 2 and 3 and how to write a code base that works with both versions. Inspired by that work, I started a collaboration with the Softdev research group at King's College London, led by Prof. Laurence Tratt, on research that enables smooth migrations for languages like Python in the face of incompatible releases. The result of this collaboration so far is a technique built on the concept of Temporary Migration Variant (TMV) language versions, which allow code to be gradually migrated until the TMV is no longer needed. We are currently finishing corner cases on strings and have written a paper, “TMVs: Smoother Program Migration in the Face of Disruptive Language Changes” (for SLE 26). We also wrote an MSR 26 paper whose results provide a data-driven view of one of the most consequential transitions in open-source language evolution, with lessons applicable to similar migrations in ecosystems such as Scala 2.13→3 and Swift 4→5.

The literature on language evolution establishes methods that can search source code for syntactic incompatibilities and can also apply type annotations to function signatures after monitoring functions at run time. The gap in the literature lies in the missing tools for identifying semantic and security incompatibilities. Our TMV technique uses dynamic-language-centric approaches, as well as lightweight monitoring and scope-tracking methods, to discover semantically incompatible code fragments in legacy code bases and facilitate migration.
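A classic example of a semantic (rather than syntactic) incompatibility that such monitoring must catch: the same division expression parses identically under Python 2 and Python 3 but computes different values.

```python
# Parses identically in Python 2 and Python 3 -- but means different things.
def average(total, count):
    return total / count  # Python 2: floor division -> 1
                          # Python 3: true division  -> 1.5

print(average(3, 2))
# A syntactic checker sees nothing wrong here; catching the changed
# behavior requires observing values at run time, which is exactly
# what lightweight dynamic monitoring aims to do.
```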

This project expands the discovery to security-related code fragments in legacy code bases. Discovering semantic and security incompatibilities is the first step of migration, and the results will be used as input to automatic code transformations that bring a program in line with the specifications of newer versions of its underlying language. The goal is to design and evaluate algorithms that warn about semantic and security incompatibilities with useful error messages and hints on program repair, but that can also modify the program to a correct version. We hope to build terminal-based as well as more intuitive GUI-based tools.

Student Involvement
Students will start by instrumenting a CPython interpreter to dynamically identify security bugs, as well as identifying ways to warn about code hidden behind short-circuited evaluations, building on speculative approaches in the literature. Finally, they can write a tool that gives hints on program repair but also provides the option to modify the program to a correct version that fixes the incompatibility.

References
1. Joannah Nanjekye. Python 2 and 3 Compatibility: With Six and Python-Future Libraries. Apress, 2017

2. Raoul-Gabriel Urma. Programming language evolution. Technical report, University of Cambridge, Computer Laboratory, 2017
Mentor: Rohan Padhye

Description and Significance
Date and time are some of the most fundamental concepts on which many computer systems rely. Getting date/time computations wrong can lead to critical system failure, expose security vulnerabilities, or lead to hard-to-detect race conditions. In high-stakes domains - such as aviation [1], healthcare [2], and legal systems [3] - such bugs can result in multi-million-dollar losses or even endanger human lives.

Date/time computations are subtly difficult to implement correctly due to the variety of complexities associated with time zones, daylight saving time, leap years/seconds, ambiguous date formats, date/time arithmetic, and more. To this end, our lab is working on several projects to strengthen the correctness of date/time logic in software:

(1) Correctness of client software: In this project, we build on our prior work analyzing and classifying date/time bugs in open-source Python projects [4]. Our next step is to investigate whether simple static checkers can automatically detect the severe bug patterns we identified. Similar ideas have been explored in earlier work by Yang et al. [5]. The broader goal is to improve the reliability of existing date/time code and to develop tools that help prevent such bugs in newly written code - an increasingly important challenge in this era of Large Language Models (LLMs) and GenAI-based coding assistants.

(2) Correctness of date/time libraries: Most programming languages provide date/time utilities through their standard libraries or widely used third-party packages. Subtle bugs in these libraries can undermine entire software ecosystems, making their correctness essential for overall reliability. This project aims to develop differential testing techniques [6] for validating date/time libraries in languages such as Python and Java, with the goal of identifying edge-case behaviors and strengthening the foundations that many applications rely on.
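To make the subtlety concrete, the sketch below shows the kind of edge case these projects target: adding one day across the 2025 US spring-forward transition preserves the wall-clock time but changes the elapsed real time, and two reasonable ways of measuring the difference disagree (Python standard library zoneinfo):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/New_York")

# Noon on March 8, 2025 -- the day before US daylight saving time begins.
start = datetime(2025, 3, 8, 12, 0, tzinfo=tz)
same_time_tomorrow = start + timedelta(days=1)  # wall-clock arithmetic

# Same tzinfo on both sides: subtraction compares wall-clock values.
print(same_time_tomorrow - start)  # -> 1 day, 0:00:00

# Converting to UTC first yields the elapsed real time instead.
utc = ZoneInfo("UTC")
print(same_time_tomorrow.astimezone(utc) - start.astimezone(utc))  # -> 23:00:00
```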

(3) Correctness of LLMs’ temporal reasoning: As LLMs increasingly assist in writing code, interpreting documents, and supporting decision workflows, the reliability of their temporal reasoning has become an important and understudied challenge. Recent studies highlight the difficulty of temporal reasoning for LLMs [7]. This project aims to broaden the evaluation of LLM temporal reasoning by covering a wider spectrum of real-world temporal inputs and phenomena. Building upon the evaluation, we will then propose techniques to improve the LLM's ability to reason about date/time concepts reliably.

Student Involvement
Students working on these projects will gain hands-on experience with the many complexities of date and time computations. Since the focus of our research is to ensure reliability, students will use static/dynamic analysis frameworks in conjunction with LLMs to build bug detection tools and evaluate their effectiveness. Depending on the project direction, students may work with data-flow analysis tools such as CodeQL, property-based testing libraries like Hypothesis, or fuzzing tools like JQF/Fandango.

Students are expected to have strong programming skills, familiarity with GitHub and open-source development practices, and an interest in software quality. Students should also be comfortable working with LLMs. Having prior experience in developing any kind of software to ensure quality is a plus!

References
[1] Lockheed's F-22 Raptor Gets Zapped by International Date Line (https://web.archive.org/web/20070316132413/http://www.dailytech.com/article.aspx?newsid=6225)

[2] Daylight saving time confounds hospital electronic medical records (https://www.usatoday.com/story/news/health/2018/11/03/daylight-saving-time-hospital-electronic-medical-records-emergency-fall-back/1864579002/)

[3] I see dead patents: How bugs in the patent system keep expired patents alive (https://heinonline.org/HOL/Page?handle=hein.journals/frdipm33&div=5&g_sent=1&casa_token=&collection=journals)

[4] It’s About Time: An Empirical Study of Date and Time Bugs in Open-Source Python Software | IEEE Conference Publication | IEEE Xplore (https://ieeexplore.ieee.org/document/11025657)

[5] [2503.09002] KNighter: Transforming Static Analysis with LLM-Synthesized Checkers (https://arxiv.org/abs/2503.09002)

[6] [2410.04249] DiffSpec: Differential Testing with LLMs using Natural Language Specifications and Code Artifacts (https://arxiv.org/abs/2410.04249)

[7] DATETIME: A new benchmark to measure LLM translation and reasoning capabilities (https://arxiv.org/abs/2504.16155)
Mentors: Joshua Sunshine and Eunsuk Kang

Description and Significance
Formal modeling plays a crucial role in providing robust assurance to systems by offering a rigorous and unambiguous description of system requirements, behaviors, and properties. Under this framework, one can write a set of specifications that describe the desired properties, and use tools like Alloy [1] to automatically check these specifications in a system (model).

The literature has shown, however, that formal models are hard to construct, hard to understand, and hard to validate [2-4]. Visualizations are known to make complex subjects tangible [5], so some formal modeling tools have incorporated visualizations into their workflows. For example, Alloy has a built-in instance visualizer that allows users to visualize valid configurations of Alloy models. Other engineering efforts extend Alloy’s visualizations to create custom, domain-specific visualizations, such as Penlloy [6], Sterling [7], and CnD [8]. These tools allow users to specify a “visual specification” which maps elements of the data (an Alloy instance) into visual/spatial channels.

One problem, however, is that even with these tools, writing visual specifications is still a hard problem [6, 8, 9]. This project explores the possibility of synthesizing these visual specifications automatically, from user-provided example diagrams. The goal is to efficiently generate visual specifications that satisfy these existing examples, while also being generalizable to newer instances.

Student Involvement
Students will work with mentors on the design, implementation, and evaluation of a visual specification synthesizer tool. Tasks may include:
  • Design, implement, and analyze algorithms and techniques that synthesize visual specifications from user-provided examples. Some existing works on specification synthesis include [10].
  • Explore and design mechanisms with which the users will interact with the proposed tool.
  • Design and carry out empirical evaluations on whether or not this proposed tool indeed helps with the broader task of designing custom visualizations for formal methods.

References
[1] Daniel Jackson. 2016. Software abstractions: Logic, language, and analysis, MIT Press.

[2] Gerwin Klein, June Andronick, Matthew Fernandez, Ihor Kuz, Toby Murray, and Gernot Heiser. 2018. Formally verified software in the real world. Commun. ACM 61, 10 (Sept. 2018), 68–77. doi:10.1145/3230627

[3] Gerard O’Regan. 2023. Overview of Formal Methods. Springer Nature Switzerland, Cham, 255–276. doi:10.1007/978-3-031-26212-8_16

[4] Benjamin Tyler. 2023. Formal Methods Adoption in Industry: An Experience Report. Springer International Publishing, Cham, 152–161. doi:10.1007/978-3-031-43678-9_5

[5] Jill H. Larkin and Herbert A. Simon. 1987. Why a Diagram is (Sometimes) Worth Ten Thousand Words. Cognitive Science 11, 1 (1987), 65–100. doi:10.1016/S0364-0213(87)80026-5

[6] Yiliang Liang, Avinash Palliyil, Eunsuk Kang, and Joshua Sunshine. 2025. Towards Better Formal Methods Visualizations. PLATEAU Workshop 2025 (8 2025). doi:10.1184/R1/29086949.v1

[7] Tristan Dyer and John Baugh. 2021. Sterling: A Web-Based Visualizer for Relational Modeling Languages. In Rigorous State-Based Methods, Alexander Raschke and Dominique Méry (Eds.). Springer International Publishing, Cham, 99–104.

[8] Siddhartha Prasad, Ben Greenman, Tim Nelson, and Shriram Krishnamurthi. 2024. Grounded Language Design for Lightweight Diagramming for Formal Methods. arXiv:2412.03310 [cs.CL] https://arxiv.org/abs/2412.03310

[9] Tim Nelson, Ben Greenman, Siddhartha Prasad, Tristan Dyer, Ethan Bove, Qianfan Chen, Charles Cutting, Thomas Del Vecchio, Sidney LeVine, Julianne Rudner, Ben Ryjikov, Alexander Varga, Andrew Wagner, Luke West, and Shriram Krishnamurthi. 2024. Forge: A Tool and Language for Teaching Formal Methods. Proc. ACM Program. Lang. 8, OOPSLA1, Article 116 (April 2024), 29 pages. doi:10.1145/3649833

[10] Changjian Zhang, Parv Kapoor, Ian Dardik, Leyi Cui, Romulo Meira-Goes, David Garlan, and Eunsuk Kang. 2024. Constrained LTL specification learning from examples. arXiv preprint arXiv:2412.02905 (2024).
Mentor: Lorrie Cranor

Description and Significance
The CyLab Usable Privacy and Security Laboratory (CUPS) works on a variety of projects aimed at understanding and improving the usability of privacy and security software and systems. Some of our current projects focus on protecting users from text messaging scams, improving the usability of mobile app privacy labels, helping users better understand what data IoT devices are collecting in their homes, user studies of age verification technologies, evaluating security awareness training, and improving AI literacy. Our projects generally involve surveys, interviews, or online user studies. We do qualitative and quantitative data analysis and sometimes we build design prototypes or tools to instrument our data collection.

Student Involvement
Students will learn how to conduct research in usable privacy and security by working with a mentor to help conduct a user study in the security or privacy area. Depending on interests and project needs, the student may help set up online surveys and collect data on a crowd worker platform, perform qualitative and/or quantitative data analysis, or design and implement prototypes.

References
https://cups.cs.cmu.edu

Hana Habib and Lorrie Faith Cranor. Evaluating the Usability of Privacy Choice Mechanisms. SOUPS ‘22. https://www.usenix.org/system/files/soups2022-habib.pdf

Shikun Zhang, Lily Klucinec, Kyerra Norton, Norman Sadeh, and Lorrie Faith Cranor. Exploring Expandable-Grid Designs to Make iOS App Privacy Labels More Usable. Twentieth Symposium on Usable Privacy and Security (SOUPS 2024). https://www.usenix.org/conference/soups2024/presentation/zhang

Xiaoxin Shen, Eman Alashwali, and Lorrie Cranor. What do Privacy Advertisements Communicate to Consumers? Proceedings on Privacy Enhancing Technologies, 2024 (4) 466-502. DOI https://doi.org/10.56553/popets-2024-0126

Mentor: Bryan Parno

Description and Significance
The Secure Foundations Lab has been developing a new language, Verus [1], for formally proving the correctness of code written in Rust [2]. Such proofs can rule out large classes of bugs and vulnerabilities at compile time. This summer we will be working on techniques to generate and verify high-performance parsers and serializers written in Rust. Such code is a common source of vulnerabilities, since it must process potentially hostile inputs at high speed.

We will also verify portions of the Rust standard library itself. The Rust standard library is of critical importance to the Rust community, since nearly every Rust program depends on it for correctness and security. And yet, for the sake of better performance, the standard library relies heavily on unsafe Rust code, which disables Rust's standard safety checks and instead expects the developer to carefully ensure that safety is preserved. Using Verus, we can instead provide the developer with an automated safety net, allowing them to demonstrate in a machine-checkable fashion that their code indeed ensures safety.

Student Involvement
Students will work with a team to verify Rust-based parsers and serializers and portions of the Rust standard library. In the process, students will learn about Rust, including some of its more advanced features, as well as about formal verification and theorem proving; no prior experience in these areas is required -- just good taste and a desire to use tools that let you write perfectly correct code.

References
[1] https://verus.rs

[2] https://www.rust-lang.org/