GAITAR Fellows | Project Descriptions
GAITAR Fellows Project Descriptions Filtered by Projects with Data
- To what extent does student use of generative AI tools to make animations impact the technical and aesthetic control over their art?
- Does type of feedback interaction (peer vs generative AI) impact the quality of revisions of interview protocols?
- To what extent do genAI usage and scaffolding impact the quality of a student writing deliverable?
- Does student use of generative AI while completing formal writing assignments impact students’ writing performance?
- To what extent does student use of generative AI while creating micro lessons affect the quality of their lesson designs?
- To what extent does brainstorming with the assistance of generative AI impact: the number of ideas generated? the quality of ideas generated?
- To what extent does utilization of AI tools impact students’ skills for data processing, cleaning, and visualization of large data sets?
- Does the source of feedback affect performance on a future assignment?
- To what extent does generative AI impact research and writing: efficiency? performance?
- To what extent does using generative AI as a feedback generator improve the quality of student design deliverables in terms of completeness and correctness?
- To what extent does student use of generative AI impact the rate of change in students’ abilities to critically read and analyze academic papers?
- To what extent does genAI impact novice students’ speaking performance? In other words, do students who practiced with genAI improve at a different rate as compared to their classmates who practiced with their peers?
GAITAR Fellows Project Descriptions Filtered by College
College of Engineering
- Does type of feedback interaction (peer vs generative AI) impact the quality of revisions of interview protocols?
- To what extent does utilization of AI tools impact students’ skills for data processing, cleaning, and visualization of large data sets?
- How does using generative AI affect students’ industry knowledge as communicated through a team Miro board and a Q&A session with the instructor?
- To what extent does using generative AI as a feedback generator improve the quality of student design deliverables in terms of completeness and correctness?
Dietrich College of Humanities & Social Sciences
- Do various use cases of generative AI yield different linguistic accuracy and complexity in French students’ writing?
- How do first-year writing students conceptualize the relationship of LLMs to cultural production?
- To what extent can the use of generative AI tools improve the student peer review process for students in an intermediate level undergraduate writing course?
- What is the impact of the use of generative AI for text analysis on students' knowledge of genre-specific discourse and linguistic features?
- To what extent does student use of generative AI impact the rate of change in students’ abilities to critically read and analyze academic papers?
- To what extent does genAI impact novice students’ speaking performance? In other words, do students who practiced with genAI improve at a different rate as compared to their classmates who practiced with their peers?
Heinz College of Information Systems & Public Policy
- Does student use of generative AI while completing formal writing assignments impact students’ writing performance?
- To what extent does the early introduction and scaffolded use of generative AI tools for learning Tableau impact students’ performance on course deliverables?
- Does generative AI tool use affect equity in student outcomes, giving less-experienced students a better chance to be successful in technical courses?
- Does the source of feedback affect performance on a future assignment?
- To what extent does generative AI impact research and writing: efficiency? performance?
- How does the way in which a generative AI tool is integrated into the course impact students’ ability to engage in critical thinking over technical troubleshooting, particularly in formulating decision questions and translating stakeholder requirements into analytical models?
School of Computer Science
- To what extent does the quality of student-designed interview protocols differ when feedback on first drafts comes from an expert as compared to generative AI?
- To what extent does student use of generative AI while creating micro lessons affect the quality of their lesson designs?
Tepper School of Business
- To what extent do genAI usage and scaffolding impact the quality of a student writing deliverable?
- What is the impact of debating with generative AI (as compared to debating with a peer) on students’ development of analytical reasoning skills?
- To what extent does brainstorming with the assistance of generative AI impact: the number of ideas generated? the quality of ideas generated?
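College of Fine Arts
- To what extent does student use of generative AI tools to make animations impact the technical and aesthetic control over their art?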
GAITAR Fellows Project Descriptions Filtered by Course Level
Intro Undergraduate
- Does type of feedback interaction (peer vs generative AI) impact the quality of revisions of interview protocols?
- How do first-year writing students conceptualize the relationship of LLMs to cultural production?
- To what extent can the use of generative AI tools improve the student peer review process for students in an intermediate level undergraduate writing course?
- Does the source of feedback affect performance on a future assignment?
- What is the impact of the use of generative AI for text analysis on students' knowledge of genre-specific discourse and linguistic features?
- To what extent does student use of generative AI impact the rate of change in students’ abilities to critically read and analyze academic papers?
- To what extent does genAI impact novice students’ speaking performance? In other words, do students who practiced with genAI improve at a different rate as compared to their classmates who practiced with their peers?
Advanced Undergraduate
- To what extent does student use of generative AI tools to make animations impact the technical and aesthetic control over their art?
- To what extent do genAI usage and scaffolding impact the quality of a student writing deliverable?
- Do various use cases of generative AI yield different linguistic accuracy and complexity in French students’ writing?
- Does generative AI tool use affect equity in student outcomes, giving less-experienced students a better chance to be successful in technical courses?
- What is the impact of debating with generative AI (as compared to debating with a peer) on students’ development of analytical reasoning skills?
- To what extent does the quality of student-designed interview protocols differ when feedback on first drafts comes from an expert as compared to generative AI?
- To what extent does brainstorming with the assistance of generative AI impact: the number of ideas generated? the quality of ideas generated?
- To what extent does utilization of AI tools impact students’ skills for data processing, cleaning, and visualization of large data sets?
Graduate
- Does type of feedback interaction (peer vs generative AI) impact the quality of revisions of interview protocols?
- To what extent do genAI usage and scaffolding impact the quality of a student writing deliverable?
- Does student use of generative AI while completing formal writing assignments impact students’ writing performance?
- To what extent does the early introduction and scaffolded use of generative AI tools for learning Tableau impact students’ performance on course deliverables?
- To what extent does the quality of student-designed interview protocols differ when feedback on first drafts comes from an expert as compared to generative AI?
- To what extent does student use of generative AI while creating micro lessons affect the quality of their lesson designs?
- To what extent does brainstorming with the assistance of generative AI impact: the number of ideas generated? the quality of ideas generated?
- To what extent does utilization of AI tools impact students’ skills for data processing, cleaning, and visualization of large data sets?
- Does the source of feedback affect performance on a future assignment?
- To what extent does generative AI impact research and writing: efficiency? performance?
- To what extent does using generative AI as a feedback generator improve the quality of student design deliverables in terms of completeness and correctness?
- To what extent does student use of generative AI impact the rate of change in students’ abilities to critically read and analyze academic papers?
GAITAR Fellows Project Descriptions Filtered by Course Size
Small (20 or fewer students)
- To what extent does student use of generative AI tools to make animations impact the technical and aesthetic control over their art?
- Does type of feedback interaction (peer vs generative AI) impact the quality of revisions of interview protocols?
- Do various use cases of generative AI yield different linguistic accuracy and complexity in French students’ writing?
- How do first-year writing students conceptualize the relationship of LLMs to cultural production?
- To what extent can the use of generative AI tools improve the student peer review process for students in an intermediate level undergraduate writing course?
- To what extent does the quality of student-designed interview protocols differ when feedback on first drafts comes from an expert as compared to generative AI?
- To what extent does utilization of AI tools impact students’ skills for data processing, cleaning, and visualization of large data sets?
- What is the impact of the use of generative AI for text analysis on students' knowledge of genre-specific discourse and linguistic features?
Medium (21-50 students)
- To what extent do genAI usage and scaffolding impact the quality of a student writing deliverable?
- Does student use of generative AI while completing formal writing assignments impact students’ writing performance?
- To what extent does the early introduction and scaffolded use of generative AI tools for learning Tableau impact students’ performance on course deliverables?
- To what extent does student use of generative AI while creating micro lessons affect the quality of their lesson designs?
- How does using generative AI affect students’ industry knowledge as communicated through a team Miro board and a Q&A session with the instructor?
- Does the source of feedback affect performance on a future assignment?
- To what extent does generative AI impact research and writing: efficiency? performance?
- To what extent does using generative AI as a feedback generator improve the quality of student design deliverables in terms of completeness and correctness?
- To what extent does student use of generative AI impact the rate of change in students’ abilities to critically read and analyze academic papers?
- To what extent does genAI impact novice students’ speaking performance? In other words, do students who practiced with genAI improve at a different rate as compared to their classmates who practiced with their peers?
- How does the way in which a generative AI tool is integrated into the course impact students’ ability to engage in critical thinking over technical troubleshooting, particularly in formulating decision questions and translating stakeholder requirements into analytical models?
Large (more than 50 students)
- Does generative AI tool use affect equity in student outcomes, giving less-experienced students a better chance to be successful in technical courses?
- What is the impact of debating with generative AI (as compared to debating with a peer) on students’ development of analytical reasoning skills?
- To what extent does brainstorming with the assistance of generative AI impact: the number of ideas generated? the quality of ideas generated?
GAITAR Fellows Project Descriptions - Unfiltered List
- To what extent does student use of generative AI tools to make animations impact the technical and aesthetic control over their art?
- Does type of feedback interaction (peer vs generative AI) impact the quality of revisions of interview protocols?
- To what extent do genAI usage and scaffolding impact the quality of a student writing deliverable?
- Do various use cases of generative AI yield different linguistic accuracy and complexity in French students’ writing?
- How do first-year writing students conceptualize the relationship of LLMs to cultural production?
- How does student use of generative AI while completing formal writing assignments impact students’ writing performance?
- To what extent does the early introduction and scaffolded use of generative AI tools for learning Tableau impact students’ performance on course deliverables?
- Does generative AI tool use affect equity in student outcomes, giving less-experienced students a better chance to be successful in technical courses?
- To what extent can the use of generative AI tools improve the student peer review process for students in an intermediate level undergraduate writing course?
- What is the impact of debating with generative AI (as compared to debating with a peer) on students’ development of analytical reasoning skills?
- To what extent does the quality of student-designed interview protocols differ when feedback on first drafts comes from an expert as compared to generative AI?
- To what extent does student use of generative AI while creating micro lessons affect the quality of their lesson designs?
- To what extent does brainstorming with the assistance of generative AI impact: the number of ideas generated? the quality of ideas generated?
- To what extent does utilization of AI tools impact students’ skills for data processing, cleaning, and visualization of large data sets?
- How does using generative AI affect students’ industry knowledge as communicated through a team Miro board and a Q&A session with the instructor?
- Does the source of feedback affect performance on a future assignment?
- To what extent does generative AI impact research and writing: efficiency? performance?
- To what extent does using generative AI as a feedback generator improve the quality of student design deliverables in terms of completeness and correctness?
- What is the impact of the use of generative AI for text analysis on students' knowledge of genre-specific discourse and linguistic features?
- To what extent does student use of generative AI impact the rate of change in students’ abilities to critically read and analyze academic papers?
- To what extent does genAI impact novice students’ speaking performance? In other words, do students who practiced with genAI improve at a different rate as compared to their classmates who practiced with their peers?
- How does the way in which a generative AI tool is integrated into the course impact students’ ability to engage in critical thinking over technical troubleshooting, particularly in formulating decision questions and translating stakeholder requirements into analytical models?
GAITAR Fellows Project Descriptions
Projects with Data
Scott Andrew 
Adjunct Faculty
Art
College of Fine Arts
Spring 2024
60-424 AI Animation (14-week course)
Research Question(s):
- To what extent does student use of generative AI tools to make animations impact their technical and aesthetic control over their art?
- To what extent does student self-efficacy for animation and genAI skills change over the course of a semester in which students could use genAI to create animations?
Teaching Intervention with Generative AI:
Andrew’s students used genAI tools to support their creation of animations, especially during creative editing and stylization decisions. Applications during and between class sessions included generating storyboards, scripts, animated sequences, synthesized voice narration and voice acting, and sound designs, resulting in both narrative and experimental works of animation. The suite of genAI tools included Runway, Deforum Stable Diffusion, ChatGPT, ElevenLabs, Midjourney, DALL-E, and more.
Study Design:
Students used a suite of genAI tools across all animation assignments. The first assignment required students to use genAI to recreate an animation from a previous course that had originally been made without genAI. Andrew compared students’ animations created with (treatment) and without (control) the assistance of genAI. He also measured changes throughout the course in students’ self-efficacy for creating animations with and without genAI.
Sample size: Total sample (13 students, each of whom completed the control condition followed by the treatment condition)
Data Sources:
- Students’ animations created without and then with genAI, scored via a rubric to evaluate aesthetic and technical control.
- Pre/post surveys of students’ self-efficacy regarding skills using genAI and course learning objectives.
Findings:
- RQ1: Animations created with genAI scored significantly higher on aesthetic control than those created without genAI, but they did not significantly differ on technical control.
 Figure 1. Students earned significantly higher rubric scores (0-3 points for each criterion) on aesthetic control for an animation created with genAI assistance (M = 3.00, SD = .00) than on the same animation created without the help of genAI (M = 2.33, SD = .49), t(11) = 4.69, p < .001, Hedges’ g = 1.26. Students’ rubric scores did not differ for technical control (t(11) = 1.77, p = .10). Error bars are 95% confidence intervals for the means.
- RQ2: Students entered the course with significantly lower self-efficacy for using genAI tools to make animations than for animating without genAI. By mid-semester, their self-efficacy for animating with and without genAI no longer differed, and the two remained equivalent at the end of the semester; this represents a doubling in students’ confidence in using genAI for animation.
Figure 2. Students entered the semester with significantly lower self-efficacy for creating animations with genAI assistance than for creating animations without genAI assistance (t(12) = 3.08, p = .01, Hedges’ g = .80); this difference was no longer present by the middle of the semester (t(10) = .98, p = .35) or by the end (t(11) = .49, p = .64). Self-efficacy for creating animations with genAI support increased significantly across the semester (F(2, 18) = 9.99, p = .001, ηp² = .53), specifically from pre to mid, p = .04, and from pre to post, p = .002, and marginally from mid to post, p = .06, whereas self-efficacy for creating animations without genAI assistance remained the same across the semester (F(2, 18) = .50, p = .61). Error bars are 95% confidence intervals for the means.
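The two Hedges’ g values reported for Andrew’s within-student t tests (1.26 in Figure 1 and .80 in Figure 2) are consistent with a paired-samples d_z (t divided by the square root of n) multiplied by Hedges’ small-sample correction J = 1 - 3/(4·df - 1). The Eberly Center does not state its formula, so treat this as a minimal sketch under that assumption; n is inferred as df + 1 from the reported t(11) and t(12), and the function name is illustrative.

```python
import math


def hedges_g_paired(t, n):
    """Bias-corrected standardized mean change (g) from a paired t statistic.

    ASSUMPTION: g = d_z * J, where d_z = t / sqrt(n) standardizes the mean
    difference by the SD of the within-student differences, and
    J = 1 - 3 / (4 * df - 1) is Hedges' small-sample correction.
    """
    df = n - 1
    d_z = t / math.sqrt(n)
    j = 1 - 3 / (4 * df - 1)
    return d_z * j


# Figure 1 (aesthetic control): t(11) = 4.69, so n = 12 pairs (inferred from df).
g_aesthetic = hedges_g_paired(4.69, 12)
# Figure 2 (pre-semester comparison): t(12) = 3.08, so n = 13 students.
g_pre = hedges_g_paired(3.08, 13)
print(round(g_aesthetic, 2), round(g_pre, 2))
```

Because g is derived here from t rather than from the raw rubric or survey scores, it inherits the rounding of the reported t values; other paired effect-size variants, such as standardizing by the average SD of the two conditions, would give different numbers.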
Eberly Center’s Takeaways:
- RQ1: Results suggest that genAI use may confer an advantage for aesthetic control, but not for technical control. However, students recreated an animation that had been made in a previous course without genAI, so improvements in aesthetic control could also reflect the effects of repeated practice over time. Lastly, the instructor knew which animations were created with genAI assistance when scoring, which may have biased ratings.
- RQ2: While self-efficacy for creating animations without genAI remained stable throughout the semester, students’ self-efficacy for using genAI for animations had doubled by the end of the semester. Repeated practice with various genAI tools for creating animations may have contributed to these increases in student confidence.
Brandon Bodily
Assistant Teaching Professor
College of Engineering
49-101 Introduction to Engineering Design, Innovation, and Entrepreneurship (Fall 2024)
Research Question(s):
- To what extent does type of feedback interaction (peer vs generative AI) impact the quality of revisions of interview protocols?
- To what extent does the type of feedback interaction impact the development of self-efficacy for interviewing skills?
- What are student attitudes about receiving feedback when role playing with a peer versus generative AI?
Teaching Intervention with Generative AI:
Bodily provided students with suggestions and tips for engaging with the genAI tool (Copilot) while preparing for an interview with a key stakeholder. Students then interacted with the tool to conduct a practice interview and elicit feedback on their interview protocols. Next, students updated their interview protocols and conducted real-life interviews as part of the engineering design process.
Study Design:
Bodily delivered the same classroom instruction on interview protocol development to all students, and all students then drafted an initial interview protocol. Bodily randomly assigned each student to receive feedback on their protocol either from genAI, as described above, or from peers. Then, during the same class meeting, students practiced their interviewing skills by role-playing an interview using their revised protocol, either with a peer or with the genAI tool, depending on their assigned condition. All students could revise their protocol after receiving feedback and role-playing, before conducting the actual interview.
Sample size: Treatment (15 students); Control (18 students)
Data Sources:
- Rubric scores for the quality of students’ draft and final versions of their interview protocol as scored by one of two coders who did not know the students’ study condition
- Surveys of students’ self-efficacy regarding their development of an interview protocol and interviewing skills
- Student survey reflections on the feedback and role-playing session
Findings:
- RQ1: The type of feedback interaction (genAI or peer) did not affect the quality of students’ interview protocol revisions.
 Figure 1. The rubric-scored (max 25) quality of interview protocol revisions was not affected by study condition (condition x draft: F(1,31) = 1.11, p = .30). Error bars are 95% confidence intervals for the means.
- RQ2: While students significantly grew in their self-efficacy for interviewing skills across the course, the type of feedback interaction (genAI or peer) did not affect the degree of this change. 
 Figure 2. Students’ growth in self-efficacy did not significantly differ depending on whether they interacted with and received feedback from a peer or from genAI to guide their interview protocol revisions (condition x time: F(1,30) = .91, p = .35). Self-efficacy did not significantly differ between conditions (F(1,30) = .41, p = .53). Error bars are 95% confidence intervals for the means.
- RQ3: Students interacting with and receiving feedback from genAI reported the feedback to be significantly more “useful” and less “awkward” than did students interacting with a peer. There was no difference between the two groups in their perceptions of how useful they found the role-playing session to be, how comfortable they were, and how effective they believed their revised protocol would be at eliciting information from their interviewee.
Figure 3. Students interacting with genAI reported their feedback to be significantly more useful (M = 4.56, SD = 1.38) than did students interacting with a peer (M = 5.47, SD = .74) (t(26.92) = -2.29, p = .02, g = .80). Error bars are 95% confidence intervals for the means.
Figure 4. Students interacting with genAI reported the experience to be significantly less awkward (M = 1.20, SD = .41) than did students interacting with a peer (M = 2.44, SD = 1.38) (t(20.57) = -3.63, p = .002, g = 1.17). Error bars are 95% confidence intervals for the means.
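The fractional degrees of freedom in Figures 3 and 4 indicate Welch’s t test, which does not assume equal group variances. As a minimal sketch, Figure 4’s statistics can be approximately reproduced from the reported means and SDs. This assumes the group sizes are the full condition sizes listed above (15 genAI, 18 peer), which the caption does not state, and the function name is illustrative; because the inputs are rounded to two decimals, the recomputed values land close to, but not exactly on, the reported t(20.57) = -3.63.

```python
import math


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite df from summary statistics."""
    v1 = s1 ** 2 / n1  # squared standard error of group 1's mean
    v2 = s2 ** 2 / n2  # squared standard error of group 2's mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Figure 4 "awkward" ratings: genAI group minus peer group.
# ASSUMPTION: n = 15 (genAI) and n = 18 (peer), i.e., every student responded.
t, df = welch_t(1.20, 0.41, 15, 2.44, 1.38, 18)
print(f"t({df:.2f}) = {t:.2f}")
```

The negative sign follows from subtracting the peer mean from the genAI mean, matching the sign convention of the reported statistic.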
Eberly Center’s Takeaways:
- RQ1, RQ2, & RQ3: There was no evidence that genAI affected either the development of students’ self-efficacy for interviewing skills or the improvement in the quality of their interview protocols. This could be due to a misalignment between the evaluation rubric and the task instructions that students received, the accuracy of the coders, the short (one-day) period between draft and revision, and/or students scoring high on their initial drafts, leaving limited room to show further improvement. Despite this, students reported finding genAI feedback more useful than a peer’s and the session less awkward than with a peer. While this could suggest that students may prefer such a feedback interaction with genAI over one with a peer, interacting with a live human may be more valid practice for the interviews that students will ultimately conduct. Additionally, though students perceived genAI’s feedback to be more useful, we do not have a direct measure of whether the genAI feedback was in fact better than, or different at all from, that of peers. GenAI may provide a viable feedback mechanism for developing an interview protocol when peer feedback is not readily available.
Emily DeJeu
Assistant Teaching Professor
Tepper School of Business
Spring 2024
70-340 Business Communications (14-week course)
Research Question(s):
- To what extent do generative AI usage and scaffolding impact the quality of a student writing deliverable?
- Section A - used generative AI, with instructor-provided LLM scaffolding
- Section B - used generative AI, without instructor-provided LLM scaffolding
- Section C - no generative AI, no scaffolding
- To what extent does detailed generative AI-related scaffolding influence students’ perceptions about the utility of LLMs for assisting their growth and development as communicators across an entire semester? (Sections A and B only)
Teaching Intervention with Generative AI:
In one of three course sections (section A), DeJeu delivered four scaffolded mini-lectures showcasing genAI use cases in professional communication contexts. Specifically, these lessons provided instruction and modeling on using ChatGPT or Copilot to revise a document, create model documents, identify "lexical bundles" (i.e., phrases and sentences that are used often in particular genres of writing), and generate ideas. Each mini-lecture accompanied one of the four major writing assignments, each of which included reflection questions and documentation of the writing process. Students were instructed to use genAI on their first writing assignment and were allowed to choose whether to use it on all subsequent assignments. In a second section (section B), DeJeu also instructed students to use genAI on the first writing assignment and permitted its use on subsequent assignments, but she did not provide scaffolded instruction on genAI.
Study Design:
This study involved three sections: two taught by DeJeu (sections A and B) and a third taught by a colleague (section C). In one of DeJeu’s two sections, students received in-class scaffolding for ethical and effective genAI tool use (section A), while students in the other did not (section B). In both sections, students were instructed to use genAI on the first writing assignment. DeJeu compared the two sections’ performance on the first writing assignment, as well as students’ global perceptions of genAI at the beginning and end of the semester. In section C, students were not permitted to use genAI on the first writing assignment. DeJeu compared writing performance on this same assignment across all three sections.
Sample size: Section A (24 students); Section B (21 students); Section C (23 students)
Data Sources:
- One writing assignment, scored with a rubric by three trained coders who were unaware of the study and students’ section. This assignment was scored for various writing skills (e.g., use of rhetorical strategies, concision, coherence).
- Pre/post surveys of students’ perceptions of genAI’s utility to influence their growth and development as communicators in terms of familiarity, helpfulness, and efficiency (sections A and B only).
Findings:
- RQ1: There was a significant difference in performance on the writing assignment among the three sections. Follow-up comparisons showed no difference between sections A and B, and significant differences between section C and both sections A and B. This overall difference was consistent across all rubric criteria.
Figure 1. Students’ writing performance was significantly different across the three sections, F(2, 64) = 8.33, p < .001, ηp² = .21. Students in section C (M = 9.67, SD = 2.27) performed significantly lower on the assignment than students in both section A (M = 11.96, SD = 1.75), p < .001, and section B (M = 11.67, SD = 2.27), p < .01. Error bars are 95% confidence intervals for the means.
- RQ2: There was a significant increase in students’ perceived familiarity with genAI tools from pre to post across both sections A and B. There was a marginally significant interaction between section and time, suggesting a slightly greater increase from pre to post in section A (scaffolded genAI use) compared to section B (no scaffolding). There were no significant main effects or interactions for perceived helpfulness or efficiency of genAI tools to assist in their development as communicators. 
Figure 2. There was a significant main effect of time, F(1, 38) = 50.51, p < .001, ηp² = .57, indicating a significant increase in students’ familiarity with genAI tools from the beginning to the end of the semester. The time x section interaction was marginally significant, F(1, 38) = 3.86, p = .06, indicating that the pre-to-post change was marginally greater for section A, p < .001, compared to section B, p < .01. Error bars are 95% confidence intervals for the means.
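Each ηp² reported alongside an F test can be checked from the F value and its two degrees of freedom alone: since F = (SS_effect / df_effect) / (SS_error / df_error), the ratio ηp² = SS_effect / (SS_effect + SS_error) simplifies to F·df_effect / (F·df_effect + df_error). A minimal Python sketch applied to the values in DeJeu’s Figures 1 and 2 (the function name is illustrative):

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom.

    Because F = (SS_effect / df_effect) / (SS_error / df_error),
    SS_effect / (SS_effect + SS_error) reduces to the expression below.
    """
    return (f * df_effect) / (f * df_effect + df_error)


# Values reported in DeJeu's Figures 1 and 2.
section_effect = partial_eta_squared(8.33, 2, 64)  # writing score across sections
time_effect = partial_eta_squared(50.51, 1, 38)    # familiarity, pre vs. post
print(round(section_effect, 2), round(time_effect, 2))
```

The same identity applies to Andrew’s repeated-measures test, F(2, 18) = 9.99, where it gives approximately .53, matching the value reported there.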
Eberly Center’s Takeaways:
- RQ1: Students who were permitted to use genAI (sections A and B) performed significantly better on the writing assignment than students who were not permitted to use genAI (section C), as evaluated by three trained raters who were not informed about the nature of this project. This finding suggests that the use of genAI can help students turn in higher-quality deliverables in their communications classes. It is important to note, however, that the quality of a deliverable does not necessarily equate to greater learning. Further research is needed to test whether permitted genAI tool use affects students’ development of underlying writing skills (e.g., on a transfer task completed without genAI), in addition to the quality of a single deliverable.
- RQ2: Scaffolded instruction on how to use genAI slightly increased students’ perceived familiarity with using genAI for communication-related tasks, beyond the increase produced by using the tool alone. However, scaffolded instruction and use did not affect students’ perceptions of how helpful genAI is to their growth or efficiency as communicators. This suggests that additional interventions are needed if the goal is to shift students’ perceptions of genAI’s potential to enhance their development as communicators.
Catherine Evans
Graduate Student Instructor
English
Dietrich College of Humanities and Social Sciences
76-106 Writing about Literature, Art and Culture (Fall 2024)
Research Question(s):
To what extent does introducing a Critical GenAI Studies lens in a writing course about art and culture change:
- students’ perceptions of generative AI-produced images of CMU student culture?
- students’ writing performance on interpretative essays about campus culture?
Teaching Intervention with Generative AI:
Evans implemented a week-long unit on Critical AI Studies in a first-year undergraduate writing course. Students in both the treatment (Section A) and control (Section B) sections engaged with CMU Archives and Special Collections to understand CMU students’ historical role in producing campus culture. Students then used genAI to produce images based on text from the archival collections and compared these images to the actual historical images. Students in both sections had the option to use genAI in the brainstorming stages of their writing process. However, halfway through the course, students in the treatment section engaged in a week-long conversation about emerging critical perspectives on genAI use, whereas students in the control section spent that time covering course content not related to genAI.
Study Design:
Evans taught two sections of the course, one in the first half of the semester and one in the second half of the semester. In the first section, she implemented the teaching intervention as described above (treatment). Students in the second section instead spent extra time covering course content not related to genAI (control). Evans compared the same data sources across the two sections.
Sample size: Treatment (15 students); Control (18 students)
Data Sources:
- Student critiques of AI-generated cultural images.
- Rubric scores on two analytical writing assignments.
Findings:
- RQ1: Engaging in a week of Critical AI Studies qualitatively changed the nature of students’ critiques of AI-generated images. When critiquing AI-generated images of real historical articles, students who were exposed to the intervention tended to offer more cultural or thematic critiques of the images (e.g., the image had historically inaccurate portrayals), whereas students in the control section tended to offer more surface-level critiques (e.g., the image had distorted faces).
- RQ2: Students’ rubric scores on their writing assignments showed no difference between the sections both before (Analysis Assignment 1) and after (Analysis Assignment 2) the critical AI week intervention.

Figure 1. There was no significant main effect of time, F(1, 31) = 1.86, p = .18, no significant main effect of section, F(1, 31) = 0.46, p = .50, and no significant time × section interaction, F(1, 31) = 0.72, p = .40; performance on the writing assignments did not differ between sections either before or after the intervention. Error bars are 95% confidence intervals for the means.
Eberly Center’s Takeaway:
- RQ1 & RQ2: In the context of a seven-week course, a week-long Critical AI Studies intervention qualitatively impacted the depth of students’ AI critiques, but did not translate to observed impacts on students’ analytical writing skills. Intervening to increase students’ genAI competency/literacy may help students view AI-generated cultural content with a more critical lens without observed positive or negative effects on the later cultural analysis essay.
Rebekah Fitzsimmons 
Assistant Teaching Professor
Heinz College of Information Systems and Public Policy
90-717 Writing for Public Policy (Fall 2024)
Research Question(s):
- To what extent does student use of generative AI while completing formal writing assignments impact students’ writing performance?
- To what extent does student self-efficacy regarding writing and generative AI use change after instruction on, and practice with, genAI during class discussions?
Teaching Intervention with Generative AI (genAI):
Fitzsimmons taught three sections of the course, two in the first half of the Fall 2024 semester (control, sections A and B) and one in the second half of the semester (treatment, section C). She provided the same classroom instruction on, and practice activities with, genAI to students in all sections. Classroom discussions specifically targeted prompt engineering, the evaluation of genAI outputs, and the ethics of using genAI in various realistic professional contexts (e.g., writing policy papers).
Study Design:
Fitzsimmons permitted, but did not require, students to use a genAI tool of their choice to support the completion of graded writing assignments in one of three course sections (treatment). Students in this section were then able to opt into using the tool (self-selected treatment) or not (self-selected control). The other two sections (previous-section control) were not allowed to use genAI on course writing assignments. Fitzsimmons compared students’ performance on the second, most central, writing assignment, and she compared changes in self-efficacy across course sections in which genAI use was permitted vs. not.
Sample size: Self-Selected Treatment (11 students); Self-Selected Control (14 students); Previous-Section Control (47 students across two sections)
Data Sources:
- Rubric scores for students’ performance on the course’s second major writing assignment (completed after genAI instruction).
- Surveys of students’ self-efficacy regarding policy writing and genAI use at the beginning, middle, and end of their course.
Findings:
- RQ1: Students who opted to use genAI assistance for their second writing assignment (“Policy Recommendation”) did not perform differently (as measured by final rubric score) than students who chose not to use genAI, or students who were not permitted to use genAI (Figure 1).
Figure 1. In Fall 2024, students’ rubric scores (0-200 points) on their policy recommendation writing assignment did not differ significantly depending on whether they were not permitted to use genAI (previous-section control, n = 47, M = 182.57, SD = 11.99), chose not to use genAI when permitted (self-selected control, n = 14, M = 179.64, SD = 8.86), or opted to use genAI when permitted (self-selected treatment, n = 11, M = 180.55, SD = 8.98), F(2, 69) = .45, p = .64. Error bars are 95% confidence intervals for the means.
- RQ2: Both types of self-efficacy increased significantly over the course of the semester, regardless of section (genAI permitted vs. not).  
Figure 2. Students’ self-efficacy for writing policy documents increased across time, F(1.40, 74.007) = 44.59, p < .001, ηp² = .46, from pre to mid, p < .001, mid to post, p < .001, and pre to post, p < .001. Self-efficacy did not differ by condition (genAI permitted vs. not), F(1, 53) = 3.18, p = .08, nor did the change across time differ by condition, F(1.40, 74.007) = 2.64, p = .10. Error bars are 95% confidence intervals for the means.
Figure 3. Students’ self-efficacy for effective and ethical genAI use increased across time, F(1.54, 81.52) = 22.07, p < .001, ηp² = .29, from pre to mid, p = .002, mid to post, p < .001, and pre to post, p < .001. Self-efficacy did not differ by condition (genAI permitted vs. not), F(1, 53) = .99, p = .32, nor did the change across time differ by condition, F(1.54, 81.52) = 1.36, p = .26. Error bars are 95% confidence intervals for the means.
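As a rough check on the Figure 1 result for the policy recommendation assignment, the one-way ANOVA can be reconstructed from the group sizes, means, and standard deviations printed in the caption. The sketch below is a minimal illustration. It assumes the three groups are independent and uses the caption's rounded summary values rather than the raw scores, so it should only approximate the reported statistic. The helper name `one_way_anova_from_summary` is hypothetical and is not part of any analysis pipeline described here.

```python
def one_way_anova_from_summary(groups):
    """Return (F, df_between, df_within) from (n, mean, sd) tuples of independent groups."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total
    # Between-groups sum of squares: spread of the group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups sum of squares: each group's sample variance times its n - 1.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# (n, M, SD) as printed in the Figure 1 caption (rubric points out of 200).
groups = [
    (47, 182.57, 11.99),  # previous-section control
    (14, 179.64, 8.86),   # self-selected control
    (11, 180.55, 8.98),   # self-selected treatment
]
f_stat, df1, df2 = one_way_anova_from_summary(groups)
print(f"F({df1},{df2}) = {f_stat:.2f}")
```

With these inputs the sketch prints F(2,69) = 0.45, in line with the reported F(2,69) = .45. It omits the p-value because the Python standard library has no F distribution.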
Eberly Center’s Takeaways:
- RQ1: There was no evidence that students who chose to use genAI on writing assignments when permitted (Fall 2024, self-selected treatment) performed differently than students who were not permitted to use genAI (Fall 2024, previous-section control) or students who were permitted to use genAI but chose not to (Fall 2024, self-selected control). However, the extent to which students used genAI varied, and that use may not have contributed meaningfully to their performance. In submission reports, students indicated using genAI most frequently for help with grammar, structure, and formatting. These aspects accounted for only a small portion of the overall grade and may not have changed the final product much for the more experienced writers in this graduate course. Additionally, uniformly high grades suggest that students may have had limited room for improvement with genAI assistance. Lastly, although genAI was a permitted tool for all students in the treatment section, only some students self-selected into using it, which limits causal conclusions about the effectiveness of genAI as a tool for improving writing.
- RQ2: Self-efficacy for writing policy documents and self-efficacy for effective and ethical genAI use both increased significantly over the 7-week course, regardless of whether students were permitted to use genAI on writing assignments (Section C, treatment section) or not (Sections A and B, control sections). It is plausible that repeated writing practice and detailed instructor feedback contributed to increases in writing self-efficacy in all sections. Likewise, increases in self-efficacy for genAI use over time did not differ across sections. This suggests that the genAI instruction students received in all sections may have been sufficient to raise their self-efficacy, regardless of whether they elected to use the tool on assignments.
Gabriela Gongora-Svartzman
Assistant Teaching Professor
Heinz College of Information Systems and Public Policy
Fall 2024
94-819 Data Analytics with Tableau (7-week course)
Research Question(s):
- To what extent does the early introduction and scaffolded use of generative AI tools for learning Tableau impact students’ performance on course deliverables?
- How does students’ self-efficacy in data literacy skills change over time in a course in which a generative AI tool was introduced early?
Teaching Intervention with Generative AI (genAI):
Gongora-Svartzman introduced students to a genAI tool (Explain Data) designed to assist in data exploration. Gongora-Svartzman demonstrated how this Tableau genAI tool can provide an efficient way to view the landscape of potential data analysis pathways in a given project. The teaching intervention provided students with a scaffolded introduction to Explain Data early in the course. In the control section, students were briefly exposed to the tool, without scaffolding, during the last two weeks of the course.
Study Design:
Gongora-Svartzman taught three sections of the course, one control section in Spring 2024 and two treatment sections in Fall 2024. She briefly introduced Explain Data late in the course in the Spring 2024 section (control), whereas she introduced it earlier in the course and in more scaffolded form in the Fall 2024 sections (treatment). She compared data sources from student deliverables across sections.
Sample size: Treatment (55 students); Control (25 students)
Data Sources:
- Student deliverables (in-class exercises, final group projects, and case study challenges) from course assignments that required students to perform data analysis.
- Pre-post surveys of students’ self-efficacy regarding their data literacy (treatment sections only).
Findings:
- RQ1: Student performance in the course during the Fall 2024 (treatment) semester did not differ significantly from student performance during the Spring 2024 (control) semester on any course deliverables.
- RQ2: In the Fall 2024 (treatment) sections, students’ self-efficacy for data analysis skills and for use of genAI tools for data analysis improved significantly from pre to post, an increase of nearly 50% over baseline.

Figure 1. In Fall 2024 (treatment), students’ self-efficacy for data literacy improved significantly from the beginning (M = 59.82, SD = 21.98) to the end (M = 88.40, SD = 9.07) of the semester, t(41) = -8.17, p < .001, g = -1.24. Error bars are 95% confidence intervals for the means.
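The effect size and the "nearly 50%" description for RQ2 can be related to the reported statistics with a little arithmetic. This is a minimal sketch built on two assumptions the source does not state. First, it assumes g was computed as the paired standardized mean change (t divided by the square root of the number of pairs), multiplied by the usual small-sample correction 1 - 3/(4*df - 1). Second, it assumes t(41) reflects 42 students who completed both the pre and post surveys. The function names are hypothetical.

```python
import math


def hedges_g_paired(t_stat, n_pairs):
    """Standardized mean change d_z = t / sqrt(n), scaled by Hedges' small-sample correction."""
    d_z = t_stat / math.sqrt(n_pairs)
    df = n_pairs - 1
    correction = 1 - 3 / (4 * df - 1)  # widely used approximation to Hedges' J
    return d_z * correction


def percent_change(before, after):
    """Relative change from a baseline, in percent."""
    return 100 * (after - before) / before


# Assumption: t(41) = -8.17 implies 42 students with both a pre and a post survey.
g = hedges_g_paired(-8.17, 42)
change = percent_change(59.82, 88.40)  # pre and post means from the caption
print(f"g = {g:.2f}, change = {change:.1f}%")
```

The sketch prints g = -1.24, change = 47.8%, which matches the caption's g = -1.24 and the "nearly 50%" description. The negative sign only reflects subtracting post from pre. Under the same assumption, hedges_g_paired(4.72, 27) returns about 0.88, matching the Hedges' g reported for Moore's within-student comparison.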
Eberly Center’s Takeaways:
- RQ1: Student performance did not change when students were introduced to the genAI tool Explain Data earlier in the semester and in a more scaffolded fashion, compared to a semester in which students received a more cursory introduction to the tool later in the semester. However, students in the Spring 2024 (control) section already showed very high performance on all deliverables, limiting the ability to detect improvements.
- RQ2: Students’ self-efficacy for course-related skills (including the use of genAI tools) improved significantly from the beginning to the end of the course in Fall 2024 (treatment). However, these data were not collected during the Spring 2024 (control) section, so we cannot say to what extent these pre/post increases are attributable to the intervention.
Alan Thomas Kohler
Senior Lecturer
English
Dietrich College of Humanities and Social Sciences
76-270 Writing for the Professions (Spring 24, Fall 24)
Generative AI Tool(s) Used
Copilot
Research Question 
To what extent can the use of generative AI tools impact the quality of revisions resulting from the peer review process, and students’ perceptions of that process, in an intermediate-level undergraduate writing course?
Teaching Intervention with Generative AI (genAI):
Kohler’s students completed a peer-review feedback process for each of five writing projects in his course. For two of the projects in Spring 2024, students completed this process using a genAI tool (Copilot), rather than another student, as the source of feedback. Students submitted their writing along with the rubric and an instructor-engineered prompt to receive feedback from the AI tool on their writing sample. Additionally, students submitted a carefully refined instructor-engineered prompt to the AI tool to generate a writing sample on which they could practice providing feedback. Kohler introduced Copilot during class and provided all pre-engineered AI prompts. For each project, students documented the feedback they received and gave, as well as their perceptions of the usefulness of each experience for learning.
Study Design:
Students used traditional peer review for the first three projects (control) in Spring 2024, but substituted genAI for peer reviewers (treatment) during the fourth and fifth projects. In Fall 2024, this design was counterbalanced, with the first three projects using the genAI-based peer review (treatment) and the fourth and fifth projects using traditional peer review (control). All assignments, rubrics, and genAI prompts were identical each semester. Kohler compared student perceptions of the feedback process and the quality of writing deliverables across conditions.
Sample size: 36 students across two semesters experienced counterbalanced treatment and control conditions.
Data Sources:
- Students’ drafts for each writing project, scored with rubrics measuring writing skills before and after the review process (scored without knowledge of condition and draft).
- Students’ survey responses regarding the feedback process for each project.
Findings:
- Across all five writing projects, human peer feedback led to significantly greater improvement than did genAI partner feedback.  
Figure 1. Averaging all five projects, students using a peer (M = 7.66, SD = 1.80) for the review process improved their drafts significantly more than students using genAI (M = 6.57, SD = 1.22) as a review partner, t(34) = 2.13, p < .05, g = 0.70. Error bars are 95% confidence intervals for the means.
- Across all five writing projects, students engaging in the review process rated both receiving and giving feedback with a human peer to be significantly more helpful than with a genAI partner. 
Figure 2. Averaging all five projects, students rated the experience of receiving feedback from a peer (M = 3.44, SD = 0.78) to be significantly more helpful than receiving feedback from genAI (M = 3.09, SD = 0.63), t(33) = 2.35, p < .05, g = .39. Students also rated the experience of giving feedback to a peer (M = 3.53, SD = 0.74) to be significantly more helpful than giving feedback to a genAI partner (M = 3.04, SD = 0.80), t(33) = 2.93, p < .01, g = .49. Error bars are 95% confidence intervals for the means.
Eberly Center’s Takeaways:
- Giving and receiving feedback with a peer helped students significantly improve their writing more than feedback from carefully prompted genAI. The consistent trend of peer review being more effective than genAI was observed across all five projects. This finding suggests that, for intermediate-level undergraduate students, the use of genAI as a review partner may not be as effective at improving students’ writing as collaborating with a human peer for the review process.
- Students rated both giving and receiving feedback while engaging with a peer to be more helpful than engaging in the process with genAI. The trend of peer review being rated as more helpful was consistent across all five projects. Notably, the pattern of differences in students’ reported helpfulness for each project mirrored that of their actual writing performance, suggesting that students’ perceptions of how helpful the feedback process was for a given assignment are well aligned with subsequent improvements in their revised writing.
Marti Louw
Director, Learning Media Design Center
Human-Computer Interaction Institute
School of Computer Science
Fall 2024
05-291/05-691 Learning Media Design (14-week course)
Research Question(s):
- To what extent does the quality of student-designed interview protocols differ when feedback on first drafts comes from an expert as compared to generative AI?
- To what extent does students’ self-efficacy for their interviewing skills change across the semester when receiving generative AI feedback?
- What are students’ attitudes about simulating an interview with generative AI and receiving generative AI feedback on an interview protocol?
Teaching Intervention with Generative AI (genAI):
Louw’s Fall 2024 students first used genAI as a coaching tool to receive feedback on their written interview protocol drafts (for interviews with, e.g., subject matter experts, stakeholders, or end-users). Next, students simulated the interview by roleplaying with the genAI tool using spoken inputs. Both genAI experiences provided opportunities for students to reflect and iterate on their protocols. For both the written and spoken genAI interactions, Louw provided specific instruction on prompt engineering strategies during class sessions.
Study Design:
Louw required pairs of students to use genAI for feedback on interview protocol drafts and for simulated interview practice in Fall 2024. On the same assignments, she compared team performance in Fall 2024 to that of teams from Fall 2023, when students did not use genAI and instead received instructor feedback on their protocol drafts. Student surveys regarding self-efficacy and other attitudes were deployed at the beginning of and partway through the Fall 2024 treatment semester.
Sample size: Treatment (10 teams); Control (9 teams)
Data Sources:
- Rubric scores for each team’s draft and revised interview protocols (scored after removing indicators of team identity, study condition, and which draft the protocol was)
- Pre/post surveys of students’ self-efficacy for interviewing skills (treatment only)
- Students’ written reflections following genAI feedback and interview simulations (treatment only)
Findings:
- RQ1: Coding of the protocols showed that whether the students received feedback from the instructor or genAI did not impact the quality of their interview protocol revisions based on total rubric score. However, teams in the genAI semester did score higher on one rubric criterion (number of thematic areas).
- RQ2: GenAI students entered the course with fairly high self-efficacy for their interviewing skills (mean = 78.2 out of 100). After hands-on learning experiences of preparing for and conducting interviews, they maintained this confidence (mean = 82.9), although their self-efficacy did not increase significantly.
- RQ3: Students were slightly positive about the usefulness of genAI for feedback and its ability to stimulate new interview questions. They were less positive about the usefulness of genAI for simulating an interview. Despite this, 75% of the respondents said that they would choose to use the tool to help prepare for future interviews.
Eberly Center’s Takeaways:
- RQ1: For the most part, genAI did not impact the quality of students’ revised interview protocols, with the exception of helping them generate more thematic areas for their interviews. The genAI condition included two interactions: feedback/coaching and simulation of the interview. We cannot disentangle which of these interactions drove the increase in thematic areas, or whether both experiences are necessary to achieve this outcome.
 Figure 1. Students’ rubric scores on teams’ deliverables did not significantly differ depending on whether they received feedback from the instructor or from genAI to guide their interview protocol revisions (condition x time: F(1,17) = .12, p = .73). Error bars are 95% confidence intervals for the means.
- RQ2: GenAI students did not significantly grow in their self-efficacy for interviewing skills. This could be due to the relatively high self-efficacy students entered the course with, possibly as a result of prior interviewing experience in 75% of students, or the small sample size. Alternatively, this could indicate that students need additional mastery experiences to build confidence in these skills.
- RQ3: Despite mixed opinions about how useful genAI was for feedback and for simulating an interviewee, the majority of students indicated that they would use the tool in future interviewing tasks (e.g., describing the tool as a supplement to human thinking without a tangible cost).
Ganesh Mani
Distinguished Service Professor of Innovation Practice
Tepper School of Business
Spring 2025
46-992: MSBA Experiential Learning
Research Question(s):
To what extent does students’ interaction with a consensus-building genAI tool (Thinkscape):
- impact students’ self-reported ability to efficiently make group decisions
  - after completing both trials?
  - immediately upon completion of each trial?
- impact students’ perceptions that their voices were heard during group deliberation?
- impact students’ perceptions of the quality of groups’ decisions?
Teaching Intervention with Generative AI (genAI):
As part of their Experiential Learning Capstone course, Mani had his students complete collaborative resource allocation tasks that challenged them to arrive at a group consensus in an efficient manner. Students worked in small groups using a consensus-building genAI tool (Thinkscape) and had 30 minutes to arrive at a group consensus regarding how to allocate a financial portfolio based on the profile of a potential client.
Study Design:
Mani taught the course in the Spring 2025 semester. First, students used a whiteboard to make small-group decisions for a resource allocation task. A week later, students used the genAI tool (Thinkscape) to complete a similar resource allocation task for a different profile. This design allowed all students to experience both conditions and serve as their own point of comparison. Students completed a brief follow-up survey to gauge their perceptions after each activity. At the conclusion of both tasks, students were also asked to compare the two modalities in terms of perceived efficiency.
Sample size: 30 students completed the control task, followed by the treatment task
Data Source:
- Student survey data collected immediately following each activity and at the conclusion of both trials.
Findings:
RQ1a: When surveyed at the conclusion of both trials, students were asked to choose which modality more efficiently helped their groups reach a conclusion. Most students (67%) chose the Thinkscape trial, compared to 13% who chose the whiteboard trial.

Figure 1. At the conclusion of both trials, 67% of the students thought Thinkscape more efficiently helped them reach a conclusion compared to using a whiteboard (13%). A chi-square goodness-of-fit test revealed these differences to be significant, χ²(2) = 15.20, p < .001.
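The goodness-of-fit statistic in Figure 1 can be reproduced by hand, though only under assumptions. The source reports just two response percentages (67% and 13%), but χ²(2) implies three response options. This minimal sketch assumes that all 30 students answered and that the expected distribution was uniform. It also assumes a third option (hypothetically something like "no difference") received the remaining 20%, giving counts of 20, 4, and 6. The function name is hypothetical.

```python
import math


def chi_square_gof_uniform(observed):
    """Pearson chi-square goodness-of-fit statistic against equal expected counts."""
    expected = sum(observed) / len(observed)
    return sum((count - expected) ** 2 / expected for count in observed)


# Hypothetical reconstruction of 30 responses: 67% Thinkscape, 13% whiteboard,
# and an assumed third option holding the remaining 20%.
counts = [20, 4, 6]
stat = chi_square_gof_uniform(counts)
df = len(counts) - 1
# With df = 2 the chi-square upper-tail probability has the closed form exp(-x / 2).
p_value = math.exp(-stat / 2)
print(f"chi2({df}) = {stat:.2f}, p = {p_value:.4f}")
```

Under these assumptions the sketch prints chi2(2) = 15.20, p = 0.0005, consistent with the reported χ²(2) = 15.20, p < .001. The closed-form p-value works only because df = 2. Other degrees of freedom would need a statistics library.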
RQ1b, RQ2, & RQ3: When surveyed with Likert-type items immediately following each trial, students did not report a significant difference between the conditions for the perceived efficiency of the process, perceptions of their voices being heard, or the perceived quality of the decisions their group made.

Figure 2. There was no significant difference between students’ ratings of efficiency t (29) = 0.96, p = .34, feelings of their voice being heard t (29) = 1.31, p = .20, or perceived quality of the groups’ decisions t (29) = -1.20, p = .24 when completing the allocation task with a whiteboard compared to using the genAI tool.
Eberly Center’s Takeaways:
- RQ1, RQ2, & RQ3: At the conclusion of both trials, students were asked to choose between the two modalities in terms of efficiency, and a majority of the students chose the Thinkscape genAI tool over the whiteboard. When rating each trial independently, however, students’ perceptions regarding group process and decision making did not differ between the genAI tool and a traditional whiteboard. All of these students experienced the whiteboard trial first, followed by the genAI trial, so we cannot determine the extent to which ordering effects influenced the results. There were plans to include a second section that completed the trials in a counterbalanced order (genAI first, followed by whiteboard), but low attendance at one of the trials prevented that section from being included in this analysis.
Steven Moore
Graduate Student Instructor
Human-Computer Interaction Institute
School of Computer Science
Spring 2024
05-840 Tools for Online Learning (14-week course)
Research Question(s):
- To what extent does student use of generative AI while creating micro lessons affect the quality of their lesson designs?
- How does student self-efficacy for educational design and generative AI use change over the course of the semester?
Teaching Intervention with Generative AI (genAI):
Moore’s students engaged with four interactive, online learning modules on fundamental teaching and learning principles. Each module contained two micro lesson design activities, for a total of eight micro lesson activities, in which he challenged students to apply the learning principles to their practice. For half of the micro lesson activities, he instructed students to use genAI (ChatGPT) as a collaborator in their design process.
Study Design:
Moore implemented two conditions, use of genAI (treatment) or not (control), in the single section of his course. For the first micro lesson assignment in each of four online learning modules, he randomly assigned half of the students to use genAI (treatment) and half of the students not to use genAI (control). For the second micro lesson assignment in each module, students switched to the other condition. Moore compared data sources for each student between conditions and across modules and micro lesson assignments.
Sample size: Total sample (27 students, randomly assigned to alternating treatment and control conditions)
Data Sources:
- Students’ deliverables from eight micro lesson assignments (half completed with genAI assistance, half without), scored via a rubric with criteria for topic selection, learning objectives, assessments, instruction, and incorporation of the given learning science principle.
- Pre/post surveys of students’ self-efficacy regarding skills using genAI and educational lesson design.
Findings:
- RQ1: Using genAI for the creation of micro lessons improved performance: Students earned higher rubric scores on the four micro lessons they created with the help of genAI than on the four micro lessons they created without the help of genAI.
 Figure 1. Students earned significantly higher scores on the four micro lessons created with genAI assistance (M = 12.69, SD = .99) than the same students earned on four micro lessons created without the help of genAI (M = 11.72, SD = 1.49), t(26) = 4.72, p < .001, Hedges’ g = .88. Error bars are 95% confidence intervals for the means.
- RQ2: Students entered the course with comparable self-efficacy for creating educational lessons with and without genAI. After three months of using genAI tools for designing educational lessons on half of the micro lessons taught in the course, both types of self-efficacy increased by 11%.
Eberly Center’s Takeaways:
- RQ1: GenAI assistance conferred an advantage for the design of rubric-scored micro lessons. When students prompted ChatGPT to help generate learning objectives (LOs), instructional text, and assessments, their micro lessons scored higher than the micro lessons they created without the help of genAI. Because raters were unaware of condition when scoring the deliverables, these differences are unlikely to reflect rater bias, suggesting that using genAI as a thought partner benefitted students’ lesson designs. However, although students’ deliverables improved when genAI was available to them, there was no evidence that skills transferred to the micro lessons students completed without genAI. These data suggest that using genAI increased the quality of deliverables but did not persistently alter students’ competencies.
- RQ2: Students’ self-efficacy for course-related outcomes and genAI use increased to an equal extent from the beginning to the end of the semester. Since all students completed an equal number of micro-lessons with and without genAI use, it is unclear whether these gains are due to genAI use alone.
Carrington Motley
Assistant Professor
Tepper School of Business
Spring 2024
70-415 Introduction to Entrepreneurship (14-week course)
Research Question(s):
To what extent does brainstorming with the assistance of generative AI impact:
- the number of ideas generated?
- the quality of ideas generated?
- students’ self-efficacy regarding generative AI use and course learning objectives?
Teaching Intervention with Generative AI (genAI):
Motley implemented scaffolded brainstorming sessions during class to support individual ideation for entrepreneurship projects. Students then leveraged genAI tools (Copilot) to support both the generation and evaluation of ideas for new business ventures. Individual students created “pitch decks” (slides) to present their ideas to their peers and recruit collaborators to design a business implementation plan. Teams of students then collaboratively designed implementation plans for the chosen entrepreneurship projects.
Study Design:
All students in two concurrent course sections received training on brainstorming techniques. Motley randomly assigned each section to one of two conditions: students used (treatment) or did not use (control) genAI tools in brainstorming exercises during class. The treatment section received training on brainstorming techniques and on genAI use focused on prompt engineering. The control section received training on brainstorming techniques alone. Data sources were compared between course sections, statistically controlling for variation in students between conditions.
Sample size: Treatment section (56 students); Control section (43 students)
Data Sources:
- Artifacts of brainstorming sessions, including Google Docs (control and treatment) and transcripts from genAI use (treatment)
- Students’ pitch decks (slides from student presentations), scored using a rubric with criteria for uniqueness of the problem being solved, the solution, and the customer segment targeted
- Pre/post surveys of students’ self-efficacy regarding skills using genAI tools and course learning objectives
Findings:
- RQ1: Students did not differ in the number of ideas they generated with or without the help of genAI across two individual brainstorming sessions. However, students who brainstormed without genAI experienced a decline in idea production over time, whereas students who used genAI did not.
Figure 1. Although the average number of ideas generated did not differ across conditions, F(1, 83) = .59, p = .45, η² = .007, students across conditions experienced a decline in number of ideas generated over time, F(1, 83) = 8.80, p = .004, η² = .10. However, a closer investigation of the significant time × condition interaction, F(1, 83) = 5.04, p = .03, η² = .06, suggests that this decline was only true for the non-genAI condition, F(1, 83) = 11.54, p < .001, η² = .12, whereas students who used genAI did not experience a decline in number of ideas generated over time, F(1, 83) = .32, p = .58, η² = .001.
- RQ2: The quality of entrepreneurial pitches submitted by students did not differ in uniqueness, feasibility, or compellingness between the genAI and non-genAI conditions (see Figure 2). Within the genAI condition, the subset of students who critically evaluated (“filtered”) genAI-produced ideas early on (i.e., who did not automatically submit every idea genAI suggested) pitched marginally higher-quality ideas to their peers (see Figure 3).
  
 Figure 2. Students’ entrepreneurial pitch deck scores (uniqueness, feasibility, and compellingness total, out of 6 pts.) did not differ when students used genAI (M = 4.30, SD = .81) or not (M = 4.19, SD = .96) for idea generation. Error bars are 95% confidence intervals for the means. An independent-samples t-test showed that the mean difference was not significant, t(97) = -.66, p = .51. 
 Figure 3. In the condition that used generative AI, entrepreneurial pitch deck scores (uniqueness, feasibility, and compellingness total, out of 6 pts.) were marginally higher when students critically evaluated and filtered genAI-generated ideas during the first brainstorming phase (M = 4.40, SD = .77) than when they retained every genAI-produced idea (M = 3.75, SD = .89), t (8.82) = -1.94, p = .08, g = -.82. Error bars are 95% confidence intervals for the means.
- RQ3: Across a single class session (i.e., two individual brainstorming sessions), students’ confidence in formulating an idea increased significantly, regardless of genAI use. Engaging in brainstorming with genAI significantly increased students’ self-efficacy for using genAI when compared to the condition that did not use the tool.
Eberly Center’s Takeaways:
- RQ1: Even though students self-reported that genAI helped them generate more ideas, students who used genAI did not differ in the number of ideas they submitted compared to students who did not use genAI. If anything, students who used genAI submitted slightly (though not statistically significantly) fewer ideas. This finding is in line with existing work suggesting that integrating genAI into a brainstorming process does not necessarily safeguard against the kinds of productivity losses experienced in human brainstorming groups. However, genAI use in the present study enabled students to maintain their level of productivity, whereas students who did not use genAI appeared to exhaust their ability to generate new ideas. This suggests that genAI may be particularly helpful at later stages of the idea-generation process, once students’ own ideas have begun to run out.
- RQ2: 
- There was also no evidence that genAI conferred an advantage when it came to the quality of students’ chosen ideas, as measured by final pitch deck scores. Together with the RQ1 findings, these results echo other research suggesting that students overestimate the benefits of genAI for academic performance.
- On a cautionary note, students who retained all the ideas genAI produced without critically evaluating, or “filtering,” them tended to earn lower pitch deck scores than students who filtered genAI-produced ideas early on. This was true both of students who retained all genAI ideas without adding any of their own and of students who kept all genAI ideas and added to them. In other words, students who did not filter also did not come up with good ideas on their own. This is consistent with emerging research suggesting that academic performance suffers when students overly rely on genAI and fail to invest sufficient cognitive effort in evaluating genAI output.
- Motley is collecting a second semester of data in Spring 2025 to further explore the impact of critical evaluation of genAI ideas.
 
- RQ3: Regardless of genAI use, students reported increased confidence in generating a startup idea after a single class session, whereas only students who used genAI in their brainstorming showed increased confidence in using genAI to produce desired results. However, confidence was assessed immediately after the brainstorming activities, so it is unclear whether these differences persist over time.
Nimer Murshid
Assistant Teaching Professor
Mellon College of Science
CMU-Qatar
Spring 2025
09-111 Nanolegos: Chemical Building Blocks (14-week course)
Research Question(s):
- To what extent does the use of generative AI as a scaffolded study partner impact the learning outcomes of non-majors when exposed to a new chemistry topic?
- To what extent does student self-efficacy change over the course of a semester in an elective undergraduate Chemistry course for non-majors in which generative AI is used as a study partner?
Teaching Intervention with Generative AI (genAI):
Murshid prompted students to use genAI (ChatGPT) to help clarify concepts before the completion of homework assignments on two particularly challenging chemistry topics. For each topic, students completed an in-class activity to practice using the tool as a study partner. He then gave students homework assignments that consisted of two parts: (A) several homework-related ChatGPT prompts to use as a way of learning the topic, with instructions to ask the tool further clarification questions, and then (B) practice problems on the topic that students completed without genAI assistance. Students demonstrated their learning of the homework concepts on an in-class exam completed without genAI assistance.
Study Design:
Murshid taught the course during the Spring 2024 (control) and Spring 2025 (treatment) semesters. In the Spring 2024 semester, he did not permit students to use genAI. In Spring 2025, Murshid integrated the use of genAI as a study partner as described above. He compared students’ performance on two quizzes, two homework assignments, and one exam across the semesters. Students’ pre and post self-efficacy for course learning objectives and genAI use was also measured for the Spring 2025 (treatment) section only.
Sample size: Treatment (16 students); Control (16 students)
Data Sources:
- Students’ performance on a course exam, two quizzes, and two homework assignments.
- Pre/post surveys of students’ self-efficacy for skills related to using genAI and course learning objectives (treatment semester only).
Findings:
- RQ1: Interacting with genAI as a study partner improved students’ performance on the exam and one of two quizzes in the treatment semester (S25) compared to the control semester (S24). Interacting with genAI as a study partner did not impact performance on the two homework assignments.  
 Figure 1. Students’ scores on Exam 2 (controlling for GPA), which covered all topics targeted by the intervention, were higher in S25 (treatment; M = 85.62, SE = 2.87) than in S24 (control; M = 75.07, SE = 2.87), F(1, 29) = 6.57, p = .02, ηp² = .19. Error bars are 95% confidence intervals for the means.
- RQ2: There was a significant increase in self-efficacy for course learning objectives from pre to post, but not for genAI use (treatment only). 
 Figure 2. In Spring 2025 (treatment), students’ self-efficacy for genAI tool use did not increase significantly from pre (M = 72.90, SD = 18.93) to post (M = 76.70, SD = 15.52), t(14) = .87, p = .40, but it did increase significantly from pre (M = 63.27, SD = 25.12) to post (M = 85.66, SD = 15.23) for course learning objectives (LOs), t(10) = 2.40, p = .04, g = .67. Error bars are 95% confidence intervals for the means.
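The effect size in the Figure 2 caption above can be approximately reproduced from the reported t statistic alone. The sketch below assumes (our assumption; the source does not state its formula) that g for a pre/post comparison is the standardized mean change d_z = t / √n, multiplied by Hedges’ small-sample correction J = 1 − 3 / (4·df − 1), with n = df + 1 students. The function name `hedges_g_paired` is hypothetical.

```python
import math


def hedges_g_paired(t: float, n: int) -> float:
    """Hedges' g for a pre/post (paired) comparison, estimated from t and n.

    Assumption: g = d_z * J, where d_z = t / sqrt(n) and
    J = 1 - 3 / (4 * df - 1) is Hedges' small-sample correction.
    """
    df = n - 1
    d_z = t / math.sqrt(n)    # standardized mean of the pre/post differences
    j = 1 - 3 / (4 * df - 1)  # shrinks d_z slightly for small samples
    return d_z * j


# Course-LO self-efficacy in Spring 2025: t(10) = 2.40, so n = 11 students.
g_lo = hedges_g_paired(2.40, 11)
print(f"g = {g_lo:.2f}")
```

Under these assumptions the function returns approximately .67, matching the caption. Agreement with a rounded published value does not confirm that this is the exact procedure the analysts used.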
Eberly Center’s Takeaways:
- RQ1: Students in S25 (treatment) outperformed their peers in S24 (control) on 2 out of 5 deliverables. Notably, this included the exam covering both topics the intervention targeted. Historically in the course, there have been performance differences between business and computer science students. The original research question aimed to test whether the intervention differentially affected these student populations. However, only two business majors enrolled in the course during the treatment semester, too small a sample to meaningfully test this question. As such, it is unclear whether the intervention was equally effective for all majors, and further data collection is needed for this more nuanced analysis. Altogether, these results suggest that using genAI as a study partner with scaffolded support is potentially effective for helping non-majors learn challenging chemistry topics.
- RQ2: Although self-efficacy items related to course LOs showed a generally consistent pattern of increase, genAI-related self-efficacy items did not. However, students entered the course with relatively high self-efficacy in this domain, leaving less room for growth. In an item-level analysis, one genAI item showed a significant increase, suggesting that students felt much more confident in creating tailored prompts after practicing this skill consistently. A different item (evaluating genAI output for accuracy) showed a significant decrease. In qualitative responses, students said they found ChatGPT helpful as a learning tool but also discovered the need for cautious engagement with it, as they found some mistakes in its output. This suggests that students may initially have overestimated their ability to evaluate genAI output.
Fethiye Ozis
Assistant Teaching Professor
Civil and Environmental Engineering
College of Engineering
Spring 2024
12-333 Experimental and Sensing Systems Design and Computation for Infrastructure Systems (14 week course)
Research Question(s):
- To what extent does utilization of AI tools impact students’ skills for data processing, cleaning, and visualization of large data sets?
- What are the attitudes, perceptions, and experiences of students regarding AI-powered tools for data processing and visualization?
Teaching Intervention with Generative AI (genAI):
Ozis introduced genAI (PerplexityAI) as a possible support tool during students’ multi-week, big-data group project. Students had the option to use genAI during two of their data cleaning and visualization tasks, one completed individually and one in a team. They were not restricted in how they used the tool but were given some possible uses, such as consulting it as a coach for advice, using it to detect outliers in the dataset, or asking it for Python code to create data visualization plots.
Study Design:
Students could choose to opt into using genAI during their data project, creating a self-selected group of genAI users (treatment) to compare to a group of non-genAI users (control) within the course. Ozis also compared students’ work to a previous iteration of the course in which students were not permitted to use genAI.
Sample size: Self-Selected Treatment (19 students); Self-Selected Control (15 students); Previous-Semester Control (18 students); 12 teams across the three conditions
Data Sources:
- Students’ final course grades, as well as students’ data visualizations, cleaned datasets, and documentation of process, scored with a rubric for ability to clean, analyze, and visualize large data sets.
- Rubric grade for the quality of teams’ data analysis (scored after removing condition information and randomizing both semesters’ team deliverables).
- Students’ reflections on how effective, challenging, and rewarding their data cleaning process was and whether or not they used genAI in their process (treatment semester only).
Findings:
- RQ1: Students who chose to use genAI for their data tasks did not perform differently (as measured by final course grades) from students who never chose to use genAI to work with their data or from students who did not have the option to use genAI (Figure 1). Additionally, rubric scores for the quality of teams’ data analysis did not differ across conditions. 
 Figure 1. Students’ grades did not differ statistically, whether they self-selected to use genAI (M = 94.0, SD = 7.3), self-selected to never use genAI (M = 91.3, SD = 9.4), or were required to not use genAI (M = 90.6, SD = 3.2) for their data cleaning and analysis (F(2,49) = 1.23, p = .30). Error bars are 95% confidence intervals for the means.
- RQ2: When given the option to use genAI for data tasks, 44% of students chose never to use genAI. Their reasons revealed critical thinking about the added value of the tool (e.g., it can be inaccurate) as well as confidence in their own data skills. Students who chose to work with genAI primarily used it for guidance alone (i.e., opted to clean their datasets without genAI).
Eberly Center’s Takeaways:
- RQ1: There was no evidence that using genAI (Spring 2024 self-selected treatment) improved or harmed students’ grades in a course that requires cleaning, visualizing, and analyzing data compared to students who never used genAI (Spring 2024 self-selected control and Spring 2023 control). This null result could be due to alternative factors, including high course grades across all students, small sample size, weak manipulation strength, self-selection into using genAI (Spring 2024), and minimal instruction on ways to leverage genAI for data tasks.
- Teams’ final analysis deliverables were evaluated by a coder who was unaware of the semester and condition. Rubric scores for the quality of their presented analysis did not significantly differ across conditions. Because this final task was completed in teams, the sample size for this analysis was extremely small (12 teams across the three conditions), making it difficult to interpret the results as meaningful.
 
- RQ2: Student perspectives about the value of genAI and their confidence in their own data analysis skills could play a role in whether or not a student opts to use genAI when permitted. There is an opportunity for more scaffolding to teach the students the affordances and limitations of this tool to better inform their decisions.
67-262 Database Design and Development (Fall 2024)
Research Question(s):
- To what extent does the source of feedback (instructor vs generative AI) affect Structured Query Language (SQL) assignment and exam performance?
- Does the source of feedback affect student perceptions about the usefulness of and comfort during the feedback session?
- Does the source of feedback impact the development of students’ self-efficacy?
Teaching Intervention with Generative AI (genAI):
Students uploaded their coding assignment deliverables to a customized genAI chatbot called the Intelligent Assessor (designed by Sooriamurthi and Tu). The instructors fine-tuned the chatbot using the assignment rubric, their paper detailing a three-step heuristic process for formulating any SQL query, SQL style guidelines, and documentation of mistakes made by previous students. The chatbot asked each student questions about their individual assignment responses, probing them to describe their thinking and decision process. For each student, it generated unique follow-up questions that encouraged the student to essentially “think out loud.”
Study Design:
After students completed a SQL assignment, Sooriamurthi and Tu randomly assigned them to debrief and receive feedback on their assignment from either an instructor or the customized genAI chatbot. Students then completed another SQL assignment and debriefed in the counterbalanced condition. Following each debriefing session, students responded to questions about the experience and the value of the feedback received.
Sample size: Total sample (33 students, randomly assigned to alternating conditions)
Data Sources:
- Rubric scores from students’ three SQL assignment deliverables
- Pre/post surveys of students’ self-efficacy for working with SQL, administered at the beginning of the course and after both debrief sessions
- Post surveys of students’ perceptions of the value of feedback received and comfort with the feedback interaction, administered after each debrief session
Findings:
- RQ1: The source of feedback (instructor vs genAI) did not affect performance on either the SQL assignments or the exam. 
 Figure 1. Students’ performance on three SQL assignments showed a significant main effect of Assignment (F(2, 62) = 8.53, p < .001, ηp² = .22), in which performance on Assignment 1 was significantly lower than on Assignments 2 and 3 (p < .001 and .03, respectively). Assignment did not significantly interact with Order of Conditions (F(2, 62) = .22, p = .81). Error bars are 95% confidence intervals for the means.
- RQ2: Students consistently reported feeling less nervous receiving feedback from genAI than from the instructor, regardless of whether this experience came first or second in the two feedback sessions. Consistent with this, students reported being more comfortable during the genAI feedback session than during the instructor feedback session. There was no difference between feedback conditions in how useful students perceived the feedback to be, how much they believed the feedback deepened their understanding, their enjoyment, their intention to make revisions to their process, or their interest in having this type of feedback again.
- RQ3: Students’ self-efficacy grew significantly from the beginning of the semester to the end of the first feedback session. They maintained their increased self-efficacy for working with SQL to the end of the second feedback session but did not show significant further growth. The source of their feedback (instructor vs genAI) did not affect the trajectory of their self-efficacy growth. 
 Figure 2. Students’ self-efficacy (0-100% confidence) for working with SQL showed a significant main effect of Time (F(1.176, 29.407) = 29.341, p < .001, ηp² = .54), in which self-efficacy grew significantly from pre to post Feedback 1 and from pre to post Feedback 2 (ps < .001), but the growth from Feedback 1 to Feedback 2 was nonsignificant (p = .29). Time did not significantly interact with Order of Conditions (F(1.176, 29.407) = .78, p = .40). Error bars are 95% confidence intervals for the means.
Eberly Center’s Takeaways:
- RQ1: Debriefing a SQL assignment with a customized genAI chatbot (the Intelligent Assessor) did not affect SQL performance compared to feedback from a course instructor. However, performance was high to begin with, leaving little room for improvement. A control condition in which students receive no feedback would help determine whether both feedback sources produced similar positive effects or whether neither changed performance at all.
- RQ2 and RQ3: Although neither performance nor self-efficacy differed by feedback source, using this kind of customized genAI may be a viable option for giving personalized feedback in large classes. Using genAI in this way may have the added benefit of reducing student nervousness when receiving feedback (compared to interacting with a course instructor). Importantly, however, this genAI chatbot was carefully fine-tuned to help ensure accuracy for SQL material, to maintain an encouraging persona with students, and to guide students toward understanding rather than giving answers directly.
Jordan Usdan
Adjunct Faculty
Heinz College of Information Systems and Public Policy
Spring 2024
94-816 Generative AI: Applications, Implications, and Governance (7-week course)
Research Question(s):
- To what extent does generative AI impact research and writing:
- efficiency?
- performance?
- Are there different impacts of generative AI across writers with different English language proficiencies or other characteristics?
Teaching Intervention with Generative AI (genAI):
Usdan provided students with instruction across multiple class sessions on prompt engineering and on ways to use genAI (e.g., ChatGPT) as a tool for summarization, information synthesis, research, explanation, idea generation, and more. Classroom demonstrations of potential student applications included using genAI as a prose assistant, editor, thought partner, and critic. Students also practiced completing course assignments with genAI during class sessions.
Study Design:
All students in the course prepared a writing assignment in each of two conditions: first without genAI and then with genAI. Although the order of conditions did not vary, Usdan counterbalanced (randomized) which equivalent policy scenario students received on each assignment to control for assignment difficulty.
Sample size: Total sample (27 students, assigned to control followed by treatment condition)
Data Sources:
- Students’ self-report of writing efficiency, i.e., students tracked the time they spent actively engaged in completing each writing assignment.
- Students’ two writing assignments, scored with rubrics measuring quality of policy recommendations and supporting arguments, integration of external survey results as evidence, and writing style.
- Pre/post survey about students’ writing confidence and perspectives on genAI as an educational tool.
- Post survey about students’ perceived improvement in their writing and attribution of improvement to repeated writing practice versus use of genAI.
Findings:
- RQ1a: Students’ self-reported time on the writing task decreased by 64.5% with the use of genAI; i.e., students spent roughly 1.5 fewer hours on the writing task. 
 Figure 1. Students spent significantly less time preparing their memo assignment with genAI assistance (M = 66.8 min, SD = 29.1 min) than preparing it manually without genAI (M = 191.8 min, SD = 130.3 min) (F(1,23) = 23.15, p < .001, ηp² = .50). Error bars are 95% confidence intervals for the means.
- RQ1b: Based on grading rubrics, student performance significantly improved from an average of B+ (without genAI) to an A grade (with genAI assistance).
- RQ2: Changes in performance and writing efficiency did not significantly differ between English-as-a-second-language (ESL) and English-as-a-first-language (EFL) students. However, ESL students initially reported lower self-assessed writing competency than EFL students, and this difference disappeared by the end of the semester after writing with the assistance of genAI.
  
 Figure 2. Students earned higher grades when preparing their memo assignment with genAI assistance (M = 94.1%, SD = 8.0%) than when preparing it manually without genAI (M = 88.3%, SD = 10.3%) (F(1,25) = 4.74, p = .04, ηp² = .16). This improvement did not differ by English language status (the condition × language interaction was nonsignificant: F(1,25) = .12, p = .74). Error bars are 95% confidence intervals for the means. 
 Figure 3. The interaction between English language status and condition on perceived writing competency was marginally significant (F(1,25) = 3.99, p = .06, ηp² = .14). English-as-a-second-language (ESL) students entered the course with significantly lower perceived writing competency (M = 3.07, SD = 1.03) than their English-as-a-first-language (EFL) peers (M = 4.25, SD = .62) (t(25) = 3.49, p = .002, d = 1.35). By the end of the semester, this difference had disappeared, with ESL students reporting perceived writing competency (M = 3.73, SD = .96) equivalent to that of their EFL peers (M = 4.08, SD = .67) (t(25) = 1.07, p = .30). Error bars are 95% confidence intervals for the means.
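Readers who want to check the partial eta squared values in the captions above against their F statistics can use the standard identity ηp² = SS_effect / (SS_effect + SS_error) = (F · df_effect) / (F · df_effect + df_error). The short Python sketch below applies it to the three F tests reported for this project. The function name is hypothetical, and this is a consistency check on rounded published values, not the analysts’ own code.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom.

    Identity: eta_p^2 = SS_effect / (SS_effect + SS_error)
                      = (F * df_effect) / (F * df_effect + df_error)
    """
    return (f * df_effect) / (f * df_effect + df_error)


# (label, F, df_effect, df_error) as reported in the Figure 1-3 captions.
reported = [
    ("time on task, Figure 1", 23.15, 1, 23),
    ("memo grade, Figure 2", 4.74, 1, 25),
    ("language x condition, Figure 3", 3.99, 1, 25),
]

for label, f, df1, df2 in reported:
    print(f"{label}: partial eta squared = {partial_eta_squared(f, df1, df2):.2f}")
```

The sketch prints .50, .16, and .14, matching the effect sizes reported in the three captions to two decimals.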
Eberly Center’s Takeaways:
- RQ1a: Consistent with previously published research, when using genAI, students completed their assignments in less than half the time. However, time on task was self-reported and may have been inaccurate. In addition, the genAI-assisted writing always came after a manual writing task, so students may have completed the second assignment faster partly because of practice with the task itself, in addition to the use of genAI.
- RQ1b: Students earned significantly higher assignment grades when using genAI, and this improvement did not differ by ESL status. However, a practice effect from doing the assignment a second time could be responsible for the improved performance.
- RQ2: While ESL students entered the class with significantly lower self-reported writing competency than their EFL peers, this difference disappeared by the end of the semester. However, we cannot attribute this to using genAI specifically. It is possible that the repeated writing practice had a greater positive effect on ESL students than on EFL students.
- This study did not measure learning directly (i.e., the study did not ask students to complete an additional assignment to measure transfer of skills and thus the change in learning when genAI was not available). We acknowledge that observed increases in students’ efficiency and performance therefore do not necessarily mean that the students’ skills improved (i.e., when not using genAI).
66-139 DC Grand Challenge Seminar: Reducing Conflict Around Identity and Positionality (14-week course)
Research Question(s):
- To what extent does student use of generative AI impact the rate of change in students’ abilities to critically read and analyze academic papers?
- How does students’ self-efficacy as critical readers and generative AI users change over the course of a semester in which students used genAI as a reading support tool?
Teaching Intervention with Generative AI (genAI):
Walker and Youngs provided classroom training on how to read academic papers as well as how to engineer genAI prompts and evaluate genAI output. Students then used genAI (Perplexity AI) as a reading support tool prior to class discussions by uploading assigned readings and individually engaging with the genAI as a dialogue partner, asking questions to clarify paper content and potential interpretations of the text.
Study Design:
Walker and Youngs required every student to use the genAI tool for each assigned reading diary/critical analysis assignment in Spring 2024. The comparison group consisted of students enrolled in the same course in the Fall 2023 semester, who completed the same assignments and did not use genAI. Walker and Youngs compared student responses to reading questions used in both semesters. Student self-efficacy was measured at the beginning, middle, and end of Spring 2024 (genAI semester).
Sample size: Treatment (17 students); Control (29 students)
Data Sources:
- Students’ responses to assigned reading questions (“diaries”), scored with rubrics for academic reading skills (e.g., reading comprehension, metacognition, critical analysis of text).
- Surveys of students’ self-efficacy regarding skills using genAI and course learning objectives administered at the beginning, middle, and end of the semester (Spring 2024 only).
Findings:
- RQ1a: The rate at which students’ ability to critically analyze text developed across the semester did not differ between the semester in which students used genAI to support their reading and the semester in which they did not.  
 Figure 1. Students significantly improved in their critical analysis abilities in both the Fall 2023 and Spring 2024 semesters, as measured through diaries evaluated on 3 rubric criteria (total rubric score range: 3-9 pts.) at the beginning (Diary 1), middle (Diary 3), and end of the semester (Diary 5), F(2,88) = 200.1, p < .001, η² = .82. Scores for both sections increased from pre to mid, mid to post, and pre to post, all ps < .001. Although diary scores were higher overall for students in the genAI condition (Spring 2024) than in the non-genAI condition (Fall 2023), F(1,44) = 5.86, p < .02, η² = .12, students in the genAI condition started at a significantly higher level, p < .001, but the two conditions reached the same level of critical analysis by Diaries 3 and 5, ps > .05.
- RQ1b: Students in Spring 2024 (genAI condition) entered the course with significantly higher confidence in their skills for critically and independently analyzing texts than for using genAI assistance. With repeated practice and targeted instruction on both critical analysis and genAI use, students’ self-efficacy for both types of skill increased to similar, high levels.   
 Figure 2. Spring 2024 (treatment) self-efficacy measurements. Students entered the semester with significantly lower self-efficacy for genAI-assisted reading than for independent reading (t(15) = 2.48, p = .03, g = .59); this difference was no longer present by the middle of the semester (t(15) = .29, p = .78) or at the end (t(15) = .74, p = .47). Both self-efficacy for independent reading (F(2, 30) = 26.81, p < .001, ηp² = .64) and self-efficacy for genAI-assisted reading (F(1.074, 16.107) = 26.52, p < .001, ηp² = .64) increased significantly between each measurement time (all ps < .001). Error bars are 95% confidence intervals for the means.
Eberly Center’s Takeaways:
- RQ1a: There is no compelling evidence that genAI affected the rate of change in students’ abilities to critically read and analyze academic papers. Although rubric scores in the treatment condition were higher than in the comparison group, this difference could be due to a cohort effect at time 1. Specifically, students in the non-genAI condition earned the lowest possible scores at time 1, whereas students in the genAI condition earned significantly higher scores on their first diary assignment. This could be because both instructors and students had an additional semester of experience by Spring 2024 (genAI condition).
- RQ1b: In Spring 2024 (genAI condition), students’ self-efficacy in their ability for independently reading and analyzing articles increased over the course of the semester, as did their self-efficacy for using genAI assistance. Importantly, students entered the course with less confidence in genAI use than independent reading, but exhibited equivalent confidence by mid-semester. Walker and Youngs offered students repeated practice and scaffolded exposure to genAI, suggesting this could be one effective way of building student confidence.
Rafal Wlodarski
Assistant Teaching Professor
Electrical and Computer Engineering
College of Engineering
18-656 Functional Programming in Practice (Fall 2024)
Research Question(s):
- To what extent does using generative AI as a feedback generator improve the quality of student team design deliverables in terms of completeness and correctness?
- To what extent do students' self-efficacy and perceptions of generative AI change across the semester?
- What are students' experiences with a generative AI bot as a feedback tool?
Teaching Intervention with Generative AI (genAI):
Wlodarski introduced students to an instructor-created, customized genAI tool (chatbot) that served as a personalized tutor throughout a team project (3-4 students per team). Students could use it to receive individual feedback and explanations of concepts related to the domain knowledge (cryptocurrency trading) and the Domain Driven Design framework. This strategy was an attempt to scale up the frequency of feedback on the team project beyond what the instructor could provide during class. The instructor provided explicit training to students on genAI use.
Study Design:
Wlodarski taught the course during the Spring 2024 (control) and Fall 2024 (treatment) semesters. During Spring 2024, students did not have access to the customized chatbot. Instead, they only received instructor feedback during class sessions. In Fall 2024, students had additional, unlimited access to the customized genAI tool for feedback. Wlodarski compared student performance across semesters and investigated changes in students’ attitudes during the treatment semester. Additionally, Wlodarski analyzed students’ deliverable submission reports for the extent to which teams systematically and transparently integrated genAI feedback into their deliverables.
Sample size: Treatment (35 students - 8 teams); Control (34 students - 9 teams)
Data Sources:
- Student teams’ Milestone I (team project) rubric scores for correctness and completeness.
- Follow-up analyses of student submission reports, rubric-scored for systematic and transparent articulation of genAI feedback integration (treatment only).
- Pre-post survey on self-efficacy and perceptions of genAI (treatment only).
- Student reflections of genAI’s helpfulness and their experience interacting with the tool (treatment only).
Findings:
- RQ1: The quality of students’ design deliverables, in terms of completeness and correctness, did not significantly differ between the Fall 2024 (genAI) and Spring 2024 (control) semesters. Clear and careful integration of genAI feedback by student teams, based on teams’ self-reports on their process, was correlated with higher Milestone I scores (r(6) = .79, p = .02). 
 Figure 1. Students’ Milestone I completeness scores (out of 5 points) did not differ between S24 (control, M = 3.97, SD = .35) and F24 (treatment, M = 3.18, SD = 1.24), t (7.96) = -1.75, p = .12. Error bars are 95% confidence intervals for the means. Note: Milestone data were assessed at the team level. There were 9 student teams in S24, and 8 teams in F24. 
 Figure 2. Students’ Milestone I correctness scores (out of 3.5 points) did not differ between S24 (control, M = 2.43, SD = .29) and F24 (treatment, M = 2.58, SD = .56), t (15) = -.66, p = .52. Error bars are 95% confidence intervals for the means. Note: Milestone data were assessed at the team level. There were 9 student teams in S24 and 8 teams in F24.
- RQ2: Students’ self-efficacy for both course learning objectives and use of genAI tools increased significantly from pre to post during the Fall 2024 (genAI) semester (Figure 3). Students' perceptions of genAI tools did not significantly change during that period (Figure 4). 
 Figure 3. In Fall 2024 (treatment), students’ self-efficacy for genAI tool use increased from pre (M = 57.00, SD = 24.68) to post (M = 81.46, SD = 11.16), t (21) = 4.81, p < .001, g = .99, and their self-efficacy for course learning objectives increased from pre (M = 41.00, SD = 18.15) to post (M = 76.02, SD = 15.28), t (21) = 7.18, p < .001, g = 1.48. Error bars are 95% confidence intervals for the means. 
 Figure 4. In Fall 2024 (treatment), students’ perceptions of genAI were moderately favorable at pre (M = 3.54, SD = .64) and did not show a significant increase at post (M = 3.79, SD = 1.11), t (23) = -1.07, p = .29. Error bars are 95% confidence intervals for the means. Displayed is a composite score of 7 Likert items that showed high internal consistency at pre (McDonald’s ω = .85) and post (McDonald’s ω = .96).
- RQ3: Qualitative analysis of students’ reported experience suggests they found the genAI-based chatbot tool helpful and used it frequently, which they reported benefited their engagement and motivation. However, students also reported inaccuracies in the tool’s output and found the need to refine their prompts frequently onerous.
- RQ1: There is no compelling evidence that interacting with a customized genAI tutor affected team performance on the hypothesized project criteria. Wlodarski plans to further develop the tool, because students reported several errors in genAI’s feedback. This suggests that custom chatbots need refinement before serving as feedback assistants.
- Wlodarski’s post-hoc findings from students’ individual reflections further highlight that merely receiving feedback was not sufficient for improving the quality of students’ deliverables. Students who clearly articulated their decision-making process when interacting with a genAI-based chatbot for feedback scored higher on Milestone I. Furthermore, some student reflections reported that they would have benefitted from additional time to consider and execute genAI-based feedback.
 
- RQ2: Although self-efficacy increased in the Fall 2024 (treatment) semester, survey data were not collected in the Spring 2024 (control) semester, so we cannot determine to what extent the Fall 2024 increases are attributable to the intervention.
- RQ3: Students’ comments suggest interacting with the bot was mostly a positive experience, although comments also reflect a need for careful guidance and scaffolding, e.g., through specific prompts and repeated practice, in order to have a positive influence on their learning experience. Moreover, it is important for instructors to allocate sufficient time for students to request, refine, and effectively integrate genAI output.
Bo Zhan
Lecturer
Dietrich College of Humanities and Social Sciences
82-171 Elementary Japanese I (Fall 2024)
Research Question(s):
- To what extent does genAI impact novice students’ speaking performance? In other words, do students who practiced with genAI improve at a different rate as compared to their classmates who practiced with their peers?
- To what extent does genAI impact the development of students’ confidence in speaking Japanese?
Teaching Intervention with Generative AI (genAI):
During a class session, Zhan supplied students with general prompts and guidelines for using the genAI tool (ChatGPT) as a speaking partner. Prompts included asking ChatGPT to simulate a speaking partner while giving feedback on grammar, response structure, vocabulary, and pronunciation. During that same class session, students used ChatGPT to practice speaking Japanese.
Study Design:
Zhan provided the same classroom instruction on Elementary Japanese I (e.g., vocabulary and grammar) across two concurrent sections of the course. Students in both sections completed a baseline speaking assessment with Zhan as a pre-measure of performance. Students then practiced speaking Japanese with either the genAI tool (treatment section), as described above, or a peer (control section).
This practice took place during a single class session and continued through a 12-minute homework assignment focused on topics that would later be assessed in the post-assessment.
Two weeks after the pre-assessment, all students completed the post-speaking assessment, which followed the same format as the pre-assessment and was conducted with the instructor. All speaking assessments were audio recorded and later scored by the instructor using a rubric.
Sample size: Treatment (12 students); Control (11 students)  
Data Sources:
- Rubric scores from recordings of student speaking assessments.
- Surveys of students’ confidence in their Japanese speaking skills.
- RQ1: There was a significant increase in students’ performance on speaking assessments from pre- to post-assessment across both conditions. There was also a significant interaction between condition and time, indicating that students who practiced speaking with the genAI tool improved more from pre- to post-assessment than students who practiced with their peers. A similar pattern was found for three of seven subskills: comprehensibility, vocabulary, and grammar. These three subskills had significant interactions between condition and time, with the genAI group showing a greater rate of change than the peer group. 
 Figure 1. There was a significant main effect of time, F (1, 21) = 14.59, p = .001, ηp² = .41, indicating a significant increase in performance on speaking assessments from pre to post across both conditions. There was also a significant time x condition interaction, F(1, 21) = 8.43, p = .009, η² = .286, suggesting a different rate of change between the two conditions, with the genAI group improving faster. Although the two groups’ speaking scores did not differ significantly at pre, at post the genAI group (M = 68.60, SD = 13.11) scored significantly higher than the peer group (M = 58.44, SD = 9.18), F(1, 21) = 4.55, p = .045, η² = .18. Error bars are 95% confidence intervals for the means.
- RQ2: There was a significant increase in students’ self-reported confidence in speaking Japanese from pre- to post-assessment across both conditions. The lack of a significant interaction between condition and time indicates that both groups improved their confidence at a similar rate.  
 Figure 2. There was a significant main effect of time, F (1, 21) = 9.405, p = .006, ηp² = .309, indicating a significant increase in speaking confidence from pre to post across both conditions. There was no significant time x condition interaction, F (1, 21) = .001, p = .973.
- RQ1: These results suggest that practicing spoken Japanese with a genAI tool is a promising option for introductory students. Additionally, the instantaneous, on-demand nature of genAI creates abundant asynchronous practice opportunities that peer-to-peer interaction does not afford. One caveat is time on task. Students practicing with the genAI tool may have continued practicing with it outside of class, beyond the homework assignment, whereas students were less likely to seek out peers for additional speaking practice.
- RQ2: That both groups’ confidence increased at approximately the same rate suggests that speaking practice in general, rather than the particular use of the genAI tool, increases student confidence. As introductory students, they reported moderate levels of confidence, with room for improvement through additional practice.
Projects with Data Forthcoming
Sébastien Dubreil
Teaching Professor 
Modern Languages
Dietrich College of Humanities and Social Sciences
82-304 French and Francophone Sociolinguistics Oral Language and Storytelling (Spring 24)
Generative AI Tool(s) Used: ChatGPT
Research Questions
- Do various use cases of generative AI yield different linguistic accuracy and complexity in French students’ writing?
- What are French students’ perceptions of using generative AI to complete writing assignments?
Dubreil introduced generative AI (ChatGPT) as a support for students writing in a foreign language. In one condition, students wrote their initial draft themselves, using AI as a language assistant to suggest vocabulary or specific language features (e.g., a rhyme, an alliteration), check the accuracy of sentences, or edit. In the other condition, students used AI as a creative assistant, prompting it to create their initial draft. They adjusted their prompts to generate three different drafts, which they then refined into a single final deliverable.
Study design
All students in the course prepared a writing assignment in each of the two AI conditions. Dubreil randomly assigned the order in which students experienced the conditions, which counterbalanced the type of AI use across different writing genres.
Data Sources
- Students' two writing assignments scored with a rubric for linguistic accuracy in vocabulary, grammar, and syntax as well as genre conventions, emotional impact, and originality
- Students’ reflections on their writing process and the quality of their written assignments
- Pre/post surveys about students’ familiarity, competency, and confidence working with genAI
67-272 Application Design and Development (Spring 24)
Generative AI Tool(s) Used: ChatGPT, Copilot
Research Question
Does generative AI tool use affect equity in student outcomes by giving less-experienced students a better chance to succeed in technical courses?
Teaching Intervention with Generative AI
Heimann, Bouamor, and Huang introduced generative AI tools (Copilot, ChatGPT) in their course and encouraged students to leverage these tools for solving computer lab assignments and the main course project during the semester. Instructors demonstrated effective generative AI tool use during class to help scaffold students’ learning. They required students to document the frequency of generative AI usage while completing course assignments.
Study design
To gauge students’ level of programming and programming-related experience, Heimann, Bouamor, and Huang surveyed their Spring 2024 students as well as students from the past two iterations of the course (Spring 2022 and Spring 2023) when there was no formal policy for generative AI use and such tools were not as omnipresent in the academic landscape. Then, they encouraged Spring 2024 students to use generative AI tools while completing course assignments. To determine the extent to which generative AI tool use impacts less experienced students, these instructors will compare student work between the past and present cohorts using prior level of experience as a hypothesized moderator.
Data Sources
- Surveys of students’ background programming experience
- Students’ documentation of generative AI tool use frequency during coursework
- Students’ deliverables from coding exercises, exams, and a course project
Derek Leben
Associate Teaching Professor
Tepper School of Business
70-332 Business, Society, and Ethics (Fall 2024)
Research Question(s):
- What is the impact of debating with generative AI (as compared to debating with a peer) on students’ development of analytical reasoning skills?
- How does student self-efficacy regarding their analytical reasoning and debate skills change throughout the course and does this vary across experimental conditions?
Leben provided suggestions and tips on how to debate a generative AI tool about arguments students had written. Next, Leben had students prompt the generative AI tool to: a) give them objections to their argumentative paper from both the same and different normative frameworks, and b) debate them about their arguments.
Study Design:
Leben taught three course sections. Leben provided the same classroom instruction on leveraging normative frameworks to design policies across all sections of his course. Students in all sections drafted an argumentative paper for a policy supported with a normative framework. Then, in each section, Leben randomly assigned students to one of two study conditions. In one condition, Leben implemented the generative AI intervention described above. In the second condition, Leben had students work with peers to elicit objections and engage in debate. The cycle of drafting a paper, receiving feedback, and revising was repeated for two paper assignments, with students remaining in the same conditions. Leben will compare the data sources between the two conditions (generative AI use permitted vs. not permitted).
Data Sources:
- Rubric scores for students’ performance on both draft and final versions of two major writing assignments (i.e., argumentative papers).
- Surveys of students’ self-efficacy regarding their analytical reasoning and debate skills.
Omid Saadati
Adjunct Professor
Integrated Innovation Institute
CMU Integrated Innovation Institute 
49-750 Integrated Thinking for Innovation (Fall 2024)
Research Question(s):
- How does using generative AI affect students’ industry knowledge as communicated through a team Miro board and a Q&A session with the instructor?
- How does the timing of generative AI use impact the quality of future assignments?
- How many students choose to use genAI on future assignments when the use is optional?
Outside of class time, teams of students leveraged generative AI as a “subject matter expert” to gain insight into their assignment subject, such as by analyzing or mapping their assigned industry. For instance, students might use the LLM to help identify parts of the commerce value chain (e.g., retail, payments, e-commerce, m-commerce) for their assigned industry. Saadati provided prompting tips as well as prompt templates for sample questions.
Study Design:
Saadati randomly assigned each team of students to use an LLM (i.e., Microsoft Copilot) on either the second or third course assignment. Consequently, students served as their own controls by completing one of two comparable assignments without generative AI, with the order of conditions counterbalanced across student teams. Saadati will grade both assignments with the same rubric. Additionally, on at least one subsequent assignment, Saadati will offer all students the choice of whether to use generative AI and will track which students report using it.
Data Sources:
- Rubric scores from two team-based Miro board assignments
- Students’ written reflections following the final course assignment, including whether or not they used generative AI and why
Jungwan Yoon
Senior Lecturer
Dietrich College of Humanities and Social Sciences
76-100 Reading and Writing in an Academic Context (Fall 2024)
Research Question(s):
- What is the impact of the use of generative AI for text analysis on students' knowledge of genre-specific discourse and linguistic features?
- What is the impact of the use of generative AI for text analysis on students’ feelings toward writing?
- What is the impact of the use of generative AI for text analysis on students' self-efficacy for producing genre-appropriate text?
Yoon provided students with instructions on how to use generative AI (ChatGPT) as a pedagogical tool to support their identification and understanding of linguistic features, with a focus on genre awareness. For certain assignments, students practiced using this tool for model text analysis during class, prompting it to analyze model texts for specific rhetorical features. Students were then asked to critically evaluate the output to reinforce their understanding.
Study Design:
For certain units in the course, students practiced their text analysis skills using generative AI, and for others, generative AI was not used. Later in the semester, students independently completed transfer tasks that corresponded to the learning units for which they either did or did not use generative AI as a practice tool. Yoon will compare students’ performance on these tasks, reported feelings toward writing, and changes in self-efficacy to assess the impact of the generative AI tools.
Data Sources:
- Rubric scores of transfer tasks, and student-generated concept maps compared before and after each unit.
- Students’ self-reported feelings toward writing.
- Surveys of students’ self-efficacy for producing genre-appropriate text at the beginning and end of the course.
Peter Zhang
Assistant Professor
College of Engineering
19-867: Decision Analytics for Business and Policy (Fall 2024)
Research Question(s):
- How does the way in which a generative AI tool is integrated into the course impact students’ ability to engage in critical thinking over technical troubleshooting, particularly in formulating decision questions and translating stakeholder requirements into analytical models?
- How does the way in which a generative AI tool is integrated impact equity by enabling students with varying levels of technical preparation to participate equally in the critical thinking process?
Zhang delivered scaffolded instruction on how to leverage the generative AI tool to solve decision analytic scenarios. During this instruction, students learned about prompt engineering, fine-tuning existing AI models, and how to use a group of generative AI agents to perform a specific data analysis task.
Study Design:
Zhang taught two course sections. Zhang provided the same classroom instruction to all students on modeling frameworks and technical topics, such as contextual optimization and optimization under uncertainty. Zhang assigned one of two study conditions to each section. In one section, Zhang implemented the generative AI intervention described above. In the other section, Zhang provided information on general generative AI use without specific guidance on how to apply the tools to decision analysis problems. All students then completed a group course project on data optimization. Zhang will compare data sources between the section that received a general introduction to generative AI and the section that received a more applied, structured approach to using generative AI to solve analysis problems.
Data Sources:
- Rubric scores for students’ performance on a data optimization course project, including an assessment of critical thinking.
- Students’ self-reported time on task.
- Surveys of students’ prior experience with the technical concepts and their self-efficacy regarding their data analytic skills and their ability to use generative AI to complete analytic tasks.
- Rubric scores for students’ performance on an in-class quiz.