Rafal Wlodarski

Assistant Teaching Professor
Electrical and Computer Engineering
College of Engineering
Fall 2024
18-656 Functional Programming in Practice (14-week course)
Research Question(s):
- To what extent does using generative AI as a feedback generator improve the quality of student team design deliverables in terms of completeness and correctness?
- To what extent do students' self-efficacy and perceptions of generative AI change across the semester?
- What are students' experiences with a generative AI bot as a feedback tool?
Wlodarski introduced students to an instructor-created, customized genAI chatbot that served as a personalized tutor throughout a team project (3-4 students per team), providing individual feedback and explanations of concepts related to domain knowledge (cryptocurrency trading) and the Domain-Driven Design framework. This strategy aimed to scale up the frequency of feedback on the team project beyond what the instructor could provide during class. The instructor provided explicit training to students on genAI use.
Study Design: Wlodarski taught the course during the Spring 2024 (control) and Fall 2024 (treatment) semesters. During Spring 2024, students did not have access to the customized chatbot and received instructor feedback only during class sessions. In Fall 2024, students had additional, unlimited access to the customized genAI tool for feedback. Wlodarski compared student performance across semesters and investigated changes in students’ attitudes during the treatment semester. Additionally, Wlodarski analyzed students’ deliverable submission reports for the extent to which teams systematically and transparently integrated genAI feedback into their deliverables.
Sample size: Treatment: 35 students (8 teams); Control: 34 students (9 teams)
Data Sources:
- Student performance on Milestone I (team project): rubric scores for correctness and completeness.
- Follow-up analyses of student submission reports, rubric-scored for systematic and transparent articulation of genAI feedback integration (treatment only).
- Pre-post survey on self-efficacy and perceptions of genAI (treatment only).
- Student reflections on genAI’s helpfulness and their experiences interacting with the tool (treatment only).
Findings:
- RQ1: The quality of students’ design deliverables, in terms of completeness and correctness, did not significantly differ between the Fall 2024 (genAI) and Spring 2024 (control) semesters (Figures 1 and 2). However, based on teams’ self-reports on their process, clear and careful integration of genAI feedback was correlated with higher Milestone I scores (r(6) = .79, p = .02).
Figure 1. Students’ Milestone I completeness scores (out of 5 points) did not differ between S24 (control, M = 3.97, SD = .35) and F24 (treatment, M = 3.18, SD = 1.24), t(7.96) = -1.75, p = .12. Error bars are 95% confidence intervals for the means. Note: Milestone data were assessed at the team level; there were 9 student teams in S24 and 8 in F24.
Figure 2. Students’ Milestone I correctness scores (out of 3.5 points) did not differ between S24 (control, M = 2.43, SD = .29) and F24 (treatment, M = 2.58, SD = .56), t(15) = -.66, p = .52. Error bars are 95% confidence intervals for the means. Note: Milestone data were assessed at the team level; there were 9 student teams in S24 and 8 in F24.
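As a rough illustration of how these team-level comparisons could be reproduced, the Python sketch below runs the two-sample tests and the integration correlation with SciPy. The score arrays are illustrative placeholders, not the study data; the fractional degrees of freedom in Figure 1 (t(7.96)) are consistent with a Welch test, while Figure 2’s t(15) is consistent with a pooled-variance test.

```python
# Sketch of the RQ1 analyses; all arrays are hypothetical placeholders,
# NOT the study data (9 control teams, 8 treatment teams).
import numpy as np
from scipy import stats

s24 = np.array([4.0, 3.5, 4.5, 4.0, 3.5, 4.0, 4.5, 3.5, 4.0])  # S24 completeness
f24 = np.array([4.5, 1.5, 4.0, 2.0, 3.5, 4.0, 2.0, 4.5])       # F24 completeness

# equal_var=False requests Welch's test (fractional df, as in Figure 1);
# equal_var=True would give the pooled test (df = 15, as in Figure 2).
t_stat, p_val = stats.ttest_ind(f24, s24, equal_var=False)

# Follow-up: correlation between feedback-integration rubric scores and
# Milestone I scores across the 8 treatment teams (df = 8 - 2 = 6).
integration = np.array([3, 1, 4, 2, 3, 4, 1, 4])                # hypothetical
milestone = np.array([4.5, 1.5, 4.0, 2.0, 3.5, 4.0, 2.0, 4.5])  # hypothetical
r, p_r = stats.pearsonr(integration, milestone)

print(f"Welch t = {t_stat:.2f}, p = {p_val:.2f}; r(6) = {r:.2f}, p = {p_r:.2f}")
```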
- RQ2: Students’ self-efficacy for both course learning objectives and use of genAI tools increased significantly from pre to post during the Fall 2024 (genAI) semester (Figure 3). Students' perceptions of genAI tools did not significantly change during that period (Figure 4).
Figure 3. In Fall 2024 (treatment), students’ self-efficacy for genAI tool use increased from pre (M = 57.00, SD = 24.68) to post (M = 81.46, SD = 11.16), t(21) = 4.81, p < .001, g = .99, and their self-efficacy for course learning objectives increased from pre (M = 41.00, SD = 18.15) to post (M = 76.02, SD = 15.28), t(21) = 7.18, p < .001, g = 1.48. Error bars are 95% confidence intervals for the means.
Figure 4. In Fall 2024 (treatment), students’ perceptions of genAI were moderately favorable at pre (M = 3.54, SD = .64) and did not significantly increase at post (M = 3.79, SD = 1.11), t(23) = -1.07, p = .29. Error bars are 95% confidence intervals for the means. Displayed is a composite score of 7 Likert items that showed high internal consistency at pre (McDonald’s ω = .85) and post (McDonald’s ω = .96).
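A similar sketch for the paired pre/post comparison reported in Figure 3, assuming Hedges’ g is computed as the bias-corrected standardized mean of the paired difference scores (the study’s exact effect-size variant is not stated, so treat this as an assumption). The values are simulated placeholders, not the study data.

```python
# Sketch of the Figure 3 pre/post analysis; simulated values,
# NOT the study data (n = 22 students, matching df = 21).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pre = rng.normal(57, 25, size=22)   # pre-semester self-efficacy (0-100 scale)
post = rng.normal(81, 11, size=22)  # post-semester self-efficacy

t_stat, p_val = stats.ttest_rel(post, pre)  # paired-samples t-test

# Hedges' g for paired data: Cohen's d on the difference scores,
# times a small-sample bias-correction factor.
diff = post - pre
d = diff.mean() / diff.std(ddof=1)
n = len(diff)
g = d * (1 - 3 / (4 * (n - 1) - 1))
print(f"t({n - 1}) = {t_stat:.2f}, p = {p_val:.4f}, g = {g:.2f}")
```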
- RQ3: Qualitative analysis of students’ reported experiences suggests they found the genAI-based chatbot helpful and used it frequently, which they reported benefited their engagement and motivation. However, students also noted inaccuracies in the tool’s output and found the frequent prompt refinement it required onerous.
Eberly Center’s Takeaways:
- RQ1: There is no compelling evidence that interacting with a customized genAI tutor impacted team performance on the hypothesized project criteria. Wlodarski plans to further develop the tool, as students reported several errors in the genAI feedback, suggesting that custom chatbots need refinement before they can serve reliably as feedback assistants.
- Wlodarski’s post-hoc findings from students’ individual reflections further highlight that merely receiving feedback was not sufficient to improve the quality of students’ deliverables. Teams that clearly articulated their decision-making process when using the genAI chatbot for feedback scored higher on Milestone I. Furthermore, some students reflected that they would have benefited from additional time to consider and act on genAI feedback.
- RQ2: Although self-efficacy increased significantly during the Fall 2024 (treatment) semester, survey data were not collected during the Spring 2024 (control) semester, so we cannot say to what extent the increases are attributable to the intervention.
- RQ3: Students’ comments suggest that interacting with the bot was a mostly positive experience, but they also point to a need for careful guidance and scaffolding (e.g., specific prompts and repeated practice) for the tool to positively influence learning. Moreover, instructors should allocate sufficient time for students to request, refine, and effectively integrate genAI output.