Generative AI Tools FAQ
The rapid evolution of AI tools like ChatGPT, DALL·E 2, and GitHub Copilot, along with their widespread media coverage and ever-increasing ubiquity within other tools, raises important questions for teaching and learning. In what ways are GenAI tools opportunities or threats? When is it appropriate for students to use them? Do instructors need to adapt their approaches?
While such questions aren’t entirely new, the answers are complex and depend on disciplinary norms, course goals, assignments, and student backgrounds. Unsurprisingly, recent advances in AI have sparked both excitement and concern among instructors.
In response to inquiries from CMU colleagues, we’ve compiled a list of frequently asked questions. Our answers draw on evidence-based, inclusive teaching strategies, CMU policies, and the current state of AI tools. We approached this work with two guiding beliefs:
- The challenges and opportunities posed by AI may feel daunting, but instructors can and do continue to teach effectively.
- Students will not automatically make dishonest choices simply because new technologies exist.
We hope this resource supports thoughtful, intentional teaching with evolving technologies. To discuss your specific context or to share a use case that could inform this resource, please contact us at eberly-assist@andrew.cmu.edu.
1. What are generative AI tools?
Artificial Intelligence (AI) tools can generate art, computer code, or natural language that may often be difficult to distinguish from human work. Using a range of parameters, you can prompt a given tool to instantly respond to simple and complex requests, like writing comparative essays, producing the steps for some math problems, writing or debugging code, generating an image, and writing emails.
Perhaps the most well-known new AI tool is ChatGPT, a “chatbot” that can produce writing and code, including common assignment forms such as reading reflections, research essays, and generated or corrected code. If it cannot fulfill a request, it will respond with clarifying questions. Because ChatGPT is not a “mind,” but a model trained on existing writing, its results may sound convincing but be factually incorrect. And the tool cannot be used to assess the validity or sourcing of information that it has generated. (More reliable research alternatives are collected in this document from CMU Libraries.) ChatGPT is being updated rapidly, and many other AI chatbots are coming on the market, including some specialized for programming languages.
It is important to know that students may use these tools in more creative ways than simply generating a full, final product. Anecdotal evidence from students across the country suggests that students are using AI tools to look up information, create study guides, brainstorm possible topics or approaches to problems, write outlines or pseudocode, and polish or correct the English grammar of their own written work.
Because AI tools can generate natural language, functioning code, or artistic products, they can be concerning for instructors who rely on assessing these forms of student work. It is important to continue to be intentional in your teaching choices, because being (only) reactive isn’t sustainable in our technologically advancing world. You can take this opportunity to think deeply about your courses and the kinds of knowledge, skills, values, and attitudes you want your students to develop. As with any new technology – such as graphing calculators, search engines, or Wikipedia – the fundamental principles of how learning works and what we can intentionally do to facilitate learning remain largely the same. If you would like to discuss your courses in light of emerging AI tools with an Eberly Center consultant, please contact eberly-assist@andrew.cmu.edu.
2. How should I write an academic integrity policy that includes generative AI tools?
The university’s policy on academic integrity is not changing in response to generative AI tools. According to CMU’s Office of Community Standards and Integrity, CMU’s academic integrity policy already implicitly covers such tools, as they may be considered a type of unauthorized assistance. However, this policy is intentionally designed to allow instructors to define what is “authorized” vs. “unauthorized” and what constitutes “plagiarism” and “cheating.” We recommend that instructors carefully examine their own policies and make these distinctions clear to students, both in writing and verbally.
Please see these example syllabus policies, which include prohibiting generative AI use, encouraging it, and blending these approaches.
Note that what is considered authorized assistance in one course may not be acceptable in yours; clarity is key, because expectations, norms, and policies vary across instructors.
We recommend that you adopt an academic integrity policy that considers the following:
- Whether or not AI tools are considered authorized or unauthorized assistance and in what circumstances.
- How students should cite assistance from or content/ideas generated by either AI tools or humans.
Additionally, consider doing the following:
- Engage students in transparent discourse about the rationale behind your policies, learning objectives and assignments.
- Discuss academic integrity with students, including its importance and your expectations. Don’t assume that students know what is and is not acceptable in your context. Norms vary across disciplines, cultures, and courses; this is sometimes referred to as “the hidden curriculum.”
- Include improvement across assignments, rather than performance alone, as a component of how you calculate grades.
- Talk to your students about AI tools as they relate to all of the above.
For more general help, see this resource on crafting your own academic integrity policy.
3. How should I talk with my students about the use of generative AI tools in my course?
Each semester, students take multiple courses, all of which may have different expectations and policies. It is important to talk with your students about your own expectations so they don’t make incorrect assumptions. If your policies are not explicitly stated, then your students may NOT be able to effectively interpret what kinds of AI use are authorized or not in your course (for example, in cases of suspected violations of academic integrity). See FAQ 2 for what you can consider as you write an academic integrity policy.
There are several access points to start a conversation. We encourage you to be curious and have an open mind when discussing AI tools with students, rather than assuming or focusing on the worst-case scenario.
Communicate the purpose of your assignments and why they will benefit your students’ learning.
Knowing why they are doing an assignment can increase students’ motivation. How does the assignment connect to your course’s learning objectives and to the world beyond your course? Describe the skills you want your students to practice and/or the knowledge you’d like them to gain through completing an assignment. Alternatively, for larger assignments, you could ask students to think about the purposes of the assignment in small groups or as a class before you share your perspective. You can also give students time to reflect on how the assignment connects to their personal or professional goals or values.
Convey confidence in your students' ability.
Let them know your goal is to support their learning. Discuss your course learning objectives and why they matter. Provide positive encouragement that students can succeed with effort, good strategies, and support (see Promoting a Growth Mindset).
Talk about academic integrity early on and why it's important.
Define what plagiarism, unauthorized assistance, and cheating look like in the context of your course, because students may assume another course’s policies are the same as yours or may have a different cultural understanding of academic integrity. Provide examples of what kinds of work are and are not appropriate. Use AI tools, like ChatGPT, as an example and discuss the ways in which they can be appropriately used (if any) in the context of your course and discipline.
Ask students about their experiences with AI tools generally.
Have they heard of them? Do they know what they do? For many, these are exciting, fun tools. Acknowledging that can provide a way to connect with your students. If students are familiar with AI tools, what kinds of prompts have they plugged into them and what did they think of the responses? Try playing with an AI tool yourself using both class-related and non-class-related prompts so you can also share about your experiences.
Be transparent about why AI tools are concerning or exciting to you in the context of your course.
This is an opportunity to explain how your assignments are structured to help students develop key skills and expertise, and how the use of AI may disrupt or enhance this process, either helping or hindering student development in the short and long term. Articulate for your students the inherent inequities that arise when some students are generating their own work for the class, while others are automating that labor. Being transparent about the purpose of your policies around academic integrity and assignment guidelines helps students understand why they are beneficial, rather than arbitrary.
Give students multiple opportunities or means to ask questions about academic integrity.
Starting from the perspective that students do not want to cheat, allow students to ask questions about academic integrity and AI tools without judgment. This can be as simple as inviting questions using any of the above approaches and by encouraging students to contact you outside of class or in office hours. Remind students that it’s better for them to ask you well before an assignment is due than to operate from a place of uncertainty and anxiety as they are trying to complete it.
4. How can I design my assignments to facilitate students generating their own work?
Regardless of the technological environment, the first thing to consider in assessment design is always whether what you are having students do aligns with what and how you want them to learn. Be transparent with your students about why your assessments are designed to support their learning and help them develop the skills and patterns of thought that they will want to rely on in their future professions. Additional structure that illuminates the how of their assessments, like grading criteria, evaluation rubrics, and assignment briefs, will often make it easier for your students to engage in the work.
Scaffold assessments:
Break your assessments into smaller pieces that will build on each other. The final product could be a culmination of the prior components, which also had the chance to benefit from formative feedback and low-stakes evaluation. Alternatively, consider requiring multiple drafts and value improvement across drafts in response to your feedback. Provide in-class time for students to work on these components but allow them to expand on or refine their work outside of class as well. Rather than focusing on the product alone, scaffolded assessments like this prioritize the process of generating the final deliverable. Students are less likely to turn to quick solutions for a deliverable if they’ve already put in considerable time and effort, and gained their own expertise, to the point where they feel confident in their own ability to do quality work in a timely fashion.
Schedule assessments to balance workload:
Students may turn to AI tools if they are feeling stressed, overwhelmed, unsupported, or out of time. Even if they are motivated and engaged, external pressures and incentives can often lead them to make choices that save time rather than enhance their learning. Consider timing assessments to take place outside of typical exam weeks (see also Assign a Reasonable Amount of Work). Prioritize preserving students’ breaks (for wellness) and giving a longer time horizon for when items are due. Build in some time in-class for students to work on assignments or projects.
Focus on process:
Ask students to explain their process and reflect on their own learning. This could look like:
- a reflection checklist or rubric
- a list of specific steps they took, what they could have done differently, and why
- annotations on an assignment or deliverable justifying the creative choices they made (or a separate deliverable reflecting on and referencing specific aspects of their previous work)
You might consider assessing students on how much they have improved rather than on a single instance of their performance, as traditional exams and final papers often do. This could mean awarding additional points to students who are able to articulate their mistakes, why the mistakes were made, and how they can avoid them in the future.
Design assessments to make learning visible through connections:
In his 2013 book Cheating Lessons: Learning from Academic Dishonesty, James Lang defines “original student work” as that in which students “create an original network of connections.” This network can be made from various sources that the student is uniquely positioned to curate (e.g., information presented in the course, from other courses in their curriculum, their personal experiences, and external sources they’ve encountered), and is helpful for learning as those connections between their prior knowledge and new course content make students more likely to remember and be able to apply the new information in the future. This idea can be leveraged in assessment design, where the emphasis is less on the originality of the ideas students are generating (countless scholars have already analyzed the same poem, or have written a similar line of computer code and their work is out there), and more on how students relate these disparate ideas to one another. This can be accomplished by reframing assignment descriptions and rubric criteria, as well as considering the types of deliverables that best align with the learning objectives and which allow students to demonstrate the original network they’ve created (see also Designing Aligned Assessments). Remember that providing an environment of positive support, which instills in your students the confidence to generate their own unique and successful ideas, can go a long way in promoting students’ motivation.
Provide choice, creativity, and flexibility for assignments:
Students may turn to plagiarism or AI tools when they lack motivation to complete assignments. One way to increase the value perceived by students (thus increasing their motivation to complete assignments authentically) is to provide more choice in the assignment deliverable. For instance, if your goal is to assess how students synthesize, evaluate, and communicate about multiple sources, some students may choose to write an essay, while others could demonstrate those same skills in a video or by designing an infographic, as long as the deliverable demonstrates the required learning objectives. Consider the component skills your assignment is targeting and what competencies students must demonstrate. Then design an assignment prompt that includes these skills, but which allows more choice in what the final product looks like. Finally, design a rubric with criteria that are agnostic of the form of the deliverable. See these additional examples and considerations for designing assessments that allow for student choice.
Avoid over-reliance on hand-written deliverables, in-class evaluations, or oral exams and presentations:
We do not recommend drastically changing your assessments to exclusively or excessively rely on the aforementioned approaches as a reaction to concerns regarding generative AI tools. While one or more of these approaches may appear to be a simple solution, these changes could raise more difficulties than they solve, particularly for reasons of equity and inclusion. For example, some of these approaches may inadvertently disadvantage English language learners or students requiring accommodations for disabilities (see also FAQ 5). Prioritizing student success means providing an environment where everyone has an equitable opportunity to demonstrate their capabilities. Timed, hand-written exams, for example, may disadvantage students who know the material well, but are unable to hand-write their answers quickly. Oral presentations may put extra stress on students with anxiety, who then are faced with additional challenges which their peers do not face. Does completing a writing assignment during a class session, without the ability to adequately revise while drafting, authentically and fairly assess written communication skills across students? We suggest reflecting on who will be advantaged or disadvantaged by particular assessment choices and how well they align with your highest priority learning objectives. Ultimately, a mix of assessment approaches and providing support for student success maximizes equity and inclusion.
Don’t necessarily redesign all assessments to focus on the perceived, current limitations of AI tools:
All new technologies have limitations. However, limitations change over time as technologies are refined by their developers, and the capabilities of AI tools have evolved rapidly. A tool may now (or soon) perform well on a task on which it performed poorly six months ago. For example, natural language generators, like ChatGPT, are trained on historical data that must exist and be available to the tool online prior to training. When it was originally released, ChatGPT was not particularly good at responding to prompts about current events. It also struggled to cite peer-reviewed literature accurately, could not leverage data protected by paywalls, such as JSTOR articles, and could not reference classroom discussions. Nevertheless, AI technologies evolve just as the data they train on and generate evolves. Therefore, some of the limitations of ChatGPT described above have changed and will continue to change over time. Consequently, designing assignments around any current limitations of an AI tool may be a temporary solution, but not a sustainable one. Instead, consider some of the strategies discussed previously.
5. What equity and inclusion considerations should I be thinking about?
Unauthorized assistance, cheating, and plagiarism create inequities; all are unfair to students who do the work themselves. However, some approaches to preventing academic dishonesty, or students’ use of AI tools, may inadvertently create inequities or marginalize some students. All teaching strategies have pros and cons, so we recommend that you consider potential implications for equity and inclusion.
Designing assessments:
Some assessment strategies directly support equity and inclusion. These include providing student choice when appropriate in assignment deliverables or topics, varying the type of required assessments or deliverables, strategically leveraging low-stakes assessments, scaffolding high-stakes assignments to include milestones and drafts, and more (see also How to enhance inclusivity and belonging in teaching and Creating a Welcoming Classroom Climate). However, other strategies may disadvantage certain students. For instance, some commonly discussed strategies to eliminate the use of AI tools include intentionally shifting assessment designs toward hand-written deliverables, in-class evaluations, or oral exams. Relying exclusively or excessively on these approaches may prevent English language learners or students with disabilities requiring accommodations from fully demonstrating their learning. Additionally, in-class writing or other time-limited assessments may not align well with learning objectives. Adopting such approaches may result in assessing students’ speed more than their true competency. Consider whether speed is a high-priority learning objective or a fair assessment criterion across your students. For additional support on determining how assessments may impact students with disabilities or how to make appropriate accommodations for CMU students, please contact the Office of Disability Resources.
Considerations for choosing resources:
Requiring students to purchase particular texts or resources (e.g., the newest editions of textbooks, subscriptions to educational cases or newspapers, or sample data) to sidestep the capabilities of certain AI tools may disadvantage students with limited resources and cause undue financial burdens. Consider how you can provide such resources for free through Canvas or University Libraries. Additionally, if you are encouraging use of AI tools, consider whether or not they are digitally accessible to all learners. Please carefully consider the legal implications of requiring students to use AI tools in your courses (see FAQ 8).
AI tool output may be biased:
AI tools will reproduce any latent biases in the data on which they were trained. Depending on the prompt, AI tool output can directly cause harm to underrepresented or marginalized students via microaggressions. Relying on these tools will also inherently bias student work toward mainstream, existing ideas if the data they train on is biased or not representative of underrepresented or marginalized student identities. Consequently, some applications of AI tools for education may be at odds with efforts to center diversity, equity, inclusion, and belonging. Careful consideration must be given to how to still engage with marginalized ideas or viewpoints.
6. How can I integrate generative AI tools like ChatGPT into my course?
Remember that AI tools are web resources, and like other such tools may not always be accessible to you or your students. Before planning any activities or assignments using these tools, ensure you and your students can go online and successfully and equitably access them (see also FAQ 5). Additionally, carefully consider the legal implications of using or requiring AI tools in your course (see FAQ 8).
As with any new technology, there are often exciting avenues for new or enhanced learning experiences. It is important to be transparent with your students about the purpose you have in mind. Let them know the best way to approach the technology to maximize their learning. Try connecting this purpose to one of your existing learning objectives.
What generative AI tools have been vetted by CMU?
A growing list of tools vetted by CMU are FERPA compliant and can therefore be used for teaching and learning purposes. GenAI tools currently licensed by Computing Services include Microsoft Copilot, ChatGPT edu, Google’s Gemini, and NotebookLM.
IMPORTANT NOTES FOR MAINTAINING FERPA COMPLIANCE:
- Individuals must be logged in as instructed via CMU authenticated mechanisms.
- Not all tools listed on the Computing Services site are FERPA compliant; compliant tools are typically indicated as such (see CMU’s Google Workspace for Education webpage showing “Core” vs. “Additional” services). If you are uncertain about whether or not a tool you are using or want to use meets these privacy and legal requirements, don’t hesitate to contact us.
- If you wish to use tools that are not yet FERPA compliant (i.e., any tool that the university has not vetted and approved as FERPA compliant), reach out to us to get the tool vetted.
Make a clear statement in your course syllabus about the use of these tools
If you are allowing or encouraging the use of generative AI tools, consider adding language to your generative AI syllabus statement to let your students know that Microsoft Copilot is a tool vetted by the university for compliance with the Americans with Disabilities Act (ADA) and the Family Educational Rights and Privacy Act (FERPA) and for other considerations.
Sample syllabus language: Microsoft Copilot provides data protection when accessed with your AndrewID. Unlike open commercial tools, Microsoft will not retain your prompts or responses to train its AI models when using CMU's licensed version and appropriately logged in with your Andrew credentials.
If you would like to integrate AI tools into your course, here are some ideas:
Explore the limitations:
Let your students explore the capabilities and limitations of AI generation. Guide them on big questions surrounding what defines things like communication and interaction. For example, if a GenAI tool like ChatGPT writes your emails for you, are you really communicating? Have your students think about the nature of the data an AI tool pulls from and its intersections with ethics. For example, what is the range of “inappropriate requests” and why? What might your students want to change about an AI tool to make it more useful for their lives? What does it mean to create with or without AI assistance? How might the use of an AI tool enhance equity or create inequities?
Spot the differences:
Prepare a class session where students attempt to identify differences between two pieces of writing or art or code, one created by their peers, and the other created by an AI. In advance, choose a set of prompts to provide to small groups of students to input into the AI. For example, ask students to request a paragraph, email, or poem from a GenAI tool in a particular style or from a certain perspective to a specific audience on a topic. Next, ask each group to write their own response to a different prompt and collect them. Then match the student- and AI-generated responses to the same prompt. Give each pair to a group. Be sure you don’t give students the same prompt that they wrote on. Challenge your students to identify differences in tone, clarity, organization, meaning, style, or other relevant disciplinary habits of mind, as well as which sample was AI-generated.
If you’re teaching a math course or computer science course, input some homework problems, and have your students critique where the AI succeeds or not (and how it could improve) or articulate alternative solutions. Can your students determine whether code was written by humans or an AI?
Facilitate discussion:
Have students prompt a GenAI tool to generate discussion questions for the next class session, then have students create their own responses to those questions. The GenAI tool can also ask follow-up questions and responses of its own, and students can continue their discussions with AI assistance. This approach to discussion facilitation could work well in small groups first, with a large group debrief afterwards. This helps students engage and learn about the topic while fostering and sustaining discussion, but it will also bring up interesting secondary questions. For example: Will the small groups have all learned and discussed the same things? Different things? Did the GenAI tool lead some groups off topic?
Language prompts:
Assign a topic and let your students come up with different ways to input it into a GenAI tool. Then task them with writing the same thing, but in a different way. Ask your students to explain their decisions. How might they change the language? Why? What rhetorical strategies could make it sound better, worse, more beautiful, more parsimonious, or more confusing? Have your students take on the role of an instructor and “grade” the GenAI tool on its output.
Generate samples for students to critique:
Have your students enter your assignment prompt into a GenAI tool. Then ask them to use your grading criteria/rubric to evaluate the output that the tool generates. This can be a helpful way to provide “sample” work to your students who may be looking for examples or curious about what a “good” and “bad” version of the deliverable looks like. You can also include your own comments and critiques and use the AI-generated output like you would use an example of a past student’s work. This approach not only enhances transparency of grading criteria, but also helps students practice and get feedback on necessary skills.
Have fun:
Have a GenAI tool write an academic integrity policy forbidding its use. Ask it to write an email to students’ pets. After requesting that it write in another language, compare the output to other translation algorithms. Input an unsolvable math or coding problem into a GenAI tool. Be creative! Regardless, talk about what it means to do things “the human way;” have your students make a list of all the things they would rather do than have an AI tool do for them, then have them ask an AI tool to write up that list and compare!
See additional ideas for classroom learning activities leveraging AI tools that generate code or text and considerations for responsible use of AI tools (created by colleagues in the Heinz College of Information Systems and Public Policy).
7. How can I tell if students are using generative AI (e.g., what detection tools are available)?
The Eberly Center recommends extreme caution when attempting to detect whether student work has been aided or fully generated by AI. Although companies like Turnitin offer AI detection services, none have been established as accurate.
In addition to false positives and false negatives, detection tools may often produce inconclusive results. A detection tool can provide an estimate of how much of a submission has the characteristics of AI-generated content, but the instructor will need to use more than just that number to decide whether the student violated the academic integrity policy. For example, an instructor will need to have a plan for how to proceed if a tool estimates that 34% of a submission was moderately likely to be AI-generated. Even very strong evidence that a student used AI may be irrelevant unless you have a clear academic integrity policy establishing that the student’s use of the AI tool constitutes “unauthorized assistance” in your course. Furthermore, research suggests that the use of detection tools may disproportionately impact English language learners.
Until (and after) robust and stable AI detection tools are available, we recommend that you consider the variety of instructional design and teaching strategies provided in this resource. You might first want to consider if AI output will pose a problem for your teaching and learning context. Start by trying out a few applicable tools using your assignment prompt to see if you need to make any adjustments (see FAQ 4).
8. What legal implications should I consider before using an AI tool in my course?
The university vets teaching technologies for pedagogical value, compliance with both the Family Educational Rights and Privacy Act (FERPA) and the Americans with Disabilities Act (ADA), security, and stability. Before using any technology tool or app, including AI tools, ensure that its use falls within the university’s legal guidelines.
For more information or help finding a vetted tool that fits your and your students’ needs, please contact eberly-assist@andrew.cmu.edu to schedule a consultation.
Can I require students to use an AI tool to complete an assignment?
We encourage you to use a CMU-vetted generative AI tool (see FAQ 6). As long as these tools are used with an Andrew email/ID, students’ data will be kept confidential.
For example, when using Copilot while signed in with your Andrew credentials, a statement confirming this data protection should appear near the prompt entry box. Note that this statement will NOT appear if you do not sign in with your Andrew email!
If a consumer tool has not been vetted by the university and/or is not FERPA compliant (i.e., student work, which is part of students’ academic record, would be shared with third-party individuals and/or platforms), instructors cannot legally require students to create accounts and/or use the tool to complete course assignments.
Note that this has always been true for any unvetted third-party platform or app. Therefore, if you plan to use these tools in your course, you must provide an alternative for students who choose not to create an account.
For example: if the original plan was for students to individually enter a prompt question into a non-FERPA-compliant tool and analyze the response generated, you will need to provide an alternative path to the same learning objectives for students who opt out.
If you would like to discuss ideas or alternative assignments for your course(s), please contact eberly-assist@andrew.cmu.edu to schedule a consultation.
Can I prohibit the use of AI tools in my course?
As the instructor, you are allowed to prohibit the use of AI tools in your course. (See FAQ 2 on how to write an academic integrity policy that includes generative AI tools.) If you choose to do so, make sure to be transparent about why you are not allowing their use (see FAQ 3 for examples on how to talk with your students about the use of AI tools), and remember that detection is extremely limited, so enforcement may be both difficult and unreliable (see FAQ 7 for more explanation).
Can I encourage students to purchase a subscription to a particular generative AI tool?
Please consider whether the cost of the tool may disadvantage students with limited resources and cause undue financial burden. Additionally, if you are encouraging the use of AI tools, consider whether or not they are digitally accessible to all learners. (See FAQ 5 for other equity considerations.)

Eberly Center consultants can meet to discuss the particular context of your course and, if necessary, help you navigate the process. Please contact eberly-assist@andrew.cmu.edu to schedule a consultation.
What kinds of student data or work can be shared with an AI tool platform (and by whom)?
The main concern with sharing student work relates to students’ privacy rights. FERPA protection begins after an instructor accepts an assignment for assessment and grading. If you are permitting the use of a generative AI tool that is not FERPA compliant in your course, students are allowed to (but, again, should not be required to) submit their own work into the AI tool. However, you should set up an assignment workflow where students export their course work from the tool (e.g., saving the work as a PDF) and then submit it for review and grading using a FERPA-compliant tool like Canvas. The instructor should not need to access the AI tool directly to see or grade the submitted work.
Some instructors may be interested in using an AI tool to grade student work. If you choose to do so, every effort must be made to anonymize the student’s work by not connecting that work to any Directory Information on the student (see the FERPA guidance for examples of Directory Information).
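As a concrete illustration, one way to keep Directory Information out of an AI tool is to pseudonymize submissions before anything is sent to it. The following is a minimal sketch in Python under stated assumptions: the IDs, salt value, and helper function are hypothetical, not a CMU-provided utility, and the submission text itself must still be reviewed for identifying details.

```python
import hashlib

def pseudonymize(student_id: str, salt: str) -> str:
    """Map a student ID to a stable pseudonym so no Directory
    Information is attached to work sent to an external AI tool."""
    digest = hashlib.sha256((salt + student_id).encode("utf-8")).hexdigest()
    return f"student-{digest[:8]}"

SALT = "keep-this-value-secret-and-local"  # never share or reuse the salt

# Hypothetical submissions keyed by Andrew ID.
submissions = {"aturing2": "Essay text ...", "glovelace": "Essay text ..."}

# Keep the lookup table local so grades can be matched back to students.
pseudonym_to_id = {pseudonymize(sid, SALT): sid for sid in submissions}
anonymized = {pseudonymize(sid, SALT): text for sid, text in submissions.items()}

# Note: pseudonymizing IDs is not sufficient on its own; the body of a
# submission may still contain names or other identifying details.
```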
If you would like to discuss various tools to assist with grading, please contact eberly-assist@andrew.cmu.edu to schedule a consultation. See also FAQ 9: What should instructors consider before using GenAI for grading and feedback?
For more on the use of AI tools for various academic contexts, please see the Guidance Memorandum from University Contracts (current as of Summer 2023).
What legal issues about using outputs generated by AI should my students and I be aware of?
In terms of copyright, the US Copyright Office and recent court decisions have stated that the originator must be a human being to claim copyright protection, so work made by AI will not be considered copyrightable.
Other legal principles also apply to generative AI outputs: an AI tool may have inherent biases and inaccuracies from its training, and using problematic statements generated by AI can subject users to legal liability.
How can the use of generative AI tools be acknowledged and cited?
OpenAI provides an example of how to acknowledge the use of ChatGPT: “The author generated this text in part with GPT-3, OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.”
Several style guides also offer instructions on how to cite content created by generative AI tools including: APA | Chicago | MLA
9. What should instructors consider before using generative AI for grading and feedback?
The introduction and evolution of generative AI (genAI) tools continues to disrupt higher education in ways that are challenging to navigate. These tools seem to offer potential ways to enhance or circumvent learning, depending on how they are used and to what end. In the FAQs above, we covered common questions around student use of genAI tools, including how to decide whether to incorporate them deliberately into the classroom and how to write a syllabus policy. However, it is equally important to consider the use of genAI by instructors and TAs, including possible affordances, concerns, and how this will be communicated to students, especially when it comes to grading and delivering feedback.
Here we provide a heuristic that can help instructors decide whether or not to incorporate genAI tools, in particular CMU-vetted, FERPA-compliant GenAI tools, into their grading and feedback processes. We acknowledge that LLMs and genAI technologies are evolving rapidly (e.g., ChatGPT-4o continues to improve and AI agents are coming into play) and that research on them is still emerging. Each question below links to its own section with research and recommendations.
- What are your motivations for using genAI for grading or feedback?
- Do you have enough materials and time to fine-tune the tool to provide high quality feedback? How will you review its output for reliability, accuracy, and fairness?
- What other tools or strategies could accomplish the task?
- If you have TAs, what will you communicate to them about the use of genAI for grading or feedback? How will you train them to appropriately use genAI for this task?
- What are the legal, ethical, and privacy concerns of students that you should think through before using the tool?
- How will you communicate your decisions to use genAI in grading and feedback (or not) to students? Does your use of it align with your policy for student use of genAI?
1. What are your motivations for using genAI for grading or feedback?
Many instructors and TAs may be drawn to genAI because of a belief that it can streamline the grading and feedback process, thereby saving them time and effort, as well as getting feedback to students sooner. While this sounds desirable, if true, it does not necessarily suggest a reduction in instructor and/or TA responsibilities, or a reduction in the number of TAs hired. Any increase in efficiency from leveraging technology raises the question: What opportunities does this create for instructors and/or TAs to interact with students differently and more impactfully to further enhance learning?
Regardless, genAI use for grading and feedback raises additional questions about quality assurance, such as “How would using genAI impact the reliability of grades, the quality of feedback, and/or student learning?” Research on genAI tools and their ability to accurately grade and provide effective, quality feedback is still emerging, but initial results suggest instructors and TAs should be cautious about how and when to use it, if at all. Here are a few emerging trends:
GenAI feedback can potentially provide helpful, “on-demand” feedback for students as a supplement to instructor/TA feedback.
For instructors who want students to be able to get additional, formative feedback during practice or on draft work, genAI automated feedback systems could potentially be another tool for students to use as part of their learning process. Although we lack sufficient experimental data to confirm its utility, some initial studies have integrated genAI as a digital tutor, to give feedback in class and outside of class (Hobert & Berens, 2024; Li et al., 2023). These studies were not seeking to replace instructor feedback on major deliverables, however, nor did they include a control group without genAI to test whether student improvement was due to genAI specifically.
GenAI feedback is a poor replacement for instructor/TA feedback for writing-based, subjective, or complicated assignments.
For standardized tests with objective, finite responses, genAI has been found to have high consistency with human scorers. However, it has low consistency with exams that feature more subjective or nuanced answers, such as essays, or high levels of complexity such as assessments with video components (Coskun & Alper, 2024).
GenAI has moderate consistency in scoring quantitative work, such as problem sets and coding, but with some important caveats. One analysis of GPT-4’s grading ability on handwritten physics problems found decent agreement with human graders (Kortemeyer, 2023). However, the authors note that even after running every solution through the genAI tool 15 times, it still gave incorrect or misleading statements as part of its feedback, awarded more points than the human graders, and was less accurate at the low end of the grading scale. Similarly, another study compared ChatGPT-3.5 to human graders on Python assignments (Jukiewicz, 2024). Like the previous study, they ran each set of assignments through the tool 15 times to gauge consistency. Once again the agreement with human graders was good, but the genAI tool consistently awarded fewer points than the human graders. In another study, however, the authors found even lower agreement between ChatGPT-3.5 and Bard on Python problem sets compared to human graders, with an overall accuracy rate of 50% (Estévez-Ayres et al., 2024). Kiesler et al. (2023) also noted that ChatGPT’s feedback on programming assignments for an introductory computer science course contained misinformation 66% of the time, which would be particularly problematic for novice learners, since they would likely be led astray.
For writing-based assessments more generally, studies have consistently found large discrepancies (i.e., low reliability) between genAI and human scorers. In one study comparing human and genAI scoring of formative feedback on writing, genAI scored equivalently well only on quantitative, criteria-based feedback (Steiss et al., 2024; see also Jauhiainen et al., 2024). However, human feedback was of higher quality because it was more accurate and actionable, and better prioritized and balanced; these four features are evidence-based characteristics of effective feedback (Hattie & Timperley, 2007). Additionally, genAI feedback quality varied based on essay score, with greater leniency demonstrated on lower-quality writing and overly strict responses on papers that scored above average (Wetzler et al., 2024). Research has also shown that genAI has difficulty with certain types of cognition, such as analogies and abstractions (Mitchell, 2021), as well as nuances and subtle variations in the subject material (Lazarus et al., 2024).
GenAI grading accuracy varies considerably based on the specific tool used.
The genAI tool used also matters: one study found that ChatGPT-3.5 and 4o graded the same essays differently, despite being fine-tuned on the same prompt and rubrics, and both models differed significantly from human scorers (Wetzler et al., 2023). Even when the same genAI tool (ChatGPT-4) was tested 10 times with the same data (assignment prompt, rubric, and student work), the same grade was assigned only 68% of the time (Jauhiainen et al., 2024).
Another study compared how well ChatGPT, Claude, and Bard could accurately score and provide feedback on both undergraduate and graduate writing samples (Fuller & Bixby, 2024). The rubrics and writing samples were run through ChatGPT and Claude five times each to assess consistency in scoring and feedback. Bard was not able to complete the tasks: although it scored the samples initially, on the second iteration it responded that it was not able to be used for assessment purposes. It was therefore omitted from data collection. As with the other studies, both ChatGPT and Claude had significant discrepancies in their scoring of the same writing samples using the same rubrics through multiple iterations, and also differed widely from the human grader.
It is important to note that for all of the above studies, the authors spent considerable time training the genAI on the assignments and fine-tuning the prompts to create the best possible setup for the desired grading and feedback output. For instructors or TAs who are not customizing the genAI tool or running multiple iterations of scoring for quality assurance, the grading and/or feedback quality is likely to be much poorer. See #2 below for more details about customizing a genAI tool.
Students have mixed opinions about receiving genAI feedback.
In addition to considering the efficacy of genAI feedback, one should also keep in mind the recipients of that feedback. Although in some cases students report being open to genAI assessment (Braun et al., 2023; Zhang et al., 2024), many studies show that students prefer human feedback or a combination of human and AI feedback; they do not trust or feel satisfied receiving AI feedback alone (Er et al., 2024; Tossell et al., 2024; Chan et al., 2024). We also lack sufficient research about the impact of receiving genAI feedback on student motivation and engagement. However, we know that students are more likely to engage with feedback if they feel that it is helpful and fair (Jonsson, 2013; Harks et al., 2013; Panadero et al., 2023). Since genAI feedback in its current state is prone to errors and misinformation, and its quality is inferior to a human grader’s, it is crucial that instructors or TAs review and amend its output to ensure students get feedback that they can trust and use. Additionally, it is important that instructors and TAs are transparent with students about the evaluation criteria as well as how (and by whom) grades and feedback are assigned fairly and accurately (see also #6 below about how to communicate your policy and practices to students).
2. Do you have enough materials and time to fine-tune the tool to provide high quality feedback? How will you review its output for reliability, accuracy, and fairness?
One of the challenges of incorporating genAI tools into the classroom is the necessary time investment required up front. Getting the tool to do what you want is not as simple as pasting in a prompt or rubric and expecting it to produce reliable and accurate grading or feedback; it is not yet “plug and play.” Many of the studies above used a customized genAI chatbot that the researchers spent a significant amount of time configuring, fine-tuning prompts for, and testing. To appropriately customize a genAI tool for this purpose requires the following:
- Materials needed: a genAI platform of your choice (e.g., ChatGPT), detailed grading rubric, assignment instructions, sample student work, thoughtfully engineered prompts.
- Norming: Initially grade a subsample of student work manually and do rubric norming (see #4 below for more about norming). Compare these norms against genAI outputs to assess the accuracy of the tool’s grading as well as the appropriateness and effectiveness of its feedback.
- Quality assurance: Check over AI-generated outputs and incorporate human edits/revisions. Do NOT assume that quality grading occurred or haphazardly share unreviewed genAI feedback with students.
Additionally, as genAI tools continue to evolve and be updated, results may vary and require ongoing review and adjustments. In other words, even after the initial work of customizing the tool to provide reasonable feedback, users need to maintain vigilance for changes in the reliability or accuracy of the tool’s outputs.
The quality of grading and feedback provided by the bot can vary considerably, depending on the following:
- Clarity and precision of prompts: Vague or overly broad prompts often yield inconsistent results. Avoid subjective language. Instead, align the prompt with precise explanations of performance expectations for the assignment (e.g., a well-structured rubric). It can also help to include examples of student submissions of varying quality along with the feedback you would provide for each of them (see the sketch after this list).
- Availability of public material on the topic: GenAI models typically perform better on widely discussed topics due to more extensive training data. Assignments on highly specialized or very recent subjects may result in less accurate or reliable outputs.
- Format of student deliverables: GenAI tools are primarily text-based, and thus their ability to provide accurate feedback diminishes significantly if submissions include visual elements such as diagrams and images.
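To make the first bullet above concrete, here is a minimal sketch of a rubric-grounded feedback prompt in Python, assuming the openai package and an API key for a licensed deployment. The model name, rubric text, and calibration example are placeholders to replace with your own materials, and any output would still require human review before reaching students.

```python
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

# Placeholder rubric and calibration example; substitute your own materials.
# Precise criteria plus a graded example anchor the model far better than a
# vague "grade this essay" request.
RUBRIC = "Thesis (0-4): ...\nEvidence (0-4): ...\nOrganization (0-4): ..."
CALIBRATION = (
    "Sample submission: <mid-quality excerpt>\n"
    "Feedback the instructor gave: <actual feedback, criterion by criterion>"
)

def draft_feedback(submission: str) -> str:
    """Return draft feedback for one submission; a human must review it."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        temperature=0,   # reduces, but does not remove, run-to-run variation
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a teaching assistant drafting formative feedback. "
                    "Apply the rubric exactly, cite a criterion for every point "
                    "awarded or deducted, and do not invent new criteria.\n\n"
                    f"RUBRIC:\n{RUBRIC}\n\nCALIBRATION EXAMPLE:\n{CALIBRATION}"
                ),
            },
            {"role": "user", "content": submission},
        ],
    )
    return response.choices[0].message.content
```

Even with a setup like this, the studies cited in #1 suggest scoring the same submission multiple times and norming the output against human-graded samples before trusting it.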
Although the swiftness of genAI seems like a promising way to save instructors and TAs time and effort when it comes to grading and delivering feedback, currently the amount of time it takes to set up the tool, test and refine the prompt engineering, and do quality assurance on the output may not differ greatly from human grading/feedback. Overall, one might simply allocate their time differently.
3. What other tools or strategies could accomplish the task?
Before reaching for a genAI tool, it’s always good to consider whether existing, vetted tools can help you accomplish a task. Canvas, CMU’s Learning Management System, can make grading easier and more efficient in a number of ways:
- Have students submit their work to Canvas Assignments. This enables you to use SpeedGrader to assign points and leave comments and annotations directly on student work.
- Additionally, you can attach Rubrics to specific Assignments and use them to leave feedback through the SpeedGrader.
- Use Canvas Quizzes to automatically grade certain kinds of questions (i.e., multiple choice and matching questions). You can also use SpeedGrader to evaluate other types of Quiz questions.
Other tools, such as Gradescope, Autolab, and FeedbackFruits allow instructors and TAs to partially automate grading.
- Gradescope can speed up workflow by using its integrated AI tools to reliably analyze student handwriting and group responses by similar answers. This allows the grader to batch-grade the same types of student responses and score them using a rubric.
- Gradescope also has a Programming Assignment tool that allows instructors to autograde students’ code in any language.
- Autolab is another tool for programming assignments that allows you to set up test cases, but still requires careful thought and setup (and is only suitable for certain types of problems).
- FeedbackFruits is a suite of tools integrated with Canvas that can automatically assign grades for a number of assignment types, including peer review activities, document or video annotation, asynchronous discussions, and self-assessment.
The Eberly Center can support you in finding the right tool for your context and using it effectively. Feel free to send us an email to set up a consultation.
4. If you have TAs, what will you communicate to them about the use of genAI for grading or feedback? How will you train them to appropriately use genAI for this task?
For instructors with TAs, it is important to communicate your policy about GenAI use in the class, both in terms of student use and instructor/TA use. As with students, TAs may have experienced a different policy in another course. Rather than assume (or let TAs assume), be explicit.
Some TAs may wish to use genAI in the hopes of making grading or generating feedback easier, especially if they are overwhelmed. In addition to clearly communicating your stance about the use of genAI, it’s important to also properly train TAs in how to be successful in their various responsibilities. Here are some ideas regarding effective TA training for grading/feedback:
| Strategy | Why it’s important | If using genAI |
| --- | --- | --- |
| Meet with the TAs before the semester starts. | This meeting is a chance for the instructor and TAs to build rapport, go over the syllabus, establish a cadence for future meetings, and address any TA questions. This is also an opportunity to review which platforms and tools TAs will be using (e.g., Canvas, Gradescope, etc.). It is particularly important that the instructor and TAs have a shared understanding of course policies, including possible edge cases that can come up, as well as any relevant department policies. | Review both the student policy for AI use and your expectations (and policy) for how TAs will or will not use genAI. This should include conversations around TAs’ experiences with genAI, what the research says (see above), and the measures involved for training genAI on grading and/or providing feedback. Consider TA concerns around using genAI, whether it makes sense for all TAs to use it (or not), and how it would impact students if only some of the TAs used genAI. |
| Hold regular meetings with your TAs (e.g., weekly). | These meetings should cover any trends or questions TAs noted since the last meeting, as well as upcoming material and assignments. TAs can report what students are struggling with, which could warrant additional coverage by the instructor, and the instructor can give TAs a heads-up about where students tend to struggle with upcoming concepts. | Include a discussion of how the use of genAI is going on the TA end. How much time are they spending on prompt engineering to refine results? Are they reviewing the output and ensuring it is of high quality? Ensure that TAs are internally consistent in how they are using genAI so that students receive comparable feedback. |
| For each assignment type, hold a “grade norming” session. | When grading and providing feedback, it is important that TAs are consistent with each other and with the instructor’s expectations. Norming sessions typically involve instructors and TAs scoring a sample set of student work against a shared rubric, and then comparing and discussing their scores until consensus is reached. This ensures that students receive accurate grades and feedback across TAs and that TAs have clarity on what is expected of them. For assignments of the same type, only one norming session is typically needed. This can be incorporated into the weekly meeting above, or an additional meeting can be scheduled where the TAs and instructor meet to norm and grade. See also the grading strategies section below for more ideas. | How are TAs expected to utilize genAI? Has the tool already been trained on the assignment, rubric, and student work (e.g., a custom GPT for a class), or is the TA expected to do this on their own? What should TAs do to ensure quality assurance? |
| Establish which issues should be handled by TAs and which should be handled by the instructor. | It can be helpful to talk through things that arose in previous semesters of teaching, what the TAs have experienced in their other classes, and potential challenges. | Discuss possible scenarios where students have questions or concerns about their feedback or how they were graded, as well as cases where students are not using genAI as directed. What should the TAs handle on their own, and how will they know when to bring it to the instructor? What will regrading policies look like, if genAI is the expectation? What if genAI calculates a different grade, as shown in #1 above? Can students request to be graded by the TA directly? |
| Establish a communication policy with TAs. | Just as students should know how they can contact the instructor, so too should TAs. Being transparent about when and how TAs can contact and get a response from the instructor can help them feel reassured about getting the support they need. | Establish resources for TAs and clear channels of communication in case they run into issues. |
Use grading strategies and TA norming
In addition to technology tools, there are many strategies that instructors and TAs can use to grade efficiently, effectively, and fairly.
How to set up your grading for success:
- Use a rubric. Randomly select 5-7 assignments to check your rubric and make sure it works. If necessary, adjust your rubric, put those 5-7 assignments back in the pile, and re-grade all of the assignments.
- Work with the other TAs. Make sure your grading is consistent among everyone (this is usually called “grade norming”). Working together can help with motivation and provide a second set of eyes if you get stuck. You can also divide up labor among TAs to maximize grading/feedback consistency within an assessment item. For example, on an exam or homework assignment, assign TAs to grade all students for a subset of the questions/items, rather than assign TAs to grade all questions/items for a subset of the students.
- Leverage educational technology. Depending on your course context, educational technology like Canvas or Gradescope may make your grading easier and more consistent.
- Develop a “key” or “common comments” document (e.g., “AWK” = awkward phrase, “TS” = topic sentence missing or needs revising). This will save you time so you do not have to write out the same feedback every time.
While you’re grading, here are some other tips to keep in mind to ensure that you are grading fairly and efficiently:
- Grade without seeing student names. Place Post-it notes over students’ names and shuffle the assignments to ensure fair, unbiased grading.
- Prevent grading drift. Go back and compare the first five assignments you graded to the last five. Have your standards changed? (A quick way to check is sketched after this list.)
- Set a timer. This will ensure that you spend the same amount of time on each student, and will also help prevent burnout.
- Grade/provide feedback question-by-question. Grading one question at a time (e.g., question #1 for everyone, then question #2…) rather than student-by-student will help you stay in the same mental space. Gradescope can help with this!
- Provide specific group-level feedback. This strategy is an efficient way to provide feedback on common errors, and can be done via Canvas, email, or even verbally at the beginning of class.
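For the drift check above, even a rough comparison of early versus late scores can show whether your standards shifted mid-pile. A minimal sketch, with hypothetical scores and a hypothetical one-point threshold:

```python
# Minimal sketch: compare the first and last five assignments graded.
# Scores are hypothetical and listed in the order they were graded.
scores_in_grading_order = [8, 7, 9, 6, 8, 7, 7, 8, 6, 9, 5, 6, 6, 5, 6]

def mean(xs):
    return sum(xs) / len(xs)

first, last = scores_in_grading_order[:5], scores_in_grading_order[-5:]
print(f"First five mean: {mean(first):.1f}")  # 7.6
print(f"Last five mean:  {mean(last):.1f}")   # 5.6
if abs(mean(first) - mean(last)) > 1:  # threshold is a judgment call
    print("Possible drift: re-check a few early submissions against the rubric.")
```

Of course, a gap may simply mean the later submissions were weaker; the point is only to prompt a second look.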
Talk to the Eberly Center about supporting TA training! This can include tailoring our Graduate & Undergraduate Student Instructor Orientation (GUSIO) to fit your course context, individualized support for implementing the strategies above, and more!
5. What are the legal, ethical, and privacy concerns involving students that you should think through before using these tools?
- Privacy: data-sharing restrictions and FERPA compliance
- Legal: intellectual property (IP) and digital accessibility
- Ethical considerations
When using tools for teaching and learning (and grading), proper data management and FERPA compliance are a must. First and foremost, consider whether or not the tool or system you are using has been licensed by CMU and is FERPA compliant.
CMU-Licensed Tools that are FERPA compliant
GenAI tools currently licensed by Computing Services, including Microsoft Copilot, ChatGPT Edu, Google’s Gemini, and NotebookLM, are FERPA compliant.
IMPORTANT NOTES FOR MAINTAINING FERPA COMPLIANCE:
- Individuals must be logged in through CMU-authenticated mechanisms, as instructed.
- Not all tools listed on the Computing Services site are FERPA compliant; compliant tools are typically indicated as such (see CMU’s Google Workspace for Education webpage showing “Core” vs. “Additional” services). If you are uncertain whether a tool you are using or want to use meets these privacy and legal requirements, don’t hesitate to contact us.
If you wish to use tools that are not yet FERPA compliant (i.e., any tool that the university has not vetted and approved as FERPA compliant):
- Reach out to us to get the tool vetted.
- If the tool cannot be made FERPA compliant and you have received guidance from University Contracts on how to proceed, plan out the details of how you will work with the tool to ensure responsible use with respect to FERPA compliance and data security.
- Once your use case and process are well thought out, consider what data you are entering into any publicly available/consumer tool. Generally speaking, you should only enter data into these systems that you would share publicly. Become familiar with the classifications of data you are handling and using with tools that are not CMU licensed; a crude illustration of stripping identifiers is sketched below.
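To illustrate the spirit of that last point, the sketch below crudely strips obvious identifiers before any text leaves your machine. The patterns are hypothetical and minimal; this is not a substitute for proper de-identification, for CMU’s data classification guidance, or for the University Contracts process described above.

```python
# Minimal illustrative sketch: crude removal of obvious identifiers
# before text is pasted into a consumer GenAI tool. This does NOT
# replace proper de-identification or CMU data classification rules.
import re

def redact(text, student_names):
    """Replace email addresses and known student names with tags."""
    text = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "[EMAIL]", text)
    for name in student_names:
        text = text.replace(name, "[STUDENT]")
    return text

sample = "Great draft, Jane Doe! Email me at jdoe@andrew.cmu.edu."
print(redact(sample, student_names=["Jane Doe"]))
# -> Great draft, [STUDENT]! Email me at [EMAIL].
```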
Once you have a good handle on FERPA compliance and data management requirements, turn to digital accessibility and intellectual property (IP) management.
- Digital Accessibility: Tools you use and the content you create must be digitally accessible. University-licensed tools have been vetted for digital accessibility, but your content also needs to be accessible. Guidance is available on how to make your content and feedback digitally accessible, and we can also consult with you on this aspect.
- IP management: Do you have the proper copyright permissions to enter data into these tools? Consider what will happen with that data. In CMU-licensed environments, your data will not be used to train the underlying consumer models, and privacy is managed. In public/commercial GenAI tool environments, the content you enter may be used to train their models, and CMU data privacy requirements will not be met.
Above and beyond privacy, legal requirements, and content management, there are important ethical concerns to consider. For example, some graders and students may have personal ethical objections to generative AI tools that cannot be resolved by any change to information privacy settings. Also, every CMU-licensed tool comes with a financial cost: who will pay for it, and should they have to? There are many other ethical considerations not discussed here. It is important to be prepared with ways those concerns can be raised, heard, and responded to, including with potential alternatives.
As always, as you navigate these requirements, guidelines, and process details, we are here to help so feel free to reach out for a consultation.
6. How will you communicate your decision to use GenAI in grading and feedback (or not) to students? Does your use of it align with your policy for student use of GenAI?
Just as it is critical to include a syllabus policy for student use of GenAI, it is also important to explicitly state how instructors and TAs will use GenAI (if at all). A student use policy should spell out which uses are permitted or prohibited, include a rationale for that policy, and identify the consequences if the policy is violated. Similarly, the instructor/TA GenAI use policy should address the following:
- What are the parameters under which GenAI will be used? Will it be used for all assessments or only certain types, with all others receiving human feedback?
- How will instructors train a GenAI tool to grade and/or provide feedback, and how will the instructor and/or TAs evaluate and assure the fairness and quality of its output? (One minimal workflow is sketched after this list.)
- How will this policy benefit students? For example, if it saves instructors time, how will that time be reinvested in supporting student learning?
- Whether GenAI-generated feedback is opt-in or opt-out: will students receive GenAI-generated feedback by default (with the ability to opt out), or human feedback by default (with the ability to opt in to additional AI feedback)?
- How can students express their concerns or questions, if they have them?
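In practice, “training” a tool on a rubric often amounts to structured prompting plus mandatory human review. Below is a minimal sketch using the OpenAI Python client; the rubric text, model name, and workflow are illustrative assumptions rather than a recommended setup, and student work should only be entered through a CMU-licensed, FERPA-compliant deployment (see section 5 above).

```python
# Minimal sketch: rubric-grounded DRAFT feedback with mandatory human
# review. Rubric, model name, and workflow are illustrative assumptions;
# only use a CMU-licensed, FERPA-compliant deployment for student work.
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

RUBRIC = """Thesis (0-4): clear, arguable claim.
Evidence (0-4): sources support the claim.
Organization (0-2): logical paragraph flow."""

def draft_feedback(submission: str) -> str:
    """Ask the model for rubric-aligned draft feedback (not a grade)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": ("You are a teaching assistant. Give feedback "
                         "strictly against this rubric, citing criteria "
                         f"by name. Do not assign a score.\n{RUBRIC}")},
            {"role": "user", "content": submission},
        ],
    )
    return response.choices[0].message.content

draft = draft_feedback("...student essay text...")
print("DRAFT ONLY: a TA must review and edit before students see this.")
print(draft)
```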
It’s important for instructors to also consider whether their policy for student use aligns with how they themselves will incorporate GenAI into the course (or not). For instance, if students are not allowed to use GenAI as a thought partner or to support their learning, but instructors and TAs plan to use it for grading and/or feedback, this can create an unequal dynamic that may cause students to disengage or be less inclined to follow the policy. Not sure whether students should use it, or uncertain how to talk with them about it? Check out our FAQs page for recommendations and ideas.
Sample policy language
Instructor/TAs to use GenAI for grading on certain assessment types
To facilitate the X, the TAs will be using a customized ChatGPT bot to assist in grading and providing feedback on homework and draft assignments. This bot has been carefully trained on the assignment types, rubrics, and specific kinds of feedback that we require; additionally, TAs will review its output to ensure that the grading and feedback are accurate. All final deliverables will be graded by the TAs without the use of GenAI. If you have any questions or concerns about this process, please do not hesitate to reach out to me or one of the TAs.
Instructors/TAs inviting students to use GenAI as supplemental feedback
All assignments in this class will be graded by the instructor and TAs, without the use of GenAI. If you wish to receive additional or more frequent feedback, you are welcome to use the tool yourself (see the Student Use of GenAI policy above). If you have any questions or concerns about this process, please do not hesitate to reach out to me or one of the TAs.
A No-GenAI policy for both instructor/TAs and students
To best support your own learning, you should complete all graded assignments in this course yourself, without any use of generative artificial intelligence (AI). Please refrain from using AI tools to generate any content (text, video, audio, images, code, etc.) for any assignment or classroom exercise. Passing off any AI-generated content as your own (e.g., cutting and pasting content into written assignments, or paraphrasing AI content) constitutes a violation of CMU’s academic integrity policy. If you have any questions about using generative AI in this course, please email or talk to me.
Similarly, all assignments in this class will be graded by the instructor or TAs, without the use of AI. If you have any questions about the grading and feedback process, please do not hesitate to reach out to me or one of the TAs.