Choosing APIs for Secure Data Use

Generative AI tools are becoming part of our daily work, learning, and research. However, how you access these tools greatly impacts the security and privacy of your data. At Carnegie Mellon University, it's important to choose the right access method — especially when handling sensitive information like academic records or research data.

Here’s a simple guide to three common ways you might interact with AI models, along with their security implications.

1. Unprotected APIs: Data May Be Used for Training

When you use an AI model via a public or free API, for example through a free ChatGPT account or an AI tool that you use without logging in with your CMU credentials, your data could be stored and used to further train the model.

Key Points:

The prompts and data you enter are often collected to improve the model.
Sensitive information (like student data or unpublished research) should not be shared.
Suitable for experimenting with general ideas, but not for confidential work.

Example: Setting up a personal ChatGPT Free account.

2. Protected APIs: Secure and Confidential

When you access an AI tool through a protected environment, such as a service you sign in to with your CMU credentials, your data is governed by strict privacy policies. These services ensure your data is not used for model training.

Key Points:

Data remains private and secure.
Suitable for academic work, administrative tasks, and professional research.
Recommended by Computing Services.

Examples:

ChatGPT Edu (available for purchase through the software catalog)
gemini.google.com (free with your CMU credentials)
NotebookLM (free with your CMU credentials)
Microsoft Copilot (free with your CMU credentials)

3. Local Deployment: Maximum Control

For the highest level of security, you can download an AI model and run it locally on your own device or a secure server. In this setup, data stays entirely under your control and is not transmitted to external servers for training or storage.

Key Points:

No external access to your data.
Suitable for highly sensitive projects (e.g., confidential research).
Requires more technical setup and resources.

Example: Running an open-source model like LLaMA 3, Mistral, or a fine-tuned local version of GPT-NeoX on your own machine.
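
To illustrate the idea that data stays on your own machine, here is a minimal Python sketch that refuses to send a prompt anywhere except a loopback address. It assumes a local model server is already running; the endpoint, port, and JSON fields follow the Ollama REST API as one possible choice and are not something Computing Services specifies. The helper names is_loopback and build_local_request are hypothetical.

```python
import ipaddress
import json
import urllib.request
from urllib.parse import urlparse

# Assumption: a local model server (for example, Ollama) listens on this address.
LOCAL_ENDPOINT = "http://localhost:11434/api/generate"


def is_loopback(url: str) -> bool:
    """Return True only if the URL's host is 'localhost' or a loopback IP."""
    host = urlparse(url).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname could resolve to an external machine.
        return False


def build_local_request(prompt: str, model: str = "llama3",
                        endpoint: str = LOCAL_ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a POST request, rejecting non-local endpoints."""
    if not is_loopback(endpoint):
        raise ValueError(f"Refusing to send data to non-local endpoint: {endpoint}")
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would send the prompt to the local server. The guard only checks where the request is addressed; it does not replace securing the device or server itself.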

Choosing the Best Option

When deciding how to access an AI tool, consider:

Who owns the data once you enter it?
Could the data be reused to train AI?
What type of data are you using?
Do you need compliance with regulations like FERPA?

When in doubt, choose the most secure option available, especially when working with student, research, or institutional information.
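
The questions above can be sketched as a small decision helper. This is only an illustration of the reasoning, not an official CMU policy: the DataProfile fields and the mapping from answers to access methods are assumptions made for the example. An unanswered question (None) is treated as "yes", reflecting the advice to choose the more secure option when in doubt.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AccessMethod(Enum):
    UNPROTECTED = "unprotected API"
    PROTECTED = "protected API"
    LOCAL = "local deployment"


@dataclass
class DataProfile:
    # Hypothetical fields for this sketch; None means "not sure".
    is_sensitive: Optional[bool]            # e.g., student data, unpublished research
    needs_compliance: Optional[bool]        # e.g., FERPA applies
    is_highly_confidential: Optional[bool]  # e.g., confidential research


def _yes_or_unsure(answer: Optional[bool]) -> bool:
    """Treat an unknown answer as True, i.e., err on the side of caution."""
    return answer is None or answer


def minimum_access_method(profile: DataProfile) -> AccessMethod:
    """Return the least restrictive access method that still fits the data."""
    if _yes_or_unsure(profile.is_highly_confidential):
        return AccessMethod.LOCAL
    if _yes_or_unsure(profile.is_sensitive) or _yes_or_unsure(profile.needs_compliance):
        return AccessMethod.PROTECTED
    return AccessMethod.UNPROTECTED


# Student records subject to FERPA, but not highly confidential research.
grades = DataProfile(is_sensitive=True, needs_compliance=True,
                     is_highly_confidential=False)
print(minimum_access_method(grades).value)  # prints "protected API"
```

A stricter method than the one returned is always acceptable; the function only names the minimum under these assumed rules.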