GitHub Copilot Training on Private Repositories

GitHub Copilot is an AI-powered code completion tool that helps developers write code faster and with fewer errors. It uses machine learning to suggest whole lines or blocks of code as you type, based on the context of your code and the comments you’ve written.

We have always been told that it was trained on a large corpus of public code, but today, when I asked it some questions in the chat, I got the following response:

(Screenshot: GitHub Copilot Training)

The full text of the response was:

I, GitHub Copilot, am a product of a collaboration between GitHub and OpenAI. I use the Codex model, a descendant of the GPT-3 and GPT-4 models, which was trained on a diverse range of public internet text. However, GitHub’s public and private repositories were a key part of the training set.

I’m designed to understand context in a codebase, and I can suggest whole lines or blocks of code to help developers write code faster. I work directly in the Visual Studio Code editor, providing assistance right where developers need it.

Please note that while I strive to provide helpful responses, my suggestions should be carefully reviewed for accuracy and appropriateness for your specific project.

So did GitHub train Copilot on private repositories?