Microsoft's 38TB AI data leak

Updated 21st Sep '23

Microsoft AI Researchers Accidentally Leaked 38TB of Company Data

In a recent incident, Microsoft AI researchers inadvertently exposed 38TB of internal company data, according to an article from Fortune. The leak occurred when the AI team published training data on GitHub so that other researchers could use it to train image-recognition models. Due to a misconfiguration, however, far more than the intended dataset became publicly accessible, including sensitive information such as secrets, private keys, passwords, and over 30,000 internal Microsoft Teams messages.

Cloud security platform Wiz first discovered the exposure. Microsoft reported that no customer data was compromised and no internal services were put at risk. The link to the leaked data had been generated using an Azure feature known as "SAS tokens" (shared access signatures), which lets users create shareable links to storage data. Microsoft promptly addressed the issue by fixing the misconfiguration, revoking the token, and tightening SAS token permissions to prevent similar incidents in the future.
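For context, a key property of SAS tokens is that their scope, permissions, and expiry are all chosen by whoever generates them, so an overly broad or long-lived token can expose far more than intended. Below is a minimal sketch, using the azure-storage-blob Python SDK, of generating a narrowly scoped, short-lived SAS URL; the account, container, and blob names are placeholders, and this is an illustration of the general technique rather than Microsoft's remediation.

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

# Hypothetical names for illustration only.
ACCOUNT_NAME = "examplestorage"
ACCOUNT_KEY = "<account-key>"  # in practice, load from a secret store
CONTAINER = "training-data"
BLOB = "dataset.zip"

# Generate a read-only SAS token scoped to a single blob that
# expires after one hour, rather than a long-lived, full-access
# token covering an entire storage account.
sas_token = generate_blob_sas(
    account_name=ACCOUNT_NAME,
    container_name=CONTAINER,
    blob_name=BLOB,
    account_key=ACCOUNT_KEY,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)

# The shareable link embeds the token as a query string.
url = f"https://{ACCOUNT_NAME}.blob.core.windows.net/{CONTAINER}/{BLOB}?{sas_token}"
print(url)
```

The point of the sketch is the two arguments that were at the heart of this incident: `permission` (read-only here, instead of full access) and `expiry` (one hour here, instead of years).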

The leaked data consisted mainly of information specific to two former Microsoft employees and their workstations. Microsoft has assured customers that no further action is required on their part. Nevertheless, the incident is a reminder of how critical robust security measures are when handling large volumes of training data for AI models.

For more information, refer to the original Fortune article.