OpenAI has accidentally deleted data relevant to the lawsuit in which The New York Times and Daily News sued the artificial intelligence company for training its AI models on their works without permission.
OpenAI had agreed to provide The Times and Daily News with two virtual machines that they could use to search for their copyrighted content in OpenAI’s training sets. (Virtual machines are software-based computers that run within another computer’s operating system and are most commonly used for testing, backing up data, or running apps.)
The publishers’ attorneys and the experts they hired said they had spent over 150 hours since November 1st searching OpenAI’s training data for evidence. Yet, on November 14th, OpenAI’s engineers erased all of the publishers’ search data stored on one of the virtual machines, according to a letter filed with the U.S. District Court for the Southern District of New York on Wednesday.
OpenAI tried to recover the data and succeeded with most of it. However, because the folder structure and the names of the files were “irretrievably” lost, the recovered data “cannot be used to determine where the news plaintiffs’ copied articles were used to build [OpenAI’s] models,” according to the letter.
“News plaintiffs have been forced to recreate their work from scratch using significant person-hours and computer processing time,” counsel for The Times and Daily News wrote. “The news plaintiffs learned only yesterday that the recovered data is unusable and that an entire week’s worth of its experts’ and lawyers’ work must be re-done, which is why this supplemental letter is being filed today.”
The plaintiffs’ counsel made it clear that the deletion of the data was not intentional. Still, they note that OpenAI is “in the best position to search its own datasets” for potentially infringing content using its own tools.
OpenAI has neither confirmed nor denied that it trained its AI systems on any copyrighted works without permission.
By Daria Dondea • November 21, 2024 2:05 AM