Apple has denied using unethically obtained data to train Apple Intelligence, but it has acknowledged using that data for another project.
On Tuesday, it emerged that an AI research lab called EleutherAI had harvested subtitles from YouTube videos without the creators' express permission. It also gathered data from Wikipedia, the European Parliament, and emails written by Enron staff. The data was then added to a dataset called "the Pile."
EleutherAI says its goal was to lower the barrier to AI development for those outside Big Tech. However, companies such as Nvidia, Salesforce, and Apple have all used the Pile to train their own AI models.
Apple has now responded, saying that although it did use the Pile, the dataset was not used to train Apple Intelligence. Instead, it was used to train the company's open-source OpenELM models, which were released in April.
Apple also confirmed to AppleInsider that OpenELM models do not power any of its AI or machine learning features. According to the company, it created OpenELM as a contribution to the research community.
Apple adds that OpenELM models were never intended for use in Apple Intelligence, and that it has no plans to build any new versions of the model.
Apple has repeatedly said that the data behind its artificial intelligence projects is ethically sourced, and it is known to have paid publishers millions of dollars and to have licensed images from photo library firms.