The fine print under research and development now reads: “Google uses data to improve our services and develop new products, features and technologies that benefit our users and the public. For example, we use public data to help train Google’s AI models and build products and features such as Google Translate, Bard and Cloud AI capabilities.”
We use publicly available data to help train Google’s AI models and build products and features
Interestingly, Reg staff outside the US could not see the text quoted in the link above. However, this PDF version of Google’s policy states: “We may collect data that is publicly available online or from other public sources to help train Google’s AI models and build products and features such as Google Translate, Bard and Cloud AI capabilities.”
The changes define Google’s scope for AI training. Previously, the policy only mentioned “language models” and referred to Google Translate. But the wording has been altered to cover “AI models” and to include Bard and other systems built as applications on its cloud platform.
A Google spokesperson told The Register that the update has not fundamentally changed the way it trains its AI models.
Some people are unhappy not only that their own content is being used to build machine learning systems that replicate their work, and may therefore endanger their livelihoods, but also that the models’ output flies close to copyright or license violation by reproducing this training data unchanged.
AI developers may argue that their efforts fall under fair use, and that what the models output is a new form of work and not actually a copy of the original training data. It is a hotly debated issue.
For example, Stability AI has been sued by Getty Images for scraping and using millions of images from its stock photo website to train text-to-image tools. Meanwhile, OpenAI and its backer Microsoft have also been hit with several lawsuits, accusing them of improperly scraping “300 billion words from the Internet, books, articles, websites and texts – including personal information obtained without consent”, and of slurping source code from public repositories to create the AI pair-programming tool GitHub Copilot.
A Google representative declined to clarify whether the ads and search giant would scrape public copyrighted or licensed material, or social media posts, to train its systems.
Now that people are better informed about how AI models are trained, some internet businesses have started charging developers for access to their data. Stack Overflow, Reddit, and Twitter, for example, this year introduced charges or new rules for accessing their content through APIs. Other sites like Shutterstock and Getty have chosen to license their images to AI model builders, and have partnered with the likes of Meta and Nvidia. ®