Copyright | A lawsuit targets the way AI is designed

In late June, Microsoft introduced a new type of artificial intelligence (AI) technology capable of generating its own computer code.


Called Copilot, the tool was designed to speed up the work of professional programmers. As they typed, it suggested ready-made blocks of computer code that they could instantly add to their own.
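For illustration, here is a hypothetical sketch of the kind of suggestion such a tool can offer; it shows the shape of the interaction and is not an actual transcript of Copilot’s output. The programmer types only a function signature and a short description, and the assistant proposes the body:

```python
# Hypothetical illustration of an AI code suggestion; not actual Copilot output.

# The programmer types only the signature and the docstring...
def is_palindrome(text: str) -> bool:
    """Return True if `text` reads the same forwards and backwards."""
    # ...and the assistant proposes the rest, which the programmer
    # can accept with a single keystroke or simply ignore.
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    return cleaned == cleaned[::-1]
```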

Many programmers loved the new tool or were at least intrigued by it. But Matthew Butterick, a Los Angeles-based programmer, designer, writer, and lawyer, wasn’t one of them. This month, he and a team of other attorneys filed a lawsuit seeking class-action status against Microsoft and the other high-profile companies that designed and deployed Copilot.

A first challenge

Like many cutting-edge AI technologies, Copilot developed its skills by analyzing vast amounts of data. In this case, it relied on billions of lines of computer code posted on the Internet. Butterick, 52, likens this process to piracy, because the system does not acknowledge its debt to existing work. In his lawsuit, he claims that Microsoft and its collaborators violated the legal rights of millions of programmers who spent years writing the original code.

It would be the first legal challenge to a design technique called “AI training,” a way of building artificial intelligence that is poised to reshape the tech industry. In recent years, many artists, writers, pundits, and privacy advocates have complained that companies are training their AI systems on data that does not belong to them.

The lawsuit has echoes in the history of the technology industry. In the 1990s and 2000s, Microsoft fought the rise of free and open-source software, seeing it as an existential threat to the company’s future. As that software grew in importance, Microsoft embraced it and even acquired GitHub, the platform where open-source developers build and store their code.

Nearly every new generation of technology, even online search engines, has faced similar legal challenges. Often, “there’s no law or case law that covers them,” said Bradley J. Hulbert, an intellectual property attorney who specializes in this increasingly important area of law.

The complaint is part of a wave of concern about AI. Artists, writers, composers, and other creators increasingly worry that companies and researchers are using their work to create new technologies without their consent and without compensation. That work powers a wide variety of AI-based tools, including art generators, speech recognition systems like Siri and Alexa, and even self-driving cars.

OpenAI, at the forefront

Copilot is based on technology developed by OpenAI, an artificial intelligence lab in San Francisco backed by US$1 billion in funding from Microsoft. OpenAI is at the forefront of an increasingly widespread effort to train AI technologies on digital data.

After Microsoft and GitHub unveiled Copilot, GitHub’s chief executive, Nat Friedman, tweeted that using existing code to train the system constituted “fair use” of the material under copyright law, an argument often made by the companies and researchers who build such systems. But no court case has yet tested this argument.

“The ambitions of Microsoft and OpenAI go far beyond GitHub and Copilot,” Butterick said in an interview. “They want to train on any data, anywhere, for free, without consent, forever.”

In 2020, OpenAI introduced a system called GPT-3. The researchers trained it on massive amounts of digital text, including thousands of books, Wikipedia articles, chat logs, and other data posted on the Internet.

By detecting patterns in all of that text, the system learned to predict the next word in a sequence. When someone typed a few words into this “large language model,” it could complete the thought with entire paragraphs of text. In this way, the system could write its own Twitter messages, speeches, poems, and news articles.
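As a rough illustration of the idea (not OpenAI’s actual method, which relies on neural networks trained on enormous corpora), a toy model can learn next-word prediction simply by counting which word tends to follow which in a sample of text:

```python
# Toy sketch of next-word prediction by counting word pairs (bigrams).
# Real large language models use neural networks and vastly more data.
from collections import Counter, defaultdict

sample_text = (
    "the system learned to predict the next word "
    "the system learned to write code"
)

# Count how often each word follows each other word in the sample.
follows = defaultdict(Counter)
words = sample_text.split()
for current, nxt in zip(words, words[1:]):
    follows[current][nxt] += 1

def predict_next(word: str) -> str:
    """Return the word seen most often after `word` in the sample text."""
    return follows[word].most_common(1)[0][0]

# Starting from "the", repeatedly append the most likely next word.
sequence = ["the"]
for _ in range(4):
    sequence.append(predict_next(sequence[-1]))
print(" ".join(sequence))  # -> "the system learned to predict"
```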

To the surprise of the researchers who built the system, it could even write computer programs, apparently having learned from countless programs posted on the Internet.

So OpenAI went a step further, training a new system, Codex, on a new collection of data that specifically included computer code. The lab said in a research paper detailing the technology that at least some of that code came from GitHub, a popular programming service owned and operated by Microsoft.

This new system became the underlying technology for Copilot, which Microsoft distributed to developers via GitHub. After being tested by a relatively small number of developers for about a year, Copilot was made available to all developers on GitHub in July.

Butterick identifies as a free software developer, part of the community of developers who openly share their code with the world. Over the past 30 years, open source software has powered most of the technologies consumers use every day, including web browsers, smartphones, and mobile apps.

Although free software is designed to be shared freely among coders and companies, that sharing is governed by licenses designed to ensure it is used in ways that benefit the wider programming community. Butterick believes that Copilot violated those licenses and that, as it improves, it will render free software coders obsolete.

Beginning of the legal process

After months of complaining publicly about the issue, he filed the lawsuit with a handful of other attorneys. The suit is still in its early stages, and the court has not yet granted it class-action status.

To the surprise of many legal experts, Butterick’s complaint does not accuse Microsoft, GitHub, and OpenAI of copyright infringement. It takes a different approach, arguing that the companies violated GitHub’s terms of service and privacy policies, and that they also ran afoul of a federal law that requires companies to display copyright information when they make use of material.

Butterick and another attorney behind the case, Joe Saveri, said the lawsuit could eventually address the copyright issue.

Asked about the lawsuit, a GitHub spokesperson declined to discuss it, but said in an email that the company is “committed to responsible innovation with Copilot from the beginning, and will continue to evolve the product to better serve developers around the world.” Microsoft and OpenAI declined to comment on the lawsuit.

This article was originally published in The New York Times.


