Yeah, that’s basically it.
But I think what’s getting overlooked in this conversation is that it probably doesn’t matter whether it’s AI or not. Either new content is derivative or it isn’t. That’s true whether you wrote it or an AI wrote it.
One of the cofounders of partizle.com, a Lemmy instance primarily for nerds and techies.
Into Python, travel, computers, craft beer, whatever
Yeah, that’s basically it.
But I think what’s getting overlooked in this conversation is that it probably doesn’t matter whether it’s AI or not. Either new content is derivative or it isn’t. That’s true whether you wrote it or an AI wrote it.
If I created a web app that took samples from songs created by Metallica, Britney Spears, Backstreet Boys, Snoop Dogg, Slayer, Eminem, Mozart, Beethoven, and hundreds of other different musicians, and allowed users to mix all these samples together into new songs, without getting a license to use these samples, the RIAA would sue the pants off of me faster than you could say “unlicensed reproduction.”
The RIAA is indeed a litigious organization, and they tend to use their phalanx of lawyers to extract anyone who does anything creative or new into submission.
But sampling is generally considered fair use.
And if the algorithm you used actually listened to tens of thousands of hours of music, and fed existing patterns into a system that creates new patterns, well, you’d be doing the same thing anyone who goes from listening to music to writing music does. The first song ever written by humans was probably plagiarized from a bird.
It wouldn’t matter, because derivative works require permission. But I don’t think anyone’s really made a compelling case that OpenAI is actually making directly derivative work.
The stronger argument is that LLM’s are making transformational work, which is normally fair use, but should still require some form of compensation given the scale of it.
Her lawsuit doesn’t say that. It says,
when ChatGPT is prompted, ChatGPT generates summaries of Plaintiffs’ copyrighted works—something only possible if ChatGPT was trained on Plaintiffs’ copyrighted works
That’s an absurd claim. ChatGPT has surely read hundreds, perhaps thousands of reviews of her book. It can summarize it just like I can summarize Othello, even though I’ve never seen the play.
I haven’t been able to reproduce that, and at least so far, I haven’t seen any very compelling screenshots of it that actually match. Usually it just generates text, but that text doesn’t actually match.
If you say “AI read my book and output a similar story, you owe me money” then how is that different from “Joe read my book and wrote a similar story, you owe me money.”
You’re bounded by the limits of your flesh. AI is not. The $12 you spent buying a book at Barns & Noble was based on the economy of scarcity that your human abilities constrain you to.
It’s hard to say that the value proposition is the same for human vs AI.
A better comparison would probably be sampling. Sampling is fair use in most of the world, though there are mixed judgments. I think most reasonable people would consider the output of ChatGPT to be transformative use, which is considered fair use.
No, it isn’t. There are enumerated rights a copyright grants the holder a monopoly over. They are reproduction, derivative works, public performances, public displays, distribution, and digital transmission.
Commercial vs non-commercial has nothing to do with it, nor does field of endeavor. And aside from the granted monopoly, no other rights are granted. A copyright does not let you decide how your work is used once sold.
I don’t know where you guys get these ideas.
The published summary is open to fair use by web crawlers. That was settled in Perfect 10 v Amazon.
Derivative and transformative are quite different though.
I very much agree.
The thing is, copyright isn’t really well-suited to the task, because copyright concerns itself with who gets to, well, make copies. Training an AI model isn’t really making a copy of that work. It’s transformative.
Should there be some kind of new model of renumeration for creators? Probably. But it should be a compulsory licensing model.
If I gave a worker a pirated link to several books and scientific papers in the field, and asked them to synthesize an overview/summary of what they read and publish it, I’d get my ass sued. I have to buy the books and the scientific papers.
Well, if OpenAI knowingly used pirated work, that’s one thing. It seems pretty unlikely and certainly hasn’t been proven anywhere.
Of course, they could have done so unknowingly. For example, if John C Pirate published the transcripts of every movie since 1980 on his website, and OpenAI merely crawled his website (in the same way Google does), it’s hard to make the case that they’re really at fault any more than Google would be.
Yes. I do. And I’m right.
There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.
That’s part of the allegation, but it’s unsubstantiated. It isn’t entirely coherent.
these companies who have been using copyrighted material - without compensating the content creators - to train their AIs.
That wouldn’t be copyright infringement.
It isn’t infringement to use a copyrighted work for whatever purpose you please. What’s infringement is reproducing it.
You’re getting lost in the weeds here and completely misunderstanding both copyright law and the technology used here.
First of all, copyright law does not care about the algorithms used and how well they map what a human mind does. That’s irrelevant. There’s nothing in particular about copyright that applies only to humans but not to machines. Either a work is transformative or it isn’t. Either it’s derivative of it isn’t.
What AI is doing is incorporating individual works into a much, much larger corpus of writing style and idioms. If a LLM sees an idiom used a handful of times, it might start using it where the context fits. If a human sees an idiom used a handful of times, they might do the same. That’s true regardless of algorithm and there’s certainly nothing in copyright or common sense that separates one from another. If I read enough Hunter S Thompson, I might start writing like him. If you feed an LLM enough of the same, it might too.
Where copyright comes into play is in whether the new work produced is derivative or transformative. If an entity writes and publishes a sequel to The Road, Cormac McCarthy’s estate is owed some money. If an entity writes and publishes something vaguely (or even directly) inspired by McCarthy’s writing, no money is owed. How that work came to be (algorithms or human flesh) is completely immaterial.
So it’s really, really hard to make the case that there’s any direct copyright infringement here. Absorbing material and incorporating it into future works is what the act of reading is.
The problem is that as a consumer, if I buy a book for $12, I’m fairly limited in how much use I can get out of it. I can only buy and read so many books in my lifetime, and I can only produce so much content. The same is not true for an LLM, so there is a case that Congress should charge them differently for using copyrighted works, but the idea that OpenAI should have to go to each author and negotiate each book would really just shut the whole project down. (And no, it wouldn’t be directly negotiated with publishers, as authors often retain the rights to deny or approve licensure).
Okay, given that AI models need to look over hundreds of thousands if not millions of documents to get to a decent level of usefulness, how much should the author of each individual work get paid out?
Congress has been here before. In the early days of radio, DJs were infringing on recording copyrights by playing music on the air. Congress knew it wasn’t feasible to require every song be explicitly licensed for radio reproduction, so they created a compulsory license system where creators are required to license their songs for radio distribution. They do get paid for each play, but at a rate set by the government, not negotiated directly.
Another issue, who decides which works are more valuable, or how? Is a Shel Silverstein book worth less than a Mark Twain novel because it contains less words? If I self publish a book, is it worth as much as Mark Twains? Sure his is more popular but maybe mine is longer and contains more content, whats my payout in this scenario?
I’d say no one. Just like Taylor Swift gets the same payment as your garage band per play, a compulsory licensing model doesn’t care who you are.
You meet them online, but they’re a vocal minority. Especially when a smaller phone means a smaller battery and worse camera system, two of the consistently top priorities for consumers.