I've been messing around with Claude AI for generating programming code. It has been quite useful. I wasn't looking for complete programs, just examples of certain functions I wanted to use. However, Claude generates fully functional programs as part of its output, so what it generates can be run as is. In my case, when I asked for JavaScript code to search and display lunr.js index entries, it included all of the JavaScript, HTML, and CSS styling necessary to run a complete web page.
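For what it's worth, the sort of snippet I was actually after looks something like this. This is a minimal sketch of my own, not Claude's output, assuming lunr is already loaded on the page and that a `<ul id="results">` element exists to render into; the document fields and ids are my own invention.

```javascript
// Sample documents to index (hypothetical content and ids).
const documents = [
  { id: 'post-1', title: 'Whose code is it anyway?', body: 'Claude, copyright, and generated code.' },
  { id: 'post-2', title: 'Indexing with lunr', body: 'Building a client-side search index.' }
]

// Build the index (lunr 2.x style).
const idx = lunr(function () {
  this.ref('id')
  this.field('title')
  this.field('body')
  documents.forEach(function (doc) { this.add(doc) }, this)
})

// Search the index and render matches into the results list.
function renderResults(query) {
  const list = document.getElementById('results')
  list.innerHTML = ''
  idx.search(query).forEach(function (result) {
    const doc = documents.find(function (d) { return d.id === result.ref })
    const item = document.createElement('li')
    item.textContent = doc.title + ' (score ' + result.score.toFixed(2) + ')'
    list.appendChild(item)
  })
}

renderResults('copyright')
```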
Which got me to thinking. Claude does not actually "write" any code; it composes code that looks like code it has seen before. So whose code is it, really? I mean, I'm not going to, but if I did take code generated by Claude and publish it as part of a product, what is to stop someone coming along in a year or two and saying, "Hey, that code at line such-and-such is from my product, which is copyrighted. I'm going to sue."
Now my guess is that people who use Claude — and Anthropic, the people who produce it — assume the code Claude consumed in building its language model has been chopped up and mutated so much that no recognizable "chunks" remain. But what if they are wrong? And, in particular, what about all the very repetitive parts, like CSS styling and class naming? That code is likely to be the same in every example. What distant website does it come from, and how recognizable is it?
Just as a quick example, early AI-generated images often had a weird gray bar across the bottom right, clearly a ghost of the Getty Images watermark. One assumes Getty Images would not look kindly on my trying to publish and copyright such an image.
Even if the code is recognizable, Anthropic's terms and conditions make no claim of "non-infringement" for the resulting output. In other words, you're on your own, legally speaking, if you use the code Claude generates.
Which led to another thought: even if there is no way to identify the original code in today's output, doesn't this open up a potentially lucrative avenue for unscrupulous individuals to copyright what Claude generates today and then sue future users when they replicate the same or similar code? Say I ask Claude to generate code for common programming tasks, such as creating user accounts or validating phone numbers, and then use that code in a trivial application that I copyright. Two years from now, I go searching for other programmers who used Claude for the same tasks, probably getting the same or similar code, and sue them for copyright infringement. There would seem to be no way to prove Claude didn't borrow the code from me over those two years...
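To make the point concrete: a prompt like "write a JavaScript function that validates a US phone number" is likely to produce something very close to the following in just about anyone's session. This is my own hypothetical sketch of that kind of boilerplate, not actual Claude output.

```javascript
// A hypothetical sketch of the boilerplate such a prompt tends to produce;
// near-identical versions could plausibly come out of many different sessions.
function isValidUSPhoneNumber(input) {
  // Strip spaces, parentheses, dots, and dashes, then check for ten digits,
  // optionally preceded by a country code of 1.
  const digits = input.replace(/[\s().-]/g, '')
  return /^1?\d{10}$/.test(digits)
}

console.log(isValidUSPhoneNumber('(555) 867-5309'))  // true
console.log(isValidUSPhoneNumber('555-867-530'))     // false
```

If hundreds of people ask for the same utility, the outputs will cluster around the same few regexes and structures, which is exactly what makes the question of who copied whom so murky.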
Not pretty. Perhaps not practical, or likely to stand up in court. Everyone is talking about the question of whose code... or picture... or novel... it is that AI generates. But no one seems to really expect an answer, as if it were an unsolvable puzzle. And in the meantime, a dark cloud of legal risk hangs not over the technology but over its users. At some point in the not-too-distant future, I expect the legal teams at larger corporations to start cracking down on the use of AI coding tools as yet another Trojan horse for legal liability.