FaceDeer

@FaceDeer@kbin.social

Basically a deer with a human face. Despite probably being some sort of magical nature spirit, his interests are primarily in technology and politics and science fiction.

Spent many years on Reddit and is now exploring new vistas in social media.

FaceDeer ,

I find a ton of uses for quick Python scripts hammered out with Bing Chat to get random stuff done.

It's also super useful when brainstorming and fleshing out stuff for the tabletop roleplaying games I run. Just bounce ideas off it, have it write monologues, etc.

FaceDeer ,

Content warning: this is a rant from a teenager who has strong opinions.

Okay...

However, it holds a monopoly on software.

You don't know what a "monopoly" is.

they could just go “Boop! You’re gone!” and there’s nothing I could do about it other than move forges.

Yeah, nothing you could do about it, other than moving to one of the many other git hosts. Monopoly!

And then after listing off a whole bunch of alternative git hosts...

Centralization is not bad by itself but it’s bad when there’s no other option. There just needs to be ways to contribute to code without having to use Github.

You have plenty of ways to do that, and you know that because you just listed them. Github is not a monopoly.

Also, I don't see the concept of open source mentioned at any point in this rant.

FaceDeer ,

This isn't even a problem with historical awareness, OP knows that Github isn't a monopoly. They listed off a bunch of alternatives in their rant. I'm really not sure what they were even complaining about.

FaceDeer ,

All of those issues would arise if you wanted to migrate an established project to Github as well.

FaceDeer ,

Microsoft has developed many open-source projects. The view of Microsoft as some kind of anti-open-source crusader is 20 years out of date.

FaceDeer ,

You're not "pretty fucked". Just use one of the many other git hosts out there. OP himself lists some of them in his rant.

FaceDeer ,

Oh, that's what you meant. How do you contribute to a project on any git host if that git host won't let you? In what way is GitHub any different from that?

FaceDeer ,

There's quite a series of leaps of logic here.

Because Google (not Microsoft) released a project under the BSD license (an open source license) but "everyone on Lemmy" doesn't think it's open source, therefore a hosting site owned by Microsoft (not Google) is not "open source."

I'm not even sure what is meant by GitHub being "open source." It's a hosting provider, not an actual piece of software. The site itself doesn't have a source license. The individual repositories can have licenses, which can be whatever the user who created the repository sets it to be - including open source licenses. Do you mean GitHub Desktop? Microsoft released that under the MIT license. And you don't need GitHub Desktop to use GitHub anyway.

FaceDeer ,

So we've moved from "GitHub is not open source" to "GitHub has some support software for peripheral features that is not open-source?" I'm definitely failing to see the rant-worthiness of it at this point. It's certainly not monopolistic, platforms like GitLab and Bitbucket also provide these features. And I'd bet that some of them have their own proprietary software to support these things too.

FaceDeer ,

"We" as in the conversation as a whole. You joined an ongoing thread.

FaceDeer ,

You think Microsoft is the only "evil corporation" among these? That's very naive. Any hosting service will deplatform users when they can see a profit to be made from doing so.

FaceDeer ,

Actually, you can do exactly that. Fork them.

You can't force the people who are using Github to follow you, of course. But that's every individual's choice.

FaceDeer ,

Sounds like a purity spiral may be revving up.

FaceDeer ,

Indeed. There are lots of proposals for perfectly portable decentralized user identities, subscriptions that transcend specific instances, and whatnot, but until those things actually arrive that's not the Fediverse we're dealing with. It's a hassle having to switch instances.

FaceDeer ,

Indeed. Firefox already has "sponsored links" and such in the built-in homepage, I simply disable those when I first install it and get on with life.

Big projects like Firefox need big money to support them. If you don't want it to be beholden to Google it needs to find ways to earn some on its own.

FaceDeer ,

It says "opt-out" in the title.

FaceDeer ,

I'm in a campaign (with rotating GMs) where I'm playing a character who is literally an alien infiltrator that has infiltrated the party. Except he's really bad at it and it's obvious he's an alien infiltrator, and because he's bad at it he has no idea that it's obvious. The party's superiors told them to play along for now and try to find out what my character is up to.

It's been about four years now, going on five, and I practically had to spoon-feed them useful tidbits about his mission. I've finally just kidnapped them all and taken them back to my homeworld, we're now running through the adventure where they escape. I had to put an alien diplomat in their cell to monologue information about them.

Still, I've been having fun so I don't mind. Just amusing how much PCs are willing to trust other PCs simply because they're PCs. :)

Sometimes it's different for NPCs, but not always - in another campaign just now the party encountered an Aboleth who told them that he was a good Aboleth that wasn't interested in mind control or manipulating anyone. And by the way, there's this list of quests he's working on and he'd appreciate some help. They jumped right in. He actually is on the level, but come on - Aboleth. If there's anyone to be instantly suspicious of it's someone like that.

FaceDeer ,

Indeed. I frequently use LLMs as brainstorming buddies while working on creative things, like RPG adventure planning and character creation. I want the AI to come up with new and unexpected things that never existed before.

If I have need of the AI to account for "ground truths" then I use things like retrieval-augmented generation or database plugins that inject that stuff into the context.
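
A minimal sketch of the "inject ground truths into the context" idea, under loud assumptions: retrieval here is plain word overlap rather than the embedding search a real retrieval-augmented setup would normally use, the campaign notes are made-up sample data, and `retrieve` and `build_prompt` are hypothetical helper names. No model is called; the sketch only shows how retrieved facts end up in the prompt.

```python
import re


def _words(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, notes, k=2):
    """Return up to k notes sharing the most words with the query.

    Word overlap is a stand-in for real similarity search (an assumption).
    Notes with no overlap at all are never returned.
    """
    query_words = _words(query)
    ranked = sorted(notes, key=lambda n: len(query_words & _words(n)), reverse=True)
    return [n for n in ranked[:k] if query_words & _words(n)]


def build_prompt(query, notes, k=2):
    """Prepend the retrieved notes to the request as a list of known facts."""
    facts = retrieve(query, notes, k)
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Known facts:\n{context}\n\nRequest: {query}"


# Hypothetical campaign notes, purely for illustration.
NOTES = [
    "The aboleth Xul lives under the lake.",
    "The party owes the duke 500 gold.",
    "Tavern is called the Gilded Goat.",
]

if __name__ == "__main__":
    print(build_prompt("Write a monologue for the aboleth under the lake", NOTES, k=1))
```

The resulting string would then be sent to whatever model is in use, so the creative output stays free-form while the facts it must respect travel along in the context.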

FaceDeer ,

Have you not experimented with LLMs? They come up with new things all the time.

Tumblr and Wordpress to Sell Users’ Data to Train AI Tools ( www.404media.co )

this could not be timed worse for Tumblr which is in huge hot water with its userbase already for its CEO breaking his sabbatical to ban a prominent trans user for allegedly threatening him (in a cartoonish manner), and then spending a week personally justifying it increasingly wildly across several platforms. the rumors had...

FaceDeer ,

They're giving you services in exchange for your content.

Does nobody even think about TOS any more? You don't have to read any specific one, just realize the basic universal truth that no website is going to accept your content without some kind of legal protection that allows them to use that content.

FaceDeer ,

Are you serious? We're speaking in the Fediverse right now. It's notable in its difference. Though instances have their own TOSes, so it'd be pretty trivial to set one up to harvest content for AI training as well.

FaceDeer ,

Hardly. They earn money by being paid by their users, but they can earn more money by being paid by their users and also selling their users' data. The goal is more money, so it makes sense for them to do that. It's not crazy.

From the WordPress Terms of Service:

License. By uploading or sharing Content, you grant us a worldwide, royalty-free, transferable, sub-licensable, and non-exclusive license to use, reproduce, modify, distribute, adapt, publicly display, and publish the Content solely for the purpose of providing and improving our products and Services and promoting your website. This license also allows us to make any publicly-posted Content available to select third parties (through Firehose, for example) so that these third parties can analyze and distribute (but not publicly display) the Content through their services.

Emphasis added. They told you what they could do with the content you gave them, you just didn't listen.

I'm sorry if I'm coming across harsh here, but I'm seeing this same error being made over and over again. It's being made frequently right now thanks to the big shakeups happening in social media and the sudden rise of AI, but I've seen it sporadically over the decades that I've been online. So it bears driving home:

  • If you are about to give your content to a website, check their terms of service before you do to see if you're willing to agree to their terms, and if you don't agree to their terms then don't give your content to that website. It's true that some ToS clauses may not be legally enforceable, but are you willing to fight that in court? If you didn't consider your content valuable enough to spend the time checking the ToS when you posted it, that's not WordPress's fault.
  • If you give someone something and they later find a way to make the thing you gave them valuable, it's too late. You gave it to them. They don't owe you a "cut." Check the terms of service.
FaceDeer ,

I wouldn't really trust that promise, frankly. I just checked their terms of service and it has the usual clause:

You must own all rights, title, and interest, including all intellectual property rights, in and to, the User Content you make available on the Services. ASSC requires licenses from you for that User Content to operate the Services. By posting User Content on the Services, you grant ASSC a royalty-free, perpetual, irrevocable, non-exclusive, sublicensable, worldwide license to use, reproduce, distribute, perform, publicly display or prepare derivative works of your User Content.

Which isn't really surprising, it's standard boilerplate for a reason. They don't want to be caught in a situation where they can't function legally any more. They say they won't sell the company or your data, and they might even believe that right now, but who knows what the future might bring? They have the ability to do so if the circumstances arise.

FaceDeer ,

Well, a large part of my frustration stems from the "I've seen this for decades" part - longer than many of the people who are now raising a ruckus have been alive. So IMO it's always been this way and the "social contract we've adapted to" is "the social contract that we imagined existed despite there being ample evidence there was no such thing." I'm so tired of the surprised-pikachu reactions.

Combined with the selfish "wait a minute, the stuff I gave away for fun is worth money to someone else now? I want money too! Or I'm going to destroy my stuff so that nobody gets any value out of it!" reactions, I find myself bizarrely ambivalent and not exactly on the side of the common man vs. the big evil corporations this time.

FaceDeer ,

I'm just venting, really. I know it's not going to make a real difference.

I suppose if you go waaaay back it was different, true. Back in the days of Usenet (as a discussion forum rather than as the piracy filesharing system it's mostly used for nowadays) there weren't these sorts of ToS on it and everything got freely archived in numerous different places because that's just how it was. It was the first Fediverse, I suppose.

The ironic thing is that kbin.social's ToS has no "ownership" stuff in it either. For now, at least, the new ActivityPub-based Fediverse is in the same position that Usenet was - I assume a lot of the other instances also don't bother with much of a ToS and the posts get shared around beyond any one instance's control anyway. So this grumpy old-timer may get to see a bit of the good old days return, for a little while. That'll be nice.

FaceDeer ,

If it makes you feel better, the thing that annoys me most is not so much that this is happening but more how everybody is suddenly surprised by it and complaining about it. The data-harvesting itself doesn't really harm anyone.

FaceDeer ,

A user's data still belongs to the user when they post it on sites like Reddit and such, too. The ToS doesn't take ownership away from them, at least not in any case that I've seen. It just gives the site the license to use it as well.

FaceDeer ,

You could ask a lawyer, I suppose. But the basic gist of this is "we don't know what we might need to do with this data in the future, so we put 'we can do anything with this data' into the ToS so that we know that if the need arises we won't find ourselves unable to do what we need to do with it." Any website that doesn't do this could find itself unable to implement new features or comply with new laws they didn't think of when crafting the original ToS.

At the very minimum a ToS needs to have some way to update and apply retroactively to old data, which ends up being "we can do anything with this data" with extra steps.

FaceDeer ,

No problem. I'm not a lawyer myself, mind you, but I've encountered issues like these enough times over the years that I feel I've got a pretty good layman's grasp. Plus I've actually read some of these ToSes and considered them from the perspective of the company running the site, which I suspect most people arguing about this stuff haven't actually done.

I wish the Fediverse sites running without rigorous ToSes well, of course, but I suspect failing to establish clear rights to use the content people post on them is likely to end up biting them in the long run. At least the bigger ones. Hobby-level websites get away with a lot because they don't have significant money on the line.

FaceDeer ,

It's true, go ahead and read the ToS. It only grants a license to Reddit to use your content. It explicitly says:

You retain any ownership rights you have in Your Content, but you grant Reddit the following license to use that Content:

And then goes on to enumerate what you're licensing them to do with it. There's also a section titled "Changes to these Terms" about how they can change the ToS going forward.

FaceDeer ,

I use quotation marks there because what is often referred to as AI today is not whatsoever what the term once described.

The field of AI has been around for decades and covers a wide range of technologies, many of them much "simpler" than the current crop of generative AI. What is often referred to as AI today is absolutely what the term once described, and still does describe.

What people seem to be conflating is the general term "AI" and the more specific "AGI", or Artificial General Intelligence. AGI is the stuff you see on Star Trek. Nobody is claiming that current LLMs are AGI, though they may be a significant step along the way to that.

I may be sounding nitpicky here, but this is the fundamental issue that the article is complaining about. People are not well educated about what AI actually is and what it's good at. It's good at a huge amount of stuff, it's really revolutionary, but it's not good at everything. It's not the fault of AI when people fail to grasp that, any more than it's the fault of the car when someone gets into it and then is annoyed it won't take them to the Moon.

FaceDeer ,

I didn't say that everything in Star Trek was AGI, just that you can find examples there.

FaceDeer ,

Another more general property that might be worth looking for would be substantially similar posts that get cross-posted to a wide variety of communities in a short period of time. That's a pattern that can have legitimate reasons but it's probably worth raising a flag to draw extra scrutiny.

One idea for making it computationally lightweight but also robust against bots "tweaking" the wording of each post might be to fingerprint each post based on rare word usage. Spam is likely to mention the brand name of whatever product it's hawking, which is probably not going to be a commonly used word. So if a bunch of posts come along that all use the same rare words all at once, that's suspicious. I could also easily see situations where this gives false positives, of course - if some product suddenly does something newsworthy you could see a spew of legitimate posts about it in a variety of communities. But no automated spam checker is perfect.
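
A rough sketch of that rare-word fingerprint, with every name in it hypothetical: `background_counts` is assumed to be a word-frequency table built from ordinary past posts, the `posts` batch is assumed to already be limited to a short time window, and the thresholds (`max_count`, `size`, `min_communities`) are arbitrary illustrative values, not tuned ones.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return WORD_RE.findall(text.lower())


def rare_word_fingerprint(text, background_counts, max_count=1, size=3):
    """Return up to `size` of the rarest words in a post as a frozenset.

    A word counts as rare if it appears at most `max_count` times in the
    background frequency table (an assumed, precomputed dict).
    """
    rare = [w for w in set(tokenize(text)) if background_counts.get(w, 0) <= max_count]
    # Rarest first; ties broken alphabetically so the result is deterministic.
    rare.sort(key=lambda w: (background_counts.get(w, 0), w))
    return frozenset(rare[:size])


def flag_suspicious(posts, background_counts, min_communities=3):
    """Map each rare word to the communities it appeared in, keeping only
    words seen in at least `min_communities` different communities.

    `posts` is an iterable of (community, text) pairs from one time window.
    """
    seen = defaultdict(set)
    for community, text in posts:
        for word in rare_word_fingerprint(text, background_counts):
            seen[word].add(community)
    return {word: comms for word, comms in seen.items() if len(comms) >= min_communities}
```

Because the fingerprint ignores word order and common words, rewording the sales pitch around the brand name does not change it. Anything flagged here is only a candidate for extra scrutiny; a product that is suddenly in the news would trip the same check.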

FaceDeer ,

That's why I was suggesting such a simple approach, it doesn't require AI or machine learning except in the most basic sense. If you want to try applying fancier stuff you could use those basic word-based filters as a first pass to reduce the cost.

Blocking AI crawlers on the fediverse ( fedia.io )

Given how Reddit now makes money by selling its data to AI companies, I was wondering how the situation is for the fediverse. Typically you can block AI crawlers using robots.txt (Verge reported about it recently: https://www.theverge.com/24067997/robots-txt-ai-text-file-web-crawlers-spiders). But this only works per...

FaceDeer , (edited )

We're sick of closed walled-garden monoliths like Reddit! Let's move to an open federated protocol where anyone can participate and the APIs can't be locked down!

...wait, not like that!

Yeah. This is what you signed up for when you joined the Fediverse, the ActivityPub protocol broadcasts your content to any other servers that ask for it. And just generally, that's how the Internet works. You're putting up a public billboard and expecting to be able to control who gets to look at it. That's not going to work. Even robots.txt is just a gentleman's agreement, it's not enforceable.

If you really want to prevent AI from training on your content with any degree of certainty you're probably looking for a private forum of some kind that's run by someone you trust.
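
To illustrate the "gentleman's agreement" point, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `example.social` URL are made up for illustration, and `is_allowed` is a hypothetical helper name; `GPTBot` is used as an example crawler user-agent token.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: ask one AI crawler to stay out, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url.

    This is the check a well-behaved crawler runs on itself before
    fetching. Nothing on the server side enforces the answer.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.social/m/tech/t/1"  # hypothetical URL
    print(is_allowed(ROBOTS_TXT, "GPTBot", url))
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

The whole mechanism lives in the crawler's own code: a scraper that never calls anything like `is_allowed`, or that simply reports a different user agent, receives the same public pages as everyone else.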

FaceDeer ,

I really don't see how it would be physically possible to do that and still allow the content to be publicly seen by other humans.

FaceDeer ,

Well, I hope my answer clarifies it. You can't prevent LLMs from being trained on your public posts.

FaceDeer ,

Yup. There are dumps of Reddit's entire archive of comments and posts available via torrent, I suspect the only reason Reddit's getting paid for that stuff right now is that it's a legal ass-covering that's comparatively cheap. Anyone who's a little daring could use it to train an LLM and if they prep the data well enough it'd be hard to even notice.

FaceDeer ,

And some of those hosts can decide to serve up their content to AI trainers. Some of those hosts can be run by AI trainers, specifically to gather data for training. If one was to try to prevent that then one would be attacking the open nature of the fediverse.

There have been many people raging about their content being used to train AIs without permission or compensation. I'm speaking to those people, not the "fediverse collectively". As you suggest, the fediverse can't say anything collectively.

FaceDeer ,

Yeah, and as far as I'm aware they can respond to you too. I much prefer it over Reddit's approach, which was often used as a "Haha, I get the last word!" button.

FaceDeer ,

If you're talking about Glaze or Nightshade, those techniques are not proven to be particularly effective. Lots of people want them to work but that doesn't make it so.

FaceDeer ,

Why? How does it harm you in any meaningful way?

FaceDeer ,

I draw plenty of benefit from AI tools. There are open source models that anyone can run.

FaceDeer ,

My original question remains unanswered. "It may help someone I don't like because they are richer than me" is a pretty weak concept of "harm."

FaceDeer ,

I was directly addressing all of the points you raised.

You said it concentrates wealth, but open source does the opposite of that - it allows small companies and individuals to earn money using the technology without having to pay for its use.

You said it "harms everyone but the 0.1%." I am benefited by it, not harmed, and I am very much not part of the 0.1%.

FaceDeer ,

No, I said things about AI and open source. I raised open source as part of my counter to your argument that this is "concentrating wealth."

Here, I'll explain in detail what's going on.

In response to an article about Reddit licensing your content to AI trainers, capt_wolf said "it's time to purge your account." Presumably as a way to stop that from happening. I asked why that was a bad thing, specifically how it harmed us in any meaningful way. You came in at that point and suggested:

  • It's a scheme to further concentrate wealth
  • It harms everyone but the 0.1%.

I raised open source as a counter to the "wealth concentration" point, because open source does the opposite - it spreads the wealth to any who want it. It puts these resources into the commons.

I also pointed out that I personally benefit from AI tools, so it does the opposite of harming me. As I am not part of the 0.1%, that's a counter to your second point.

FaceDeer ,

Plenty of for-profit companies use open protocols and don't harm them in the slightest.

Almost any website you visit, for example.

FaceDeer ,

I'd say it's not even really their place to be "examining the whole economic system." Each individual is just a regular Joe who put in their time at a job over their life and would now like to reap the rewards of their effort in retirement. It bothers me when people insult other people simply for being caught up in a systemic issue that's beyond their control.

The solutions for systemic problems need to be systemic as well. If we as a society don't want to see housing move over to an exclusively rent-based system then we'll need to address it through things like zoning changes and other legal reforms. When people oppose those things by voting against them, then we can start to apportion blame around.
