It's Not Just Data.

Researchers scraped 2 billion Discord messages from public servers without consent. Here’s why that matters, and why “anonymized” data isn’t harmless.

I'm sure you've seen by now that a group of researchers scraped over two billion Discord messages from thousands of public servers. They stripped away usernames and IDs, called it “anonymized,” and published the entire thing online as a dataset for anyone to download and use.

They didn’t ask permission from the people who wrote the messages. They didn’t notify the communities. They didn’t consider whether the people in those conversations ever expected their words to be stored forever, searchable, and stripped of their original context.

They said it was fine because the servers were public. They said it was anonymous. They said there was nothing to worry about.

“If you have nothing to hide, you have nothing to fear.”

That statement sounds reasonable to many. But it comes from a narrow, sheltered view of how people live and speak and create. It imagines that the only thing worth protecting is criminal behavior. It imagines that people are static, one-dimensional beings who should be comfortable with everything they’ve ever said being preserved, indexed, and analyzed by strangers.

To be okay with this kind of surveillance is a failure of imagination and empathy.

Empathy is the ability to recognize that someone else’s situation might be different from yours. Imagination is the ability to envision the consequences of your actions on people you don’t know. When you claim “it was public” as justification, you’re not just dodging responsibility; you’re revealing a narrow worldview where expression is reduced to raw material. You can’t, or won’t, imagine that these were real people, not just text blobs in a dataset.

You prop up a broader pattern in tech and data science: a fixation on what can be done, with barely a glance toward what should be done. The capacity to scrape, store, and analyze at scale has outpaced the ethical frameworks needed to govern it. And instead of building those frameworks, the field retreats into technical excuses: “It’s anonymized. It’s legal. It’s public.”

But legality isn't morality. Anonymity isn't safety. And public doesn’t mean invited.

When you say something in public, you are still speaking within a context. A conversation in a Discord server is like talking in a public park. You know someone might overhear you. That’s part of the risk you accept. But what you don’t expect is for every park you’ve ever been in to be wired with microphones. You don’t expect someone to record every word you’ve ever said, put it in a file, and hand it out to anyone who asks.

That’s the difference. That’s where this crosses the line.

We used to be protected by friction. If someone wanted to know what you said, they had to be there. They had to listen. They had to care. That effort, the act of witnessing, was part of the social contract. You could speak freely because you trusted the weight of the moment would carry your meaning.

Even if names are stripped away, the harm remains. A dataset of “anonymous” messages still contains people’s voices. It still holds their creativity, their pain, their uncertainty, and their process. People go to Discord to build stories, test jokes, ask for help, figure out who they are. It is a place for trial and error, for play and thought.

Now imagine someone looks at that database and finds a beautiful paragraph buried in the stream of chatter. A joke. A poem. A piece of dialogue from a story-in-progress. It was posted months or years ago by someone asking for feedback or sharing something unfinished.

The content survives, but the name doesn’t. It was anonymized to “protect” the speaker. But in doing so, it also erased any hope of recognition. There’s no way to credit the original writer. There’s no way to ask permission. The work floats free, stripped of ownership, and becomes fair game for anyone to reuse or claim.

When people share their thoughts, feelings, and creative endeavors in a shared space, even a public one, there's an implicit understanding of who they are in that moment and why they are sharing.

And what’s the alternative? To leave the names in and expose people to risk? To make it easier to trace personal expression back to real identities? There is no clean answer. Anonymizing protects people, but it also erases them. Leaving attribution preserves ownership, but it opens the door to harm.

You can’t reconcile those two goals. Not when the data was never meant to be extracted in the first place. The burden of ethical responsibility lies with those who collect and use data, not with those who generate it. The capacity to collect data should not dictate its collection.

We don’t share our thoughts in a public space because we want them preserved forever. We share them because we believe that space and time provide their own kind of boundaries.

Anonymizing isn’t foolproof either. Strip away the username and someone can still be recognized by the way they speak, the stories they tell, the personal details they mention without thinking twice. All it takes is one sentence that lines up with a blog post, a tweet, a photo, a memory.
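
To make that concrete, here is a deliberately simple sketch of how that linking works. Everything in it is invented: the messages, the accounts, the matching rule. Real re-identification attacks use stylometry, timestamps, and fuzzy matching rather than exact phrase overlap, but the principle is the same: one distinctive sentence is enough to connect an “anonymous” message to a named account.

```python
# A toy sketch of re-identification by text overlap.
# All data here is hypothetical; real linkage attacks are
# far more sophisticated than exact phrase matching.

# Sentences from an "anonymized" scrape: no usernames attached.
anonymized_messages = [
    "honestly the worst part was moving to Tromsø in January",
    "anyone know a good brush pen for inking?",
]

# Public posts that DO carry identities (blogs, tweets, forums).
public_posts = {
    "@jamie_draws": "Moving to Tromsø in January was honestly the worst part.",
    "@someone_else": "January releases are always rough.",
}

def shared_phrases(a: str, b: str, n: int = 4) -> set[str]:
    """Return the word n-grams two texts have in common."""
    def ngrams(text: str) -> set[str]:
        words = text.lower().replace(",", "").replace(".", "").split()
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
    return ngrams(a) & ngrams(b)

# One overlapping phrase links the "anonymous" message to a name.
for message in anonymized_messages:
    for handle, post in public_posts.items():
        overlap = shared_phrases(message, post)
        if overlap:
            print(f"{message!r} likely written by {handle}: {overlap}")
```

Run it and the first message resolves to @jamie_draws, because three four-word phrases line up with a public post. No username in the dataset was needed.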

And once someone connects that dot, the rest follows. A private struggle posted in what felt like a safe server becomes a screenshot in a group chat. A frustrated rant surfaces during a background check for a job. A joke made in 2018 resurfaces in 2025 with no explanation, no tone, no context: just cold, isolated words.

What once made sense in the flow of a conversation now sits there like a trap, waiting to be misread. You don't get to explain yourself. You don’t get to delete it. You don’t even know it’s happening until someone else decides it matters. And at that point, the damage is already done.

If you think it’s fine because it’s public, then you’ve mistaken accessibility for consent. If you think people are overreacting, then you’ve never had to fear being misquoted, misunderstood, or re-identified. If you think anonymizing words makes them harmless, then you’ve never created something and watched it be stolen.

A true public forum is built for broadcast. You speak knowing the world might listen. But a shared space, even if anyone can access it, is not the same. It runs on friction. To overhear requires intent. To collect requires effort. That friction creates safety. It’s what lets people speak freely, in the moment. Scraping removes it. It turns presence into surveillance. It turns speech into data. Visibility is not consent. Availability is not permission.
