Discord Message-Scraping Service Claims Access to 1.8 Billion Messages

A newly advertised data-scraping service claims to index 1.8 billion Discord messages, 207 million voice sessions, and profiles from 35 million users. Researchers warn the tool could fuel harassment, doxxing, and privacy violations, echoing past scraping campaigns like Spy.Pet.

    A new data-scraping service aimed at Discord users is advertising access to a vast archive of messages, voice sessions, files, and user profiles — and investigators say the numbers are staggering. The service’s forum post and public terms claim the ability to search 1.8 billion Discord messages from 35 million users, plus 207 million voice sessions and records from roughly 6,000 Discord servers. Security researchers and reporting teams warn the product appears designed to enable harassment, doxxing, and other abusive uses of scraped communications.

    The offering surfaced on a popular data-leak forum where cybercriminals trade or sell large datasets. The seller says the service allows subscribers to sift through billions of Discord records and that the index updates in near real time. Cybernews researchers analyzed the advertisement and the service’s public-facing materials but could not independently verify the dataset without paying for access. Their assessment notes that the service draws on at least two revenue streams: customers who pay to read others’ messages, and people who pay to have their own data excluded from the index.

    Claimed Scale, Storage Location, and Legal Posturing

    According to the service’s advertised terms, operations are run from Estonia, an EU member state with strict privacy rules. However, the service’s ToS also states that the stored data resides on servers in the Russian Federation. Cybersecurity researchers interpret that architecture as intentional: run the front-facing operation inside the EU while keeping the actual data where EU law is harder to enforce.

    The service’s claims in summary:

    • 1.8 billion Discord messages indexed.
    • 35 million users covered.
    • 207 million voice sessions archived.
    • 6,000 Discord servers included.

    Investigators also flagged the service’s alleged integrations: the ad claims ties to breached databases and FiveM servers, potentially expanding the scope and usefulness of the index for targeted campaigns.

    Historical Echoes: Spy.Pet and Prior Scraping Campaigns

    The new service follows a familiar pattern. In 2024, a site called Spy.Pet advertised a similarly huge scrape of public Discord messages and related account linkages. Discord banned Spy.Pet-linked accounts that same year and publicly called the scraping of its services a violation of platform rules. Spy.Pet reportedly bundled Steam account data and offered an “enterprise option” for organizations that wanted to use scraped datasets for AI training or other large-scale analysis.

    Cybernews’ researchers said the new offering is “similar to Spy.Pet” but with deeper claimed integrations. As the researchers put it:

    “This is a tool that makes researching people for the sake of harassment or online arguments easier. It could be useful for people with certain intentions, but the same can be done without the service — it would just take much more manual labour.”

    The Spy.Pet episode also showed legal and policy limits: scraping services can violate GDPR rights such as the “right to be forgotten.” Storing scraped data in jurisdictions outside the EU may be an attempt to sidestep those rules, but researchers note that legal exposure remains possible.

    How Scraped Discord Data is Allegedly Used

    The advertised use cases are blunt. Purchasers can search message text, pull voice-session records, and view user profiles — all material that can be weaponized in several ways. Reported exploitation vectors include:

    • Targeted harassment campaigns that leverage private conversations and contextual details.
    • Doxxing and public disclosure of private information.
    • Tailored social engineering and phishing using context from past messages.
    • Training AI models on scraped chat logs for automated trolling or content generation.

    In prior incidents, scrapers have also bundled additional linked-account data, which increases the effectiveness of cross-platform harassment and account takeover attempts.

    Verification Challenges and Live-Update Claims

    Researchers emphasize a verification problem: without subscribing to the service, it is difficult to confirm the advertised dataset’s existence or freshness. The sellers claim the index is updated live — a claim that, if true, implies active scraping processes continually harvesting new content from public or misconfigured private servers. Cybernews said the only way to verify the scale would be to engage with the service, an option investigators declined.

    Independent reports earlier this year documented other large scraping operations. One group claimed to have indexed 348 million messages from nearly 1,000 public Discord servers. Those prior examples show the technical feasibility of large-scale scraping where defenders do not effectively block automated collection.

    Operational Footprint: Estonia Front End, Russia Data Storage

    The service’s public documents and ToS present a two-part posture: a front office in Estonia and backend storage in Russia. That arrangement is notable: Estonia’s regulatory environment would normally hold a service to strict privacy standards, but the backend storage location complicates enforcement. Security researchers interpret this as an attempt to enjoy the legitimacy or apparent compliance of an EU host address while placing data beyond EU reach.

    That architectural choice has implications for legal takedowns, cross-border cooperation, and user recourse. It also raises questions about whether the service can be compelled to remove data or comply with deletion requests under GDPR, even if its front-facing registration is in the EU.

    Platform Response History and Discord’s Enforcement Actions

    Discord has previously acted against scraping services. The platform banned Spy.Pet-linked accounts in 2024 after investigating the service’s scraping behavior. Discord’s policy stance is clear: automated scraping that violates its terms of service is prohibited. The company has said it will act when large-scale scraping is found, but enforcement is resource-intensive and largely reactive, typically coming only after a new service has already appeared.

    Researchers say these scraping operations often continue in new forms because they remain profitable: there is demand from people seeking to harass, research, or otherwise exploit conversations at scale. The new service’s ad indicates a market willing to pay for searchable archives and for opt-out guarantees that exclude certain users from the index for a fee.

    Broader Privacy and Policy Implications

    Beyond immediate harassment risks, the scraping service spotlights larger privacy concerns for platforms that host community conversations. Public messages are crawlable, but private server content and voice sessions are not supposed to be freely indexed. Aggregated chat logs create rich profiles of users’ opinions, social ties, and behaviour patterns — data attractive to abusers and, in some reported cases, to actors seeking data for AI model training.

    Legal frameworks like the GDPR impose rights and obligations on data controllers and processors. The GDPR’s “right to be forgotten” and related rules create avenues for takedown demands, but jurisdictional evasions and the underground marketplace complicate enforcement.

    What Is Known, What Remains Unclear

    Known facts from public reporting and researcher analysis:

    • A service advertised on a data-leak forum claims indexing of 1.8 billion Discord messages.
    • The service advertises voice sessions, user profiles, and records from thousands of servers.
    • Operators claim an Estonian front and Russian-hosted storage.
    • Cybernews and other researchers cannot independently verify the full dataset without subscribing.
    • This service resembles Spy.Pet and earlier scraping operations that Discord acted against.

    Open questions that remain:

    • Whether the full, live dataset exists as advertised.
    • Whether private server data was scraped through misconfiguration, leaked credentials, or platform vulnerabilities.
    • If and how law enforcement or platform operators will compel takedown or identify the operators.

    The advertised Discord message-scraping service represents a worrying iteration of an ongoing abuse model: centralized search on decentralized conversations. The combination of claimed scale, voice-session archives, and cross-platform integrations makes the service a potential force multiplier for harassment and privacy abuse. Researchers and reporters will continue to monitor the situation and watch for signs that the advertised dataset is being used or sold in secondary markets.
