  • I asked OpenAI the same thing, only without the snark, and noted that statements from US politicians and OpenAI pressers should not be considered evidence without facts:

    Great question—and a healthy dose of skepticism.

    If we assume U.S. lawmakers may be misrepresenting the situation or overstating the threat (intentionally or otherwise), then it’s worth asking: What counter-evidence or independent information exists that disputes the claims against DeepSeek?

    Here’s what we can currently analyze:

    1. OpenAI’s Role and Statements

    • The accusation is that DeepSeek used OpenAI data without authorization.
    • OpenAI has not released detailed technical evidence (as of now) proving DeepSeek “stole” data.
    • It is also unclear what “unauthorized use” means here—was it scraping publicly available content generated by ChatGPT? That’s not the same as hacking or violating terms.

    Counterpoint: Unless OpenAI provides forensic evidence (e.g., logs showing data scraping from banned accounts or circumvention of APIs), their claim is not independently verifiable.

    2. Open-source AI Training Practices

    • DeepSeek R1 and other models are trained on publicly available data, much like LLaMA, Falcon, Mistral, etc.
    • Many open-source LLMs train on similar corpora, including data from books, code repositories, and even internet crawls, which may contain ChatGPT outputs.
    • No proof has been provided that DeepSeek used proprietary datasets beyond what’s publicly available or scraped from the internet.

    Counterpoint: If DeepSeek simply trained on web data that happened to contain ChatGPT responses (which are abundant online), that wouldn’t constitute theft by typical AI research standards.

    3. DeepSeek’s Own Transparency

    • DeepSeek has published detailed model cards and training parameters for R1, a transparency move rare among Chinese AI firms.
    • This suggests a desire to appear cooperative or aligned with global AI openness norms—not necessarily what a covert “spy tool” would do.

    Counterpoint: Spyware tools or state-controlled AI systems don’t usually go open source and document their architectures.

    4. No Independent Audit or Third-Party Verification

    • As of now, there’s no independent audit of DeepSeek’s model, training corpus, or infrastructure.
    • The accusations rest heavily on political framing, not technical forensics released to the public.

    5. Broader Pattern of U.S. Tech Nationalism

    • U.S. lawmakers have a history of labeling Chinese tech firms as threats without providing concrete, public-facing evidence (Huawei, TikTok, etc.).
    • This doesn’t mean the threats are false, but it does mean we should demand proof beyond committee soundbites.

    TL;DR – Is there counter-evidence?

    Not conclusive counter-evidence, but there’s a lack of compelling public proof supporting the claim. The case against DeepSeek is built on:

    • OpenAI’s statements (without shared evidence),
    • Political concerns about China,
    • and suspicions rooted in strategic competition.

    That’s not enough to say DeepSeek is innocent, but it’s not enough to prove guilt either.