Unisami AI News

How OpenAI’s bot crushed this seven-person company’s website ‘like a DDoS attack’

January 11, 2025 | by AI

The Battle of Bots: How Triplegangers Survived a Scraping Storm

It was a typical Saturday when Oleksandr Tomchuk, CEO of Triplegangers, discovered that his company’s e-commerce site had been crippled. What looked like a distributed denial-of-service (DDoS) attack turned out, on closer inspection, to be a relentless bot from OpenAI, tirelessly attempting to scrape the company’s massive site, which hosts over 65,000 product pages, each with at least three photographs.

  • OpenAI’s bot flooded the server with tens of thousands of requests.
  • 600 IP addresses were used in the bot’s attempts to download countless images and descriptions.
  • The onslaught was comparable to a DDoS attack.

“Our website is our lifeline,” Tomchuk shared with TechCrunch. For over ten years, his team has painstakingly built the largest collection of “human digital doubles,” providing 3D image files to artists and game developers who need authentic human features. Based in Ukraine, with a U.S. license in Tampa, Triplegangers spells out terms on its site that forbid unauthorized bot scraping. Yet that alone wasn’t enough protection.

“Their crawlers were crushing our site,” Tomchuk emphasized. “It was basically a DDoS attack.”

The episode shows why a properly configured robots.txt file matters for keeping crawlers like OpenAI’s GPTBot at bay. The file tells bots which parts of a site they should not touch. Even when it is set up correctly, though, it can take up to 24 hours for crawlers to recognize the changes, a significant window for unwanted scraping.
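
For reference, the directives involved are short. A minimal robots.txt along these lines (GPTBot is the user-agent token OpenAI documents; the exact paths would depend on the site) asks the crawler to stay away from everything:

    User-agent: GPTBot
    Disallow: /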

Beyond the technical battle, Triplegangers faced financial repercussions due to increased AWS costs from the bot’s activities. The situation underscores a broader issue: compliance with robots.txt is voluntary for AI companies, and not all adhere strictly.

By Wednesday, after repeated attacks, Triplegangers fortified its defenses with an updated robots.txt file and enlisted Cloudflare’s help to block GPTBot along with other bots like Barkrowler and Bytespider. Although this brought temporary relief, Tomchuk remains concerned about the data already scraped and the lack of communication from OpenAI.
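
Triplegangers has not published its updated file, but a robots.txt that names several crawlers follows the same pattern, one group per user agent. This is a sketch, not the company’s actual configuration, and enforcement against bots that ignore the file was left to Cloudflare:

    User-agent: GPTBot
    Disallow: /

    User-agent: Bytespider
    Disallow: /

    User-agent: Barkrowler
    Disallow: /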

The ordeal highlights an unsettling loophole exploited by AI companies: unless businesses proactively update their robots.txt files with each crawler’s specific user-agent tag, their data is treated as fair game. That places the burden squarely on business owners to safeguard their digital assets.

“They should be asking permission, not just scraping data,” Tomchuk asserts.

With AI-driven crawlers causing an 86% increase in non-human traffic in 2024 alone, as reported by DoubleVerify, vigilance is more crucial than ever for online businesses. As Tomchuk advises fellow entrepreneurs, watching server logs closely can surface unwanted bot activity before it wreaks havoc.
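
As a rough illustration of that kind of monitoring (not Triplegangers’ setup; the log path and the combined log format are assumptions), a short script can tally requests by user agent in a web server access log and flag the crawlers named in this story:

    import re
    from collections import Counter

    # Assumptions: an nginx/Apache "combined"-format access log at this path.
    LOG_PATH = "access.log"
    # User-agent substrings worth flagging, per the bots named above.
    AI_BOTS = ("GPTBot", "Bytespider", "Barkrowler")

    # A combined-format line ends with: "referer" "user-agent"
    UA_PATTERN = re.compile(r'"[^"]*" "(?P<ua>[^"]*)"$')

    hits = Counter()      # requests per user agent
    flagged = Counter()   # requests per known AI crawler

    with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = UA_PATTERN.search(line.rstrip())
            if not match:
                continue  # skip lines that don't match the expected format
            ua = match.group("ua")
            hits[ua] += 1
            for bot in AI_BOTS:
                if bot.lower() in ua.lower():
                    flagged[bot] += 1

    print("Top user agents:")
    for ua, count in hits.most_common(10):
        print(f"{count:8d}  {ua}")

    print("\nKnown AI crawler requests:")
    for bot, count in flagged.most_common():
        print(f"{count:8d}  {bot}")

Running it against a day’s log makes a spike from a single crawler easy to spot long before it shows up as an AWS bill.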

As businesses navigate a digital frontier that can feel like a mafia shakedown, where protection from unsolicited data grabs becomes a cost of doing business, the need for clearer regulations and ethical practices in AI data collection grows ever more urgent.

Image Credit: Sanket Mishra on Pexels
