Which AI Bots Are Really Crawling Websites in 2026? Real Data From 1.4 Million Tracked Events

Published by The Laughing Professor

The Difference Between Assumptions and Reality

There is no shortage of articles discussing AI bots, AI crawlers, search engines, indexing systems, and machine learning models.

Most of those articles are based on:

  • Industry reports
  • Surveys
  • Third-party datasets
  • Speculation
  • Vendor marketing

This article is different.

The statistics below come from a single website tracked with my own Traffic Intelligence platform, which collected both server-side and JavaScript-confirmed visitor data over 277 days.

The website being analysed is:

https://laughingprofessor.net

This is not client data.

This is not aggregated data.

This is not purchased data.

This is real traffic collected from one independent website.

Current Traffic Dataset

[Image: AI crawler traffic statistics collected from 1.4 million website events]

Over the tracking period, the website recorded:

Metric                            Total
Total Tracked Events          1,423,640
Server Requests               1,299,330
Unique IP Addresses             117,263
Confirmed Human Page Views       27,441
Bot Requests                    354,416
AI Requests                      79,845
Successful Requests (200)       976,630
4XX Errors                       12,268
5XX Errors                        2,560
Days Active                         277

The Most Important Statistic

Many website owners still assume:

Traffic = Visitors

The data tells a very different story.

Over 1.4 million tracked events were recorded, yet only 27,441 page views (about 1.9% of that total) were confirmed through JavaScript beacon tracking.

This demonstrates something most analytics dashboards hide:

Not every request represents a human visitor.
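To put those proportions in numbers, the totals from the dataset table can simply be divided. The following is a minimal Python sketch; the variable names are illustrative, and it assumes that bot requests are counted as a subset of server requests, which the table does not state explicitly. Tracked events and page views are also not identical units, so the results are best read as rough proportions.

```python
# Totals copied from the dataset table in this article.
total_tracked_events = 1_423_640
server_requests = 1_299_330
confirmed_human_page_views = 27_441
bot_requests = 354_416

# Share of all tracked events that were JavaScript-confirmed human page views.
human_share = confirmed_human_page_views / total_tracked_events

# Assumption: bot requests are a subset of server requests.
bot_share = bot_requests / server_requests

print(f"Confirmed human page views: {human_share:.1%} of tracked events")
print(f"Identified bot requests:    {bot_share:.1%} of server requests")
```

On these figures, confirmed human page views come to just under 2% of tracked events, while identified bot requests account for a little over a quarter of server requests.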

A large percentage of website activity comes from:

  • Search engines
  • AI crawlers
  • Monitoring systems
  • Scrapers
  • Automated scanners
  • Security probes
  • Other bots

Understanding the difference matters.

[Image: Traffic intelligence dashboard showing human, bot and AI traffic analysis]

Which AI Crawlers Are Visiting?

The following AI-related crawlers were identified through server-side tracking.

AI Traffic Breakdown

AI Crawler         Requests   Unique IPs
Amazonbot             1,503          414
ChatGPT                 234          155
Meta AI                 223          137
ClaudeBot               210           31
Timpibot                 89            1
CommonCrawl              85            3
OpenAI                   42           10
PerplexityBot            22            7
Bytespider                5            3
YouBot                    5            2
GPTBot                    5            1
Perplexity                3            3
MistralBot                1            1
Google-Extended           1            1

[Image: Which AI bots are most active?]

The Biggest Surprise

Many people assume OpenAI-related crawlers dominate the web.

Based on the data collected from this website, that was not the case.

Amazonbot generated 1,503 requests, more than six times the 234 recorded for ChatGPT, the next most active identified AI crawler.

This does not necessarily mean Amazon is using more content than other providers.

It simply means that on this website, Amazonbot generated more observable activity than any other AI-related crawler.

That distinction is important.
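One way to see how concentrated that activity is, sketched in Python, is to compute each crawler's share of the listed requests and its average requests per unique IP. The figures are copied from the AI Traffic Breakdown table; the variable names are illustrative. The share is relative to the rows listed in that table only, not to the "AI Requests" total in the dataset table.

```python
# (requests, unique IPs) per crawler, copied from the AI Traffic Breakdown table.
ai_crawlers = {
    "Amazonbot": (1503, 414),
    "ChatGPT": (234, 155),
    "Meta AI": (223, 137),
    "ClaudeBot": (210, 31),
    "Timpibot": (89, 1),
    "CommonCrawl": (85, 3),
    "OpenAI": (42, 10),
    "PerplexityBot": (22, 7),
    "Bytespider": (5, 3),
    "YouBot": (5, 2),
    "GPTBot": (5, 1),
    "Perplexity": (3, 3),
    "MistralBot": (1, 1),
    "Google-Extended": (1, 1),
}

# Total of the requests listed in the table.
listed_total = sum(requests for requests, _ in ai_crawlers.values())

# Average number of requests made from each unique IP, per crawler.
requests_per_ip = {
    name: requests / unique_ips
    for name, (requests, unique_ips) in ai_crawlers.items()
}

for name, (requests, unique_ips) in ai_crawlers.items():
    share = requests / listed_total
    print(f"{name:<16} {requests:>5}  {share:6.1%}  {requests_per_ip[name]:6.1f} req/IP")
```

On these figures, Timpibot's 89 requests all came from a single IP address while Amazonbot's were spread across 414, so a raw request count and a per-IP average can rank the same crawlers quite differently.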

What Can We Learn From AI Bot Behaviour?

The most interesting insight isn’t how many requests a bot makes.

The interesting question is:

Which pages are they visiting?

Every logged request records:

  • URL visited
  • Date and time
  • IP address
  • User agent

Counting those records per page and per crawler gives the frequency of visits, and patterns quickly emerge.

Some pages receive repeated visits.

Others receive a single crawl and are never revisited.

Some content categories appear to attract significantly more AI crawler activity than others.
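As a rough illustration of how such patterns can be surfaced, the sketch below counts requests per crawler and page, then separates revisited pages from pages crawled only once. The records, field names, paths and IP addresses are all hypothetical sample data, not entries from the real dataset, and this is not necessarily how the Traffic Intelligence platform does it.

```python
from collections import Counter

# Hypothetical request records; every value here is made up for illustration.
requests_log = [
    {"url": "/example-page-a", "timestamp": "2026-01-03T10:15:00",
     "ip": "192.0.2.10", "user_agent": "Amazonbot"},
    {"url": "/example-page-a", "timestamp": "2026-01-09T04:02:00",
     "ip": "192.0.2.11", "user_agent": "Amazonbot"},
    {"url": "/example-page-a", "timestamp": "2026-01-20T22:40:00",
     "ip": "192.0.2.10", "user_agent": "Amazonbot"},
    {"url": "/example-page-b", "timestamp": "2026-01-05T08:30:00",
     "ip": "198.51.100.7", "user_agent": "ClaudeBot"},
]

# Count how often each crawler requested each URL.
visits = Counter((r["user_agent"], r["url"]) for r in requests_log)

# Pages a crawler came back to, and pages it requested exactly once.
revisited = {key: count for key, count in visits.items() if count > 1}
crawled_once = sorted(key for key, count in visits.items() if count == 1)

print("Revisited:", revisited)
print("Crawled once:", crawled_once)
```

Keying the counter on both crawler and URL keeps one crawler's repeat visits from being mixed up with several different crawlers each visiting once.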

Repeated Crawling May Indicate

  • Content refresh monitoring
  • Search indexing updates
  • Knowledge graph updates
  • Retrieval system checks
  • Ranking evaluations
  • Content quality reviews

What it does NOT automatically prove is:

  • Citation
  • AI training usage
  • Content inclusion in an AI model

Those conclusions require additional evidence.

However, repeated crawler visits do demonstrate ongoing interest in a page.

Human Visitors vs Bots

One of the biggest misconceptions in website analytics is the belief that every recorded request equals a visitor.

This dataset demonstrates why that assumption is dangerous.

Many requests were generated by:

  • Search engines
  • AI crawlers
  • Monitoring systems
  • Vulnerability scanners
  • Spam bots

Without server-side tracking, much of this activity remains invisible.

Without JavaScript beacon tracking, many analytics systems struggle to separate genuine visitors from automated traffic.

The result is inflated traffic statistics that often tell only part of the story.

Why I Built My Own Traffic Intelligence Platform

The data used in this article comes from my own Traffic Intelligence platform.

Unlike traditional analytics systems, the platform combines:

Server-Side Tracking

Capturing:

  • Bots
  • AI crawlers
  • HTTP headers
  • Status codes
  • Security events
  • IP intelligence

JavaScript Beacon Tracking

Capturing:

  • Human page views
  • Sessions
  • User journeys
  • Real visitor engagement

The combination allows activity to be analysed from both perspectives.
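A simplified sketch of how the two data sources might be combined is shown here in Python. It is not the platform's actual logic: the `classify` function, the token list, the sample records and the choice to match a beacon hit to a request by IP address and URL are all assumptions made for illustration.

```python
# Illustrative subset of crawler tokens that appear in User-Agent strings.
KNOWN_BOT_TOKENS = ("amazonbot", "gptbot", "claudebot", "perplexitybot", "bytespider", "ccbot")


def classify(request, beacon_keys):
    """Label a server-side request as 'bot', 'human', or 'unconfirmed'."""
    user_agent = request["user_agent"].lower()
    if any(token in user_agent for token in KNOWN_BOT_TOKENS):
        return "bot"
    # Assumption: a beacon hit is matched to a request by (ip, url).
    if (request["ip"], request["url"]) in beacon_keys:
        return "human"
    # Seen by the server, but no crawler token and no beacon confirmation.
    return "unconfirmed"


# Hypothetical server-side records and beacon hits.
server_requests = [
    {"ip": "192.0.2.10", "url": "/", "user_agent": "Mozilla/5.0 (compatible; Amazonbot/0.1)"},
    {"ip": "198.51.100.7", "url": "/", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    {"ip": "203.0.113.5", "url": "/", "user_agent": "curl/8.5.0"},
]
beacon_keys = {("198.51.100.7", "/")}

labels = [classify(r, beacon_keys) for r in server_requests]
print(labels)
```

The "unconfirmed" label is the useful part of the design: a request with no crawler token and no beacon confirmation is neither counted as a visitor nor assumed to be a bot.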

Learn More

Traffic Intelligence Platform:

https://laughingprofessor.net/user-tracking/

Subscription Details:

https://laughingprofessor.net/website-developer/traffic-intelligence

Security Insights From Real Traffic

Tracking visitor behaviour also reveals patterns that standard analytics ignore.

Examples identified during the monitoring period included:

  • Fake account registrations
  • Email spam activity
  • Root file probing
  • WordPress login probing
  • Automated scraping attempts

Rather than automatically blocking visitors, activity is logged and reviewed before action is taken.

This creates a historical intelligence database rather than a simple firewall log.
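The log-and-review approach can be sketched in a few lines of Python. The probe paths, labels and field names here are hypothetical examples chosen for illustration, not the platform's actual rule set.

```python
# Illustrative probe signatures: commonly probed paths mapped to a label.
PROBE_PATHS = {
    "/wp-login.php": "wordpress_login_probe",
    "/.env": "root_file_probe",
    "/.git/config": "root_file_probe",
}

# Entries awaiting a human decision; nothing is blocked automatically.
review_queue = []


def log_for_review(request):
    """Queue a suspicious request for review and return its label, or None."""
    label = PROBE_PATHS.get(request["url"])
    if label is not None:
        review_queue.append({**request, "label": label, "status": "pending_review"})
    return label


# Hypothetical requests.
log_for_review({"ip": "203.0.113.5", "url": "/wp-login.php"})
log_for_review({"ip": "203.0.113.5", "url": "/.env"})
log_for_review({"ip": "198.51.100.7", "url": "/"})

print(len(review_queue), "requests queued for review")
```

Because each queued entry keeps the original request fields alongside its label, the queue doubles as a history that can be searched later by IP address or path.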

What Happens Next?

The AI landscape is changing rapidly.

The most valuable information may not be how many AI bots exist.

It may be understanding:

  • Which bots visit your website
  • Which content they repeatedly request
  • How often they return
  • Which pages attract the most attention
  • How AI traffic changes over time

As more data is collected, future reports will compare crawler behaviour month by month to identify emerging trends.
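A month-level comparison of this kind could be produced by grouping events on the year and month of their timestamp. The sketch below assumes ISO 8601 timestamps and uses hypothetical sample events; the field names are illustrative.

```python
from collections import Counter

# Hypothetical crawler events with ISO 8601 timestamps.
events = [
    {"timestamp": "2026-01-03T10:15:00", "bot": "Amazonbot"},
    {"timestamp": "2026-01-20T22:40:00", "bot": "Amazonbot"},
    {"timestamp": "2026-01-05T08:30:00", "bot": "ClaudeBot"},
    {"timestamp": "2026-02-02T11:00:00", "bot": "Amazonbot"},
]

# The first seven characters of an ISO timestamp are the "YYYY-MM" month key.
monthly = Counter((e["timestamp"][:7], e["bot"]) for e in events)

for (month, bot), count in sorted(monthly.items()):
    print(month, bot, count)
```

Sorting the keys puts months in chronological order, since "YYYY-MM" strings sort the same way as the dates they represent.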

Final Thoughts

The conversation around AI crawlers is often driven by assumptions.

The numbers above come from a single website with over 1.4 million tracked events collected over 277 days.

What they show is simple:

AI bots are active.

Bot traffic remains substantial.

Human visitors represent only part of the overall activity occurring on a modern website.

And if you’re only looking at page views, you’re probably missing most of the story.

Data source: First-party tracking data collected from LaughingProfessor.net using the Traffic Intelligence Platform. Statistics reflect the tracking period shown and will continue to evolve as additional data is collected.