Analyzing Website Traffic for Security: How to Protect Your Website and Data

Website Security & Traffic Intelligence

Most website owners monitor analytics for visitors. Very few monitor for threats. Real data from 277 days and 1.4 million tracked events shows what is actually happening on your server — and whether your defenses are working.

June 2026 | The Laughing Professor | 9 min read | First-party data

1.4M Server Requests

354K Bot Requests

83 Active IP Blocks

14,768 Hits — Top Blocked IP

2.1% Confirmed Human Traffic

The Gap Between What You See and What Is Really Happening

If your analytics shows 500 sessions a week, it feels like 500 people visited your website. That is a reasonable assumption. It is also wrong.

Over 277 days of server-side monitoring on LaughingProfessor.net, the Traffic Intelligence platform recorded 1,299,330 server requests. JavaScript beacon tracking confirmed 27,441 of those as genuine human page views. That is approximately 2.1% of all server activity.

The remaining 97.9% was automated — bots, crawlers, scrapers, security scanners, and in some cases, deliberate attacks.
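The percentages follow directly from the two counts quoted above. A minimal sketch of the arithmetic, with illustrative variable names:

```python
# Figures quoted above for the 277-day monitoring window.
server_requests = 1_299_330   # every request recorded by the server
human_page_views = 27_441     # page views confirmed by the JavaScript beacon

human_share = human_page_views / server_requests
automated_share = 1 - human_share

print(f"Human:     {human_share:.1%}")      # Human:     2.1%
print(f"Automated: {automated_share:.1%}")  # Automated: 97.9%
```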

Key Insight

If you are only monitoring page views, you are blind to the vast majority of what is actually hitting your server. Standard analytics tools like Google Analytics are deliberately designed to filter out non-human traffic — which means they are filtering out your threat data too.

Understanding the security picture requires looking at the full server request log, not just the human-facing analytics layer.

What the HTTP Status Code Data Reveals

HTTP status codes are the fastest way to understand the nature of traffic hitting your server. They tell you not just who visited, but what happened when they did.

Status Code	Count	Meaning
200	25,784	OK (page loaded)
301	13,680	Permanent redirect
302	373	Temporary redirect
404	285	Not Found
400	33	Bad Request
502	5	Server error

At first glance, 285 total 404 errors and 33 bad requests sound manageable. But the security relevance is in who generated them and from which paths.
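If you have raw access logs, a status-code breakdown like the one above can be approximated with a short script. The sketch below assumes Apache/Nginx Combined Log Format; the sample lines are invented for illustration, and the regular expression is deliberately simple (it does not handle escaped quotes inside the request field).

```python
import re
from collections import Counter

# Captures the client IP, the quoted request line, and the status code.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" (?P<status>\d{3}) '
)

def parse_line(line):
    """Return (ip, path, status) for one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    parts = match["request"].split()
    # Malformed requests may have no method/path pair; keep the raw text instead.
    path = parts[1] if len(parts) >= 2 else match["request"]
    return match["ip"], path, int(match["status"])

def status_breakdown(lines):
    """Count requests per HTTP status code."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            counts[parsed[2]] += 1
    return counts

# Invented sample lines using documentation-range IP addresses.
sample = [
    '198.51.100.4 - - [10/Mar/2026:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Mar/2026:13:55:37 +0000] "GET /wp-login.php HTTP/1.1" 404 153 "-" "curl/8.0"',
    '203.0.113.7 - - [10/Mar/2026:13:55:38 +0000] "GET /wp-admin/ HTTP/1.1" 404 153 "-" "curl/8.0"',
]
print(status_breakdown(sample))
```

Keeping the IP and path alongside the status code matters here: the count alone says little, while the who and the where carry the security signal.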

The 400 Bad Request — A Bot Fingerprint

A 400 error means the server received a malformed request. Humans almost never generate these through normal browsing. When your logs show 400 errors from the same IP address repeatedly, or targeting paths that do not exist on your site, it is a reliable signal of automated probing.

The Traffic Intelligence data shows 4 unique URLs generating 400 errors from 31 unique IPs. That pattern — multiple IPs generating identical malformed requests — points to coordinated scanning activity rather than a single bad actor.
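One way to surface that pattern is to group 400 responses by requested path and count the distinct IPs behind each one. This is a sketch, not the platform's own method: the `(ip, path, status)` record shape and the `min_ips` threshold are assumptions chosen for illustration.

```python
from collections import defaultdict

def coordinated_400s(records, min_ips=3):
    """Return {path: sorted IPs} for paths that drew 400s from at least `min_ips` distinct IPs."""
    ips_by_path = defaultdict(set)
    for ip, path, status in records:
        if status == 400:
            ips_by_path[path].add(ip)
    return {
        path: sorted(ips)
        for path, ips in ips_by_path.items()
        if len(ips) >= min_ips
    }

# Invented records: three different IPs send the same malformed request.
records = [
    ("203.0.113.7", "/cgi-bin/test", 400),
    ("203.0.113.8", "/cgi-bin/test", 400),
    ("198.51.100.9", "/cgi-bin/test", 400),
    ("192.0.2.4", "/contact", 400),   # a single IP, so not reported
    ("192.0.2.4", "/", 200),
]
print(coordinated_400s(records))
```

A path that appears in the result is a candidate for review, since several unrelated addresses rarely produce the same malformed request by accident.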

The 404 Pattern and WordPress Probing

LaughingProfessor.net does not run WordPress. Yet the server logs show repeated requests to paths like /wp-login.php and /wp-admin/. Every one of these generates a 404 — the file does not exist. But the scanning continues anyway.

Why This Happens

Automated attack scripts operate from pre-compiled lists of known vulnerable paths. They do not check which CMS you are running before probing. They hit every site in their target range with the same list. A 404 response tells them to move on. An unexpected 200 response tells them to escalate. The volume of WordPress probing against non-WordPress sites is a measurement of how industrialized automated attacks have become.
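On a site that does not run WordPress, any request for a WordPress path is a probe by definition, which makes it easy to count. A minimal sketch, assuming records have already been parsed into `(ip, path, status)` tuples; the path list contains only the paths named in this article and would need extending for real use.

```python
from collections import Counter

# WordPress paths mentioned in this article.
WP_PROBE_PATHS = ("/wp-login.php", "/wp-admin", "/xmlrpc.php")

def wp_probe_hits(records):
    """Count 404 responses per IP for WordPress paths on a non-WordPress site."""
    hits = Counter()
    for ip, path, status in records:
        if status == 404 and path.lower().startswith(WP_PROBE_PATHS):
            hits[ip] += 1
    return hits

# Invented records using documentation-range IP addresses.
records = [
    ("203.0.113.7", "/wp-login.php", 404),
    ("203.0.113.7", "/wp-admin/", 404),
    ("198.51.100.4", "/about", 200),
    ("198.51.100.4", "/old-page", 404),   # an ordinary broken link, not a probe
]
print(wp_probe_hits(records).most_common())
```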

Real Blocked IP Data: What the Threats Actually Look Like

The Traffic Intelligence platform currently maintains 83 active IP blocks across 85 total records. The top entries in the block list tell a clear story about the nature of threats facing a mid-sized niche website.

Threat Type	Reason Logged	Source	Hits in Log	Block Type
Page Spam	Spamming /ada-accessibility-widget	Imported list	14,768	Permanent
Page Spam	Spamming /white-label-seo-tools	Imported list	8,714	Permanent
Root Probe	WP root access attempt	Microsoft Azure	1,371	Permanent
Root Probe	Probing root files	Microsoft Azure	1,119	Permanent
Email Abuse	Email spamming	Admin review	858	Permanent

Several things stand out in this data that are worth examining closely.

14,768 Hits From One IP — Before or After Blocking?

The top blocked IP accumulated 14,768 hits in the log, with a last-seen date of February 16 — well before the April 1 block date shown in the system. This is important. That hit count represents the intelligence gathered before the block was applied, not hits that got through a block.

This is the deliberate strategy behind the Traffic Intelligence approach: log first, block when confirmed. The result is a documented evidence trail for every blocked IP, making the block decision auditable and reversible.
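An auditable block record can be as simple as a small data structure that stores the evidence alongside the decision. The sketch below is hypothetical: the field names mirror the table columns above rather than the platform's actual schema, the IP is a placeholder, and the year on both dates is assumed.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class BlockRecord:
    ip: str
    reason: str
    source: str
    hits_in_log: int
    last_seen: date
    blocked_on: date

    def hits_predate_block(self) -> bool:
        """True when the last logged hit came before the block was applied."""
        return self.last_seen < self.blocked_on

record = BlockRecord(
    ip="203.0.113.10",              # placeholder; the real address is not published
    reason="Spamming /ada-accessibility-widget",
    source="Imported list",
    hits_in_log=14_768,
    last_seen=date(2026, 2, 16),    # year assumed
    blocked_on=date(2026, 4, 1),    # year assumed
)

print(record.hits_predate_block())
print((record.blocked_on - record.last_seen).days, "days between last hit and block")
```

Because the record is immutable and carries its own reason and dates, reversing a block later means reviewing a documented decision rather than guessing why a rule exists.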

Microsoft Azure as an Attack Source

Two of the top five blocked IPs originate from Microsoft Azure datacenters. This is not unusual — cloud infrastructure is routinely used to launch automated attacks because it is cheap, scalable, and provides geographic distribution. The challenge this creates for blocking strategy is significant: you cannot block Azure wholesale without potentially blocking legitimate SaaS traffic, CDN requests, or business users on Azure-hosted corporate networks.

The right approach is exactly what is shown here: block the specific IP, document the source as Azure, and monitor whether new Azure IPs exhibit the same behavior.
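In practice that means using a provider's address ranges to label an IP, never to block the range. The sketch below uses Python's standard `ipaddress` module. The ranges and provider names are placeholders from the documentation address space; a real lookup would load the ranges each provider publishes.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation space), NOT real provider ranges.
CLOUD_RANGES = {
    "ExampleCloud A": [ipaddress.ip_network("198.51.100.0/24")],
    "ExampleCloud B": [ipaddress.ip_network("203.0.113.0/24")],
}

def cloud_source(ip):
    """Return the provider label whose range contains `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    for provider, networks in CLOUD_RANGES.items():
        if any(addr in net for net in networks):
            return provider
    return None

def block_entry(ip, reason):
    """Build a single-IP block entry annotated with its source; the range is never blocked."""
    return {"ip": ip, "reason": reason, "source": cloud_source(ip) or "Unknown"}

print(block_entry("203.0.113.25", "Probing root files"))
```

Recording the source on every entry is what makes the follow-up question answerable: whether new addresses from the same provider start showing the same behavior.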

How Effective Is IP Blocking? The Honest Answer

IP blocking is widely recommended as a first line of defense. The reality is more nuanced. Effectiveness depends almost entirely on where in the request chain the block is applied.

CDN / Edge Firewall (e.g. Cloudflare): Highest Effectiveness

Request is blocked before it ever reaches your hosting server. No server resources consumed. Most effective against volumetric attacks and known bad IPs.

Hosting Provider Firewall / WAF: High Effectiveness

Block applied at the server level before the web application processes the request. Effectiveness varies by host, so ask your provider where their firewall sits in the stack.

.htaccess Rules (Apache): Medium Effectiveness

Processed by Apache after the TCP connection is established. On many shared hosts, the server has already accepted the request before .htaccess is read, meaning server resources are consumed regardless of the block outcome.

Application-Level Blocking (PHP / CMS Plugin): Lowest Effectiveness

The request reaches your application entirely before being rejected. All PHP execution, database queries, and server load have already occurred. Stops the response but does not stop the resource cost.

The .htaccess Reality on Shared Hosting

Many shared hosting customers are told that adding Deny from [IP] to .htaccess will block attacks. Technically, this is true: the request is refused. (Deny from is the Apache 2.2 form; on Apache 2.4 the equivalent rule is Require not ip [IP].) But on most shared hosting configurations, the TCP connection, SSL handshake, and initial request parsing have already happened before .htaccess is consulted. The attacker gets a 403 response, but your server already paid the cost of handling the connection. For low-volume targeted blocking it still has value. Against high-volume attacks (like the 14,768 hit example above), it is not enough on its own.

The Case for Logging Before Blocking

Auto-blocking rules — where suspicious behavior triggers an automatic ban — are tempting because they feel proactive. They carry real risks:

Blocking a shared IP can lock out legitimate users on the same network
Cloud infrastructure IPs rotate, so a block becomes stale quickly
False positives are hard to diagnose without a log trail
You lose visibility into attack patterns you have not yet analyzed

The approach used here — log everything, block deliberately, document reasons — builds an intelligence database over time. Patterns that only become visible across weeks or months of data would be lost under an aggressive auto-block configuration.

How Effective Is Your Hosting Provider's Security?

This is the question most hosting providers would prefer you did not ask too directly. The honest answer is: it depends on what tier you are on, and almost nobody tells you explicitly where their firewall sits.

The right questions to ask your provider:

Does your firewall operate before or after requests reach the server?
Is DDoS mitigation included, and at what traffic threshold?
Are malicious IPs blocked at the network edge or at the application layer?
Do you provide access to raw server logs including bot and crawler traffic?
Is mod_security or a WAF (Web Application Firewall) active on my account?

What Good Hosting Security Looks Like

The strongest hosting configurations combine network-level DDoS protection, a WAF that processes rules before the application layer, raw log access for your own monitoring, and the ability to push custom IP blocks to the firewall level rather than relying solely on .htaccess. If your provider cannot answer the questions above, that is itself useful information.

The Gap That First-Party Tracking Fills

Even good hosting provider security does not give you visibility into what is being allowed through. The data in this article — every bot, every probe, every blocked IP with its full hit history — comes from the Traffic Intelligence Platform running independently of whatever the hosting provider does or does not block.

Hosting security and first-party tracking are not alternatives to each other. They operate at different layers and provide different information. You need both.

A Practical Security Monitoring Framework

Based on the data collected across this 277-day period, a practical approach to website security monitoring looks like this:

Step 1: Establish a Baseline

Before you can identify threats, you need to know what normal looks like. Deploy server-side tracking and run it for at least 30 days before drawing conclusions. Document your typical request volume, common user agents, and expected 200/301 distribution.
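A baseline can start as nothing more than the mean and spread of daily request counts. The sketch below is one simple way to do it, not a prescribed method: the three-standard-deviation cutoff (`k=3.0`) is an assumption to tune against your own traffic, and the sample counts are invented.

```python
from statistics import mean, stdev

def baseline(daily_requests):
    """Summarize at least 30 days of daily request counts."""
    if len(daily_requests) < 30:
        raise ValueError("collect at least 30 days before drawing conclusions")
    return {"mean": mean(daily_requests), "stdev": stdev(daily_requests)}

def is_unusual(day_count, base, k=3.0):
    """True when a day's volume exceeds the baseline mean by more than k standard deviations."""
    return day_count > base["mean"] + k * base["stdev"]

# Invented data: 30 days alternating between 4,500 and 5,500 requests.
history = [4500, 5500] * 15
base = baseline(history)

print(is_unusual(5600, base))   # within normal variation
print(is_unusual(9000, base))   # well above the baseline
```

The same approach extends to the other baseline measures mentioned above, such as the share of 200 and 301 responses per day.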

Step 2: Separate the Traffic Layers

Use the combination of server-side logging and JavaScript beacon tracking to cleanly separate human visits from automated requests. This two-layer approach is what makes the 2.1% human traffic figure meaningful — without both layers, you cannot calculate it.
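Conceptually, the separation is a join between two data sets: every server request on one side, beacon confirmations on the other. The sketch below assumes, hypothetically, that the two can be matched on an `(ip, path)` pair; the article does not describe the platform's actual matching key, and a real implementation would likely also match on time.

```python
def split_layers(server_requests, beacon_hits):
    """Split server requests into beacon-confirmed (human) and unconfirmed (automated)."""
    confirmed = set(beacon_hits)
    human, automated = [], []
    for req in server_requests:
        key = (req["ip"], req["path"])
        (human if key in confirmed else automated).append(req)
    return human, automated

# Invented data using documentation-range IP addresses.
server_requests = [
    {"ip": "198.51.100.4", "path": "/"},
    {"ip": "198.51.100.4", "path": "/about"},
    {"ip": "203.0.113.7", "path": "/wp-login.php"},
    {"ip": "203.0.113.8", "path": "/"},
]
beacon_hits = [("198.51.100.4", "/"), ("198.51.100.4", "/about")]

human, automated = split_layers(server_requests, beacon_hits)
print(f"{len(human)} human, {len(automated)} automated "
      f"({len(human) / len(server_requests):.0%} human)")
```

Requests that never fire the beacon are treated as automated here, which is why both layers are needed to compute a human-traffic percentage at all.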

Step 3: Flag and Review, Then Block

Set thresholds for review rather than automatic blocking. An IP generating 50+ requests in an hour targeting non-existent paths deserves a flag. An IP with 500 hits across two weeks hitting the same tool page is worth a permanent block. Document your reasoning in the block record.
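The two example thresholds above can be expressed as a review queue rather than an automatic ban. This is a sketch under assumptions: records are `(ip, timestamp, path, status)` tuples, "non-existent paths" is approximated by 404 responses, and the function only labels IPs for a human to review.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def max_in_window(times, window):
    """Largest number of timestamps that fall within any span of length `window`."""
    times = sorted(times)
    best = start = 0
    for end, t in enumerate(times):
        while t - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best

def review_queue(records, flag_at=50, block_at=500):
    """Return {ip: "flag" | "block-candidate"} for IPs that cross a threshold."""
    misses = defaultdict(list)      # ip -> timestamps of 404 responses
    page_hits = defaultdict(list)   # (ip, path) -> timestamps of all hits
    for ip, ts, path, status in records:
        page_hits[(ip, path)].append(ts)
        if status == 404:
            misses[ip].append(ts)

    verdicts = {}
    for ip, times in misses.items():
        if max_in_window(times, timedelta(hours=1)) >= flag_at:
            verdicts[ip] = "flag"
    # A block candidate overrides a flag for the same IP.
    for (ip, _path), times in page_hits.items():
        if max_in_window(times, timedelta(days=14)) >= block_at:
            verdicts[ip] = "block-candidate"
    return verdicts

# Invented traffic: one scanner, one page spammer, one harmless visitor.
t0 = datetime(2026, 3, 1)
records = (
    [("203.0.113.7", t0 + timedelta(minutes=i), f"/missing-{i}", 404) for i in range(60)]
    + [("198.51.100.9", t0 + timedelta(minutes=30 * i), "/tool-page", 200) for i in range(500)]
    + [("192.0.2.4", t0 + timedelta(hours=i), "/old-page", 404) for i in range(10)]
)
print(review_queue(records))
```

Nothing in this sketch applies a block. The output is a list of candidates, and the reason for each decision still has to be written into the block record by whoever reviews it.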

Step 4: Export and Escalate When Appropriate

The block list in Traffic Intelligence exports directly to .htaccess snippet format, plain IP lists, and AbuseIPDB CSV. For IPs that represent serious or persistent threats, reporting to AbuseIPDB contributes to the shared intelligence network that other site owners rely on.
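For readers maintaining a block list by hand, two of those export formats are easy to approximate. The sketch below is not the platform's exporter: it assumes Apache 2.4 (`Require not ip` from mod_authz_core), validates each address with Python's `ipaddress` module, and leaves out the AbuseIPDB CSV because its column layout is not described here.

```python
import ipaddress

def normalize(ips):
    """Validate, de-duplicate, and sort IP addresses; raises ValueError on bad input."""
    unique = {ipaddress.ip_address(ip) for ip in ips}
    # Sort IPv4 before IPv6, then numerically within each family.
    return [str(ip) for ip in sorted(unique, key=lambda a: (a.version, int(a)))]

def to_plain_list(ips):
    """One IP address per line."""
    return "\n".join(normalize(ips))

def to_htaccess(ips):
    """Render an Apache 2.4 .htaccess snippet that denies the listed IPs."""
    lines = ["<RequireAll>", "    Require all granted"]
    lines += [f"    Require not ip {ip}" for ip in normalize(ips)]
    lines.append("</RequireAll>")
    return "\n".join(lines)

# Placeholder addresses from the documentation range.
blocked = ["203.0.113.7", "198.51.100.9", "203.0.113.7"]
print(to_htaccess(blocked))
```

Validating before writing matters because a malformed entry in .htaccess can take the whole site down with a 500 error, which is a worse outcome than the traffic being blocked.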

Frequently Asked Questions

How do I know if my website is being attacked?

Most website owners only see human page views in Google Analytics — which can represent as little as 2% of actual server activity. To detect attacks, you need server-side tracking that logs every HTTP request, including IP addresses, user agents, and HTTP status codes. Signs of attack include repeated 400-series status codes from the same IP, probing of paths that do not exist on your site (such as /wp-login.php on a non-WordPress site), and sudden spikes in server requests not reflected in your human page views.

Does blocking an IP address in .htaccess actually stop attacks?

It stops the response, but not the cost. .htaccess rules are processed by the web server (Apache) itself, after it has accepted the connection and begun handling the request, so server resources are consumed even when the request is ultimately refused. A firewall-level block at the hosting provider or CDN level is significantly more effective because it stops the request before it ever reaches your server.

Should I automatically block every suspicious IP address?

Reactive auto-blocking rules can cause collateral damage, such as blocking shared IPs used by legitimate users or cloud infrastructure ranges that also serve real visitors. A better approach is to log and monitor suspicious activity first, building an intelligence database. This lets you identify patterns, confirm malicious intent, and make informed blocking decisions. Automatic blocking is reactive; logged intelligence is strategic.

Can I block Microsoft Azure or Amazon AWS IP ranges if attacks come from there?

Technically yes, but blocking entire cloud provider IP ranges is generally inadvisable. Both Azure and AWS also serve legitimate businesses, SaaS tools, CDN nodes, and users working through corporate VPNs. The better approach is to block specific IP addresses with confirmed malicious patterns, not entire infrastructure ranges. Document the reason and the hit count before blocking — this data helps justify and review blocking decisions over time.

What do HTTP 400 Bad Request errors mean for website security?

HTTP 400 errors indicate malformed requests — and in a security context, they almost always indicate bots probing your site. When a 400 error comes from the same IP repeatedly, or when it targets paths that should not exist on your server, it is a strong indicator of automated scanning. Tracking which IPs generate 400 errors alongside the specific paths they requested gives you actionable intelligence for blocking decisions.

Why does my server get WordPress login attempts when I don't use WordPress?

Automated attack scripts do not check what CMS you are running before probing. They hit millions of sites with the same list of known vulnerable paths — /wp-login.php, /wp-admin/, xmlrpc.php — regardless of whether WordPress is installed. These are opportunistic mass-scans. If your server returns a 404 for those paths, they move on. If anything responds, the scanner escalates. Logging these attempts helps you identify persistent sources worth blocking.

How effective is my hosting provider's built-in security?

Hosting provider security varies enormously. Entry-level shared hosting typically offers minimal automated protection, relying on you to manage .htaccess rules manually. Mid-tier managed hosting often includes basic firewall rules and DDoS mitigation. Higher-tier providers and dedicated servers with WAF (Web Application Firewall) configurations offer the most effective protection. The critical question to ask your host: at what layer does your firewall operate — before or after the request reaches the server?

What is the difference between AI crawler traffic and malicious bot traffic?

Legitimate AI crawlers such as ClaudeBot, GPTBot, and ChatGPT-User identify themselves via their user agent string and generally follow robots.txt directives. They are not security threats; they are data collection systems operating predictably. Malicious bots typically spoof or omit user agent strings, ignore robots.txt, probe non-existent paths, and generate high request volumes from distributed IPs. The key difference is intent and behavior: legitimate crawlers read publicly available content in recognizable patterns; malicious bots probe, scrape, spam, or attempt to exploit vulnerabilities.
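A first-pass label can be derived from the user agent string alone. The sketch below is illustrative only: the token list is limited to the crawlers named above (using `ChatGPT-User` as the OpenAI token, which is an assumption about which agent is meant), and because user agents are trivially spoofed, the label should be read alongside behavior rather than trusted on its own.

```python
# User agent tokens for self-declared AI crawlers (illustrative, not exhaustive).
DECLARED_AI_CRAWLERS = ("ClaudeBot", "GPTBot", "ChatGPT-User")

def classify_user_agent(user_agent):
    """Return 'missing', 'declared-ai-crawler', or 'other' for a user agent string."""
    if not user_agent or user_agent.strip() in ("", "-"):
        return "missing"
    ua = user_agent.lower()
    if any(token.lower() in ua for token in DECLARED_AI_CRAWLERS):
        return "declared-ai-crawler"
    return "other"

# Invented user agent strings for illustration.
for ua in [
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "-",
]:
    print(f"{classify_user_agent(ua):20} {ua}")
```

A request labeled "declared-ai-crawler" that also probes non-existent paths is a strong hint that the user agent is spoofed, which is exactly the case the behavioral signals above are meant to catch.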

The Bottom Line

Over 277 days, 83 IPs were blocked, 354,000+ bot requests were logged, and the top attacking IP alone generated nearly 15,000 hits before being permanently blocked. None of that appeared in standard analytics.

Protecting your website starts with seeing it accurately. Standard analytics tools show you one thin slice of your server activity. Server-side tracking with IP intelligence shows you the rest — which is where the threats live.

The data in this article was collected using the Traffic Intelligence Platform. All statistics reflect first-party tracking on LaughingProfessor.net across the 277-day period referenced in this series.
