How to prevent your website from crashing


If you’ve ever run an on-sale or product drop and seen that moment where your site reaches traffic capacity and crashes, you know exactly how damaging it can be. It’s not just about lost revenue; crashes damage your brand’s reputation, decrease your customers’ loyalty, and taint any future product drops or on-sales you run.

Website crashes are much more likely to happen at the most critical times, when traffic is at its highest: product drops, season renewals, and ticketing on-sales.

We’ve seen a lot of websites struggle with traffic (solving that is what we do) and we could easily shortcut this article, saying “install a virtual waiting room, it can be ready on your site before tomorrow!” and leave it at that. But it’s important to understand why a website crashes before you start seeking a solution.

1) Identify your site’s weaknesses

Dozens of factors can contribute to a crash, and it typically isn’t a simple case of server capacity. Look for specific bottlenecks in your infrastructure and code. These are the limiting factors that constrain your system’s performance. As we talked about in a recent article on WooCommerce, for some platforms and sites, it doesn’t really matter how big your servers are if your site’s architecture can’t handle the load.

2) Have a backup in place

When analyzing your site’s performance, the most important thing to look out for is single points of failure: components that, if they crash, will take your entire website down with them. Identifying and addressing these critical failure points is essential for building a resilient and crash-resistant website. That doesn’t just mean reinforcing these points as much as possible. It means introducing strategies that eliminate the possibility of your site having any single point of failure.

How to remove single points of failure from your website

  1. When building or improving your site, add in redundant systems and components so that if one fails, there is a backup system to take over. This could be additional web servers, database clusters, load balancers, or network connections.
  2. Configure your critical systems, like your database and caching layers, to run in a high availability (HA) configuration.
    • High Availability setups use techniques like replication and failover to maintain availability when individual components fail.
  3. By using cloud providers like Amazon Web Services (AWS), Google Cloud, and Microsoft Azure, you get access to a range of high availability, redundant services that can help eliminate these single failure points within your site’s infrastructure. This includes things like load balancing, content delivery networks, and managed database services.
  4. If you can, distribute your site’s supporting services across multiple geographic regions or availability zones. This will protect you from localised outages or failures that might only impact a single data center.
  5. Keep monitoring and analyzing your site. As you introduce new features and systems, potential failure points can arise – discovering and resolving these quickly is key to stopping your website from crashing.
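
The redundancy and failover idea in the list above can be illustrated with a minimal sketch. It assumes each backend exposes some kind of health probe; the `Backend` class, the `is_healthy` callable, and the backend names are hypothetical and stand in for whatever your load balancer or database cluster actually provides.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Backend:
    name: str
    # Hypothetical health probe, e.g. a wrapper around an HTTP ping.
    is_healthy: Callable[[], bool]


def pick_backend(backends: Sequence[Backend]) -> Optional[Backend]:
    """Return the first backend whose health check passes, or None if all fail."""
    for backend in backends:
        try:
            if backend.is_healthy():
                return backend
        except Exception:
            # A probe that raises is treated the same as an unhealthy backend.
            continue
    return None


# Usage: the primary is down, so traffic fails over to the replica.
primary = Backend("db-primary", is_healthy=lambda: False)
replica = Backend("db-replica", is_healthy=lambda: True)
chosen = pick_backend([primary, replica])
print(chosen.name if chosen else "no healthy backend")
```

In practice this decision is usually made by a managed load balancer or a database failover mechanism rather than application code, but the logic is the same: no single component should be the only path a request can take.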

3) Stress test your site

This should play into the analysis of your systems and infrastructure, but when it comes to preventing website crashes, running load tests is a crucial step in finding where your limits are.

This doesn’t mean sending millions of bots to your website at a random time on a random weekday; you should think about when your normal traffic will be least affected by a load test. But it’s also important to consider that the value of your test results depends heavily on the accuracy of your user modelling and the relevance of the metrics you track.

It’s not just about traffic hitting your site; it’s about users interacting with your site.

How to run an effective load test

The foundation of any effective load test is a realistic representation of your actual user traffic patterns and behavior. That includes:

  • A full range of user journeys
  • Conversion funnels
  • Peak and off-peak variations

It is a challenge to accurately replicate real-world user activity. However, you will get a much more accurate result by taking the time to research user behavior on your site and building systems that can replicate it. That means going beyond the data showcased in applications like Google Analytics. Peak traffic events have a very different traffic profile from the typical background traffic on your website. Users engaging in highly motivated booking or purchasing behavior during a sale will interact with your site in ways that casual visitors wouldn’t.
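
As a rough illustration of why the traffic profile matters, the sketch below samples user journeys from two different weightings and counts the requests each page would receive. The journeys, paths, and percentages are invented for the example; they are assumptions to be replaced with figures from your own research, and a real load test would replay these journeys with a load testing tool rather than just counting them.

```python
import random
from collections import Counter

# Hypothetical journeys and weights; replace with figures from your own analytics.
JOURNEYS = {
    "browse_only": ["/", "/products"],
    "add_to_cart": ["/", "/products", "/cart"],
    "checkout": ["/", "/products", "/cart", "/checkout"],
}
BACKGROUND_WEIGHTS = {"browse_only": 0.80, "add_to_cart": 0.15, "checkout": 0.05}
ON_SALE_WEIGHTS = {"browse_only": 0.10, "add_to_cart": 0.30, "checkout": 0.60}


def sample_journeys(weights, users, seed=0):
    """Pick one journey per simulated user according to the given weights."""
    rng = random.Random(seed)  # seeded so repeated runs are comparable
    names = list(weights)
    picks = rng.choices(names, weights=[weights[n] for n in names], k=users)
    return [JOURNEYS[name] for name in picks]


def requests_per_path(journeys):
    """Count how many requests each path receives across all journeys."""
    return Counter(path for journey in journeys for path in journey)


for label, weights in [("background", BACKGROUND_WEIGHTS), ("on-sale", ON_SALE_WEIGHTS)]:
    hits = requests_per_path(sample_journeys(weights, users=1000))
    print(label, dict(hits))
```

With the same number of users, the on-sale profile sends far more requests to the cart and checkout pages, which are typically the most expensive parts of a site to serve.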

4) Eliminate bots

We’ve spoken in a previous article about how damaging bots can be to your product drops. But an often overlooked cause of website crashes is much more malicious than scalpers trying to make a quick buck. Distributed Denial of Service (DDoS) attacks have happened to the likes of Google, Facebook, GitHub, even AWS, and with tens of millions of these attacks happening every year, it’s not hard to imagine that your site might end up in a malicious actor’s crosshairs. Implementing robust security measures to identify and block this kind of bad traffic is crucial for maintaining your site’s availability and stability for your users.

How to eliminate bots on your site

Use a specialized DDoS protection service or content delivery network (CDN) that automatically detects and filters out DDoS attack traffic before it reaches your origin servers. These services include: Cloudflare, AWS Shield, and Google Cloud Armor.

  1. Employ your own bot detection and mitigation techniques to identify and block automated traffic, whether it’s from malicious bots or well-meaning web scrapers. We recommend:
    • CAPTCHA challenges
    • IP/user-agent blocklists
    • Behavioral analysis
  2. Closely monitor your website’s traffic patterns and identify any anomalies that could indicate bot activity or a DDoS attack. If you can, set up alerts to notify your team of potential issues as soon as they arise.
  3. Implement rate limiting at the application, network, or infrastructure level. This restricts the number of requests a single IP address or user account can make within a given time period. This can help protect against both malicious and legitimate traffic spikes. You can think of CrowdHandler as a user-friendly rate limiter.
  4. Make sure that all of your software components, frameworks, and dependencies are up to date with the latest security patches. This massively reduces the risk of your systems being compromised and used for malicious purposes.
  5. And finally, develop a clear, documented process for how to quickly identify, mitigate, and recover from DDoS attacks or other malicious traffic incidents. Ensure your team is trained and prepared to execute this plan.
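
The rate limiting step in the list above can be sketched as a sliding-window limiter keyed by IP address or user account. This is a minimal in-memory illustration, not a production implementation: the class name, parameters, and the example IP address are assumptions, and a real deployment would normally use shared storage or the rate limiting built into your CDN, proxy, or framework.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most max_requests per key within any window_seconds span."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key):
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject (e.g. respond with HTTP 429)
        hits.append(now)
        return True


# Usage: the fourth and fifth requests from the same address are rejected.
limiter = SlidingWindowRateLimiter(max_requests=3, window_seconds=60)
results = [limiter.allow("203.0.113.7") for _ in range(5)]
print(results)  # [True, True, True, False, False]
```

Keying on an IP address is simple but can penalize many users behind one shared address, so keying on a session or account where one exists is often a better fit.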

5) Use cloud servers

Cloud hosting allows you to leverage the scalability and redundancy of cloud infrastructure to implement effective autoscaling that can help you dynamically adjust your website’s capacity to handle fluctuating traffic patterns.

If you’re using cloud hosting with autoscaling, it’s still important to consider the potential weaknesses your site might have. Maximizing the benefits of cloud hosting means defining the thresholds and metrics that will trigger your autoscaling processes. These thresholds could be:

  • CPU utilization
  • Response times from your site
  • Concurrent users or user sessions

Base these on your understanding and analysis of your website’s performance. When you implement these processes, consider how gradually your website’s capacity scales. You should increase and decrease resources gradually rather than moving up and down in large, abrupt steps. This will help avoid any overcorrection that could put your site’s performance at risk.

Because capacity is added gradually, you can’t rely on autoscaling alone to handle extreme traffic spikes. If you run product drops or on-sales, it’s vital to make sure you have additional, non-autoscaled capacity available as a buffer.
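
The threshold-and-gradual-step advice above can be sketched as a small scaling rule. This is an assumption-laden illustration rather than any cloud provider’s actual policy: the function name, the 60% CPU target, the step size, and the instance limits are all hypothetical values chosen for the example.

```python
import math


def desired_instances(current, cpu_percent, target_cpu=60.0,
                      max_step=2, min_instances=2, max_instances=20):
    """Move instance count toward the CPU target, at most max_step at a time."""
    # Proportional estimate of how many instances would bring CPU to the target.
    ideal = math.ceil(current * cpu_percent / target_cpu)
    # Limit the change per evaluation to avoid large, abrupt jumps.
    step = max(-max_step, min(max_step, ideal - current))
    # Keep the result within the configured floor and ceiling.
    return max(min_instances, min(max_instances, current + step))


print(desired_instances(4, 90))    # 6: CPU is above target, scale up
print(desired_instances(4, 150))   # 6: ideal would be 10, but the step is capped at 2
print(desired_instances(10, 30))   # 8: CPU is below target, scale down gradually
```

The second call shows the trade-off: a sudden spike would need ten instances, but the rule only adds two per evaluation, so the remaining demand has to be absorbed by spare capacity or held back some other way while scaling catches up.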

Handling the transition as your infrastructure scales is vital if you expect your site to host an extremely high-traffic event like an on-sale or product drop. Taking a hybrid approach with a virtual waiting room keeps the experience smooth for end users.

Handle that crowd (Install a Virtual Waiting Room)

And of course, we can’t go without diving into the best solution. You’re here on the CrowdHandler website so you’re already halfway there! Virtual Waiting Rooms help you control the level of traffic entering your site, so whether you need it to handle 100 users or 100,000 users, we offer a range of sizes to suit any website.

Virtual waiting rooms are THE best way to prevent website crashes at high traffic peaks. They work by keeping all the traffic hitting your website separate from your main servers and then letting through an amount set by you (or automatically with our Autotune feature). They act as a buffer, ensuring your systems aren’t overloaded.
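
To make the buffering idea concrete, here is a toy first-in, first-out queue that releases a fixed number of visitors per interval. It is only a conceptual sketch and not how CrowdHandler is implemented; the `WaitingRoom` class, its methods, and the `admit_per_tick` parameter are hypothetical names invented for the illustration.

```python
from collections import deque


class WaitingRoom:
    """Toy FIFO waiting room that admits a fixed number of visitors per tick."""

    def __init__(self, admit_per_tick):
        self.admit_per_tick = admit_per_tick  # the "amount set by you"
        self._queue = deque()

    def join(self, user_id):
        """Add a visitor to the back of the queue and return their position."""
        self._queue.append(user_id)
        return len(self._queue)

    def tick(self):
        """Release the next batch of visitors to the site, oldest first."""
        batch = []
        while self._queue and len(batch) < self.admit_per_tick:
            batch.append(self._queue.popleft())
        return batch


# Usage: five visitors arrive at once, but only two reach the site per tick.
room = WaitingRoom(admit_per_tick=2)
for user in ["a", "b", "c", "d", "e"]:
    room.join(user)
print(room.tick())  # ['a', 'b']
print(room.tick())  # ['c', 'd']
print(room.tick())  # ['e']
```

However large the arriving crowd, the site itself only ever sees the admitted batch, which is what keeps the load on your servers predictable.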

Some of the key benefits of installing a virtual waiting room on your site include:

  • Preventing site overloads and crashes during high-traffic events
  • Maintaining a positive user experience by avoiding long wait times or error messages
  • Allowing you to scale your infrastructure without risk and in line with demand, rather than having to overprovision
  • Providing detailed analytics on visitor behavior and traffic patterns

Take a look at our Pricing page to find out more about the many features we offer to our users or take advantage of our 30-day free trial to get a hands-on look at our features.
