Outdated business continuity plans are putting IT infrastructure at risk
By Christophe Maisonnave
As the world scrambles to identify ways to operate more sustainably, experts are raising concerns about how much energy will be required to run the data-intensive, AI-powered global economy unfolding before us. Much work has been put into addressing the environmental footprint of one of IT’s worst offenders: the power-hungry data centre.
Thankfully, we are seeing more organisations adopt sustainable IT solutions that are successfully improving the energy efficiency rates of these facilities. Yet, despite these gains, too many organisations have overlooked how efficiency best practices can impact their data centre’s disaster recovery and business continuity plans (BCPs), ultimately exposing well-meaning IT executives to costly financial, regulatory, and reputational damages.
Reducing the cooling requirements of data centres has been a primary tactic employed to improve their energy efficiency. Researchers have credited factors such as best-practice air-flow management and better cooling designs for global improvements in average data centre Power Usage Effectiveness (PUE), which thankfully has dropped down from 2.5 in 2007 to between 1.55 and 1.59 in recent years.
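To put those figures in perspective: PUE is the ratio of total facility energy to the energy delivered to IT equipment, so everything above 1.0 is overhead such as cooling and power distribution. The following is a minimal Python sketch of that arithmetic. The 1,000 kW IT load and the function name are hypothetical, chosen only for illustration, and 1.57 is used as an illustrative point within the 1.55 to 1.59 range.

```python
def overhead_kw(it_load_kw: float, pue: float) -> float:
    """Non-IT power (cooling, power distribution, etc.) implied by a PUE value."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    # PUE = total facility power / IT power, so overhead = IT power * (PUE - 1)
    return it_load_kw * (pue - 1.0)


IT_LOAD_KW = 1000.0  # hypothetical IT load, for illustration only

for pue in (2.5, 1.57):
    print(f"PUE {pue}: about {overhead_kw(IT_LOAD_KW, pue):.0f} kW of overhead")
```

Under these assumptions, the same IT load carries well under half the overhead it would have carried at the 2007 average.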
Singapore, for its part, introduced new regulations in 2022 that require a PUE of 1.3 for new data centres, and issued a new standard in 2023 that supports data centre operators in gradually increasing operating temperatures to 26°C and above, up from the commonly used 22°C or below.
Several innovations are helping to shrink electricity footprints by enabling data centres to run hotter than before. However, if a cooling system fails for some reason, the warmer ambient temperature in the building means that IT teams have less time to react before the rising temperature reaches a point that triggers an automatic shut-off or system failure. For example, a system cooled to 21°C rather than 16°C would reach the maximum temperature that triggers a system shut-off sooner.
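The shrinking reaction window can be illustrated with a rough calculation. The Python sketch below assumes a constant rate of temperature rise after a cooling failure, which is a simplification, and uses a hypothetical shut-off threshold of 35°C and a hypothetical rise of 1°C per minute. Real values depend on the facility, its IT load, and its equipment, and should be measured rather than assumed.

```python
def minutes_to_shutoff(start_c: float, shutoff_c: float, rise_c_per_min: float) -> float:
    """Minutes until the shut-off temperature is reached, assuming a constant rise rate."""
    if rise_c_per_min <= 0:
        raise ValueError("rise_c_per_min must be positive")
    # If the room is already at or above the threshold, there is no time left.
    return max(0.0, (shutoff_c - start_c) / rise_c_per_min)


SHUTOFF_C = 35.0       # hypothetical automatic shut-off threshold
RISE_C_PER_MIN = 1.0   # hypothetical rate of rise once cooling is lost

for start_c in (16.0, 21.0):
    minutes = minutes_to_shutoff(start_c, SHUTOFF_C, RISE_C_PER_MIN)
    print(f"Starting at {start_c:.0f}°C: {minutes:.0f} minutes to shut-off")
```

With these illustrative numbers, raising the starting temperature from 16°C to 21°C cuts the response window from 19 minutes to 14. A BCP that still assumes the longer window overstates the time teams actually have.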
Considering that experts identified cooling failures as one of the “common causes” of outages, a cooling failure in a warmer data centre should absolutely be factored into today’s BCPs. Yet, many data centre operators and enterprise IT teams are still working with old BCPs that do not factor in the reduced time-to-failure of warmer data centres.
Recent high-profile outages linked to cooling system failures, which affected DBS and Citibank in Singapore and Google Cloud services in London, serve as stark reminders of how quickly things can go wrong when data centres heat up. Rising outdoor temperatures due to climate change-induced heatwaves, combined with rising inside temperatures adjusted to combat climate change, add a nuanced complexity to cooling system analysis and disaster planning that we have not seen before.
Given rapidly evolving data centre technologies, BCPs need to be meticulously reviewed to ensure they keep pace with data centres’ growing workload intensities and updates to cooling systems, equipment, and practices.
An effective business continuity plan
Preparing adequately for data centre disasters is becoming increasingly business critical, with experts at the Uptime Institute pointing out that over two-thirds of all outages globally cost more than US$100,000, a figure that has grown over time and is predicted to trend upwards as dependency on digital services increases. When it comes to sustainable IT, enterprises stand at a pivotal juncture where they must reconsider and modernise their BCPs, laying the groundwork for a more resilient coexistence of resource-efficient IT and disaster preparedness.
To update their BCPs, enterprises should first ensure they have conducted their disaster simulations assuming increased baseline operating temperatures to properly understand system response and recovery nuances.
Secondly, enterprises should avoid making assumptions, and should instead ask data centre operators directly whether the operator has recently changed their temperature criteria or KPIs. Doing this can help ensure the enterprise’s BCPs accurately factor in relevant temperature increases within overall data centre dynamics.
Additionally, enterprises should not simply ask data centre operators what in-row temperatures they plan to use. They should independently verify on-site temperatures themselves, because real-world temperatures frequently differ from planned values.
Furthermore, enterprises need to stay abreast of rapidly evolving advancements in data centre Mechanical, Electrical, and Plumbing (MEP) solutions, so as not to overlook opportunities to benefit from new technologies like direct or immersion cooling.
Another crucial step is to find out what systems their neighbouring data centre tenants are running, because system failures next door have been known to pose risks that have gone totally unaccounted for in BCPs and disaster planning.
Lastly, enterprises need to update training programs to instil a holistic and comprehensive understanding of sustainable data centre operations across IT and disaster recovery teams, all while remaining vigilant of the evolving regulations to mitigate compliance risks.
As more IT teams find themselves with less buffer for disaster recovery, closer alignment is needed with data centre facility experts to ensure BCPs are realistic. Getting disaster recovery planning right in a carbon-constrained world turns out to be an incredibly difficult balancing act, with experts striving for lower environmental footprints while still needing to protect vital IT.
It’s true that building in more redundancy may improve disaster recovery, but doing so can worsen a system’s environmental footprint by duplicating workloads for backups. Indeed, finding the proverbial “sweet spot” requires heightened collaboration between IT teams and data centre operators, and demands an overall up-levelling of sustainable data centre design and operations expertise.
As businesses journey towards a future where sustainability is integral to stability, and where data centre outages can make or break operations, environmentally responsible resilience is not just an option; it is a strategic imperative.