The integral value of proactive performance monitoring when implementing AI

by | Feb 21, 2018

Anyone seeking to learn about the evolution of artificial intelligence – its exploding acceptance and promise and its impact on large organizations – need not look far. It is the hottest topic in business today. Big data and cognitive computing solutions, such as IBM Watson, are manifestly transforming organizations at operational and strategic levels, affording unprecedented visibility into processes and data, revealing customer attitudes and behaviors, and identifying patterns and trends.

However, among IT managers and system administrators, conversations turn quickly to the increased demand AI places on their infrastructures. It’s certainly one of the most pressing issues they face, and it raises questions about planning and about the ability of technology teams to visualize and mitigate the impact of heavy data workloads on their current infrastructure. That leaves us with one question: how can organizations best adapt or augment their current infrastructure to accommodate AI?

Without the appropriate infrastructure, organizations run the risk of adding more problems than insights. Latency, downtime, poor application performance and decreased productivity and ROI are among the pitfalls that must be avoided.

Taking the first step to a successful AI implementation

AI is not a single, monolithic application; it’s an umbrella label that encompasses a huge diversity of discrete applications for planning, learning, problem solving, analyzing and processing. Even though enterprise adoption of AI is relatively new, it doesn’t warrant special treatment from an infrastructure performance monitoring standpoint. IT organizations should gauge the impact specific AI applications have on their infrastructure no differently than they would any other application.

That said, those AI applications are unquestionably demanding. Most organizations deploying AI will, in fact, hit the proverbial performance wall with their current infrastructure. The solution lies in anticipating the hit and minimizing the damage.

That’s where proactive performance monitoring comes in, acting as both a collision avoidance system and, in the event of a collision, an airbag to soften the blow.

The role of proactive performance monitoring

An IT manager poised to navigate their organization through the AI journey faces two fundamental questions:

  • How well do I know my current infrastructure – its performance and utilization?
  • What impact will AI workloads have on my present environment?

Mitigating the impact of AI applications requires that the infrastructure team analyze the performance and utilization of its current infrastructure, including critical elements such as network, disk, CPU, storage and memory. If a proactive monitoring solution has already been deployed, the team can access years of historical data, granular metrics, trends and the kind of broad, deep, predictive analytics that can alert them to potential bottlenecks in the infrastructure and provide a roadmap to successful AI application deployment.
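To make the idea of predictive analytics on historical data concrete, here is a minimal sketch in Python. It assumes evenly spaced utilization samples (for example, one daily CPU peak per day) and fits a simple straight-line trend to estimate how many sampling periods remain before a chosen threshold is crossed. The function names and the 80% threshold are hypothetical, chosen for illustration only; this is not how Galileo or any particular monitoring product computes its forecasts.

```python
from typing import Optional, Sequence, Tuple


def fit_line(values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for evenly spaced samples (x = 0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until_threshold(values: Sequence[float], threshold: float) -> Optional[float]:
    """Estimate sampling periods left before the fitted trend reaches `threshold`.

    Returns 0.0 if the trend is already at or above the threshold, and None
    if utilization is flat or falling (no crossing is predicted).
    """
    slope, intercept = fit_line(values)
    current = intercept + slope * (len(values) - 1)  # fitted value at the latest sample
    if current >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - current) / slope


# Hypothetical daily CPU peaks (percent) and an assumed 80% alert threshold.
daily_cpu_peak = [50.0, 52.0, 54.0, 56.0, 58.0]
remaining = periods_until_threshold(daily_cpu_peak, threshold=80.0)
print(f"Estimated days until 80% CPU: {remaining}")
```

A straight-line fit is the simplest possible trend model. Real workloads are often seasonal or bursty, so a forecast like this is best treated as a rough early warning, not a capacity plan.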

Subsequently, in a test environment (prior to deploying the AI application into production), teams can use a proactive monitoring solution to establish performance thresholds, identify very specific benchmarks and observe how changes in the utilization of the application affect critical infrastructure elements. IT teams can then use these insights to:

  • Optimize infrastructure elasticity as AI application demands expand or shrink
  • Improve overall management of IT assets
  • Prevent over-provisioning
  • Right-size the IT environment in anticipation of AI application deployment
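As a rough illustration of right-sizing from test-environment benchmarks, the sketch below adds the extra load an AI application produced in testing to the current production peak for each resource, then flags anything that would exceed a headroom limit. The resource names, the numbers and the 80% limit are all hypothetical, and simple addition is a simplifying assumption; real workloads rarely combine this neatly.

```python
from typing import Dict, Tuple


def project_utilization(
    baseline_peak: Dict[str, float],
    ai_test_delta: Dict[str, float],
    limit: float = 80.0,
) -> Dict[str, Tuple[float, bool]]:
    """Return {resource: (projected_percent, exceeds_limit)}.

    baseline_peak: current peak utilization per resource, in percent.
    ai_test_delta: additional utilization measured for the AI application
                   in the test environment, in percent (missing = 0).
    limit:         assumed headroom limit, in percent.
    """
    report = {}
    for resource, base in baseline_peak.items():
        projected = base + ai_test_delta.get(resource, 0.0)
        report[resource] = (projected, projected > limit)
    return report


# Hypothetical figures for illustration only.
baseline = {"cpu": 55.0, "memory": 60.0, "disk_io": 35.0}
ai_delta = {"cpu": 30.0, "memory": 15.0}

for resource, (projected, over) in project_utilization(baseline, ai_delta).items():
    status = "needs capacity" if over else "ok"
    print(f"{resource}: {projected:.0f}% projected ({status})")
```

Resources flagged here are candidates for added capacity before deployment. Resources with plenty of projected headroom are where over-provisioning can be avoided.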

Of course, the same performance monitoring capabilities are just as important as the application moves into production, providing actionable insights into actual utilization and capacity, which the team can then compare to projections. An infrastructure component tagging feature would enable the IT team to create manual or rules-based ‘tags’ of all nodes (storage, servers, databases, SAN and cloud – across operating systems) related to the AI application, review their performance and run analytics over time, all through a single dashboard.
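The tagging idea can be sketched generically. In the Python example below, a tag is either attached to a node by hand or applied by a rule, which is simply a function that inspects a node and returns true or false. The `Node` class, the naming convention (`ml-` prefixes) and the tag names are hypothetical and are not drawn from any specific product's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class Node:
    name: str
    kind: str  # e.g. "server", "storage", "database", "san", "cloud"
    tags: Set[str] = field(default_factory=set)


Rule = Callable[[Node], bool]


def apply_rules(nodes: Iterable[Node], rules: Dict[str, Rule]) -> None:
    """Add each tag to every node for which its rule returns True."""
    for node in nodes:
        for tag, rule in rules.items():
            if rule(node):
                node.tags.add(tag)


def nodes_with_tag(nodes: Iterable[Node], tag: str) -> List[str]:
    """Names of all nodes carrying `tag`, in input order."""
    return [node.name for node in nodes if tag in node.tags]


# Hypothetical inventory.
inventory = [
    Node("ml-train-01", "server"),
    Node("ml-data-01", "storage"),
    Node("erp-db-01", "database"),
]

# Rules-based tag: anything whose name starts with "ml-" belongs to the AI app.
apply_rules(inventory, {"ai-app": lambda n: n.name.startswith("ml-")})

# Manual tag: the ERP database also feeds the AI application.
inventory[2].tags.add("ai-app")

print(nodes_with_tag(inventory, "ai-app"))
```

Once every component related to the AI application shares a tag, its metrics can be filtered and reviewed as a group instead of node by node.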

The key to a successful AI deployment, and to avoiding infrastructure performance pitfalls, is understanding what impact AI workloads will have on your current infrastructure. Visibility into thousands of granular infrastructure performance metrics, both during the test phase and in the production environment and across virtually every infrastructure component, is critical. Furthermore, to maximize the proactive and predictive value of those metrics, teams need unrestricted access to historical data across whatever timeframes the metrics were collected.

Confidently explore your AI deployment options

Because complex IT environments make performance management a difficult and expensive task, enterprises need a single solution that provides intelligent predictive analytics to inform and pinpoint potential problems before they occur and to empower better decision-making. Experimentation with server infrastructure to run these new workloads is quick and intuitive with Galileo Performance Explorer, yielding data for specific configurations, CPU characteristics and I/O capabilities.

Galileo is a division of the ATS Group, LLC (an IBM Gold Partner) and supports IBM’s aggressive push in the AI space with Watson and the Power8 and Power9 processors. Simply put, IBM delivers machine learning “in a box” with its Power servers. Power9 is specifically engineered and optimized for machine learning and AI workloads, delivering high throughput, reduced latency, optimal I/O and four GPUs in the server. With Galileo, IT teams can easily monitor and manage all versions of IBM Power.

Gain the competitive advantage without jeopardizing performance

Galileo Performance Explorer supports the efforts of IT administrators to re-imagine their technology stack and optimize it for AI workloads. With Galileo’s SaaS-based solution, organizations can:

  • Perform trial-and-error testing across all assets (servers, storage, database and SAN), both on-premises and in the cloud
  • Determine if current assets will accommodate AI workloads
  • Comprehensively plan for AI workloads and help structure the appropriate infrastructure transformation

Organizations must take a proactive approach to their AI infrastructure. With Galileo, IT teams can optimize utilization, increase agility and meet time-to-market pressures. In doing so, these organizations can effectively balance IT maintenance with innovation and drive the digital initiatives that propel business growth and competitive advantage.
