The Role of Observability in Managing Cloud Complexity

Introduction: The Growing Complexity of Cloud

The modern enterprise operates within a cloud landscape vastly different from that of its predecessors: we have moved from isolated systems to interconnected, distributed environments. Today, 91% of enterprises use hybrid (public and private) cloud services, and 87% employ multi-cloud strategies (Flexera 2023). Growing workloads, microservices, and distributed applications all add to this complexity and have left traditional monitoring methods inadequate for modern cloud challenges.

According to the Cloud Security Alliance (CSA), fewer than 25% of organizations report full visibility into their cloud environments. This lack of comprehensive insight is closely linked to performance degradation, heightened security vulnerabilities, and operational inefficiencies.

Therefore, the strategic imperative is not merely to adopt a new approach but to implement one that provides the necessary clarity. That approach is what we call observability.

What is Observability?

Observability goes beyond monitoring by providing deeper insights into system behavior. It is built on three key pillars:

Logs – to capture historical events for troubleshooting.
Metrics – to measure system performance in real-time.
Traces – to track requests across distributed services for root cause analysis.

Traditional monitoring provides surface-level alerts, such as high CPU usage or rising service latency, that signal "what" the symptom is, much like a red light on a dashboard. Observability, in contrast, enables deep dives: it traces user requests across microservices to pinpoint bottlenecks, such as slow database queries, and correlates them with log errors. It reveals the "why": the root cause. For example, it might show that a recent code deployment introduced a poorly optimized query that caused cascading latency, and even identify the problematic line of code.
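To make the contrast concrete, the sketch below shows how a shared trace ID lets a tool move from a symptom (a slow request) to a likely cause (one slow database span and its error log). It is a minimal Python illustration, not any vendor's API: the Span and LogRecord types, the sample data, and the find_root_cause helper are hypothetical. It assumes logs carry the trace and span IDs of the request that produced them, as tracing standards such as OpenTelemetry make possible, and that child spans run sequentially, so a span's self time is its duration minus its children's.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span of a request
    service: str
    operation: str
    duration_ms: float


@dataclass
class LogRecord:
    trace_id: str
    span_id: str
    level: str
    message: str


def self_time(span: Span, spans: List[Span]) -> float:
    """Time spent inside a span itself, excluding its direct child spans
    (assumes children run sequentially, not in parallel)."""
    children = [s for s in spans if s.parent_id == span.span_id]
    return span.duration_ms - sum(c.duration_ms for c in children)


def find_root_cause(trace_id: str, spans: List[Span],
                    logs: List[LogRecord]) -> Tuple[Span, List[LogRecord]]:
    """Return the span with the largest self time in one trace, plus the ERROR logs tied to it."""
    trace = [s for s in spans if s.trace_id == trace_id]
    slowest = max(trace, key=lambda s: self_time(s, trace))
    errors = [rec for rec in logs
              if rec.trace_id == trace_id and rec.span_id == slowest.span_id
              and rec.level == "ERROR"]
    return slowest, errors


# Hypothetical checkout request: the frontend looks slow, but most time is spent in the database.
spans = [
    Span("t1", "a", None, "frontend", "GET /checkout", 1200.0),
    Span("t1", "b", "a", "orders", "create_order", 1100.0),
    Span("t1", "c", "b", "orders-db", "SELECT orders", 1000.0),
]
logs = [
    LogRecord("t1", "a", "INFO", "request received"),
    LogRecord("t1", "c", "ERROR", "query exceeded 900 ms: full table scan on orders"),
]

slowest, errors = find_root_cause("t1", spans, logs)
print(slowest.service, slowest.operation)  # prints: orders-db SELECT orders
for rec in errors:
    print(rec.message)
```

The frontend span is the one users notice, but its self time is only 100 ms; ranking by self time is what moves the investigation down to the database span and its error log.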

Why Observability is Crucial in Cloud Environments

Given the increasing reliance on cloud services, downtime is more than an inconvenience; it is a significant financial risk. Gartner estimates that cloud outages cost businesses an average of $300,000 per hour. Additionally, an IDC survey found that the areas most affected by downtime were customer experience, financial and regulatory penalties, and employee productivity. Without full-stack visibility, IT teams remain reactive rather than proactive. Observability enables:

Real-time anomaly detection to prevent failures before they escalate.
End-to-end visibility across multi-cloud and hybrid cloud setups.
Faster Mean Time to Resolution (MTTR), reducing downtime by up to 60%.
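Real-time anomaly detection can be as simple as comparing each new sample with a rolling baseline. The Python sketch below flags points more than three standard deviations from the mean of the previous ten samples (a rolling z-score). The window, threshold, and latency data are illustrative assumptions; commercial observability tools typically use more sophisticated, seasonality-aware models.

```python
import statistics
from collections import deque
from typing import Iterable, List, Tuple


def detect_anomalies(values: Iterable[float], window: int = 10,
                     threshold: float = 3.0) -> List[Tuple[int, float]]:
    """Flag points that sit more than `threshold` standard deviations away
    from the mean of the preceding `window` points (a rolling z-score)."""
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(values):
        if len(history) == window:
            baseline = list(history)
            mean = statistics.fmean(baseline)
            stdev = statistics.pstdev(baseline)
            # Skip flat baselines, where the z-score is undefined.
            if stdev > 0 and abs(value - mean) / stdev > threshold:
                anomalies.append((i, value))
        history.append(value)
    return anomalies


# Hypothetical p99 latency samples (ms), one per minute; minute 11 spikes.
latency_ms = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 100, 450, 101]
print(detect_anomalies(latency_ms))  # prints: [(11, 450)]
```

One design caveat: once the spike enters the window, it inflates the baseline's standard deviation for the next ten samples, which is why production detectors often prefer robust statistics such as the median absolute deviation.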

Key Benefits of Cloud Observability

For a CIO, the goal is not just operational stability but strategic agility. Observability supports this by turning IT operations from a reactive function into a proactive business enabler, with the following key benefits:

Improved Performance and Reliability: Companies with strong observability report significantly faster issue resolution. Predictive analytics can reduce downtime, helping businesses maintain seamless operations.
Enhanced Security and Compliance: Cyber threats often go undetected due to a lack of visibility, and many cloud breaches occur because organizations fail to monitor their environments effectively. Observability tools help detect anomalies and insider threats and support continuous compliance by identifying misconfigurations and policy deviations in real time.
Cost Optimization and Operational Efficiency: Cloud waste is a growing concern, with up to 30% of cloud spend wasted due to overprovisioning. Observability tools pinpoint underutilized resources, helping optimize cloud costs.
Data-Driven Decision Making: With AI-driven observability, businesses can automate troubleshooting, enforce governance, and optimize cloud strategy.
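As an illustration of how observability data can pinpoint underutilized resources, the sketch below flags instances whose 95th-percentile CPU utilization stays under 10%. The instance IDs, hourly prices, utilization samples, threshold, and 730-hour month are all hypothetical assumptions; real rightsizing decisions would also weigh memory, I/O, and business context.

```python
from dataclasses import dataclass
from typing import List, Tuple

HOURS_PER_MONTH = 730  # common approximation of an average month


@dataclass
class InstanceUsage:
    instance_id: str
    hourly_cost: float        # USD per hour (illustrative, not a real price)
    cpu_samples: List[float]  # CPU utilization samples, in percent


def percentile(samples: List[float], q: float) -> float:
    """Simple index-based percentile (no interpolation)."""
    ordered = sorted(samples)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def find_underutilized(fleet: List[InstanceUsage], cpu_threshold: float = 10.0,
                       q: float = 0.95) -> List[Tuple[str, float, float]]:
    """Return (instance_id, p95 CPU, approx. monthly cost) for instances whose
    95th-percentile CPU stays below `cpu_threshold` percent."""
    flagged = []
    for inst in fleet:
        p = percentile(inst.cpu_samples, q)
        if p < cpu_threshold:
            flagged.append((inst.instance_id, p,
                            round(inst.hourly_cost * HOURS_PER_MONTH, 2)))
    return flagged


fleet = [
    InstanceUsage("i-web-1", 0.192, [35, 40, 55, 62, 48, 51, 70, 44, 39, 58]),
    InstanceUsage("i-batch-7", 0.384, [2, 3, 1, 4, 2, 3, 5, 2, 1, 3]),
]
for instance_id, p95, monthly in find_underutilized(fleet):
    print(f"{instance_id}: p95 CPU {p95}%, about ${monthly}/month; candidate for downsizing")
```

Using a high percentile rather than the average avoids flagging instances that are idle most of the time but genuinely busy at peak.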

Implementing Observability in Cloud Operations

A successful observability strategy requires the right tools and best practices:

Selecting tools based on specific needs: for example, Datadog for real-time metrics, New Relic for application performance, Splunk for deep log analysis, or AWS CloudWatch for native AWS integration.
Leveraging AI/ML for insights: Use AI/ML to automate anomaly detection and predictive maintenance, identify patterns in logs, predict resource needs, and automate root-cause analysis.
Defining key Service Level Objectives (SLOs): Establish clear, measurable SLOs for latency, uptime, and error rates; regularly monitor and adjust based on business impact.
Adopting AIOps for automated issue detection and resolution: Automate alert correlation, incident management, and remediation; use AI to optimize resource allocation and predict potential issues.
Ensuring full-stack observability: Integrate monitoring for servers, networks, databases, applications, and user interfaces; enable end-to-end data correlation for holistic visibility.
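The SLO practice above is often operationalized as an error budget: a 99.9% availability target over 30 days tolerates (1 - 0.999) × 30 × 24 × 60 = 43.2 minutes of downtime. The Python sketch below computes that budget and a burn rate (observed error rate divided by the allowed error rate), a common SRE convention for deciding when to alert. The targets and event counts are illustrative assumptions.

```python
MINUTES_PER_DAY = 24 * 60


def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Downtime an availability SLO tolerates over the window: (1 - target) x window."""
    return (1 - slo_target) * window_days * MINUTES_PER_DAY


def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Observed error rate divided by the allowed error rate.
    1.0 spends the budget exactly over the window; higher values exhaust it early."""
    return (bad_events / total_events) / (1 - slo_target)


budget = error_budget_minutes(0.999)          # 99.9% over 30 days
print(f"Error budget: {budget:.1f} minutes")  # prints: Error budget: 43.2 minutes

rate = burn_rate(bad_events=50, total_events=10_000, slo_target=0.999)
print(f"Burn rate: {rate:.1f}x")  # prints: Burn rate: 5.0x
```

At a burn rate of 5.0, a 30-day budget would be exhausted in about six days, which is usually worth an immediate page, whereas a rate near 1.0 may only warrant a ticket.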

The Future of Observability in Cloud Management

As cloud environments grow more complex, observability is becoming smarter and more proactive:

AIOps will reduce alert fatigue by 50%: ML-driven alert correlation will suppress redundant alerts, allowing IT to focus on high-impact incidents, such as a network outage that triggers multiple database errors.
Observability-driven FinOps will optimize cloud costs: Real-time dashboards showing microservice costs, coupled with performance data, will enable automated instance downsizing for underutilized resources.
Edge computing and IoT integration will enhance visibility: Lightweight agents will enable real-time monitoring of edge devices and applications, providing a centralized dashboard for distributed edge-to-cloud architectures.
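The alert-correlation idea in the first item can be illustrated without ML. The rule-based Python sketch below groups alerts that fire within two minutes of each other and attributes each one to the most upstream alerting dependency, so a network outage and the database errors it causes become a single incident. The dependency map, alert data, and two-minute window are hypothetical; ML-driven AIOps tools learn such relationships rather than relying on a hand-written map.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Alert:
    source: str       # service that raised the alert
    timestamp: float  # seconds since an arbitrary epoch
    message: str


def depends_on(service: str, target: str, deps: Dict[str, List[str]],
               seen: Optional[Set[str]] = None) -> bool:
    """True if `service` depends on `target`, directly or transitively."""
    seen = set() if seen is None else seen
    for upstream in deps.get(service, []):
        if upstream == target:
            return True
        if upstream not in seen:
            seen.add(upstream)
            if depends_on(upstream, target, deps, seen):
                return True
    return False


def find_root(alert: Alert, alerts: List[Alert], deps: Dict[str, List[str]],
              window_s: float) -> Alert:
    """Move the candidate root upstream whenever it depends on another alert in the window."""
    root = alert
    for other in alerts:
        if other is alert or abs(other.timestamp - alert.timestamp) > window_s:
            continue
        if depends_on(root.source, other.source, deps):
            root = other
    return root


def correlate(alerts: List[Alert], deps: Dict[str, List[str]],
              window_s: float = 120.0) -> Dict[str, List[Alert]]:
    """Group alerts into incidents keyed by the service of their probable root cause."""
    incidents: Dict[str, List[Alert]] = {}
    for alert in alerts:
        root = find_root(alert, alerts, deps, window_s)
        incidents.setdefault(root.source, []).append(alert)
    return incidents


# Hypothetical topology: both databases sit behind the same network segment.
deps = {"orders-db": ["network"], "payments-db": ["network"],
        "checkout": ["orders-db", "payments-db"]}
alerts = [
    Alert("network", 0, "packet loss 40% on core switch"),
    Alert("orders-db", 20, "connection timeouts"),
    Alert("payments-db", 25, "connection timeouts"),
    Alert("checkout", 40, "p99 latency above 2 s"),
]
for root, grouped in correlate(alerts, deps).items():
    print(f"incident rooted at {root}: {len(grouped)} alerts, 1 page")
```

The sketch assumes the dependency map has no cycles; four raw alerts collapse into one incident that points at the network.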

Conclusion

Logz.io's 2024 Observability Pulse survey found that only 10% of organizations have fully implemented observability, indicating that the practice is still emerging. Yet observability is not a supplementary feature but a foundational requirement for operational integrity. A well-executed observability strategy directly enhances efficiency, security, and reliability while reducing downtime and controlling costs. As cloud ecosystems evolve and grow more complex, observability will remain key to seamless, uninterrupted operations.

For enterprises seeking to optimize their cloud environments, strategic partnerships are essential. Tech Mahindra, with its deep understanding of modern IT architectures and focus on innovative cloud solutions, enables businesses to leverage the full potential of observability.

Tech Mahindra’s SMART Observability is an AI-powered monitoring and resolution tool designed to provide full-stack visibility across cloud environments. With one-click deployment, pre-configured dashboards, and seamless AWS integration (CloudWatch, X-Ray, Prometheus, Grafana), businesses can improve cloud observability, enhance DevOps efficiency, and ensure a more resilient IT environment.

TAGS: Cloud and Infrastructure Services, Artificial Intelligence

Endnotes

Flexera. (Mar 2024). Cloud computing trends and statistics: Flexera 2023 State of the Cloud Report. Flexera.com
Cloudtech. (Feb 2024). Why companies continue to struggle with cloud visibility – and code vulnerabilities. Cloudtech.com
ManageEngine. (Jul 2024). Surviving the next downtime with proactive IT operations – Part 1. ManageEngine.com
IDC. (n.d.). The Cost of Downtime in Datacenter Environments: Key Drivers and How Support Providers Can Help. IDC.com
Logz.io. (Mar 2024). Observability Adoption Remains Nascent With Only 10% of Orgs Using 'Full Observability' – Logz.io Survey. Logz.io

About the Author

Girish Visweswaran

AVP, Cloud & Infrastructure Services, MEA, Tech Mahindra

Girish is a seasoned technology and business leader with over 25 years of experience in Cloud and Infrastructure Services, ITSM, and Global Delivery Management. As Vice President and Regional Head of MEA at Tech Mahindra, he specializes in driving revenue growth, strategic IT operations, and customer success.

Girish’s earlier stints at Wipro, Injazat Data Systems, and HCL Infosystems establish him as a noteworthy business leader in the IT Services space. His expertise spans P&L management, key account strategy, large-scale infrastructure service delivery, and vendor management. With a strong focus on process optimization, risk mitigation, and capability building, he enables enterprises to achieve operational excellence and digital transformation. Passionate about innovation and customer-centric solutions, he is committed to delivering scalable, efficient, and high-impact IT strategies.

From Assisted Intelligence to Autonomous Execution

A supply chain exception arises at 2 a.m. A pricing anomaly appears in a key market. A customer churn risk is detected mid-journey.

Traditionally, each of these would trigger dashboards, alerts, and human escalation. Today, agentic AI systems are beginning to resolve them autonomously: analyzing context, selecting actions, and executing decisions in real time. According to Gartner, agentic AI is a leading strategic technology trend for 2025 [1], and by 2028, 33% of enterprise software applications will include agentic AI, enabling 15% of daily tasks to be completed autonomously [2].

The shift in AI applications from assistance to autonomy introduces new and complex security challenges. When introducing these systems, organizations need to align agentic AI with their values. They also need to address security challenges related to operational integrity, risk mitigation, and the prevention of harmful autonomous actions.

Agentic AI will only scale when autonomy is paired with governance, traceability, and security by design.

Key Areas of Difference between AI Security and Agentic AI Security

Autonomy: Agentic AI systems operate independently and can execute complex tasks without human input to achieve their goals. By contrast, generative AI uses a single model to create content (such as text or images) in response to a specific prompt and does not initiate further actions on its own.
Decision Complexity: Agentic AI is capable of adaptive, context-driven decisions, taken in real time or in response to novel scenarios with minimal human oversight. Typical AI systems execute simple, task-oriented automations within a closed scope or defined boundaries.
Proactive Risk Exposure: The attack surface of agentic AI systems expands because of their adaptive, interconnected behavior. Compared to conventional AI, they can create unpredictable threat vectors.
Identity and Access Challenges: Traditional AI operates with transient authentication within clearly defined service boundaries. Agents, on the other hand, may maintain state across interactions and can impersonate users or other digital entities using stored credentials. They can also make autonomous access decisions based on goal-directed reasoning, potentially escalating privileges. Such capabilities require robust ephemeral, delegated, and cross-domain identity controls.
Continuous Governance and Auditability: Agentic systems require real-time, dynamic policy enforcement, along with traceability and continuous auditing of self-directed workflows. In contrast, conventional AI usually relies on more static security oversight and accountability frameworks.

Agentic AI Security Risks and Challenges

Agentic AI systems introduce security threats that affect the application layer, APIs, and ML models and LLMs. OWASP has also highlighted security threats and mitigation measures for agentic AI. Below are key threats to consider when building an agentic AI system.

Goal Integrity and Alignment Attacks

Goal integrity and alignment attacks target the reasoning processes that determine an agent's objectives. Attackers use techniques such as prompt injection and reward manipulation; they also poison training data or exploit system gaps. Such attacks can cause agents to pursue unintended or harmful outcomes.

Agent Hallucinations and Factual Errors

Agent hallucinations occur when AI agents generate false information or act on incorrect or fabricated data. The consequences for business operations range from minor issues to major system failures. These risks can be mitigated with human oversight, verification checkpoints, and confidence thresholds.

Memory and Data Poisoning

Attackers attempt to compromise AI decision-making by injecting malicious content into an agent's contextual memory. They also insert harmful content to manipulate training data and poison Retrieval-Augmented Generation (RAG) sources. To mitigate these attacks, AI, data, and MLOps teams must validate all data ingested into agent memory and RAG pipelines through automated and human-in-the-loop controls.

Tools and API Misuse

Agents often rely on external tools and APIs to perform tasks. Without proper safeguards, agents can misuse these tools: they can exceed their authorization boundaries, manipulate external systems, or overutilize resources by making too many requests, potentially causing a denial of service.

Non-Human Identities (NHIs) often lack proper session oversight, which makes them vulnerable to token abuse and credential leakage. Robust key management and limits on API usage help mitigate these risks, and applying least-privilege principles and regularly monitoring for anomalous activity can prevent abuse or data leaks.

Identity Spoofing and Privilege Escalation

Attackers can manipulate agents into impersonating users or services and performing unauthorized actions. This risk increases because agents often access multiple systems with different permissions. Attackers exploit the agent's trust or escalate privileges using malicious prompts. Clear boundaries, strict privilege isolation, and monitoring of agent activity can help reduce these risks.

Agent-to-Agent Security and Supply Chain Attacks

Agent-to-agent interactions involve multiple autonomous AI systems that communicate, exchange information, and make collaborative decisions. In a multi-agent environment, agents might share unintended data, manipulate results to benefit themselves, or even deceive one another. When agents do not coordinate, their actions can clash and waste resources. Agents also depend on external models, libraries, or plugins, which can introduce vulnerabilities if not managed properly.

It is therefore important to secure supply chain interactions and control data flow between agents. If one agent is compromised, it can trigger cascading effects across multiple systems, with impact well beyond the initial point of compromise. To mitigate these risks, verify agent identities, use secure messaging, and monitor for anomalous behavior.

In interconnected agent ecosystems, a single breach can propagate across workflows, amplifying impact far beyond the initial point of compromise.

Agent Clone and Rogue Agent

Cloning is the unauthorized duplication of an agent, and it can lead to serious security breaches and trust violations. Attackers create "evil twins" of trusted AI agents by reverse-engineering them or replicating their training datasets. They also intercept and mimic API responses to conduct sophisticated social engineering attacks. Clones can harvest sensitive information from users who believe they are interacting with the legitimate system, and they can damage brand reputation through harmful outputs associated with a trusted AI identity.

Similarly, a rogue AI agent can deviate from its intended purpose. Because of goal misalignment, malicious compromise, or software defects, an agent can operate outside its authorized parameters. This can cause widespread damage: rogue agents may execute unauthorized transactions, leak sensitive information, manipulate connected systems, or consume excessive resources.

AI Agent Lifecycle Security

Securing agentic AI is an ongoing process that spans the agent's entire lifecycle, described phase by phase below.

Design and Development Phase

Organizations should establish clear security requirements and guardrails that define agents' operational boundaries and permissible actions. Secure coding practices and regular code reviews help identify vulnerabilities. Access controls should follow the principle of least privilege. To safeguard sensitive information, implement encryption, data minimization, and privacy controls. Developers should also implement input/output validation and secure configuration management, and maintain detailed documentation of all security decisions.

Deployment Phase

The deployment phase includes establishing a secure CI/CD pipeline with integrity verification, code signing, and vulnerability scanning. It also involves hardening the runtime environment through proper network segmentation, container security, and endpoint protection. Secure all APIs with proper encryption, rate limiting, and access controls, and conduct final pre-deployment security validation through penetration testing and compliance checks.

Operation Phase

During operation, continuously monitor agents to detect anomalous patterns and deviations. Validate input and output data to prevent injection attacks and prompt manipulation, and ensure responses are secure and comply with ethical guidelines. Keep detailed logs of agent actions for security checks and compliance. Regularly monitor system performance to detect attacks or slowdowns caused by resource overuse, and set up automated actions to address security issues quickly.

Maintenance Phase

The maintenance phase focuses on keeping the agent secure while it remains in service. Regularly update all parts of the system with security patches and test the changes. Conduct security checks, such as penetration testing and vulnerability scanning, to identify new risks. Analyze security incidents and continually improve defenses based on new threat intelligence. Regularly review access privileges and configuration changes, and keep documentation up to date to track all security controls.

Decommissioning Phase

AI platform, security, and IT operations teams should jointly plan agent decommissioning. This includes formally approving shutdowns, terminating all agent processes, preventing unauthorized restarts, and securely wiping or destroying sensitive data in accordance with governance and compliance requirements. Sensitive data here includes training data, operational logs, authentication credentials, and cached content across all storage locations. Systematically remove all agent credentials, API keys, certificates, and permissions from every system the agent previously interacted with. The final steps are comprehensive documentation of the decommissioning process for audit purposes and a post-decommissioning security assessment to confirm that no residual access or data remains.

Securing agentic AI is not a one-time control but a continuous discipline, spanning design, deployment, operation, maintenance, and decommissioning.

AWS Services for Agentic AI Risk Mitigation

AWS provides the Agentic AI Security Scoping Matrix [3], a framework for securing autonomous AI systems. Based on an application's scope, it helps identify the appropriate security controls.

Amazon Bedrock and Amazon Bedrock AgentCore

Amazon Bedrock Guardrails help filter content by defining custom boundaries for AI agent behavior and responses.
Amazon Bedrock AgentCore Identity secures AI agents by managing their access to AWS resources and third-party services. It uses authentication protocols such as OAuth 2.0 and provides secure credential storage for OAuth tokens and API keys, ensuring that only authorized agents can access specific resources.
Amazon Bedrock AgentCore tracks agent operations in real time through its identity and observability capabilities, enabling rapid troubleshooting and performance optimization for reliable execution.
Amazon Bedrock helps track the entire AI decision-making chain for audit purposes.
Systematically test models for hallucinations and security vulnerabilities.

Amazon SageMaker

Amazon SageMaker Model Monitor helps detect drift in model performance that may indicate compromise.
Amazon SageMaker Clarify helps detect bias and explain predictions in machine learning models.
Amazon SageMaker Model Cards help document model characteristics and limitations to improve risk assessment.

AWS IAM (Identity and Access Management)

AWS IAM supports fine-grained permission policies for AI agent actions.
Service control policies (SCPs) in AWS Organizations complement IAM by capping the permissions available to agents in an account.
AWS IAM permission boundaries help prevent privilege escalation.
AWS IAM offers role-based access control for different AI agent functions.

AWS KMS

Encrypt sensitive data used or generated by AI agents.

Amazon Macie

Automatically discover and classify sensitive data, such as PII, financial data, and intellectual property.
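The AWS IAM items above mention permission boundaries for preventing privilege escalation. The simplified Python model below shows the core idea: with a boundary attached, an agent's role can perform an action only if both its identity-based policy and the boundary allow it, so a later, overly broad policy cannot widen its reach. The policies are hypothetical, and this is a teaching sketch, not an IAM policy evaluator; it ignores explicit denies, resource-based policies, SCPs, session policies, and conditions.

```python
from fnmatch import fnmatchcase
from typing import List


def action_matches(action: str, patterns: List[str]) -> bool:
    """IAM-style action match: case-insensitive, '*' matches any run of characters."""
    action = action.lower()
    return any(fnmatchcase(action, pattern.lower()) for pattern in patterns)


def effectively_allowed(action: str, identity_allows: List[str],
                        boundary_allows: List[str]) -> bool:
    """Simplified model: the action must be allowed by BOTH the identity-based
    policy and the permissions boundary. Explicit denies, resource-based
    policies, SCPs, session policies, and conditions are ignored here."""
    return action_matches(action, identity_allows) and action_matches(action, boundary_allows)


# Hypothetical agent role whose identity policy is broader than intended.
agent_policy = ["s3:*", "dynamodb:GetItem", "iam:PassRole"]
# Hypothetical boundary: the most this agent may ever do, whatever policies get attached later.
agent_boundary = ["s3:GetObject", "s3:PutObject", "dynamodb:GetItem"]

for action in ["s3:GetObject", "s3:DeleteBucket", "iam:PassRole", "dynamodb:PutItem"]:
    verdict = "allowed" if effectively_allowed(action, agent_policy, agent_boundary) else "denied"
    print(f"{action}: {verdict}")
```

Only s3:GetObject survives: iam:PassRole, a classic escalation path, is granted by the broad policy but blocked by the boundary.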
Amazon GuardDuty

Machine-learning-powered threat detection that identifies abnormal agent behavior.
Continuous monitoring to identify potential security issues.
Integration with AWS Lambda for automated remediation.

AWS CloudTrail

Comprehensive API logging for all agent interactions.
Immutable audit trails for compliance and investigation.
Insights to detect unusual patterns in agent behavior.

Amazon CloudWatch

Real-time monitoring of agent activities and resource usage.
Anomaly detection to identify potential security incidents.
Custom dashboards for security operations visibility.

AWS Config

Track configuration changes in AI systems.
Enforce compliance rules for agent deployments.
Assess overall security posture continuously.

Amazon Inspector

Vulnerability assessment for AI infrastructure.
Security assessment of the environments where agents operate.

AWS Security Hub

A centralized view of security alerts across AI systems.
Compliance checks against security standards.
Integration with other security services for comprehensive protection.

Conclusion

As agentic AI applications and their use grow, it is critical to adapt to and address the emerging challenges. Organizations will need to build systems and cultivate a decision-making culture that prioritizes proactive security over a reactive posture focused only on defending against known attacks. AWS provides a comprehensive suite of services that help build secure foundations for AI initiatives. By understanding the unique security challenges that autonomous AI systems pose and implementing appropriate safeguards, organizations can harness their power while minimizing the associated risks.
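The Tools and API Misuse section recommends limiting API usage, and the Deployment Phase calls for rate limiting. One common way to do this is a token bucket per agent identity, sketched below in Python. The ToolGateway class, the agent IDs, the limits, and the FakeClock used to make the example deterministic are all hypothetical; the same enforcement could equally sit in an API gateway rather than in application code.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Token-bucket limiter: holds up to `capacity` tokens, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class ToolGateway:
    """Hypothetical choke point that every agent tool call must pass through."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def call(self, agent_id: str, tool: Callable, *args):
        if agent_id not in self.buckets:  # one budget per agent identity
            self.buckets[agent_id] = TokenBucket(self.rate, self.capacity, self.clock)
        if not self.buckets[agent_id].allow():
            raise RuntimeError(f"rate limit exceeded for {agent_id}")
        return tool(*args)


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


def lookup(query: str) -> str:
    """Stand-in for a real tool such as a database or search API."""
    return f"result for {query}"


clock = FakeClock()
gateway = ToolGateway(rate=1.0, capacity=3, clock=clock)  # burst of 3, then 1 call/second

outcomes = []
for _ in range(5):  # a runaway loop fires 5 calls at the same instant
    try:
        outcomes.append(gateway.call("agent-42", lookup, "orders"))
    except RuntimeError:
        outcomes.append("blocked")
print(outcomes)
```

The bucket allows short bursts (three calls here) but caps sustained throughput at the refill rate, which is what stops a looping agent from hammering a downstream API.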
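For the Agent-to-Agent section's advice to verify agent identities and use secure messaging, one minimal building block is message authentication. The Python sketch below signs each inter-agent message with HMAC-SHA256 over a canonical JSON envelope and rejects unknown senders, tampered payloads, stale timestamps, and replayed nonces. The envelope format, key registry, and 60-second freshness window are assumptions; shared symmetric keys are only one option, and mutual TLS or signed tokens issued by an identity service are common alternatives.

```python
import hashlib
import hmac
import json
import secrets
import time
from typing import Any, Dict, Optional


def canonical(envelope: Dict[str, Any]) -> bytes:
    """Deterministic JSON encoding so sender and receiver sign identical bytes."""
    return json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode()


def sign_message(sender: str, payload: Dict[str, Any], key: bytes,
                 now: Optional[float] = None) -> Dict[str, Any]:
    envelope = {
        "sender": sender,
        "payload": payload,
        "ts": int(time.time() if now is None else now),
        "nonce": secrets.token_hex(8),  # unique per message, enables replay detection
    }
    return {"envelope": envelope,
            "sig": hmac.new(key, canonical(envelope), hashlib.sha256).hexdigest()}


class Verifier:
    def __init__(self, keys: Dict[str, bytes], max_age_s: int = 60):
        self.keys = keys
        self.max_age_s = max_age_s
        self.seen_nonces = set()

    def verify(self, message: Dict[str, Any], now: Optional[float] = None) -> bool:
        env = message["envelope"]
        key = self.keys.get(env["sender"])
        if key is None:
            return False  # unknown agent identity
        expected = hmac.new(key, canonical(env), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, message["sig"]):
            return False  # forged or tampered message
        current = time.time() if now is None else now
        if abs(current - env["ts"]) > self.max_age_s:
            return False  # stale message
        if env["nonce"] in self.seen_nonces:
            return False  # replayed message
        self.seen_nonces.add(env["nonce"])
        return True


# Hypothetical key registry; real keys would live in a managed secret store.
KEYS = {"planner-agent": secrets.token_bytes(32)}
verifier = Verifier(KEYS)

msg = sign_message("planner-agent", {"action": "refund", "order": "A-1001"}, KEYS["planner-agent"])
print(verifier.verify(msg))  # first delivery: accepted (True)
print(verifier.verify(msg))  # same nonce again: rejected as a replay (False)
```

Because the verifier checks the signature against the key registered for the claimed sender, a cloned "evil twin" that lacks the real key cannot pass as the legitimate agent.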

In today's competitive environment for enterprise cloud migrations, traditional proposal development methods no longer meet the demanding timelines and accuracy requirements of modern businesses. As solution architects at Tech Mahindra, we have used AWS Transform across multiple RFX proposals to simplify and speed up our migration proposal process. This blog explores how we adapted our approach when a customer needed a quick, directional business case for their cloud migration journey. The challenge was to deliver fast, actionable insights that would help the customer understand the value proposition of cloud migration within a compressed timeline. The solution helped set new standards for efficiency and precision in cloud migration planning while meeting the urgent needs of our customer's decision-making process.

Understanding the Cloud Migration Journey

Cloud migration proposals traditionally require analysis of existing infrastructure, technical assessments, and business case development. Our team faced the challenge of responding to an RFP to migrate more than 2,000 servers across three data centers to AWS. The client required directional cost projections, instance-sizing recommendations, licensing strategies, TCO analysis, and migration timelines: all elements that traditionally needed weeks of analysis and the involvement of multiple team members.

Challenges in the Traditional Approach

The conventional proposal development process was often inefficient and risky. Teams spent two to three weeks conducting manual analyses, wrestling with complex spreadsheet calculations, and making numerous assumptions. This approach not only consumed valuable resources but also introduced the risk of human error in calculations. Generic TCO calculators based on industry averages often failed to capture client-specific nuances, making it difficult to build business cases that resonate with stakeholders.

While tool-based approaches offer significant advantages in streamlining these processes, their success hinges on effective coordination across the organization's technology landscape. This requires close coordination across multiple IT teams, including application owners, database administrators, security teams, and network specialists, each managing different aspects of the infrastructure. Each group typically maintains its own tools, methodologies, and approval processes, creating a complex web of dependencies that can affect data collection timelines. Varying security protocols and access restrictions across departments add further complexity to the assessment phase, so it is crucial to establish clear communication channels and data collection protocols from the project's outset.

Implementing AWS Transform

To overcome the challenges of the traditional approach, AWS Transform provides a framework that combines assessment methodologies, proven migration strategies, and business case development capabilities to accelerate the proposal process. It delivers data-driven strategic plans that reflect a thorough understanding of the client's environment and outline practical pathways to effective cloud adoption.

AWS Transform helps in the following areas:

AI-Powered Analysis for Infrastructure

We used AWS Transform to generate migration assessments from data collected from on-premises environments. It also helped us estimate the cost of running the on-premises servers on AWS.

To automate the assessment, we create an AWS Transform job and upload the on-premises server data as an AWS MPA .xlsx file or an RVTools export (max 10 MB, up to 30,000 servers per assessment). We then specify either a single region or a multi-region deployment for the assessment.

Data Analysis and Refinement

AWS Transform Assessment lets us describe detailed workload characteristics in natural language. The process takes a detailed look at multi-Availability Zone strategies, multi-region architecture options, disaster recovery needs, and compliance requirements, resulting in migration proposals that address resilience needs with technically sound, cost-effective solutions. In our case, AWS Transform Assessment supported the proposal with rightsizing, cost modeling, and migration strategies for each workload. Recommendations provided by AWS Transform include:

EC2 Instance Mapping: AWS Transform Assessment uses AI algorithms to analyze current server configurations and usage patterns, automatically identifying optimal EC2 instance types based on workload characteristics. It maps on-premises servers to right-sized EC2 instances using performance metrics rather than specifications alone, and recommends specialized instance types, such as compute-optimized or memory-optimized instances.
Storage Optimization: The solution analyzes current storage utilization, access patterns, and performance requirements, and recommends appropriate storage services (such as EBS, S3, EFS, and FSx) based on workload requirements. It also identifies opportunities for storage tiering to optimize costs and calculates potential savings from migrating to specific AWS storage configurations.
License Mobility Assessment: The solution identifies existing software licenses that can be transferred to AWS and analyzes BYOL (Bring Your Own License) opportunities versus AWS license-included options. It provides a cost comparison between license mobility scenarios and new licensing models, along with recommendations for optimizing licenses during migration.
Reserved Instance Opportunities: AWS Transform uses AI-driven forecasting to identify stable workload components suitable for Reserved Instances (RIs) and analyzes potential savings across different RI term lengths and payment options. It also identifies workload patterns that could benefit from Savings Plans and provides recommendations for commitment-based discounts.

Developing a Data-Driven Business Case

AWS Transform helped us create the business case within days by automating 80% of the calculations. The business case is grounded in an understanding of the current on-premises environment and provides accurate TCO projections. It also identified license mobility opportunities, RI benefits, and storage optimization options. The assessment revealed immediate cost-saving opportunities that still meet performance requirements.

The assessment further strengthens the business case through customized RI strategies that analyze workload stability patterns and usage consistency to recommend commitment terms, payment options, and coverage levels that balance upfront investment against long-term savings. Together, these elements create a compelling financial narrative that demonstrates both immediate cost reductions and long-term optimization opportunities. The AWS migration business case generated by AWS Transform gives decision-makers a clear, data-validated economic rationale for cloud migration that addresses both capital expense reduction and operational cost optimization. Automated generation of business cases with both hard and soft benefits makes our migration proposals more persuasive.

Table 1 presents a three-year TCO comparison for different AWS pricing and tenancy options, showing total costs, potential savings versus on-demand pricing, and annual costs. This analysis helps stakeholders understand the financial impact of various licensing and instance strategies and compare cost-optimization opportunities.

Table 1: 3-Year TCO Comparison

Pricing Option | 3-Year Total Cost | Savings vs. On-Demand | Annual Cost
On-Demand (Shared Tenancy) | $730,513.02 | - | $243,504.34
1-Year NURI (Shared Tenancy) | $567,361.56 | 22.33% | $189,120.52
1-Year NURI (Mixed Tenancy, Windows Server pre-2022 BYOL) | $1,080,290.91 | -47.88% | $360,096.97
3-Year NURI (Shared Tenancy) | $478,262.04 | 34.53% | $159,420.68

NURI: No Upfront Reserved Instances. A negative savings value means the option costs more than On-Demand.

Key Benefits in the RFP Process Observed by TechM

Reduced proposal development time by approximately 60% using AWS Transform.
Reduced analysis time from two to three weeks to five days, with 80% of technical calculations automated.
Eliminated manual calculation errors and generated consistent recommendations across workloads.
Provided data-driven recommendations for rightsizing, BYOL, and the optimal RI mix.
Delivered more consistent quality across proposals.
Freed up team members to focus on the proposal narrative and to support more clients simultaneously.

Conclusion

AWS Transform accelerated TechM's responses to client migration proposals, enabling teams to deliver higher-quality recommendations in less time. AWS Transform Assessment delivers comprehensive business case justification through detailed cost breakdowns that illuminate financial implications across multiple dimensions. Its recommendations help avoid overprovisioning by matching instance types to actual workload requirements. By leveraging AWS Transform, TechM solution architects now help clients realize the full benefits of the cloud, including increased scalability, cost savings, and enhanced security.
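The figures in Table 1 can be re-derived from the 3-year totals alone: savings are (On-Demand total - option total) / On-Demand total, and annual cost is the 3-year total divided by three. The short Python check below reproduces the table's percentages and annual costs; it is an arithmetic cross-check, not part of AWS Transform.

```python
from typing import Dict, List, Tuple

# 3-year totals (USD) copied from Table 1.
three_year_totals: Dict[str, float] = {
    "On-Demand (Shared Tenancy)": 730_513.02,
    "1-Year NURI (Shared Tenancy)": 567_361.56,
    "1-Year NURI (Mixed Tenancy, Windows Server pre-2022 BYOL)": 1_080_290.91,
    "3-Year NURI (Shared Tenancy)": 478_262.04,
}


def tco_summary(totals: Dict[str, float], baseline: str,
                years: int = 3) -> List[Tuple[str, float, float, float]]:
    """Savings % vs. the baseline option (negative means costlier) and average annual cost."""
    base = totals[baseline]
    rows = []
    for option, total in totals.items():
        savings_pct = (base - total) / base * 100
        rows.append((option, total, round(savings_pct, 2), round(total / years, 2)))
    return rows


for option, total, savings, annual in tco_summary(three_year_totals, "On-Demand (Shared Tenancy)"):
    print(f"{option}: ${total:,.2f} over 3 years, {savings:+.2f}% vs. On-Demand, ${annual:,.2f}/year")
```

The check makes the BYOL row's negative savings easy to see: that option costs about 48% more than On-Demand over three years.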
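The EC2 Instance Mapping recommendation rests on a simple principle: size to observed usage, not to allocated specifications. The Python sketch below illustrates that principle only; it is not AWS Transform's algorithm, which this article does not describe. The 30% headroom, the p95 utilization inputs, and the seven-entry catalog (real EC2 vCPU and memory sizes, but no prices) are assumptions, and picking the smallest fitting size stands in for picking the cheapest.

```python
import math
from typing import List, NamedTuple, Optional


class InstanceType(NamedTuple):
    name: str
    vcpu: int
    mem_gib: float


# Small hand-picked subset of EC2 types with their published vCPU / memory sizes.
CATALOG: List[InstanceType] = [
    InstanceType("c5.large", 2, 4), InstanceType("m5.large", 2, 8),
    InstanceType("r5.large", 2, 16), InstanceType("c5.xlarge", 4, 8),
    InstanceType("m5.xlarge", 4, 16), InstanceType("r5.xlarge", 4, 32),
    InstanceType("m5.2xlarge", 8, 32),
]


def rightsize(vcpus: int, mem_gib: float, p95_cpu: float, p95_mem: float,
              headroom: float = 1.3,
              catalog: List[InstanceType] = CATALOG) -> Optional[InstanceType]:
    """Pick the smallest catalog entry that covers observed p95 usage plus headroom.
    p95_cpu / p95_mem are fractions (0-1) of the server's allocated vCPU / memory."""
    need_cpu = max(1, math.ceil(vcpus * p95_cpu * headroom))
    need_mem = mem_gib * p95_mem * headroom
    fits = [t for t in catalog if t.vcpu >= need_cpu and t.mem_gib >= need_mem]
    # Size ordering is only a proxy for price; a real tool would compare actual rates.
    return min(fits, key=lambda t: (t.vcpu, t.mem_gib)) if fits else None


# A hypothetical on-prem VM allocated 8 vCPU / 32 GiB but lightly used on CPU.
print(rightsize(8, 32, p95_cpu=0.20, p95_mem=0.45).name)  # prints: r5.xlarge
```

A like-for-like mapping would have chosen an 8-vCPU instance; sizing to observed usage halves the vCPU count while keeping enough memory, which is where the memory-optimized family comes in.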

The technology sector regularly experiences mergers, acquisitions, and divestitures. These shifts can alter IT strategies and infrastructure budgets. For instance, Broadcom's 2023 acquisition of VMware had a significant impact on virtualization and cloud computing, prompting organizations to reassess their IT plans and vendor partnerships.

Understanding the Impact: What's Changed?

Broadcom's acquisition of VMware and the changes that followed have affected many aspects of IT operations and budgets. I have summarized the impact into six key themes.

Subscription-Based Licensing: VMware has moved from a one-time purchase model to a subscription model. This shift has increased license costs for many customers, in some cases to as much as 7 times their current budget.

Simplification of Product Portfolio: VMware has consolidated its extensive, modular product suite into four bundles: VMware Cloud Foundation (VCF), VMware vSphere Foundation (VVF), VMware Standard, and VMware Essentials. This limits customers' flexibility to buy only the standalone products they need, potentially leading them to purchase more costly bundles with unnecessary features.

Core-based Licensing Metric: VMware has transitioned from socket-based licensing to a core-based licensing model, with a minimum purchase of 72 cores per product instance effective April 2025, up from the previous minimum of 32 cores. This transition has notably increased costs for small and medium-sized businesses (SMBs) and for customers running high-core-density workloads.

Service Provider Impact: Broadcom has restructured VMware's channel ecosystem, excluding several existing partners.
Consequently, customers must explore alternative partnerships and establish new agreements with the smaller set of remaining partners.

Support to Enterprises: There are concerns about declining support quality and longer resolution times, reportedly due to internal restructuring within VMware, which affects business uptime for VMware workloads.

Sunsetting of Products: Several products, such as VMware Horizon, VMware on Cloud (AWS/Azure), and VxRail, are being divested, de-emphasized, or phased out. This has created urgency for organizations to revisit their hybrid cloud strategies and evaluate alternative solutions that reduce vendor lock-in.

Assessment: Evaluating Financial and Technical Implications

Organizations need to understand their current VMware dependencies and assess the financial and technical impact of these changes.

Figure 1: Assessment Areas

Plan Roadmap: Strategizing for Reduced VMware Dependency

Organizations must build a balanced roadmap that addresses risk, preserves flexibility, and reduces the cost impact.

Evaluate Virtualization Alternatives: Consider enterprise-grade virtualization options such as Azure Stack, AWS Outposts, Nutanix AHV, Red Hat KVM, and OpenStack. Conduct proofs of concept (POCs) or pilot programs early to assess feasibility.
Adopt a Hybrid Cloud Strategy: Migrate workloads to hyperscalers such as AWS, Azure, GCP, or OCI using an IaaS/PaaS model.
Adopt Containerization: Transition suitable workloads to containerized environments using platforms such as Kubernetes.
Upskill Teams: Invest in training programs that equip IT teams to manage new platforms and technologies effectively.
Leverage Strategic Partners: Engage IT service providers experienced in such programs for assessment and migration.

How Are We Engaging with Our Customers?

At Tech Mahindra, we are helping our customers navigate these challenges.
Our VMware Exit and Hybrid Cloud Advisory Services include:

Assessment of current VMware usage, licensing, and spend
Identification and benchmarking of viable alternatives
Definition of a phased migration roadmap aligned with business priorities
Setup of a Center of Excellence (CoE) and governance to manage the change

Our goal is to reduce technical risk, minimize the cost impact, and provide flexibility without disrupting your business operations.

The acquisition has forced enterprises to revisit their IT roadmaps and hybrid cloud strategies. By planning ahead and evaluating alternatives, organizations can navigate this transition effectively, improving resilience and reducing the budget impact on their IT operations.
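The 72-core minimum described under the licensing-metric theme above is easiest to see with a small worked example. Below is a minimal Python sketch. It assumes the minimum acts as a floor per product instance, so billable cores = max(actual cores, minimum). The deployment sizes and the helper names `billable_cores` and `minimum_uplift` are hypothetical. Per-core prices are left out because the text does not give them. The example shows why the change weighs most heavily on small estates: a 32-core deployment is billed for 72 cores, 2.25 times the old minimum, while a large estate is unaffected by the floor.

```python
# Illustrative only: how a per-instance core minimum changes the billable core count.
# The 32- and 72-core minimums come from the text; treating the minimum as a floor
# (billable = max(actual, minimum)) and the deployment sizes below are assumptions.
OLD_MINIMUM_CORES = 32
NEW_MINIMUM_CORES = 72  # effective April 2025, per the text


def billable_cores(physical_cores: int, minimum: int) -> int:
    """Cores charged for one product instance: the actual count, floored at the minimum."""
    if physical_cores < 0:
        raise ValueError("physical_cores must be non-negative")
    return max(physical_cores, minimum)


def minimum_uplift(physical_cores: int) -> float:
    """Billable cores under the new minimum divided by billable cores under the old one."""
    return billable_cores(physical_cores, NEW_MINIMUM_CORES) / billable_cores(
        physical_cores, OLD_MINIMUM_CORES
    )


if __name__ == "__main__":
    # Hypothetical deployments, from a small SMB cluster to a larger estate.
    for label, cores in [("small SMB cluster", 32), ("mid-size cluster", 48), ("large estate", 256)]:
        old = billable_cores(cores, OLD_MINIMUM_CORES)
        new = billable_cores(cores, NEW_MINIMUM_CORES)
        print(f"{label}: {cores} cores -> billed {old} (old) vs {new} (new), {minimum_uplift(cores):.2f}x")
```

Multiplying the billable core counts by a per-core subscription price would turn these ratios into cost estimates for a specific quote.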
