Sharan Chenna
Site Reliability Engineer
sharan@sre:~$ curl -s -O https://www.sharanch.dev/assets/Sharan_SRE_DevOps_Resume.pdf
sharan@sre:~$ cat summary.txt
Results-driven Site Reliability Engineer with 4 years of experience in managing large-scale distributed systems. Specializing in improving system reliability, implementing robust monitoring and alerting solutions, and automating operational tasks. Skilled in incident response, performance optimization, and fostering a culture of blameless postmortems.
sharan@sre:~$ ls -l work-experience
Cloud Operations Engineer | OCI Compute (Control & Data Plane)
Oracle (Feb 2024 - Sep 2025)
- > Hypervisor Fleet Management: Managed lifecycle for a fleet of 8,000+ KVM/libvirt hypervisors, utilizing automated health checks to ensure 99.9% availability for critical OCI compute services.
- > Infrastructure Automation: Automated provisioning for thousands of nodes using Terraform and Ansible, reducing manual configuration time by 40% and ensuring consistent state enforcement across global regions.
- > Observability & Monitoring: Engineered comprehensive Grafana dashboards using custom telemetry, reducing Mean Time to Detection (MTTD) for critical incidents by 25% through proactive alerting.
- > Incident Management: Led troubleshooting for high-severity incidents, improving SLA compliance to 99.9%+ by standardizing root cause analysis (RCA) and reducing recurring issues by 20%.
- > Process Improvement: Developed automated JIRA dashboards and runbooks, reducing on-call administrative toil by 30% and streamlining the incident tracking lifecycle.
- > CI/CD Pipeline Optimization: Implemented robust CI/CD pipelines via OCI Build, increasing deployment frequency by 2x while maintaining a 0% failure rate in production during peak traffic windows.
- > Pre-Production Validation: Executed rigorous region-based testing strategies to validate new features, catching 15+ critical bugs per quarter before production promotion.
- > Security & Compliance: Enforced security best practices across infrastructure by implementing granular access controls, achieving 100% compliance with internal security audits and regulatory standards.
- > Collaboration & Knowledge Sharing: Worked closely with development, operations, and support teams to share insights, document best practices, and improve incident response processes, fostering a culture of reliability and continuous learning.
Associate Engineer | Linux COE
CtrlS Datacenters (Apr 2021 - Dec 2023)
- > Multi-Cloud Infrastructure: Provisioned and managed a hybrid fleet of 2,500+ servers (RHEL, CentOS, SUSE, AIX, Ubuntu) across on-premises and multi-cloud environments, maintaining 99.9% system availability.
- > Virtualization Management: Administered enterprise virtualization clusters using VMware, Hyper-V, and Nutanix, optimizing resource allocation for virtual instances and Network Attached Storage (NAS) to reduce hardware overhead by 15%.
- > High Availability & Web Serving: Configured SUSE HA clusters and high-traffic web servers (Apache, Nginx) with NIC Bonding and automated SSL renewal, ensuring zero downtime and robust network resilience.
- > System Automation: Developed advanced Bash and Python scripts to automate routine administrative tasks and user provisioning, reducing manual operational toil by 40%.
- > Identity & Access Management: Integrated Active Directory for centralized user management and automated sudo privilege auditing, enforcing strict least-privilege access and enhancing audit readiness.
- > Security & Compliance: Led critical Patch Management cycles and server hardening initiatives, achieving 100% compliance with security frameworks and resolving high-severity vulnerabilities ahead of SLA deadlines.
- > Monitoring & Diagnostics: Architected a centralized Zabbix monitoring solution, automating agent deployment across the fleet to achieve 100% infrastructure visibility and reduce incident detection time by 30%.
- > Database Administration: Deployed and optimized MySQL and MongoDB Master-Slave architectures, tuning queries to improve database performance and reliability for business-critical applications.
- > Technical Account Management (TAM): Served as the Technical SPOC for key enterprise accounts, driving resolution for complex technical issues and improving Customer Satisfaction (CSAT) scores by 20% through proactive service management.
- > ITIL Process Management: Championed ITIL best practices for Incident, Change, and Problem management, ensuring 95%+ adherence to SLAs for all critical service requests.
sharan@sre:~$ cat skills.txt
Infrastructure Automation
- > Terraform, Ansible, Chef
- > Docker, Kubernetes, OCI Build
- > AWS CLI, OCI, GitLab CI/CD
- > TeamCity, Jenkins (CI/CD pipelines)
Observability & Monitoring
- > Prometheus, Grafana, Alertmanager
- > OpenTelemetry, Blackbox Exporter
- > Zabbix, Node Exporter
- > Log analysis, RCA tooling
Scripting & Ops Engineering
- > Python, Bash, SaltStack
- > Disk usage analysis, batch ops
- > Custom CLI tooling for on-call
- > Incident coordination & postmortems
sharan@sre:~$ ls -l projects/
Monitoring Stack Deployment
Proof-of-concept project that uses Terraform with the AWS provider for infrastructure provisioning, Ansible for automation, Docker for portability, Prometheus (PromQL) and Node Exporter for metrics, and Grafana for dashboards.
Jenkins CI/CD
Proof-of-concept project that uses Jenkins for CI/CD. It deploys a Flask web app and uses GitHub webhook payloads to trigger the pipeline.
sharan@sre:~$ ls social
I'm currently open to new opportunities. Whether you have a question or just want to say hi, my inbox is always open.
Say Hello