Job #: 1790
Title: Site Reliability Engineer – Nashville, TN
Cloud Site Reliability Engineer
Your role will require you to:
– ensure production systems run reliably at all times and that availability, performance and business process SLAs are met or exceeded
– manage cloud services spanning storage, security, networking and compute
– spend 50% of your time on operational activities and 50% on engineering improvements to instrumentation, ease of deployment, service orchestration and other aspects of production support, reducing the burden of manual work as systems and user volumes scale
– take responsibility for all aspects of application production support, deployment and monitoring and develop tools to support these activities
– support mission critical applications and associated platforms, ensuring the highest levels of availability, security, performance and stability are maintained at all times
– design and build tools and solutions with a strong bias towards automating as many aspects of support as possible to reduce or eliminate trivial support activities
– ensure newly deployed systems and services can be integrated into the existing monitoring and management tools, so that service performance is instrumented and deviations are easily anticipated
Cloud technologies bring a great deal of technical agility, which we are actively embracing. As such, you’ll be working as a Cloud Site Reliability Engineer in the newly formed Cloud Operations Support team in one of our global hubs. In partnership with your global colleagues, you’ll provide follow-the-sun cloud reliability support to enhance our customers’ experience. Microsoft technologies are the primary technologies used in this role.
Your experience and skills
– strong expertise in Microsoft Azure; your addition is expected to “raise the bar” for our team’s Azure capabilities
– significant development and operations / engineering experience with the ability to apply that knowledge in order to solve complex problems
– subject matter expertise in Azure Resource Manager, Azure Monitor, Alerts, Security Center, Azure DevOps, Azure Policy, RBAC and application source code in languages such as Java, C++ or C#
– a blend of skills, including sysadmin, security, automation and coding, with strong knowledge of operating systems and application source code, expert-level ARM templating, container fabrics, networking, alerting and monitoring
– a thorough understanding of each service across the full IT lifecycle, and readiness to take requests for infrastructure services, applications and environments
– experience designing solutions (monitoring, process orchestration, capacity management, deployment) that can scale and potentially be reused by other parts of the organization
– hands-on experience working with both Agile and DevOps development methodologies
– Expect to demonstrate these capabilities through our selection process, which will include technical tests, peer interviews and client interviews.
– confident interacting with developers and deep-diving into both application and infrastructure code
– willing to challenge the status quo and introduce new ideas that remove or reduce the manual effort of operating large production systems at scale
– resolute, pragmatic, articulate and determined
– able to work to tight deadlines in high-pressure environments
– a skilled communicator, able to explain complex technical issues and resolutions