Principal Site Reliability Engineer

7 days ago


Lima, Perú Groupon Full-time

Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. In a world often dominated by e-commerce giants, we stand out as one of the few platforms uniquely committed to helping local businesses succeed on a performance basis.

Groupon is on a radical journey to transform our business with a relentless pursuit of results. Even with thousands of employees spread across multiple continents, we still maintain a culture that inspires innovation, rewards risk-taking, and celebrates success. The impact here can be immediate due to our scale and the speed of our transformation. We're a "best of both worlds" kind of company: big enough to have the resources and scale, but small enough that a single person has a surprising amount of autonomy and can make a meaningful impact.

**Principal Site Reliability Engineer**

**Role Overview**: Are you ready to take your expertise to the next level and make a meaningful impact on the reliability and scalability of mission-critical systems? As a Principal Site Reliability Engineer (SRE Level V/VI), you will play a central role in ensuring the performance, availability, and resilience of our platforms. In this position, you will go beyond maintaining systems by leading initiatives that redefine operational excellence. You will collaborate with diverse teams to implement cutting-edge technologies and best practices, foster a culture of reliability, and mentor others in their growth as engineers. This is an exceptional opportunity for someone passionate about solving complex challenges and shaping the future of platform reliability in a high-impact role.

**Key Responsibilities**:
- Architect and maintain fault-tolerant systems, ensuring uptime SLAs of 99.9% or higher.
- Drive automation in infrastructure management and deployment using Terraform, Ansible, Kubernetes, and similar tools.
- Create and optimize CI/CD pipelines to ensure reliable, secure, and efficient software delivery.
- Build and enhance comprehensive observability solutions, including monitoring, logging, and alerting systems using Prometheus, Grafana, and the ELK stack.
- Collaborate with stakeholders to define and achieve SLIs, SLOs, and error budgets aligned with business needs.
- Lead incident response during on-call rotations, ensuring rapid resolution and root cause analysis for critical issues.
- Design and execute performance testing, capacity planning, and scalability strategies for evolving workloads.
- Proactively identify and resolve bottlenecks, increasing system performance and developer efficiency.
- Mentor junior engineers, fostering a collaborative and growth-oriented team environment.
- Guide architectural decisions that drive innovation and enhance system reliability.

**Qualifications**:
- 10+ years in systems engineering, with at least 5 years in SRE or DevOps roles.
- Expertise in cloud platforms (GCP, AWS) and container orchestration (Kubernetes, Docker).
- Proficiency in programming and scripting languages like Python, Go, and Bash.
- Advanced knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible.
- Deep understanding of networking, DNS, load balancing, and security principles.
- Proven track record of managing high-availability systems in demanding environments.
- Exceptional analytical and problem-solving skills.

**Preferred Qualifications**:
- Certifications in cloud or container technologies (e.g., AWS/GCP/Azure, Kubernetes CKA).
- Experience in industries like eCommerce, FinTech, or SaaS.
- Familiarity with Agile development processes and frameworks.

**What We Offer**:
- The opportunity to work with cutting-edge technologies in a transformative environment.
- A collaborative and innovative work culture that values your expertise and contributions.
- Professional growth and leadership development pathways tailored to your aspirations.
- A chance to leave a lasting impact by shaping the future of reliable and scalable systems.

Join us to push the boundaries of platform reliability and drive meaningful change in a fast-evolving digital world.



  • Lima, Perú Careers at SunDevs Full-time

    **Job description**: As a Site Reliability Engineer at SunDevs, you will collaborate with other senior software engineers and Platform Engineers to design and develop highly available, scalable, secure, and maintainable cloud systems and platforms that solve major challenges. You will provide advice and guidance to our engineers...


  • Lima Metropolitan Area, Perú OpenLoop Full-time

    OpenLoop is looking for a Senior Site Reliability Engineer to join our team in Lima, Peru. About the Role. Cross-Functional Collaboration: Partner with engineering teams to improve system reliability and deployment practices. Engage with teams on SRE guidelines and best practices for automation and infrastructure. Work with security teams to implement secure,...


  • Lima, Perú Canonical - Jobs Full-time

    **Site Reliability Engineer**: To become a member of this team, you need to be a software engineer fluent in Python, you need a genuine interest in the full open source infrastructure stack from metal to containers, and you need the ability to work in a high-pressure operations environment with mission-critical services for global brand-name customers. As a...


  • Lima, Perú Scotiabank Full-time

    Hello! We congratulate you on and value your interest in continuing to grow within the Scotiabank Group. We are looking for talent who will bring their knowledge and experience to the position and, above all, OPTIMISM. **Purpose**: As a member of the Global Systems Reliability team, the Global System Reliability Engineer (SRE) will work in collaboration...


  • Lima, Perú Willis Towers Watson Full-time

    **The Role** We are a group of passionate engineers who have built the largest private Medicare marketplace in the United States. We focus on the continuous improvement of our systems and culture. We improve and maintain a platform that provides the best possible experience to shop for insurance plans, and allows our insurance carriers to be confident...


  • Lima Metropolitan Area, Perú Nearsure Full-time

    Explore the Nearsure experience. Join our close-knit LATAM remote team: Connect through fun activities like coffee breaks, tech talks, and games with your teammates and management. Say goodbye to micromanagement: We champion autonomy, open communication, and respect for diversity as our core values. Your well-being matters: Our People Care team is here from day...


  • Lima, Perú Hunt Consolidated, Inc. Full-time

    **ROLES AND RESPONSIBILITIES**: - Monitoring and calculation of reliability KPIs (RAM, MTBF, etc.). - Analyze predictive alerts from machine learning software (for Rotating and Mechanical assets). - Identify threats and opportunities for Plant production and manage them in the MTO (Mitigate Threats and Opportunities) process. - Analyze data and perform reliability...


  • Lima, Perú Groupon Full-time

    Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. In a world often dominated by e-commerce giants, we stand out as one of the few platforms...

  • Network Site Engineer

    4 days ago


    Lima, Perú Tech Source Managed Services Full-time

    **Role Description** This is a part-time on-site role for a Network Support Engineer located in Peru. The Network Support Engineer will be responsible for network administration, network engineering, technical support, troubleshooting, and network security. **Qualifications** - Network Administration and Network Engineering skills - Technical Support and...


  • Lima, Perú DIGITALHUB SAC Full-time

    **DIGITALHUB** is a Peruvian outsourcing company providing **BPO and IT services.** Our vision is a future in which every person can find the best job and our partners can discover the best of Latin American talent. We are currently looking for a **"Databricks Administrator and Site Reliability Engineer"**...