Principal Site Reliability Engineer
2 days ago
Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. In a world often dominated by e-commerce giants, we stand out as one of the few platforms uniquely committed to helping local businesses succeed on a performance basis. Groupon is on a radical journey to transform our business with a relentless pursuit of results. Even with thousands of employees spread across multiple continents, we still maintain a culture that inspires innovation, rewards risk-taking and celebrates success. The impact here can be immediate due to our scale and the speed of our transformation. We're a "best of both worlds" kind of company. We're big enough to have the resources and scale, but small enough that a single person has a surprising amount of autonomy and can make a meaningful impact.

**Principal Site Reliability Engineer**

**Role Overview**: Are you ready to take your expertise to the next level and make a meaningful impact on the reliability and scalability of mission-critical systems? As a Principal Site Reliability Engineer (SRE Level V/VI), you will play a central role in ensuring the performance, availability, and resilience of our platforms. In this position, you will go beyond maintaining systems by leading initiatives that redefine operational excellence. You will collaborate with diverse teams to implement cutting-edge technologies and best practices, foster a culture of reliability, and mentor others in their growth as engineers. This is an exceptional opportunity for someone passionate about solving complex challenges and shaping the future of platform reliability in a high-impact role.

**Key Responsibilities**:
- Architect and maintain fault-tolerant systems, ensuring uptime SLAs of 99.9% or higher.
- Drive automation in infrastructure management and deployment using Terraform, Ansible, Kubernetes, and similar tools.
- Create and optimize CI/CD pipelines to ensure reliable, secure, and efficient software delivery.
- Build and enhance comprehensive observability solutions, including monitoring, logging, and alerting systems using Prometheus, Grafana, and the ELK stack.
- Collaborate with stakeholders to define and achieve SLIs, SLOs, and error budgets aligned with business needs.
- Lead incident response during on-call rotations, ensuring rapid resolution and root cause analysis for critical issues.
- Design and execute performance testing, capacity planning, and scalability strategies for evolving workloads.
- Proactively identify and resolve bottlenecks, increasing system performance and developer efficiency.
- Mentor junior engineers, fostering a collaborative and growth-oriented team environment.
- Guide architectural decisions that drive innovation and enhance system reliability.

**Qualifications**:
- 10+ years in systems engineering, with at least 5 years in SRE or DevOps roles.
- Expertise in cloud platforms (GCP, AWS) and container orchestration (Kubernetes, Docker).
- Proficiency in programming and scripting languages like Python, Go, and Bash.
- Advanced knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible.
- Deep understanding of networking, DNS, load balancing, and security principles.
- Proven track record of managing high-availability systems in demanding environments.
- Exceptional analytical and problem-solving skills.

**Preferred Qualifications**:
- Certifications in cloud or container technologies (e.g., AWS/GCP/Azure, Kubernetes CKA).
- Experience in industries like eCommerce, FinTech, or SaaS.
- Familiarity with Agile development processes and frameworks.

**What We Offer**:
- The opportunity to work with cutting-edge technologies in a transformative environment.
- A collaborative and innovative work culture that values your expertise and contributions.
- Professional growth and leadership development pathways tailored to your aspirations.
- A chance to leave a lasting impact by shaping the future of reliable and scalable systems.

**Join us to push the boundaries of platform reliability and drive meaningful change in a fast-evolving digital world.**
-
Site Reliability Engineer
18 hours ago
Lima, Perú Rappi Full-time
It is time for you to join us to show the world that we are the company that is coming to change paradigms, where we revolutionize hours, minutes and seconds. Because in Rappi WE SEE OPPORTUNITIES where others see problems. WE SEE CLOSENESS where others see distance. WE SEE ADRENALINE where others see pressure. Join a team where we are all capable of...
-
Senior Site Reliability
2 weeks ago
Lima Metropolitana, Perú Canonical Full-time
Senior Site Reliability / GitOps Engineer
Canonical is a leading provider of open-source software and operating systems to global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science,...
-
Site Reliability Engineer
18 hours ago
Lima, Perú WTW Full-time
We have spent many years growing and fostering a DevOps culture by bridging the divide between our Software and Infrastructure Engineering departments. We want the cross-functional teams that we are building to include Site Reliability Engineers. We operate in a complex, multi-tenant, hybrid cloud and on-premises infrastructure that spans both the Windows...
-
Site Reliability Engineer
2 weeks ago
Lima, Perú Willis Towers Watson Full-time
**Overview** We have spent many years growing and fostering a DevOps culture by bridging the divide between our Software and Infrastructure Engineering departments. Our cross-functional teams include Site Reliability Engineers to help us build, maintain and monitor a complex, multi-tenant, hybrid cloud and on-premises infrastructure that spans both Windows...
-
Senior Site Reliability Engineer
15 hours ago
Lima Metropolitana, Perú OpenLoop Full-time
- Partner with engineering teams to improve system reliability and deployment practices
- Engage with teams on SRE guidelines and best practices for automation and infrastructure
- Work with security teams to implement secure, compliant infrastructure
- Operational Excellence
- Ensure...
-
Site Reliability Engineer
6 days ago
Lima Metropolitana, Perú FullStack Full-time
Site Reliability Engineer - Remote - Latin America
About FullStack: FullStack is the most transparent IT talent network, connecting highly skilled individuals with top global companies and Silicon Valley startups for remote, on-demand projects. We focus...
-
Senior Site Reliability Engineer, Americas
1 week ago
Lima, Perú Canonical - Jobs Full-time
**Site Reliability Engineer**: To become a member of this team, you need to be a software engineer fluent in Python, you need a genuine interest in the full open source infrastructure stack from metal to containers, and you need the ability to work in a high-pressure operations environment with mission-critical services for global brand-name customers. As a...
-
Site Reliability Engineer
3 days ago
Lima, Perú OpenLoop Full-time
**About the Role**:
**Cross-Functional Collaboration**
- Partner with engineering teams to improve system reliability and deployment practices
- Engage with OpenLoop teams on SRE guidelines and best practices for automation and infrastructure
- Work with security teams to implement secure, compliant infrastructure

**Operational Excellence**
- Ensure 24/7 system...
-
Site Reliability Engineer
6 days ago
Lima Metropolitan Area, Perú Nearsure Full-time
Explore the Nearsure experience
- Join our close-knit LATAM remote team: Connect through fun activities like coffee breaks, tech talks, and games with your teammates and management.
- Say goodbye to micromanagement: We champion autonomy, open communication, and respect for diversity as our core values.
- Your well-being matters: Our People Care team is here from day...
-
Senior Site Reliability Engineer
1 week ago
Lima Metropolitana, Perú Canonical Full-time
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation and IoT. Our customers include the world's...