AVP, Senior SRE Engineer, Group Consumer Banking and Big Data Analytics Technology, Technology & Operations
DBS Bank Limited
Location: Singapore, Singapore
Type: Full Time
Internal Number: 12036829
Business Function
At DBS, we see ourselves as a start-up, leveraging start-up thinking while relying on the latest innovation to design and develop technology solutions for our customers and people. With a strong culture of innovation, experimentation with new technology, and collaboration with the FinTech community, we aim to simplify payments so we can help others Live More, Bank Less. With that ambition, we created DBS PayLah!, one of our digital Lifestyle apps.
DBS PayLah! is more than just Singapore's favorite payments app. It's your everyday app for booking a ride, ordering lunch, scoring seats to a show, and finding all your favorite DBS/POSB Cards rewards and deals. You can even track and redeem your DBS/POSB Cards points and enjoy personalized rewards, all on PayLah! Anyone of any age or from any bank can enjoy the convenience of PayLah! Students under 16 can now also register for their first digital wallet with parental consent.
There's an ever-growing list of partners and over 180,000 acceptance points such as hawker centers, retail outlets, sports centers, and restaurants. Use PayLah! to discover a world of DBS/POSB Cards rewards and exclusive deals on food, shopping, transport, movies, and more!
In our payment digital transformation journey ahead, all DBS Lifestyle apps are adopting common services, platforms, architectural principles, and design patterns. The DBS PayLah! tech team intends to build loosely coupled but tightly aligned components that are designed for reuse while anticipating change. A common set of parameters and tools for software development provides a consistent approach to security, maintainability, and reliability. In addition, architectural agility drives potential strategic and operational benefits.
Responsibilities
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. This position is for a Site Reliability Engineer responsible for developing and implementing the processes necessary to improve application and system reliability, along with operational support. The position comprises an approximately equal focus on software development and operations. The role also develops software to automate operational processes and contributes code to the shared engineering backlog deliverables.
Establish SLIs and SLOs for enterprise applications, and calculate error budgets, MTTD, and MTTR. Promote an observability culture in the developer community and help developers identify golden signals.
Own the availability, performance, change management, monitoring, and capacity management of assigned services.
Manage and troubleshoot business-critical incidents, conduct postmortems, and ensure permanent closure of the incidents.
Analyze patterns of production incidents, develop permanent remediation plans, and use software engineering to implement automation that prevents future incidents.
Implement microservice applications and integrate them with monitoring and logging tools such as ELK, Grafana, AppDynamics, and Alog.
Engage with both the development and support teams throughout the life cycle to help build for reliability. Collaborate closely with them to maintain and improve services against established Service Level Objectives by applying software engineering principles.
Contribute to the design and architecture of a highly resilient microservice application based on an open-source stack. Enhance, optimize, and migrate to new solutions where required.
Manage the split of effort between manual operational work and engineering work.
Work with partner organizations and vendors to provide solutions to current business issues.
Participate in a shift model covering 24x7x365 support.
Requirements
At least a degree in Computing / Computer Science / Engineering from a reputable university.
Minimum 10 years' experience in IT/software development on an open-source software stack and 4 years' experience in an SRE role, with a good track record in a leadership role and a culture of collaboration and teamwork.
Technical lead experience across business applications, middleware, database technology, best practices, and quality and productivity improvements.
Experience in designing and architecting a highly resilient microservice application based on an open-source stack, in PCF or public cloud. AWS certification is preferred.
Working experience in production support and improvement, incident management, and automation is a must.
Experience in identifying golden signals, defining SLIs and SLOs for enterprise applications, and calculating error budgets, MTTD, and MTTR.
Experience with CI/CD pipelines and tool sets such as Bitbucket, Jenkins, SonarQube, JIRA, and Nexus, and with blue/green, feature toggling, ACL, and other deployment methods to mitigate change risk and address special needs.
Strong problem-solving skills and the ability to solve unstructured problems and challenge the status quo.
Must be comfortable working in an extremely fast-paced environment, with the ability to prioritize accordingly to meet deadlines.
Strong communication and interpersonal skills. Self-driven, committed, and reliable team player. Ability to contribute to discussions on design and strategy.
We offer a competitive salary and benefits package and the professional advantages of a dynamic environment that supports your development and recognises your achievements.