Senior Site Reliability Engineer

AI Financial Technology Limited is a FinTech start-up specializing in insurance products. We differentiate ourselves from typical insurance firms by using digital technology to deliver a unique consumer experience. We are looking to build a team to develop our technology platform.
We are an energetic and entrepreneurial team founded by members with extensive experience at start-ups, Fortune Global 500 companies and BAT (Baidu, Alibaba, Tencent).

Successful Candidate:
You take pride in your work, with strong attention to detail.
You are bright, creative and have an entrepreneurial spirit.
You can work independently and also thrive in a team environment.
You want to step up and take on bigger roles.
You have strong integrity.
You are eager to participate in the development of innovative insurance products and technologies.

Job Summary:
We are looking to hire a passionate, experienced and motivated Site Reliability Engineer who possesses a balance of technical depth and strong interpersonal skills to lead, build and run the fault-tolerant, distributed systems that support our business.

Job Description:
  • Implement and continuously improve system reliability, availability, scalability, latency/performance, and efficiency through monitoring, alerting, and automation.
  • Partner with the engineering team to improve reliability and operational efficiency throughout the entire SDLC.
  • Partner with the engineering team on change management, using modern automation mechanisms (especially CI/CD) to enable progressive rollouts, detect problems faster and automate safe, quick rollbacks when problems occur.
  • Collaborate with the engineering team on system design, capacity planning, quality assurance and production readiness, including operational readiness drills.
  • Solve problems relating to mission-critical services and create automated preventive measures.
  • Conduct knowledge-sharing sessions and reviews, and keep documents in the knowledge base up to date.

Requirements:
  • Experience across the full software development life cycle (SDLC).
  • A passion for automated problem solving and strategic thinking, and a desire to take ownership and execute.
  • Understanding of mobile app technologies and their ecosystem (e.g. crash reporting, usage analytics).
  • Experience implementing CI/CD.
  • Experience in infrastructure automation.
  • Advanced knowledge of monitoring solutions (e.g. ELK, Nagios), alerting, automated log interpretation and ticketing system management.
  • Experience in support, troubleshooting and operations.
  • Strong Linux/Unix background.
  • Knowledge of network protocols and their applications.
  • Good oral and written communication skills, both technical and interpersonal.

Nice to Have:
  • Experience using and managing caching and queuing technologies (e.g. Hazelcast, RabbitMQ).
  • Ability to perform reliability analysis using statistical methods.
  • Familiarity with Scrum.

Compensation is highly competitive, with an annual performance bonus of up to 6 months’ salary.
Stock option incentives are provided after completion of one year of service.

Resume Dropbox: