Staff Platform Engineer – Data

Overview

The SBSEG QuickData Platform Engineering team is enabling data-driven decisions at the highest levels of the company, and we're using big data to personalize user experiences in our products. We are in the early stages of a data migration to AWS and are looking for a Staff/Sr. Engineer to join our QuickData Platform Team. We are looking for a strong engineer with a passion for automation who is ready to tackle the complex problems of scale that are unique to Intuit, applying expertise in AWS, EMR, automation, big data, and large-scale systems. We are looking for someone who can design, code, and maintain high-performance, highly available infrastructure.

Responsibilities

  • Automate various RTB jobs using scripts/tools to speed execution on the Vertica platform
  • Application deployment planning and implementation
  • Specifications for onboarding new offerings, including troubleshooting, patch processes, cross-organizational incident management processes, security breach response plans, etc.
  • Configuration of monitoring agents at the software layer, and development of meaningful alerts and escalation procedures
  • Responses to monitoring alerts according to defined playbooks and procedures
  • Participation in Root Cause Analysis (RCA) processes
  • Implementation of business operations standards
  • Suggestions for process improvements and enhanced operational efficiencies
  • Implementation of monitoring agents
  • Incident management reports, including initial problem analysis, management status, resolution, and follow-up defect reporting
  • Technical documentation on supported applications & operational tools
  • Management of application deployment processes
  • Implementation of improved operational processes
  • Real-time application dashboards showing the overall health of the system
  • Code reviews of operational solutions
  • Review and development of performance and capacity plans (operational capacity and load requirements)
  • Implementation plans for application disaster recovery, migration, roll-back plans, expansion, routine deployments, and system upgrades
  • Metrics reporting on application performance, availability, reliability, etc.
  • Contributions to Operational Standards and Requirements
  • Develop run books for problem diagnosis, resolution and escalation.

Qualifications

  • B.S. or higher in Computer Science or equivalent knowledge and experience
  • Experience managing infrastructure in AWS.
  • Hands-on experience administering some of the following: HBase, Impala, Spark, EMR, Hive on Tez, or Presto
  • Experience operating large-scale Hadoop clusters running the Cloudera distribution.
  • Operational mindset with the ability to handle problem, SLA, and incident management.
  • Ability to troubleshoot issues and participate in 24×7 on-call support, ensuring the stability of the production environment.
  • Strong Linux/Unix background and advanced scripting knowledge in Python/Perl/Go.
  • Hands-on experience administering HBase, Impala, Spark, and EMR
  • Experience installing and managing Kafka is a plus.
  • Strong critical thinking ability to assess complex problems, analyze options, navigate diverse perspectives and develop optimal/acceptable solutions
  • Strong programming skills in Java, Python, or a comparable programming language.
  • Ability to work independently with minimal supervision

To apply for this job please visit topspotjobs.com.