AIML - Software Development Engineer, Machine Learning Platform & Infrastructure

Apple
Full-time
On-site
Seattle, Washington, United States
AI and Machine Learning
Apple is a place where individual imaginations converge, committed to the principles that drive outstanding work! Every product we develop, service we create, and Apple Store experience we deliver is the result of us amplifying each other’s ideas. This is because we share a belief that we can create something extraordinary and share it with the world, positively impacting lives. The diversity of our people and their perspectives fuels the innovation that permeates everything we do. When we bring everyone together, we can achieve our best work. Here, you will not merely join a team; you will contribute something meaningful.

Description

The AI/ML Production Engineering team seeks an outstanding Software Development Engineer with expertise in Machine Learning (ML) infrastructure, services, and applications. This engineer will collaborate with global teams on projects for ML compute platforms and model lifecycles. The ideal candidate has a proven track record as a technical leader and is adept at identifying and interpreting problems across multiple fronts, from customer support to backend infrastructure in a multi-cloud environment. The engineer will be responsible for identifying and resolving scaling issues across AIML-focused services. This individual will have the opportunity to design and build architectural solutions through cross-organizational collaboration, accurately addressing requirements and solving complex problems. This role carries significant impact and is crucial to delivering the highest-quality user experience that Apple’s internal and external customers expect and cherish.

This role encompasses the entire lifecycle of ML platform reliability engineering. The engineer will address user queries, triage and mitigate issues, improve processes, and automate manual tasks. They will address ML platform scalability and stability, assessing the impact on development velocity and resource utilization.

Daily duties include:

  • Ensure timely responses to user queries and tickets
  • Evaluate current visibility into system state and performance
  • Define and monitor system key performance indicators
  • Design and implement operational tools
  • Drive communication and results as a technical leader

Minimum Qualifications

  • Ability to analyze problems in depth, determine root causes, articulate ideas clearly, and propose solutions
  • Solid understanding of distributed system architecture, large-scale ML services and computational platform operations
  • Proficiency in coding with scripting and programming languages, including but not limited to Bash, Python, and Golang
  • Knowledge of ML training and production workflows
  • Ability to identify and define dependencies
  • 7+ years of experience developing software for compute infrastructure or its operational stack in hybrid or multi-cloud environments

Preferred Qualifications

  • Knowledge of ML, including LLMs
  • Knowledge of analytics methods and pipelines, and the ability to use them to visualize platform KPIs
  • Experience designing and implementing systems that support ML applications
  • Proven experience orchestrating large-scale services and job deployments using Kubernetes and cloud services for complex projects
  • Expertise in system observability

Pay & Benefits

  • Apple is an equal opportunity employer that is committed to inclusion and diversity. We take affirmative action to ensure equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.