Machine Learning Engineer

Rethink recruit

Full-time

On-site

Palo Alto, United States

About FEDML

FEDML is an AI research and development company focused on generative AI platforms at scale. FEDML provides a generative AI platform and foundation models that enable developers and enterprises to build and commercialize their own generative AI applications easily, scalably, and economically. Its flagship product, FEDML Nexus AI, provides unique features for enterprise AI platforms, model deployment, model serving, AI agent APIs, launching training/inference jobs on serverless/decentralized GPU cloud, experiment tracking for distributed training, federated learning, security, and privacy.

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.

FEDML’s HQ is located at The Lucky Building, 165 University Ave, Palo Alto, CA. Join us in shaping the future of technology.

Our Benefits

Floating holidays

Medical, dental, and vision coverage for employees and family members

Truly competitive salary and equity

401(k) (U.S. specific)

Machine Learning Engineer

Responsibilities

Solve complex problems at scale around model training, fine-tuning and inference

Play a key role in the end-to-end design and implementation of our product, a platform that powers use cases across training and serving of generative AI models

Work closely with both ML researchers in the company and customers to identify key areas of development for our generative AI platform

Take strong end-to-end product ownership: translate product requirements into user interfaces and backend distributed system designs, and own the implementation of those designs

Design and build the core platform infrastructure that supports our customer-facing product features

Ensure the reliability, security, and scalability of the backend distributed systems that power all aspects of our product

Collaborate with product managers and cross-functional teams to drive technology-first initiatives that enable novel business strategies and product roadmaps

Support our user community through documentation, talks, tutorials, and collaborations

Contribute to the broader AI community by publishing research, presenting at conferences, and actively participating in open-source projects

We Look For

Hands-on experience with the internals of deep learning frameworks (e.g. PyTorch, TensorFlow) and GenAI models (e.g. GPT, Stable Diffusion, Mistral)

Experience with large-scale, distributed training on GPUs

Strong sense of design and usability

Effective communication skills and the ability to articulate complex technical ideas to cross-disciplinary internal and external stakeholders

A history of contributing to or developing open-source projects is a bonus but not a requirement
