About FEDML
FEDML is an AI research and development company focused on building a generative AI platform at scale. FEDML provides the generative AI platform and foundation models that enable developers and enterprises to build and commercialize their own generative AI applications easily, scalably, and economically. Its flagship product, FEDML Nexus AI, provides unique features for enterprise AI platforms, model deployment, model serving, AI agent APIs, launching training and inference jobs on a serverless/decentralized GPU cloud, experiment tracking for distributed training, federated learning, security, and privacy.
We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.
FEDML’s HQ is located at The Lucky Building, 165 University Ave, Palo Alto, CA. Join us in shaping the future of technology.
Our Benefits
Floating holidays
Medical, dental, and vision coverage for employees and family members
Truly competitive salary and equity
401(k) (U.S. specific)
Responsibilities
Solve complex problems at scale around model training, fine-tuning and inference
Play a key role in the end-to-end design and implementation of our product, a platform that powers use cases across the training and serving of generative AI models
Work closely with both ML researchers in the company and customers to identify key areas of development for our generative AI platform
Take strong end-to-end product ownership, translating product requirements into user interfaces and backend distributed system designs, and owning the implementation of those designs
Design and build the core platform infrastructure that supports our customer-facing product features
Ensure the reliability, security, and scalability of the backend distributed systems that power all aspects of our product
Collaborate with product managers and cross-functional teams to drive technology-first initiatives that enable novel business strategies and product roadmaps
Support our user community through documentation, talks, tutorials, and collaborations
Contribute to the broader AI community by publishing research, presenting at conferences, and actively participating in open-source projects
We Look For
Hands-on experience with the internals of deep learning frameworks (e.g. PyTorch, TensorFlow) and GenAI models (e.g. GPT, Stable Diffusion, Mistral)
Experience with large-scale, distributed training on GPUs
Strong sense of design and usability
Effective communication skills and the ability to articulate complex technical ideas to cross-disciplinary internal and external stakeholders
A history of contributing to or developing open-source projects is a bonus but not a requirement