
Member of Engineering – Pre-training, Data Engineering

Remote-first Full-time Now hiring

Job Description:

  • Build and maintain high-performance pipelines for trillions of tokens.
  • Deliver diverse, high-quality datasets for pre-training foundation models.
  • Work closely with other teams, such as Pretraining, Posttraining, Evals, and Product, to ensure alignment on the quality of the models delivered.

Requirements:

  • Strong background in building production-grade, distributed data systems for machine learning, with experience in:
      • Orchestration: Slurm, Airflow, or Dagster
      • Observability & Reliability: CI/CD, Grafana, Prometheus, etc.
      • Infra: Git, Docker, k8s, cloud managed services
      • Batched inference (e.g., vLLM)
  • Performance obsession, especially with large-scale GPU clusters and distributed pipelines
  • Expert-level Python knowledge and the ability to write clean, maintainable code
  • Strong algorithmic foundations
  • Proficiency with libraries like Polars, Dask, or PySpark
  • Nice to have:
      • Experience building trillion-scale SOTA pre-training datasets
      • Experience translating research to production at scale
      • Experience with OCR, web crawling, or evals
      • Prior experience pre-training LLMs

Benefits:

  • Fully remote work & flexible hours
  • 37 days/year of vacation & holidays
  • Health insurance allowance for you and dependents
  • Company-provided equipment
  • Wellbeing, always-be-learning and home office allowances
  • Frequent team get-togethers
  • Great, diverse, and inclusive people-first culture

