AI/ML Engineer – Data Simulation & Synthetic Data Generation
Full-Time
Hyderabad
On-site
Overview
Key Responsibilities
- Research, evaluate, and benchmark generative and diffusion-based models (Stable Diffusion, Sora-like models, GANs, NeRFs) for simulation and synthetic data generation.
- Build pipelines to replicate images and videos across new environments, lighting conditions, scenes, poses, and object variations.
- Develop multimodal prompt-based simulation workflows including:
  - Text → Image
  - Image → Image
  - Video → Video transformations
- Fine-tune models for domain-specific simulation tasks such as:
  - Texture transfer
  - Background replacement
  - Camera simulation
  - Noise injection
  - Motion variation
- Create automated pipelines to scale image, video, audio, and text simulation across large datasets.
- Evaluate realism, fidelity, annotation consistency, and domain-adaptation effectiveness of generated synthetic data.
- Work closely with ML researchers to integrate synthetic data into training loops and improve downstream model performance.
- Collaborate with backend and data teams to design scalable storage, sampling, and dataset versioning strategies for simulation workflows.
- Develop metrics and QA processes for simulation quality, drift detection, and dataset reliability.
- Support training pipelines, experiment tracking, and dataset versioning as simulation infrastructure scales.
Preferred Experience
- Experience with multimodal generative models for image, video, and text-prompted generation.
- Familiarity with dataset versioning and experiment tracking tools such as DVC, Weights & Biases (W&B), or MLflow.
- Understanding of domain adaptation and synthetic-to-real generalization techniques.
Qualifications
- 3–6 years of experience in applied machine learning or generative AI.
- Strong Python programming skills with hands-on experience in PyTorch or TensorFlow.
- Practical experience working with generative models including diffusion models, GANs, video synthesis models, and NeRFs.
- Familiarity with data augmentation, image/video transformations, and synthetic data generation workflows.
- Experience building scalable pipelines using FastAPI, Airflow, or custom orchestration frameworks.
- Understanding of GPU-based training, inference optimization, and model performance tuning.
- Practical knowledge of Git, Docker, Linux, and cloud platforms such as AWS, GCP, or Azure.