
Senior Data Scientist
- Islamabad
- Permanent
- Full-time
Responsibilities:
- ML Data Pipeline Development: Build and maintain scalable, reliable, and secure data infrastructure for Machine Learning model training, validation, and inference in production environments.
- Feature Store Collaboration: Partner with Data Scientists to design, build, and maintain a centralized Feature Store, ensuring consistency, reusability, and low-latency access to features for model training and deployment.
- GenAI Data Handling: Develop and execute data ingestion and processing workflows for text and unstructured data, including methods for vectorization and populating vector databases to support Retrieval-Augmented Generation (RAG) architectures.
- Core Data Engineering: Build and maintain robust ELT/ETL processes within Snowflake and AWS to integrate diverse, large-scale data sources, ensuring excellent data governance, quality, and performance for general business intelligence.
- MLOps Partnership: Collaborate closely with ML Engineers to operationalize data flows to and from MLOps platforms (e.g., SageMaker, MLflow), focusing on automating pipelines and managing data versioning.
- Query and Compute Efficiency: Optimize data storage and computing resources across the platform to speed up data science experiments and model training workloads, while managing costs effectively.
- Data Security and Compliance: Implement advanced security measures to safeguard sensitive data in AI datasets, maintaining data integrity and confidentiality in line with governance standards.
Requirements:
- Data Engineering Expertise (5+ Years): Extensive experience in a core Data Engineering role, including 5+ years working with Snowflake and strong skills in SQL and Python for data manipulation and engineering.
- Data Science Enablement: Demonstrated experience building and optimizing data pipelines specifically for Machine Learning model training and deployment (e.g., feature engineering, time-series data handling, complex schema management).
- Cloud Computing: Solid practical experience with AWS services (S3, Lambda, EC2, ECS) for creating reliable, scalable cloud data solutions.
- Distributed Processing: Familiarity with distributed computing frameworks such as PySpark or Dask for large-scale data transformation.
- Data Security: Strong knowledge of RBAC security models and secure data connection protocols.
- Communication: Excellent communication and teamwork skills, with a proven history of effective collaboration with Data Science and Business teams.
- MLOps Platforms: Hands-on experience integrating data systems with MLOps platforms (e.g., MLflow, Amazon SageMaker) for production model deployment.
- Generative AI Data: Experience with vector databases (e.g., Pinecone, ChromaDB) and familiarity with data preparation for LLMs and RAG architectures.
- Feature Store Expertise: Practical or theoretical knowledge of Feature Stores (e.g., Feast, Tecton) and their role in the ML lifecycle.
- Advanced Data Orchestration: Experience with modern orchestrators like Apache Airflow, Prefect, or Dagster for managing complex, data-science-focused workflows.
- Experience with Snowflake Cortex AI or comparable cloud AI tools.
- Certifications in Snowflake, AWS Data Analytics, or related areas.
Posted On: 2025-08-27
Location: Islamabad, Pakistan