Staff/Senior Data Engineer - Afternoon Shift
10Pearls
- Lahore, Punjab
- Permanent
- Full-time
- Extract, clean, and prepare data from relational databases for input into models.
- Build, deploy, and maintain deep learning workloads on Amazon SageMaker that train and execute personalization models and integrate with Inspire products.
- Build strong working relationships with the engineering team and management to understand project needs and deliverables.
- Identify new data sources and assist with the development of new tables and data pipelines in the data warehouse.
- Follow the latest developments in machine learning, recommender systems, NLP, and data science research to inform management of new opportunities.
- Unity Catalog will be a major element of this project.
- Utilize extensive experience in developing, optimizing, and maintaining applications using Apache Spark, applying expertise in Spark's core functionality and libraries for effective big data processing (an illustrative PySpark sketch appears at the end of this posting).
- Hands-on experience with Amazon Elastic MapReduce (EMR), including the ability to set up, configure, and manage EMR clusters for the efficient execution of Spark applications.
- Migrate AWS Lambda functions to Databricks and work with Unity Catalog.
- Identify and address performance bottlenecks in data processing pipelines to optimize overall system efficiency.
- Work closely with cross-functional teams, including data scientists, analysts, and other developers, to understand data requirements and implement effective solutions.
- Mentor and guide data and information engineers within the company.
- Participate in other initiatives under the Engineering umbrella, such as training, talks, and estimates in the Data Engineering domain.
- Influence data engineering best practices within the team.
- Proficiency in Apache Spark and PySpark, and a strong understanding of cloud-based data processing tools such as EMR on AWS.
- Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
- Hands-on experience with Amazon EMR and AWS Glue.
- Strong programming skills in languages such as Python, Scala, or Java, with a focus on PySpark for Apache Spark development.
- Knowledge of Unity Catalog and experience migrating AWS Lambda functions to Databricks (a minimal migration sketch appears at the end of this posting).
- Ability to identify and optimize performance bottlenecks in data processing pipelines.
- Experience in developing and maintaining data warehouses, including creating new tables and data pipelines.
- Version Control: Proficient in Git or other version control systems.
- Strong problem-solving and analytical skills.
- Excellent communication and collaboration skills.
- Ability to work effectively in a collaborative team environment.
- Databricks certification would be a plus.
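For candidates unfamiliar with the stack, the following is a minimal sketch of the kind of PySpark pipeline work described above (developing Spark applications for EMR and relieving performance bottlenecks). The S3 paths, table, and column names are hypothetical examples, not actual Inspire systems.

```python
from pyspark.sql import SparkSession, functions as F

# Minimal PySpark job of the kind run on an EMR cluster.
# All paths, table names, and columns below are hypothetical examples.
spark = (
    SparkSession.builder
    .appName("orders-daily-aggregation")
    .getOrCreate()
)

# Read raw order events from the data lake (Parquet assumed).
orders = spark.read.parquet("s3://example-bucket/raw/orders/")

# Aggregate spend per customer per day.
daily_spend = (
    orders
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("customer_id", "order_date")
    .agg(
        F.sum("amount").alias("total_spend"),
        F.count("*").alias("order_count"),
    )
)

# A common bottleneck fix: control partitioning explicitly instead of
# relying on defaults, and partition the output by date so downstream
# reads can prune files.
(
    daily_spend
    .repartition("order_date")
    .write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/curated/daily_spend/")
)

spark.stop()
```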
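Similarly, migrating an AWS Lambda function to Databricks typically means re-expressing the handler's logic as job code that reads and writes Unity Catalog tables. The sketch below is illustrative only and assumes invented catalog, schema, and table names (main.analytics.*).

```python
from pyspark.sql import SparkSession, functions as F

# Before: an AWS Lambda handler that filtered incoming records and wrote
# them to storage. After migration, the same logic runs as a Databricks
# job against Unity Catalog tables. The three-level names used here
# (main.analytics.*) are invented for illustration.
spark = SparkSession.builder.getOrCreate()


def process_new_events() -> None:
    """Filter the latest raw events and append them to a curated table."""
    raw = spark.read.table("main.analytics.raw_events")

    curated = (
        raw
        .filter(F.col("event_type") == "page_view")
        .select("user_id", "event_ts", "page_url")
    )

    # Unity Catalog governs access to the target table; the job only
    # needs the catalog.schema.table name rather than raw storage paths.
    curated.write.mode("append").saveAsTable("main.analytics.page_views")


if __name__ == "__main__":
    process_new_events()
```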