Senior Data Engineer

Global-Talent-Exchange

United Arab Emirates
Full time
5+ Yrs

Required Skills:

Python

Pandas

PySpark

Scrapy

Beautiful Soup

Selenium

SQL

Google Cloud Platform

AWS

Azure

Scikit-learn

TensorFlow

PyTorch

Apache Airflow

Prefect

Dagster

About the Role

We are seeking a versatile Data Engineer to build the foundational data systems that power our AI platform. In this role, you will be responsible for designing, constructing, and maintaining robust data pipelines that ingest, process, and organize massive volumes of structured and unstructured data from diverse sources. Your work will directly feed our predictive models and LLM-driven analytics, enabling us to generate unique insights into global property markets.

  • Design and Build Scalable Data Pipelines: Architect, develop, and manage reliable ETL/ELT processes to handle large-scale, real-time and batch data.
  • Lead Large-Scale Data Acquisition: Develop and maintain advanced scraping and data ingestion systems to collect vast amounts of public and proprietary real estate data from web sources, APIs, and databases.
  • Support ML & LLM Initiatives: Build and optimize data infrastructure to facilitate efficient data labeling, feature engineering, model training, and evaluation for our Machine Learning and Large Language Model projects.
  • Ensure Data Quality and Reliability: Implement processes for data validation, cleansing, and monitoring to ensure the integrity and availability of our data assets.
  • Collaborate with AI/ML Engineers: Work closely with data scientists and ML engineers to understand data requirements, provide clean, structured datasets, and operationalize data-driven features.
  • Own Data Infrastructure: Manage and optimize data storage solutions (data warehouses, data lakes) and processing frameworks for performance and cost-effectiveness.

Requirements

  • Proven experience (5+ years) as a Data Engineer or in a similar role, with a strong portfolio of building and maintaining data pipelines.
  • Expertise in Python and core data libraries (e.g., Pandas, PySpark).
  • Hands-on experience with large-scale web scraping frameworks (e.g., Scrapy, Beautiful Soup, Selenium/Playwright) and managing associated challenges (e.g., anti-bot measures, rate limiting).
  • Solid understanding of data modeling, data warehousing concepts, and SQL. Experience with cloud data platforms (Google Cloud Platform - BigQuery, Dataflow; AWS - Redshift, Glue; or Azure equivalents).
  • Demonstrable experience in supporting ML projects: building training datasets, feature stores, and working with ML frameworks (e.g., Scikit-learn, TensorFlow, PyTorch).
  • Familiarity with the full lifecycle of LLM projects, including data collection for pre-training, fine-tuning, and RAG (Retrieval-Augmented Generation) pipeline construction.
  • Experience with workflow orchestration tools (e.g., Apache Airflow, Prefect, Dagster).

Bonus Points (Nice-to-Have)

  • Direct experience in PropTech, FinTech, or a data-intensive real estate/financial domain.
  • Experience with vector databases (e.g., Pinecone, Weaviate, Chroma) and implementing RAG systems.
  • Knowledge of MLOps principles and tools for model deployment and monitoring.
  • Experience with real-time data processing (e.g., Apache Kafka, Apache Flink).

About Company

Global-Talent-Exchange
https://globaltalex.com/
Discover high-impact roles worldwide
10-20 Employees
Information Technology & Services