Data Engineer (m/f/d) | Legal AI Tech Start-up | Full Remote

Global-Talent-Exchange

Germany

Full time

2 - 5 Yrs

Job Openings: 1

Required Skills:

Python

XML

ElasticSearch

Qdrant

Neo4j

Huggingface

Vision Transformers

NumPy

Pandas

Pydantic

Fastapi

Azure Openai

Pytorch

Docker

Otc Marketing

AWS

Apache Airflow

Atlassian Jira

Github

Github Actions

Confluence

Slack

Ms365

Python

XML

ElasticSearch

Qdrant

Neo4j

Amazon Neptune

HuggingFace

Transformers

NumPy

Pandas

Pydantic

FastAPI

OpenAI

PyTorch

Docker

OTC

AWS

Apache Airflow

Atlassian JIRA

GitHub

GitHub Actions

Confluence

Slack

MS365

Why Join Us?

Mission & Vision

As a Data Engineer (m/f/d), you support the development of our legal data and search infrastructure end to end. You will design, maintain and optimize robust ETL pipelines that clean and normalize XML-based legal data from multiple jurisdictions while developing scalable data models and metadata enrichment strategies to maximize searchability, semantic relevance and usability of legal information for downstream AI agents and products. A key part of your role will be leveraging generative AI where appropriate to enhance data processing and metadata generation, while continuously benchmarking and tuning database and search performance to ensure efficient, low-latency querying at scale. Working closely with product teams, AI researchers, and legal domain experts, you will deliver high-quality, reliable data solutions that unlock the value of complex, multilingual legal content.

Your Team

You will join our Data Team, working closely with a group of approximately 5 data experts. This highly collaborative team focuses on pushing the boundaries of generative AI, natural language processing, and privacy-preserving machine learning legal solutions.

Our Benefits

Working hours: Flexible working hours
Vacation: 26 days + December 24th & 31st off, +1 day for each year of service (up to a maximum of 30 days)
Remote: 100% remote work possible (given a EU residence), with the flexibility to use our offices in Berlin, Fribourg (Switzerland), Munich, Paris or Zagreb
Home Office Setup Budget: €1,000 paid with your first salary to create your ideal remote workspace
Equipment: Laptop (Lenovo or Mac)
Discounts: e.g. Urban Sports Club Membership

Your Responsibilities

Design, build and optimize end-to-end ETL pipelines for legal data from multiple jurisdictions including ingestion, validation, cleaning, transformation, chunking, embedding, and ingestion into vector databases.
Work extensively with XML-based legal data feeds: parse, validate, normalize, and transform complex XML structures into scalable internal schemas and unified document formats.
Develop and maintain data models and storage schemas that support continuously updated datasets while ensuring consistency, scalability, and accuracy across diverse datasets and large amounts of data.
Coordinate data handover and integration from multiple internal and external data providers, including official sources, APIs, and web scraping pipelines, ensuring reliable and timely updates.
Implement and continuously refine metadata enrichment strategies to maximize searchability, ranking quality, and relevance of legal information in vector databases.
Build and maintain high-performance search and retrieval infrastructure enabling agent-based systems to call search functions and retrieve the most relevant legal information efficiently.
Explore and integrate generative AI techniques to enhance data processing workflows such as structured field extraction, metadata generation and document normalization.
Experiment with different embedding and chunking strategies including evaluation.
Conduct database performance benchmarking and tuning to ensure efficient query execution and scalability.
Collaborate with product, AI, and legal domain experts to deliver high-quality, reliable data solutions.

Our Tech Stack

Programming Languages: Python
Data format: XML, parquet
Vector Search: ElasticSearch, Qdrant
Graph Databases: Neo4j, Amazon Neptune
Libraries: HuggingFace, Transformers, NumPy, Pandas, Pydantic, FastAPI, OpenAI & PyTorch
Deployment Tools: Docker
Cloud Infrastructure: OTC, AWS
Pipeline Orchestration: Apache Airflow
Ticket System: Atlassian JIRA
Repository: Github
CI/CD System: GitHub Actions
Documentation: Confluence
Communication: Slack
Office Application: MS365

Your Profile

Requirements:

Working Permission: EU
Residence: must be within the EU
Language: English proficiency at C2 level.
Experience: in AI development or data engineering with successfully deployed projects
Data: Expertise in data processing, filtering, and augmentation
Databases: Expertise in vector databases, data embedding, benchmarking and management
Programming: Strong Python skills and experience with AI pipelines

Optional:

Experience in deploying graph databases
RAG Systems: Experience in building up AI specific RAG pipelines
NLP & Generative AI: Familiarity with developing and deploying NLP, generative AI models
Familiarity with Kubernetes deployments
Legal background knowledge

About Company

Global-Talent-Exchange

https://globaltalex.com/

Discover high-impact roles Worldwide

10-20 Employees

Information Technology & Services

Send me jobs like this

This one's a match? We'll send more your way

Keep me posted!

Similar Jobs

Site Reliability Engineer (DevOps)

Celigo

Hyderabad, India

Full time

5 - 10 Years

Senior DevOps Engineer

Celigo

Hyderabad, India

Full time

5 - 10 Years

DevOps Architect

Celigo

Hyderabad, India

Full time

12 - 20 Years

Design Automation Engineer, Scribe Design Non-Array

Micron Technology

Hyderabad, India

Full time

8 - 20 Years

Staff DevOps Engineer

Celigo

Hyderabad, India

Full time

8 - 12 Years

Cloud Security engineer (Devops)

Celigo

Hyderabad, India

Full time

5 - 10 Years

K3S with J2ME developer

Cyient

Bangalore Urban, India

12 - 18 Years

SDX- IVI, SBC with Container, Qnx, Linux, Qt, Android

Cyient

Bangalore Urban, India

Full time

3 - 8 Years

Embedded CUDA

Cyient

Hyderabad, India

Full time

3 - 8 Years

Embedded Software Engineer

Cyient

Bangalore Urban, India

Full time

3 - 8 Years

Apply Now