🚀 Autonomous Data Engineering Squad

Executive Summary

This project demonstrates a virtual Data Engineering Department built using Microsoft AutoGen (AG2). It automates the end-to-end lifecycle of production data pipelines, from initial architecture and deterministic quality governance to multi-cloud deployment as Infrastructure as Code (IaC).

By implementing a Self-Healing Feedback Loop, the system ensures every script meets strict enterprise standards for performance and cost-optimization before it reaches production.

🏗️ The Agentic Architecture

The squad utilizes a multi-layered approach to ensure code reliability and governance:

The Data Architect (Agent): A Senior Engineer persona specialized in generating high-scale PySpark logic.
The Local Quality Gate (Deterministic): A fast validation layer that enforces "Senior Standards" (partitioning, explicit schemas, and date derivation) with Python regex checks, so the same rules are applied to every script.
The Cloud Architect (Agent): A DevOps specialist that translates approved code into Terraform (AWS Glue) and YAML (Azure Databricks).
The Admin (Orchestrator): Manages the workflow, handles agent hand-offs, and controls the self-correction logic.
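
As an illustration of the Local Quality Gate described above, here is a minimal sketch of a regex-based check. The rule names, the patterns, and the `run_quality_gate` function are assumptions made for this example; the actual rules in `production.py` may differ.

```python
import re

# Hypothetical rule set: each rule name maps to a pattern that must
# appear somewhere in the generated PySpark script.
RULES = {
    "partitioning": re.compile(r"\.partitionBy\s*\("),
    "explicit_schema": re.compile(r"\bStructType\s*\("),
    "date_derivation": re.compile(r"\b(to_date|year|month|dayofmonth)\s*\("),
}


def run_quality_gate(script: str) -> list[str]:
    """Return the names of the rules the script fails (empty list means pass)."""
    return [name for name, pattern in RULES.items() if not pattern.search(script)]


# Illustrative input only; paths and column names are made up.
sample = '''
schema = StructType([StructField("ts", StringType())])
df = spark.read.schema(schema).json("s3://bucket/raw/")
df = df.withColumn("event_date", to_date("ts"))
df.write.partitionBy("event_date").parquet("s3://bucket/curated/")
'''

print(run_quality_gate(sample))                    # []
print(run_quality_gate("df.write.parquet('out')"))  # all three rule names
```

Because the check is plain pattern matching, it returns the same verdict for the same script on every run, which is what makes it usable as a deterministic gate in front of the LLM agents.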

🛠️ Key Technical Features

1. The Self-Healing Feedback Loop

Unlike standard code generators, this system audits its own work. If the Architect's output fails a quality check (e.g., missing partitioning on a 500GB+ dataset), the system triggers an automatic rewrite cycle: the specific failure logs are fed back to the agent, and the cycle repeats until the code passes every check.
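
The rewrite cycle can be sketched independently of any agent framework. In the example below, `generate` and `audit` are stand-ins for the Architect agent and the quality gate, and the `max_rounds` cap is an assumption added so the sketch always terminates; none of these names come from the project itself.

```python
from typing import Callable


def self_heal(
    generate: Callable[[str], str],     # stand-in for the Architect agent
    audit: Callable[[str], list[str]],  # stand-in for the quality gate
    task: str,
    max_rounds: int = 3,                # assumed cap, not from the project
) -> tuple[str, int]:
    """Regenerate until the audit passes or the round budget is exhausted."""
    prompt = task
    failures: list[str] = []
    for round_no in range(1, max_rounds + 1):
        script = generate(prompt)
        failures = audit(script)
        if not failures:
            return script, round_no
        # Feed the specific failures back into the next request.
        prompt = f"{task}\n\nPrevious attempt failed checks: {', '.join(failures)}. Fix them."
    raise RuntimeError(f"Still failing after {max_rounds} rounds: {failures}")


def fake_architect(prompt: str) -> str:
    # Toy generator: adds partitioning only once the feedback mentions it.
    if "partitioning" in prompt:
        return "df.write.partitionBy('event_date').parquet('out')"
    return "df.write.parquet('out')"


def fake_audit(script: str) -> list[str]:
    return [] if ".partitionBy(" in script else ["partitioning"]


script, rounds = self_heal(fake_architect, fake_audit, "Write the sales pipeline")
print(rounds)  # 2
```

The toy generator fails the first round and passes the second, which mirrors the intended behavior: the agent only sees the checks it failed, not a generic "try again".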

2. Multi-Cloud Infrastructure as Code (IaC)

Once the PySpark logic is validated, the system automatically generates:

AWS Glue (Terraform): Optimized with G.2X workers and IAM trust policies.
Azure Databricks (YAML): Configured for Standard_DS3_v2 clusters and automated CI/CD triggers.
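
To make the AWS Glue output more concrete, the sketch below renders a small Terraform `aws_glue_job` resource from a Python template. In the project this text is produced by the Cloud Architect agent, not by a template; the function name, the Glue version, the worker count, and the example ARN and S3 path are all assumptions for illustration.

```python
from string import Template

# Assumed shape of the generated Terraform; only the G.2X worker type
# comes from the README, the remaining values are placeholders.
GLUE_JOB_TEMPLATE = Template('''\
resource "aws_glue_job" "$job_name" {
  name              = "$job_name"
  role_arn          = "$role_arn"
  glue_version      = "4.0"
  worker_type       = "G.2X"
  number_of_workers = $workers

  command {
    script_location = "$script_s3_path"
    python_version  = "3"
  }
}
''')


def render_glue_job(job_name: str, role_arn: str, script_s3_path: str, workers: int = 2) -> str:
    """Fill the template with job-specific values and return Terraform text."""
    return GLUE_JOB_TEMPLATE.substitute(
        job_name=job_name,
        role_arn=role_arn,
        script_s3_path=script_s3_path,
        workers=workers,
    )


terraform_text = render_glue_job(
    job_name="sales_etl",
    role_arn="arn:aws:iam::123456789012:role/example-glue-role",
    script_s3_path="s3://example-bucket/approved_script.py",
)
print(terraform_text)
```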

3. Resource & Cost Governance

Engineered for production efficiency under resource constraints:

Sliding Context Window: Limits conversation history to prevent "Token Snowballing."
Token Insurance: Hard limits on session tokens to ensure cost-effective operations within a daily budget.
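
The two controls above can be combined in one small helper. This is a sketch under stated assumptions: the class name `BudgetedHistory`, the four-characters-per-token estimate, and the choice to raise an error when the budget is exhausted are illustrative and may not match how `production.py` enforces its limits.

```python
from collections import deque


class BudgetedHistory:
    """Keeps only the most recent messages and enforces a session token cap."""

    def __init__(self, max_messages: int, max_session_tokens: int):
        self.window = deque(maxlen=max_messages)  # old messages fall off the left
        self.max_session_tokens = max_session_tokens
        self.tokens_used = 0

    @staticmethod
    def estimate_tokens(text: str) -> int:
        # Rough heuristic (about 4 characters per token); a real tokenizer
        # would give more accurate counts.
        return max(1, len(text) // 4)

    def add(self, role: str, content: str) -> None:
        cost = self.estimate_tokens(content)
        if self.tokens_used + cost > self.max_session_tokens:
            raise RuntimeError("Session token budget exceeded")
        self.tokens_used += cost
        self.window.append({"role": role, "content": content})

    def context(self) -> list[dict]:
        """Messages that would be sent with the next LLM request."""
        return list(self.window)


history = BudgetedHistory(max_messages=2, max_session_tokens=10)
history.add("user", "a" * 8)
history.add("assistant", "b" * 8)
history.add("user", "c" * 8)
print(len(history.context()))  # 2
```

The window bounds the size of each request, while the running total bounds the whole session; the two limits address different costs, so both are tracked.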

🚀 Getting Started

1. Prerequisites

Python 3.10+
An API Key for a supported LLM Gateway (e.g., Euron/OpenAI)

2. Installation

git clone https://github.com/YOUR_USERNAME/Autonomous-Data-Engineering-Squad.git
cd Autonomous-Data-Engineering-Squad
python -m venv venv
# Windows: venv\Scripts\activate | Mac/Linux: source venv/bin/activate
pip install -r requirements.txt

3. Setup

Create a .env file in the root directory:

EURI_API_KEY=your_actual_key_here
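
For reference, a script could read this key with nothing but the standard library, as in the sketch below. The helper names `load_env_file` and `get_api_key` are hypothetical; the project may well rely on a package such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path: str = ".env") -> str:
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("EURI_API_KEY") or load_env_file(path).get("EURI_API_KEY")
    if not key:
        raise RuntimeError("EURI_API_KEY is not set; add it to .env or the environment")
    return key
```

Failing early with a clear message avoids a harder-to-read authentication error from the LLM gateway later in the run.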

4. Run the Squad

python production.py

📊 Verification of Outputs

The system generates three primary artifacts in your project root:

| Artifact | File Name | Description |
| --- | --- | --- |
| Audit Log | `*_full_squad.txt` | A complete record of the "thinking," review, and correction process. |
| Production Code | `approved_script.py` | The final, validated, and partition-aware PySpark script. |
| Infra Config | `infra_config.txt` | Terraform and YAML deployment logic generated by the Cloud Architect. |
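
A quick way to confirm that a run produced all three artifacts is a small check like the one below. The `check_artifacts` function is a hypothetical helper written for this README, not part of the project; it only relies on the file names listed above.

```python
from pathlib import Path


def check_artifacts(root: str = ".") -> dict[str, bool]:
    """Report which of the expected output files exist under root."""
    base = Path(root)
    return {
        # The audit log has a variable prefix, so match it by suffix.
        "audit_log": any(base.glob("*_full_squad.txt")),
        "production_code": (base / "approved_script.py").is_file(),
        "infra_config": (base / "infra_config.txt").is_file(),
    }


if __name__ == "__main__":
    for name, present in check_artifacts().items():
        print(f"{name}: {'found' if present else 'MISSING'}")
```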

🧠 Core Competencies Demonstrated

This project serves as a showcase of my ability to:

Orchestrate Multi-Agent Systems to solve complex engineering bottlenecks.
Enforce Data Governance (partitioning, schema-on-read, compute sizing) programmatically.
Bridge AI Frameworks with deterministic logic to eliminate hallucinations in production code.
Manage AI OpEx by optimizing token usage and context windows.
