ETL Migration Assistant

Intelligent automation for migrating legacy ETL workflows to AWS Glue

The ETL Migration Assistant is an AI-powered tool that automates the migration of legacy ETL workflows to AWS Glue. It supports analyzing and converting ETL jobs from various platforms including Informatica PowerCenter, IBM DataStage, SQL Server Integration Services (SSIS), and Talend.

Key Features

  • Intelligent ETL Analysis: Automatically parse and analyze ETL code from multiple platforms
  • AWS Glue Code Generation: Convert legacy ETL jobs to optimized AWS Glue PySpark code
  • Data Lineage & Dependencies: Extract and preserve data lineage and job dependencies
  • Validation & Optimization: Validate migrations and suggest performance optimizations
  • Security & Compliance: Generate appropriate IAM policies and security configurations

Quick Start

# Install the package (run in a shell)
pip install strands-agents

# Import the migration agent (Python)
from strands.tools.etl_migration import ETLMigrationAgent, ETLPlatform

# Initialize the migration agent
agent = ETLMigrationAgent()

# Analyze an Informatica PowerCenter workflow
with open('workflow.xml', 'r') as f:
    source_code = f.read()
    
analysis = agent.analyze_etl_code(source_code, ETLPlatform.INFORMATICA)

# Generate AWS Glue job
glue_job = agent.generate_glue_job(analysis)

# Validate the migration
validation = agent.validate_migration(analysis, glue_job)

# Get optimization suggestions
optimizations = agent.optimize_glue_job(glue_job, validation)

# Generate required IAM policies
iam_policies = agent.generate_iam_policies(analysis)

Installation

Ensure you have Python 3.10+ and AWS credentials configured, then:

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate

# Install the package
pip install strands-agents

AWS Credentials Setup

Before using the ETL Migration Assistant, configure your AWS credentials using one of these methods:

  1. Environment Variables:
export AWS_ACCESS_KEY_ID='your_access_key'
export AWS_SECRET_ACCESS_KEY='your_secret_key'
export AWS_SESSION_TOKEN='your_session_token'  # If using temporary credentials
export AWS_DEFAULT_REGION='your_region'
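  2. AWS Credentials File: Create or edit ~/.aws/credentials (omit aws_session_token unless you use temporary credentials):
[default]
aws_access_key_id = your_access_key
aws_secret_access_key = your_secret_key
aws_session_token = your_session_token

     The region is conventionally set in ~/.aws/config:
[default]
region = your_region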

For detailed credential configuration and security best practices, see the AWS Credentials Guide.
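Before running a migration with the environment-variable method, it can help to confirm that the variables are actually set. The following is a minimal sketch, not part of the package: `missing_aws_env_vars` and `REQUIRED_VARS` are hypothetical names, and the check does not apply if you rely on the credentials file instead.

```python
import os
from typing import Mapping

# Variables needed for environment-based AWS credentials.
# AWS_SESSION_TOKEN is left out because it is only needed for temporary credentials.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required AWS variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_env_vars()
    if missing:
        print("Missing AWS environment variables:", ", ".join(missing))
    else:
        print("Environment-based AWS credentials look complete.")
```

This only checks that the variables exist; it does not verify that the keys are valid.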

Features in Detail

ETL Code Analysis

The system supports parsing and analyzing ETL code from multiple platforms:

from strands.tools.etl_migration import ETLMigrationAgent, ETLPlatform

agent = ETLMigrationAgent()

# Analyze different types of ETL jobs
informatica_analysis = agent.analyze_etl_code(informatica_code, ETLPlatform.INFORMATICA)
datastage_analysis = agent.analyze_etl_code(datastage_code, ETLPlatform.DATASTAGE)
ssis_analysis = agent.analyze_etl_code(ssis_code, ETLPlatform.SSIS)
talend_analysis = agent.analyze_etl_code(talend_code, ETLPlatform.TALEND)

AWS Glue Code Generation

Convert legacy ETL jobs to optimized AWS Glue PySpark code:

# Generate Glue job with custom configuration
# (etl_job is assumed to be defined earlier by the caller)
glue_job = agent.generate_glue_job({
    "job": etl_job,
    "config": {
        "worker_type": "G.1X",
        "number_of_workers": 5,
        "timeout_minutes": 60
    }
})
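The README does not say how the agent applies these settings. As a rough guide, the three config keys line up with parameters of the AWS Glue CreateJob API. The sketch below builds the keyword arguments for boto3's `create_job` from such a config. The helper name `to_create_job_kwargs`, the `GlueVersion` value, and the example role and script location are assumptions.

```python
def to_create_job_kwargs(name: str, role_arn: str, script_location: str, config: dict) -> dict:
    """Translate the config keys shown above into Glue CreateJob parameters."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",                  # Spark ETL job type
            "ScriptLocation": script_location,  # S3 path of the PySpark script
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",                   # assumed; pick the version you target
        "WorkerType": config["worker_type"],
        "NumberOfWorkers": config["number_of_workers"],
        "Timeout": config["timeout_minutes"],   # CreateJob takes the timeout in minutes
    }


kwargs = to_create_job_kwargs(
    name="migrated-job",                                   # hypothetical job name
    role_arn="arn:aws:iam::123456789012:role/GlueJobRole", # hypothetical role
    script_location="s3://bucket/scripts/migrated_job.py", # hypothetical path
    config={"worker_type": "G.1X", "number_of_workers": 5, "timeout_minutes": 60},
)
```

With credentials configured, the result could be passed to `boto3.client("glue").create_job(**kwargs)`.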

Validation and Optimization

Ensure accurate migration and optimal performance:

# Validate with sample data
validation = agent.validate_migration(
    original_job=analysis,
    glue_job=glue_job,
    sample_data_path="s3://bucket/sample-data/"
)

# Get optimization suggestions
optimizations = agent.optimize_glue_job(glue_job, validation)

Security Configuration

Generate required IAM policies and security configurations:

# Generate IAM policies
iam_policies = agent.generate_iam_policies(analysis)

# The policies include:
# - Glue job execution role
# - Source data access
# - Target data access
# - CloudWatch logging permissions
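The structure returned by `generate_iam_policies` is not documented here. To make the four items above concrete, this sketch shows what a hand-written equivalent could look like when the source and target are both S3 buckets, which is an assumption. `build_glue_job_policy` and `GLUE_TRUST_POLICY` are hypothetical names.

```python
import json

# Trust policy that lets the AWS Glue service assume the job execution role.
GLUE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def build_glue_job_policy(source_bucket: str, target_bucket: str) -> dict:
    """Build a permissions policy covering source, target, and logging access."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadSource",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{source_bucket}",
                    f"arn:aws:s3:::{source_bucket}/*",
                ],
            },
            {
                "Sid": "WriteTarget",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{target_bucket}/*"],
            },
            {
                "Sid": "CloudWatchLogs",
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": ["arn:aws:logs:*:*:log-group:/aws-glue/*"],
            },
        ],
    }


policy = build_glue_job_policy("source-bucket", "target-bucket")
print(json.dumps(policy, indent=2))
```

Scoping each statement to specific buckets and to the `/aws-glue/` log groups keeps the role close to least privilege; review any generated policy the same way before attaching it.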

Documentation

For detailed guidance & examples, see:

Contributing

We welcome contributions! See our Contributing Guide for details on:

  • Reporting bugs & features
  • Development setup
  • Contributing via Pull Requests
  • Code of Conduct

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Security

See CONTRIBUTING for more information.
