The ETL Migration Assistant is an AI-powered tool that automates the migration of legacy ETL workflows to AWS Glue. It supports analyzing and converting ETL jobs from various platforms including Informatica PowerCenter, IBM DataStage, SQL Server Integration Services (SSIS), and Talend.
- Intelligent ETL Analysis: Automatically parse and analyze ETL code from multiple platforms
- AWS Glue Code Generation: Convert legacy ETL jobs to optimized AWS Glue PySpark code
- Data Lineage & Dependencies: Extract and preserve data lineage and job dependencies
- Validation & Optimization: Validate migrations and suggest performance optimizations
- Security & Compliance: Generate appropriate IAM policies and security configurations
```bash
# Install the package
pip install strands-agents
```

```python
from strands.tools.etl_migration import ETLMigrationAgent, ETLPlatform

# Initialize the migration agent
agent = ETLMigrationAgent()

# Analyze an Informatica PowerCenter workflow
with open('workflow.xml', 'r') as f:
    source_code = f.read()

analysis = agent.analyze_etl_code(source_code, ETLPlatform.INFORMATICA)

# Generate AWS Glue job
glue_job = agent.generate_glue_job(analysis)

# Validate the migration
validation = agent.validate_migration(analysis, glue_job)

# Get optimization suggestions
optimizations = agent.optimize_glue_job(glue_job, validation)

# Generate required IAM policies
iam_policies = agent.generate_iam_policies(analysis)
```
Ensure you have Python 3.10+ and AWS credentials configured, then:
```bash
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate

# Install the package
pip install strands-agents
```
Before using the ETL Migration Assistant, configure your AWS credentials using one of these methods:
- Environment Variables:

  ```bash
  export AWS_ACCESS_KEY_ID='your_access_key'
  export AWS_SECRET_ACCESS_KEY='your_secret_key'
  export AWS_SESSION_TOKEN='your_session_token'  # If using temporary credentials
  export AWS_DEFAULT_REGION='your_region'
  ```

- AWS Credentials File: create or edit `~/.aws/credentials`:

  ```ini
  [default]
  aws_access_key_id = your_access_key
  aws_secret_access_key = your_secret_key
  aws_session_token = your_session_token
  region = your_region
  ```
For detailed credential configuration and security best practices, see AWS Credentials Guide
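As a quick sanity check before running the tool, you can confirm that a credentials file in this format parses cleanly. This is a minimal sketch using only the Python standard library; the profile contents below are illustrative placeholders, not real credentials:

```python
import configparser

# Sample contents in the same format as ~/.aws/credentials
# (placeholder values for illustration only)
sample = """
[default]
aws_access_key_id = your_access_key
aws_secret_access_key = your_secret_key
region = your_region
"""

parser = configparser.ConfigParser()
parser.read_string(sample)

# The [default] profile should expose the key fields
profile = parser["default"]
print("default" in parser)            # -> True
print(profile["aws_access_key_id"])   # -> your_access_key
```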
The system supports parsing and analyzing ETL code from multiple platforms:
```python
from strands.tools.etl_migration import ETLMigrationAgent, ETLPlatform

agent = ETLMigrationAgent()

# Analyze different types of ETL jobs
informatica_analysis = agent.analyze_etl_code(informatica_code, ETLPlatform.INFORMATICA)
datastage_analysis = agent.analyze_etl_code(datastage_code, ETLPlatform.DATASTAGE)
ssis_analysis = agent.analyze_etl_code(ssis_code, ETLPlatform.SSIS)
talend_analysis = agent.analyze_etl_code(talend_code, ETLPlatform.TALEND)
```
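When migrating a mixed estate, a small dispatch table keeps the per-platform calls tidy. The helper below is a hypothetical sketch, not part of the library; the extension-to-platform mapping is an assumption you should adjust to match how your ETL exports are organized:

```python
# Hypothetical mapping from source-file extension to platform name.
PLATFORM_BY_EXTENSION = {
    ".xml": "INFORMATICA",   # PowerCenter workflow exports
    ".dsx": "DATASTAGE",     # DataStage export files
    ".dtsx": "SSIS",         # SSIS package files
    ".item": "TALEND",       # Talend job items
}

def platform_for(path: str) -> str:
    """Return the platform name for a source file, based on its extension."""
    for ext, platform in PLATFORM_BY_EXTENSION.items():
        if path.endswith(ext):
            return platform
    raise ValueError(f"Unrecognized ETL source file: {path}")

print(platform_for("jobs/orders.dtsx"))  # -> SSIS
```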
Convert legacy ETL jobs to optimized AWS Glue PySpark code:
```python
# Generate Glue job with custom configuration
glue_job = agent.generate_glue_job({
    "job": etl_job,
    "config": {
        "worker_type": "G.1X",
        "number_of_workers": 5,
        "timeout_minutes": 60
    }
})
```
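Worker type and count together determine total capacity: in AWS Glue, a G.1X worker maps to 1 DPU and a G.2X worker to 2 DPUs. A small helper (a sketch for sanity-checking a configuration, not part of the library) makes the arithmetic explicit:

```python
# DPUs per worker for common Glue worker types
DPUS_PER_WORKER = {"G.1X": 1, "G.2X": 2}

def total_dpus(worker_type: str, number_of_workers: int) -> int:
    """Total DPU capacity implied by a worker configuration."""
    return DPUS_PER_WORKER[worker_type] * number_of_workers

# The configuration above: 5 G.1X workers -> 5 DPUs
print(total_dpus("G.1X", 5))  # -> 5
```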
Ensure accurate migration and optimal performance:
```python
# Validate with sample data
validation = agent.validate_migration(
    original_job=analysis,
    glue_job=glue_job,
    sample_data_path="s3://bucket/sample-data/"
)

# Get optimization suggestions
optimizations = agent.optimize_glue_job(glue_job, validation)
```
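The exact shape of the returned suggestions depends on the library. Assuming they arrive as a list of dicts with a severity field (an assumption made purely for illustration), you might triage them like this:

```python
# Hypothetical suggestion records; the real structure may differ.
optimizations = [
    {"severity": "high", "message": "Enable job bookmarks for incremental loads"},
    {"severity": "low",  "message": "Consider columnar output (Parquet)"},
]

# Surface high-severity items first
high_priority = [o for o in optimizations if o["severity"] == "high"]
for item in high_priority:
    print(item["message"])
```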
Generate required IAM policies and security configurations:
```python
# Generate IAM policies
iam_policies = agent.generate_iam_policies(analysis)

# The policies include:
# - Glue job execution role
# - Source data access
# - Target data access
# - CloudWatch logging permissions
```
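However the policies come back, they are ordinary IAM JSON documents, so they can be serialized for review or for attachment via the AWS CLI. The policy below is a hypothetical stand-in (real output will reference your actual resources), shown only to illustrate the round trip:

```python
import json

# Hypothetical generated policy document, keyed by policy name.
iam_policies = {
    "glue-execution-role": {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": "*",
            }
        ],
    }
}

# Serialize each policy for human review before attaching it
rendered = {name: json.dumps(policy, indent=2)
            for name, policy in iam_policies.items()}
print(rendered["glue-execution-role"])
```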
For detailed guidance and examples, see the project documentation.
We welcome contributions! See our Contributing Guide for details on:
- Reporting bugs and requesting features
- Development setup
- Contributing via Pull Requests
- Code of Conduct
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
See CONTRIBUTING for more information.