The ETL Migration Assistant is an AI-powered tool that automates the migration of legacy ETL workflows to AWS Glue. It supports analyzing and converting ETL jobs from various platforms including Informatica PowerCenter, IBM DataStage, SQL Server Integration Services (SSIS), and Talend.
- Intelligent ETL Analysis: Automatically parse and analyze ETL code from multiple platforms
- AWS Glue Code Generation: Convert legacy ETL jobs to optimized AWS Glue PySpark code
- Data Lineage & Dependencies: Extract and preserve data lineage and job dependencies
- Validation & Optimization: Validate migrations and suggest performance optimizations
- Security & Compliance: Generate appropriate IAM policies and security configurations
```bash
# Install the package
pip install strands-agents
```

```python
from strands.tools.etl_migration import ETLMigrationAgent, ETLPlatform

# Initialize the migration agent
agent = ETLMigrationAgent()

# Analyze an Informatica PowerCenter workflow
with open('workflow.xml', 'r') as f:
    source_code = f.read()

analysis = agent.analyze_etl_code(source_code, ETLPlatform.INFORMATICA)

# Generate AWS Glue job
glue_job = agent.generate_glue_job(analysis)

# Validate the migration
validation = agent.validate_migration(analysis, glue_job)

# Get optimization suggestions
optimizations = agent.optimize_glue_job(glue_job, validation)

# Generate required IAM policies
iam_policies = agent.generate_iam_policies(analysis)
```
Ensure you have Python 3.10+ and AWS credentials configured, then:
```bash
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate

# Install the package
pip install strands-agents
```
Before using the ETL Migration Assistant, configure your AWS credentials using one of these methods:
- Environment Variables:

  ```bash
  export AWS_ACCESS_KEY_ID='your_access_key'
  export AWS_SECRET_ACCESS_KEY='your_secret_key'
  export AWS_SESSION_TOKEN='your_session_token'  # If using temporary credentials
  export AWS_DEFAULT_REGION='your_region'
  ```

- AWS Credentials File: create or edit `~/.aws/credentials`:

  ```ini
  [default]
  aws_access_key_id = your_access_key
  aws_secret_access_key = your_secret_key
  aws_session_token = your_session_token
  region = your_region
  ```
For detailed credential configuration and security best practices, see AWS Credentials Guide
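As a quick sanity check before running the tool, you can confirm that a credentials file in this format parses cleanly. This is a minimal sketch using only the Python standard library; the profile contents below are illustrative placeholders, not real credentials:

```python
import configparser

# Sample contents in the same format as ~/.aws/credentials
# (placeholder values for illustration only)
sample = """
[default]
aws_access_key_id = your_access_key
aws_secret_access_key = your_secret_key
region = your_region
"""

parser = configparser.ConfigParser()
parser.read_string(sample)

# The [default] profile should expose the key fields
profile = parser["default"]
print("default" in parser)            # -> True
print(profile["aws_access_key_id"])   # -> your_access_key
```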
The system supports parsing and analyzing ETL code from multiple platforms:
```python
from strands.tools.etl_migration import ETLMigrationAgent, ETLPlatform

agent = ETLMigrationAgent()

# Analyze different types of ETL jobs
informatica_analysis = agent.analyze_etl_code(informatica_code, ETLPlatform.INFORMATICA)
datastage_analysis = agent.analyze_etl_code(datastage_code, ETLPlatform.DATASTAGE)
ssis_analysis = agent.analyze_etl_code(ssis_code, ETLPlatform.SSIS)
talend_analysis = agent.analyze_etl_code(talend_code, ETLPlatform.TALEND)
```
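When migrating a mixed estate, a small dispatch table keeps the per-platform calls tidy. The helper below is a hypothetical sketch, not part of the library; the extension-to-platform mapping is an assumption you should adjust to match how your ETL exports are organized:

```python
# Hypothetical mapping from source-file extension to platform name.
PLATFORM_BY_EXTENSION = {
    ".xml": "INFORMATICA",   # PowerCenter workflow exports
    ".dsx": "DATASTAGE",     # DataStage export files
    ".dtsx": "SSIS",         # SSIS package files
    ".item": "TALEND",       # Talend job items
}

def platform_for(path: str) -> str:
    """Return the platform name for a source file, based on its extension."""
    for ext, platform in PLATFORM_BY_EXTENSION.items():
        if path.endswith(ext):
            return platform
    raise ValueError(f"Unrecognized ETL source file: {path}")

print(platform_for("jobs/orders.dtsx"))  # -> SSIS
```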
Convert legacy ETL jobs to optimized AWS Glue PySpark code:
```python
# Generate Glue job with custom configuration
glue_job = agent.generate_glue_job({
    "job": etl_job,
    "config": {
        "worker_type": "G.1X",
        "number_of_workers": 5,
        "timeout_minutes": 60
    }
})
```
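Worker type and count together determine total capacity: in AWS Glue, a G.1X worker maps to 1 DPU and a G.2X worker to 2 DPUs. A small helper (a sketch for sanity-checking a configuration, not part of the library) makes the arithmetic explicit:

```python
# DPUs per worker for common Glue worker types
DPUS_PER_WORKER = {"G.1X": 1, "G.2X": 2}

def total_dpus(worker_type: str, number_of_workers: int) -> int:
    """Total DPU capacity implied by a worker configuration."""
    return DPUS_PER_WORKER[worker_type] * number_of_workers

# The configuration above: 5 G.1X workers -> 5 DPUs
print(total_dpus("G.1X", 5))  # -> 5
```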
Ensure accurate migration and optimal performance:
```python
# Validate with sample data
validation = agent.validate_migration(
    original_job=analysis,
    glue_job=glue_job,
    sample_data_path="s3://bucket/sample-data/"
)

# Get optimization suggestions
optimizations = agent.optimize_glue_job(glue_job, validation)
```
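The exact shape of the returned suggestions depends on the library. Assuming they arrive as a list of dicts with a severity field (an assumption made purely for illustration), you might triage them like this:

```python
# Hypothetical suggestion records; the real structure may differ.
optimizations = [
    {"severity": "high", "message": "Enable job bookmarks for incremental loads"},
    {"severity": "low",  "message": "Consider columnar output (Parquet)"},
]

# Surface high-severity items first
high_priority = [o for o in optimizations if o["severity"] == "high"]
for item in high_priority:
    print(item["message"])
```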
Generate required IAM policies and security configurations:
```python
# Generate IAM policies
iam_policies = agent.generate_iam_policies(analysis)

# The policies include:
# - Glue job execution role
# - Source data access
# - Target data access
# - CloudWatch logging permissions
```
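However the policies come back, they are ordinary IAM JSON documents, so they can be serialized for review or for attachment via the AWS CLI. The policy below is a hypothetical stand-in (real output will reference your actual resources), shown only to illustrate the round trip:

```python
import json

# Hypothetical generated policy document, keyed by policy name.
iam_policies = {
    "glue-execution-role": {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": "*",
            }
        ],
    }
}

# Serialize each policy for human review before attaching it
rendered = {name: json.dumps(policy, indent=2)
            for name, policy in iam_policies.items()}
print(rendered["glue-execution-role"])
```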
For detailed guidance and examples, see the project documentation.
We welcome contributions! See our Contributing Guide for details on:
- Reporting bugs and requesting features
- Development setup
- Contributing via Pull Requests
- Code of Conduct
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
See CONTRIBUTING for more information.