Welcome to the Python Data Engineer learning repository! This repo contains a structured, practical set of Jupyter notebooks for learning core Python concepts, especially with a focus on data engineering. Each topic is covered with hands-on examples and explanations, and links are provided to the code for easy reference.
Note: This summary is based on the top-level files; for a full list of all tutorials and scripts, check the GitHub repository contents.
- Overview: Introduction to Python, variables, data types, and basic operations.
- Key Concepts:
- Printing and string manipulation
- Variable assignment and naming
- Numeric, string, and boolean data types
- Type conversion, built-in functions, and string methods
- List basics and common list operations
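The concepts above can be sketched in a few lines. The variable names and values here are illustrative assumptions, not taken from the notebooks:

```python
# Type conversion: raw text from a file or form usually arrives as a string.
raw_value = " 42 "
count = int(raw_value.strip())      # 42
price = float("19.99")              # 19.99

# String manipulation and string methods.
label = "total: " + str(count)      # "total: 42"
name = "ada lovelace".title()       # "Ada Lovelace"

# Booleans: any non-zero number converts to True.
is_valid = bool(count)              # True

# List basics: append, index, and measure length.
tags = ["raw", "clean"]
tags.append("final")
first_tag, tag_count = tags[0], len(tags)

print(label, name, is_valid, first_tag, tag_count)
```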
- Overview: Mastering conditional statements for decision making.
- Key Concepts:
- `if`, `elif`, `else` statements
- Comparison and logical operators
- Nested conditions and practical examples
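As a minimal sketch of these ideas, the hypothetical helper below labels a batch by its row count. The function name and thresholds are assumptions chosen only for illustration:

```python
def classify_row_count(row_count, expected_min=100, expected_max=10_000):
    """Label a batch by its row count (hypothetical thresholds)."""
    # Logical operator: either condition makes the input invalid.
    if row_count is None or row_count < 0:
        return "invalid"
    elif row_count == 0:
        return "empty"
    # Chained comparison: true only when both bounds hold.
    elif expected_min <= row_count <= expected_max:
        return "ok"
    else:
        # Nested condition: separate too-small from too-large batches.
        if row_count < expected_min:
            return "too small"
        return "too large"


print(classify_row_count(500))
```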
3. Python Loops
- Overview: Using loops to automate repetitive tasks.
- Key Concepts:
- `for` and `while` loops
- Loop control (`break`, `continue`, `pass`)
- Looping through lists, strings, and dictionaries
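A short sketch of the loop constructs listed above. The record layout and status values are assumptions made up for this example:

```python
records = [
    {"id": 1, "status": "ok"},
    {"id": 2, "status": "skip"},
    {"id": 3, "status": "ok"},
    {"id": 4, "status": "stop"},
    {"id": 5, "status": "ok"},
]

# for loop over a list, with continue and break.
processed_ids = []
for record in records:
    if record["status"] == "skip":
        continue  # move on to the next record
    if record["status"] == "stop":
        break  # leave the loop entirely
    processed_ids.append(record["id"])

# while loop: repeat until a condition stops holding.
attempts = 0
while attempts < 3:
    attempts += 1

# Looping through a string; pass is a do-nothing placeholder body.
for char in "abc":
    pass

# Looping through a dictionary's key/value pairs.
column_types = {"id": "int", "name": "str"}
described = []
for column, dtype in column_types.items():
    described.append(f"{column}:{dtype}")

print(processed_ids, attempts, described)
```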
- Overview: Writing reusable blocks of code with functions.
- Key Concepts:
- Defining and calling functions
- Parameters, return values, and scope
- Lambda functions and higher-order functions
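The sketch below shows a plain function, a lambda, and a higher-order function together. The helper names `apply_to_all` and `to_celsius` are hypothetical and used only for illustration:

```python
def apply_to_all(func, values):
    """Higher-order function: takes another function as an argument."""
    return [func(value) for value in values]


def to_celsius(fahrenheit):
    """Convert one Fahrenheit reading to Celsius."""
    # `celsius` is local: it exists only while this call is running.
    celsius = (fahrenheit - 32) * 5 / 9
    return celsius


readings_c = apply_to_all(to_celsius, [32, 212])   # [0.0, 100.0]
doubled = apply_to_all(lambda x: x * 2, [1, 2, 3])  # [2, 4, 6]

print(readings_c, doubled)
```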
- Overview: Using operators to manipulate data.
- Key Concepts:
- Arithmetic, assignment, comparison, logical, bitwise, and membership operators
- Precedence and associativity
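A few one-line examples make precedence and associativity concrete. The values are arbitrary and chosen so the results are easy to check by hand:

```python
total = 2 + 3 * 4                # * binds tighter than +: 14
grouped = (2 + 3) * 4            # parentheses override precedence: 20
power = 2 ** 3 ** 2              # ** is right-associative: 2 ** 9 = 512

count = 10
count += 5                       # augmented assignment: 15

flags = 0b0101 & 0b0011          # bitwise AND: 0b0001 = 1
shifted = 1 << 3                 # left shift: 8

is_member = "id" in ["id", "name"]   # membership: True
mixed = True or False and False      # `and` binds tighter than `or`: True

print(total, grouped, power, count, flags, shifted, is_member, mixed)
```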
- Overview: Mastering data structures for efficient storage and retrieval.
- Key Concepts:
- Lists, tuples, sets, dictionaries
- When and how to use each collection
- Real-world data engineering examples using collections
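One way to see when each collection fits is to place them side by side. The column names and values below are illustrative assumptions:

```python
columns = ["id", "name", "email"]    # list: ordered and mutable
point = (52.52, 13.40)               # tuple: fixed-size, immutable
seen_ids = {1, 2, 2, 3}              # set: duplicates collapse
row = {"id": 1, "name": "Ada"}       # dict: lookup by key

# Deduplicate while preserving first-seen order.
# Dict keys are unique and keep insertion order.
raw_ids = [3, 1, 3, 2, 1]
unique_ids = list(dict.fromkeys(raw_ids))   # [3, 1, 2]

# Set difference: which expected columns are missing from a row?
missing_columns = set(columns) - set(row)   # {"email"}

print(unique_ids, missing_columns)
```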
- Overview: Organizing and reusing code with modules and packages.
- Key Concepts:
- The difference between modules, packages, and libraries (with LEGO analogies)
- Importing and using built-in and external libraries (e.g., Pandas, NumPy, Matplotlib, Requests, Scikit-learn)
- Creating custom modules and packages
- Overview: Working with randomness: generating random numbers and data for testing and simulations.
- Key Concepts:
- Using Python’s `random` module for numbers, choices, and shuffling
- Generating random data for data engineering tasks
- Introduction to the `faker` library for synthetic data creation
- Practical examples: random sampling, data anonymization
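A minimal sketch using only the standard-library `random` module (the third-party `faker` library is not shown here). The seed, ID range, and country codes are assumptions for illustration. Seeding a dedicated `random.Random` instance keeps the generated test data reproducible without touching the global generator:

```python
import random

rng = random.Random(42)  # seeded so repeated runs give the same data

user_ids = list(range(1, 11))

sample_ids = rng.sample(user_ids, k=3)      # 3 distinct IDs, no repeats
country = rng.choice(["DE", "FR", "US"])    # one item picked at random
amount = rng.randint(1, 100)                # inclusive on both ends

shuffled = user_ids.copy()
rng.shuffle(shuffled)                       # shuffles the copy in place

print(sample_ids, country, amount, shuffled)
```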
- Overview: CSV file handling and manipulation for data storage and retrieval.
- Key Concepts:
- Reading and writing text and CSV files
- Using CSV files with the Pandas library
- File and directory operations using `os` and `shutil`
- Handling file paths and exceptions
- Data extraction and ingestion from files
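The sketch below writes and reads a CSV file and handles a missing file. It uses only the standard-library `csv`, `os`, and `tempfile` modules rather than Pandas, and the file names and rows are hypothetical:

```python
import csv
import os
import tempfile

rows = [{"id": "1", "name": "Ada"}, {"id": "2", "name": "Grace"}]

# A temporary directory keeps the example from touching real data.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "people.csv")  # hypothetical file name

    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "name"])
        writer.writeheader()
        writer.writerows(rows)

    # Every value comes back as a string; convert types after reading.
    with open(csv_path, newline="", encoding="utf-8") as f:
        loaded = list(csv.DictReader(f))

    # Handle a missing input file instead of letting the job crash.
    try:
        with open(os.path.join(tmp_dir, "missing.csv"), encoding="utf-8") as f:
            f.read()
        missing_handled = False
    except FileNotFoundError:
        missing_handled = True

print(loaded, missing_handled)
```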
10. JSON Directory
- Overview: Managing JSON data formats for configuration and data exchange.
- Key Concepts:
- Reading and writing JSON files with Python’s `json` module
- Parsing and serializing complex JSON structures
- Real-world use cases: configuration files, API responses
- Data transformation between JSON and Python objects
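A small round trip shows how JSON text and Python objects map onto each other. The configuration keys and the API response below are made-up examples:

```python
import json

# A Python dict standing in for a pipeline configuration file.
config = {
    "source": "orders.csv",
    "batch_size": 500,
    "columns": ["id", "total"],
    "dry_run": False,
}

# Serialize to JSON text, then parse it back into Python objects.
text = json.dumps(config, indent=2, sort_keys=True)
restored = json.loads(text)

# A nested structure, such as an API response body.
api_response = '{"status": "ok", "items": [{"id": 1, "tags": null}]}'
parsed = json.loads(api_response)
first_tags = parsed["items"][0]["tags"]   # JSON null becomes None

print(restored == config, first_tags)
```

For files on disk, `json.dump` and `json.load` do the same work with an open file object.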
11. Blocks Directory
- Overview: Code blocks and reusable scripts for modular data engineering workflows.
- Key Concepts:
- Encapsulating logic in code blocks (functions, scripts)
- Organizing reusable code for ETL pipelines
- Example templates for batch processing and automation
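One possible shape for such a template is a trio of small functions chained into a pipeline. This is a sketch under assumptions: the function names, the sample rows, and the list used as a target are all hypothetical and do not come from the repository's scripts:

```python
def extract():
    """Return raw rows; a real job would read a file or call an API."""
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "n/a"}]


def transform(rows):
    """Convert amounts to floats and drop rows that cannot be parsed."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip malformed rows
        cleaned.append({"id": row["id"], "amount": amount})
    return cleaned


def load(rows, target):
    """Append rows to the target list and report how many were loaded."""
    target.extend(rows)
    return len(rows)


def run_pipeline(target):
    """Chain the three steps into one reusable entry point."""
    return load(transform(extract()), target)


warehouse = []  # stands in for a database table
loaded_count = run_pipeline(warehouse)
print(loaded_count, warehouse)
```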
- Overview: Logging and monitoring data engineering processes.
- Key Concepts:
- Using Python’s `logging` module for event tracking
- Setting up log formats, levels, and handlers
- Best practices for error handling and process monitoring
- Writing logs to files and integrating with external tools
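The sketch below sets up a logger with a level, a format, and a handler. The logger name and messages are assumptions, and an in-memory stream stands in for a log file so the example leaves nothing on disk:

```python
import io
import logging

log_stream = io.StringIO()  # stands in for a log file
handler = logging.StreamHandler(log_stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

logger = logging.getLogger("etl.example")  # hypothetical logger name
logger.setLevel(logging.INFO)   # DEBUG messages are filtered out
logger.addHandler(handler)
logger.propagate = False        # keep messages out of the root logger

logger.debug("not recorded")    # below the INFO threshold
logger.info("loaded %d rows", 120)

try:
    1 / 0
except ZeroDivisionError:
    logger.error("transform failed")

log_lines = log_stream.getvalue().splitlines()
print(log_lines)
```

To write to a real file, `logging.FileHandler` can replace the stream handler.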
- Browse Notebooks: Start with the Jupyter notebooks in the main directory for a structured learning path.
- Explore Directories: Check out the additional folders for more scripts and data.
- Try the Code: Run the notebooks locally or in an online Jupyter environment.
- Contribute: Pull requests to add new topics or improve examples are welcome!