gkdevops/python-data-engineer

Python Data Engineer Learning Repository

Welcome to the Python Data Engineer learning repository! This repo contains a structured, practical set of Jupyter notebooks for learning core Python concepts with a focus on data engineering. Each topic comes with hands-on examples and explanations, along with links to the code for easy reference.

Note: This summary is based on the top-level files; for a full list of all tutorials and scripts, check the GitHub repository contents.


📚 Topics Covered

  • Overview: Introduction to Python, variables, data types, and basic operations.
  • Key Concepts:
    • Printing and string manipulation
    • Variable assignment and naming
    • Numeric, string, and boolean data types
    • Type conversion, built-in functions, and string methods
    • List basics and common list operations
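
As a minimal sketch of the basics listed above (not taken from the notebooks), the snippet below converts string values to numbers, cleans a string, and appends to a list. All variable names and values are hypothetical.

```python
# Hypothetical raw values, as they might arrive from a text file.
raw_count = "42"
raw_price = "19.99"
raw_name = "  widget  "

count = int(raw_count)            # str -> int
price = float(raw_price)          # str -> float
name = raw_name.strip().title()   # trim whitespace, capitalize each word

in_stock = count > 0              # a comparison yields a bool

tags = ["new", "sale"]
tags.append("featured")           # add one item to the end of the list

summary = f"{name}: {count} units at {price}"
print(summary)
```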

  • Overview: Mastering conditional statements for decision making.
  • Key Concepts:
    • if, elif, else statements
    • Comparison and logical operators
    • Nested conditions and practical examples
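
One way the conditional constructs above might look in a data engineering setting is a small batch-size check. The function name `classify_row_count` and its thresholds are assumptions made for illustration.

```python
def classify_row_count(row_count, expected_min=1, expected_max=1_000_000):
    """Label a batch by its size; the thresholds are illustrative only."""
    if row_count < 0:
        return "invalid"
    elif row_count == 0:
        return "empty"
    elif expected_min <= row_count <= expected_max:
        # Nested condition: flag batches in the top 10% of the allowed range.
        if row_count > expected_max * 0.9:
            return "ok_near_limit"
        return "ok"
    else:
        return "too_large"


for size in (-1, 0, 500, 950_000, 2_000_000):
    print(size, classify_row_count(size))
```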

  • Overview: Using loops to automate repetitive tasks.
  • Key Concepts:
    • for and while loops
    • Loop control (break, continue, pass)
    • Looping through lists, strings, and dictionaries
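
The loop constructs above can be sketched on a small list of records. The `records` data and the meaning of the `"skip"` and `"stop"` statuses are hypothetical.

```python
records = [
    {"id": 1, "status": "ok"},
    {"id": 2, "status": "skip"},
    {"id": 3, "status": "ok"},
    {"id": 4, "status": "stop"},
    {"id": 5, "status": "ok"},
]

# for loop with continue and break
processed_ids = []
for record in records:
    if record["status"] == "skip":
        continue  # move on to the next record
    if record["status"] == "stop":
        break     # leave the loop entirely
    processed_ids.append(record["id"])

# Build a dictionary of counts, then loop over its key/value pairs.
status_counts = {}
for record in records:
    status = record["status"]
    status_counts[status] = status_counts.get(status, 0) + 1

for status, total in status_counts.items():
    print(status, total)

# while loop: repeat until a condition stops holding
attempts = 0
while attempts < 3:
    attempts += 1
```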

  • Overview: Writing reusable blocks of code with functions.
  • Key Concepts:
    • Defining and calling functions
    • Parameters, return values, and scope
    • Lambda functions and higher-order functions
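
A short sketch of the function concepts above: a higher-order function that takes another function as an argument, a lambda, and a function that returns a function. The names `apply_to_all` and `make_scaler` are made up for this example.

```python
def apply_to_all(values, transform):
    """Higher-order function: applies `transform` to every item."""
    return [transform(value) for value in values]


def make_scaler(factor):
    """Return a new function that remembers `factor` from the enclosing scope."""
    def scale(value):
        return value * factor
    return scale


celsius = [0, 25, 100]
fahrenheit = apply_to_all(celsius, lambda c: c * 9 / 5 + 32)
doubled = apply_to_all([1, 2, 3], make_scaler(2))

print(fahrenheit)
print(doubled)
```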

  • Overview: Using operators to manipulate data.
  • Key Concepts:
    • Arithmetic, assignment, comparison, logical, bitwise, and membership operators
    • Precedence and associativity
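
Precedence and associativity are easiest to see on small expressions. The values below are arbitrary; the comments describe how Python groups each expression.

```python
a = 2 + 3 * 4          # multiplication binds tighter than addition
b = (2 + 3) * 4        # parentheses override precedence
c = 2 ** 3 ** 2        # ** is right-associative: 2 ** (3 ** 2)
d = -2 ** 2            # ** binds tighter than unary minus: -(2 ** 2)
e = not True or True   # `not` binds tighter than `or`: (not True) or True

quotient, remainder = 10 // 3, 10 % 3   # floor division and modulo

both = 0b0101 & 0b0011     # bitwise AND
either = 0b0101 | 0b0011   # bitwise OR
shifted = 1 << 3           # left shift by three bits

has_csv = "csv" in ["csv", "json"]   # membership test
```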

  • Overview: Mastering data structures for efficient storage and retrieval.
  • Key Concepts:
    • Lists, tuples, sets, dictionaries
    • When and how to use each collection
    • Real-world data engineering examples using collections
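
As a rough illustration of when each collection fits, the sketch below uses a tuple for a fixed schema, a list for ordered rows, a set for distinct values, and a dictionary for lookup by key. The sample rows are invented.

```python
columns = ("id", "name", "country")   # tuple: fixed, ordered schema

rows = [                              # list: ordered, may contain duplicates
    (1, "Ana", "PT"),
    (2, "Ben", "US"),
    (3, "Chen", "US"),
]

# set: distinct values, order not guaranteed
countries = {row[2] for row in rows}

# dict: fast lookup by key, here the record id
by_id = {row[0]: dict(zip(columns, row)) for row in rows}

print(sorted(countries))
print(by_id[2])
```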

  • Overview: Organizing and reusing code with modules and packages.
  • Key Concepts:
    • The difference between modules, packages, and libraries (with LEGO analogies)
    • Importing and using built-in and external libraries (e.g., Pandas, NumPy, Matplotlib, Requests, Scikit-learn)
    • Creating custom modules and packages
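
The common import forms can be sketched with standard-library modules only, so the snippet runs without installing anything. External libraries such as those listed above are imported with the same syntax once installed; creating custom modules is not shown here.

```python
import math                       # import a whole module
from statistics import mean      # import one name from a module
import importlib                  # used below to import by name at runtime

values = [1.0, 2.0, 3.0]
average = mean(values)
root = math.sqrt(16)

# Import a module whose name is only known as a string.
json_module = importlib.import_module("json")
encoded = json_module.dumps({"average": average})

print(average, root, encoded)
```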

  • Overview: Working with randomness: generating random numbers and data for testing and simulations.
  • Key Concepts:
    • Using Python’s random module for numbers, choices, and shuffling
    • Generating random data for data engineering tasks
    • Introduction to the faker library for synthetic data creation
    • Practical examples: random sampling, data anonymization
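
A minimal sketch of the `random` module calls mentioned above, using a seeded `random.Random` instance so that runs are repeatable. The seed, ranges, and region names are arbitrary assumptions, and the third-party faker library is deliberately left out so the snippet has no dependencies.

```python
import random

rng = random.Random(42)   # seeded generator: same sequence on every run

user_id = rng.randint(1000, 9999)          # integer in [1000, 9999], inclusive
region = rng.choice(["eu", "us", "apac"])  # pick one item

batch = list(range(10))
rng.shuffle(batch)                         # reorder the list in place

sample = rng.sample(batch, k=3)            # three distinct items, no replacement

print(user_id, region, batch, sample)
```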

  • Overview: CSV file handling and manipulation for data storage and retrieval.
  • Key Concepts:
    • Reading and writing text and CSV files
    • Working with CSV files using the Pandas library
    • File and directory operations using os and shutil
    • Handling file paths and exceptions
    • Data extraction and ingestion from files
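
The CSV round trip above might look like the following sketch, which writes and reads a small file with the standard `csv` module and handles a missing file. It uses `pathlib` and a temporary directory rather than `os` and `shutil`, purely to keep the example self-contained; the file names and rows are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

rows = [{"id": "1", "city": "Lisbon"}, {"id": "2", "city": "Austin"}]
missing_handled = False

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "cities.csv"

    # Write: newline="" lets the csv module control line endings.
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "city"])
        writer.writeheader()
        writer.writerows(rows)

    # Read: every value comes back as a string.
    with csv_path.open(newline="", encoding="utf-8") as f:
        loaded = list(csv.DictReader(f))

    # Exception handling for a path that does not exist.
    try:
        (Path(tmp_dir) / "missing.csv").read_text(encoding="utf-8")
    except FileNotFoundError:
        missing_handled = True

print(loaded)
```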

  • Overview: Managing JSON data formats for configuration and data exchange.
  • Key Concepts:
    • Reading and writing JSON files with Python’s json module
    • Parsing and serializing complex JSON structures
    • Real-world use cases: configuration files, API responses
    • Data transformation between JSON and Python objects
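
Serializing and parsing with the `json` module can be sketched as a round trip on a nested configuration. The `config` contents are invented for illustration.

```python
import json

# Hypothetical pipeline configuration with nested structures.
config = {
    "source": {"type": "csv", "path": "data/input.csv"},
    "batch_size": 500,
    "enabled": True,
    "tags": ["daily", None],
}

# Python object -> JSON text (True becomes true, None becomes null).
text = json.dumps(config, indent=2, sort_keys=True)

# JSON text -> Python object.
restored = json.loads(text)

print(text)
```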

  • Overview: Code blocks and reusable scripts for modular data engineering workflows.
  • Key Concepts:
    • Encapsulating logic in code blocks (functions, scripts)
    • Organizing reusable code for ETL pipelines
    • Example templates for batch processing and automation
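
One possible shape for a reusable ETL template is three small functions composed by a runner. This is a sketch under stated assumptions, not the repository's template: the function names, the in-memory source and target lists, and the cleaning rule are all hypothetical.

```python
def extract(source):
    """Return raw records; here the source is just an in-memory list."""
    return list(source)


def transform(records):
    """Drop records without an amount and convert amounts to float."""
    cleaned = []
    for record in records:
        if record.get("amount") in (None, ""):
            continue
        cleaned.append({**record, "amount": float(record["amount"])})
    return cleaned


def load(records, target):
    """Append records to the target list and return how many were loaded."""
    target.extend(records)
    return len(records)


def run_pipeline(source, target):
    return load(transform(extract(source)), target)


raw = [
    {"id": 1, "amount": "10.5"},
    {"id": 2, "amount": ""},
    {"id": 3, "amount": "3"},
]
warehouse = []
loaded_count = run_pipeline(raw, warehouse)
print(loaded_count, warehouse)
```

Keeping each stage as its own function makes it possible to test the stages separately and to swap the source or target without touching the transformation.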

  • Overview: Logging and monitoring data engineering processes.
  • Key Concepts:
    • Using Python’s logging module for event tracking
    • Setting up log formats, levels, and handlers
    • Best practices for error handling and process monitoring
    • Writing logs to files and integrating with external tools
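
A minimal sketch of the logging setup described above: a named logger with a level, a format, and a file handler, plus an exception logged with its traceback. The logger name, format string, and file name are assumptions, and the log file lives in a temporary directory so the example cleans up after itself.

```python
import logging
import tempfile
from pathlib import Path


def build_logger(name, log_path):
    """Create a logger that writes INFO and above to `log_path`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep messages out of the root logger

    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    return logger, handler


with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = Path(tmp_dir) / "pipeline.log"
    logger, handler = build_logger("demo_pipeline", log_path)

    logger.debug("not written: below the INFO level")
    logger.info("batch started")
    try:
        1 / 0
    except ZeroDivisionError:
        logger.exception("batch failed")  # ERROR level, traceback included

    # Release the file before the temporary directory is removed.
    handler.close()
    logger.removeHandler(handler)
    log_text = log_path.read_text(encoding="utf-8")

print(log_text)
```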

📎 How to Use This Repo

  1. Browse Notebooks: Start with the Jupyter notebooks in the main directory for a structured learning path.
  2. Explore Directories: Check out the additional folders for more scripts and data.
  3. Try the Code: Run the notebooks locally or in an online Jupyter environment.
  4. Contribute: Pull requests to add new topics or improve examples are welcome!

🔗 Explore More


About

Learn Python language for beginners in Data Analytics and Big Data