Data Engineering 101

Zero to engineering with Python, SQL and data warehousing

This site is a structured, practical guide to learning enough Python, SQL and systems knowledge to get a first job in development or as a junior data engineer.

Though the focus is on data engineering, the essential skills taught will also support a career in data science or data analysis. For those paths you will need additional mathematical and statistical understanding, which this course does not provide.

What this course is not:
A silver bullet. There is no substitute for practice and hours spent coding.

Contents

  1. Getting started
  2. UNIX & gitbash
  3. Navigating a file system
  4. git 101
  5. Setting up a project
  6. Python 101
  7. Collaborating in Software Development
    • Agile Ways of Working
  8. Standards
    • Writing clean code
    • Linting & pre-commit
  9. git 102
  10. Shell 101
  11. Continuous Integration
    • YAML
    • GitHub Actions
  12. Python 102 - Data Handling With Python
    • Working with structured and unstructured data
    • Data ingestion with Python and Pandas
    • Data quality & cleaning
    • Joining data sets
    • Plotting with Python
    • Plotting maps with Python
    • Storing Data
    • Integration Testing
    • APIs
  13. Databases and Warehousing
    • What is a database?
    • Setting up a data warehouse with Snowflake
    • Getting data into Snowflake
    • Querying Data with SQL
    • Data Governance
    • Role Based Access Control
    • Accessing a Database with Python
  14. Connecting Python to Snowflake
    • Setting up the connection
    • Building a data warehouse using Python
    • Pushing data into Snowflake with Python
    • Query data
    • Getting data out of Snowflake using Python
  15. Consuming APIs
  16. ELT Pipelines
    • Using Python to retrieve data from an API
    • Data Lakes
    • Building the pipeline to Snowflake
    • Warehousing the data
    • Transforming the data
    • Automating a data export using the Gmail API
  17. Dashboarding and Data Presentation
  18. Moving to the Cloud
    • Amazon S3, Lambda and IAM
    • Architecting Systems
      • Event Driven Architecture
  19. Infrastructure as Code
  20. Containerisation with Docker
  21. Putting code into Production
    • Scheduling jobs
    • Continuous Deployment
    • Secrets and Configuration Files
    • Monitoring & Alerting
  22. Game day