Data Engineering 101

Zero to engineering with Python, SQL and data warehousing

Version control with git

Version control allows us to track changes, collaborate and roll back to stable versions of software. Git is a tool which we use to version our code.

Creating a repository

A repository is the place where your project’s code and its history are kept. It lets people work together on the same project through ‘branching’.

Create a new directory (folder) somewhere on your machine.

mkdir my-project

Note: Do not use any whitespace in the name.

Enter the new directory my-project:

cd my-project

Create a git repository:

git init
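The steps above can be tried safely in a throwaway location. This is a minimal sketch, assuming git is installed and that mktemp is available on your machine; the temporary directory is an addition for illustration and is not part of the original steps.

```shell
set -eu

# Work in a temporary directory so nothing else on your machine is touched
# (assumption: mktemp is available).
workdir="$(mktemp -d)"
cd "$workdir"

mkdir my-project
cd my-project
git init --quiet

# git keeps the repository's history and settings in a hidden .git directory.
ls -d .git

# Ask git to confirm that we are now inside a repository.
inside="$(git rev-parse --is-inside-work-tree)"
echo "inside a git work tree: $inside"
```

If the last line reports true, the directory is now a git repository.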

Branching

We use ‘branches’ to take a ‘copy’ of the code, make changes and then push (send) them back to the origin (the central repository, likely hosted on a service like GitHub or Bitbucket). There we can create a pull request, where someone reviews your code and, if it is accepted, merges it into the master/main branch.

To begin with, we must create the main branch locally.

master vs main

Historically the long-lived, central, stable branch was called the master branch, going back to the use of master and slave as computing terms. As part of a language reform to retire the terms master and slave, main has become the popular choice for this branch.

To create the main branch and switch to it:

git checkout -b main
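To check which branch you ended up on, you can ask git what HEAD points to. A small sketch, assuming a fresh repository in a temporary directory (the directory is only there for illustration):

```shell
set -eu

# Fresh throwaway repository (assumption: mktemp is available).
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Create the main branch and switch to it in one step.
git checkout --quiet -b main

# HEAD is a symbolic reference to the branch you are currently on.
current="$(git symbolic-ref --short HEAD)"
echo "current branch: $current"
```

The same check works later on any branch, which is handy before you commit or push.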

Create the README

A README contains instructions on what your project does and how to use it. Create the README with the touch command:

touch README.md

Add some information on what the purpose of the project is, see the 101 README page for details.

After you’ve created the README.md file, stage it by adding it.

git add README.md

Commit the staged file, attaching a message to tell people what changes you’ve made:

git commit -m "First commit, woo!! Added the README"
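It can help to watch the file move from untracked, to staged, to committed. The sketch below runs in a throwaway repository; the user name and email are placeholder values set for that repository only, and the commit message is just an example.

```shell
set -eu

# Throwaway repository (assumption: mktemp is available).
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git checkout --quiet -b main

# Placeholder identity, stored in this repository's config only.
git config user.name "Example User"
git config user.email "user@example.com"

echo "# my-project" > README.md

git status --short    # prints '?? README.md': the file is untracked
git add README.md
git status --short    # prints 'A  README.md': the file is staged
git commit --quiet -m "Add the README"

# The commit is now part of the branch's history.
commits="$(git rev-list --count HEAD)"
git log --oneline
```

After the commit, git status --short prints nothing, which means there is nothing left to stage or commit.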

Then push (send) it to the central repo. This assumes a remote named origin is already connected; see the FAQ if it is not:

git push origin main

From here, create a branch for each change, then add, commit and push it to the remote repo for review and merge. The only difference when pushing is that you name your branch instead of main:

git push origin <your-branch-name>

After you have pushed your branch to the remote repo / origin, you can merge the code to the main branch.
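The whole flow, from first commit on main to a pushed feature branch, might look like the sketch below. It makes a few assumptions for illustration: a bare repository on disk stands in for a hosted origin such as GitHub, the branch name add-usage-notes is made up, and the identity values are placeholders.

```shell
set -eu

base="$(mktemp -d)"

# A bare repository on disk plays the role of the hosted origin (assumption).
git init --quiet --bare "$base/origin.git"

git init --quiet "$base/my-project"
cd "$base/my-project"
git checkout --quiet -b main
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$base/origin.git"

# The first commit goes straight to main.
echo "# my-project" > README.md
git add README.md
git commit --quiet -m "Add the README"
git push --quiet origin main

# Later work happens on its own branch (hypothetical name).
git checkout --quiet -b add-usage-notes
echo "Usage notes go here." >> README.md
git add README.md
git commit --quiet -m "Add usage notes to the README"
git push --quiet origin add-usage-notes

# Both branches now exist on the remote.
remote_heads="$(git ls-remote --heads origin)"
echo "$remote_heads"
```

With a hosted service, the pushed branch is what you would open a pull request from.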

Note: You should only push to main on your first commit and then use branching and merging. If you’re continually pushing to main, you’re likely doing something wrong or have broken working patterns.

Pulling changes from the remote repo to local repo

To connect our local main branch with the origin main branch, we must set the upstream pointer. This tells our local branch where to find the remote branch:

git branch --set-upstream-to=origin/main main

To pull (or grab) the changes and updates, use:

git pull
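To see a pull actually bring something in, someone else has to change the remote first. The sketch below simulates that on one machine, under stated assumptions: a bare repository on disk stands in for the origin, a second clone plays the part of a teammate, and all names and identities are placeholders.

```shell
set -eu

base="$(mktemp -d)"
git init --quiet --bare "$base/origin.git"

# Your repository: make the first commit and push it to main.
git init --quiet "$base/mine"
cd "$base/mine"
git checkout --quiet -b main
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$base/origin.git"
echo "# my-project" > README.md
git add README.md
git commit --quiet -m "Add the README"
git push --quiet origin main

# Point the local main branch at origin/main.
git branch --set-upstream-to=origin/main main

# A teammate's clone adds a commit and pushes it.
git clone --quiet --branch main "$base/origin.git" "$base/teammate"
cd "$base/teammate"
git config user.name "Teammate"
git config user.email "teammate@example.com"
echo "A line from a teammate." >> README.md
git commit --quiet -am "Add a line to the README"
git push --quiet origin main

# Back in your repository, a plain 'git pull' now knows where to look.
cd "$base/mine"
git pull --quiet
total="$(git rev-list --count HEAD)"
git log --oneline
```

Without the upstream pointer, git pull would ask you to name the remote and branch explicitly each time.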

Cloning

To take a local copy of a project, which is connected to the original remote repository, you can clone the repo.

Try with:

git clone git@github.com:python/devguide.git

This will create a new directory called devguide and will copy all of the files and git history to your machine.

If you succeeded, you’ve just copied the entire Python developer’s guide, which covers how to contribute to the Python code base, to your machine! You won’t really need it, so feel free to delete it with:

rm -rf devguide
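Cloning also works from a path on your own machine, which makes it easy to try without network access or SSH keys. This is a sketch under that assumption: the source repository, its contents and the directory name copy are all made up for illustration.

```shell
set -eu

base="$(mktemp -d)"

# A small source repository to clone from (stand-in for a hosted project).
git init --quiet "$base/source"
cd "$base/source"
git checkout --quiet -b main
git config user.name "Example User"
git config user.email "user@example.com"
echo "# source project" > README.md
git add README.md
git commit --quiet -m "Add the README"

# Clone it into a new directory called 'copy'.
cd "$base"
git clone --quiet "$base/source" copy

# The clone has the files, the history, and a remote named origin
# pointing back at where it came from.
ls copy
history="$(git -C copy rev-list --count HEAD)"
origin_url="$(git -C copy remote get-url origin)"
echo "commits copied: $history, origin: $origin_url"

# Remove the clone again; -f avoids prompts for git's read-only object files.
rm -rf copy
```

Because the clone already has origin configured, git pull and git push work in it without any extra setup.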

FAQ

How to connect a remote repo to a local git repo?

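First register the remote under the name origin, replacing <remote-repo-url> with the URL of your own repository:

git remote add origin <remote-repo-url>

Then push all of your local branches to it:

git push --all origin

Or push all branches and set their upstream pointers at the same time:

git push --all --set-upstream origin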