A Guide to Git
For quite some time, git had been this nebulous, terrifying thing for me. It was kind of like walking on a tightrope holding a bunch of fine china. Yeah, I knew how to git commit and git push. But if I had to do anything outside of that, I’d quickly lose my balance, drop the fine china, and git would inevitably break my project into unrecognizable pieces.
If you’re a developer, you probably know git pretty well. But git is now becoming an inescapable skill for anyone in a field involving programming and collaboration, especially data science.
So I finally bit the proverbial bullet and tried to understand the more holistic picture of git and what its commands were doing with my code. Fortunately, it turned out to be less complicated than I had thought, and correcting my underlying mental model of it gave me more confidence and less anxiety when tackling new projects.
The four areas of git
There are four areas in git: stash, working area, index, and repository.
Your typical git workflow works from left to right, starting at the working area. When you make any changes to files in a git repository, these changes show in the working area. To view which files were changed along with new files you created, simply do a git status. To show the more granular details about what exactly was changed in a file, just do a git diff [file name].
Once you’re satisfied with your changes, you then add the changed files from the working area to the index with a git add command.
The index functions as a sort of staging area. It exists because you might be trying some things out and changing a lot of code in the working area but you don’t necessarily want all those changes to be in the repository area just yet.
So the idea is to just selectively add the changes you want to the index and it’s best practice for the changes you add to be some logical unit or collection of things that are related. For example, you might decide to add all the files with changes that relate to the new preprocessing function you wrote in Python.
Once you’ve added all relevant files to the index, finally move them to the repository by doing a git commit -m 'Explanation of my changes'. You should see that there is no difference between the index and repository now, something you can verify with a git diff --cached, which prints nothing when the two match.
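The left-to-right flow above can be tried safely in a throwaway repository. The following is a minimal sketch, not a recommended setup: the temporary directory, the placeholder identity, and the file name preprocess.py are all assumptions made up for the demo.

```shell
#!/bin/sh
# Sketch: move one change from the working area to the index to the repository.
set -e
repo=$(mktemp -d)                         # throwaway directory for the demo
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity (assumption)
git config user.email "demo@example.com"

# Hypothetical file standing in for "the new preprocessing function".
echo "def preprocess(x): return x" > preprocess.py

git status --short            # "?? preprocess.py": exists only in the working area
git add preprocess.py         # working area -> index
git status --short            # "A  preprocess.py": staged in the index
git diff --cached --stat      # what the index holds that the repository does not
git commit -q -m "Add preprocessing function"   # index -> repository

staged=$(git diff --cached --name-only)   # empty once index and repository agree
commits=$(git rev-list --count HEAD)
echo "staged after commit: '$staged', commits: $commits"
```

Running git status --short between each step is a cheap way to see which area a change currently lives in.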
Mistakes were made
But life is not so ideal. Eventually you’ll make a commit with an inappropriate message that you want to change, or you’ll decide that all those new things you added totally broke everything and now you want to go back to how it was before. Ever wanted to include a coworker’s updates into the code you’re working on only to find there are file conflicts?
You might have seen some git commands like revert. If these commands scare you, you’re not alone. Some of them are powerful, and can destroy your project if you don’t know how to use them. But fret not, you’re about to learn how to use them. 😊
Up to this point, the commands you’re used to like git add and git commit have moved your code throughout the areas from left to right. We now explore commands that will allow you to move backwards in the git areas in order to undo changes you’ve made to files or revert back entirely to a previous commit.
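Say you’re working on some files and realize that none of your new changes will pan out. Instead of manually figuring out how to undo them, you realize you can just go back to the versions you had in your last commit. So you execute git reset --hard HEAD. Be careful: this throws away every uncommitted change to tracked files, and --hard does not accept a file name. To restore just one file, use git checkout HEAD -- [file name] instead.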
Let’s unpack that command.
HEAD refers to the most recent commit on your current branch. It’s useful because otherwise you would have to find the commit hash instead (though you can find it easily enough with git log). Think of reset as the controlled claw in one of those crane claw machines you might have had the pleasure of using, or rather being used by, as a kid to get that coveted stuffed animal.
reset allows you to pick what you want out of a particular commit in the repository. You then get the option of which git areas to place it into by using one of the following flags: --hard, --mixed, or --soft.
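As you saw above, git reset --hard HEAD takes the files from the last commit and applies them to both the index and the working area.
On the other hand, git reset --mixed HEAD takes the files from the last commit and applies them to just the index, leaving the working area alone. This is the default mode, and the only one that accepts a file name: git reset HEAD [file name] unstages that one file.
git reset --soft [commit hash] touches neither the index nor the working area. It only moves the current branch in the repository to point at the commit you name, so git reset --soft HEAD changes nothing at all.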
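To see which areas each reset mode touches, here is a small sketch that runs all three in a throwaway repository. The file name notes.txt and the placeholder identity are assumptions for the demo, and the status codes in the comments follow the git status --short format (first column is the index, second is the working area).

```shell
#!/bin/sh
# Sketch: compare what --mixed, --hard, and --soft resets leave behind.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity (assumption)
git config user.email "demo@example.com"

echo "v1" > notes.txt                     # hypothetical file
git add notes.txt
git commit -q -m "First version"

# Mixed reset with a path: the index goes back to HEAD, the edit stays on disk.
echo "v2" > notes.txt
git add notes.txt
git reset -q HEAD -- notes.txt
after_mixed=$(git status --short)         # " M notes.txt": modified, not staged

# Hard reset: index and working area both go back to the last commit.
git reset -q --hard HEAD
after_hard=$(cat notes.txt)               # back to "v1"

# Soft reset: only the branch moves; the index still holds the newer content.
echo "v3" > notes.txt
git commit -q -am "Second version"
git reset -q --soft HEAD~1
after_soft=$(git status --short)          # "M  notes.txt": staged, ready to recommit

echo "mixed: [$after_mixed] hard: [$after_hard] soft: [$after_soft]"
```

The hard reset is the only one of the three that can lose work, since it overwrites files in the working area.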
Reverting to a previous commit
Let’s say you pushed something to your repository but it’s not working out. You can undo the commit that broke things by doing a git revert [commit hash], where the hash is that of the bad commit. Remember, you can find your commit hash by doing a git log.
It’s as easy as that, but there’s one thing to note. Revert doesn’t wipe any of your history. Rather, it creates a new commit that applies the opposite of the changes made in the commit you chose. It’s nice, because it doesn’t mess up any of your git history.
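A quick way to convince yourself that revert adds history rather than erasing it is to count commits before and after. This is a minimal sketch in a throwaway repository; the file app.txt, its contents, and the placeholder identity are assumptions for the demo.

```shell
#!/bin/sh
# Sketch: revert a bad commit and confirm the history grows instead of shrinking.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity (assumption)
git config user.email "demo@example.com"

echo "working" > app.txt                  # hypothetical file
git add app.txt
git commit -q -m "Working version"

echo "broken" > app.txt
git commit -q -am "Break things"
bad=$(git rev-parse HEAD)                 # hash of the commit we want to undo

# --no-edit accepts the default "Revert ..." message without opening an editor.
git revert --no-edit "$bad" > /dev/null

content=$(cat app.txt)                    # file is back to its earlier content
count=$(git rev-list --count HEAD)        # two original commits plus the revert
echo "content: $content, commits: $count"
```

The bad commit is still visible in git log afterwards, which is what makes revert safe to use on history that has already been pushed.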
Editing your commit history
It doesn’t take long in your journey in git to come across some things you wish you could undo. For example, you committed a file to the local repository, but you messed up the commit message. Or say you made a number of really small commits that you’d rather just group together into one commit. This is where git rebase shines.
Say you have a group of small commits.
You want to merge them together and then give the merged commit a new message. To do this, just do git rebase --interactive origin/main.
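This opens an editor listing your commits that are not yet on origin/main, each starting with the word pick. Go ahead and make the changes you want. In this case, we want to squash the last two commits into the first one we made, so change pick to squash on those two lines.
After you do this, save the file and close it, which will bring you to a new window to edit the final commit message.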
We make a change to the final commit message.
And now you can see that our previous three commits have now been merged or squashed into one.
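Interactive rebase needs an editor, so it is awkward to show in a script. For the simple case of folding the most recent commits into one, a soft reset followed by a fresh commit gives a similar end result, and the sketch below uses that swapped-in technique rather than git rebase itself. The file name, commit messages, and placeholder identity are assumptions for the demo.

```shell
#!/bin/sh
# Sketch: squash the last three commits into one using reset --soft
# (a non-interactive alternative to "git rebase --interactive").
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity (assumption)
git config user.email "demo@example.com"

echo "base" > file.txt                    # hypothetical file
git add file.txt
git commit -q -m "Base commit"

# Three small commits we would rather have as one.
for n in 1 2 3; do
    echo "step $n" >> file.txt
    git commit -q -am "Small change $n"
done

# Move the branch back three commits but keep all their changes staged...
git reset -q --soft HEAD~3
# ...then record them as a single commit with a new message.
git commit -q -m "Add steps 1 to 3 as one change"

count=$(git rev-list --count HEAD)        # base commit plus the squashed commit
steps=$(grep -c "step" file.txt)          # all three changes are still in the file
echo "commits: $count, steps kept: $steps"
```

Like rebase, this rewrites history, so it is best kept to commits that have not been shared yet.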
Renaming a remote branch
Say there’s a branch you want to rename and it’s already been pushed to your remote repository in GitHub or wherever. Assuming you’re the only one working on this branch (because changing the remote history can have some nasty effects), you can change your branch name by doing the following:
git checkout [your branch]
git branch -m [new branch name]
git push origin :[your branch] [new branch name]
git push origin -u [new branch name]
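To try those four commands without touching a real remote, the sketch below uses a local bare repository as a stand-in for GitHub. The branch names old-name and new-name, the directory layout, and the placeholder identity are all assumptions for the demo.

```shell
#!/bin/sh
# Sketch: rename a branch locally and on a remote, using a local bare repo as "origin".
set -e
base=$(mktemp -d)
git init -q --bare "$base/remote.git"                      # stand-in for GitHub
git clone -q "$base/remote.git" "$base/work" 2> /dev/null  # hide the empty-repo warning
cd "$base/work"
git config user.name "Demo User"          # placeholder identity (assumption)
git config user.email "demo@example.com"

echo "hello" > README
git add README
git commit -q -m "Initial commit"
git checkout -q -b old-name               # the branch we will rename
git push -q -u origin old-name

# The rename itself, following the steps above.
git branch -m new-name                    # rename the local branch
git push -q origin :old-name new-name     # delete the old remote branch, push the new one
git push -q origin -u new-name            # point the local branch at the new remote branch

remote_heads=$(git ls-remote --heads origin)
upstream=$(git rev-parse --abbrev-ref 'new-name@{upstream}')
echo "$remote_heads"
echo "upstream: $upstream"
```

The last push matters because the renamed local branch would otherwise still be tracking the remote branch that was just deleted.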
The typical git workflow
Let’s say you and your team are contributors to a repository in GitHub. You’ve got a main branch as your default, primary branch. But you want to know how to collaborate with each other so that you’re not overwriting each other’s work. The following is one such workflow I’ve used a lot in the past:
- First make sure the main branch is up to date with git checkout main; git pull.
- Create a new branch off of main with git checkout -b [your branch name].
- Add that new feature or fix that bug, commit it, and do a git push to push it to your branch.
- When you’re ready to merge your changes into the main branch, create a pull request in GitHub so that your teammates can look at your code and offer any suggestions for improvement. If it looks good, GitHub will let you merge your changes into main.
- Now just clean up your local repository by:
  - Updating your local main branch with the changes that were just merged in the remote, git checkout main; git pull.
  - Deleting the local branch you were working from, git branch -d [your branch name].
Now just repeat that workflow for any changes you want to make. 👍
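The whole loop can be rehearsed locally. The sketch below uses a bare repository in place of GitHub and, since there is no pull request button on the command line, fakes the merge by pushing the feature branch straight onto the remote main. That push is only a stand-in for the review step, and the branch name add-feature, the file app.txt, and the placeholder identity are assumptions for the demo.

```shell
#!/bin/sh
# Sketch: one pass through the branch, push, merge, clean-up workflow.
set -e
base=$(mktemp -d)
git init -q --bare "$base/remote.git"                    # stand-in for GitHub
git clone -q "$base/remote.git" "$base/me" 2> /dev/null  # hide the empty-repo warning
cd "$base/me"
git config user.name "Demo User"          # placeholder identity (assumption)
git config user.email "demo@example.com"

# One-time setup so the remote has a main branch to work from.
echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git branch -M main                        # make sure the branch is called main
git push -q -u origin main

# 1. Start from an up-to-date main.
git checkout -q main
git pull -q
# 2. Create a branch for the work.
git checkout -q -b add-feature
# 3. Make the change, commit it, and push the branch.
echo "feature" >> app.txt
git commit -q -am "Add feature"
git push -q -u origin add-feature
# 4. Stand-in for the pull request being merged on GitHub (assumption).
git push -q origin add-feature:main
# 5. Clean up: update local main, then delete the finished branch.
git checkout -q main
git pull -q
git branch -q -d add-feature

last=$(git log -1 --format=%s)
branches=$(git branch --format='%(refname:short)')
echo "last commit on main: $last, local branches: $branches"
```

Note that git branch -d only deletes a branch whose commits git can find on main or on its upstream; it refuses otherwise, which makes it a useful safety check at the end of the loop.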
If you want to learn more about git or prefer an interactive environment, feel free to check out Learn Git Branching.
If you want to know how to handle some other tricky git scenarios, check out Dangit, Git!?!.
You’ve learned a lot so far. You’ve learned about the four areas of git, the commands that move your changes through some of these areas, how to revert to a previous commit, edit your commit history, rename branches, and you’ve seen what a typical git workflow looks like.
I hope this post was helpful and that you’ve come away with a better understanding of what many of these git commands are doing!