This is a series of articles about the Git (https://git-scm.com) version control system (VCS). I aim to show you Git from a different perspective, starting with its central part, the commit, and going further into branches and remotes. You will see what really lies underneath “origin”, why the “Create branch” button in Jira/GitHub/GitLab makes no sense, how to merge unmergeable branches, and much more.
Most of the articles will include tasks to execute in a terminal. I strongly advise you to do them. This way, you will get a better understanding of the explanations. To simplify bootstrapping the tasks, I will include a Bash/PowerShell script that prepares a basic repository structure.
I hope that by the time you have worked through all the articles, you will have leveled up your Git skills.
This chapter is part II of the explanation of merge. I explain merge in so much detail because we will need it to understand how the D in DVCS works in Git. D stands for Distributed, and this is where an understanding of ‘merge’ is of significant importance.
Fast-Forward
In the last chapter, we saw that a merge creates a new commit and brings two lines of history into one. But what about a “fast-forward” merge, which doesn’t create a separate commit? What is it, and why do we need it?
A fast-forward merge is not a real merge but merely the movement of a branch label to a new commit. It’s important to note that fast-forward only makes sense when working with branch labels. Such a merge does not change the Git history, since it does not create any commits. As a consequence, it cannot be traced back in the history: you cannot see who did it, when it happened, or why.
A fast-forward often occurs when you work on a feature branch and there are no new changes in the master branch. As soon as you try to merge the feature branch back into master, Git will apply a fast-forward merge: it moves the ‘master’ label to the commit the feature branch points to. Look at the two pictures below, one before the merge and one after.
After I try to merge the feature branch into master, I get the following:
Fortunately, today’s editors, like VisualCode, “advise” us not to do a ‘fast-forward’ and to create a commit instead.
The option “Create a new commit even if fast-forward is possible” does precisely this. Creating a commit even for a fast-forward merge can be very convenient: it lets you trace back when the merge happened, why, and by whom. Whether to do it or not is a matter of taste, though.
Knowing about fast-forward might not look essential, but as you will see, we will need this information when we start studying remotes and the distributed nature of Git.
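The difference between a fast-forward and a forced merge commit can be sketched with a toy model in JavaScript. This is not how Git is implemented: the `commits` and `branches` maps, the `isAncestor` and `merge` helpers, the `noFastForward` option and the commit ids are all hypothetical names made up for this illustration.

```javascript
// Toy model, not Git internals: commits know their parents, branches are labels.
const commits = new Map([
  ['c1', { parents: [] }],
  ['c2', { parents: ['c1'] }],
  ['c3', { parents: ['c2'] }],
]);
const branches = new Map([
  ['master', 'c1'],  // master has not moved since the feature branch started
  ['feature', 'c3'],
]);

// True when `ancestor` can be reached from `descendant` by following parents.
function isAncestor(ancestor, descendant) {
  const stack = [descendant];
  while (stack.length > 0) {
    const id = stack.pop();
    if (id === ancestor) return true;
    stack.push(...commits.get(id).parents);
  }
  return false;
}

// Merge `source` into `target`; fast-forward unless `noFastForward` is set.
function merge(target, source, { noFastForward = false } = {}) {
  const from = branches.get(target);
  const to = branches.get(source);
  if (isAncestor(to, from)) return 'already-up-to-date'; // nothing to merge
  if (!noFastForward && isAncestor(from, to)) {
    branches.set(target, to);         // only the label moves, no new commit
    return 'fast-forward';
  }
  const id = `m${commits.size + 1}`;  // hypothetical id for the merge commit
  commits.set(id, { parents: [from, to] });
  branches.set(target, id);
  return 'merge-commit';
}

const result = merge('master', 'feature');
console.log(result, branches.get('master'), commits.size); // fast-forward c3 3
```

In this sketch the fast-forward leaves the number of commits unchanged, while passing `{ noFastForward: true }` would add a commit with two parents, which is what makes the merge traceable.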
In this chapter, we are going to look more closely at the merge process. Now that you’ve learned that a branch is only a label, it is time to have a look at what a merge is.
Let’s start with some theory and slowly move to practice. First, I want to introduce you to the five rules of the merge:
The first rule of the merge is: The merge is about merging two commits, not branches
The second rule of the merge is: The merge is about joining two commits, not branches (this is not a typo, it’s just very essential)
The third rule of the merge is: It’s not necessary to merge the last two commits (those without children), but any commit can be merged with any commit
The fourth rule of the merge is: Merging two commits produces a new commit, even when you merge the same commits again
The fifth rule of the merge is: The histories of both commits should have diverged, and they must have a common ancestor
If you have read the previous two chapters, you should already know that the commit is the most fundamental part of Git. Diverging points in history (alternate futures) make it possible to create branches. In this chapter I’m going to tell you about the “branches” you knew before: what are they, and why do you need them?
It turns out that the branches you see in a Git client are only labels. A branch is a human-readable reference to a commit. It’s like the Domain Name System (DNS): a domain name resolves to an IP address, and a branch resolves to a Git commit.
This brings us to the first valuable property of a branch. A commit can have as many labels as you want, and labels don’t record when they were created. Remember the task from the previous chapter? Have a look at the picture below. I’ve just added 5 “branches” to the dangling commit I had: the branch was created on the 28th of March, but the branches/labels were placed just now.
The second and most valuable point is that removing a branch doesn’t remove the commit itself. It solely removes the label on that commit. I repeat: it SOLELY removes the label on that commit. Very important! The commit stays in the history, and even if you do not see it, you can find it as I described in the previous chapter.
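Both properties can be illustrated with a toy model in JavaScript. It only mimics the idea of labels and is not Git's actual storage; the commit id `a1b2c3`, the branch names and the `resolve` helper are hypothetical.

```javascript
// Toy model, not Git internals: a branch is just a name pointing at a commit id.
const commits = new Map([['a1b2c3', { message: 'some work' }]]);
const branches = new Map();

// Any number of labels can point at the same commit.
for (const name of ['one', 'two', 'three', 'four', 'five']) {
  branches.set(name, 'a1b2c3');
}

// Resolving a branch is a lookup, much like resolving a domain name.
const resolve = (name) => commits.get(branches.get(name));

// Deleting a branch removes the label only.
branches.delete('three');

console.log(branches.size);          // 4
console.log(resolve('three'));       // undefined: the label is gone
console.log(commits.has('a1b2c3'));  // true: the commit is still there
```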
This is the second article of the series about Git. If you haven’t read the introduction, I strongly advise you to do so, since this chapter builds on the knowledge you gained in the intro. This time you will have to do exercises in the terminal of your choice. Most of the exercises will have a bootstrap script that helps you set up the initial folder structure with dummy repositories and commits.
Branches without branch names
The previous chapter was a long read, I know, but it was needed to free your mind from the idea that the “branch” is the holy grail of Git. On the contrary, the branch is the last thing you should be thinking about when working with Git. Putting ‘git commit’ first makes working with Git a different adventure. To make this second nature, we are going to practice in the form of small tasks, as I mentioned before.
This is an introduction to my series of articles regarding Git. The information included is essential for understanding the other chapters. If you are familiar with it, you can skip to the next chapter.
There is no “spoon”
You have probably been working with Git for quite a long time, but for now I want to ask you to forget everything you know about Git (creating branches, committing to a branch, merging branches, etc.). You should forget about “origin” and the fact that Git has branches.
Task 1:
Do you remember what your Git history looks like? It probably looks something like this (see below), where you can see branch names and that “origin” word:
Now, imagine the same history, but with those strange numbers below each commit.
Have you ever been in a situation where you wanted to do something with your state inside a useEffect hook but didn’t want to put it into the dependency array?
react-hooks/exhaustive-deps is an excellent guard when working with primitive values, but as soon as you have an object in the state, it might get in your way.
Let me show you some code, which requires state handling in the way I’ve described above.
If you’re already using async/await syntax, you might have noticed that forEach does not work with async functions: it does not wait for the promises the callback returns. In that case you might fall back to old-style for loops or for-of loops.
Also, if you already have the bluebird package installed, you can use bluebird.each() instead. ECMAScript 2018 adds asynchronous iteration, but for those who are stuck on older versions I have a one-liner that I use when I don’t want to install bluebird or another promise library.
Definition of each:
const each = (arr, cb, i = 0) => i < arr.length ? Promise.resolve(cb(arr[i++])).then(() => each(arr, cb, i)) : Promise.resolve();
Usage:
await each(['a', 'b', 'c'], async (item) => {
  console.log('this is ', item);
  await new Promise(resolve => setTimeout(resolve, 1000));
});
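To see the difference in behaviour, the sketch below runs the same async callback through forEach and through the each helper (repeated here so the snippet is self-contained) and records the order of events. The `sleep` and `demo` helpers, the `log` array and the delays are assumptions made for this illustration.

```javascript
const each = (arr, cb, i = 0) => i < arr.length ? Promise.resolve(cb(arr[i++])).then(() => each(arr, cb, i)) : Promise.resolve();
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function demo() {
  const log = [];

  // forEach ignores the promises returned by the callback and returns at once.
  ['a', 'b'].forEach(async (item) => {
    await sleep(10);
    log.push(`forEach ${item}`);
  });
  log.push('forEach returned');

  await sleep(50); // let the stray callbacks finish before the next experiment

  // each waits for every callback to settle before starting the next one.
  await each(['a', 'b'], async (item) => {
    await sleep(10);
    log.push(`each ${item}`);
  });
  log.push('each returned');

  return log;
}

const done = demo();
done.then((log) => console.log(log.join('\n')));
```

With forEach, the 'forEach returned' entry is logged before any item; with each, 'each returned' comes only after both items.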
Recently I had the pleasure of debugging a node.js app. The runtime was breaking with the following error:
copy.forEach( function(attrValue) {
     ^
TypeError: copy.forEach is not a function
A quite common error, meaning that the object has no forEach function. Let’s see the code itself:
function rebuild(data) {
  // for everything in the data we just fetched
  data.forEach(function(group) {
    // force array for attr
    if ( !Array.isArray(group.attr) ) {
      group.attr = [group.attr];
    }
    // for all values in attr
    const copy = group.attr.slice(0);
    copy.forEach( function(attrValue) {
    });
  });
}
I bet you were as surprised as I was. At first glance this is just not possible, right? If you read the code above, you can see that one of the developers forces the attr member to be an array, and an Array MUST have a forEach method. Which means the JS engine is insane and doesn’t know what it is talking about. At least, this is what I thought at first.
After scratching my head, drinking a cup of coffee and putting a debugger; statement right before the failing line, I understood that the JS engine is not insane and that I just can’t trust what I see. If you want to find out for yourself what caused the code to break, don’t read the next paragraph; open your editor and try to reproduce the error. One small tip: you can reproduce it not only in node.js but in any recent browser.
For those who are back, let’s see what was causing all this mess. It turned out that the attr member of the group object was actually defined with a getter and a setter. The group was not a plain object, as I assumed at the beginning, but a real instance of one of the classes in this system. The setter on that instance did some magic to the passed value, causing the getter to return a simple value and not an Array.
As I already wrote in one of my articles about getters and setters, I still think they are a bad idea. They might work in languages with static type checking, where every attribute has a known type, but they are horribly broken in JavaScript, because JS developers are not used to the assignment operator doing crazy stuff to the object.
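A minimal reproduction might look like the sketch below. The `Group` class and the unwrapping logic in its setter are assumptions: the real class from the project is not shown here, so this only illustrates one way an accessor pair can defeat the “force array” step.

```javascript
// Hypothetical class: the real one from the project is not shown in the article.
class Group {
  constructor(attr) {
    this._attr = attr;
  }
  get attr() {
    return this._attr;
  }
  set attr(value) {
    // The assumed "magic": single-element arrays are silently unwrapped again.
    this._attr = Array.isArray(value) && value.length === 1 ? value[0] : value;
  }
}

const group = new Group('red');
if (!Array.isArray(group.attr)) {
  group.attr = [group.attr];       // looks like it forces an array...
}
const copy = group.attr.slice(0);  // ...but the getter still returns the string 'red'

console.log(Array.isArray(group.attr)); // false
console.log(typeof copy.forEach);       // undefined, so calling copy.forEach(...) throws a TypeError
```

Because strings also have a slice method, the line that copies the value succeeds, and the failure only shows up one line later at forEach.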
Recently I came across a couple of node.js projects that use NODE_ENV to define the environment for the Development, Testing, Acceptance and Production (DTAP) pipeline. At first sight this looks like a good idea, but I would advise against it.
What we often see is that npm modules treat NODE_ENV as either ‘production’ or something else. When NODE_ENV is set to ‘production’, less logging is shown, code is optimized for performance and some other stuff is disabled, which makes it ‘real production code’. React.js is one example that does this throughout its codebase. Based on that, I see developers define NODE_ENV as ‘testing’, ‘acceptance’ and ‘production’ to get more logging in the test environments, and less logging and more performant code in production. In my opinion, this is one of the things you should not do.
When code moves through the DTAP pipeline, you want it to be as similar as possible at every stage. It’s not without reason that there are ‘testing’ and ‘acceptance’ stages besides ‘production’. By splitting the code into ‘development|testing|acceptance’ code and ‘production’ code, you can’t guarantee that the code which runs in the DTA environments will run the same way in ‘production’. Due to the subtle differences, bugs can pop up in places where you don’t expect them.
The second reason is the extra logging you get in non-‘production’ mode. You would say, “That’s exactly what I want in my DTA environments”, but I would argue against it. By explicitly differentiating between DTA and P, you split your release/debug process into two different tracks: debugging production code and debugging loggable code. If debugging production code is not part of your daily workflow, you will probably be stuck for much longer when things go wrong there. People often learn only by doing something regularly. But how can we learn to trace and debug ‘production’ code if that doesn’t happen in the daily workflow? Also, don’t forget that a production bug must be solved MUCH faster than any other bug. This makes it even more important to learn to do this early in the software development process.
The last reason for not using NODE_ENV to define environments applies only to isomorphic apps. That doesn’t make it less important to me. If you stick to NODE_ENV, you will also have to use it in the client code. Honestly, if (NODE_ENV === 'acceptance') looks weird in the client, doesn’t it? There is no node in the browser, so it makes no sense.
Here is my rule of thumb. First of all, keep your code as similar as possible across all environments and always keep NODE_ENV on ‘production’. Second, if you have to differentiate, make a new variable for your environment, like APP_ENV or CODE_ENV, you name it. For example, we used APP_ENV to define our environments because we used a shared log DB and needed a way to know where each record came from.
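In code, this rule of thumb could look like the sketch below. APP_ENV is the variable name mentioned above; the `readAppEnv` and `logRecord` helpers, the list of allowed stages and the 'development' default are assumptions made for illustration only.

```javascript
// Hypothetical helpers: the stage comes from APP_ENV, never from NODE_ENV.
const STAGES = ['development', 'testing', 'acceptance', 'production'];

function readAppEnv(env = process.env) {
  const stage = env.APP_ENV || 'development'; // assumed default
  if (!STAGES.includes(stage)) {
    throw new Error(`Unknown APP_ENV: ${stage}`);
  }
  return stage;
}

// NODE_ENV stays 'production' everywhere, so libraries behave the same in every stage.
// APP_ENV only tags things such as log records with the stage they came from.
function logRecord(message, env = process.env) {
  return { stage: readAppEnv(env), message };
}

console.log(logRecord('user logged in', { NODE_ENV: 'production', APP_ENV: 'acceptance' }));
```

The code path is identical in every stage; only the tag on the record differs, which is enough to tell entries apart in a shared log store.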