Get the Git

Summary

The following is an argument for requesting the production of the Git repository (or repositories) for the produced source code, if the source code owner has not already produced it.

When I am engaged by a client to review source code, it is a pleasure to be provided with source code that contains sufficient programmer comments.

However, the frequent and clichéd programmer resistance to commenting their source code makes it more difficult and time-consuming for a reviewer to understand how the source code works and why it works the way it does.

Further, even when comments are provided in the source code, the comments sometimes only answer the “who” and “when” questions. For example, “Who authored the source code and when did they author it?” Such information is often valuable for the case, but does little to help understand the source code.

Better are source code comments that answer the more nuanced “what”, “how”, and “why” questions; that is, questions which give insight into the high-level relationships between functionality and data that assist in source code tracing.

Source code inspection tools can help stitch together the low-level relationships between source code methods, data, and classes. However, determining the high-level relationships between classes and modules without source code comments often takes more work hours and calendar days than the client expects.

There are too few processes which require a programmer to add comments to their source code, and since their source code is sometimes only briefly reviewed by their peers, a programmer’s resistance to commenting their source code is often not challenged.

However, programmers are almost always required to contribute their drafts and final source code to a version control system. Git is one such version control system.

By default, Git requires a programmer to enter a comment (a “commit message”) for each set of file changes that they commit to a repository. While a Git commit comment can be uninformative, Git commit comments are often reviewed by the other programmers working on the same project, or by other programmers who depend upon the project. Therefore, since a programmer’s mandated Git commit comments are broadly viewed, there is peer pressure for each programmer to make substantive Git commit comments.

Therefore, if a Git repository and the source code controlled by that repository are produced, the Git commit comments can give the source code reviewer insights that are unavailable when programmers do not put similar comments in the source code itself.

Further, Git commit comments provide history that even source code with sufficient comments sometimes does not. For example, source code comments often describe only the version of the source code presently in the file, whereas Git commit comments also describe previous versions of each file. Sometimes the contents of these previous versions are crucial to the case itself, but even if not, the Git commit comments made for these previous versions of a file can give insight into understanding the produced version of the file.

Details

Each folder tree whose file versions are controlled by Git contains a sub-folder named “.git”. The files and sub-folders under a “.git” folder are called a Git repository (also called a “repo”).

It is possible, and often likely, that a complex source code folder tree contains more than one Git repo. A complex software project is often divided into two or more sub-projects, each with its own sub-folder tree. This is often the case when multiple groups are working on different parts of the same project. In this situation, it is often the case that each group controls the versions of only the files in its own sub-project, and there is one “.git” folder for each significant sub-folder tree that is produced. Each “.git” folder is very much part of the folder tree for a given project or sub-project. However, the source code owner often prunes or empties all “.git” folders before producing the source code.
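As a rough illustration of the point above, a reviewer could survey a produced folder tree for “.git” folders with a short script. The following is only a minimal sketch in Python. The function name survey_git_folders and the labels “intact” and “emptied” are my own hypothetical choices, and the check is a heuristic: it only looks for the “HEAD” file and the “objects” sub-folder that a Git repo normally contains, so it cannot prove that the history inside is complete. A “.git” folder that was pruned entirely leaves nothing for the script to report.

```python
import os


def survey_git_folders(root):
    """List folders under root that contain a '.git' sub-folder.

    Returns a sorted list of (folder, status) pairs. The status is a
    heuristic label (an assumption of this sketch, not a Git term):
    'intact' if the '.git' folder still holds a HEAD file and an
    'objects' sub-folder, otherwise 'emptied'.
    """
    findings = []
    for folder, subfolders, _files in os.walk(root):
        if ".git" in subfolders:
            git_dir = os.path.join(folder, ".git")
            intact = (
                os.path.isfile(os.path.join(git_dir, "HEAD"))
                and os.path.isdir(os.path.join(git_dir, "objects"))
            )
            findings.append((folder, "intact" if intact else "emptied"))
            # Do not walk inside the repository's own internals.
            subfolders.remove(".git")
    return sorted(findings)


if __name__ == "__main__":
    # Survey the current folder; replace "." with the produced folder tree.
    for folder, status in survey_git_folders("."):
        print(f"{status:8} {folder}")
```

A list of “emptied” folders from such a survey can serve as a concrete starting point when asking the source code owner which repos were removed before production.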

Git and other Distributed Version Control Systems

This article uses Git as an example for three reasons: (1) Git has been the most prevalent version control system I have seen used for source code, (2) a Git repo can be easily produced by the source code owner, and (3) it is easy for the source code owner to install on a review computer a tool to navigate the file histories in the produced Git repo.

Git is a distributed version control system and as such, a Git repo is fully contained in a single folder tree; that is, the folder tree whose name is “.git”. Producing a “.git” folder tree is as easy as producing any other folder tree.

On Windows, it is easy to install either a GUI tool like Git GUI or a command-line tool like Git BASH; neither of these example tools requires a commercial license. On Linux and macOS, a command-line Git tool is often pre-installed, or can be installed with the operating system's standard developer tools or package manager. There are several other easy-to-install and non-commercial GUI and command-line tools on Windows, Linux, and macOS that navigate the file histories in a Git repo.
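To give a sense of what navigating a file history looks like with the command-line tool, here is a minimal sketch in Python that runs “git log” for one file and collects each commit's author, date, and comment. It assumes the command-line Git tool is installed on the review computer and on the search path. The names Commit, parse_log, and file_history are hypothetical names of my own, and the sketch captures only the first line (the subject) of each commit comment.

```python
import subprocess
from typing import NamedTuple

# ASCII unit separator; assumed not to appear in author names or subjects.
FIELD_SEP = "\x1f"


class Commit(NamedTuple):
    commit_hash: str
    author: str
    date: str
    subject: str


def parse_log(output):
    """Turn the text printed by the git command below into Commit records."""
    commits = []
    for line in output.split("\n"):
        parts = line.split(FIELD_SEP, 3)
        if len(parts) == 4:  # skip blank or malformed lines
            commits.append(Commit(*parts))
    return commits


def file_history(repo_dir, path):
    """Return the commits that touched one file, newest first.

    --follow asks Git to keep tracing the file across renames.
    The format string prints hash, author name, author date, and
    subject, separated by the 0x1f byte.
    """
    result = subprocess.run(
        [
            "git", "-C", repo_dir, "log", "--follow", "--date=iso-strict",
            "--format=%H%x1f%an%x1f%ad%x1f%s", "--", path,
        ],
        capture_output=True, text=True, encoding="utf-8",
        errors="replace", check=True,
    )
    return parse_log(result.stdout)
```

For a hypothetical produced file, calling file_history("produced/app", "src/parser.c") would return one record per commit that changed that file. The contents of any earlier version can then be viewed with “git show” followed by the commit hash, a colon, and the file path.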

Subversion and other Centralized Version Control Systems

While it is easy to produce a Git repo and easy to install a tool to navigate a Git repo, it is often difficult to produce a repo and install a tool to navigate that repo for a centralized version control system. Centralized version control systems, like Subversion, typically require client software, server software, and a system-specific database containing the version control information. The complexity of installing and configuring these components on a review computer has often been met with resistance by the source code owner.

Conclusion

Almost all software projects are managed by using some version control system. Negotiating with the source code owner to produce all version control repositories that are associated with the produced source code might provide insightful comments that are often lacking in the source code files. Comments from a version control system can make source code reviews more time-efficient and accurate.