The Workers’ Place in History

In one of my cases, the dates on which a set of inter-dependent files were created and modified were particularly important to the matter.

However, the creation and modification dates of files in a production are often not made clear to the recipient of the production. The production could have been copied from a source code escrow deposit, copied from a system backup, copied from an unspecified branch or label of an unspecified version control system, copied from a single developer’s computer, or copied from a copy made for a prior litigation.

Further, each file’s creation date and modification date, as recorded by the file system on the review computer, might not reflect the dates on which the developer created or modified that file. A file’s creation and modification dates sometimes reflect only the date on which the production was copied onto the review computer!
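To see which dates the review computer can actually report for a file, its timestamps can be read with Python's `os.stat`. The sketch below is illustrative only, and the function name `filesystem_dates` is hypothetical. A "creation date" is platform-dependent: `st_birthtime` is available on macOS and the BSDs (and on Windows in recent Python versions), while on Linux `st_ctime` is the time of the last metadata change, not the creation time, so the sketch does not use it.

```python
import os
from datetime import datetime, timezone
from typing import Optional, Tuple


def filesystem_dates(path: str) -> Tuple[Optional[datetime], datetime]:
    """Return (creation time or None, last modification time) for path, in UTC."""
    st = os.stat(path)
    modified = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
    # st_birthtime only exists on platforms that record a creation time;
    # st_ctime is deliberately not used, because on Linux it is the inode change time.
    birth = getattr(st, 'st_birthtime', None)
    created = datetime.fromtimestamp(birth, tz=timezone.utc) if birth is not None else None
    return created, modified


if __name__ == '__main__':
    created, modified = filesystem_dates('.')
    print('created: ', created)
    print('modified:', modified)
```

If every file in a production reports nearly the same modification time, that time is more likely the copy date than anything the developer did.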

For this particular case, I received a source code production with thousands of files whose creation and modification dates were indeed the date the production was copied onto the review computer.

During my analysis of this production, the contents of some source code files appeared to be inconsistent with each other. For example, functions referenced in one file were not defined in any other file of the production. This caused me to doubt that the produced source code files were all from the same time range. 

Fortunately, the Git repositories (i.e., repos) for the production were also produced. Git is a very popular version control system; others include Subversion, Mercurial, and Perforce. For each file managed by a version control system, the version control system logs metadata about the file, or files, submitted (i.e., committed) by a developer.

At a minimum, such logs contain the contents of each committed file, the date and time of the commit, the identity of the developer who made the commit (a user ID or, in Git, a name and email address), and whatever comment that developer wrote to describe why the commit was made and what was added, removed, or changed since the prior commit of that file.
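For Git specifically, these fields can be pulled out of the log in a machine-readable form with a custom `--format` string. The sketch below is one way to do it, assuming Git is installed. `%H` (full commit ID), `%an` and `%ae` (author name and email), `%aI` (author date, strict ISO 8601), `%s` (first line of the commit message), and `%x1f`/`%x1e` (a byte given in hex) are standard Git pretty-format placeholders; `CommitRecord`, `parse_log`, and `commit_metadata` are hypothetical names.

```python
import subprocess
from dataclasses import dataclass
from typing import List

# %x1f and %x1e emit the ASCII unit and record separators, so that commit
# subjects containing spaces or punctuation cannot break the parsing.
LOG_FORMAT = '%H%x1f%an%x1f%ae%x1f%aI%x1f%s%x1e'


@dataclass
class CommitRecord:
    commit_id: str
    author_name: str
    author_email: str
    author_date: str  # e.g. '2020-01-06T14:32:10-08:00'
    subject: str


def parse_log(output: str) -> List[CommitRecord]:
    """Parse `git log --format=LOG_FORMAT` output, keeping git's order (newest first by default)."""
    records = []
    for chunk in output.split('\x1e'):
        chunk = chunk.strip('\n')
        if chunk:
            records.append(CommitRecord(*chunk.split('\x1f', 4)))
    return records


def commit_metadata(repo: str, path: str) -> List[CommitRecord]:
    """Return the commits that touched `path` in the Git repository at `repo`."""
    result = subprocess.run(['git', 'log', f'--format={LOG_FORMAT}', '--', path],
                            cwd=repo, capture_output=True, text=True, check=True)
    return parse_log(result.stdout)
```

Git records both an author date and a committer date; `%cI` gives the committer date, and the two can differ after a rebase or cherry-pick, which may matter when dates are in dispute.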

In version control jargon, including Git’s, the copy of a file that the developer is editing on their local system is called the “working copy” of that file. A directory of working copies is often called a “working directory”, and Git’s documentation calls the whole tree of checked-out files the “working tree”.

Often, the working copy of a file matches the version of that file that was most recently committed to the version control system. Other times, it is an edited copy of the most recently committed version that the developer intends to commit later.
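Whether a working copy still matches a committed version can be checked from file contents alone, without trusting file-system dates. Git identifies each committed file's contents by an object ID, and the sketch below recomputes that ID for a working copy so it can be compared with the ID that `git rev-parse <commit>:<path>` prints. It assumes a repository using Git's default SHA-1 object format and no line-ending or clean filters; `git_blob_id` and `working_copy_matches` are hypothetical names.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git would assign to a file with these exact bytes.

    Assumes Git's default SHA-1 object format: the ID is the SHA-1 of the header
    'blob <size in bytes>' plus a NUL byte, followed by the content itself.
    """
    header = b'blob %d\x00' % len(content)
    return hashlib.sha1(header + content).hexdigest()


def working_copy_matches(working_path: str, committed_blob_id: str) -> bool:
    """True if the working copy's bytes are identical to the committed version."""
    with open(working_path, 'rb') as f:
        return git_blob_id(f.read()) == committed_blob_id
```

When a repository uses `.gitattributes` filters or `core.autocrlf`, `git hash-object <file>` is the safer choice, because it applies those conversions before hashing.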

However, there are no rules that require a working copy of a file to match, or to be derived from, the most recently committed version of that file. Further, a working copy is not required to have any relationship with any committed files; that is, a file on a local system might not yet have been added to the version control system or might never be intended to be added to the version control system.
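When the repository and its working tree are available together, `git status --porcelain` prints a stable two-character status code for each path that differs from the last commit or was never added. The sketch below interprets that version 1 output; the name `classify_porcelain` is hypothetical, and it assumes paths without the special characters Git quotes (`git status --porcelain -z` avoids that quoting). Files that exactly match the most recent commit are not listed at all.

```python
from typing import Dict


def classify_porcelain(output: str) -> Dict[str, str]:
    """Classify paths from `git status --porcelain` (format v1) output."""
    labels: Dict[str, str] = {}
    for line in output.splitlines():
        if len(line) < 4:
            continue
        # Two status characters, a space, then the path
        xy, path = line[:2], line[3:]
        if xy == '??':
            labels[path] = 'untracked'  # never added to the repository
        elif xy == '!!':
            labels[path] = 'ignored'    # matched by .gitignore (listed only with --ignored)
        else:
            if ' -> ' in path:          # renames and copies are shown as 'OLD -> NEW'
                path = path.split(' -> ', 1)[1]
            labels[path] = 'changed'    # differs from the last commit and/or the index
    return labels
```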

In my case, I found that many of the files in the inconsistent set were working copies that matched commits made on different dates. Using the metadata in the Git version control system, I was able to extract versions of those files that were consistent with each other as of a particular date.
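Selecting a mutually consistent set of versions as of a particular date amounts to choosing, for each file, the newest committed version at or before a cutoff. A minimal sketch of that selection, assuming the commit history has already been gathered into a hypothetical `history` mapping of path to (UTC commit datetime, commit ID) pairs. It does not model deletions, so a file deleted before the cutoff would still be reported.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

# Hypothetical input shape: path -> list of (commit datetime in UTC, commit ID)
History = Dict[str, List[Tuple[datetime, str]]]


def versions_as_of(history: History, cutoff: datetime) -> Dict[str, Optional[str]]:
    """For each path, return the ID of the newest commit at or before cutoff,
    or None if the path had not been committed yet at that time."""
    snapshot: Dict[str, Optional[str]] = {}
    for path, versions in history.items():
        eligible = [version for version in versions if version[0] <= cutoff]
        # Tuples compare by datetime first, so max() picks the newest eligible version
        snapshot[path] = max(eligible)[1] if eligible else None
    return snapshot


if __name__ == '__main__':
    utc = timezone.utc
    history: History = {
        'src/a.c': [(datetime(2020, 1, 1, tzinfo=utc), 'c1'), (datetime(2020, 3, 1, tzinfo=utc), 'c3')],
        'src/b.h': [(datetime(2020, 2, 1, tzinfo=utc), 'c2')],
        'src/new.c': [(datetime(2020, 4, 1, tzinfo=utc), 'c4')],
    }
    print(versions_as_of(history, datetime(2020, 2, 15, tzinfo=utc)))
    # {'src/a.c': 'c1', 'src/b.h': 'c2', 'src/new.c': None}
```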

Instead of using the produced working tree as the source of truth for my analysis, I used the commit history as the source of truth for each file. For each file, I then noted which copy in the commit history matched the working copy of that file, if any.
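The per-file correlation described above can be sketched as a lookup table keyed by path and content hash. This is a simplified illustration, separate from the script in this article: the committed versions are assumed to be available as hypothetical (path, commit ID, contents) triples, and MD5 is used, as the script does, only to detect identical contents. Because the key includes the path, a working copy is matched only against the history of the same path, so renamed files are not followed.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (path, MD5 of contents) -> commit IDs whose version of that path has those contents
ContentIndex = Dict[Tuple[str, str], List[str]]


def build_content_index(committed: Iterable[Tuple[str, str, bytes]]) -> ContentIndex:
    """Index hypothetical (path, commit_id, contents) triples by path and content hash."""
    index: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for path, commit_id, contents in committed:
        index[(path, hashlib.md5(contents).hexdigest())].append(commit_id)
    return dict(index)


def match_working_copy(index: ContentIndex, path: str, working_contents: bytes) -> List[str]:
    """Return every commit whose version of path is byte-for-byte identical to the working copy."""
    return index.get((path, hashlib.md5(working_contents).hexdigest()), [])
```

An empty result means the working copy matches no committed version of that path, which is exactly the case the working-copy-only files in a production fall into.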

To do this for the thousands of files in the production, I created the following script, which correlates and tags every file in the working tree with every version of every file that has been committed to the Git version control system. See the comments in the script for its documentation. Note: this script is likely to fill your disk with many, many files.

#!/usr/bin/env python3

"""
Copyright 2020-2021 Stairstep Consulting LLC. All rights reserved.

Creative Commons Attribution-ShareAlike 4.0 International Public
License

By exercising the Licensed Rights (defined below), You accept and agree
to be bound by the terms and conditions of this Creative Commons
Attribution-ShareAlike 4.0 International Public License ("Public
License"). To the extent this Public License may be interpreted as a
contract, You are granted the Licensed Rights in consideration of Your
acceptance of these terms and conditions, and the Licensor grants You
such rights in consideration of benefits the Licensor receives from
making the Licensed Material available under these terms and
conditions.


Section 1 -- Definitions.

  a. Adapted Material means material subject to Copyright and Similar
     Rights that is derived from or based upon the Licensed Material
     and in which the Licensed Material is translated, altered,
     arranged, transformed, or otherwise modified in a manner requiring
     permission under the Copyright and Similar Rights held by the
     Licensor. For purposes of this Public License, where the Licensed
     Material is a musical work, performance, or sound recording,
     Adapted Material is always produced where the Licensed Material is
     synched in timed relation with a moving image.

  b. Adapter's License means the license You apply to Your Copyright
     and Similar Rights in Your contributions to Adapted Material in
     accordance with the terms and conditions of this Public License.

  c. BY-SA Compatible License means a license listed at
     creativecommons.org/compatiblelicenses, approved by Creative
     Commons as essentially the equivalent of this Public License.

  d. Copyright and Similar Rights means copyright and/or similar rights
     closely related to copyright including, without limitation,
     performance, broadcast, sound recording, and Sui Generis Database
     Rights, without regard to how the rights are labeled or
     categorized. For purposes of this Public License, the rights
     specified in Section 2(b)(1)-(2) are not Copyright and Similar
     Rights.

  e. Effective Technological Measures means those measures that, in the
     absence of proper authority, may not be circumvented under laws
     fulfilling obligations under Article 11 of the WIPO Copyright
     Treaty adopted on December 20, 1996, and/or similar international
     agreements.

  f. Exceptions and Limitations means fair use, fair dealing, and/or
     any other exception or limitation to Copyright and Similar Rights
     that applies to Your use of the Licensed Material.

  g. License Elements means the license attributes listed in the name
     of a Creative Commons Public License. The License Elements of this
     Public License are Attribution and ShareAlike.

  h. Licensed Material means the artistic or literary work, database,
     or other material to which the Licensor applied this Public
     License.

  i. Licensed Rights means the rights granted to You subject to the
     terms and conditions of this Public License, which are limited to
     all Copyright and Similar Rights that apply to Your use of the
     Licensed Material and that the Licensor has authority to license.

  j. Licensor means the individual(s) or entity(ies) granting rights
     under this Public License.

  k. Share means to provide material to the public by any means or
     process that requires permission under the Licensed Rights, such
     as reproduction, public display, public performance, distribution,
     dissemination, communication, or importation, and to make material
     available to the public including in ways that members of the
     public may access the material from a place and at a time
     individually chosen by them.

  l. Sui Generis Database Rights means rights other than copyright
     resulting from Directive 96/9/EC of the European Parliament and of
     the Council of 11 March 1996 on the legal protection of databases,
     as amended and/or succeeded, as well as other essentially
     equivalent rights anywhere in the world.

  m. You means the individual or entity exercising the Licensed Rights
     under this Public License. Your has a corresponding meaning.


Section 2 -- Scope.

  a. License grant.

       1. Subject to the terms and conditions of this Public License,
          the Licensor hereby grants You a worldwide, royalty-free,
          non-sublicensable, non-exclusive, irrevocable license to
          exercise the Licensed Rights in the Licensed Material to:

            a. reproduce and Share the Licensed Material, in whole or
               in part; and

            b. produce, reproduce, and Share Adapted Material.

       2. Exceptions and Limitations. For the avoidance of doubt, where
          Exceptions and Limitations apply to Your use, this Public
          License does not apply, and You do not need to comply with
          its terms and conditions.

       3. Term. The term of this Public License is specified in Section
          6(a).

       4. Media and formats; technical modifications allowed. The
          Licensor authorizes You to exercise the Licensed Rights in
          all media and formats whether now known or hereafter created,
          and to make technical modifications necessary to do so. The
          Licensor waives and/or agrees not to assert any right or
          authority to forbid You from making technical modifications
          necessary to exercise the Licensed Rights, including
          technical modifications necessary to circumvent Effective
          Technological Measures. For purposes of this Public License,
          simply making modifications authorized by this Section 2(a)
          (4) never produces Adapted Material.

       5. Downstream recipients.

            a. Offer from the Licensor -- Licensed Material. Every
               recipient of the Licensed Material automatically
               receives an offer from the Licensor to exercise the
               Licensed Rights under the terms and conditions of this
               Public License.

            b. Additional offer from the Licensor -- Adapted Material.
               Every recipient of Adapted Material from You
               automatically receives an offer from the Licensor to
               exercise the Licensed Rights in the Adapted Material
               under the conditions of the Adapter's License You apply.

            c. No downstream restrictions. You may not offer or impose
               any additional or different terms or conditions on, or
               apply any Effective Technological Measures to, the
               Licensed Material if doing so restricts exercise of the
               Licensed Rights by any recipient of the Licensed
               Material.

       6. No endorsement. Nothing in this Public License constitutes or
          may be construed as permission to assert or imply that You
          are, or that Your use of the Licensed Material is, connected
          with, or sponsored, endorsed, or granted official status by,
          the Licensor or others designated to receive attribution as
          provided in Section 3(a)(1)(A)(i).

  b. Other rights.

       1. Moral rights, such as the right of integrity, are not
          licensed under this Public License, nor are publicity,
          privacy, and/or other similar personality rights; however, to
          the extent possible, the Licensor waives and/or agrees not to
          assert any such rights held by the Licensor to the limited
          extent necessary to allow You to exercise the Licensed
          Rights, but not otherwise.

       2. Patent and trademark rights are not licensed under this
          Public License.

       3. To the extent possible, the Licensor waives any right to
          collect royalties from You for the exercise of the Licensed
          Rights, whether directly or through a collecting society
          under any voluntary or waivable statutory or compulsory
          licensing scheme. In all other cases the Licensor expressly
          reserves any right to collect such royalties.


Section 3 -- License Conditions.

Your exercise of the Licensed Rights is expressly made subject to the
following conditions.

  a. Attribution.

       1. If You Share the Licensed Material (including in modified
          form), You must:

            a. retain the following if it is supplied by the Licensor
               with the Licensed Material:

                 i. identification of the creator(s) of the Licensed
                    Material and any others designated to receive
                    attribution, in any reasonable manner requested by
                    the Licensor (including by pseudonym if
                    designated);

                ii. a copyright notice;

               iii. a notice that refers to this Public License;

                iv. a notice that refers to the disclaimer of
                    warranties;

                 v. a URI or hyperlink to the Licensed Material to the
                    extent reasonably practicable;

            b. indicate if You modified the Licensed Material and
               retain an indication of any previous modifications; and

            c. indicate the Licensed Material is licensed under this
               Public License, and include the text of, or the URI or
               hyperlink to, this Public License.

       2. You may satisfy the conditions in Section 3(a)(1) in any
          reasonable manner based on the medium, means, and context in
          which You Share the Licensed Material. For example, it may be
          reasonable to satisfy the conditions by providing a URI or
          hyperlink to a resource that includes the required
          information.

       3. If requested by the Licensor, You must remove any of the
          information required by Section 3(a)(1)(A) to the extent
          reasonably practicable.

  b. ShareAlike.

     In addition to the conditions in Section 3(a), if You Share
     Adapted Material You produce, the following conditions also apply.

       1. The Adapter's License You apply must be a Creative Commons
          license with the same License Elements, this version or
          later, or a BY-SA Compatible License.

       2. You must include the text of, or the URI or hyperlink to, the
          Adapter's License You apply. You may satisfy this condition
          in any reasonable manner based on the medium, means, and
          context in which You Share Adapted Material.

       3. You may not offer or impose any additional or different terms
          or conditions on, or apply any Effective Technological
          Measures to, Adapted Material that restrict exercise of the
          rights granted under the Adapter's License You apply.


Section 4 -- Sui Generis Database Rights.

Where the Licensed Rights include Sui Generis Database Rights that
apply to Your use of the Licensed Material:

  a. for the avoidance of doubt, Section 2(a)(1) grants You the right
     to extract, reuse, reproduce, and Share all or a substantial
     portion of the contents of the database;

  b. if You include all or a substantial portion of the database
     contents in a database in which You have Sui Generis Database
     Rights, then the database in which You have Sui Generis Database
     Rights (but not its individual contents) is Adapted Material,
     including for purposes of Section 3(b); and

  c. You must comply with the conditions in Section 3(a) if You Share
     all or a substantial portion of the contents of the database.

For the avoidance of doubt, this Section 4 supplements and does not
replace Your obligations under this Public License where the Licensed
Rights include other Copyright and Similar Rights.


Section 5 -- Disclaimer of Warranties and Limitation of Liability.

  a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
     EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
     AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
     ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
     IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
     WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
     PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
     ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
     KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
     ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.

  b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
     TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
     NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
     INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
     COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
     USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
     ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
     DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
     IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.

  c. The disclaimer of warranties and limitation of liability provided
     above shall be interpreted in a manner that, to the extent
     possible, most closely approximates an absolute disclaimer and
     waiver of all liability.


Section 6 -- Term and Termination.

  a. This Public License applies for the term of the Copyright and
     Similar Rights licensed here. However, if You fail to comply with
     this Public License, then Your rights under this Public License
     terminate automatically.

  b. Where Your right to use the Licensed Material has terminated under
     Section 6(a), it reinstates:

       1. automatically as of the date the violation is cured, provided
          it is cured within 30 days of Your discovery of the
          violation; or

       2. upon express reinstatement by the Licensor.

     For the avoidance of doubt, this Section 6(b) does not affect any
     right the Licensor may have to seek remedies for Your violations
     of this Public License.

  c. For the avoidance of doubt, the Licensor may also offer the
     Licensed Material under separate terms or conditions or stop
     distributing the Licensed Material at any time; however, doing so
     will not terminate this Public License.

  d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
     License.


Section 7 -- Other Terms and Conditions.

  a. The Licensor shall not be bound by any additional or different
     terms or conditions communicated by You unless expressly agreed.

  b. Any arrangements, understandings, or agreements regarding the
     Licensed Material not stated herein are separate from and
     independent of the terms and conditions of this Public License.


Section 8 -- Interpretation.

  a. For the avoidance of doubt, this Public License does not, and
     shall not be interpreted to, reduce, limit, restrict, or impose
     conditions on any use of the Licensed Material that could lawfully
     be made without permission under this Public License.

  b. To the extent possible, if any provision of this Public License is
     deemed unenforceable, it shall be automatically reformed to the
     minimum extent necessary to make it enforceable. If the provision
     cannot be reformed, it shall be severed from this Public License
     without affecting the enforceability of the remaining terms and
     conditions.

  c. No term or condition of this Public License will be waived and no
     failure to comply consented to unless expressly agreed to by the
     Licensor.

  d. Nothing in this Public License constitutes or may be interpreted
     as a limitation upon, or waiver of, any privileges and immunities
     that apply to the Licensor or You, including from the legal
     processes of any jurisdiction or authority.
"""

import sys
import datetime
import hashlib
import os
import re
import subprocess
import functools

tag = '#'
tag_working = f'{tag}working'
tag_newest = f'{tag}newest'
tag_oldest = f'{tag}oldest'
tag_only = f'{tag}only'
tag_deleted = f'{tag}deleted'


@functools.lru_cache
def md5_from_file_contents(file_path):
    h = hashlib.new('md5')
    chunk_size = 1024 * 1024
    with open(file_path, 'rb') as f:
        chunk = f.read(chunk_size)
        while chunk:
            h.update(chunk)
            chunk = f.read(chunk_size)
    md5 = h.hexdigest()
    return md5


def minimum_commit_prefix_length(commits):
    """ Helps avoid using unnecessarily long commit IDs
    :param commits: list of all commit IDs in the .git repository folder
    :return: the fewest number of characters at the beginning of each commit ID for which every commit ID prefix is
    unique, with a minimum of 4 characters
    """

    if not commits:
        return 4
    # Search up to and including the full commit ID length, so that only truly duplicate IDs raise an error
    for prefix_length in range(4, len(commits[0]) + 1):
        prefixes = {commit[:prefix_length] for commit in commits}
        if len(prefixes) == len(commits):
            return prefix_length
    raise ValueError('git commit identifiers are not unique')


def link_working_directory(case_root, project_root, target_root):
    """Recreates the project_root folder tree structure in the target root folder by creating hard links for each file,
    tags each file with the "working" tag, and calculates the MD5 hash sum for each file for later use in matching
    working copy versions to committed versions.
    :param case_root: argv[1]
    :param project_root: os.path.join(argv[1], argv[2])
    :param target_root: argv[3]
    :return: dict mapping the lowercase version of each target file path to its MD5 hash sum
    """

    md5s = {}
    for folder, sub_folders, files in os.walk(project_root):
        # Do not descend into '.git' repository folders; only working copy files are linked here
        sub_folders[:] = [sub_folder for sub_folder in sub_folders if sub_folder != '.git']
        # Extract the relative path from folder, starting at the case_root, to apply to the target_root
        # ('^' and count=1 ensure that only the leading case_root is removed)
        target_rel = re.sub('^' + re.escape(case_root) + '/?', '', folder, count=1)
        target_folder = os.path.join(target_root, target_rel)
        for file in files:
            target_path = os.path.join(target_folder, f'{tag_working}{tag}{file}')
            os.makedirs(target_folder, exist_ok=True)
            # A hard link shares the produced file's inode, so the later lock_down() permission changes also apply to
            # the produced file itself; hard links also require target_root to be on the same file system
            os.link(os.path.join(folder, file), target_path)
            md5s[target_path.lower()] = md5_from_file_contents(target_path)
    return md5s


def tag_newest_oldest_or_only_commit(target_root):
    """Add tags to the newest and oldest (or only) distinct committed versions for all files that have at least one
    distinct committed version, as indicated by files which have already been tagged with a commit datetime and commit
    ID.
    :param target_root: argv[3]
    :return: None
    """

    tag_commit_datetime_regex = tag + r'[0-9]{4}[0-9]{2}[0-9]{2}T[0-9]{6}Z'
    tag_commit_id_regex = tag + r'[0-9a-f]{4,}'
    tag_working_or_tag_deleted_regex = f'{tag_working}|{tag_deleted}|'
    tag_filename_regex = f'{tag}.*'
    tags_compiled_regex = re.compile(f'^({tag_commit_datetime_regex})({tag_commit_id_regex})({tag_working_or_tag_deleted_regex})({tag_filename_regex})$')
    for folder, __, files in os.walk(target_root):
        names_newest = {}
        names_oldest = {}
        # The files MUST be in sorted order, because that will sort them by commit datetime which is vital for determining
        #     oldest and newest commit
        for file in sorted(files):
            if search := tags_compiled_regex.search(file):
                tag_commit_datetime = search.group(1)
                tag_commit_id = search.group(2)
                tag_working_or_tag_deleted = search.group(3)
                tag_filename = search.group(4)
                values = folder, file, tag_commit_datetime, tag_commit_id, tag_working_or_tag_deleted
                if tag_working_or_tag_deleted != tag_deleted:
                    if tag_filename not in names_oldest:
                        names_oldest[tag_filename] = values
                    names_newest[tag_filename] = values
        for tag_filename, values in names_newest.items():
            folder, file, tag_commit_datetime, tag_commit_id, tag_working_or_tag_deleted = values
            if tag_filename not in names_oldest or names_oldest[tag_filename] != values:
                os.rename(os.path.join(folder, file), os.path.join(folder, f'{tag_commit_datetime}{tag_commit_id}{tag_newest}{tag_working_or_tag_deleted}{tag_filename}'))
            else:
                os.rename(os.path.join(folder, file), os.path.join(folder, f'{tag_commit_datetime}{tag_commit_id}{tag_only}{tag_working_or_tag_deleted}{tag_filename}'))
        for tag_filename, values in names_oldest.items():
            folder, file, tag_commit_datetime, tag_commit_id, tag_working_or_tag_deleted = values
            if tag_filename not in names_newest or names_newest[tag_filename] != values:
                os.rename(os.path.join(folder, file), os.path.join(folder, f'{tag_commit_datetime}{tag_commit_id}{tag_oldest}{tag_working_or_tag_deleted}{tag_filename}'))


def lock_down(target_root):
    """ Change permissions on every folder and every file in the target root folder to disallow additions, removals, or
    changes.
    :param target_root: argv[3]
    :return: None
    """

    for folder, __, files in os.walk(target_root):
        for file in files:
            os.chmod(os.path.join(folder, file), 0o444)
        os.chmod(folder, 0o555)


def recreate_git_commits(case_root, project_rel, target_root):
    """ The main method.
    :param case_root: argv[1]
    :param project_rel: argv[2]
    :param target_root: argv[3]
    :return: None
    """

    project_root = os.path.join(case_root, project_rel)

    # Link all files in working directory to new target directory,
    # preface each file with "working" tag;
    # Calculate md5 of each file for later matching against
    # committed file versions.
    print('Recreating working copy file versions in target folder tree...')
    md5s = link_working_directory(case_root, project_root, target_root)

    # Find each committed file and copy to target directory,
    # preface each file with commit datetime and commit id,
    # match committed file with working copy file if their
    # md5s match.

    timezone_compiled_regex = re.compile(r' ([-+])([0-9][0-9])([0-9][0-9])$')

    # Recursively process each '.git' folder in the project root folder tree
    for folder, sub_folders, __ in os.walk(project_root):
        for sub_folder in sub_folders:
            if sub_folder == '.git':
                # For each '.git' folder, find all commits
                git_root = folder
                git_rel = git_root[len(case_root)+1:]
                print(f'Creating and tagging committed versions from: {git_root}{sub_folder}')
                os.chdir(git_root)
                result = subprocess.run(['git', 'log', '--all'], capture_output=True)
                assert result.returncode == 0
                commit_prefix = 'commit '
                commits = []
                for log_line in result.stdout.decode('utf-8').splitlines():
                    if log_line.startswith(commit_prefix):
                        commit = log_line[len(commit_prefix):]
                        commits.append(commit)
                minimum_length = minimum_commit_prefix_length(commits)
                
                # For each commit, find all files
                for commit in commits:
                    result = subprocess.run(['git', 'show', '--name-only', commit], capture_output=True)
                    assert result.returncode == 0
                    merge_prefix = 'Merge:'
                    date_prefix = 'Date:'
                    commit_datetime = None
                    relative_paths = []
                    blank_line = False
                    merge = False
                    for show_line in result.stdout.decode('utf-8').splitlines():
                        if not blank_line:
                            if not show_line:
                                blank_line = True
                            elif show_line.startswith(merge_prefix):
                                merge = True
                                break
                            elif show_line.startswith(date_prefix):
                                commit_datetime = show_line[len(date_prefix):].strip()
                        else:
                            if show_line and not show_line.startswith(' '):
                                relative_paths.append(show_line)
                    if not merge:
                        assert commit_datetime

                        # Normalize all commit datetimes to ISO format in the UTC timezone
                        timezone = timezone_compiled_regex.search(commit_datetime)
                        assert timezone
                        timezone_direction = -1 if timezone.group(1) == '+' else 1
                        timezone_hour_offset = int(timezone.group(2))
                        timezone_minute_offset = int(timezone.group(3))
                        assert timezone_hour_offset < 24 and timezone_minute_offset < 60
                        commit_datetime_without_timezone = commit_datetime[:-len(' +0000')]
                        t = datetime.datetime.strptime(commit_datetime_without_timezone, '%c')
                        t = t + timezone_direction * datetime.timedelta(hours=timezone_hour_offset,
                                                                        minutes=timezone_minute_offset)
                        tag_commit_datetime = tag + t.isoformat().replace('-', '').replace(':', '') + 'Z'

                        # Use minimum possible commit ID
                        tag_commit_id = f'{tag}{commit[:minimum_length]}'

                        # for each file in the commit, get its contents
                        for relative_path in relative_paths:
                            # git show writes the contents of the file in the commit to stdout
                            result = subprocess.run(['git', 'show', f'{commit}:{relative_path}'], capture_output=True)
                            # git show uses return code 128 to indicate file deleted by the commit
                            assert result.returncode in (0, 128)
                            tag_maybe_deleted = tag_deleted if result.returncode == 128 else ''
                            relative_path = relative_path.strip('"')

                            parent, filename = os.path.split(relative_path)
                            tag_filename = f'{tag}{filename}'

                            # Calculate MD5 hash sum for contents of file in the commit
                            h = hashlib.new('md5')
                            h.update(result.stdout)
                            md5 = h.hexdigest()

                            target_folder = os.path.join(target_root, git_rel, parent)
                            target_path = os.path.join(target_folder, f'{tag_working}{tag_maybe_deleted}{tag_filename}')
                            os.makedirs(target_folder, exist_ok=True)

                            if os.path.isfile(target_path) and md5 == md5s[target_path.lower()]:
                                # The committed version matches the working copy version, so rather than write a duplicate
                                # file, prepend this commit's datetime and ID tags to the working copy's file name
                                new_target_path = os.path.join(target_folder, f'{tag_commit_datetime}{tag_commit_id}{tag_working}{tag_maybe_deleted}{tag_filename}')
                                os.rename(target_path, new_target_path)
                            else:
                                # The committed version differs from the working copy version, so write it to a new file
                                new_target_path = os.path.join(target_folder, f'{tag_commit_datetime}{tag_commit_id}{tag_maybe_deleted}{tag_filename}')
                                with open(new_target_path, 'wb') as w:
                                    w.write(result.stdout)

    print('Tagging newest and oldest (or only) committed versions...')
    tag_newest_oldest_or_only_commit(target_root)
    print('Locking down target folder tree...')
    lock_down(target_root)
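

# Hypothetical helper (not part of the original utility and not called by recreate_git_commits):
# a minimal sketch of how a reviewer might split an output file name back into its tags.
# It ASSUMES the naming scheme documented in the __main__ docstring below: the commit datetime
# and commit ID tags (when present) come first, then any keyword tags, then the original file
# name, each preceded by '#'. A file whose original name is itself a keyword would be mis-parsed.
def parse_tagged_filename(tagged_name):
    """Split e.g. '#20200403T173955Z#503cd058#oldest#README.txt' into its tags and file name."""
    keyword_tags = ('newest', 'oldest', 'only', 'working', 'deleted')
    if not tagged_name.startswith('#'):
        raise ValueError(f'not a tagged file name: {tagged_name!r}')
    fields = tagged_name[1:].split('#')
    result = {'commit_datetime': None, 'commit_id': None, 'keywords': []}
    # A commit datetime tag looks like 20200403T173955Z (ISO 8601 basic format, UTC)
    first = fields[0]
    looks_like_datetime = (
        len(first) == 16 and first[8] == 'T' and first.endswith('Z') and (first[:8] + first[9:15]).isdigit()
    )
    if len(fields) > 2 and looks_like_datetime:
        result['commit_datetime'] = fields.pop(0)
        # The (abbreviated) commit ID always follows the commit datetime
        result['commit_id'] = fields.pop(0)
    # Consume keyword tags, always leaving at least one field for the original file name
    while len(fields) > 1 and fields[0] in keyword_tags:
        result['keywords'].append(fields.pop(0))
    # Whatever remains is the original file name (re-joined in case it contained '#')
    result['filename'] = '#'.join(fields)
    return result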

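
# Hypothetical helper (not part of the original utility and not called by recreate_git_commits):
# a minimal sketch of the rule in the __main__ docstring below that a commit ID tag keeps only as
# many leading characters of the full commit ID as are needed to distinguish it from the other
# commits in the same repository. The min_length default of 8 is an ASSUMPTION chosen to match the
# docstring's examples; Git's own abbreviation (e.g. "git rev-parse --short") uses its own minimum.
def abbreviate_commit_ids(commit_ids, min_length=8):
    """Map each full commit ID to its shortest unique prefix of at least min_length characters."""
    unique_ids = sorted(set(commit_ids))
    abbreviations = {}
    for i, commit_id in enumerate(unique_ids):
        # In sorted order, the IDs sharing the longest prefix with commit_id are its neighbours
        shared = 0
        for neighbour in unique_ids[max(i - 1, 0):i + 2]:
            if neighbour != commit_id:
                shared = max(shared, len(os.path.commonprefix([commit_id, neighbour])))
        # One character beyond the longest shared prefix makes this ID unique
        abbreviations[commit_id] = commit_id[:max(min_length, shared + 1)]
    return abbreviations
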

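# Hypothetical helper (not part of the original utility and not called by recreate_git_commits):
# a minimal sketch of producing the commit datetime tag documented below (ISO 8601 basic format in
# UTC, e.g. #20140718T160140Z) from a Unix timestamp such as the one printed by
# "git log --format=%ct". Whether the utility uses the author date or the committer date is not
# shown in this excerpt, so the choice of timestamp is an ASSUMPTION.
def commit_datetime_tag(unix_timestamp):
    """Return a '#YYYYMMDDTHHMMSSZ' tag for a Unix timestamp, expressed in UTC."""
    from datetime import datetime, timezone  # local import keeps this sketch self-contained
    return datetime.fromtimestamp(int(unix_timestamp), tz=timezone.utc).strftime('#%Y%m%dT%H%M%SZ')

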
if __name__ == '__main__':
    """
    argv[1] = absolute folder path for the case 
    argv[2] = relative folder path under argv[1] that contains the .git repository folder (or folders) to be processed
    argv[3] = absolute folder path that will contain the output of this utility
    
    For example, to process the following .git repository folder,
        '/Users/username/Documents/casename/sourcecodefolder/production1/.git', specify the following arguments:
    
         argv[1] = '/Users/username/Documents/casename'
         argv[2] = 'sourcecodefolder/production1'
         argv[3] = '/Users/username/Documents/casename/Committed Versions in sourcecodefolder-production1'
     
    This utility will recursively process all .git repository folders it finds in the folder tree specified in argv[2].
                
    The output folder tree in argv[3] will have the same structure as the folder tree specified in argv[2].
    
    The output folder tree in argv[3] will contain a copy of all the files in the folder tree specified in argv[2]. Per
    Git terminology, these are called the "working copy" versions.
    
    Additionally, the output folder tree in argv[3] will contain a copy of all distinct committed versions of all files
    from all commits from all .git repository folders, even files that have been deleted via "git rm". To avoid
    redundancy, a committed version of a file appears in the output folder tree only if it is the oldest (or only)
    committed version or its contents differ from those of the most recent preceding distinct committed version. That
    is, a file may be included in 100 commits, but if it has changed only, say, twice in those 100 commits, this
    utility will output only the original copy of the file and its two changed versions.
    
    This will result in multiple versions of the file in its output folder, one copy for each distinct version. To
    distinguish these versions, every copy keeps the original file name, but each has a different combination of
    "tags" prepended to that name. Each tag and the original file name are preceded by the '#' character. Here are
    the possible tags:
    
    A working copy version contains the tag:
    
        #working
    
    Each distinct committed version contains two tags:
    
        commit datetime in ISO 8601 basic format in the UTC timezone; e.g., #20140718T160140Z
        commit ID; e.g., #f48a2a9e
         
    The commit ID contains only as many characters from the beginning of the full commit ID as are necessary to
    distinguish it from the other commits in the same .git repository folder.
    
    The newest distinct committed version contains the tag:
    
        #newest
    
    The oldest distinct committed version contains the tag:
    
        #oldest
    
    If there is only one distinct committed version, it contains the tag:
    
        #only
        
    If the file has been deleted via "git rm", this utility creates an empty file to memorialize the deletion. Along with
    the tags for the commit datetime and the commit ID, such a file contains the tag:
    
        #deleted
        
    A common scenario is a file which has three or more distinct committed versions, where the working copy version 
    matches the newest distinct committed version. In this scenario, the output folder will contain the following 
    versions of our example file, README.txt:
    
        For the oldest distinct committed version, an example file name is:
        
            #20200403T173955Z#503cd058#oldest#README.txt

        For each distinct committed version that is neither the newest nor oldest distinct committed version, some
        example file names are:
        
            #20200610T003255Z#06fda5da#README.txt
            #20200709T001011Z#8ac696ab#README.txt
            ...

        For the newest distinct committed version that matches the working copy version, an example file name is:
        
            #20201117T134113Z#e8b08035#newest#working#README.txt
        
    In addition to this common scenario, below are other scenarios: 
    
        If the above README.txt file is later deleted via "git rm", the above example files will be followed by an empty
        file whose name contains the "#deleted" tag along with the datetime and ID tags of the commit that deleted the file:
        
            #20200403T173955Z#503cd058#oldest#README.txt
            #20200610T003255Z#06fda5da#README.txt
            #20200709T001011Z#8ac696ab#README.txt
            ...
            #20201117T134113Z#e8b08035#newest#README.txt
            #20210804T204541Z#b50d9684#deleted#README.txt

        In the above scenario, since README.txt has been deleted via "git rm", it is unlikely that there will be a working
        copy version of README.txt, but there could be. The above scenario assumes there is no working copy version of
        README.txt, and thus the newest distinct committed version has no "#working" tag.
        
        If there is only one distinct committed version and that version matches the working copy version, an example
        file name is:
        
            #20210804T204541Z#b50d9684#only#working#README.txt
        
        It is possible that the working copy version matches none of the distinct committed versions. In this scenario,
        no distinct committed version file name will contain the "#working" tag.
        
        It is possible that the working copy version matches a distinct committed version other than the newest
        distinct committed version. In this scenario, the "#working" tag will appear only in the file name of the
        distinct committed version that matches the working copy version.
    
        It is possible that a file has a working copy version but no committed versions at all (e.g., a file that was
        never committed). In this scenario, there will be only one version of the file, and its file name will contain
        only the "#working" tag; e.g., #working#README.txt.
    """
    recreate_git_commits(sys.argv[1], sys.argv[2], sys.argv[3])