Unboxing the Unboxed

In many of my cases, the source code production on the review computer either consists of one or more archive files (e.g., ZIP files) or is a folder tree that contains one or more archive files.

Some archive files expand to a folder tree that contains more archive files; some of those archive files, in turn, expand to a folder tree that contains even more archive files, and so on.

Therefore, to expose all the files in the production, two steps are essential:

  1. Identify all archive files
  2. Recursively unarchive each identified archive file
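The two steps can be illustrated with a minimal sketch, assuming Python's standard zipfile module stands in for 7-Zip. That assumption means it handles only ZIP-based formats (.zip, .docx, .jar, and so on) and ignores passwords. Like the script below, it repeats passes over the tree until a pass finds no new archive; the function name unarchive_zip_tree is hypothetical.

```python
import os
import zipfile


def unarchive_zip_tree(root: str) -> int:
    """Hypothetical ZIP-only stand-in for the 7-Zip-based script below.

    Step 1 (identify): zipfile.is_zipfile inspects file contents, not extensions.
    Step 2 (unarchive): each archive is extracted into a sibling folder named
    "<archive>-unarchived", which the next pass searches for nested archives.
    Returns the number of archives extracted.
    """
    done = set()  # archives already extracted, so later passes skip them
    extracted = 0
    while True:
        found = []
        for folder, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(folder, name)
                if path not in done and zipfile.is_zipfile(path):
                    found.append(path)
        if not found:
            # A pass that finds no new archive means everything is exposed
            return extracted
        for path in found:
            with zipfile.ZipFile(path) as archive:
                archive.extractall(path + '-unarchived')
            done.add(path)
            extracted += 1
```

For a ZIP file that contains another ZIP file, the first pass extracts the outer archive and the second pass extracts the inner one.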

The first step might seem easy: find all the files with a “.zip” file extension. However, besides “.zip” files, many other file types are archives that contain one or more files and folders.
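One way to spot archives regardless of their extension is to check each file's leading bytes (its "magic number"). The following minimal sketch works under that assumption and is not what the script below does: the script relies on the extensions that "7z i" reports, or simply attempts every file. The SIGNATURES table and the sniff_archive_type function are hypothetical names, and the table covers only a handful of formats.

```python
from typing import Optional

# Hypothetical table: leading byte signatures of a few common archive formats
SIGNATURES = {
    b'PK\x03\x04': 'zip',             # also .docx, .xlsx, .pptx, .jar, .apk
    b'PK\x05\x06': 'zip',             # empty ZIP archive
    b'Rar!\x1a\x07\x00': 'rar',       # RAR 1.5 to 4.x
    b'Rar!\x1a\x07\x01\x00': 'rar',   # RAR 5
    b'7z\xbc\xaf\x27\x1c': '7z',
    b'\x1f\x8b': 'gzip',
}


def sniff_archive_type(path: str) -> Optional[str]:
    """Return an archive type guessed from the file's first bytes, or None."""
    with open(path, 'rb') as f:
        head = f.read(8)  # the longest signature in the table is 8 bytes
    for signature, kind in SIGNATURES.items():
        if head.startswith(signature):
            return kind
    return None
```

A ZIP file renamed to "notes.txt" is still reported as 'zip'. Formats whose signature is not at offset 0 (tar's "ustar" marker sits at offset 257) are not detected by this sketch.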

In my experience, 7-Zip unarchives the largest number of archive file types: more than 100 of them. For example, the following file extensions might seem to identify opaque binary file types, but they are actually archives that contain other files and folders:

  • .docx is a modern Microsoft Word file that is an archive of files holding the document's content and metadata. Below is the folder tree created by unarchiving a file called Doc.docx. The contents of the Word document are in the file “Doc\word\document.xml”, and an Excel workbook embedded in the Word document is stored as the file “Doc\word\embeddings\Worksheet.xlsx”:
        Doc
        ├── [Content_Types].xml
        ├── _rels
        ├── docProps
        │   ├── app.xml
        │   └── core.xml
        └── word
            ├── _rels
            │   └── document.xml.rels
            ├── document.xml
            ├── embeddings
            │   └── Worksheet.xlsx
            ├── fontTable.xml
            ├── settings.xml
            ├── styles.xml
            ├── theme
            │   └── theme1.xml
            └── webSettings.xml
  • .pptx is a modern Microsoft PowerPoint file that is an archive of files holding the slides' content and metadata.
  • .xlsx is a modern Microsoft Excel file that is an archive of files holding the spreadsheet's content and metadata.
  • .jar is a Java ARchive file that is an archive of Java class files and related resources.
  • .apk is an Android Package Kit file that is an archive of the files that implement an Android app.
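Because these formats are ZIP archives underneath, Python's standard zipfile module can open them directly. The small sketch below lists embedded objects in an Office file. It assumes the embedded-object folder is named "embeddings", as in the Doc.docx tree above, and that .xlsx and .pptx files follow the same convention. list_embedded_parts is a hypothetical helper.

```python
import zipfile
from typing import List


def list_embedded_parts(office_path: str) -> List[str]:
    """Hypothetical helper: return ZIP member names stored under an "embeddings" folder.

    In the Doc.docx tree above, this would return ["word/embeddings/Worksheet.xlsx"].
    """
    with zipfile.ZipFile(office_path) as archive:
        return sorted(
            name for name in archive.namelist()
            if '/embeddings/' in name and not name.endswith('/')  # skip folder entries
        )
```

Each name returned is itself an archive (here an .xlsx), which is exactly why unarchiving has to be recursive.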

The most popular archive file type that 7-Zip cannot unarchive without installing 7-Zip plugins is the Roshal ARchive file type (.rar).
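The split between the two tools amounts to a dispatch on the file extension. The following sketch uses hypothetical helpers and assumes the same tools and core flags that the script's unarchive_file function uses. missing_tools uses shutil.which to report which of the two dependencies is not on the search path.

```python
import shutil
from typing import List


def extraction_command(archive_path: str, output_folder: str, password: str) -> List[str]:
    """Hypothetical helper: build the command line for one archive.

    .rar files go to unar; everything else goes to 7z.
    """
    if archive_path.lower().endswith('.rar'):
        return ['unar', '-q', '-p', password, '-o', output_folder, archive_path]
    return ['7z', 'x', archive_path, '-bso0', '-bsp0', f'-p{password}', f'-o{output_folder}']


def missing_tools(commands: List[List[str]]) -> List[str]:
    """Hypothetical helper: name each program in commands that is not on the search path."""
    return sorted({command[0] for command in commands if shutil.which(command[0]) is None})
```

Running missing_tools over the commands for a whole production before extracting anything reveals a missing unar or 7z up front instead of partway through a long run.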

The following script uses 7-Zip (for almost all archives) and unar (for .rar archives) to recursively unarchive the files in a production. The script is documented in its own comments.

#!/usr/bin/env python3

"""
Copyright 2020-2021 Stairstep Consulting LLC. All rights reserved.

Creative Commons Attribution-ShareAlike 4.0 International Public
License

By exercising the Licensed Rights (defined below), You accept and agree
to be bound by the terms and conditions of this Creative Commons
Attribution-ShareAlike 4.0 International Public License ("Public
License"). To the extent this Public License may be interpreted as a
contract, You are granted the Licensed Rights in consideration of Your
acceptance of these terms and conditions, and the Licensor grants You
such rights in consideration of benefits the Licensor receives from
making the Licensed Material available under these terms and
conditions.


Section 1 -- Definitions.

  a. Adapted Material means material subject to Copyright and Similar
     Rights that is derived from or based upon the Licensed Material
     and in which the Licensed Material is translated, altered,
     arranged, transformed, or otherwise modified in a manner requiring
     permission under the Copyright and Similar Rights held by the
     Licensor. For purposes of this Public License, where the Licensed
     Material is a musical work, performance, or sound recording,
     Adapted Material is always produced where the Licensed Material is
     synched in timed relation with a moving image.

  b. Adapter's License means the license You apply to Your Copyright
     and Similar Rights in Your contributions to Adapted Material in
     accordance with the terms and conditions of this Public License.

  c. BY-SA Compatible License means a license listed at
     creativecommons.org/compatiblelicenses, approved by Creative
     Commons as essentially the equivalent of this Public License.

  d. Copyright and Similar Rights means copyright and/or similar rights
     closely related to copyright including, without limitation,
     performance, broadcast, sound recording, and Sui Generis Database
     Rights, without regard to how the rights are labeled or
     categorized. For purposes of this Public License, the rights
     specified in Section 2(b)(1)-(2) are not Copyright and Similar
     Rights.

  e. Effective Technological Measures means those measures that, in the
     absence of proper authority, may not be circumvented under laws
     fulfilling obligations under Article 11 of the WIPO Copyright
     Treaty adopted on December 20, 1996, and/or similar international
     agreements.

  f. Exceptions and Limitations means fair use, fair dealing, and/or
     any other exception or limitation to Copyright and Similar Rights
     that applies to Your use of the Licensed Material.

  g. License Elements means the license attributes listed in the name
     of a Creative Commons Public License. The License Elements of this
     Public License are Attribution and ShareAlike.

  h. Licensed Material means the artistic or literary work, database,
     or other material to which the Licensor applied this Public
     License.

  i. Licensed Rights means the rights granted to You subject to the
     terms and conditions of this Public License, which are limited to
     all Copyright and Similar Rights that apply to Your use of the
     Licensed Material and that the Licensor has authority to license.

  j. Licensor means the individual(s) or entity(ies) granting rights
     under this Public License.

  k. Share means to provide material to the public by any means or
     process that requires permission under the Licensed Rights, such
     as reproduction, public display, public performance, distribution,
     dissemination, communication, or importation, and to make material
     available to the public including in ways that members of the
     public may access the material from a place and at a time
     individually chosen by them.

  l. Sui Generis Database Rights means rights other than copyright
     resulting from Directive 96/9/EC of the European Parliament and of
     the Council of 11 March 1996 on the legal protection of databases,
     as amended and/or succeeded, as well as other essentially
     equivalent rights anywhere in the world.

  m. You means the individual or entity exercising the Licensed Rights
     under this Public License. Your has a corresponding meaning.


Section 2 -- Scope.

  a. License grant.

       1. Subject to the terms and conditions of this Public License,
          the Licensor hereby grants You a worldwide, royalty-free,
          non-sublicensable, non-exclusive, irrevocable license to
          exercise the Licensed Rights in the Licensed Material to:

            a. reproduce and Share the Licensed Material, in whole or
               in part; and

            b. produce, reproduce, and Share Adapted Material.

       2. Exceptions and Limitations. For the avoidance of doubt, where
          Exceptions and Limitations apply to Your use, this Public
          License does not apply, and You do not need to comply with
          its terms and conditions.

       3. Term. The term of this Public License is specified in Section
          6(a).

       4. Media and formats; technical modifications allowed. The
          Licensor authorizes You to exercise the Licensed Rights in
          all media and formats whether now known or hereafter created,
          and to make technical modifications necessary to do so. The
          Licensor waives and/or agrees not to assert any right or
          authority to forbid You from making technical modifications
          necessary to exercise the Licensed Rights, including
          technical modifications necessary to circumvent Effective
          Technological Measures. For purposes of this Public License,
          simply making modifications authorized by this Section 2(a)
          (4) never produces Adapted Material.

       5. Downstream recipients.

            a. Offer from the Licensor -- Licensed Material. Every
               recipient of the Licensed Material automatically
               receives an offer from the Licensor to exercise the
               Licensed Rights under the terms and conditions of this
               Public License.

            b. Additional offer from the Licensor -- Adapted Material.
               Every recipient of Adapted Material from You
               automatically receives an offer from the Licensor to
               exercise the Licensed Rights in the Adapted Material
               under the conditions of the Adapter's License You apply.

            c. No downstream restrictions. You may not offer or impose
               any additional or different terms or conditions on, or
               apply any Effective Technological Measures to, the
               Licensed Material if doing so restricts exercise of the
               Licensed Rights by any recipient of the Licensed
               Material.

       6. No endorsement. Nothing in this Public License constitutes or
          may be construed as permission to assert or imply that You
          are, or that Your use of the Licensed Material is, connected
          with, or sponsored, endorsed, or granted official status by,
          the Licensor or others designated to receive attribution as
          provided in Section 3(a)(1)(A)(i).

  b. Other rights.

       1. Moral rights, such as the right of integrity, are not
          licensed under this Public License, nor are publicity,
          privacy, and/or other similar personality rights; however, to
          the extent possible, the Licensor waives and/or agrees not to
          assert any such rights held by the Licensor to the limited
          extent necessary to allow You to exercise the Licensed
          Rights, but not otherwise.

       2. Patent and trademark rights are not licensed under this
          Public License.

       3. To the extent possible, the Licensor waives any right to
          collect royalties from You for the exercise of the Licensed
          Rights, whether directly or through a collecting society
          under any voluntary or waivable statutory or compulsory
          licensing scheme. In all other cases the Licensor expressly
          reserves any right to collect such royalties.


Section 3 -- License Conditions.

Your exercise of the Licensed Rights is expressly made subject to the
following conditions.

  a. Attribution.

       1. If You Share the Licensed Material (including in modified
          form), You must:

            a. retain the following if it is supplied by the Licensor
               with the Licensed Material:

                 i. identification of the creator(s) of the Licensed
                    Material and any others designated to receive
                    attribution, in any reasonable manner requested by
                    the Licensor (including by pseudonym if
                    designated);

                ii. a copyright notice;

               iii. a notice that refers to this Public License;

                iv. a notice that refers to the disclaimer of
                    warranties;

                 v. a URI or hyperlink to the Licensed Material to the
                    extent reasonably practicable;

            b. indicate if You modified the Licensed Material and
               retain an indication of any previous modifications; and

            c. indicate the Licensed Material is licensed under this
               Public License, and include the text of, or the URI or
               hyperlink to, this Public License.

       2. You may satisfy the conditions in Section 3(a)(1) in any
          reasonable manner based on the medium, means, and context in
          which You Share the Licensed Material. For example, it may be
          reasonable to satisfy the conditions by providing a URI or
          hyperlink to a resource that includes the required
          information.

       3. If requested by the Licensor, You must remove any of the
          information required by Section 3(a)(1)(A) to the extent
          reasonably practicable.

  b. ShareAlike.

     In addition to the conditions in Section 3(a), if You Share
     Adapted Material You produce, the following conditions also apply.

       1. The Adapter's License You apply must be a Creative Commons
          license with the same License Elements, this version or
          later, or a BY-SA Compatible License.

       2. You must include the text of, or the URI or hyperlink to, the
          Adapter's License You apply. You may satisfy this condition
          in any reasonable manner based on the medium, means, and
          context in which You Share Adapted Material.

       3. You may not offer or impose any additional or different terms
          or conditions on, or apply any Effective Technological
          Measures to, Adapted Material that restrict exercise of the
          rights granted under the Adapter's License You apply.


Section 4 -- Sui Generis Database Rights.

Where the Licensed Rights include Sui Generis Database Rights that
apply to Your use of the Licensed Material:

  a. for the avoidance of doubt, Section 2(a)(1) grants You the right
     to extract, reuse, reproduce, and Share all or a substantial
     portion of the contents of the database;

  b. if You include all or a substantial portion of the database
     contents in a database in which You have Sui Generis Database
     Rights, then the database in which You have Sui Generis Database
     Rights (but not its individual contents) is Adapted Material,
     including for purposes of Section 3(b); and

  c. You must comply with the conditions in Section 3(a) if You Share
     all or a substantial portion of the contents of the database.

For the avoidance of doubt, this Section 4 supplements and does not
replace Your obligations under this Public License where the Licensed
Rights include other Copyright and Similar Rights.


Section 5 -- Disclaimer of Warranties and Limitation of Liability.

  a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
     EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
     AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
     ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
     IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
     WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
     PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
     ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
     KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
     ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.

  b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
     TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
     NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
     INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
     COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
     USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
     ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
     DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
     IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.

  c. The disclaimer of warranties and limitation of liability provided
     above shall be interpreted in a manner that, to the extent
     possible, most closely approximates an absolute disclaimer and
     waiver of all liability.


Section 6 -- Term and Termination.

  a. This Public License applies for the term of the Copyright and
     Similar Rights licensed here. However, if You fail to comply with
     this Public License, then Your rights under this Public License
     terminate automatically.

  b. Where Your right to use the Licensed Material has terminated under
     Section 6(a), it reinstates:

       1. automatically as of the date the violation is cured, provided
          it is cured within 30 days of Your discovery of the
          violation; or

       2. upon express reinstatement by the Licensor.

     For the avoidance of doubt, this Section 6(b) does not affect any
     right the Licensor may have to seek remedies for Your violations
     of this Public License.

  c. For the avoidance of doubt, the Licensor may also offer the
     Licensed Material under separate terms or conditions or stop
     distributing the Licensed Material at any time; however, doing so
     will not terminate this Public License.

  d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
     License.


Section 7 -- Other Terms and Conditions.

  a. The Licensor shall not be bound by any additional or different
     terms or conditions communicated by You unless expressly agreed.

  b. Any arrangements, understandings, or agreements regarding the
     Licensed Material not stated herein are separate from and
     independent of the terms and conditions of this Public License.


Section 8 -- Interpretation.

  a. For the avoidance of doubt, this Public License does not, and
     shall not be interpreted to, reduce, limit, restrict, or impose
     conditions on any use of the Licensed Material that could lawfully
     be made without permission under this Public License.

  b. To the extent possible, if any provision of this Public License is
     deemed unenforceable, it shall be automatically reformed to the
     minimum extent necessary to make it enforceable. If the provision
     cannot be reformed, it shall be severed from this Public License
     without affecting the enforceability of the remaining terms and
     conditions.

  c. No term or condition of this Public License will be waived and no
     failure to comply consented to unless expressly agreed to by the
     Licensor.

  d. Nothing in this Public License constitutes or may be interpreted
     as a limitation upon, or waiver of, any privileges and immunities
     that apply to the Licensor or You, including from the legal
     processes of any jurisdiction or authority.
"""


import sys
import os
import subprocess


# file types to not attempt to unarchive because their unarchiving fails, at least for your production
fail_extensions = []

# file names to not attempt to unarchive because their unarchiving fails, at least for your production
fail_files = []

# file types to not attempt to unarchive because you don't need to see their unarchived contents
unnecessary_extensions = []

# file names to not attempt to unarchive because you don't need to see their unarchived contents
unnecessary_files = []

# If True, extensions to be unarchived will be parsed from output of "7z i"
do_parse_include_extensions = True

# file types to unarchive even if "7z i" output is not parsed or even if these file types are not output by "7z i"
force_extensions = []

# file names to unarchive even if "7z i" output is not parsed or even if these files' types are not output by "7z i"
force_files = []


fail_extensions = [x.lower() for x in fail_extensions]
fail_files = [x.lower() for x in fail_files]
unnecessary_extensions = [x.lower() for x in unnecessary_extensions]
unnecessary_files = [x.lower() for x in unnecessary_files]
force_extensions = [x.lower() for x in force_extensions]
force_files = [x.lower() for x in force_files]

exclude_extensions = set(fail_extensions + unnecessary_extensions)
exclude_files = set(fail_files + unnecessary_files)

include_extensions = set(force_extensions)
include_files = set(force_files)

# temporary suffix added to end of each archive file for which unarchiving has been attempted to avoid a second attempt
attempted_unarchiving_file_suffix = '-attempted_unarchiving'

# suffix added to the end of the folder created to hold the contents of an unarchived archive file
unarchived_folder_suffix = '-unarchived'


def parse_include_extensions() -> None:
    """
    Parses the output of the "7z i" command to determine what extensions 7z says it can process.
    At the end of this file is an example of the "7z i" output.
    Adds to the set in the include_extensions global variable.
    :return: None
    """
    if do_parse_include_extensions:
        start_of_formats_section = False
        for line in os.popen('7z i').readlines():
            line = line.rstrip()
            # Parses the "7z i" section between the line that begins with "Formats:" and the next blank line
            if start_of_formats_section:
                if not line:
                    # End of "Formats:" section
                    break
                # Ignore characters before start of the blank-separated extensions and create list of tokens that follow;
                #   split() with no argument collapses runs of blanks, so no empty token becomes a bogus "." extension
                tokens = line[26:].split()
                # Not all tokens at the end of the list might be extensions, so prune non-extensions from right-to-left
                add_remaining_tokens_as_extensions = False
                for index, token in enumerate(reversed(tokens)):
                    if len(token) > 2 or index == len(tokens) - 1:
                        if not token.startswith('offset=') and not token.startswith('\\x') and not token == '(~.swf)':
                            add_remaining_tokens_as_extensions = True
                    # Some extensions are specified as just, for example, "abc" and some are specified as "(.abc)";
                    #   normalize each to ".abc"
                    if add_remaining_tokens_as_extensions:
                        if token.startswith('(.') and token.endswith(')'):
                            token = token[1:-1]
                        else:
                            token = '.' + token
                        include_extensions.add(token)
            if line.startswith('Formats:'):
                start_of_formats_section = True
    if include_extensions or include_files:
        formatted_list = ' '.join([('*' + x) for x in sorted(include_extensions)] + sorted(include_files))
        print(f'\nWill attempt to unarchive these files: {formatted_list}')
    else:
        print(f'Will attempt to unarchive all files')
    if exclude_extensions or exclude_files:
        formatted_list = ' '.join([('*' + x) for x in sorted(exclude_extensions)] + sorted(exclude_files))
        print(f'\n...except will not attempt to unarchive these files: {formatted_list}')


def initial_linking(source_path: str, target_root: str) -> None:
    """
    Clones the files and folders in the source_path to the target_root folder using hard links, so source_path and
    target_root must be on the same file system
    :param source_path: Path of file that might be an archive or of folder that might contain one or more archives
    :param target_root: Path of folder that will contain the recursively unarchived folder tree
    :return: None
    """
    print(f'\nLinking...')
    if os.path.isdir(source_path):
        source_path = source_path if not source_path.endswith(os.path.sep) else source_path[:-1]
        for source_folder, source_sub_folders, source_files in os.walk(source_path):
            for source_file in source_files:
                process_file(source_path, source_folder[len(source_path)+1:], source_file, target_root)
            for source_sub_folder in source_sub_folders:
                process_folder(source_path, source_folder[len(source_path)+1:], source_sub_folder, target_root)
    else:
        process_file(os.path.dirname(source_path), '', os.path.basename(source_path), target_root)


def process_file(source_root: str, source_rel: str, source_file_name: str, target_root: str) -> None:
    """Hard-links one source file into the matching relative folder under target_root."""
    target_path = os.path.join(target_root, source_rel)
    os.makedirs(target_path, exist_ok=True)
    os.link(os.path.join(source_root, source_rel, source_file_name), os.path.join(target_path, source_file_name))


def process_folder(__source_root: str, source_rel: str, source_folder_name: str, target_root: str) -> None:
    """Creates the matching folder under target_root so that empty source folders are preserved too."""
    target_path = os.path.join(target_root, source_rel, source_folder_name)
    os.makedirs(target_path, exist_ok=True)


def unarchive_recursively_passes(target_root: str, passwords: [str]) -> None:
    """
    Pass through the entire production unarchiving archive files. If any archive files were unarchived in a pass, do
    another pass since the prior unarchiving pass could have expanded to a folder tree that contains more archive files.
    Stop when a pass finds no archive files.
    :param target_root: Target folder
    :param passwords: List of passwords to try for each archive.
    :return: None
    """
    perform_another_pass = True
    pass_number = 1
    while perform_another_pass:
        print(f'\nUnarchiving... (pass {pass_number})')
        perform_another_pass = unarchive_pass(target_root, passwords)
        pass_number += 1
    print(f'None')


def unarchive_pass(target_root: str, passwords: [str]) -> bool:
    """
    Looks for archive files, applying the following rules in order of precedence:
        ignore archive files for which unarchiving has already been attempted.
        ignore files that match either the file extension or file name exclusion lists.
        if either a file extension or file name inclusion list is specified, attempt to unarchive each file that matches
        either the file extension or file name inclusion list.
        if neither a file extension nor file name inclusion list is specified, attempt to unarchive all files (yes, even
        "*.txt" files, because some files' extensions hide the fact that they are actually archives).
    :param target_root: path where the results go
    :param passwords: list of passwords to try for each unarchiving attempt
    :return: False, if no unarchiving was attempted on this pass; True, if at least one unarchiving was attempted on this
    pass, implying that an unarchiving in this pass might have exposed additional archive files to be processed in the
    next pass.
    """
    perform_another_pass = False
    for folder, __, files in os.walk(target_root):
        for file in files:
            if not file.endswith(attempted_unarchiving_file_suffix):
                _base, extension = os.path.splitext(file)
                extension = extension.lower()
                # The file name lists were lowercased above, so compare against the lowercased name
                file_lower = file.lower()
                if (not exclude_extensions or extension not in exclude_extensions) and (not exclude_files or file_lower not in exclude_files):
                    if (not include_extensions and not include_files) or (include_extensions and extension in include_extensions) or (include_files and file_lower in include_files):
                        archive_path = os.path.join(folder, file)
                        print(f"{archive_path[len(target_root) + len('/'):]}")
                        unarchive_file(archive_path, extension, passwords)
                        perform_another_pass = True
    return perform_another_pass


def unarchive_file(archive_path: str, extension: str, passwords: [str]) -> None:
    """
    If a file has already been identified as being an archive (by the caller of this function), then call the external
    command to do the unarchiving. Each password is tried in turn until one succeeds.
    :param archive_path: Path of the archive file to be unarchived
    :param extension: The extension of the archive file
    :param passwords: The list of zero or more passwords to apply to this, and every other, archive
    :return: None
    """
    unarchived_folder_path = archive_path + unarchived_folder_suffix
    # Fall back to the default password when none was specified
    if not passwords:
        passwords.append('password')
    last_error = ''
    for password in passwords:
        if extension.lower() == '.rar':
            # -f overwrites any files left behind by a previous failed attempt
            command = ['unar', '-q', '-f', '-p', password, '-o', unarchived_folder_path, archive_path]
        else:
            # -y answers "yes" to overwrite prompts caused by a previous failed attempt
            command = ['7z', 'x', archive_path, '-y', '-bso0', '-bsp0', f'-p{password}', f'-o{unarchived_folder_path}']
        result = subprocess.run(command, capture_output=True)
        if result.returncode == 0:
            # Mark the archive as processed so that later passes skip it
            os.rename(archive_path, archive_path + attempted_unarchiving_file_suffix)
            return
        # Remember this failure and try the next password
        last_error = result.stderr.decode('utf-8', errors='replace')
    print(last_error, end='')
    print(f'Could not unarchive file: {archive_path}')
    sys.exit(1)


def remove_attempted_unarchiving_file_suffixes(target_root: str) -> None:
    """
    To avoid unarchiving an archive more than once, an already unarchived archive is given a unique suffix to be
    identified as already having been processed. After all recursive unarchiving passes have completed, this function
    removes those suffixes.
    :param target_root: Target folder
    :return: None
    """
    print(f'\nRemoving Attempted Unarchiving File Suffixes...')
    paths = []
    for folder, __, files in os.walk(target_root):
        for file in files:
            if file.endswith(attempted_unarchiving_file_suffix):
                paths.append(os.path.join(folder, file))
    for path in sorted(paths, reverse=True):
        os.rename(path, path[:-len(attempted_unarchiving_file_suffix)])


def identify_longest_paths(target_root: str) -> None:
    """
    Prints the longest path(s). Necessary because Windows, by default, does not allow a path > 260 characters, including
    "C:\\" and the terminating null character. For example, if "C:\\Review\\" is prepended to every path in the
    target_root, then the maximum path length from the target_root to the end is 260 - len("C:\\Review\\") - 1 = 249
    :param target_root: The root of the target folder
    :return: None
    """
    print(f'\nIdentifying Longest Paths without Prefix (> 260 characters with prefix is too long)...')
    longest_path_len = 0
    longest_paths = []
    for folder, _, files in os.walk(target_root):
        for file in files:
            path = os.path.join(folder, file)[len(target_root) + len('/'):]
            path_len = len(path)
            if path_len > longest_path_len:
                longest_path_len = path_len
                longest_paths = [path]
            elif path_len == longest_path_len:
                longest_paths.append(path)
    for longest_path in sorted(longest_paths):
        print(f'{longest_path_len}: {longest_path}')


def unarchive_recursively(source_path: str, target_root: str, passwords: [str]) -> None:
    """
    Recursively unarchives any archive files found in source_path
    :param source_path: Path of the file or folder that might contain one or more archive files
    :param target_root: Path of the folder to contain the recursively unarchived folder tree
    :param passwords: List of zero or more passwords to apply to each archive
    :return: None
    """
    if source_path.endswith('/'):
        source_path = source_path[:-1]

    if target_root.endswith('/'):
        target_root = target_root[:-1]

    if not os.path.exists(source_path):
        print(f'Source Path Does Not Exist: {source_path}')
        exit(1)

    if os.path.exists(target_root):
        print(f'Target Path Exists: {target_root}')
        exit(1)

    parse_include_extensions()
    initial_linking(source_path, target_root)
    unarchive_recursively_passes(target_root, passwords)
    remove_attempted_unarchiving_file_suffixes(target_root)
    identify_longest_paths(target_root)

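# Hypothetical sketch, not this script's unarchive_recursively_passes(): the idea behind "recursively
# unarchive" expressed as repeated passes. Each pass walks the tree and unarchives every archive it has not
# handled yet into "<archive>-unarchived"; the loop stops once a pass finds nothing new, which is how archives
# nested at any depth get exposed. The function name and the is_archive/extract parameters are assumptions.


def unarchive_in_passes(root: str, is_archive, extract) -> int:
    """
    :param root: Folder tree to scan
    :param is_archive: Callable taking a file name and returning True if that file should be unarchived
    :param extract: Callable taking (archive_path, output_folder) that unarchives one file
    :return: The number of archive files handled
    """
    import os  # already imported by this script; repeated so the sketch is self-contained
    handled = set()
    while True:
        # Collect the whole pass before extracting, so the walk never sees folders created mid-pass.
        pending = [os.path.join(folder, file)
                   for folder, _, files in os.walk(root)
                   for file in files
                   if is_archive(file) and os.path.join(folder, file) not in handled]
        if not pending:
            return len(handled)
        for archive_path in sorted(pending):
            handled.add(archive_path)
            extract(archive_path, archive_path + '-unarchived')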

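# Hypothetical sketch, not this script's actual password handling: one way to implement "each specified
# password will be tried until one succeeds" with the 7z command line. It covers only 7z (the Dependencies
# section of the docstring below says .rar files need the unar utility), and the helper names are
# illustrative. The extraction step is a parameter so the loop can be exercised without 7z installed.
# A failed attempt can leave partial files in output_folder; a fuller version would clear it between attempts.


def seven_zip_command(archive_path: str, output_folder: str, password: str) -> list[str]:
    """
    Builds a 7z command line: "x" extracts with full paths, "-p" supplies the password, "-o" names the
    output folder (no space after the switch), and "-y" answers yes to any overwrite prompt.
    """
    return ['7z', 'x', f'-p{password}', f'-o{output_folder}', '-y', archive_path]


def run_command(command: list[str]) -> bool:
    """
    Runs command and reports whether it exited with status 0. stdin is closed so that an unexpected
    prompt fails instead of hanging (compare the Troubleshooting note in the docstring below).
    """
    import subprocess  # local import keeps this sketch self-contained
    result = subprocess.run(command, stdin=subprocess.DEVNULL, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def first_working_password(archive_path: str, output_folder: str, passwords: list[str], extract=run_command):
    """
    Tries each password in order and returns the first one for which extraction succeeds, or None.
    """
    for password in passwords:
        if extract(seven_zip_command(archive_path, output_folder, password)):
            return password
    return None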
def _unarchive_recursively_main():
    """
    :usage: unarchive_recursively source_file_or_folder target_folder [password...]
                source_file_or_folder   A file or a folder.
                                        Trivially, if this is a file which is not an archive, the target folder will
                                        contain the source file, and if this is a folder that does not contain any
                                        archive files, the target folder will contain the source folder.
                target_folder           The recursively unarchived source is put under the target folder.
                                        The target folder is populated with hard-links of all files from the source.
                [password...]           One or more passwords can be specified. For each archive, each specified
                                        password will be tried until one succeeds. This allows you to unarchive multiple
                                        nested archive files that use different passwords with only one invocation of
                                        this utility. If no password is specified, the default is "password".

    Dependencies
        If even one archive file in the source is a .rar file, then the unar utility must be in your search path
        (for macOS use: brew install unar).

        If even one archive file in the source is not a .rar file, then the 7z utility must be in your search path
        (for macOS use: brew install p7zip).

    Preferences
        You might not need to unarchive some of the files that this utility considers to be archive files. You can
        prevent unarchiving files with specific extensions and/or files with specific names by listing them in the
        Python variables "unnecessary_extensions" and "unnecessary_files" at the top of this script.

        By default, this script determines which files are archives based on the extensions output by the "7z i"
        command, which lists over 100 extensions. If you want to unarchive only files with a small number of specific
        extensions and/or files with specific names, then set the Python variable "do_parse_include_extensions" at the
        top of this script to False, and list only the extensions you want to unarchive in the Python variable
        "force_extensions" and/or the file names you want to unarchive in the Python variable "force_files", both at
        the top of this script. Even if you keep the default, you can still list file extensions and file names in the
        "force_extensions" and "force_files" variables to unarchive files that would not otherwise be unarchived.

    Troubleshooting
        If this utility hangs, it might be waiting for the password of an archive file which you did not specify because
        you did not know the archive file requires a password.

        Some files that this utility considers to be archive files might not really be archive files, or might be
        corrupted. You can prevent unarchiving files with specific extensions and/or files with specific names by
        listing them in the Python variables "fail_extensions" and "fail_files" at the top of this script.

    Paths that might be too long for Microsoft Windows are identified
        By default, the Microsoft Windows operating system does not allow the path of a file to be longer than 260
        characters. This limit includes the drive designation (e.g., "C:\\") and the null byte at the end of the path.
        For example, if all the files in the production are under the folder "C:\\Review\\", then the maximum path
        following "C:\\Review\\" is 249 characters (260 - 10 - 1, where 10 is the length of "C:\\Review\\" and 1 is the
        null byte).

        An ordinary folder tree can already approach the 260-character limit, but archives embedded in other archives
        can quickly reach or exceed it as they are recursively unarchived, because each level adds another folder to
        the path. Therefore, this script prints the longest paths relative to the target folder.

    Existing folders are not mistaken for unarchived archives
        The same folder in a production might contain both an archive file and a folder holding the unarchived
        contents of that archive file. For example, I have seen folders in productions which contain something like
        the following:
            foo.zip
            foo\\

        In those situations, the folder named "foo\\" contained the unarchived contents of the archive file named
        "foo.zip". However, there is no guarantee that a folder named "foo\\" contains the unarchived contents of an
        archive file named "foo.zip".

        For this reason, when unarchiving a file named "foo.zip", this script unarchives its contents into a folder
        named "foo.zip-unarchived\\", with the expectation that no folder named "foo.zip-unarchived\\" already exists.
        Therefore, in the above example, after this script is run, there will be the following:
            foo.zip
            foo\\
            foo.zip-unarchived\\

        If the folder named "foo\\" does indeed hold the unarchived contents of the archive file named "foo.zip", then
        the folder named "foo.zip-unarchived\\" will duplicate the contents of the folder named "foo\\".
    """
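    if len(sys.argv) < 3:
        # Print the usage documentation above instead of failing with an IndexError.
        print(_unarchive_recursively_main.__doc__)
        sys.exit(1)
    passwords = sys.argv[3:]  # empty when no passwords are given
    unarchive_recursively(sys.argv[1], sys.argv[2], passwords)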

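# Hypothetical helpers, not called anywhere in this script: a minimal sketch of two rules from the
# _unarchive_recursively_main() docstring above, namely the "Preferences"/"Troubleshooting" variables that
# decide which files are treated as archives, and the "<archive>-unarchived" target-folder naming. The names,
# the parameters, and the rule that an exclusion beats a "force" entry are assumptions, not the script's logic.


def unarchived_folder_name(archive_path: str) -> str:
    """
    Returns the folder that receives the contents of archive_path (e.g., "foo.zip" -> "foo.zip-unarchived"),
    so that a sibling folder such as "foo" is never mistaken for, or mixed with, the new contents.
    """
    return archive_path + '-unarchived'


def should_unarchive(file_name: str, include_extensions: set[str], force_extensions: set[str],
                     force_files: set[str], skip_extensions: set[str], skip_files: set[str]) -> bool:
    """
    :param file_name: Base name of the candidate file, e.g. "Doc.docx"
    :param include_extensions: Lowercase extensions parsed from "7z i" (empty when that parsing is turned off)
    :param force_extensions: Lowercase extensions to unarchive even if "7z i" does not list them
    :param force_files: File names to unarchive regardless of their extension
    :param skip_extensions: Lowercase extensions never to unarchive (e.g., the "unnecessary" and "fail" lists)
    :param skip_files: File names never to unarchive
    :return: True if the file should be unarchived
    """
    # Compare extensions case-insensitively; a name without a dot has no extension.
    extension = file_name.rpartition('.')[2].lower() if '.' in file_name else ''
    # Assumption: an explicit exclusion always wins over an explicit "force" entry.
    if file_name in skip_files or extension in skip_extensions:
        return False
    if file_name in force_files or extension in force_extensions:
        return True
    return extension in include_extensions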

if __name__ == '__main__':
    _unarchive_recursively_main()


"""
The following is an example of the output of "7z i" that is parsed by the parse_include_extensions() function...

% 7z i

7-Zip [64] 17.03 : Copyright (c) 1999-2020 Igor Pavlov : 2017-08-28
p7zip Version 17.03 (locale=utf8,Utf16=on,HugeFiles=on,64 bits,10 CPUs x64)


Libs:
 0  /usr/local/Cellar/p7zip/17.03/lib/p7zip/7z.dll

Formats:
 0 C   F         7z       7z            7 z BC AF ' 1C
 0               APM      apm           E R
 0               Ar       ar a deb lib  ! < a r c h > 0A
 0               Arj      arj           ` EA
 0 CK            bzip2    bz2 bzip2 tbz2 (.tar) tbz (.tar) B Z h
 0     F         Cab      cab           M S C F 00 00 00 00
 0               Chm      chm chi chq chw I T S F 03 00 00 00 ` 00 00 00
 0     F         Hxs      hxs hxi hxr hxq hxw lit I T O L I T L S 01 00 00 00 ( 00 00 00
 0               Compound msi msp doc xls ppt D0 CF 11 E0 A1 B1 1A E1
 0      M        Cpio     cpio          0 7 0 7 0  ||  C7 q  ||  q C7
 0               CramFS   cramfs        offset=16 C o m p r e s s e d 20 R O M F S
 0       G  B    Dmg      dmg           k o l y 00 00 00 04 00 00 02 00
 0           E   ELF      elf            E L F
 0               Ext      ext ext2 ext3 ext4 img offset=1080 S EF
 0               FAT      fat img       offset=510 U AA
 0               FLV      flv           F L V 01
 0 CK            gzip     gz gzip tgz (.tar) tpz (.tar) apk (.tar) 1F 8B 08
 0               GPT      gpt mbr       offset=512 E F I 20 P A R T 00 00 01 00
 0      M        HFS      hfs hfsx      offset=1024 H + 00 04  ||  H X 00 05
 0        O      IHex     ihex          
 0               Iso      iso img       offset=32769 C D 0 0 1
 0               Lzh      lzh lha       offset=2 - l h
 0  K     O      lzma     lzma          
 0  K            lzma86   lzma86        
 0      M    E   MachO    macho         CE FA ED FE  ||  CF FA ED FE  ||  FE ED FA CE  ||  FE ED FA CF
 0         P     MBR      mbr           
 0               MsLZ     mslz          S Z D D 88 F0 ' 3 A
 0      M        Mub      mub           CA FE BA BE 00 00 00  ||  B9 FA F1 0E
 0     F G       Nsis     nsis          offset=4 EF BE AD DE N u l l s o f t I n s t
 0               NTFS     ntfs img      offset=3 N T F S 20 20 20 20 00
 0           E   PE       exe dll sys   M Z
 0           E   TE       te            V Z
 0               Ppmd     pmd           8F AF AC 84
 0               QCOW     qcow qcow2 qcow2c Q F I FB 00 00 00
 0     F         Rar      rar r00       R a r ! 1A 07 00
 0     F         Rar5     rar r00       R a r ! 1A 07 01 00
 0               Rpm      rpm           ED AB EE DB
 0               Split    001           
 0      M        SquashFS squashfs      h s q s  ||  s q s h  ||  s h s q  ||  q s h s
 0 C    M        SWFc     swf (~.swf)   C W S  ||  Z W S
 0  K            SWF      swf           F W S
 0 C      O   LH tar      tar ova       offset=257 u s t a r
 0        O      Udf      udf iso img   offset=32768 01 C D 0 0 1
 0     FM        UEFIc    scap          BD 86 f ; v 0D 0 @ B7 0E B5 Q 9E / C5 A0  ||  8B A6 < J # w FB H 80 = W 8C C1 FE C4 M  ||  B9 82 91 S B5 AB 91 C B6 9A E3 A9 C F7 / CC
 0     FM        UEFIf    uefif         offset=16 D9 T 93 z h 04 J D 81 CE 0B F6 17 D8 90 DF  ||  x E5 8C 8C = 8A 1C O 99 5 89 a 85 C3 - D3
 0               VDI      vdi           offset=64  10 DA BE
 0       G       VHD      vhd           c o n e c t i x 00 00
 0               VMDK     vmdk          K D M V
 0 C SN       LH wim      wim swm esd ppkg M S W I M 00 00 00
 0               Xar      xar pkg xip   x a r ! 00 1C
 0 CK            xz       xz txz (.tar) FD 7 z X Z 00
 0               Z        z taz (.tar)  1F 9D
 0 C   FMG       zip      zip z01 zipx jar xpi odt ods docx xlsx epub ipa apk appx P K 03 04  ||  P K 05 06  ||  P K 06 06  ||  P K 07 08 P K  ||  P K 0 0 P K
 0 CK            zstd     zst tzstd (.tar) 0 x F D 2 F B 5 2 2 . . 2 8 00
 0 CK            lz4      lz4 tlz4 (.tar) 0 x 1 8 4 D 2 2 0 4 00
 0 CK            lz5      lz5 tlz5 (.tar) 0 x 1 8 4 D 2 2 0 5 00
 0 CK            lizard   liz tliz (.tar) 0 x 1 8 4 D 2 2 0 6 00

Codecs:
 0  ED    40202 BZip2
 0 4ED  303011B BCJ2
 0  ED  3030103 BCJ
 0  ED  3030205 PPC
 0  ED  3030401 IA64
 0  ED  3030501 ARM
 0  ED  3030701 ARMT
 0  ED  3030805 SPARC
 0  ED    20302 Swap2
 0  ED    20304 Swap4
 0  ED        0 Copy
 0  ED    40109 Deflate64
 0  ED    40108 Deflate
 0  ED        3 Delta
 0  ED       21 LZMA2
 0  ED       21 FLZMA2
 0  ED    30101 LZMA
 0  ED    30401 PPMD
 0  ED  6F10701 7zAES
 0  ED  6F00181 AES256CBC
 0  ED  4F71101 ZSTD
 0  ED  4F71104 LZ4
 0  ED  4F71102 BROTLI
 0  ED  4F71105 LZ5
 0  ED  4F71106 LIZARD

Hashers:
 0   32      202 BLAKE2sp
 0    4        1 CRC32
 0   20      201 SHA1
 0   32        A SHA256
 0    8        4 CRC64
 0   16      205 MD2
 0   16      206 MD4
 0   16      207 MD5
 0   48      208 SHA384
 0   64      209 SHA512
 0    4      203 XXH32
 0    8      204 XXH64
"""