#! /usr/bin/env python3

###############################################################################
#                                                                             #
# Copyright (C) 2017-2019 Edward d'Auvergne                                   #
#                                                                             #
# This file is part of the program relax (http://www.nmr-relax.com).          #
#                                                                             #
# This program is free software: you can redistribute it and/or modify        #
# it under the terms of the GNU General Public License as published by        #
# the Free Software Foundation, either version 3 of the License, or           #
# (at your option) any later version.                                         #
#                                                                             #
# This program is distributed in the hope that it will be useful,             #
# but WITHOUT ANY WARRANTY; without even the implied warranty of              #
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the               #
# GNU General Public License for more details.                                #
#                                                                             #
# You should have received a copy of the GNU General Public License           #
# along with this program.  If not, see <http://www.gnu.org/licenses/>.       #
#                                                                             #
###############################################################################

"""Recursively check all files for FSF copyright notice compliance.

This standard is from https://www.gnu.org/prep/maintain/html_node/Copyright-Notices.html, and
reproduced here for a permanent record:

6.5 Copyright Notices
=====================

You should maintain a proper copyright notice and a license notice in each nontrivial file in the
package. (Any file more than ten lines long is nontrivial for this purpose.) This includes header
files and interface definitions for building or running the program, documentation files, and any
supporting files. If a file has been explicitly placed in the public domain, then instead of a
copyright notice, it should have a notice saying explicitly that it is in the public domain.

Even image files and sound files should contain copyright notices and license notices, if their
format permits. Some formats do not have room for textual annotations; for these files, state the
copyright and copying permissions in a README file in the same directory.

Change log files should have a copyright notice and license notice at the end, since new material is
added at the beginning but the end remains the end.

When a file is automatically generated from some other file in the distribution, it is useful for
the automatic procedure to copy the copyright notice and permission notice of the file it is
generated from, if possible. Alternatively, put a notice at the beginning saying which file it is
generated from.

A copyright notice looks like this:

Copyright (C) year1, year2, year3 copyright-holder

The word 'Copyright' must always be in English, by international convention.

The copyright-holder may be the Free Software Foundation, Inc., or someone else; you should know who
is the copyright holder for your package.

Replace the '(C)' with a C-in-a-circle symbol if it is available. For example, use '@copyright{}' in
a Texinfo file. However, stick with parenthesized 'C' unless you know that C-in-a-circle will work.
For example, a program's standard --version message should use parenthesized 'C' by default, though
message translations may use C-in-a-circle in locales where that symbol is known to work.
Alternatively, the '(C)' or C-in-a-circle can be omitted entirely; the word 'Copyright' suffices.

To update the list of year numbers, add each year in which you have made nontrivial changes to the
package. (Here we assume you're using a publicly accessible revision control server, so that every
revision installed is also immediately and automatically published.) When you add the new year, it
is not required to keep track of which files have seen significant changes in the new year and which
have not. It is recommended and simpler to add the new year to all files in the package, and be done
with it for the rest of the year.

Don't delete old year numbers, though; they are significant since they indicate when older versions
might theoretically go into the public domain, if the movie companies don't continue buying laws to
further extend copyright. If you copy a file into the package from some other program, keep the
copyright years that come with the file.

You can use a range ('2008-2010') instead of listing individual years ('2008, 2009, 2010') if and
only if: 1) every year in the range, inclusive, really is a "copyrightable" year that would be
listed individually; and 2) you make an explicit statement in a README file about this usage.

For files which are regularly copied from another project (such as 'gnulib'), leave the copyright
notice as it is in the original.

The copyright statement may be split across multiple lines, both in source files and in any
generated output. This often happens for files with a long history, having many different years of
publication.

For an FSF-copyrighted package, if you have followed the procedures to obtain legal papers, each
file should have just one copyright holder: the Free Software Foundation, Inc. You should edit the
file's copyright notice to list that name and only that name.

But if contributors are not all assigning their copyrights to a single copyright holder, it can
easily happen that one file has several copyright holders. Each contributor of nontrivial text is a
copyright holder.

In that case, you should always include a copyright notice in the name of main copyright holder of
the file. You can also include copyright notices for other copyright holders as well, and this is a
good idea for those who have contributed a large amount and for those who specifically ask for
notices in their names. (Sometimes the license on code that you copy in may require preserving
certain copyright notices.) But you don't have to include a notice for everyone who contributed to
the file (which would be rather inconvenient).

Sometimes a program has an overall copyright notice that refers to the whole program. It might be in
the README file, or it might be displayed when the program starts up. This copyright notice should
mention the year of completion of the most recent major version; it can mention years of completion
of previous major versions, but that is optional.


SVN archive
===========

Both the svn and git histories for relax are incomplete.  For example, the git repository contains
history within branches that the svn history misses, while the git history is incorrect for a number
of files.  Therefore both an svn repository and a git repository should be used with this script.

First, download the whole SVN archive repository with:

$ rsync -av --delete svn.code.sf.net::p/nmr-relax/code-svn-archive/ relax_sf_svn_repo

Then check out a local copy:

$ svn co file://$PWD/relax_sf_svn_repo/trunk relax_sf_svn_archive

Having a copy of the entire repository on a local hard disk allows this script to complete within a
reasonable time frame.
"""

# Python module imports.
from argparse import Action, ArgumentParser
import bz2
from datetime import date, datetime
import gzip
from importlib.machinery import SourceFileLoader
import locale
import mimetypes
from os import F_OK, access, getcwd, path, sep, walk
import platform
from pytz import utc
from re import search
from subprocess import PIPE, Popen
import sys
from types import ModuleType
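

# A minimal sketch (not used by the Validate class) of turning configuration source text, such as the
# BLANK_CONFIG template below, into a module object so that its settings can be read as attributes
# (config.SIG_CODE, config.REPOS, and so on).  This is an assumption about the approach only: the
# parse_config() method (not shown here) may instead load the configuration file from disk, for example
# with SourceFileLoader.  The helper name _example_load_config() is hypothetical.
import types

def _example_load_config(text, name="fsf_config"):
    """Execute configuration source text and return the result as a module.

    @param text:    The Python source of the configuration file.
    @type text:     str
    @param name:    The name to give the module.
    @type name:     str
    @return:        The module holding the configuration variables as attributes.
    @rtype:         types.ModuleType
    """

    # Create an empty module and execute the configuration code within its namespace.
    module = types.ModuleType(name)
    exec(compile(text, "<%s>" % name, "exec"), module.__dict__)
    return module

# For example, _example_load_config(BLANK_CONFIG).SIG_CODE evaluates to 10.
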

# The operating system.
SYSTEM = platform.uname()[0]
if SYSTEM == 'Microsoft':
    SYSTEM = 'Windows'
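

# A minimal sketch (not used by the Validate class) of the untracked file check performed in
# Validate.__init__() for git repositories.  Passing an argument list to subprocess.run() avoids shell=True,
# so no quoting of the file path is needed and the exit status is read directly from returncode rather than
# via the OS specific 'echo $?' or 'echo %ERRORLEVEL%' suffix.  It assumes git is on the PATH.  The helper
# name _example_git_tracked() is hypothetical.
import subprocess

def _example_git_tracked(file_path, repo_dir="."):
    """Return True if the file is tracked by git, using 'git ls-files --error-unmatch'.

    @param file_path:   The path of the file, relative to repo_dir.
    @type file_path:    str
    @param repo_dir:    The directory in which to run git.
    @type repo_dir:     str
    @return:            True if the file is in the git index, False otherwise.
    @rtype:             bool
    """

    # With '--error-unmatch', git exits with a non-zero status if the path is not in the index (or if
    # repo_dir is not inside a git repository).  The '--' separator stops file names starting with '-'
    # from being read as options.
    result = subprocess.run(["git", "ls-files", "--error-unmatch", "--", file_path], cwd=repo_dir, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0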


BLANK_CONFIG = """\
# Module docstring.
\"\"\"Configuration file for the FSF Copyright Notice Validation script.

This configuration file uses the concept of a commit ID, which is generated as the first line of the commit message followed by the ISO date in brackets.
\"\"\"

# The significant number of new lines of code added.
SIG_CODE = 10

# The repository checkout copies, to allow for repository migrations, ordered by date from oldest to newest.
# The data consists of:
#       0 - The repository path or committer information file path.
#       1 - The repository type (either "svn", "git", or "committer_info").
#       2 - The start date (year).
#       3 - The end date (year).
#       4 - The optional HEAD directory for svn.
#       5 - Flag which if True indicates a truncated start (so don't add the first commit).
# Type:  list of [str, str, int, int, str or None, bool]
REPOS = [
]

# README file creation variables, for appending copyright notices to README files.
README_APPEND_NOTICE = False
README_COMMITTER = ""

# The committer name translation table.
# This is for mapping the name in the copyright notice to the committer's real name (e.g. handling non-ASCII characters, or different naming conventions).
# Desc:  The key is the internally consistent name and the value is the name of the committer in the repository.
# Type:  dict of str
COMMITTERS = {
}

# The svn committer name translation table.
# Desc:  The key is the svn repository committer name and the value is the internally consistent name of the committer.
# Type:  dict of str
SVN_COMMITTERS = {
}

# Alternative names for the committers.
# Desc:  The key is the alternative committer name in the copyright notice and the value is the name of the committer in the repository.
# Type:  dict of str
COMMITTER_ALT = {
}

# Blacklisted files to avoid checking.
# Type:  list of str
BLACKLISTED_FILES = [
]

# Directories to skip.
# Type:  list of str
DIR_SKIP = [
    '.git',
    '.svn',
]

# Add some new mimetypes.
# Desc:  The list elements consist of the mimetype name and the file extension.
# Type:  list of [str, str]
NEW_MIMETYPES = [
    ['application/numpy', '.npy'],
]

# Specify binary mimetypes.
# Type:  list of str.
BINARY_MIMETYPES = [
]

# Binary files (for those without a mimetype or extension).
# Desc:  The values are the file names.
# Type:  list of str
BINARY_FILES = [
]

# Stop incorrect svn history by specifying the first commit key of a file (i.e. svn copy but then a complete file replacement).
# Desc:  The key is the file name and the value is the commit ID.
# Type:  dict of str
SVN_START = {
}

# Stop incorrect git history by specifying the first commit key of a misidentified file.
# Desc:  The key is the file name and the value is the commit ID.
# Type:  dict of str
GIT_START = {
}

# Additional copyright notices that are not present in the git log.
# Desc:  The key is the file and the value is a list of copyright statements.
# Type:  dict of list of str
ADDITIONAL_COPYRIGHT = {
}

# Additional copyright years and authors to add to the list.
# Desc:  The key is the file and the value is a list of lists of the year as an int (or list of ints for multiple years) and the author name as a string.
# Type:  dict of list of [int or list of int, str]
ADDITIONAL_COPYRIGHT_YEARS = {
}

# False positives (copyright notices in files to ignore, as they are not in the git log).
# Desc:  The key is the file and the value is the list of copyright statements to ignore.
# Type:  dict of list of str
FALSE_POS = {
}

# False negatives (significant git log commits which do not imply copyright ownership).
# Desc:  The key is the file and the value is the list of expected copyright statements to ignore.
# Type:  dict of list of str
FALSE_NEG = {
}

# False negative years (significant git log commits in specific years which do not imply copyright ownership).
# Desc:  The key is the file and the value is a list of lists of the year as an int and the author name as a string.
# Type:  dict of list of [int, str]
FALSE_NEG_YEARS = {
}

# Commits to exclude as a list of commit IDs.
# Desc:  The list items are the commit IDs.
# Type:  list of str
EXCLUDE = [
]

# Commits to switch authorship of (e.g. if someone commits someone else's code).
# The data consists of:
#       0 - The committer's name.
#       1 - The real author.
#       2 - The commit key, consisting of the first line of the commit message followed by the ISO date in brackets.
# Type:  list of [str, str, str]
AUTHOR_SWITCH = [
]
"""


class Validate:
    """Execute the FSF copyright notice validation."""

    def __init__(self):
        """Run the validation."""

        # Initialise a successful status.
        self.status = 0

        # Process the command line options.
        self.process_args()

        # The blank configuration file argument.
        if self.args.blank:
            sys.stdout.write(BLANK_CONFIG)
            return

        # Handle files as arguments.
        directory = self.args.directory
        file_arg = None
        if path.isfile(directory):
            directory, file_arg = path.split(directory)
        if directory in [None, '']:
            directory = '.'

        # Parse the configuration file, and perform the subsequent set up steps.
        self.parse_config(config_file=self.args.config)
        if not len(self.config.REPOS):
            self.setup_repo()
        self.setup_mimetypes()

        # Check for a VC repository.
        repo_flag = True
        if not len(self.config.REPOS):
            repo_flag = False

        # Initial printout.
        if self.args.committer_info:
            sys.stdout.write("committer_info = {\n")
        else:
            if file_arg:
                sys.stdout.write("\nFSF copyright notice compliance checking for the file '%s%s%s'.\n" % (directory, sep, file_arg))
            else:
                sys.stdout.write("\nFSF copyright notice compliance checking for the directory '%s'.\n" % directory)
            if not repo_flag:
                sys.stdout.write("No version control repository found, only checking for missing copyrights!\n")
            sys.stdout.write("\n")
        sys.stdout.write("\nRepository information:\n\n")
        rule = "-" * 112 + "\n"
        sys.stdout.write(rule)
        sys.stdout.write("%-50s %-15s %-10s %-10s %-10s %-12s\n" % ("Path", "Type", "Start", "End", "SVN HEAD", "Trunc start"))
        sys.stdout.write(rule)
        for i in range(len(self.config.REPOS)):
            sys.stdout.write("%-50s %-15s %-10s %-10s %-10s %-10s\n" % tuple(self.config.REPOS[i]))
        sys.stdout.write(rule)
        sys.stdout.write("\n\n")
        sys.stdout.flush()

        # Pre-saved committer information.
        presaved_committer_info = []
        for i in range(len(self.config.REPOS)):
            # Unpack.
            repo_path, repo_type, repo_start, repo_end, repo_head, trunc_flag = self.config.REPOS[i]

            # Not pre-saved.
            if repo_type != "committer_info":
                presaved_committer_info.append(None)
                continue

            # Read and store the data.
            file = self.open_read_file(repo_path)
            namespace = {}
            exec(file.read(), {}, namespace)
            presaved_committer_info.append(namespace["committer_info"])
            file.close()

        # Counters.
        files_total = 0
        files_blacklisted = 0
        files_untracked = 0
        files_valid = 0
        files_missing = 0
        files_nonvalid = 0

        # Walk through the target directory tree, alphabetically.
        for root, dirs, files in walk(directory):
            dirs.sort()

            # Single file argument.
            if file_arg and directory != root:
                continue

            # Directory skip.
            skip = False
            for name in self.config.DIR_SKIP:
                if name in root:
                    skip = True
                    break
            if skip:
                continue

            # Validate any copyright statements in the README file, if present.
            self.validate_readme(root)

            # Loop over the files.
            files.sort()
            for file_name in files:
                # Command line argument supplied file.
                if file_arg and file_name != file_arg:
                    continue

                # Count the file.
                files_total += 1
                if not self.args.verbose and not self.args.committer_info:
                    self.progress_meter(files_total)

                # Full path to the file.
                if root[-1] == sep:
                    file_path = root + file_name
                else:
                    file_path = root + sep + file_name

                # Convert MS Windows path separators.
                file_path = file_path.replace("\\", "/")

                # Strip any './' characters from the start.
                if len(file_path) >= 2 and file_path[:2] == './':
                    file_path = file_path[2:]

                # Blacklisted file.
                if file_path in self.config.BLACKLISTED_FILES:
                    if self.args.debug:
                        sys.stdout.write("Blacklisted file: %s" % file_path)
                    files_blacklisted += 1
                    continue

                # Determine the file type.
                type, encoding = mimetypes.guess_type(file_path)
                if self.args.verbose:
                    sys.stdout.write("Checking: %s (mimetype = '%s')\n" % (file_path, type))
                    sys.stdout.flush()

                # Check for untracked files.
                if repo_flag:
                    if self.config.REPOS[-1][1] == 'git':
                        cmd = 'git ls-files --error-unmatch "%s"' % file_path
                        if SYSTEM == "Windows":
                            cmd += "&& echo %%ERRORLEVEL%%"
                        else:
                            cmd += "; echo $?"
                        pipe = Popen(cmd, shell=True, stderr=PIPE, stdout=PIPE, close_fds=False)
                    else:
                        pipe = Popen("svn info \"%s/%s/%s\"" % (self.config.REPOS[-1][0], self.config.REPOS[-1][4], file_path), shell=True, stderr=PIPE, stdout=PIPE, close_fds=False)
                    err = pipe.stderr.readlines()
                    if err:
                        if self.args.verbose:
                            sys.stdout.write("    Untracked file.\n")
                            sys.stdout.flush()
                        files_untracked += 1
                        continue

                # Public domain files.
                if self.extract_public_domain_readme(file_name, root):
                    files_valid += 1
                    continue

                # Get the committer and year information from the repository logs or stored committer info file.
                committer_info = {}
                for i in range(len(self.config.REPOS)):
                    # Unpack.
                    repo_path, repo_type, repo_start, repo_end, repo_head, trunc_flag = self.config.REPOS[i]

                    # Find renames in later repositories.
                    file_path_rename = file_path
                    if i < (len(self.config.REPOS) - 1):
                        if self.config.REPOS[i+1][1] == 'git':
                            start_commit = self.config.GIT_START
                        else:
                            start_commit = self.config.SVN_START
                        file_path_rename = self.find_renames(file_path, repo_path=self.config.REPOS[i+1][0], repo_type=self.config.REPOS[i+1][1], start_commit=start_commit, after=repo_end, before=self.config.REPOS[i+1][3])

                    # Git history.
                    if repo_type == 'git':
                        self.git_log_data(file_path_rename, repo_path=repo_path, exclude=self.config.EXCLUDE, start_commit=self.config.GIT_START, author_switch=self.config.AUTHOR_SWITCH, committer_info=committer_info, after=repo_start, before=repo_end, trunc=trunc_flag)

                    # SVN history.
                    elif repo_type == 'svn':
                        self.svn_log_data(file_path_rename, repo_path=repo_path, exclude=self.config.EXCLUDE, start_commit=self.config.SVN_START, author_switch=self.config.AUTHOR_SWITCH, svn_head=repo_head, committer_info=committer_info, after=repo_start, before=repo_end, trunc=trunc_flag)

                    # Pre-saved committer information file.
                    elif repo_type == 'committer_info':
                        if file_path_rename in presaved_committer_info[i]:
                            for committer in presaved_committer_info[i][file_path_rename]:
                                # A new committer.
                                if committer not in committer_info:
                                    committer_info[committer] = []

                                # Loop over the years.
                                for year in presaved_committer_info[i][file_path_rename][committer]:
                                    if year not in committer_info[committer]:
                                        committer_info[committer].append(year)

                    # Config error.
                    else:
                        raise NameError("Unknown repository type '%s'." % repo_type)

                self.committer_info_cleanup(file_path_rename, committer_info)

                # Output the committer info.
                if self.args.committer_info:
                    # Sort the year for easier parsing and compression.
                    for author in committer_info:
                        committer_info[author].sort()

                    # Output the info and skip the rest.
                    sys.stdout.write("    \"%s\": %s,\n" % (file_path_rename, committer_info))
                    sys.stdout.flush()
                    continue

                # Add any additional committer years.
                if file_path in self.config.ADDITIONAL_COPYRIGHT_YEARS:
                    for years, committer in self.config.ADDITIONAL_COPYRIGHT_YEARS[file_path]:
                        if committer not in committer_info:
                            committer_info[committer] = []
                        if isinstance(years, int):
                            years = [years]
                        for year in years:
                            if year not in committer_info[committer]:
                                if self.args.debug:
                                    sys.stdout.write("  Additional year: %s %s\n" % (committer, year))
                                committer_info[committer].append(year)

                # Remove false negative years.
                if file_path in self.config.FALSE_NEG_YEARS:
                    for year, committer in self.config.FALSE_NEG_YEARS[file_path]:
                        if committer in committer_info and year in committer_info[committer]:
                            if self.args.debug:
                                sys.stdout.write("  False negative year: %s %s\n" % (committer, year))
                            committer_info[committer].pop(committer_info[committer].index(year))
                            if not len(committer_info[committer]):
                                del committer_info[committer]

                # Format the data as copyright statements.
                expected_copyright = self.format_copyright(committer_info)

                # Search for missing copyright notices in local README files.
                recorded_copyright = self.extract_copyright_readme(file_name, root)

                # Otherwise parse text files for the current copyright statements.
                if not len(recorded_copyright) and type not in self.config.BINARY_MIMETYPES and file_path not in self.config.BINARY_FILES:
                    recorded_copyright = self.extract_copyright(file_path)

                # Add any additional copyright notices.
                if file_path in self.config.ADDITIONAL_COPYRIGHT:
                    for notice in self.config.ADDITIONAL_COPYRIGHT[file_path]:
                        if self.args.debug:
                            sys.stdout.write("  Additional copyright: '%s'\n" % notice)
                        expected_copyright.append(notice)

                # Remove false positives and negatives.
                if file_path in self.config.FALSE_POS:
                    for i in range(len(self.config.FALSE_POS[file_path])):
                        for j in reversed(range(len(recorded_copyright))):
                            if self.config.FALSE_POS[file_path][i] in recorded_copyright[j]:
                                if self.args.debug:
                                    sys.stdout.write("  False positive: '%s'\n" % recorded_copyright[j])
                                recorded_copyright.pop(j)
                if file_path in self.config.FALSE_NEG:
                    for i in range(len(self.config.FALSE_NEG[file_path])):
                        for j in reversed(range(len(expected_copyright))):
                            if self.config.FALSE_NEG[file_path][i] in expected_copyright[j]:
                                if self.args.debug:
                                    sys.stdout.write("  False negative: '%s'\n" % recorded_copyright[j])
                                expected_copyright.pop(j)

                # Remove duplicates and sort the lists.
                expected_copyright = list(set(expected_copyright))
                recorded_copyright = list(set(recorded_copyright))
                expected_copyright.sort()
                recorded_copyright.sort()
                if self.args.debug:
                    for i in range(len(expected_copyright)):
                        sys.stdout.write("  Expected copyright: '%s'\n" % expected_copyright[i])
                    for i in range(len(recorded_copyright)):
                        sys.stdout.write("  Recorded copyright: '%s'\n" % recorded_copyright[i])

                # Clear the progress meter.
                if not self.args.verbose:
                    sys.stderr.write("\b")
                    sys.stderr.flush()

                # Missing copyright notices.
                if not len(recorded_copyright) and len(expected_copyright):
                    # Failure printout.
                    sys.stdout.write("Missing copyright notice: '%s'\n" % file_path)
                    if repo_flag:
                        sys.stdout.write("Expected copyrights:\n")
                        for i in range(len(expected_copyright)):
                            sys.stdout.write("    %s\n" % expected_copyright[i])
                        sys.stdout.write("\n")
                        sys.stdout.flush()

                    # Skip the rest of the validation process.
                    files_missing += 1
                    continue

                # If the missing command line argument has been supplied, mark all other files as valid.
                elif self.args.missing:
                    files_valid += 1
                    continue

                # No repository.
                if not repo_flag:
                    files_valid += 1
                    continue

                # Validate.
                if self.validate_copyright(expected_copyright, recorded_copyright):
                    files_valid += 1
                    continue

                # README file copyright notice addition.
                if self.config.README_APPEND_NOTICE:
                    # Prepare the README file, if necessary.
                    readme = root + sep + 'README'
                    self.readme_setup(file=readme)

                    # Add the copyright.
                    self.readme_add_notice(file_name=file_name, file=readme, notices=expected_copyright)

                    # Skip the failure printout.
                    files_valid += 1
                    continue

                # A non-valid file.
                files_nonvalid += 1

                # Failure printout.
                sys.stdout.write("File: '%s'\n" % file_path)
                sys.stdout.write("Expected non-matching copyrights:\n")
                for i in range(len(expected_copyright)):
                    if expected_copyright[i] not in recorded_copyright:
                        sys.stdout.write("    %s\n" % expected_copyright[i])
                sys.stdout.write("Recorded non-matching copyrights:\n")
                for i in range(len(recorded_copyright)):
                    if recorded_copyright[i] not in expected_copyright:
                        sys.stdout.write("    %s\n" % recorded_copyright[i])
                sys.stdout.write("\n")
                sys.stdout.flush()

        # Statistics.
        valid_sum = files_valid + files_missing + files_nonvalid
        percent_missing = 0
        if valid_sum:
            percent_missing = files_missing / valid_sum * 100
        percent_nonvalid = 0
        if valid_sum:
            percent_nonvalid = files_nonvalid / valid_sum * 100

        # Final printout.
        if self.args.committer_info:
            sys.stdout.write("}\n")
        else:
            sys.stdout.write("\n\nStatistics:\n\n")
            sys.stdout.write("    %-35s %8i\n" % ("All files:", files_total))
            sys.stdout.write("    %-35s %8i\n" % ("Blacklisted files:", files_blacklisted))
            if repo_flag:
                sys.stdout.write("    %-35s %8i\n" % ("Untracked files:", files_untracked))
            sys.stdout.write("\n")
            sys.stdout.write("    %-35s %8i\n" % ("Validated file count:", valid_sum))
            sys.stdout.write("    %-35s %8i %8.2f%%\n" % ("Missing copyright notices:", files_missing, percent_missing))
            if repo_flag and not self.args.missing:
                sys.stdout.write("    %-35s %8i %8.2f%%\n" % ("Non-matching copyright notices:", files_nonvalid, percent_nonvalid))

        # Store the status for returning to the shell.
        self.status = files_missing + files_nonvalid


    def committer_info_cleanup(self, file_path, committer_info):
        """Clean up the committer info data structure.

        @param file_path:       The full file path.
        @type file_path:        str
        @param committer_info:  The committer info data structure, listing the committers and years of significant commits.  This is a dictionary with the committer's name as a key with the value as the list of years.
        @type committer_info:   dict of list of int
        """

        # Remove committers with no commits.
        prune = []
        for committer in committer_info:
            if len(committer_info[committer]) == 0:
                prune.append(committer)
        for committer in prune:
            del committer_info[committer]


    def determine_compression(self, file_path):
        """Function for determining the compression type, and for also testing if the file exists.

        @param file_path:   The full file path of the file.
        @type file_path:    str
        @return:            A tuple of the compression type and full path of the file (including its extension).  A value of 0 corresponds to no compression.  Bzip2 compression corresponds to a value of 1.  Gzip compression corresponds to a value of 2.
        @rtype:             (int, str)
        """

        # The file exists as supplied, so determine the compression type from its extension.
        if access(file_path, F_OK):
            compress_type = 0
            if search(r'\.bz2$', file_path):
                compress_type = 1
            elif search(r'\.gz$', file_path):
                compress_type = 2

        # The file has been supplied without its '.bz2' extension.
        elif access(file_path + '.bz2', F_OK):
            file_path = file_path + '.bz2'
            compress_type = 1

        # The file has been supplied without its '.gz' extension.
        elif access(file_path + '.gz', F_OK):
            file_path = file_path + '.gz'
            compress_type = 2

        # No file found, so treat it as uncompressed and let the subsequent open() report the missing file.
        else:
            compress_type = 0

        # Return the compression type.
        return compress_type, file_path
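
        # Illustrative example (hypothetical file names): if only 'notes.txt.bz2' exists on disk, then
        # determine_compression('notes.txt') returns (1, 'notes.txt.bz2'), and if 'notes.txt.gz' itself
        # exists, determine_compression('notes.txt.gz') returns (2, 'notes.txt.gz').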


    def extract_copyright(self, file_path):
        """Pull out all the copyright notices from the given file.

        @param file_path:   The full file path.
        @type file_path:    str
        @return:            The list of current copyright notices.
        @rtype:             list of str
        """

        # Read the file data, returning nothing if not a text file.
        try:
            with self.open_read_file(file_path) as file:
                lines = file.readlines()
        except UnicodeDecodeError:
            return []

        # Loop over the file, finding the statements.
        statements = []
        for line in lines:
            lower_line = line.lower()
            if "copyright (c)" in lower_line:
                # Skip README file copyright notices for other files.
                if 'README' in file_path and search(r": *copyright \(c\)", lower_line):
                    continue

                # Strip leading and trailing comment characters (including a batch file 'rem' or 'REM'), and all whitespace.
                line = line.strip()
                if line[0] in ['#', '%', '*']:
                    line = line[1:]
                if line[-1] in ['#', '%', '*']:
                    line = line[:-1]
                if line[:3].lower() == 'rem':
                    line = line[4:]
                line = line.strip()

                # Append the statement.
                statements.append(line)

        # Return the list of copyright statements.
        return statements


    def extract_copyright_readme(self, file_name, root):
        """Try to extract copyright notice for the file from the README file.

        @param file_name:   The isolated file name to search for the copyright notice.
        @type file_name:    str
        @param root:        The file path root which should contain the README file.
        @type root:         str
        @return:            The list of current copyright notices.
        @rtype:             list of str
        """

        # Search for the README file.
        readme = root + sep + 'README'
        if not path.exists(readme):
            return []

        # Read the README file data.
        with open(readme) as file:
            lines = file.readlines()

        # Loop over the file, finding the statements (a plain prefix match avoids having to escape regex characters such as '.' and '+' in the file name).
        statements = []
        for line in lines:
            if line.startswith("%s: " % file_name) and "Copyright (C)" in line:
                statements.append(line[line.index("Copyright"):].strip())

        # Return the list of copyright statements.
        return statements


    def extract_public_domain_readme(self, file_name, root):
        """Try to extract public domain information for the file from the README file.

        @param file_name:   The isolated file name to search for the public domain notice.
        @type file_name:    str
        @param root:        The file path root which should contain the README file.
        @type root:         str
        @return:            True if the file is stated as public domain, False otherwise.
        @rtype:             bool
        """

        # Search for the README file.
        readme = root + sep + 'README'
        if not path.exists(readme):
            return False

        # Read the README file data.
        with open(readme) as file:
            lines = file.readlines()

        # Loop over the file, finding the statements.
        for line in lines:
            if line.startswith("%s: " % file_name) and "Public domain" in line:
                return True

        # Not public domain.
        return False


    def find_renames(self, file_path, repo_path=None, repo_type=None, start_commit=None, after=None, before=None):
        """Search subsequent repositories for file renames.

        @param file_path:           The relative path of the file to track.
        @type file_path:            str
        @keyword repo_path:         The path to the local copy of the VC repository.
        @type repo_path:            str
        @keyword repo_type:         The repository type.  Only "git" is currently supported.
        @type repo_type:            str
        @keyword start_commit:      The starting commit for each file, where 'git log' identifies an incorrect history path.  This is a dictionary with the keys being the file paths and the values being the commit keys (the first line of the commit message followed by the ISO date in brackets).
        @type start_commit:         dict of str
        @keyword after:             Only show commits from 1 January of this year onwards.
        @type after:                int or None
        @keyword before:            Only show commits up to 31 December of this year.
        @type before:               int or None
        @return:                    The original file name.
        @rtype:                     str
        """

        # The full file path.
        full_path = path.join(repo_path, file_path)

        # Date restrictions.
        after_opt = ''
        before_opt = ''
        if after:
            after_opt = '--after=%i-01-01' % after
        if before:
            before_opt = '--before=%i-12-31' % before

        # Exec.
        cmd = ""
        if repo_type == "git":
            cmd = "git log %s %s --numstat --follow --pretty=\"%%an xyzyx %%ad xyzyx %%H xyzyx %%s\" --date=iso \"%s\"" % (after_opt, before_opt, full_path)
        else:
            raise NameError("Repository type '%s' not supported yet." % repo_type)

        pipe = Popen(cmd, shell=True, stdout=PIPE, close_fds=False)

        # Get the data.
        lines = pipe.stdout.readlines()
        i = 0
        committer = None
        commit_key = ''
        history_stop = False
        original_name = None
        while 1:
            # Termination.
            if i >= len(lines):
                break

            # Obtain the committer and date info.
            committer, date, commit_hash, subject = lines[i].decode().split(' xyzyx ')
            year = int(date.split('-')[0])
            commit_key = "%s (%s)" % (subject.strip(), date)

            # Termination.
            if start_commit and file_path in start_commit and start_commit[file_path] == commit_key:
                history_stop = True
                if self.args.debug:
                    sys.stdout.write("  Finding rename: Terminating to stop false history.  Commit by '%s': %s\n" % (committer, commit_key))
                break

            # The file name.
            original_name = lines[i+2].decode().split("\t")[2].strip()
            if "=>" in original_name:
                original_name = original_name.split("=>")[0].strip()
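            # Illustrative example (hypothetical file names): with '--numstat --follow', a rename is listed as
            # "12\t3\told_name.py => new_name.py", from which 'old_name.py' is extracted.  A rename within a
            # shared directory is abbreviated by git as "dir/{old.py => new.py}", which this split does not expand.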

            # Increment the index.
            i += 3

        # No rename.
        if original_name is None:
            original_name = file_path
        elif self.args.debug and original_name != file_path:
            sys.stdout.write("  Finding rename: Renamed from '%s' to '%s'.\n" % (original_name, file_path))

        # The last file name.
        return original_name


    def format_copyright(self, committer_info):
        """Convert the committer and year data structure into copyright statements.

        @param committer_info:  The committer info data structure, listing the committers and years of significant commits.  This is a dictionary with the committer's name as a key with the value as the list of years.
        @type committer_info:   dict of lists of str
        @return:                The ordered list of copyright statements.
        @rtype:                 list of str
        """

        # Init.
        statements = []
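
        # Illustrative example (hypothetical committer not listed in COMMITTER_ALT): the committer info
        # {'Jane Doe': [2017, 2018, 2019, 2021]} is converted into ['Copyright (C) 2017-2019,2021 Jane Doe'].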

        # Replace alternative names.
        committers = list(committer_info.keys())
        for committer in committers:
            if committer in self.config.COMMITTER_ALT:
                # Add the standard name of the committer if required.
                if self.config.COMMITTER_ALT[committer] not in committer_info:
                    committer_info[self.config.COMMITTER_ALT[committer]] = []

                # Copyright merger.
                committer_info[self.config.COMMITTER_ALT[committer]] += committer_info[committer]

                # Remove the alternative name.
                committer_info.pop(committer)

        # Loop over each committer.
        for committer in committer_info:
            # Format the year string.
            years = self.format_years(committer_info[committer])

            # Format the copyright statement.
            statements.append("Copyright (C) %s %s" % (years, committer))

        # Return the list of copyright statements.
        return statements


    def format_years(self, years):
        """Format the given list of years for the copyright string.

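        @param years:   The unordered list of years.
        @type years:    list of int or str
        @return:        The years as a comma-separated string, with consecutive years collapsed into ranges.
        @rtype:         str
        """

        # Convert the years to ints, removing duplicates (such as those from merged alternative committer names), and sort them.
        dates = sorted(set(int(year) for year in years))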

        # Split the dates into ranges.
        date_ranges = [[dates[0]]]
        for i in range(1, len(dates)):
            if dates[i]-1 == date_ranges[-1][-1]:
                date_ranges[-1].append(dates[i])
            else:
                date_ranges.append([dates[i]])

        # String format the ranges.
        year_string = ''
        for i in range(len(date_ranges)):
            # Range separator required.
            if len(year_string):
                year_string += ','

            # A single year.
            if len(date_ranges[i]) == 1:
                year_string += '%s' % date_ranges[i][0]

            # A range.
            else:
                year_string += '%s-%s' % (date_ranges[i][0], date_ranges[i][-1])

        # Return the formatted string.
        return year_string


    def git_log_data(self, file_path, repo_path=None, exclude=None, start_commit=None, author_switch=None, committer_info=None, after=None, before=None, trunc=False):
        """Get the committers and years of significant commits from the git log.

        @param file_path:           The file path, relative to the repository root, to obtain the git info for.
        @type file_path:            str
        @keyword repo_path:         The path to the local copy of the git repository.
        @type repo_path:            str
        @keyword exclude:           A list of commit keys to exclude from the search.  The commit key consists of the first line of the commit message followed by the ISO date in brackets.
        @type exclude:              list of str
        @keyword start_commit:      The starting commit for each file, where 'git log' identifies an incorrect history path.  This is a dictionary with the keys being the file paths and the values being the commit keys (the first line of the commit message followed by the ISO date in brackets).
        @type start_commit:         dict of str
        @keyword author_switch:     List of commit keys and authors to switch the authorship of.  The first element should be the committer, the second the real committer, and the third the commit key.  The commit key consists of the first line of the commit message followed by the ISO date in brackets.
        @type author_switch:        list of list of str
        @keyword committer_info:    The committer info data structure, listing the committers and years of significant commits.  This is a dictionary with the committer's name as a key with the value as the list of years.  It is updated in place.
        @type committer_info:       dict of lists of str
        @keyword after:             Only show commits from 1 January of this year onwards.
        @type after:                int or None
        @keyword before:            Only show commits up to 31 December of this year.
        @type before:               int or None
        @keyword trunc:             A flag which if True indicates a truncated start, so the first commit is only added if no other committers have been recorded.
        @type trunc:                bool
        """

        # Init (avoiding shared mutable default arguments).
        debug_format = "  Git: %-30s By %s: '%s'\n"
        if exclude is None:
            exclude = []
        if start_commit is None:
            start_commit = {}
        if author_switch is None:
            author_switch = []
        if committer_info is None:
            committer_info = {}

        # File check.
        full_path = "%s%s%s" % (repo_path, sep, file_path)
        if not path.exists(full_path):
            sys.stdout.write("Warning, file missing from git: %s\n" % full_path)
            return

        # Date restrictions.
        after_opt = ''
        before_opt = ''
        if after:
            after_opt = '--after=%i-01-01' % after
        if before:
            before_opt = '--before=%i-12-31' % before

        # Exec.
        pipe = Popen("git log %s %s --numstat --follow --pretty=\"%%an xyzyx %%ad xyzyx %%H xyzyx %%s\" --date=iso \"%s\"" % (after_opt, before_opt, full_path), shell=True, stdout=PIPE, close_fds=False)

        # Get the data.
        lines = pipe.stdout.readlines()
        i = 0
        committer = None
        commit_key = ''
        first_committer = None
        first_commit_key = ''
        first_year = None
        history_stop = False
        while 1:
            # Termination.
            if i >= len(lines):
                break
            if file_path in start_commit and start_commit[file_path] == commit_key:
                history_stop = True
                if self.args.debug:
                    sys.stdout.write(debug_format % ("Stopping false history.", committer, commit_key))
                break

            # Obtain the committer and date info.
            committer, date, commit_hash, subject = lines[i].decode().split(' xyzyx ')
            year = int(date.split('-')[0])
            commit_key = "%s (%s)" % (subject.strip(), date)
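            # Illustrative example (hypothetical commit): the log line
            # "Jane Doe xyzyx 2019-01-02 10:00:00 +0100 xyzyx <hash> xyzyx Fixed a typo." gives the year 2019
            # and the commit key "Fixed a typo. (2019-01-02 10:00:00 +0100)", the form expected in the
            # exclude, start_commit and author_switch settings.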

            # Translate the committer name, if necessary.
            committer = self.translate_committer_name(committer)

            # The next line is the header of the next commit (this commit has no numstat info), so skip the current line.
            if i+1 >= len(lines) or ' xyzyx ' in lines[i+1].decode():
                i += 1
                continue

            # Author switch.
            for j in range(len(author_switch)):
                if author_switch[j][2] == commit_key:
                    committer = self.translate_committer_name(author_switch[j][1])

            # Commits to exclude.
            if commit_key in exclude:
                if self.args.debug:
                    sys.stdout.write(debug_format % ("Excluded commit.", committer, commit_key))
                i += 3
                continue

            # Skip svnmerge.py merges for svn->git migration repositories as these do not imply copyright ownership for the committer.
            if search("^Merged revisions .* via svnmerge from", subject):
                if self.args.debug:
                    sys.stdout.write(debug_format % ("Skipping svnmerge.py migrated commit.", committer, commit_key))
                i += 3
                continue

            # The numstat info.
            newlines = lines[i+2].decode().split()[0]
            if newlines == '-':
                newlines = 1e10
            else:
                newlines = int(newlines)

            # Not significant.
            if newlines < self.config.SIG_CODE:
                if self.args.debug:
                    sys.stdout.write(debug_format % ("Skipping insignificant commit.", committer, commit_key))
                i += 3
                first_committer = committer
                first_commit_key = commit_key
                first_year = year
                continue

            # Date already exists.
            if committer in committer_info and year in committer_info[committer]:
                i += 3
                if self.args.debug:
                    sys.stdout.write(debug_format % ("Date already exists.", committer, commit_key))
                first_committer = committer
                first_commit_key = commit_key
                first_year = year
                continue

            # Debugging printout.
            if self.args.debug:
                sys.stdout.write(debug_format % ("Adding commit.", committer, commit_key))

            # A new committer.
            if committer not in committer_info:
                committer_info[committer] = []

            # Store the info.
            committer_info[committer].append(year)

            # Record the first commit.
            first_committer = committer
            first_commit_key = commit_key
            first_year = year

            # Increment the index.
            i += 3

        # Add committer info if the history was stopped, and no such info exists.
        if history_stop and first_committer and (first_committer not in committer_info or not len(committer_info[first_committer]) or first_year < min(committer_info[first_committer])):
            if self.args.debug:
                sys.stdout.write(debug_format % ("Adding stopped history commit.", first_committer, first_commit_key))
            if first_committer not in committer_info:
                committer_info[first_committer] = []
            committer_info[first_committer].append(first_year)

        # Always include the very first commit.
        if first_committer:
            # Handle truncated start dates.
            if trunc:
                # If no one else is recorded, then assume this is the real first commit.
                if len(committer_info) == 0:
                    if self.args.debug:
                        sys.stdout.write(debug_format % ("Adding first commit.", first_committer, first_commit_key))
                    committer_info[first_committer] = [first_year]

            # Normal starting date.
            else:
                if first_committer not in committer_info:
                    committer_info[first_committer] = []
                if not len(committer_info[first_committer]) or first_year < min(committer_info[first_committer]):
                    if self.args.debug:
                        sys.stdout.write(debug_format % ("Adding first commit.", first_committer, first_commit_key))
                    committer_info[first_committer].append(first_year)


    def open_read_file(self, file_name=None):
        """Open the file for reading, transparently handling bzip2 and gzip compression.

        @keyword file_name: The name of the file to extract the data from.
        @type file_name:    str
        @return:            The open file object.
        @rtype:             file object
        """

        # Test if the file exists and determine the compression type.
        compress_type, file_path = self.determine_compression(file_name)

        # Open the file for reading.
        try:
            # Uncompressed text.
            if compress_type == 0:
                file_obj = open(file_path, 'r')

            # Bzip2 compressed text.
            elif compress_type == 1:
                file_obj = bz2.open(file_path, 'rt')

            # Gzip compressed text.
            elif compress_type == 2:
                file_obj = gzip.open(file_path, 'rt')

        # Cannot open.
        except IOError as error:
            raise NameError("Cannot open the file %s.  %s." % (file_path, error.strerror or error))

        # Return the opened file.
        return file_obj


    def parse_config(self, config_file=None):
        """Parse the configuration file, dropping back to the defaults if missing.

        @keyword config_file:   The name of the optional configuration file.
        @type config_file:      str or None
        """

        # Firstly, parse the blank config as a module.
        self.config = ModuleType('config')
        exec(BLANK_CONFIG, self.config.__dict__)

        # Nothing left to do.
        if config_file == None:
            return

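        # Override the defaults by executing the configuration script as a new 'config' module.
        loader = SourceFileLoader('config', config_file)
        self.config = ModuleType('config')
        self.config.__file__ = config_file
        loader.exec_module(self.config)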


    def process_args(self):
        """Process all command line options."""

        # Add script argument parsing.
        parser = ArgumentParser(description="FSF Copyright Notice Validation.")

        # Add the script arguments.
        parser.add_argument('directory', metavar='dir', type=str, nargs='?', default='.', help="The directory to check, defaulting to '.'.")
        parser.add_argument('-c', '--config', metavar='CONFIG_SCRIPT', dest='config', help="The configuration script.")
        parser.add_argument('-m', '--missing', action='store_true', dest='missing', help="Only check for missing copyright notices (files marked as valid may nevertheless have incorrect notices).")
        parser.add_argument('--blank-config', action='store_true', dest='blank', default=False, help="Print out a blank configuration file for use with this program and then quit.")
        parser.add_argument('-v', '--verbose', action='store_true', dest='verbose', default=False, help="Verbose output.")
        parser.add_argument('-d', '--debug', action='store_true', dest='debug', default=False, help="Activate the debugging mode (this will also turn on the verbosity flag).")
        parser.add_argument('--committer-info', action='store_true', dest='committer_info', default=False, help="Output the committer information for the currently configured repositories.  The contents output into a file can be subsequently used in the config file to avoid having to load history from old repositories.")

        # Parse and store the arguments.
        self.args = parser.parse_args()

        # Turn on the verbosity flag in debug mode.
        if self.args.debug:
            self.args.verbose = True


    def progress_meter(self, i, a=1, b=1000, file=sys.stderr):
        """A simple progress write out (which defaults to the terminal STDERR).

        @param i:       The current iteration.
        @type i:        int
        @keyword a:     The step size for spinning the spinner.
        @type a:        int
        @keyword b:     The step size for printing out the progress.
        @type b:        int
        @keyword file:  The file object to write the output to.
        @type file:     file object
        """

        # The spinner characters.
        chars = ['-', '\\', '|', '/']

        # A spinner.
        if i % a == 0:
            file.write('\b%s' % chars[i%4])
            if hasattr(file, 'flush'):
                file.flush()

        # Dump the progress.
        if i % b == 0:
            num = locale.format_string("%d", i, grouping=True)
            file.write('\b%s files validated.\n' % num)


    def readme_add_notice(self, file_name=None, file=None, notices=[]):
        """Add all copyright notices to the README file.

        @param file_name:   The isolated file name to add a copyright notice for.
        @type file_name:    str
        @keyword file:      The full README file path.
        @type file:         str
        @keyword notices:   The list of current copyright notices.
        @type notices:      list of str
        """

        # Append to the file.
        with open(file, 'a') as readme:
            # Loop over the notices, writing the file name and colon left-justified to 87 characters, followed by the notice.
            for notice in notices:
                readme.write("%-87s %s\n" % (("%s:" % file_name), notice))


    def readme_setup(self, file=None):
        """Prepare the README file for appending copyright notices.

        @keyword file:  The full README file path.
        @type file:     str
        """

        # Create a new file.
        if not path.exists(file):
            # Open the file.
            readme = open(file, 'w')

            # Add a copyright notice.
            now = datetime.now()
            readme.write("###############################################################################\n")
            readme.write("#                                                                             #\n")
            readme.write("# Copyright (C) %i %-56s #\n" % (now.year, self.config.README_COMMITTER))
            readme.write("#                                                                             #\n")
            readme.write("# This file is part of the program relax (http://www.nmr-relax.com).          #\n")
            readme.write("#                                                                             #\n")
            readme.write("# This program is free software: you can redistribute it and/or modify        #\n")
            readme.write("# it under the terms of the GNU General Public License as published by        #\n")
            readme.write("# the Free Software Foundation, either version 3 of the License, or           #\n")
            readme.write("# (at your option) any later version.                                         #\n")
            readme.write("#                                                                             #\n")
            readme.write("# This program is distributed in the hope that it will be useful,             #\n")
            readme.write("# but WITHOUT ANY WARRANTY; without even the implied warranty of              #\n")
            readme.write("# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the               #\n")
            readme.write("# GNU General Public License for more details.                                #\n")
            readme.write("#                                                                             #\n")
            readme.write("# You should have received a copy of the GNU General Public License           #\n")
            readme.write("# along with this program.  If not, see <http://www.gnu.org/licenses/>.       #\n")
            readme.write("#                                                                             #\n")
            readme.write("###############################################################################\n\n\n")

            # Close the file.
            readme.close()

        # Add the licensing section, if not already present.
        with open(file) as readme:
            section = "Licensing\n" in readme.readlines()
        if not section:
            with open(file, 'a') as readme:
                readme.write("Licensing\n")
                readme.write("=========\n\n")
                readme.write("These files are licensed under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version:\n\n")


    def setup_mimetypes(self):
        """Set up the mimetype handling."""

        # Add any user supplied mimetypes.
        for mime_type, ext in self.config.NEW_MIMETYPES:
            mimetypes.add_type(mime_type, ext)


    def setup_repo(self):
        """If not supplied, set up the repository info based on the current directory."""

        # Determine the repository type.
        type = None
        if path.isdir('.git'):
            type = 'git'
        elif path.isdir('.svn'):
            type = 'svn'

        # Add the current repository.
        if type:
            self.config.REPOS.append(['.', type, None, None, None])


    def svn_log_data(self, file_path, repo_path=None, exclude=None, start_commit=None, author_switch=None, svn_head=None, committer_info=None, after=None, before=None, trunc=False):
        """Get the committers and years of significant commits from the svn log.

        @param file_path:           The file path, relative to the HEAD directory, to obtain the svn info for.
        @type file_path:            str
        @keyword repo_path:         The path to the local copy of the svn repository.
        @type repo_path:            str
        @keyword exclude:           A list of commit keys to exclude from the search.  The commit key consists of the first line of the commit message followed by the ISO date in brackets.
        @type exclude:              list of str
        @keyword start_commit:      The starting commit for each file to exclude incorrectly labelled history (i.e. a svn copy followed by complete file replacement).  This is a dictionary with the keys being the file paths and the values being the commit keys (the first line of the commit message followed by the ISO date in brackets).
        @type start_commit:         dict of str
        @keyword author_switch:     List of commit keys and authors to switch the authorship of.  The first element should be the committer, the second the real committer, and the third the commit key.  The commit key consists of the first line of the commit message followed by the ISO date in brackets.
        @type author_switch:        list of list of str
        @keyword svn_head:          The HEAD directory, e.g. "trunk".
        @type svn_head:             str
        @keyword committer_info:    The committer info data structure, listing the committers and years of significant commits.  This is a dictionary with the committer's name as a key with the value as the list of years.  It is updated in place.
        @type committer_info:       dict of lists of str
        @keyword after:             Only show commits from 1 January of this year onwards.
        @type after:                int or None
        @keyword before:            Only show commits up to 31 December of this year.
        @type before:               int or None
        @keyword trunc:             A flag which if True indicates a truncated start (so don't add the first commit).
        @type trunc:                bool
        """

        # Init (avoiding shared mutable default arguments).
        debug_format = "  SVN: %-30s By %s: '%s'\n"
        if exclude is None:
            exclude = []
        if start_commit is None:
            start_commit = {}
        if author_switch is None:
            author_switch = []
        if committer_info is None:
            committer_info = {}

        # File check.
        full_path = "%s%s%s%s%s" % (repo_path, sep, svn_head, sep, file_path)
        if "://" not in full_path and not path.exists(full_path):
            if self.args.verbose:
                sys.stdout.write("Warning, file missing from svn: %s\n" % full_path)
            return

        # Date restrictions.
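        # Illustrative example: after=2017 and before=2019 give the option "-r{2019-12-31}:{2017-01-01}",
        # so that 'svn log' lists the revisions in this date range from the newest to the oldest.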
        date_range = ''
        if after or before:
            date_range = "-r"
            if before:
                date_range += '{%i-12-31}:' % before
            else:
                date_range += '{3000-01-01}:'
            if after:
                date_range += '{%i-01-01}' % after
            else:
                date_range += '{1000-01-01}'

        # Exec.
        pipe = Popen("svn log --diff %s \"%s\"" % (date_range, full_path), shell=True, stdout=PIPE, close_fds=False)

        # Get the data.
        lines = pipe.stdout.readlines()
        for i in range(len(lines)):
            try:
                lines[i] = lines[i].decode()[:-1]
            except UnicodeError:
                # Catch the ascii character 43 ("+").
                if lines[i][0] == 43:
                    lines[i] = "+binary diff"
                else:
                    lines[i] = ""
        i = 0
        committer = None
        commit_key = ''
        first_committer = None
        first_commit_key = ''
        first_year = None
        history_stop = False
        while 1:
            # Termination.
            if i >= len(lines)-1:
                break
            if file_path in start_commit and start_commit[file_path] == commit_key:
                history_stop = True
                if self.args.debug:
                    sys.stdout.write(debug_format % ("Stopping false history.", committer, commit_key))
                break

            # A new commit.
            if search('^------------------------------------------------------------------------$', lines[i]) and lines[i+1].startswith('r'):
                # Move to the summary line.
                i += 1

                # Extract the committer and year.
                rev, svn_committer, date, length = lines[i].split(' | ')
                committer = self.config.SVN_COMMITTERS[svn_committer]
                year = int(date.split()[0].split('-')[0])
                date = date.split(" (")[0]
                date = datetime.strptime(date, '%Y-%m-%d %H:%M:%S %z')
                date = date.astimezone(tz=utc)

                # Translate the committer name, if necessary.
                committer = self.translate_committer_name(committer)

                # Find the diff.
                in_diff = False
                newlines = 0
                msg = ""
                msg_flag = True
                while 1:
                    # Walk down the lines, stopping at the end of the log.
                    i += 1
                    if i >= len(lines):
                        break

                    # Store the first line of the commit message.
                    if msg_flag and search("^[A-Za-z]", lines[i]):
                        # Store the line.
                        msg += lines[i]
                        msg_flag = False

                        # Append the continuation lines of the message's first paragraph, up to the first blank line.
                        while 1:
                            # Walk down the lines.
                            i += 1

                            # Termination at a blank line or at the end of the log.
                            if i >= len(lines) or not len(lines[i]):
                                break

                            # Add the line.
                            msg += " %s" % lines[i]

                        # The log ended inside the commit message.
                        if i >= len(lines):
                            break

                    # End of the diff, i.e. the separator line of the next commit.
                    if search('^------------------------------------------------------------------------$', lines[i]) and i < len(lines)-1 and len(lines[i+1]) and lines[i+1][0] == 'r':
                        break

                    # Start of the diff.  The '=' line itself contains nothing to count.
                    if search('^===================================================================$', lines[i]):
                        in_diff = True
                        continue
                    if not in_diff:
                        continue

                    # Binary diff.  Force the commit to count as significant.
                    if "Cannot display: file marked as a binary type." in lines[i]:
                        newlines = 1000000
                        break

                    # Count the added lines, excluding the "+++" file header.
                    if lines[i].startswith("+") and not lines[i].startswith("+++"):
                        newlines += 1

                # Create the commit key.
                commit_key = "%s (%s +0000)" % (msg.strip(), date.strftime("%Y-%m-%d %H:%M:%S"))

            # Not a new commit.
            else:
                i += 1
                continue

            # Author switch.
            for j in range(len(author_switch)):
                if author_switch[j][2] == commit_key:
                    committer = self.translate_committer_name(author_switch[j][1])

            # Commits to exclude.
            if commit_key in exclude:
                if self.args.debug:
                    sys.stdout.write(debug_format % ("Excluded commit.", committer, commit_key))
                continue

            # Skip svnmerge commits.
            if search("^Merged revisions .* via svnmerge from", msg):
                if self.args.debug:
                    sys.stdout.write(debug_format % ("Skipping svnmerge.py migrated commit.", committer, commit_key))
                continue

            # No diff found.
            if not in_diff:
                if self.args.debug:
                    sys.stdout.write(debug_format % ("No diff found, skipping.", committer, commit_key))
                first_committer = committer
                first_commit_key = commit_key
                first_year = year
                continue

            # Not significant.
            if newlines < self.config.SIG_CODE:
                if self.args.debug:
                    sys.stdout.write(debug_format % ("Skipping insignificant commit.", committer, commit_key))
                first_committer = committer
                first_commit_key = commit_key
                first_year = year
                continue

            # Date already exists.
            if committer in committer_info and year in committer_info[committer]:
                if self.args.debug:
                    sys.stdout.write(debug_format % ("Date already exists.", committer, commit_key))
                first_committer = committer
                first_commit_key = commit_key
                first_year = year
                continue

            # Debugging printout.
            if self.args.debug:
                sys.stdout.write(debug_format % ("Adding commit.", committer, commit_key))

            # A new committer.
            if committer not in committer_info:
                committer_info[committer] = []

            # Store the info.
            committer_info[committer].append(year)

            # Record the first commit.
            first_committer = committer
            first_commit_key = commit_key
            first_year = year

        # If the history was stopped, record the first_committer commit when that committer has no recorded year, or when its year predates the committer's earliest recorded year.
        if history_stop and first_committer and (first_committer not in committer_info or not len(committer_info[first_committer]) or first_year < min(committer_info[first_committer])):
            if self.args.debug:
                sys.stdout.write(debug_format % ("Adding stopped history commit.", first_committer, first_commit_key))
            if first_committer not in committer_info:
                committer_info[first_committer] = []
            committer_info[first_committer].append(first_year)

        # Always include the very first commit.
        if first_committer:
            # Handle truncated start dates.
            if trunc:
                # If no one else is recorded, then assume this is the real first commit.
                if len(committer_info) == 0:
                    if self.args.debug:
                        sys.stdout.write(debug_format % ("Adding first commit.", first_committer, first_commit_key))
                    committer_info[first_committer] = [first_year]

            # Normal starting date.
            else:
                if first_committer not in committer_info:
                    committer_info[first_committer] = []
                if not len(committer_info[first_committer]) or first_year < min(committer_info[first_committer]):
                    if self.args.debug:
                        sys.stdout.write(debug_format % ("Adding first commit.", first_committer, first_commit_key))
                    committer_info[first_committer].append(first_year)


    def translate_committer_name(self, committer):
        """Translate the committer name, if necessary.

        @param committer:   The committer name to translate.
        @type committer:    str
        @return:            The translated name.
        @rtype:             str
        """

        # The name is in the translation table.
        if committer in self.config.COMMITTERS:
            return self.config.COMMITTERS[committer]

        # Otherwise, return the name unchanged.
        return committer


    def validate_copyright(self, expected_copyright, recorded_copyright):
        """Check if the expected and recorded copyrights match.

        @param expected_copyright:  The unsorted list of expected copyright notices.
        @type expected_copyright:   list of str
        @param recorded_copyright:  The unsorted list of recorded copyright notices.
        @type recorded_copyright:   list of str
        @return:                    True if the copyright notices match, False otherwise.
        @rtype:                     bool
        """

        # Replace alternative names in the recorded list (note that the list is modified in place).
        for i in range(len(recorded_copyright)):
            for alt in self.config.COMMITTER_ALT:
                # Use a plain substring test, as names may contain regular expression metacharacters such as ".".
                if alt in recorded_copyright[i]:
                    recorded_copyright[i] = recorded_copyright[i].replace(alt, self.config.COMMITTER_ALT[alt])

        # Compare the lists (the order of the notices matters).
        return expected_copyright == recorded_copyright


    def validate_readme(self, root):
        """Check the validity of the copyright notices in the README file.

        @param root:    The path which should contain the README file.
        @type root:     str
        """

        # Search for the README file.
        readme = path.join(root, 'README')
        if not path.exists(readme):
            return

        # Printout.
        if self.args.verbose:
            sys.stdout.write("Validating: %s\n" % readme)

        # Read the README file data.
        with open(readme) as file:
            lines = file.readlines()

        # Loop over the file, finding the statements of the form "file_name: Copyright (C) ...".
        missing = []
        for line in lines:
            if search(r": *Copyright \(C\)", line):
                # Strip out the file name.
                file_name = line.split(':')[0]

                # Check if the file exists.
                file_path = path.join(root, file_name)
                if not path.exists(file_path):
                    missing.append(file_path)

        # Errors.
        if missing:
            sys.stdout.write("Missing files with copyright notices:\n")
            for i in range(len(missing)):
                sys.stdout.write("    %s\n" % missing[i])


# Execute the script.
if __name__ == '__main__':
    sys.exit(Validate().status)