Check Reference

This page describes each of the checks that pydistcheck performs. The section headings correspond to the error codes printed in pydistcheck’s output.

compiled-objects-have-debug-symbols

The distribution contains compiled objects, like C/C++ shared libraries, with debug symbols.

Compilers for languages like C, C++, Fortran, and Rust can optionally embed debug information, such as source code file names and line numbers, that is useful for printing stack traces or enabling interactive debugging.

The inclusion of such information can increase the size of built objects substantially. It’s pydistcheck’s position that the inclusion of such debug symbols in a shared library distributed as part of a Python wheel is rarely desirable, and that by default wheels shouldn’t include that type of information.

This check attempts to run the following tools with subprocess.run().

  • dsymutil

  • llvm-nm

  • llvm-objdump

  • nm

  • objdump

  • readelf

Installing more of these in the environment where you run pydistcheck improves its ability to detect debug symbols.
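To see which of these tools can be found in the current environment, something like the following may help. It is a minimal sketch that only looks the names up on PATH with shutil.which(); the function name available_debug_tools is hypothetical and is not part of pydistcheck.

```python
import shutil

# Tool names taken from the list above.
DEBUG_SYMBOL_TOOLS = ["dsymutil", "llvm-nm", "llvm-objdump", "nm", "objdump", "readelf"]


def available_debug_tools(tools=DEBUG_SYMBOL_TOOLS):
    """Return the subset of ``tools`` that can be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is not None]


if __name__ == "__main__":
    found = available_debug_tools()
    print("found:", ", ".join(found) or "(none)")
```

An empty result suggests that this check would have little to work with in that environment.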

Warning

If you would rather pydistcheck not invoke these other tools with subprocess.run() (for example, because it causes permissions-related issues), turn this check off by passing its name to --ignore.

For a LOT more information about this topic, see these discussions in other open source projects.

And these other resources.

distro-too-large-compressed

The package distribution is larger (compressed) than the allowed size.

Change that limit using configuration option max-distro-size-compressed.

The compressed size of the distribution affects the following:

  • download speed and bandwidth usage

  • storage footprint
    • including for package registries like PyPI

For example, as of this writing (May 2024), PyPI places the following restrictions on projects by default:

  • files: < 100 MiB

  • projects (sum of all files): < 10 GiB

For reference, https://pypi.org/stats/ displays the total size (compressed) of some of the largest projects on PyPI.

distro-too-large-uncompressed

The package distribution is larger (uncompressed) than the allowed size.

Change that limit using configuration option max-distro-size-uncompressed.

The uncompressed size of the distribution affects the following:

  • installation time

  • storage footprint for installed packages
    • including in things like VM and Docker images

This can especially matter in storage-constrained environments.

For example, several cloud function-as-a-service services allow uploading additional Python packages for use in function execution, with the following limits on their uncompressed size:

For a thorough discussion of some issues caused by larger distribution size, see “FEEDBACK: PyArrow as a required dependency and PyArrow backed strings” (pandas-dev/pandas#54466).
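The difference between the compressed and uncompressed sizes can be inspected directly for any zip-format distribution, such as a wheel. The following is a rough sketch using only the standard library; the function name distribution_sizes is hypothetical, and this is not necessarily how pydistcheck computes its numbers.

```python
import os
import zipfile


def distribution_sizes(path):
    """Return (compressed_bytes, uncompressed_bytes) for a zip-format distribution."""
    # Compressed size: the size of the archive file itself on disk.
    compressed = os.path.getsize(path)
    # Uncompressed size: the sum of the original sizes of all archive members.
    with zipfile.ZipFile(path) as zf:
        uncompressed = sum(info.file_size for info in zf.infolist())
    return compressed, uncompressed
```

Source distributions are usually tarballs rather than zip files, so they would need tarfile instead of zipfile.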

expected-files

The package distribution does not contain a file or directory that it was expected to contain.

This can be used to test that changes to MANIFEST.in, package_data, and similar don’t accidentally result in the exclusion of any expected files.
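As an illustration of the idea, the sketch below reports which expected entries are absent from a list of paths. It assumes, purely for the example, that an entry ending in "/" names a directory; the function name missing_expected is hypothetical, and pydistcheck’s own matching rules may differ.

```python
def missing_expected(paths, expected):
    """Return the entries of ``expected`` that are not present in ``paths``.

    An entry ending in "/" is treated as a directory, and counts as present
    if at least one path is inside it.
    """
    paths = list(paths)
    path_set = set(paths)
    missing = []
    for item in expected:
        if item.endswith("/"):
            found = any(p.startswith(item) for p in paths)
        else:
            found = item in path_set
        if not found:
            missing.append(item)
    return missing
```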

files-only-differ-by-case

The package distribution contains filepaths which are identical after lowercasing.

Such paths are not portable, as some filesystems (notably the default filesystem on macOS) are case-insensitive.
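A minimal sketch of how such collisions could be found, by grouping paths on their lowercased form. The function name case_collisions is hypothetical, not pydistcheck’s API.

```python
from collections import defaultdict


def case_collisions(paths):
    """Return groups of paths that become identical after lowercasing."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.lower()].append(path)
    # Only groups with more than one member are collisions.
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

On a case-insensitive filesystem, extracting the paths in one such group would leave only one of the files on disk.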

mixed-file-extensions

Filepaths in the package distribution use a mix of file extensions for the same type of file.

For example, some_file.yaml and other_file.yml.

Some programs may use file extensions, instead of more reliable mechanisms like magic bytes, to detect file types, like this:

if filepath.endswith(".yaml"):
    with open(filepath) as f:
        x = yaml.safe_load(f)

In such cases, having a mix of file extensions can lead to only a subset of relevant files being matched.

Standardizing on a single extension for files of the same type improves the probability of either catching or completely avoiding such bugs: either all intended files will be matched or none will.
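One way to detect such a mix is sketched below. The extension groups are assumptions chosen for illustration, and mixed_extensions is a hypothetical name; pydistcheck’s own list of equivalent extensions may differ.

```python
import os

# Hypothetical groups of extensions treated as "the same type of file".
EXTENSION_GROUPS = [{".yaml", ".yml"}, {".jpg", ".jpeg"}, {".htm", ".html"}]


def mixed_extensions(paths):
    """Return the extension groups for which more than one spelling is in use."""
    seen = {os.path.splitext(p)[1].lower() for p in paths}
    return [sorted(group & seen) for group in EXTENSION_GROUPS if len(group & seen) > 1]
```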

path-contains-non-ascii-characters

At least one filepath in the package distribution contains non-ASCII characters.

Non-ASCII characters are not portable, and their inclusion in filepaths can lead to installation and usage issues on different platforms.
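A check like this can be approximated with str.isascii() (Python 3.7+). This is a minimal sketch, and the function name non_ascii_paths is hypothetical.

```python
def non_ascii_paths(paths):
    """Return the paths that contain at least one non-ASCII character."""
    return [p for p in paths if not p.isascii()]
```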

For more information, see:

path-contains-spaces

At least one filepath in the package distribution contains spaces.

Filepaths with spaces require special treatment, like quoting in some settings. Avoiding paths with spaces eliminates a whole class of potential issues related to software that doesn’t handle such paths well.
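The quoting problem can be demonstrated with the standard library’s shlex module, which tokenizes strings the way a POSIX shell would. The example path and the helper name paths_with_spaces are made up for illustration.

```python
import shlex

path = "docs/release notes.txt"

# Unquoted, a shell-style tokenizer splits the path into two arguments.
unquoted = shlex.split(f"cat {path}")

# shlex.quote() adds the quoting needed to keep it as a single argument.
quoted = shlex.split(f"cat {shlex.quote(path)}")


def paths_with_spaces(paths):
    """Return the paths that contain at least one space character."""
    return [p for p in paths if " " in p]
```

Here unquoted has three elements while quoted has two, which is the kind of difference that breaks scripts that build command lines from filepaths.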

For more information, see:

path-too-long

A file or directory in the distribution has a path that has too many characters.

Some operating systems have limits on path lengths, and distributions with longer paths might not be installable on those systems.

By default, pydistcheck reports this check failure if it detects any paths longer than 200 characters. This is primarily informed by the following limitations:

  • many Windows systems limit the total filepath length (excluding drive specifiers like C:\) to 256 characters

  • some older tar implementations will not support paths longer than 256 characters

See below for details.
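As a rough illustration of the check itself, the sketch below flags paths over a character limit, using the default of 200 mentioned above. The function name paths_too_long is hypothetical.

```python
def paths_too_long(paths, max_length=200):
    """Return the paths whose length in characters exceeds ``max_length``."""
    return [p for p in paths if len(p) > max_length]
```

Note that len() counts characters, while the tar limits quoted below are stated in bytes; for paths with non-ASCII characters, len(p.encode("utf-8")) would be the closer comparison.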

R CMD check source code:

“Tarballs are only required to store paths of up to 100 bytes and cannot store those of more than 256 bytes”.

“Package Structure” (Writing R Extensions):

“…packages are normally distributed as tarballs, and these have a limit on path lengths: for maximal portability 100 bytes.”

“Removing the Max Path Limitation” (Python Windows docs):

“Windows historically has limited path lengths to 260 characters. This meant that paths longer than this would not resolve and errors would result. In the latest versions of Windows, this limitation can be expanded to approximately 32,000 characters. Your administrator will need to activate the "Enable Win32 long paths" group policy, or set LongPathsEnabled to 1 in the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem.

This allows the open() function, the os module and most other path functionality to accept and return paths longer than 260 characters.”

Filename too long in Git for Windows (Stack Overflow answer):

“Git has a limit of 4096 characters for a filename, except on Windows when Git is compiled with msys. It uses an older version of the Windows API and there’s a limit of 260 characters for a filename.

You can circumvent this by using another Git client on Windows or set core.longpaths to true …”

Other relevant discussions:

too-many-files

The package distribution contains more than the allowed number of files.

This is a very rough way to detect that unexpected files have been included in a new release of a project.

pydistcheck defaults to raising this error when a distribution has more than 2000 files, an arbitrary number chosen by the author.

To change that limit, use configuration option max-allowed-files.
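To get a feel for how many files a source distribution contains, its members can be counted with the standard library’s tarfile module. This is a sketch under the assumption that the distribution is a tarball; count_files is a hypothetical name, and pydistcheck may count differently (for example, in how it treats directories).

```python
import tarfile


def count_files(sdist_path):
    """Count the regular files (not directories or links) in a tar archive."""
    # Mode "r" (the default) detects gzip, bz2, or xz compression automatically.
    with tarfile.open(sdist_path) as tf:
        return sum(1 for member in tf.getmembers() if member.isfile())
```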

unexpected-files

Files were found in the distribution which weren’t expected to be included.

With pydistcheck’s default settings, this check raises errors for the inclusion of files that are commonly found in source control during development but are not useful in distributions, like .gitignore.

Which files are “expected” is highly project-specific. See Configuration for a list of the files pydistcheck complains about by default, and for information about how to customize that list.

This can be used to test that changes to MANIFEST.in, package_data, and similar don’t accidentally result in the inclusion of any unexpected files.
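The general approach can be sketched with glob-style matching from the standard library. The patterns below are illustrative assumptions only, not pydistcheck’s real defaults (see Configuration for those), and unexpected_files is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for files that are rarely useful in a distribution.
UNEXPECTED_PATTERNS = ["*/.gitignore", "*/.DS_Store", "*.pyc"]


def unexpected_files(paths, patterns=UNEXPECTED_PATTERNS):
    """Return the paths that match at least one of ``patterns``."""
    # In fnmatch-style patterns, "*" also matches "/" characters.
    return [p for p in paths if any(fnmatchcase(p, pat) for pat in patterns)]
```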