feat(py_wheel): Normalize name and version (#1331)

Added the `incompatible_normalize_name` feature flag to normalize the
package distribution name according to latest Python packaging
standards. Defaults to `False` for the time being.

Added the `incompatible_normalize_version` feature flag to normalize the
package version according to PEP440 standard. This also adds support for
local version specifiers (versions with a `+` in them), in accordance
with PEP440. Defaults to `False` for the time being.

Instead of following the obsolete PEP 427 escaping procedure for
distribution names and versions, we should use the rules specified by
https://packaging.python.org/en/latest/specifications (sections "Package
name normalization" and "Binary distribution format"). For the versions,
this means normalizing them according to PEP 440.

Added as feature flags to avoid forcing the user to deal with breaking
changes when upgrading `rules_python`:

- Distribution names have stronger requirements now: "A valid name
  consists only of ASCII letters and numbers, period, underscore and
  hyphen. It must start and end with a letter or number."
  https://packaging.python.org/en/latest/specifications/name-normalization/
- Versions must be valid PEP 440 version identifiers. Previously
  versions such as "0.1-2-3" would have been accepted; that is no longer
  the case.
- The file name of generated wheels may have changed, if the
  distribution name or the version identifier wasn't in normalized form.
- The wheelmaker now depends on `packaging.version`, which means the
  `py_wheel` user now needs to load pip dependencies in their
  `WORKSPACE.bazel` file:

```
load("@rules_python//python/pip_install:repositories.bzl", "pip_install_dependencies")

pip_install_dependencies()
```

Fixes bazelbuild/rules_python#883.
Fixes bazelbuild/rules_python#1132.

---------

Co-authored-by: Ignas Anikevicius <anikevicius@gmail.com>
Co-authored-by: Ignas Anikevicius <240938+aignas@users.noreply.github.com>

commit: 382b6785a57ee428fc0ec367bcb380c6266cab7b
author: Christian von Schultz <christian@embedl.com> Thu Oct 05 16:04:09 2023 +0200
committer: GitHub <noreply@github.com> Thu Oct 05 14:04:09 2023 +0000
tree: d16bd11f5d1288311fb4edfc329c2754c3ea0a06
parent: 423c1de345c32d67dfe1e8d43510399ab10dc2c4
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 59bdac1..3c421a9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md

@@ -55,6 +55,14 @@
   authentication against private HTTP hosts serving Python toolchain binaries.
 * `//python:packaging_bzl` added, a `bzl_library` for the Starlark
   files `//python:packaging.bzl` requires.
+* (py_wheel) Added the `incompatible_normalize_name` feature flag to
+  normalize the package distribution name according to latest Python
+  packaging standards. Defaults to `False` for the time being.
+* (py_wheel) Added the `incompatible_normalize_version` feature flag
+  to normalize the package version according to PEP440 standard. This
+  also adds support for local version specifiers (versions with a `+`
+  in them), in accordance with PEP440. Defaults to `False` for the
+  time being.
 
 ### Removed
 

diff --git a/docs/packaging.md b/docs/packaging.md
index 0e8e110..90c66dc 100755
--- a/docs/packaging.md
+++ b/docs/packaging.md

@@ -59,8 +59,9 @@
 <pre>
 py_wheel_rule(<a href="#py_wheel_rule-name">name</a>, <a href="#py_wheel_rule-abi">abi</a>, <a href="#py_wheel_rule-author">author</a>, <a href="#py_wheel_rule-author_email">author_email</a>, <a href="#py_wheel_rule-classifiers">classifiers</a>, <a href="#py_wheel_rule-console_scripts">console_scripts</a>, <a href="#py_wheel_rule-deps">deps</a>,
               <a href="#py_wheel_rule-description_content_type">description_content_type</a>, <a href="#py_wheel_rule-description_file">description_file</a>, <a href="#py_wheel_rule-distribution">distribution</a>, <a href="#py_wheel_rule-entry_points">entry_points</a>,
-              <a href="#py_wheel_rule-extra_distinfo_files">extra_distinfo_files</a>, <a href="#py_wheel_rule-extra_requires">extra_requires</a>, <a href="#py_wheel_rule-homepage">homepage</a>, <a href="#py_wheel_rule-license">license</a>, <a href="#py_wheel_rule-platform">platform</a>, <a href="#py_wheel_rule-project_urls">project_urls</a>,
-              <a href="#py_wheel_rule-python_requires">python_requires</a>, <a href="#py_wheel_rule-python_tag">python_tag</a>, <a href="#py_wheel_rule-requires">requires</a>, <a href="#py_wheel_rule-stamp">stamp</a>, <a href="#py_wheel_rule-strip_path_prefixes">strip_path_prefixes</a>, <a href="#py_wheel_rule-summary">summary</a>, <a href="#py_wheel_rule-version">version</a>)
+              <a href="#py_wheel_rule-extra_distinfo_files">extra_distinfo_files</a>, <a href="#py_wheel_rule-extra_requires">extra_requires</a>, <a href="#py_wheel_rule-homepage">homepage</a>, <a href="#py_wheel_rule-incompatible_normalize_name">incompatible_normalize_name</a>,
+              <a href="#py_wheel_rule-incompatible_normalize_version">incompatible_normalize_version</a>, <a href="#py_wheel_rule-license">license</a>, <a href="#py_wheel_rule-platform">platform</a>, <a href="#py_wheel_rule-project_urls">project_urls</a>, <a href="#py_wheel_rule-python_requires">python_requires</a>,
+              <a href="#py_wheel_rule-python_tag">python_tag</a>, <a href="#py_wheel_rule-requires">requires</a>, <a href="#py_wheel_rule-stamp">stamp</a>, <a href="#py_wheel_rule-strip_path_prefixes">strip_path_prefixes</a>, <a href="#py_wheel_rule-summary">summary</a>, <a href="#py_wheel_rule-version">version</a>)
 </pre>
 
 Internal rule used by the [py_wheel macro](/docs/packaging.md#py_wheel).
@@ -89,6 +90,8 @@
 | <a id="py_wheel_rule-extra_distinfo_files"></a>extra_distinfo_files |  Extra files to add to distinfo directory in the archive.   | <a href="https://bazel.build/rules/lib/dict">Dictionary: Label -> String</a> | optional | <code>{}</code> |
 | <a id="py_wheel_rule-extra_requires"></a>extra_requires |  List of optional requirements for this package   | <a href="https://bazel.build/rules/lib/dict">Dictionary: String -> List of strings</a> | optional | <code>{}</code> |
 | <a id="py_wheel_rule-homepage"></a>homepage |  A string specifying the URL for the package homepage.   | String | optional | <code>""</code> |
+| <a id="py_wheel_rule-incompatible_normalize_name"></a>incompatible_normalize_name |  Normalize the package distribution name according to latest Python packaging standards.<br><br>See https://packaging.python.org/en/latest/specifications/binary-distribution-format/#escaping-and-unicode and https://packaging.python.org/en/latest/specifications/name-normalization/.<br><br>Apart from the valid names according to the above, we also accept '{' and '}', which may be used as placeholders for stamping.   | Boolean | optional | <code>False</code> |
+| <a id="py_wheel_rule-incompatible_normalize_version"></a>incompatible_normalize_version |  Normalize the package version according to PEP440 standard. With this option set to True, if the user wants to pass any stamp variables, they have to be enclosed in '{}', e.g. '{BUILD_TIMESTAMP}'.   | Boolean | optional | <code>False</code> |
 | <a id="py_wheel_rule-license"></a>license |  A string specifying the license of the package.   | String | optional | <code>""</code> |
 | <a id="py_wheel_rule-platform"></a>platform |  Supported platform. Use 'any' for pure-Python wheel.<br><br>If you have included platform-specific data, such as a .pyd or .so extension module, you will need to specify the platform in standard pip format. If you support multiple platforms, you can define platform constraints, then use a select() to specify the appropriate specifier, eg:<br><br><code> platform = select({     "//platforms:windows_x86_64": "win_amd64",     "//platforms:macos_x86_64": "macosx_10_7_x86_64",     "//platforms:linux_x86_64": "manylinux2014_x86_64", }) </code>   | String | optional | <code>"any"</code> |
 | <a id="py_wheel_rule-project_urls"></a>project_urls |  A string dict specifying additional browsable URLs for the project and corresponding labels, where label is the key and url is the value. e.g <code>{{"Bug Tracker": "http://bitbucket.org/tarek/distribute/issues/"}}</code>   | <a href="https://bazel.build/rules/lib/dict">Dictionary: String -> String</a> | optional | <code>{}</code> |

diff --git a/examples/wheel/BUILD.bazel b/examples/wheel/BUILD.bazel
index f56a41b..81422d3 100644
--- a/examples/wheel/BUILD.bazel
+++ b/examples/wheel/BUILD.bazel

@@ -54,6 +54,8 @@
     testonly = True,  # Set this to verify the generated .dist target doesn't break things
     # Package data. We're building "example_minimal_library-0.0.1-py3-none-any.whl"
     distribution = "example_minimal_library",
+    incompatible_normalize_name = True,
+    incompatible_normalize_version = True,
     python_tag = "py3",
     version = "0.0.1",
     deps = [
@@ -76,6 +78,8 @@
     testonly = True,
     abi = "$(ABI)",
     distribution = "example_minimal_library",
+    incompatible_normalize_name = True,
+    incompatible_normalize_version = True,
     python_tag = "$(PYTHON_TAG)",
     toolchains = ["//examples/wheel:make_variable_tags"],
     version = "$(VERSION)",
@@ -95,6 +99,8 @@
     name = "minimal_with_py_library_with_stamp",
     # Package data. We're building "example_minimal_library-0.0.1-py3-none-any.whl"
     distribution = "example_minimal_library{BUILD_USER}",
+    incompatible_normalize_name = False,
+    incompatible_normalize_version = False,
     python_tag = "py3",
     stamp = 1,
     version = "0.1.{BUILD_TIMESTAMP}",
@@ -123,6 +129,8 @@
     name = "minimal_with_py_package",
     # Package data. We're building "example_minimal_package-0.0.1-py3-none-any.whl"
     distribution = "example_minimal_package",
+    incompatible_normalize_name = True,
+    incompatible_normalize_version = True,
     python_tag = "py3",
     version = "0.0.1",
     deps = [":example_pkg"],
@@ -156,6 +164,8 @@
         "//examples/wheel:README.md": "README",
     },
     homepage = "www.example.com",
+    incompatible_normalize_name = True,
+    incompatible_normalize_version = True,
     license = "Apache 2.0",
     project_urls = {
         "Bug Tracker": "www.example.com/issues",
@@ -177,6 +187,8 @@
     entry_points = {
         "console_scripts": ["main = foo.bar:baz"],
     },
+    incompatible_normalize_name = True,
+    incompatible_normalize_version = True,
     python_tag = "py3",
     strip_path_prefixes = [
         "examples",
@@ -191,6 +203,8 @@
     name = "custom_package_root_multi_prefix",
     # Package data. We're building "custom_custom_package_root_multi_prefix-0.0.1-py3-none-any.whl"
     distribution = "example_custom_package_root_multi_prefix",
+    incompatible_normalize_name = True,
+    incompatible_normalize_version = True,
     python_tag = "py3",
     strip_path_prefixes = [
         "examples/wheel/lib",
@@ -206,6 +220,8 @@
     name = "custom_package_root_multi_prefix_reverse_order",
     # Package data. We're building "custom_custom_package_root_multi_prefix_reverse_order-0.0.1-py3-none-any.whl"
     distribution = "example_custom_package_root_multi_prefix_reverse_order",
+    incompatible_normalize_name = True,
+    incompatible_normalize_version = True,
     python_tag = "py3",
     strip_path_prefixes = [
         "examples/wheel",
@@ -220,6 +236,8 @@
 py_wheel(
     name = "python_requires_in_a_package",
     distribution = "example_python_requires_in_a_package",
+    incompatible_normalize_name = True,
+    incompatible_normalize_version = True,
     python_requires = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*",
     python_tag = "py3",
     version = "0.0.1",
@@ -231,6 +249,8 @@
 py_wheel(
     name = "use_rule_with_dir_in_outs",
     distribution = "use_rule_with_dir_in_outs",
+    incompatible_normalize_name = True,
+    incompatible_normalize_version = True,
     python_tag = "py3",
     version = "0.0.1",
     deps = [
@@ -244,6 +264,8 @@
     name = "python_abi3_binary_wheel",
     abi = "abi3",
     distribution = "example_python_abi3_binary_wheel",
+    incompatible_normalize_name = True,
+    incompatible_normalize_version = True,
     # these platform strings must line up with test_python_abi3_binary_wheel() in wheel_test.py
     platform = select({
         ":aarch64-apple-darwin": "macosx_11_0_arm64",
@@ -258,16 +280,32 @@
 )
 
 py_wheel(
-    name = "filename_escaping",
+    name = "legacy_filename_escaping",
     # Per https://www.python.org/dev/peps/pep-0427/#escaping-and-unicode
     # runs of non-alphanumeric, non-digit symbols should be replaced with a single underscore.
     # Unicode non-ascii letters should *not* be replaced with underscore.
     distribution = "file~~name-escaping",
+    incompatible_normalize_name = False,
+    incompatible_normalize_version = False,
     python_tag = "py3",
     version = "0.0.1-r7",
     deps = [":example_pkg"],
 )
 
+py_wheel(
+    name = "filename_escaping",
+    # Per https://packaging.python.org/en/latest/specifications/binary-distribution-format/#escaping-and-unicode
+    # runs of "-", "_" and "." should be replaced with a single underscore.
+    # Unicode non-ascii letters aren't allowed according to
+    # https://packaging.python.org/en/latest/specifications/name-normalization/.
+    distribution = "File--Name-Escaping",
+    incompatible_normalize_name = True,
+    incompatible_normalize_version = True,
+    python_tag = "py3",
+    version = "v0.0.1.RC1+ubuntu-r7",
+    deps = [":example_pkg"],
+)
+
 py_test(
     name = "wheel_test",
     srcs = ["wheel_test.py"],
@@ -277,6 +315,7 @@
         ":custom_package_root_multi_prefix_reverse_order",
         ":customized",
         ":filename_escaping",
+        ":legacy_filename_escaping",
         ":minimal_with_py_library",
         ":minimal_with_py_library_with_stamp",
         ":minimal_with_py_package",

diff --git a/examples/wheel/wheel_test.py b/examples/wheel/wheel_test.py
index f51a0ec..671bd8a 100644
--- a/examples/wheel/wheel_test.py
+++ b/examples/wheel/wheel_test.py

@@ -153,13 +153,51 @@
 second = second.main:s""",
             )
 
+    def test_legacy_filename_escaping(self):
+        filename = os.path.join(
+            os.environ['TEST_SRCDIR'],
+            'rules_python',
+            'examples',
+            'wheel',
+            'file_name_escaping-0.0.1_r7-py3-none-any.whl',
+        )
+        with zipfile.ZipFile(filename) as zf:
+            self.assertEqual(
+                zf.namelist(),
+                [
+                    'examples/wheel/lib/data.txt',
+                    'examples/wheel/lib/module_with_data.py',
+                    'examples/wheel/lib/simple_module.py',
+                    'examples/wheel/main.py',
+                    # PEP calls for replacing only in the archive filename.
+                    # Alas setuptools also escapes in the dist-info directory
+                    # name, so let's be compatible.
+                    'file_name_escaping-0.0.1_r7.dist-info/WHEEL',
+                    'file_name_escaping-0.0.1_r7.dist-info/METADATA',
+                    'file_name_escaping-0.0.1_r7.dist-info/RECORD',
+                ],
+            )
+            metadata_contents = zf.read(
+                'file_name_escaping-0.0.1_r7.dist-info/METADATA'
+            )
+            self.assertEqual(
+                metadata_contents,
+                b"""\
+Metadata-Version: 2.1
+Name: file~~name-escaping
+Version: 0.0.1-r7
+
+UNKNOWN
+""",
+            )
+
     def test_filename_escaping(self):
         filename = os.path.join(
             os.environ["TEST_SRCDIR"],
             "rules_python",
             "examples",
             "wheel",
-            "file_name_escaping-0.0.1_r7-py3-none-any.whl",
+            "file_name_escaping-0.0.1rc1+ubuntu.r7-py3-none-any.whl",
         )
         with zipfile.ZipFile(filename) as zf:
             self.assertEqual(
@@ -172,20 +210,20 @@
                     # PEP calls for replacing only in the archive filename.
                     # Alas setuptools also escapes in the dist-info directory
                     # name, so let's be compatible.
-                    "file_name_escaping-0.0.1_r7.dist-info/WHEEL",
-                    "file_name_escaping-0.0.1_r7.dist-info/METADATA",
-                    "file_name_escaping-0.0.1_r7.dist-info/RECORD",
+                    "file_name_escaping-0.0.1rc1+ubuntu.r7.dist-info/WHEEL",
+                    "file_name_escaping-0.0.1rc1+ubuntu.r7.dist-info/METADATA",
+                    "file_name_escaping-0.0.1rc1+ubuntu.r7.dist-info/RECORD",
                 ],
             )
             metadata_contents = zf.read(
-                "file_name_escaping-0.0.1_r7.dist-info/METADATA"
+                "file_name_escaping-0.0.1rc1+ubuntu.r7.dist-info/METADATA"
             )
             self.assertEqual(
                 metadata_contents,
                 b"""\
 Metadata-Version: 2.1
-Name: file~~name-escaping
-Version: 0.0.1-r7
+Name: File--Name-Escaping
+Version: 0.0.1rc1+ubuntu.r7
 
 UNKNOWN
 """,

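The expected wheel file name in `test_filename_escaping` follows from normalizing `v0.0.1.RC1+ubuntu-r7` to `0.0.1rc1+ubuntu.r7`. As a rough illustration only, here is a minimal Python sketch of that normalization for a subset of PEP 440 (optional `v` prefix, epoch, release, pre-release and local version). It is not the Starlark `normalize_pep440` added by this commit: the function name `normalize_version_subset` is hypothetical, and post-releases, dev-releases and stamping placeholders such as `{BUILD_TIMESTAMP}` are deliberately left out.

```python
import re

# Map every accepted pre-release spelling to its normalized form.
_PRE_SPELLINGS = {
    "alpha": "a", "a": "a",
    "beta": "b", "b": "b",
    "c": "rc", "pre": "rc", "preview": "rc", "rc": "rc",
}

# Subset of the PEP 440 grammar; the input is lowercased before matching.
_VERSION_RE = re.compile(
    r"""
    v?                                         # optional leading "v"
    (?:(?P<epoch>[0-9]+)!)?                    # optional epoch
    (?P<release>[0-9]+(?:\.[0-9]+)*)           # release segment
    (?:[-_.]?
       (?P<pre_l>alpha|a|beta|b|preview|pre|c|rc)
       [-_.]?
       (?P<pre_n>[0-9]+)?
    )?                                         # optional pre-release
    (?:\+(?P<local>[a-z0-9]+(?:[-_.][a-z0-9]+)*))?  # optional local version
    """,
    re.VERBOSE,
)


def normalize_version_subset(version):
    """Normalize a version string (hypothetical helper, subset of PEP 440)."""
    match = _VERSION_RE.fullmatch(version.strip().lower())
    if match is None:
        raise ValueError("Unsupported or invalid version: %r" % version)

    parts = []

    # An explicit epoch of 0 is dropped.
    epoch = match.group("epoch")
    if epoch is not None and int(epoch) != 0:
        parts.append("%d!" % int(epoch))

    # Integer normalization strips leading zeros from each release number.
    release = match.group("release").split(".")
    parts.append(".".join(str(int(number)) for number in release))

    # Pre-release: canonical spelling, no separators, implicit number 0.
    if match.group("pre_l"):
        number = int(match.group("pre_n") or 0)
        parts.append(_PRE_SPELLINGS[match.group("pre_l")] + str(number))

    # Local version: separators become ".", numeric segments lose leading zeros.
    local = match.group("local")
    if local:
        segments = re.split(r"[-_.]", local)
        parts.append(
            "+" + ".".join(str(int(s)) if s.isdigit() else s for s in segments)
        )

    return "".join(parts)
```

With this sketch, `normalize_version_subset("v0.0.1.RC1+ubuntu-r7")` returns `"0.0.1rc1+ubuntu.r7"`, while `"0.1-2-3"` raises `ValueError`, in line with the commit message's note that such versions are no longer accepted.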
diff --git a/python/BUILD.bazel b/python/BUILD.bazel
index 34b4de3..5ff752e 100644
--- a/python/BUILD.bazel
+++ b/python/BUILD.bazel

@@ -77,6 +77,7 @@
         ":py_binary_bzl",
         "//python/private:py_package.bzl",
         "//python/private:py_wheel_bzl",
+        "//python/private:py_wheel_normalize_pep440.bzl",
         "//python/private:stamp_bzl",
         "//python/private:util_bzl",
     ],

diff --git a/python/private/BUILD.bazel b/python/private/BUILD.bazel
index d161058..f6e3012 100644
--- a/python/private/BUILD.bazel
+++ b/python/private/BUILD.bazel

@@ -236,6 +236,7 @@
         "coverage.patch",
         "py_package.bzl",
         "py_wheel.bzl",
+        "py_wheel_normalize_pep440.bzl",
         "reexports.bzl",
         "stamp.bzl",
         "util.bzl",

diff --git a/python/private/py_wheel.bzl b/python/private/py_wheel.bzl
index d8bceab..4152e08 100644
--- a/python/private/py_wheel.bzl
+++ b/python/private/py_wheel.bzl

@@ -16,6 +16,7 @@
 
 load("//python/private:stamp.bzl", "is_stamping_enabled")
 load(":py_package.bzl", "py_package_lib")
+load(":py_wheel_normalize_pep440.bzl", "normalize_pep440")
 
 PyWheelInfo = provider(
     doc = "Information about a wheel produced by `py_wheel`",
@@ -117,6 +118,29 @@
     ),
 }
 
+_feature_flags = {
+    "incompatible_normalize_name": attr.bool(
+        default = False,
+        doc = """\
+Normalize the package distribution name according to latest
+Python packaging standards.
+
+See https://packaging.python.org/en/latest/specifications/binary-distribution-format/#escaping-and-unicode
+and https://packaging.python.org/en/latest/specifications/name-normalization/.
+
+Apart from the valid names according to the above, we also accept
+'{' and '}', which may be used as placeholders for stamping.
+""",
+    ),
+    "incompatible_normalize_version": attr.bool(
+        default = False,
+        doc = "Normalize the package version according to PEP440 standard. " +
+              "With this option set to True, if the user wants to pass any " +
+              "stamp variables, they have to be enclosed in '{}', e.g. " +
+              "'{BUILD_TIMESTAMP}'.",
+    ),
+}
+
 _requirement_attrs = {
     "extra_requires": attr.string_list_dict(
         doc = "List of optional requirements for this package",
@@ -203,6 +227,42 @@
 }
 _DEFAULT_DESCRIPTION_FILE_TYPE = "text/plain"
 
+def _escape_filename_distribution_name(name):
+    """Escape the distribution name component of a filename.
+
+    See https://packaging.python.org/en/latest/specifications/binary-distribution-format/#escaping-and-unicode
+    and https://packaging.python.org/en/latest/specifications/name-normalization/.
+
+    Apart from the valid names according to the above, we also accept
+    '{' and '}', which may be used as placeholders for stamping.
+    """
+    escaped = ""
+    for character in name.elems():
+        if character.isalnum() or character in ["{", "}"]:
+            escaped += character.lower()
+        elif character in ["-", "_", "."]:
+            if escaped == "":
+                fail(
+                    "A valid name must start with a letter or number.",
+                    "Name '%s' does not." % name,
+                )
+            elif escaped.endswith("_"):
+                pass
+            else:
+                escaped += "_"
+        else:
+            fail(
+                "A valid name consists only of ASCII letters ",
+                "and numbers, period, underscore and hyphen.",
+                "Name '%s' has bad character '%s'." % (name, character),
+            )
+    if escaped.endswith("_"):
+        fail(
+            "A valid name must end with a letter or number.",
+            "Name '%s' does not." % name,
+        )
+    return escaped
+
 def _escape_filename_segment(segment):
     """Escape a segment of the wheel filename.
 
@@ -237,13 +297,25 @@
     python_tag = _replace_make_variables(ctx.attr.python_tag, ctx)
     version = _replace_make_variables(ctx.attr.version, ctx)
 
-    outfile = ctx.actions.declare_file("-".join([
-        _escape_filename_segment(ctx.attr.distribution),
-        _escape_filename_segment(version),
+    filename_segments = []
+
+    if ctx.attr.incompatible_normalize_name:
+        filename_segments.append(_escape_filename_distribution_name(ctx.attr.distribution))
+    else:
+        filename_segments.append(_escape_filename_segment(ctx.attr.distribution))
+
+    if ctx.attr.incompatible_normalize_version:
+        filename_segments.append(normalize_pep440(version))
+    else:
+        filename_segments.append(_escape_filename_segment(version))
+
+    filename_segments.extend([
         _escape_filename_segment(python_tag),
         _escape_filename_segment(abi),
         _escape_filename_segment(ctx.attr.platform),
-    ]) + ".whl")
+    ])
+
+    outfile = ctx.actions.declare_file("-".join(filename_segments) + ".whl")
 
     name_file = ctx.actions.declare_file(ctx.label.name + ".name")
 
@@ -272,6 +344,10 @@
     args.add("--out", outfile)
     args.add("--name_file", name_file)
     args.add_all(ctx.attr.strip_path_prefixes, format_each = "--strip_path_prefix=%s")
+    if ctx.attr.incompatible_normalize_name:
+        args.add("--incompatible_normalize_name")
+    if ctx.attr.incompatible_normalize_version:
+        args.add("--incompatible_normalize_version")
 
     # Pass workspace status files if stamping is enabled
     if is_stamping_enabled(ctx.attr):
@@ -423,6 +499,7 @@
             ),
         },
         _distribution_attrs,
+        _feature_flags,
         _requirement_attrs,
         _entrypoint_attrs,
         _other_attrs,

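For readers who want to try the name escaping outside Bazel, the following is a hedged Python transliteration of `_escape_filename_distribution_name` from the diff above. The function name `escape_distribution_name` is hypothetical, and `ValueError` stands in for Starlark's `fail()`. Python's `str.isalnum` also accepts non-ASCII letters, so the sketch checks against an explicit ASCII set to match the rule quoted in the error message.

```python
import string

# ASCII letters and digits only, per the name-normalization rule.
_ASCII_ALNUM = set(string.ascii_letters + string.digits)


def escape_distribution_name(name):
    """Escape the distribution name component of a wheel file name.

    Letters are lowercased, runs of "-", "_" and "." collapse into a
    single "_", and "{" / "}" pass through as stamping placeholders.
    """
    escaped = ""
    for character in name:
        if character in _ASCII_ALNUM or character in ("{", "}"):
            escaped += character.lower()
        elif character in ("-", "_", "."):
            if escaped == "":
                raise ValueError(
                    "A valid name must start with a letter or number. "
                    "Name '%s' does not." % name
                )
            # Collapse a run of separators into one underscore.
            if not escaped.endswith("_"):
                escaped += "_"
        else:
            raise ValueError(
                "A valid name consists only of ASCII letters and numbers, "
                "period, underscore and hyphen. "
                "Name '%s' has bad character '%s'." % (name, character)
            )
    if escaped.endswith("_"):
        raise ValueError(
            "A valid name must end with a letter or number. "
            "Name '%s' does not." % name
        )
    return escaped
```

Here `escape_distribution_name("File--Name-Escaping")` returns `"file_name_escaping"`, the distribution segment the `filename_escaping` example expects, whereas the legacy example's `"file~~name-escaping"` is rejected because of the `~` characters.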
diff --git a/python/private/py_wheel_normalize_pep440.bzl b/python/private/py_wheel_normalize_pep440.bzl
new file mode 100644
index 0000000..9566348
--- /dev/null
+++ b/python/private/py_wheel_normalize_pep440.bzl

@@ -0,0 +1,519 @@
+# Copyright 2023 The Bazel Authors. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"Implementation of PEP440 version string normalization"
+
+def mkmethod(self, method):
+    """Bind a struct as the first arg to a function.
+
+    This is loosely equivalent to creating a bound method of a class.
+    """
+    return lambda *args, **kwargs: method(self, *args, **kwargs)
+
+def _isdigit(token):
+    return token.isdigit()
+
+def _isalnum(token):
+    return token.isalnum()
+
+def _lower(token):
+    # PEP 440: Case sensitivity
+    return token.lower()
+
+def _is(reference):
+    """Predicate testing a token for equality with `reference`."""
+    return lambda token: token == reference
+
+def _is_not(reference):
+    """Predicate testing a token for inequality with `reference`."""
+    return lambda token: token != reference
+
+def _in(reference):
+    """Predicate testing if a token is in the list `reference`."""
+    return lambda token: token in reference
+
+def _ctx(start):
+    return {"norm": "", "start": start}
+
+def _open_context(self):
+    """Open a new parsing ctx.
+
+    If the current parsing step succeeds, call self.accept().
+    If the current parsing step fails, call self.discard() to
+    go back to how it was before we opened a new ctx.
+
+    Args:
+      self: The normalizer.
+    """
+    self.contexts.append(_ctx(_context(self)["start"]))
+    return self.contexts[-1]
+
+def _accept(self):
+    """Close the current ctx successfully and merge the results."""
+    finished = self.contexts.pop()
+    self.contexts[-1]["norm"] += finished["norm"]
+    self.contexts[-1]["start"] = finished["start"]
+    return True
+
+def _context(self):
+    return self.contexts[-1]
+
+def _discard(self):
+    self.contexts.pop()
+    return False
+
+def _new(input):
+    """Create a new normalizer"""
+    self = struct(
+        input = input,
+        contexts = [_ctx(0)],
+    )
+
+    public = struct(
+        # methods: keep sorted
+        accept = mkmethod(self, _accept),
+        context = mkmethod(self, _context),
+        discard = mkmethod(self, _discard),
+        open_context = mkmethod(self, _open_context),
+
+        # attributes: keep sorted
+        input = self.input,
+    )
+    return public
+
+def accept(parser, predicate, value):
+    """If `predicate` matches the next token, accept the token.
+
+    Accepting the token means adding it (according to `value`) to
+    the running results maintained in ctx["norm"] and
+    advancing the cursor in ctx["start"] to the next token in
+    `version`.
+
+    Args:
+      parser: The normalizer.
+      predicate: function taking a token and returning a boolean
+        saying if we want to accept the token.
+      value: the string to add if there's a match, or, if `value`
+        is a function, the function to apply to the current token
+        to get the string to add.
+
+    Returns:
+      whether a token was accepted.
+    """
+
+    ctx = parser.context()
+
+    if ctx["start"] >= len(parser.input):
+        return False
+
+    token = parser.input[ctx["start"]]
+
+    if predicate(token):
+        if type(value) in ["function", "builtin_function_or_method"]:
+            value = value(token)
+
+        ctx["norm"] += value
+        ctx["start"] += 1
+        return True
+
+    return False
+
+def accept_placeholder(parser):
+    """Accept a Bazel placeholder.
+
+    Placeholders aren't actually part of PEP 440, but are used for
+    stamping purposes. A placeholder might be
+    ``{BUILD_TIMESTAMP}``, for instance. We'll accept these as
+    they are, assuming they will expand to something that makes
+    sense where they appear. Before the stamping has happened, a
+    resulting wheel file name containing a placeholder will not
+    actually be valid.
+
+    Args:
+      parser: The normalizer.
+
+    Returns:
+      whether a placeholder was accepted.
+    """
+    ctx = parser.open_context()
+
+    if not accept(parser, _is("{"), str):
+        return parser.discard()
+
+    start = ctx["start"]
+    for _ in range(start, len(parser.input) + 1):
+        if not accept(parser, _is_not("}"), str):
+            break
+
+    if not accept(parser, _is("}"), str):
+        return parser.discard()
+
+    return parser.accept()
+
+def accept_digits(parser):
+    """Accept multiple digits (or placeholders).
+
+    Args:
+      parser: The normalizer.
+
+    Returns:
+      whether some digits (or placeholders) were accepted.
+    """
+
+    ctx = parser.open_context()
+    start = ctx["start"]
+
+    for i in range(start, len(parser.input) + 1):
+        if not accept(parser, _isdigit, str) and not accept_placeholder(parser):
+            if i - start >= 1:
+                if ctx["norm"].isdigit():
+                    # PEP 440: Integer Normalization
+                    ctx["norm"] = str(int(ctx["norm"]))
+                return parser.accept()
+            break
+
+    return parser.discard()
+
+def accept_string(parser, string, replacement):
+    """Accept a `string` in the input. Output `replacement`.
+
+    Args:
+      parser: The normalizer.
+      string: The string to search for in the parser input.
+      replacement: The normalized string to use if the string was found.
+
+    Returns:
+      whether the string was accepted.
+    """
+    ctx = parser.open_context()
+
+    for character in string.elems():
+        if not accept(parser, _in([character, character.upper()]), ""):
+            return parser.discard()
+
+    ctx["norm"] = replacement
+
+    return parser.accept()
+
+def accept_alnum(parser):
+    """Accept an alphanumeric sequence.
+
+    Args:
+      parser: The normalizer.
+
+    Returns:
+      whether an alphanumeric sequence was accepted.
+    """
+
+    ctx = parser.open_context()
+    start = ctx["start"]
+
+    for i in range(start, len(parser.input) + 1):
+        if not accept(parser, _isalnum, _lower) and not accept_placeholder(parser):
+            if i - start >= 1:
+                return parser.accept()
+            break
+
+    return parser.discard()
+
+def accept_dot_number(parser):
+    """Accept a dot followed by digits.
+
+    Args:
+      parser: The normalizer.
+
+    Returns:
+      whether a dot+digits pair was accepted.
+    """
+    parser.open_context()
+
+    if accept(parser, _is("."), ".") and accept_digits(parser):
+        return parser.accept()
+    else:
+        return parser.discard()
+
+def accept_dot_number_sequence(parser):
+    """Accept a sequence of dot+digits.
+
+    Args:
+      parser: The normalizer.
+
+    Returns:
+      whether a sequence of dot+digits pairs was accepted.
+    """
+    ctx = parser.context()
+    start = ctx["start"]
+    i = start
+
+    for i in range(start, len(parser.input) + 1):
+        if not accept_dot_number(parser):
+            break
+    return i - start >= 1
+
+def accept_separator_alnum(parser):
+    """Accept a separator followed by an alphanumeric string.
+
+    Args:
+      parser: The normalizer.
+
+    Returns:
+      whether a separator and an alphanumeric string were accepted.
+    """
+    parser.open_context()
+
+    # PEP 440: Local version segments
+    if (
+        accept(parser, _in([".", "-", "_"]), ".") and
+        (accept_digits(parser) or accept_alnum(parser))
+    ):
+        return parser.accept()
+
+    return parser.discard()
+
+def accept_separator_alnum_sequence(parser):
+    """Accept a sequence of separator+alphanumeric.
+
+    Args:
+      parser: The normalizer.
+
+    Returns:
+      whether a sequence of separator+alphanumerics was accepted.
+    """
+    ctx = parser.context()
+    start = ctx["start"]
+    i = start
+
+    for i in range(start, len(parser.input) + 1):
+        if not accept_separator_alnum(parser):
+            break
+
+    return i - start >= 1
+
+def accept_epoch(parser):
+    """PEP 440: Version epochs.
+
+    Args:
+      parser: The normalizer.
+
+    Returns:
+      whether a PEP 440 epoch identifier was accepted.
+    """
+    ctx = parser.open_context()
+    if accept_digits(parser) and accept(parser, _is("!"), "!"):
+        if ctx["norm"] == "0!":
+            ctx["norm"] = ""
+        return parser.accept()
+    else:
+        return parser.discard()
+
+def accept_release(parser):
+    """Accept the release segment, numbers separated by dots.
+
+    Args:
+      parser: The normalizer.
+
+    Returns:
+      whether a release segment was accepted.
+    """
+    parser.open_context()
+
+    if not accept_digits(parser):
+        return parser.discard()
+
+    accept_dot_number_sequence(parser)
+    return parser.accept()
+
+def accept_pre_l(parser):
+    """PEP 440: Pre-release spelling.
+
+    Args:
+      parser: The normalizer.
+
+    Returns:
+      whether a prerelease keyword was accepted.
+    """
+    parser.open_context()
+
+    if (
+        accept_string(parser, "alpha", "a") or
+        accept_string(parser, "a", "a") or
+        accept_string(parser, "beta", "b") or
+        accept_string(parser, "b", "b") or
+        accept_string(parser, "c", "rc") or
+        accept_string(parser, "preview", "rc") or
+        accept_string(parser, "pre", "rc") or
+        accept_string(parser, "rc", "rc")
+    ):
+        return parser.accept()
+    else:
+        return parser.discard()
+
+def accept_prerelease(parser):
+    """PEP 440: Pre-releases.
+
+    Args:
+      parser: The normalizer.
+
+    Returns:
+      whether a prerelease identifier was accepted.
+    """
+    ctx = parser.open_context()
+
+    # PEP 440: Pre-release separators
+    accept(parser, _in(["-", "_", "."]), "")
+
+    if not accept_pre_l(parser):
+        return parser.discard()
+
+    accept(parser, _in(["-", "_", "."]), "")
+
+    if not accept_digits(parser):
+        # PEP 440: Implicit pre-release number
+        ctx["norm"] += "0"
+
+    return parser.accept()
+
+def accept_implicit_postrelease(parser):
+    """PEP 440: Implicit post releases.
+
+    Args:
+      parser: The normalizer.
+
+    Returns:
+      whether an implicit postrelease identifier was accepted.
+    """
+    ctx = parser.open_context()
+
+    if accept(parser, _is("-"), "") and accept_digits(parser):
+        ctx["norm"] = ".post" + ctx["norm"]
+        return parser.accept()
+
+    return parser.discard()
+
+def accept_explicit_postrelease(parser):
+    """PEP 440: Post-releases.
+
+    Args:
+      parser: The normalizer.
+
+    Returns:
+      whether an explicit postrelease identifier was accepted.
+    """
+    ctx = parser.open_context()
+
+    # PEP 440: Post release separators
+    if not accept(parser, _in(["-", "_", "."]), "."):
+        ctx["norm"] += "."
+
+    # PEP 440: Post release spelling
+    if (
+        accept_string(parser, "post", "post") or
+        accept_string(parser, "rev", "post") or
+        accept_string(parser, "r", "post")
+    ):
+        accept(parser, _in(["-", "_", "."]), "")
+
+        if not accept_digits(parser):
+            # PEP 440: Implicit post release number
+            ctx["norm"] += "0"
+
+        return parser.accept()
+
+    return parser.discard()
+
+def accept_postrelease(parser):
+    """PEP 440: Post-releases.
+
+    Args:
+      parser: The normalizer.
+
+    Returns:
+      whether a postrelease identifier was accepted.
+    """
+    parser.open_context()
+
+    if accept_implicit_postrelease(parser) or accept_explicit_postrelease(parser):
+        return parser.accept()
+
+    return parser.discard()
+
+def accept_devrelease(parser):
+    """PEP 440: Developmental releases.
+
+    Args:
+      parser: The normalizer.
+
+    Returns:
+      whether a developmental release identifier was accepted.
+    """
+    ctx = parser.open_context()
+
+    # PEP 440: Development release separators
+    if not accept(parser, _in(["-", "_", "."]), "."):
+        ctx["norm"] += "."
+
+    if accept_string(parser, "dev", "dev"):
+        accept(parser, _in(["-", "_", "."]), "")
+
+        if not accept_digits(parser):
+            # PEP 440: Implicit development release number
+            ctx["norm"] += "0"
+
+        return parser.accept()
+
+    return parser.discard()
+
+def accept_local(parser):
+    """PEP 440: Local version identifiers.
+
+    Args:
+      parser: The normalizer.
+
+    Returns:
+      whether a local version identifier was accepted.
+    """
+    parser.open_context()
+
+    if accept(parser, _is("+"), "+") and accept_alnum(parser):
+        accept_separator_alnum_sequence(parser)
+        return parser.accept()
+
+    return parser.discard()
+
+def normalize_pep440(version):
+    """Normalize a version per PEP 440, for use as the version component of a filename.
+
+    See https://packaging.python.org/en/latest/specifications/binary-distribution-format/#escaping-and-unicode
+    and https://peps.python.org/pep-0440/
+
+    Args:
+      version: version string to be normalized according to PEP 440.
+
+    Returns:
+      string containing the normalized version.
+    """
+    parser = _new(version.strip())  # PEP 440: Leading and Trailing Whitespace
+    accept(parser, _is("v"), "")  # PEP 440: Preceding v character
+    accept_epoch(parser)
+    accept_release(parser)
+    accept_prerelease(parser)
+    accept_postrelease(parser)
+    accept_devrelease(parser)
+    accept_local(parser)
+    if parser.input[parser.context()["start"]:]:
+        fail(
+            "Failed to parse PEP 440 version identifier '%s'." % parser.input,
+            "Parse error at '%s'" % parser.input[parser.context()["start"]:],
+        )
+    return parser.context()["norm"]

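The `accept_*` helpers in this file share one pattern: open a context, try to consume input, then either accept (merge the result into the parent context) or discard (backtrack). The following Python sketch approximates that pattern for the epoch rule only. The `Parser` class, `accept_char`, and the internals of `accept_digits` are assumptions made for illustration, because the normalizer's implementation is not shown in this part of the diff; only the body of `accept_epoch` mirrors the Starlark code above.

```python
class Parser:
    """Hypothetical stand-in for the Starlark normalizer (an assumption)."""

    def __init__(self, text):
        self.input = text
        # Each context records its position in the input and the
        # normalized output produced so far.
        self.contexts = [{"start": 0, "norm": ""}]

    def context(self):
        return self.contexts[-1]

    def open_context(self):
        self.contexts.append({"start": self.context()["start"], "norm": ""})
        return self.context()

    def accept(self):
        # Merge the innermost context into its parent and advance the parent.
        done = self.contexts.pop()
        self.context()["norm"] += done["norm"]
        self.context()["start"] = done["start"]
        return True

    def discard(self):
        # Drop the innermost context; the parent position stays unchanged.
        self.contexts.pop()
        return False


def accept_digits(parser):
    """Consume a run of ASCII digits and emit it without leading zeros."""
    ctx = parser.context()
    start = ctx["start"]
    end = start
    while end < len(parser.input) and parser.input[end] in "0123456789":
        end += 1
    if end == start:
        return False
    ctx["norm"] += str(int(parser.input[start:end]))
    ctx["start"] = end
    return True


def accept_char(parser, char):
    """Consume exactly one expected character (hypothetical helper)."""
    ctx = parser.context()
    if parser.input[ctx["start"]:ctx["start"] + 1] == char:
        ctx["norm"] += char
        ctx["start"] += 1
        return True
    return False


def accept_epoch(parser):
    """Mirror of the Starlark accept_epoch: digits followed by '!'."""
    ctx = parser.open_context()
    if accept_digits(parser) and accept_char(parser, "!"):
        if ctx["norm"] == "0!":
            ctx["norm"] = ""  # an epoch of 0 is implicit and is dropped
        return parser.accept()
    return parser.discard()
```

With the input `001!2.0`, this sketch consumes `001!` and records `1!` as the normalized prefix. With `1.2`, the digit is consumed inside the child context, the `!` is missing, and `discard()` leaves the parent position at 0 so that the release rule can start from the beginning.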
diff --git a/tests/py_wheel/py_wheel_tests.bzl b/tests/py_wheel/py_wheel_tests.bzl
index e580732..3c03a1b 100644
--- a/tests/py_wheel/py_wheel_tests.bzl
+++ b/tests/py_wheel/py_wheel_tests.bzl

@@ -16,7 +16,9 @@
 load("@rules_testing//lib:analysis_test.bzl", "analysis_test", "test_suite")
 load("@rules_testing//lib:util.bzl", rt_util = "util")
 load("//python:packaging.bzl", "py_wheel")
+load("//python/private:py_wheel_normalize_pep440.bzl", "normalize_pep440")  # buildifier: disable=bzl-visibility
 
+_basic_tests = []
 _tests = []
 
 def _test_metadata(name):
@@ -92,8 +94,109 @@
 
 _tests.append(_test_content_type_from_description)
 
+def _test_pep440_normalization(env):
+    prefixes = ["v", "  v", " \t\r\nv"]
+    epochs = {
+        "": ["", "0!", "00!"],
+        "1!": ["1!", "001!"],
+        "200!": ["200!", "00200!"],
+    }
+    releases = {
+        "0.1": ["0.1", "0.01"],
+        "2023.7.19": ["2023.7.19", "2023.07.19"],
+    }
+    pres = {
+        "": [""],
+        "a0": ["a", ".a", "-ALPHA0", "_alpha0", ".a0"],
+        "a4": ["alpha4", ".a04"],
+        "b0": ["b", ".b", "-BETA0", "_beta0", ".b0"],
+        "b5": ["beta05", ".b5"],
+        "rc0": ["C", "_c0", "RC", "_rc0", "-preview_0"],
+    }
+    explicit_posts = {
+        "": [""],
+        ".post0": [],
+        ".post1": [".post1", "-r1", "_rev1"],
+    }
+    implicit_posts = [[".post1", "-1"], [".post2", "-2"]]
+    devs = {
+        "": [""],
+        ".dev0": ["dev", "-DEV", "_Dev-0"],
+        ".dev9": ["DEV9", ".dev09", ".dev9"],
+        ".dev{BUILD_TIMESTAMP}": [
+            "-DEV{BUILD_TIMESTAMP}",
+            "_dev_{BUILD_TIMESTAMP}",
+        ],
+    }
+    locals = {
+        "": [""],
+        "+ubuntu.7": ["+Ubuntu_7", "+ubuntu-007"],
+        "+ubuntu.r007": ["+Ubuntu_R007"],
+    }
+    epochs = [
+        [normalized_epoch, input_epoch]
+        for normalized_epoch, input_epochs in epochs.items()
+        for input_epoch in input_epochs
+    ]
+    releases = [
+        [normalized_release, input_release]
+        for normalized_release, input_releases in releases.items()
+        for input_release in input_releases
+    ]
+    pres = [
+        [normalized_pre, input_pre]
+        for normalized_pre, input_pres in pres.items()
+        for input_pre in input_pres
+    ]
+    explicit_posts = [
+        [normalized_post, input_post]
+        for normalized_post, input_posts in explicit_posts.items()
+        for input_post in input_posts
+    ]
+    pres_and_posts = [
+        [normalized_pre + normalized_post, input_pre + input_post]
+        for normalized_pre, input_pre in pres
+        for normalized_post, input_post in explicit_posts
+    ] + [
+        [normalized_pre + normalized_post, input_pre + input_post]
+        for normalized_pre, input_pre in pres
+        for normalized_post, input_post in implicit_posts
+        if input_pre == "" or input_pre[-1].isdigit()
+    ]
+    devs = [
+        [normalized_dev, input_dev]
+        for normalized_dev, input_devs in devs.items()
+        for input_dev in input_devs
+    ]
+    locals = [
+        [normalized_local, input_local]
+        for normalized_local, input_locals in locals.items()
+        for input_local in input_locals
+    ]
+    postfixes = ["", "  ", " \t\r\n"]
+    i = 0
+    for nepoch, iepoch in epochs:
+        for nrelease, irelease in releases:
+            for nprepost, iprepost in pres_and_posts:
+                for ndev, idev in devs:
+                    for nlocal, ilocal in locals:
+                        prefix = prefixes[i % len(prefixes)]
+                        postfix = postfixes[(i // len(prefixes)) % len(postfixes)]
+                        env.expect.that_str(
+                            normalize_pep440(
+                                prefix + iepoch + irelease + iprepost +
+                                idev + ilocal + postfix,
+                            ),
+                        ).equals(
+                            nepoch + nrelease + nprepost + ndev + nlocal,
+                        )
+                        i += 1
+
+_basic_tests.append(_test_pep440_normalization)
+
 def py_wheel_test_suite(name):
     test_suite(
         name = name,
+        basic_tests = _basic_tests,
         tests = _tests,
     )

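`_test_pep440_normalization` builds its cases by flattening each `{normalized: [inputs]}` table into pairs and then combining one pair per segment. The Python sketch below shows that idea on a small subset of the tables from the test. `flatten` and `build_cases` are hypothetical helper names, and `itertools.product` stands in for the nested Starlark loops; the sketch only generates `(input, expected)` pairs and does not call the normalizer.

```python
import itertools


def flatten(table):
    """Turn {normalized: [inputs...]} into [(normalized, input), ...]."""
    return [(norm, raw) for norm, raws in table.items() for raw in raws]


def build_cases(*tables):
    """Combine one (normalized, input) pair per segment, in segment order."""
    cases = []
    for combo in itertools.product(*(flatten(t) for t in tables)):
        expected = "".join(norm for norm, _ in combo)
        raw = "".join(raw for _, raw in combo)
        cases.append((raw, expected))
    return cases


# Subsets of the epoch, release, and dev tables used in the test.
epochs = {"": ["", "0!"], "1!": ["1!", "001!"]}
releases = {"0.1": ["0.1", "0.01"]}
devs = {"": [""], ".dev0": ["dev", "-DEV"]}

cases = build_cases(epochs, releases, devs)
```

Each entry pairs a raw version string with the normalized form the test expects, for example `("001!0.01-DEV", "1!0.1.dev0")`.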
diff --git a/tools/BUILD.bazel b/tools/BUILD.bazel
index fd951d9..51bd56d 100644
--- a/tools/BUILD.bazel
+++ b/tools/BUILD.bazel

@@ -21,6 +21,7 @@
 py_binary(
     name = "wheelmaker",
     srcs = ["wheelmaker.py"],
+    deps = ["@pypi__packaging//:lib"],
 )
 
 filegroup(

diff --git a/tools/wheelmaker.py b/tools/wheelmaker.py
index 63b833f..dce5406 100644
--- a/tools/wheelmaker.py
+++ b/tools/wheelmaker.py

@@ -33,10 +33,67 @@
 
 
 def escape_filename_segment(segment):
-    """Escapes a filename segment per https://www.python.org/dev/peps/pep-0427/#escaping-and-unicode"""
+    """Escapes a filename segment per https://www.python.org/dev/peps/pep-0427/#escaping-and-unicode
+
+    This is a legacy function, kept for backwards compatibility,
+    and may be removed in the future. See `escape_filename_distribution_name`
+    and `normalize_pep440` for the modern alternatives.
+    """
     return re.sub(r"[^\w\d.]+", "_", segment, re.UNICODE)
 
 
+def normalize_package_name(name):
+    """Normalize a package name according to the Python Packaging User Guide.
+
+    See https://packaging.python.org/en/latest/specifications/name-normalization/
+    """
+    return re.sub(r"[-_.]+", "-", name).lower()
+
+
+def escape_filename_distribution_name(name):
+    """Escape the distribution name component of a filename.
+
+    See https://packaging.python.org/en/latest/specifications/binary-distribution-format/#escaping-and-unicode
+    """
+    return normalize_package_name(name).replace("-", "_")
+
+
+def normalize_pep440(version):
+    """Normalize version according to PEP 440, with fallback for placeholders.
+
+    If there's a placeholder in braces, such as {BUILD_TIMESTAMP},
+    replace it with 0. Such placeholders can be used with stamping, in
+    which case they would have been resolved already by now; if they
+    haven't, we're doing an unstamped build, but we still need to
+    produce a valid version. If such replacements are made, the
+    original version string, sanitized to dot-separated alphanumerics,
+    is appended as a local version segment, so you understand what
+    placeholder was involved.
+
+    If that still doesn't produce a valid version, use version 0 and
+    append the original version string, sanitized to dot-separated
+    alphanumerics, as a local version segment.
+
+    """
+
+    import packaging.version
+
+    try:
+        return str(packaging.version.Version(version))
+    except packaging.version.InvalidVersion:
+        pass
+
+    sanitized = re.sub(r'[^a-z0-9]+', '.', version.lower()).strip('.')
+    substituted = re.sub(r'\{\w+\}', '0', version)
+    delimiter = '.' if '+' in substituted else '+'
+    try:
+        return str(
+            packaging.version.Version(f'{substituted}{delimiter}{sanitized}')
+        )
+    except packaging.version.InvalidVersion:
+        return str(packaging.version.Version(f'0+{sanitized}'))
+
+
 class WheelMaker(object):
     def __init__(
         self,
@@ -48,6 +105,8 @@
         platform,
         outfile=None,
         strip_path_prefixes=None,
+        incompatible_normalize_name=False,
+        incompatible_normalize_version=False,
     ):
         self._name = name
         self._version = version
@@ -60,12 +119,30 @@
             strip_path_prefixes if strip_path_prefixes is not None else []
         )
 
-        self._distinfo_dir = (
-            escape_filename_segment(self._name)
-            + "-"
-            + escape_filename_segment(self._version)
-            + ".dist-info/"
-        )
+        if incompatible_normalize_version:
+            self._version = normalize_pep440(self._version)
+            self._escaped_version = self._version
+        else:
+            self._escaped_version = escape_filename_segment(self._version)
+
+        if incompatible_normalize_name:
+            escaped_name = escape_filename_distribution_name(self._name)
+            self._distinfo_dir = (
+                escaped_name + "-" + self._escaped_version + ".dist-info/"
+            )
+            self._wheelname_fragment_distribution_name = escaped_name
+        else:
+            # The legacy behavior escapes the distinfo dir but not the
+            # wheel name. Enable incompatible_normalize_name to fix it.
+            # https://github.com/bazelbuild/rules_python/issues/1132
+            self._distinfo_dir = (
+                escape_filename_segment(self._name)
+                + "-"
+                + self._escaped_version
+                + ".dist-info/"
+            )
+            self._wheelname_fragment_distribution_name = self._name
+
         self._zipfile = None
         # Entries for the RECORD file as (filename, hash, size) tuples.
         self._record = []
@@ -81,7 +158,10 @@
         self._zipfile = None
 
     def wheelname(self) -> str:
-        components = [self._name, self._version]
+        components = [
+            self._wheelname_fragment_distribution_name,
+            self._version,
+        ]
         if self._build_tag:
             components.append(self._build_tag)
         components += [self._python_tag, self._abi, self._platform]
@@ -330,6 +410,10 @@
         help="Pass in the stamp info file for stamping",
     )
 
+    feature_group = parser.add_argument_group("Feature flags")
+    feature_group.add_argument("--incompatible_normalize_name", action="store_true")
+    feature_group.add_argument("--incompatible_normalize_version", action="store_true")
+
     return parser.parse_args(sys.argv[1:])
 
 
@@ -386,6 +470,8 @@
         platform=arguments.platform,
         outfile=arguments.out,
         strip_path_prefixes=strip_prefixes,
+        incompatible_normalize_name=arguments.incompatible_normalize_name,
+        incompatible_normalize_version=arguments.incompatible_normalize_version,
     ) as maker:
         for package_filename, real_filename in all_files:
             maker.add_file(package_filename, real_filename)
@@ -410,8 +496,15 @@
             with open(arguments.metadata_file, "rt", encoding="utf-8") as metadata_file:
                 metadata = metadata_file.read()
 
+        if arguments.incompatible_normalize_version:
+            version_in_metadata = normalize_pep440(version)
+        else:
+            version_in_metadata = version
         maker.add_metadata(
-            metadata=metadata, name=name, description=description, version=version
+            metadata=metadata,
+            name=name,
+            description=description,
+            version=version_in_metadata,
         )
 
         if arguments.entry_points_file:
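As a rough illustration of the two string transformations added to `wheelmaker.py`, the sketch below repeats the name helpers from the diff and isolates the string that the placeholder fallback hands to `packaging.version.Version`. `fallback_candidate` is a hypothetical name introduced here; the real `normalize_pep440` additionally parses the result with `packaging`, which this sketch leaves out so that it runs with the standard library alone.

```python
import re


def normalize_package_name(name):
    """Collapse runs of '-', '_' and '.' into '-' and lowercase the result."""
    return re.sub(r"[-_.]+", "-", name).lower()


def escape_filename_distribution_name(name):
    """Normalized name with '-' replaced by '_', as used in wheel filenames."""
    return normalize_package_name(name).replace("-", "_")


def fallback_candidate(version):
    """Build the string tried after the first PEP 440 parse has failed.

    Hypothetical helper: it covers only the string construction from
    normalize_pep440, not the packaging.version.Version call.
    """
    # The original string reduced to dot-separated lowercase alphanumerics.
    sanitized = re.sub(r"[^a-z0-9]+", ".", version.lower()).strip(".")
    # Unresolved stamping placeholders such as {BUILD_TIMESTAMP} become 0.
    substituted = re.sub(r"\{\w+\}", "0", version)
    # Extend an existing local segment with '.', otherwise start one with '+'.
    delimiter = "." if "+" in substituted else "+"
    return f"{substituted}{delimiter}{sanitized}"


print(escape_filename_distribution_name("My.Package--Name"))
print(fallback_candidate("0.1.dev{BUILD_TIMESTAMP}"))
```

The second call shows why the sanitized original is appended: the unstamped placeholder is replaced by `0`, and the local segment still records which placeholder was involved.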
commit	382b6785a57ee428fc0ec367bcb380c6266cab7b
author	Christian von Schultz <christian@embedl.com>	Thu Oct 05 16:04:09 2023 +0200
committer	GitHub <noreply@github.com>	Thu Oct 05 14:04:09 2023 +0000
tree	d16bd11f5d1288311fb4edfc329c2754c3ea0a06
parent	423c1de345c32d67dfe1e8d43510399ab10dc2c4