fix: Intel ICC C++17 compatibility (#2729)

* CI: Intel icc/icpc via oneAPI

Add testing for Intel icc/icpc via the oneAPI images.
Intel oneAPI is in a late beta stage, currently shipping
oneAPI beta09 with ICC 20.2.

CI: Skip Interpreter Tests for Intel

Cannot find how to add this, neiter the package `libc6-dev` nor
`intel-oneapi-mkl-devel` help when installed to solve this:
```
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - not found
CMake Error at /__t/cmake/3.18.4/x64/cmake-3.18.4-Linux-x86_64/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:165 (message):
  Could NOT find Threads (missing: Threads_FOUND)
Call Stack (most recent call first):
  /__t/cmake/3.18.4/x64/cmake-3.18.4-Linux-x86_64/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:458 (_FPHSA_FAILURE_MESSAGE)
  /__t/cmake/3.18.4/x64/cmake-3.18.4-Linux-x86_64/share/cmake-3.18/Modules/FindThreads.cmake:234 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
  tests/test_embed/CMakeLists.txt:17 (find_package)
```

CI: libc6-dev from GCC for ICC

CI: Run bare metal for oneAPI

CI: Ubuntu 18.04 for oneAPI

CI: Intel +Catch -Eigen

CI: CMake from Apt (ICC tests)

CI: Replace Intel Py with GCC Py

CI: Intel w/o GCC's Eigen

CI: ICC with verbose make

[Debug] Find core dump

tests: use arg{} instead of arg() for Intel

tests: adding a few more missing {}

fix: sync with @tobiasleibner's branch

fix: try ubuntu 20-04

fix: drop exit 1

docs: Apply suggestions from code review

Co-authored-by: Tobias Leibner <tobias.leibner@googlemail.com>

Workaround for ICC enable_if issues

Another workaround for ICC's enable_if issues

fix error in previous commit

Disable one test for the Intel compiler in C++17 mode

Add back one instance of py::arg().noconvert()

Add NOLINT to fix clang-tidy check

Work around for ICC internal error in PYBIND11_EXPAND_SIDE_EFFECTS in C++17 mode

CI: Intel ICC with C++17

docs: pybind11/numpy.h does not require numpy at build time. (#2720)

This is nice enough to be mentioned explicitly in the docs.

docs: Update warning about Python 3.9.0 UB, now that 3.9.1 has been released (#2719)

Adjusting `type_caster<std::reference_wrapper<T>>` to support const/non-const propagation in `cast_op`. (#2705)

* Allow type_caster of std::reference_wrapper<T> to be the same as a native reference.

Before, both std::reference_wrapper<T> and std::reference_wrapper<const T> would
invoke cast_op<type>. This doesn't allow the type_caster<> specialization for T
to distinguish reference_wrapper types from value types.

After, the type_caster<> specialization invokes cast_op<type&>, which allows
reference_wrapper to behave in the same way as a native reference type.

* Add tests/examples for std::reference_wrapper<const T>

* Add tests which use mutable/immutable variants

This test is a chimera; it blends the pybind11 casters with a custom
pytype implementation that supports immutable and mutable calls.

In order to detect the immutable/mutable state, the cast_op needs
to propagate it, even through e.g. std::reference<const T>

Note: This is still a work in progress; some things are crashing,
which likely means that I have a refcounting bug or something else
missing.

* Add/finish tests that distinguish const& from &

Fixes the bugs in my custom python type implementation,
demonstrate test that requires const& and reference_wrapper<const T>
being treated differently from Non-const.

* Add passing a const to non-const method.

* Demonstrate non-const conversion of reference_wrapper in tests.

Apply formatting presubmit check.

* Fix build errors from presubmit checks.

* Try and fix a few more CI errors

* More CI fixes.

* More CI fixups.

* Try and get PyPy to work.

* Additional minor fixups. Getting close to CI green.

* More ci fixes?

* fix clang-tidy warnings from presubmit

* fix more clang-tidy warnings

* minor comment and consistency cleanups

* PyDECREF -> Py_DECREF

* copy/move constructors

* Resolve codereview comments

* more review comment fixes

* review comments: remove spurious &

* Make the test fail even when the static_assert is commented out.

This expands the test_freezable_type_caster a bit by:
1/ adding accessors .is_immutable and .addr to compare identity
from python.
2/ Changing the default cast_op of the type_caster<> specialization
to return a non-const value. In normal codepaths this is a reasonable
default.
3/ adding roundtrip variants to exercise the by reference, by pointer
and by reference_wrapper in all call paths.  In conjunction with 2/, this
demonstrates the failure case of the existing std::reference_wrpper conversion,
which now loses const in a similar way that happens when using the default cast_op_type<>.

* apply presubmit formatting

* Revert inclusion of test_freezable_type_caster

There's some concern that this test is a bit unwieldly because of the use
of the raw <Python.h> functions. Removing for now.

* Add a test that validates const references propagation.

This test verifies that cast_op may be used to correctly detect
const reference types when used with std::reference_wrapper.

* mend

* Review comments based changes.

1. std::add_lvalue_reference<type> -> type&
2. Simplify the test a little more; we're never returning the ConstRefCaster
type so the class_ definition can be removed.

* formatted files again.

* Move const_ref_caster test to builtin_casters

* Review comments: use cast_op and adjust some comments.

* Simplify ConstRefCasted test

I like this version better as it moves the assertion that matters
back into python.

ci: drop pypy2 linux, PGI 20.7, add Python 10 dev (#2724)

* ci: drop pypy2 linux, add Python 10 dev

* ci: fix mistake

* ci: commented-out PGI 20.11, drop 20.7

fix: regression with installed pybind11 overriding local one (#2716)

* fix: regression with installed pybind11 overriding discovered one

Closes #2709

* docs: wording incorrect

style: remove redundant instance->owned = true (#2723)

which was just before set to True in instance->allocate_layout()

fix: also throw in the move-constructor added by the PYBIND11_OBJECT macro, after the argument has been moved-out (if necessary) (#2701)

Make args_are_all_* ICC workarounds unconditional

Disable test_aligned on Intel ICC

Fix test_aligned on Intel ICC

Skip test_python_alreadyset_in_destructor on Intel ICC

Fix test_aligned again

ICC CI: Downgrade pytest

pytest 6 does not capture the `discard_as_unraisable` stderr and
just writes a warning with its content instead.

* refactor: simpler Intel workaround, suggested by @laramiel

* fix: try version with impl to see if it is easier to compile

* docs: update README for ICC

Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja>
Co-authored-by: Henry Schreiner <henryschreineriii@gmail.com>
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 2771035..3e67c8f 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -474,7 +474,7 @@
     name: "🐍 3 • ICC latest • x64"
 
     steps:
-    - uses: actions/checkout@v1
+    - uses: actions/checkout@v2
 
     - name: Add apt repo
       run: |
@@ -488,7 +488,6 @@
       run: sudo apt-get update; sudo apt-get install -y intel-oneapi-compiler-dpcpp-cpp-and-cpp-classic cmake python3-dev python3-numpy python3-pytest python3-pip
 
     - name: Update pip
-      shell: bash
       run: |
         set +e; source /opt/intel/oneapi/setvars.sh; set -e
         python3 -m pip install --upgrade pip
@@ -498,43 +497,69 @@
         set +e; source /opt/intel/oneapi/setvars.sh; set -e
         python3 -m pip install -r tests/requirements.txt --prefer-binary
 
-    - name: Configure
-      shell: bash
+    - name: Configure C++11
       run: |
         set +e; source /opt/intel/oneapi/setvars.sh; set -e
-        cmake -S . -B build     \
+        cmake -S . -B build-11     \
         -DPYBIND11_WERROR=ON    \
         -DDOWNLOAD_CATCH=ON     \
         -DDOWNLOAD_EIGEN=OFF    \
         -DCMAKE_CXX_STANDARD=11             \
         -DCMAKE_CXX_COMPILER=$(which icpc)  \
-        -DCMAKE_VERBOSE_MAKEFILE=ON         \
         -DPYTHON_EXECUTABLE=$(python3 -c "import sys; print(sys.executable)")
 
-    - name: Build
-      shell: bash
+    - name: Build C++11
       run: |
         set +e; source /opt/intel/oneapi/setvars.sh; set -e
-        cmake --build build -j 2
+        cmake --build build-11 -j 2 -v
 
-    - name: Python tests
-      shell: bash
+    - name: Python tests C++11
       run: |
         set +e; source /opt/intel/oneapi/setvars.sh; set -e
         sudo service apport stop
-        cmake --build build --target check
+        cmake --build build-11 --target check
 
-    - name: C++ tests
-      shell: bash
+    - name: C++ tests C++11
       run: |
         set +e; source /opt/intel/oneapi/setvars.sh; set -e
-        cmake --build build --target cpptest
+        cmake --build build-11 --target cpptest
 
-    - name: Interface test
-      shell: bash
+    - name: Interface test C++11
       run: |
         set +e; source /opt/intel/oneapi/setvars.sh; set -e
-        cmake --build build --target test_cmake_build
+        cmake --build build-11 --target test_cmake_build
+
+    - name: Configure C++17
+      run: |
+        set +e; source /opt/intel/oneapi/setvars.sh; set -e
+        cmake -S . -B build-17     \
+        -DPYBIND11_WERROR=ON    \
+        -DDOWNLOAD_CATCH=ON     \
+        -DDOWNLOAD_EIGEN=OFF    \
+        -DCMAKE_CXX_STANDARD=17             \
+        -DCMAKE_CXX_COMPILER=$(which icpc)  \
+        -DPYTHON_EXECUTABLE=$(python3 -c "import sys; print(sys.executable)")
+
+    - name: Build C++17
+      run: |
+        set +e; source /opt/intel/oneapi/setvars.sh; set -e
+        cmake --build build-17 -j 2 -v
+
+    - name: Python tests C++17
+      run: |
+        set +e; source /opt/intel/oneapi/setvars.sh; set -e
+        sudo service apport stop
+        cmake --build build-17 --target check
+
+    - name: C++ tests C++17
+      run: |
+        set +e; source /opt/intel/oneapi/setvars.sh; set -e
+        cmake --build build-17 --target cpptest
+
+    - name: Interface test C++17
+      run: |
+        set +e; source /opt/intel/oneapi/setvars.sh; set -e
+        cmake --build build-17 --target test_cmake_build
 
 
   # Testing on CentOS (manylinux uses a centos base, and this is an easy way
diff --git a/README.rst b/README.rst
index 416fb8d..6c2a255 100644
--- a/README.rst
+++ b/README.rst
@@ -136,11 +136,10 @@
    newer)
 2. GCC 4.8 or newer
 3. Microsoft Visual Studio 2015 Update 3 or newer
-4. Intel C++ compiler 18 or newer
-   (`possible issue <https://github.com/pybind/pybind11/pull/2573>`_ on 20.2)
+4. Intel classic C++ compiler 18 or newer (ICC 20.2 tested in CI)
 5. Cygwin/GCC (previously tested on 2.5.1)
-6. NVCC (CUDA 11.0 tested)
-7. NVIDIA PGI (20.9 tested)
+6. NVCC (CUDA 11.0 tested in CI)
+7. NVIDIA PGI (20.9 tested in CI)
 
 About
 -----
diff --git a/include/pybind11/cast.h b/include/pybind11/cast.h
index ce85d99..009bf8f 100644
--- a/include/pybind11/cast.h
+++ b/include/pybind11/cast.h
@@ -2182,16 +2182,26 @@
     dict m_kwargs;
 };
 
+// [workaround(intel)] Separate function required here
+// We need to put this into a separate function because the Intel compiler
+// fails to compile enable_if_t<!all_of<is_positional<Args>...>::value>
+// (tested with ICC 2021.1 Beta 20200827).
+template <typename... Args>
+constexpr bool args_are_all_positional()
+{
+  return all_of<is_positional<Args>...>::value;
+}
+
 /// Collect only positional arguments for a Python function call
 template <return_value_policy policy, typename... Args,
-          typename = enable_if_t<all_of<is_positional<Args>...>::value>>
+          typename = enable_if_t<args_are_all_positional<Args...>()>>
 simple_collector<policy> collect_arguments(Args &&...args) {
     return simple_collector<policy>(std::forward<Args>(args)...);
 }
 
 /// Collect all arguments, including keywords and unpacking (only instantiated when needed)
 template <return_value_policy policy, typename... Args,
-          typename = enable_if_t<!all_of<is_positional<Args>...>::value>>
+          typename = enable_if_t<!args_are_all_positional<Args...>()>>
 unpacking_collector<policy> collect_arguments(Args &&...args) {
     // Following argument order rules for generalized unpacking according to PEP 448
     static_assert(
diff --git a/include/pybind11/detail/common.h b/include/pybind11/detail/common.h
index cb05bd1..950bd37 100644
--- a/include/pybind11/detail/common.h
+++ b/include/pybind11/detail/common.h
@@ -665,6 +665,10 @@
     std::is_pointer<T>::value && std::is_function<typename std::remove_pointer<T>::type>::value>;
 
 template <typename F> struct strip_function_object {
+    // If you are encountering an
+    // 'error: name followed by "::" must be a class or namespace name'
+    // with the Intel compiler and a noexcept function here,
+    // try to use noexcept(true) instead of plain noexcept.
     using type = typename remove_class<decltype(&F::operator())>::type;
 };
 
@@ -689,8 +693,10 @@
 /// Ignore that a variable is unused in compiler warnings
 inline void ignore_unused(const int *) { }
 
+// [workaround(intel)] Internal error on fold expression
 /// Apply a function over each element of a parameter pack
-#ifdef __cpp_fold_expressions
+#if defined(__cpp_fold_expressions) && !defined(__INTEL_COMPILER)
+// Intel compiler produces an internal error on this fold expression (tested with ICC 19.0.2)
 #define PYBIND11_EXPAND_SIDE_EFFECTS(PATTERN) (((PATTERN), void()), ...)
 #else
 using expand_side_effects = bool[];
diff --git a/include/pybind11/pytypes.h b/include/pybind11/pytypes.h
index d8d58b1..78db794 100644
--- a/include/pybind11/pytypes.h
+++ b/include/pybind11/pytypes.h
@@ -1271,6 +1271,15 @@
     detail::tuple_iterator end() const { return {*this, PyTuple_GET_SIZE(m_ptr)}; }
 };
 
+// We need to put this into a separate function because the Intel compiler
+// fails to compile enable_if_t<all_of<is_keyword_or_ds<Args>...>::value> part below
+// (tested with ICC 2021.1 Beta 20200827).
+template <typename... Args>
+constexpr bool args_are_all_keyword_or_ds()
+{
+  return detail::all_of<detail::is_keyword_or_ds<Args>...>::value;
+}
+
 class dict : public object {
 public:
     PYBIND11_OBJECT_CVT(dict, object, PyDict_Check, raw_dict)
@@ -1278,7 +1287,7 @@
         if (!m_ptr) pybind11_fail("Could not allocate dict object!");
     }
     template <typename... Args,
-              typename = detail::enable_if_t<detail::all_of<detail::is_keyword_or_ds<Args>...>::value>,
+              typename = detail::enable_if_t<args_are_all_keyword_or_ds<Args...>()>,
               // MSVC workaround: it can't compile an out-of-line definition, so defer the collector
               typename collector = detail::deferred_t<detail::unpacking_collector<>, Args...>>
     explicit dict(Args &&...args) : dict(collector(std::forward<Args>(args)...).kwargs()) { }
diff --git a/include/pybind11/stl_bind.h b/include/pybind11/stl_bind.h
index 9d8ed0c..83195ee 100644
--- a/include/pybind11/stl_bind.h
+++ b/include/pybind11/stl_bind.h
@@ -375,10 +375,20 @@
 template <typename Vector>
 struct vector_has_data_and_format<Vector, enable_if_t<std::is_same<decltype(format_descriptor<typename Vector::value_type>::format(), std::declval<Vector>().data()), typename Vector::value_type*>::value>> : std::true_type {};
 
+// [workaround(intel)] Separate function required here
+// Workaround as the Intel compiler does not compile the enable_if_t part below
+// (tested with icc (ICC) 2021.1 Beta 20200827)
+template <typename... Args>
+constexpr bool args_any_are_buffer() {
+    return detail::any_of<std::is_same<Args, buffer_protocol>...>::value;
+}
+
+// [workaround(intel)] Separate function required here
+// [workaround(msvc)] Can't use constexpr bool in return type
+
 // Add the buffer interface to a vector
 template <typename Vector, typename Class_, typename... Args>
-enable_if_t<detail::any_of<std::is_same<Args, buffer_protocol>...>::value>
-vector_buffer(Class_& cl) {
+void vector_buffer_impl(Class_& cl, std::true_type) {
     using T = typename Vector::value_type;
 
     static_assert(vector_has_data_and_format<Vector>::value, "There is not an appropriate format descriptor for this vector");
@@ -416,7 +426,12 @@
 }
 
 template <typename Vector, typename Class_, typename... Args>
-enable_if_t<!detail::any_of<std::is_same<Args, buffer_protocol>...>::value> vector_buffer(Class_&) {}
+void vector_buffer_impl(Class_&, std::false_type) {}
+
+template <typename Vector, typename Class_, typename... Args>
+void vector_buffer(Class_& cl) {
+    vector_buffer_impl<Vector, Class_, Args...>(cl, detail::any_of<std::is_same<Args, buffer_protocol>...>{});
+}
 
 PYBIND11_NAMESPACE_END(detail)
 
diff --git a/tests/test_builtin_casters.cpp b/tests/test_builtin_casters.cpp
index 16a78b1..f4e7756 100644
--- a/tests/test_builtin_casters.cpp
+++ b/tests/test_builtin_casters.cpp
@@ -193,6 +193,15 @@
     m.def("bool_passthrough", [](bool arg) { return arg; });
     m.def("bool_passthrough_noconvert", [](bool arg) { return arg; }, py::arg{}.noconvert());
 
+    // TODO: This should be disabled and fixed in future Intel compilers
+#if !defined(__INTEL_COMPILER)
+    // Test "bool_passthrough_noconvert" again, but using () instead of {} to construct py::arg
+    // When compiled with the Intel compiler, this results in segmentation faults when importing
+    // the module. Tested with icc (ICC) 2021.1 Beta 20200827, this should be tested again when
+    // a newer version of icc is available.
+    m.def("bool_passthrough_noconvert2", [](bool arg) { return arg; }, py::arg().noconvert());
+#endif
+
     // test_reference_wrapper
     m.def("refwrap_builtin", [](std::reference_wrapper<int> p) { return 10 * p.get(); });
     m.def("refwrap_usertype", [](std::reference_wrapper<UserType> p) { return p.get().value(); });
diff --git a/tests/test_callbacks.cpp b/tests/test_callbacks.cpp
index a29a3db..dffe538 100644
--- a/tests/test_callbacks.cpp
+++ b/tests/test_callbacks.cpp
@@ -120,6 +120,8 @@
     class AbstractBase {
     public:
         // [workaround(intel)] = default does not work here
+        // Defaulting this destructor results in linking errors with the Intel compiler
+        // (in Debug builds only, tested with icpc (ICC) 2021.1 Beta 20200827)
         virtual ~AbstractBase() {};  // NOLINT(modernize-use-equals-default)
         virtual unsigned int func() = 0;
     };
diff --git a/tests/test_class.cpp b/tests/test_class.cpp
index 43b3ff9..6ce928c 100644
--- a/tests/test_class.cpp
+++ b/tests/test_class.cpp
@@ -7,6 +7,13 @@
     BSD-style license that can be found in the LICENSE file.
 */
 
+#if defined(__INTEL_COMPILER) && __cplusplus >= 201703L
+// Intel compiler requires a separate header file to support aligned new operators
+// and does not set the __cpp_aligned_new feature macro.
+// This header needs to be included before pybind11.
+#include <aligned_new>
+#endif
+
 #include "pybind11_tests.h"
 #include "constructor_stats.h"
 #include "local_bindings.h"
@@ -324,6 +331,8 @@
     class PublicistB : public ProtectedB {
     public:
         // [workaround(intel)] = default does not work here
+        // Removing or defaulting this destructor results in linking errors with the Intel compiler
+        // (in Debug builds only, tested with icpc (ICC) 2021.1 Beta 20200827)
         ~PublicistB() override {};  // NOLINT(modernize-use-equals-default)
         using ProtectedB::foo;
     };
diff --git a/tests/test_constants_and_functions.cpp b/tests/test_constants_and_functions.cpp
index cd3c8b7..8855dd7 100644
--- a/tests/test_constants_and_functions.cpp
+++ b/tests/test_constants_and_functions.cpp
@@ -46,7 +46,14 @@
 // Test that we properly handle C++17 exception specifiers (which are part of the function signature
 // in C++17).  These should all still work before C++17, but don't affect the function signature.
 namespace test_exc_sp {
+// [workaround(intel)] Unable to use noexcept instead of noexcept(true)
+// Make the f1 test basically the same as the f2 test in C++17 mode for the Intel compiler as
+// it fails to compile with a plain noexcept (tested with icc (ICC) 2021.1 Beta 20200827).
+#if defined(__INTEL_COMPILER) && defined(PYBIND11_CPP17)
+int f1(int x) noexcept(true) { return x+1; }
+#else
 int f1(int x) noexcept { return x+1; }
+#endif
 int f2(int x) noexcept(true) { return x+2; }
 int f3(int x) noexcept(false) { return x+3; }
 #if defined(__GNUG__)