Fix test_profiler_seq_nr flakiness (on macOS) (#91019)

Fixes https://github.com/pytorch/pytorch/issues/66893

On macOS, two `aten::sum` calls are sometimes reported where there should be only one. The flakiness is easy to reproduce by running `pytest test_autograd.py -k test_profiler_seq_nr --verbose --flake-finder`. The profiler output when the test fails is as follows (sorted by self CPU time):

```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                                            aten::randn        16.67%       3.000us        27.78%       5.000us       2.500us             2
                                              aten::sum        16.67%       3.000us        27.78%       5.000us       2.500us             2
                                          aten::normal_        11.11%       2.000us        11.11%       2.000us       1.000us             2
                                              aten::add        11.11%       2.000us        11.11%       2.000us       2.000us             1
autograd::engine::evaluate_function: torch::autograd...        11.11%       2.000us        27.78%       5.000us       2.500us             2
                        torch::autograd::AccumulateGrad        11.11%       2.000us        16.67%       3.000us       1.500us             2
                                        aten::ones_like         5.56%       1.000us         5.56%       1.000us       1.000us             1
      autograd::engine::evaluate_function: SumBackward0         5.56%       1.000us        11.11%       2.000us       2.000us             1
                                           aten::expand         5.56%       1.000us         5.56%       1.000us       1.000us             1
                                            aten::copy_         5.56%       1.000us         5.56%       1.000us       0.500us             2
                                            aten::empty         0.00%       0.000us         0.00%       0.000us       0.000us             2
                                       aten::as_strided         0.00%       0.000us         0.00%       0.000us       0.000us             2
                                            aten::fill_         0.00%       0.000us         0.00%       0.000us       0.000us             2
                                       aten::empty_like         0.00%       0.000us         0.00%       0.000us       0.000us             1
                                    aten::empty_strided         0.00%       0.000us         0.00%       0.000us       0.000us             3
                                           SumBackward0         0.00%       0.000us         5.56%       1.000us       1.000us             1
      autograd::engine::evaluate_function: AddBackward0         0.00%       0.000us         0.00%       0.000us       0.000us             1
                                           AddBackward0         0.00%       0.000us         0.00%       0.000us       0.000us             1
                                aten::new_empty_strided         0.00%       0.000us         0.00%       0.000us       0.000us             2
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 18.000us
```
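For reference, the profiled region in the test is just a tiny forward/backward pass. A minimal sketch of the repro, mirroring the test body from the diff below (op counts and timings vary by platform, which is exactly the flake this PR addresses):

```python
# Minimal sketch mirroring the profiled region of test_profiler_seq_nr.
import torch
from torch.profiler import profile

with profile() as p:
    x = torch.randn(10, 10, requires_grad=True)
    y = torch.randn(10, 10, requires_grad=True)
    z = x + y
    s = z.sum()
    s.backward()

# Same table as above: key averages sorted by self CPU time.
print(p.key_averages().table(sort_by="self_cpu_time_total", row_limit=-1))
```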

When this happens, the two `aten::sum` calls have different input shapes:

```
                                              aten::sum         4.35%       1.000us        13.04%       3.000us       3.000us             1                          [[10, 10], []]
                                              aten::sum         8.70%       2.000us         8.70%       2.000us       2.000us             1                  [[10, 10], [], [], []]
```
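The input-shape column above is not part of the test itself; a hedged sketch of how to surface it, assuming the standard `record_shapes=True` and `group_by_input_shape=True` options of `torch.profiler`:

```python
# Sketch: record input shapes so key_averages() can split the two aten::sum
# variants into separate rows, as in the snippet above.
import torch
from torch.profiler import profile

with profile(record_shapes=True) as p:
    z = torch.randn(10, 10, requires_grad=True) + torch.randn(10, 10, requires_grad=True)
    z.sum().backward()

print(p.key_averages(group_by_input_shape=True).table(sort_by="self_cpu_time_total"))
```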

I'm not sure what the internal difference between `z.sum()` and `z.sum(dim=None)` is on macOS; I thought they were the same. The extra empty entries in the second shape list suggest the two calls dispatch to different `aten::sum` overloads (the argument-less overload vs. the `dim` overload, which takes additional arguments).
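One way to see the overload difference directly is to look at the backward node each call creates, which also motivates the `Backward{idx}` change in the assertions below. A hypothetical check (the exact numbering is an autograd codegen detail and may vary by PyTorch version):

```python
# Hypothetical check: the argument-less overload and the dim overload create
# differently numbered backward nodes (SumBackward0 vs. SumBackward1 on the
# builds I'd expect).
import torch

z = torch.randn(10, 10, requires_grad=True)
print(z.sum().grad_fn)          # e.g. <SumBackward0 ...>
print(z.sum(dim=None).grad_fn)  # e.g. <SumBackward1 ...>
```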

### Testing

Running `pytest test_autograd.py -k test_profiler_seq_nr --verbose --flake-finder` repeats the test 50 times; all runs pass.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91019
Approved by: https://github.com/malfet
diff --git a/test/test_autograd.py b/test/test_autograd.py
index d5ac114..cc22b02 100644
--- a/test/test_autograd.py
+++ b/test/test_autograd.py
@@ -3474,7 +3474,7 @@
             x = torch.randn(10, 10, requires_grad=True)
             y = torch.randn(10, 10, requires_grad=True)
             z = x + y
-            s = z.sum()
+            s = z.sum(dim=None)
             s.backward()
         print(p.key_averages().table(
             sort_by="self_cpu_time_total", row_limit=-1))
@@ -3501,11 +3501,11 @@
                 self.assertEqual(e.sequence_nr, -1)
                 found_empty = True
 
-        for (fwd_name, bwd_name), ops in autograd_ops.items():
+        for idx, ((fwd_name, bwd_name), ops) in enumerate(autograd_ops.items()):
             self.assertEqual(len(ops), 3)
             self.assertEqual(ops[0].name, fwd_name)
-            self.assertEqual(ops[1].name, f"autograd::engine::evaluate_function: {bwd_name}Backward0")
-            self.assertEqual(ops[2].name, f"{bwd_name}Backward0")
+            self.assertEqual(ops[1].name, f"autograd::engine::evaluate_function: {bwd_name}Backward{idx}")
+            self.assertEqual(ops[2].name, f"{bwd_name}Backward{idx}")
             self.assertGreaterEqual(ops[0].sequence_nr, 0)
             self.assertEqual(ops[1].sequence_nr, ops[0].sequence_nr)
             self.assertEqual(ops[2].sequence_nr, ops[0].sequence_nr)