simpleperf: Use post unwinding by default in record cmd.

When recording the google.sample.tunnel app for 30s:
  Unwinding samples and writing the unwound samples to file took 3s.
  Writing samples containing stack/reg data to file took 0.3s.

The result shows that recording with post unwinding spends much
less time (0.3s vs. 3s, about 10x less) than unwinding samples
immediately. This means we can record at a higher frequency and
get a lower sample loss rate when using post unwinding. So make
the following changes:
1. Enable post unwinding by default.
2. Replace the --post-unwind option with --no-post-unwind.
3. Make --trace-offcpu and callchain joiner work with post unwinding.
4. Remove special operations in --log debug mode. Those will be
   supported in a new command.

Bug: http://b/72556486
Test: run simpleperf_unit_test.
Test: run python test.py.

Change-Id: I9a5a5defda9d040985e674c43db19ee68e7aa305
6 files changed