fix: normalize env vars used for input and output to gcp extraction (#3536)

Rename `_OUTPUT_GS_BUCKET` to `_BUCKET_NAME`.

Rename `_VERSION` to `_COMMIT`.

Also write output to a subdirectory of the GCS bucket named after `_CORPUS`,
and remove the `_CORPUS-` prefix from the output file names.
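
For illustration only, here is a rough Python sketch of how the output location changes. The function names, the `filename` argument, and the exact old and new path shapes are assumptions made for this example. They are not taken from the changed files; only the variable names `_BUCKET_NAME`, `_CORPUS`, and `_COMMIT` come from this change.

```python
import os
from typing import Mapping


def old_output_uri(env: Mapping[str, str], filename: str) -> str:
    """Hypothetical old layout: corpus as a file-name prefix, old variable name."""
    return f"gs://{env['_OUTPUT_GS_BUCKET']}/{env['_CORPUS']}-{filename}"


def new_output_uri(env: Mapping[str, str], filename: str) -> str:
    """Hypothetical new layout: corpus as a bucket subdirectory, no file prefix."""
    return f"gs://{env['_BUCKET_NAME']}/{env['_CORPUS']}/{filename}"


if __name__ == "__main__":
    # Reads the real environment; raises KeyError if a variable is unset.
    # "example.out" is a placeholder file name, not one used by the extraction.
    print(new_output_uri(os.environ, "example.out"))
```

Passing the environment as a mapping keeps the sketch easy to check with a plain dict instead of the real process environment.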
11 files changed