c11_atomics: do not overallocate local memory for embedded devices #690. (#691)

The OpenCL specification states that the minimum local memory size (CL_DEVICE_LOCAL_MEM_SIZE) guaranteed for embedded-profile devices is 1 KB. This change clamps the work-group size to 1024 on embedded devices and sets the number of local variables per thread to 1.
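A minimal sketch of the clamping described above follows. It is not the code from this change. The struct, function, and constant names are hypothetical. The only assumption about the real API is that the device profile string comes from clGetDeviceInfo with CL_DEVICE_PROFILE, which reports either "FULL_PROFILE" or "EMBEDDED_PROFILE". The limits (1024 work-items, 1 local variable per thread) are taken from the description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical parameters for one c11_atomics test run; the real test
// harness uses its own names and structure.
struct AtomicsTestParams {
    std::size_t workGroupSize;      // work-items per work-group
    std::size_t localVarsPerThread; // local-memory variables used by each work-item
};

// Cap taken from the description: embedded devices get at most 1024 work-items.
constexpr std::size_t kEmbeddedMaxWorkGroupSize = 1024;

// True when the CL_DEVICE_PROFILE string names the embedded profile.
bool isEmbeddedProfile(const char* profile) {
    assert(profile != nullptr);
    return std::strcmp(profile, "EMBEDDED_PROFILE") == 0;
}

// Clamp the parameters on embedded devices so the test does not request more
// local memory than such a device is guaranteed to provide. Full-profile
// devices keep the parameters unchanged.
AtomicsTestParams clampForDevice(AtomicsTestParams params, const char* profile) {
    if (isEmbeddedProfile(profile)) {
        params.workGroupSize = std::min(params.workGroupSize, kEmbeddedMaxWorkGroupSize);
        params.localVarsPerThread = 1;
    }
    return params;
}
```

For example, clampForDevice({4096, 4}, "EMBEDDED_PROFILE") yields a work-group size of 1024 with 1 local variable per thread, while the same call with "FULL_PROFILE" leaves both values untouched.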

Fixes #690.
1 file changed