1) Do I then need to do the following in thread 3?
What are you trying to do?
2) *thread_var = __atomic_load_n(thread_var, __ATOMIC_ACQUIRE);
This atomically loads the value of *thread_var, then stores it straight back into *thread_var with a plain (non-atomic) store. What's the point of that? If you need the value, you still have to read it from *thread_var again, and the plain store may have created a race condition with any other thread that accesses *thread_var.
3) __atomic_load_n(thread_var, __ATOMIC_ACQUIRE);
This loads the value and then discards it. I think you want:
int var = __atomic_load_n(thread_var, __ATOMIC_ACQUIRE);
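To make that concrete, here is a minimal sketch of the pattern, assuming GCC or Clang (for the __atomic builtins) and assuming the shared variable is a plain int. The names shared_value, publish and read_snapshot are made up for illustration and are not from your code. The point is that the reader keeps the result of the acquire load in a local and never writes it back to the shared location.

```c
/* Hypothetical shared variable; in the question it is reached via thread_var. */
int shared_value;

/* Writer side (assumed): publish a new value with release ordering. */
void publish(int *thread_var, int value)
{
    __atomic_store_n(thread_var, value, __ATOMIC_RELEASE);
}

/* Reader side: one acquire load, result kept in a local.
 * The caller works with the returned copy and does not touch
 * *thread_var with a plain load or store afterwards. */
int read_snapshot(int *thread_var)
{
    int var = __atomic_load_n(thread_var, __ATOMIC_ACQUIRE);
    return var;
}
```

A reader thread would then call something like int var = read_snapshot(&shared_value); and use var from there on. If it needs a fresher value later, it does another atomic load rather than reading *thread_var directly.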