Both of those types of optimization can be of use, but there's one more important thing to keep in mind when working on performant systems: cache friendliness. Using flat data structures instead of node-based ones means less pointer chasing at runtime, which reduces memory latency and helps your performance. Using data that's contiguous in memory, regardless of whether you're reading it forward or backward, means your CPU's memory prefetcher can load it before it's used, which can often make a huge difference. Node-based data structures and the aforementioned pointer chasing cause random memory access patterns that can "confuse" the prefetcher and make it impossible for it to prefetch the correct data.
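To make the difference tangible, here's a minimal sketch (the element count and the simple timing approach are illustrative assumptions, not the benchmark referenced below) that sums the same values stored contiguously in a std::vector and in a node-based std::list:

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <list>
#include <numeric>
#include <vector>

int main() {
  constexpr std::size_t count = 1'000'000;  // illustrative size

  // Flat, contiguous storage: elements sit next to each other in memory,
  // so the prefetcher can stream them in ahead of the summation loop.
  std::vector<int> flat(count, 1);

  // Node-based storage: each element lives in a separately allocated node,
  // so traversal chases pointers and produces a scattered access pattern.
  std::list<int> nodes(count, 1);

  auto time_sum = [](const auto& container, const char* label) {
    const auto start = std::chrono::steady_clock::now();
    const long long sum =
        std::accumulate(container.begin(), container.end(), 0LL);
    const auto stop = std::chrono::steady_clock::now();
    const auto us =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    std::cout << label << ": sum = " << sum << ", took " << us.count()
              << " us\n";
  };

  time_sum(flat, "vector (contiguous)");   // prefetcher-friendly
  time_sum(nodes, "list (pointer chasing)");  // prefetcher-hostile
}
```

On typical hardware you can expect the contiguous traversal to come out ahead, though the exact numbers depend on the CPU, allocator, and element size; for measured results across more scenarios, the benchmark mentioned next is the better reference.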
If you want to see some performance results, please refer to the C++ Containers Benchmark linked in the Further reading section. It compares various usage scenarios of std::vector, std::list, std::deque, and plf::colony. If...