If you start naively without any library that avoids the problem then memory access is the problem. Have a look at how much effort is needed to avoid the problem, for example with blocking algorithms.