Sc00bz wrote: Side note: I still don't understand how his 32-bit version is so fast. Interlacing 3 with SSE2 seems to me like it would be slower than interlacing SSE2 and MMX. Well, I guess it has to do with cache (for interlacing 3 with SSE2) and/or that SSE2 and MMX together have a performance hit, such as with switching from floating point to integer math with SSE.
Let's revive another old topic... I'm currently playing around with my NTLM brute forcer, and I'm seeing something strange when interlacing more and more SSE2.
My system: Q9450 @ 3.2 GHz, 4 threads:
1x SSE2 : 110 Mhashes/s
2x SSE2 : 150 Mhashes/s
3x SSE2 : 175 Mhashes/s
4x SSE2 : 200 Mhashes/s
(5x gets slower, around 170 Mhashes/s)
Shouldn't it just stop getting faster at around 2x or 3x SSE2?
Or is this just a way to get VC++ 2008 Express to arrange instructions efficiently enough to have 2 or 3 executed simultaneously?
With my MD5 brute forcer it doesn't do this: that goes from about 77 to 100 to 114 and back to 105 Mhashes/s when doing 1x, 2x, 3x, 4x SSE2.
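To be clear about what "1x" and "2x SSE2" mean here, this is a minimal sketch and not the actual brute forcer code. It shows one MD4-style round-1 step (NTLM is built on MD4) done on a single SSE2 register set, and then the same step interlaced over two independent register sets. The names `State`, `rotl32`, `md4_f`, `step_1x`, `step_2x` and `lane0` are made up for illustration.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Rotate each 32-bit lane left by s bits (SSE2 has no rotate instruction).
static inline __m128i rotl32(__m128i v, int s) {
    assert(s > 0 && s < 32);
    return _mm_or_si128(_mm_sll_epi32(v, _mm_cvtsi32_si128(s)),
                        _mm_srl_epi32(v, _mm_cvtsi32_si128(32 - s)));
}

// MD4 round-1 function F(b, c, d) = (b & c) | (~b & d), per 32-bit lane.
static inline __m128i md4_f(__m128i b, __m128i c, __m128i d) {
    return _mm_or_si128(_mm_and_si128(b, c), _mm_andnot_si128(b, d));
}

// One register set holds the a/b/c/d words of 4 candidates at once.
struct State { __m128i a, b, c, d; };

// 1x SSE2: a = rotl(a + F(b, c, d) + x, shift) for 4 candidates.
// Every instruction depends on the previous one, so the CPU has
// nothing else to work on while it waits for each result.
static inline void step_1x(State& s, __m128i x, int shift) {
    __m128i t = _mm_add_epi32(_mm_add_epi32(s.a, md4_f(s.b, s.c, s.d)), x);
    s.a = rotl32(t, shift);
}

// 2x SSE2: the same step for two independent register sets (8 candidates),
// written so that operations of set 0 and set 1 alternate. The two chains
// do not depend on each other, so they can overlap in the pipeline.
static inline void step_2x(State& s0, State& s1,
                           __m128i x0, __m128i x1, int shift) {
    __m128i f0 = md4_f(s0.b, s0.c, s0.d);
    __m128i f1 = md4_f(s1.b, s1.c, s1.d);
    __m128i t0 = _mm_add_epi32(_mm_add_epi32(s0.a, f0), x0);
    __m128i t1 = _mm_add_epi32(_mm_add_epi32(s1.a, f1), x1);
    s0.a = rotl32(t0, shift);
    s1.a = rotl32(t1, shift);
}

// Helper for inspecting results: lowest 32-bit lane of a register.
static inline uint32_t lane0(__m128i v) {
    return static_cast<uint32_t>(_mm_cvtsi128_si32(v));
}
```

3x and 4x would follow the same pattern with more `State` sets. A 32-bit build only has 8 XMM registers, so at some width the compiler presumably has to start spilling to memory, which might be part of why 5x drops off, but that is a guess and not something I have measured.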