SSE3, SSSE3, SSE4a, SSE4.1, and SSE4.2 are not just for floating point operations, but most of their instructions are pointless for cracking passwords.
(All the other SSE3 instructions are for floating point):
LDDQU - This is an alternative misaligned integer vector load. It can be helpful for video compression tasks.
MONITOR, MWAIT - These optimize multi-threaded applications, giving processors with Hyper-Threading better performance.
(SSSE3 added only integer instructions, and this one is the best of them):
PSHUFB - Packed Shuffle Bytes - takes registers of bytes A = [a0 a1 a2 ...] and B = [b0 b1 b2 ...] and replaces A with [a[b0] a[b1] a[b2] ...], where only the low four bits of each bi are used as the index; except that it replaces the ith entry with 0 if the top bit of bi is set.
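The PSHUFB rule above is easier to see as a loop. This is only a scalar sketch of the 128-bit form, not real SIMD code; `pshufb_model` is a made-up name for illustration.

```c
#include <stdint.h>

/* Scalar model of 128-bit PSHUFB (hypothetical helper, not an intrinsic).
   a is the byte table and is overwritten; b holds the selector bytes.
   Only the low 4 bits of a selector index into a; a set top bit zeroes the lane. */
static void pshufb_model(uint8_t a[16], const uint8_t b[16])
{
    uint8_t out[16];
    for (int i = 0; i < 16; i++) {
        out[i] = (b[i] & 0x80) ? 0 : a[b[i] & 0x0F];
    }
    /* Copy back afterwards so every lookup above reads the original a. */
    for (int i = 0; i < 16; i++) {
        a[i] = out[i];
    }
}
```

With a compiler that targets SSSE3, the same operation is exposed as the `_mm_shuffle_epi8` intrinsic in `<tmmintrin.h>`. A selector of [3 2 1 0 7 6 5 4 ...] byte-swaps every 32-bit word in one instruction, which is the kind of thing that makes it handy for big-endian hashes.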
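(SSE4.1 has other integer instructions, but these are the best):
PMULLD - Packed multiplication of 32-bit integers: 4 packed pairs of 32-bit integers are multiplied and the low 32 bits of each product are kept, giving 4 packed 32-bit results.
PTEST - This is similar to the TEST instruction, in that it sets the ZF and CF flags from an AND between its operands without storing the result ... it sets the Z flag if the AND of the two operands is all zeros (none of the bits matched), and the C flag if the AND of the second operand with the inverted first operand is all zeros (every bit set in the second operand is also set in the first).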
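Both SSE4.1 instructions above can be modeled in plain C to pin down the semantics. This is a scalar sketch, not SIMD code; `pmulld_model`, `ptest_model`, and `struct ptest_flags` are names invented for illustration.

```c
#include <stdint.h>

/* PMULLD model: four independent 32-bit multiplies, keeping only the low
   32 bits of each product (so signed and unsigned inputs give the same bits). */
static void pmulld_model(uint32_t dst[4], const uint32_t src[4])
{
    for (int i = 0; i < 4; i++) {
        dst[i] = (uint32_t)((uint64_t)dst[i] * src[i]);
    }
}

/* Flags produced by the PTEST model below. */
struct ptest_flags { int zf; int cf; };

/* PTEST model on a 128-bit value held as two 64-bit halves.
   ZF = 1 when (dst AND src) is all zeros.
   CF = 1 when ((NOT dst) AND src) is all zeros. */
static struct ptest_flags ptest_model(const uint64_t dst[2], const uint64_t src[2])
{
    struct ptest_flags f;
    f.zf = ((dst[0] & src[0]) | (dst[1] & src[1])) == 0;
    f.cf = ((~dst[0] & src[0]) | (~dst[1] & src[1])) == 0;
    return f;
}
```

The matching intrinsics in `<smmintrin.h>` are `_mm_mullo_epi32`, `_mm_testz_si128` (returns ZF), and `_mm_testc_si128` (returns CF). One plausible use in a cracker: after a packed compare of four hashes against a target, PTEST the compare result against itself, and ZF set means none of the four lanes matched.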
SSE4a and SSE4.2 have integer instructions but they aren't that useful. POPCNT and LZCNT are nice for parity bit checks and for computing the floor of the log base 2 of a number. EXTRQ and INSERTQ are kinda cool but worthless in most cases since they only work with one 64-bit segment instead of four 32-bit segments.
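For reference, this is how POPCNT and LZCNT give parity and log base 2. The sketch uses portable stand-ins instead of the real instructions, and all four function names are made up for illustration.

```c
#include <stdint.h>

/* Portable stand-in for POPCNT: number of set bits in x. */
static unsigned popcount32(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Portable stand-in for LZCNT: number of leading zero bits; 32 when x == 0. */
static unsigned lzcnt32(uint32_t x)
{
    unsigned n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && !(x & bit); bit >>= 1) {
        n++;
    }
    return n;
}

/* Parity is the low bit of the population count. */
static unsigned parity32(uint32_t x)
{
    return popcount32(x) & 1u;
}

/* floor(log2(x)) is 31 minus the leading zero count; only valid for x != 0. */
static unsigned floor_log2_32(uint32_t x)
{
    return 31u - lzcnt32(x);
}
```

Note the log is a floor, so it is exact only for powers of two; for example 1000 gives 9.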
SSE5: I keep forgetting what these instructions are but it has a few instructions for AES. Right, this has PROTD, which will increase speed a lot since one rotate replaces a mov, two shifts, and an or (43 rotates * 3 instructions = 129 fewer instructions per four MD5s).
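To show where those instructions go, here is the rotate written the way code has to do it without a rotate instruction. This is a scalar sketch with the equivalent SSE2 instruction noted per line; `rotl32_shifts` and `md5_step_f` are names made up for illustration, and the step is the standard MD5 round 1 form.

```c
#include <stdint.h>

/* Rotate left built from two shifts and an OR. Valid for 0 < n < 32.
   In vector code the shifts are destructive, so x must first be copied
   to a second register: that copy is the extra MOV. */
static uint32_t rotl32_shifts(uint32_t x, unsigned n)
{
    uint32_t left  = x << n;          /* PSLLD on x */
    uint32_t right = x >> (32 - n);   /* PSRLD on the copy of x */
    return left | right;              /* POR */
}

/* One MD5 round 1 step, a = b + rotl(a + F(b,c,d) + m + k, s),
   with F(b,c,d) = (b AND c) OR ((NOT b) AND d).
   m is the message word and k the step constant. */
static uint32_t md5_step_f(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                           uint32_t m, uint32_t k, unsigned s)
{
    uint32_t f = (b & c) | (~b & d);
    return b + rotl32_shifts(a + f + m + k, s);
}
```

With a real rotate, the body of `rotl32_shifts` collapses to a single instruction per step, which is where the saving of 3 instructions per rotate comes from.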
AVX: This is the most awesome thing ever, but it totally sucks because it could double performance, yet the initial release in 2010 will not support any 256-bit integer instructions. It will still speed things up because it has 3-operand instructions. This means that you won't need to copy a register's contents to another register before an operation that would overwrite it. Granted, there are the floating point instructions ANDPS, ANDPD, ORPS, ORPD, XORPS, and XORPD, which are plain bitwise and, or, and xor on the whole register. So you can do DES really fast with them, since they work just like integer and, or, and xor.
* Note that SSE5 will use fewer instructions than AVX and will probably be faster until AVX adds 256-bit integer support, because there is no rotate instruction in AVX. The only problem is that AMD is the only one doing SSE5, and AMD might drop SSE5 since AVX is much better. This is just like when AMD came up with 3DNow! and Intel came up with SSE: AMD still needs to support 3DNow! because old programs use it. So AMD might drop SSE5 early to avoid having to keep supporting it even though not many people use it.