vampyr wrote: Funny how close the compiler can come to highly optimized assembly, isn't it?
The problem (for me at least) is that it isn't possible to write directly in GPU assembly. Brook+ takes C-like code and translates it into ATI's IL (intermediate language); this IL is then compiled into GPU ISA by the calclCompile function from ATI's driver pack and finally run on the GPU device. I've just got rid of the Brook+ -> IL step, because Brook itself is a really crappy thing and I see no point in fighting Brook to force it to produce good IL code rather than writing the IL from scratch on my own. All optimizations are performed by the calclCompile() function, not by Brook, so there's not much point in optimizing kernels one step away from the actual optimizer.
Anyway, to me it doesn't look so weird that SHA1 speed is nearly the same for different methods. After all, it's exactly the same algorithm compiled and optimized by exactly the same CAL layer. The only differences can be in the initial "thread id to ansi/unicode password" transformation (the sha1 body function is exactly the same), plus some variation in passwords per thread and in how threads are distributed per block. However, my latest sha1 version got a nice speed-up even without bitalign instructions (about 20%), while yours dropped by 10% (though it supports multihashing). So I see a lot of "you vs Brook" fights in the future.
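To show what I mean by the "thread id to ansi/unicode password" step, here is a rough host-side sketch in Python. It is not the actual IL kernel code from either program, only an illustration. The function names, the mixed-radix index layout and the choice of latin-1 for "ansi" and UTF-16LE for "unicode" are all my assumptions for the example.

```python
def index_to_password(index: int, charset: str, length: int) -> str:
    """Decode a linear candidate index into a fixed-length password.

    The index is treated as a number in base len(charset), with the least
    significant digit first (an assumed layout, chosen for simplicity).
    """
    base = len(charset)
    if not 0 <= index < base ** length:
        raise ValueError("index out of range for this charset and length")
    chars = []
    for _ in range(length):
        index, digit = divmod(index, base)
        chars.append(charset[digit])
    return "".join(chars)


def encode_candidate(password: str, as_unicode: bool) -> bytes:
    """Return the bytes that would be fed to SHA1.

    Assumption: "ansi" is one byte per character (latin-1 here) and
    "unicode" is UTF-16LE, i.e. two bytes per character.
    """
    return password.encode("utf-16-le" if as_unicode else "latin-1")


def candidates_for_thread(thread_id: int, per_thread: int,
                          charset: str, length: int) -> list:
    """List the passwords one thread would test, given passwords per thread."""
    start = thread_id * per_thread
    return [index_to_password(start + i, charset, length)
            for i in range(per_thread)]
```

For example, with the charset "abc" and length 2, thread 1 with 2 passwords per thread covers indices 2 and 3. Changing passwords per thread only changes how the same index range is cut up between threads; the hashing work per candidate stays the same.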
Unfortunately, I wasn't able to test your program at all. It simply produces a lot of crash messages and pseudographics but nothing useful. Maybe this is caused by my 5770+4770/Vista64 config, maybe not, but I'm just too lazy to figure out what's wrong. I can't call your readme.txt documentation either. Maybe D3ad0ne will have more luck.