tag:blogger.com,1999:blog-326664202017-06-17T13:53:48.694-07:00bitRAKE StudioFragments of discovery littered with nuggets of innovationRickey Bowershttps://plus.google.com/103690306975404683409noreply@blogger.comBlogger6125tag:blogger.com,1999:blog-32666420.post-31133712275127427132013-01-27T23:21:00.000-08:002013-01-27T23:21:06.576-08:00Still alive and coding...Al Zimmermann has a new contest. Always fun to flex one's wits. Give it a try:<br /><br /><a href="http://azspcs.net/Contest/Factorials">http://azspcs.net/Contest/Factorials</a>Rickey Bowershttps://plus.google.com/103690306975404683409noreply@blogger.com2tag:blogger.com,1999:blog-32666420.post-14331455108823752612011-03-09T23:16:00.000-08:002011-03-09T23:22:44.069-08:00Automata Captcha Idea<a href="http://demonstrations.wolfram.com/TexturePerceptionPatterns/">http://demonstrations.wolfram.com/TexturePerceptionPatterns/</a><div><br /></div><div>Seeing this demonstration reminded me of a captcha idea I had a few years ago. The human user would need to define the image outline they perceived. bitRAKE.com got hacked so I stopped work on the PHP code, and had almost forgotten about it.</div>Rickey Bowershttps://plus.google.com/103690306975404683409noreply@blogger.com0tag:blogger.com,1999:blog-32666420.post-38979172101359494122007-12-15T12:26:00.000-08:002008-01-01T09:13:48.576-08:00Collatz Conjecture<p>For me the most fasinating thing about the Collatz Conjecture is how it so blantantly points to a deficiency in mathematics. It seems incredibly obvious that it is true. The problem lies in a way to say that no loops are created excluding one - a limited understanding of process dynamics.<br /><br />Recently, I've changed the problem to eliminate the split condition - there is no reason to divide even numbers by two. Without the split we have a simplified algorithm, but it must work with larger numbers. This doesn't seem like an intuitive step.<br /><br /><br />1. y <= 1+Shr[Xor[x,x-1],1]<br /><br />2. if x=y then step 5<br /><br />3. x <= 3x + y<br /><br />4. goto step 1<br /><br />5. number is a power of two<br /><br /><br /><br />The first step calculates the value of least set bit. Instead of dividing by two, one is scaled up to match the bit which would make (x) odd if it were down scaled. Only when a single bit is set does the algorithm end (step 2). (y) can be tracked progressively instead of looking at the complete representation of (x) as step 3 always clears the least set bit (or more). Of course, I've coded this in x86: <span style="font-family:courier new;color:#005500;"><br /></span><span style="font-family:courier new;color:#005500;"><pre>; check Collatz Conjecture on multi-percision number<br />;<br />; carry flag set on array overflow<br />; carry flag clear on pass<br />; (otherwise it loops forever: not likely, but hasn't been proven)<br />;<br />; NOTE: array should be twice the size of number to test - too large<br />; is not a problem as only active bits are operated on.<br />CollatzArrayQ:<br /> label .arr at esp+4+32 ; BigInt<br /> label .len at esp+8+32 ; length (dwords)<br /><br /> pushad<br /> mov ebx,[.arr]<br /> mov eax,[.len]<br /> mov edx,1<br /> mov esi,ebx<br /> lea ebp,[ebx+eax*4]<br />; address past number in EDI (active end marker)<br />; (need at least one zero dword on end!)<br />.z: dec eax<br /> test dword [ebx+eax*4],-1<br /> je .z<br /> lea edi,[ebx+eax*4+4]<br /> cmp edi,ebp<br /> je .over<br /><br />; advance to lowest set bit<br />.0: test [esi],edx<br /> jne .1<br />.0a: rol edx,1<br /> jnc .0 ; HINT: taken<br /> add esi,4<br /> cmp esi,ebp ; end<br /> cmovnc esi,ebx ; reset<br /> jmp .0<br />; ESI[ECX] is first set bit, multiply by three and add ESI[ECX]<br />; all (virtual) lower bits are zero.<br />.1:<br /> ; test if buffer is == ESI[ECX] then done!<br /> cmp [esi],edx<br /> jne .kx ; HINT: taken<br /> ; if EDI==ESI+4 or (ESI+4==EBP && EDI==EBX)<br /> ; (faster than testing all of buffer)<br /> mov ecx,edi<br /> lea eax,[esi+4]<br /> cmp edi,ebx<br /> cmove ecx,ebp<br /> cmp eax,ecx<br /> je .x<br />.kx:<br /> push edx<br /> push esi<br />.2:<br /> mov eax,3<br /> mov ecx,edx<br /> mul dword [esi]<br /> add eax,ecx<br /> adc edx,0<br /> mov [esi],eax<br /><br /> add esi,4<br /> cmp esi,ebp ; end<br /> cmovnc esi,ebx ; reset<br /><br /> cmp esi,edi<br /> jne .2<br /><br /> test edx,edx<br /> je .5<br /><br /> cmp edi,[esp]<br /> je .over<br /> mov [edi],edx<br /> add edi,4<br /> cmp edi,ebp ; end<br /> cmovnc edi,ebx ; reset<br /><br />.5: pop esi<br /> pop edx<br /> jmp .0a ; we just cleared that bit<br /><br />.x: popad<br /> clc<br /> retn 8<br /><br />.over: popad<br /> stc<br /> retn 8</pre></span>Try a really big number - like 2^131072 + 3<br /><br /></span><span style="font-family:courier new;color:#005500;"><pre>TestBuffer rb 32*1024 ; zero bytes<br /><br />mov dword [TestBuffer],3<br />mov dword [TestBuffer+16*1024],1<br /><br />push 32*1024/4 ; L1 cache size (c:<br />push TestBuffer1<br />call CollatzArrayQ<br />; 3829ms on my machine</pre></span>Why is this important? The algorithm is no longer concerned with value of (x) except for the exit condition - the same operation takes place each loop. It is sufficient to show that (y) -> (x) for all (x), and the conjecture is true. The multiplication by three is irrelavant because the addition propells the least set bit at a rate greater than the constant multiple!<br /><br />(c:<br /><br /><strong>Update:</strong><br /><br /></span><blockquote>Status of the 3x+1 class record progress as of: Dec 13, 2007. Intervals are<br />indicated in blocks of 20,000,000,000,000 (20 E+12 or 20 trillion numbers).<br />Calculation of a single block takes approximately 20 days of (idle) time on a<br />Pentium 1000 MHz.</blockquote><a href="http://www.ericr.nl/wondrous/progress.html">Basically, a trillion per day, per Ghz</a>.<br /><br />Naturally, I'm currious how my routine compares:<br />Currently, they are working on 7 * 2^53 block of numbers. I ran a test of my algorithm for 2^24 numbers (in that block range) which took 38781ms. Which means it is roughly 25 times <strong>slower</strong>. Their algorithm is tuned for this size and uses large data structures (to avoid processing numbers). I might be able to double the speed of my algorithm, but confirming ranges of numbers is not what it's designed for.Rickey Bowershttps://plus.google.com/103690306975404683409noreply@blogger.com2tag:blogger.com,1999:blog-32666420.post-63249490468466920892007-11-11T12:08:00.000-08:002007-12-15T15:46:35.512-08:00Slices of Weekend PiIrrational numbers seem to have their hooks into nature somehow. Clearly, they are the product of something fundamental, but I think they also point out limits in the way we represent ideas. The really special ones get a symbol that we use without regard for the unique abundance they represent.<br /><br />There is also a bit of irony in irrational numbers. Mathematics was first designed to track finances and military forces in an attempt to add great stablity. Imagine the surprise when even the simiplest of systems could produce absolute chaos and uncertainty.<br /><br /><a href="http://en.wikipedia.org/wiki/Pi">Pi(π)</a> is the most famous irrational number and countless lifetimes have been spent in it's study and popularization. Yesterday I spent a couple hours devising an x86 version of the <a href="http://en.wikipedia.org/wiki/BBP_formula#BBP_algorithm">BBP algorithm</a>:<pre>; NOTE: This isn't the fastest way to compute pi approximations.<br />;<br />; TESTING STATUS:<br />; confirmed valid for all input [0,10000]<br />;<br />; digit time (1.6Ghz Dothan)<br />; ------------------------------------------<br />; 10^4 16 ms<br />; 10^5 219 ms<br />; 10^6 2860 ms 6C65D566<br />; 10^7 33938 ms<br />; 10^8 407828 ms<br />;<br />; Pi hex digit extraction:<br />;3.<br />; 243F6 A8885 A308D 31319 8A2E0 37073 44A40 93822<br />; 299F3 1D008 2EFA9 8EC4E 6C894 52821 E638D 01377<br /><br />; 16^EBP MOD ECX=8k+z<br />; Special cases:<br />; EBP=0, return 1<br />; ECX=1, return 0<br />;<br />; EDX is return value<br />mpow: xor edx,edx<br /> cmp ecx,1<br /> je .x<br /> bsr ebx,ebp<br /> mov edx,1<br /> jne .1<br />.x: retn<br /><br />.0: mul eax<br /> div ecx<br />.1: bt ebp,ebx<br /> mov eax,16<br /> jnc .2<br /> mul edx<br /> div ecx<br />.2: dec ebx<br /> mov eax, edx<br /> jns .0<br /> retn<br /><br />series:<br /> xor edi,edi ; fraction<br />.1: call mpow ; 16^EBP mod ECX<br /> xor eax,eax<br /> div ecx ; EDX:0 / 8k+m<br /> add ecx,8 ; 8k+m, k[0,EBP]<br /> add edi,eax<br /> dec ebp<br /> jns .1<br /><br /> mov ebp,10000000h<br />.2: mov eax,ebp<br /> xor edx,edx<br /> div ecx<br /> add ecx,8<br /> shr ebp,4<br /> cmp ecx,ebp<br /> lea edi,[edi+eax]<br /> jna .2<br /> retn<br /><br />PiHex:<br /> pushad<br /> mov ebp,[esp+36]<br /> mov ecx,1<br /> call series<br /> lea esi,[edi*4]<br /><br /> mov ebp,[esp+36]<br /> mov ecx,4<br /> call series<br /> sub esi,edi<br /> sub esi,edi<br /><br /> mov ebp,[esp+36]<br /> mov ecx,5<br /> call series<br /> sub esi,edi<br /><br /> mov ebp,[esp+36]<br /> mov ecx,6<br /> call series<br /> sub esi,edi<br /> mov [esp+28],esi<br /> popad<br /> retn 4<br /><br /><br /> invoke PiHexDigit, 1000000<br /></pre>(FASM Syntax)Rickey Bowershttps://plus.google.com/103690306975404683409noreply@blogger.com0tag:blogger.com,1999:blog-32666420.post-33332466081347951392007-10-30T15:49:00.000-07:002007-12-15T15:47:20.114-08:00Crossing Segment IntersectionsI've been away from <a href="http://projecteuler.net">Project Euler</a> for several months, and decided to try some of the new problems for fun. Not being shy about a challenge I just chose the <a href="http://projecteuler.net/index.php?section=problems&id=165">most recent one</a>. Couple hours and implementations later, and the solution remains out of reach. <br /><br />It didn't even seem that difficult, really. My first two tries were completely from my gut. One was a brute force stab with sorting in one dimension: usually these attempts just help me get a feel for the problem. With 5000 lines that means over 12 million intersection test - sorting and early exit dropped it down to about 4 million. Certainly within range of brute forcing on modern processors.<br /><br />The intersection test itself seemed to be problem. So, I used a little topology to devise a test based strictly on triangle area, given the connected graph of the four points. Got a different result, but it was close enough to the last one to give a sense of accomplishment. (It's really fast and merits further research.)<br /><br />Where could the problem be? I searched the web for algorithms - seems line sweeping is all the rage. Quickly, I plugged a couple different intersection tests into my first attempt. Two more different answers!<br /><br />Have to put the problem aside for now...Rickey Bowershttps://plus.google.com/103690306975404683409noreply@blogger.com2tag:blogger.com,1999:blog-32666420.post-49002974281184550732007-10-19T14:38:00.000-07:002007-12-15T15:45:38.754-08:00An Improvement to RandomRecently, I stumbled upon <a href="http://home.southernct.edu/~pasqualonia1/ca/report.html">another PRNG</a> (pseudo-random number generator) which had some very unique properties. Not only does the generator score high on tests, but functionally the operations transfer easily into a compact set of instructions - making it very fast. Based on a rather large cellular automata, I didn't think it would be faster than <a href="http://www.stephenwolfram.com/">Steven Wolfram</a>'s <a href="http://mathworld.wolfram.com/Rule30.html">Rule 30</a> (used in <a href="http://www.wolfram.com/">Mathematica</a> for <a href="http://reference.wolfram.com/mathematica/ref/RandomInteger.html">RandomInteger[]</a>); but the byte granularity of the 1D automata reduces to three loads, an add, and a store for each byte. No multiply, division, long dependancy chain, nor multiple passes. Furthermore, if a large random list is updated only when all the values have been used further speed is gained.<br /><br />I created a macro in MASM to ease direct access to the random array:<br /><br /><span style="font-family:courier new;color:#005500;"><pre> mov ebp, PRNG_CA_Pasqualoni_StateWidth / 4<br />_0: PRNG_CA_Pasqualoni eax, ebp<br /> ; (do somthing with random data)<br /> jXX _0 ; until exit condition reached</pre></span>Here the random data is stored in EAX, and EBP indexes into the array. If using a register to index into the array seems like a bad idea then try to write a function with less overhead than load/storing a register. The intent is to churn through a massive number of random integers.<br /><br />The features of MASM have been used to encapsulate the routine within an object module. Although the code is not lengthy it demonstrates coding techniques which make code reusable and easy to understand. The interface is described in a concise way within the include file - there is no need to look at the code to use the functions. Following the defined interface allows the code to operate in a transparent way. Within the assembly file all the nuances are explained making modification in the future less error prone. The file is small enough to fit on a couple screens of text and be quickly comprehended.<br /><br />(Below is my MASM conversion.)<br /><a href="http://www.box.net/lite/5zi3mjgtx9"><img src="http://www.box.net/lite/image/5zi3mjgtx9.png" border="0"></a>Rickey Bowershttps://plus.google.com/103690306975404683409noreply@blogger.com3