How to write demos that work (Version 5) - 18/3/93
===================================================

(or the Amiga Demo Coders Reference Manual)

Edited by Comrade J/SAE (ex demo maniac)
Co-Editor post vacant (apply by email)

* Please note this is a REPLACEMENT for the text files howtocode1.txt through
howtocode4.txt. Sysops, please remove these earlier files as they contain
many mistakes. Thanks in advance... *

Thanks to: Vic Ricker, Grue, Timo Rossi, Jesse Michael, John Derek Muir,
Boerge Noest, Christopher Klaus, Doz/Shining, Andrew Patterson, Walter Dao,
Chris Green, Magnus Timmerby, Patrik Lundquist, Raymond Penners, the otherwise
anonymous u920659@daimi.aau.dk, Matthew Arnold, TGR/Anthrox, Tero Lehtonen,
Carl-Henrik Sk}rstedt (that's how it's spelt via 7-bit ASCII!!), Arno Hollosi,
Irmen de Jong and Jonas Matton for their comments and contributions.

Thanks also to CS, who didn't want a credit, but I'd like to say thank you
anyway...

Introduction
============

This file has grown somewhat from the one uploaded over Christmas 1992. I've
been very busy over the last two months, so sorry that I haven't been able to
update it sooner.

It started as an angry protest after several new demos I downloaded refused to
work on my 3000, and has ended up as a sort of general how-to-code article,
with particular emphasis on the Amiga 1200.

Now, as many of you may know, Commodore have not released hardware information
on the AGA chipset; indeed they have said they will not (as the registers will
change in the future). Demo coders may not be too concerned about what is
coming in a year or two, but IF YOU ARE WRITING COMMERCIAL SOFTWARE you must
be.

Chris Green, from Commodore US, asked me to mention the following:

"I'd like it if you acknowledged early in your text that it IS possible to do
quite exciting demos without poking any hardware registers, and that this can
be as interesting as direct hardware access.
amiga.physik.unizh.ch has two AGA demos with source code by me, AABoing and
TMapdemo. These probably seem pretty lame by normal demo standards as I didn't
have time to do any nifty artwork or sound, and each only does one thing. But
they do show the POTENTIAL for OS-friendly demos."

I have seen these demos and they are very neat. Currently you cannot do serious
copper tricks with the OS (or can you, Chris? I'd love to see some examples if
you can...), for example smooth graduated background copperlists or all that
fun messing with bitplane pointers and modulos. But for a lot of things the
Kickstart 3.0 graphics.library is very capable. If you are in desperate need of
some hardware trick that the OS can't handle, let Chris know about it; you
never know what might make it into the next OS version!

Chris mentions QBlit() and QBSBlit(), interrupt-driven blitter access. These
will make games in particular far easier to write under the OS now.

Chris also says "Note that if I did a 256 color lores screen using this
document, it would run fifty times slower than one created using the OS, as you
haven't figured out enhanced fetch modes yet. A Hires 256 color screen wouldn't
even work." There are some new additions to the AGA chapter that discuss some
of this problem, but if you want maximum performance from an AGA system, use
the OS.

Remember that on the A1200 chipram has wait-states, while the 32-bit ROM
doesn't. So use the ROM routines; some of them run faster than anything you
could possibly write (on an A1200 with just 2Mb ram).

The only drawback is, again, documentation. To learn how to code V39 OS
programs you need the V39 includes and autodocs, which I'm not allowed to
include here a) because I've signed an NDA, and b) because they're massive...
Perhaps, in a later release, I'll give some highlights of V39 programming...
Get Chris Green's example code; it's a good place to start.
Register as a developer with your local Commodore office to get the autodocs
and includes; it's relatively inexpensive (£85 per year in the UK).

---

Most demos I've seen use similar startup code to that I was using back in
1988. Hey guys, wake up! The Amiga has changed quite a bit since then.

So. Here are some tips on what to do and what not to do:

1. RTFM.
========

Read the f'ing manuals. All of them. Borrow them off friends or from your local
public library if you have to.

Read the "General Amiga Development Guidelines" in the dark grey (2.04)
Hardware Reference Manual and follow them TO THE LETTER. If it says "Leave this
bit cleared" then don't set it!

Don't use self-modifying code. A common bit of code I see is:

... in the setup code

        move.l  $6c.w,old       ; Store Level 3 interrupt.
                                ; Naughty... Naughty.

... at the end of the interrupt

        movem.l (sp)+,a0-a6/d0-d7
        dc.w    $4ef9           ; jmp instruction
old     dc.l    0               ; self modifying!!!! DONT DO THIS!

68020 and above processors with caches enabled often barf at this piece of code
(the instruction cache may still contain the JMP 0 instruction, which isn't
updated when you write the real address into "old").

Interrupts should be set up with the AddIntServer(), SetIntVector() or
AddIntHandler() functions. Read the chapter on Interrupts in the Amiga ROM
Kernel Manual: Libraries.

2. Proper Copper startup.
=========================

(Please look at the startup example code at the end of this file.)

IF you are going to use the copper then this is how you should set it up. The
current Workbench view and copper address are stored, and then the copper is
enabled. On exit the Workbench view is restored.

This guarantees(*) your demo will run on an AGA (Amiga 1200) machine, even if
it has been set into some weird screen mode before running your code.

Otherwise, under AGA the hardware registers can be in some strange states
before your code runs, beware!

The LoadView(NULL) forces the display to a standard, empty position, flushing
the rubbish out of the hardware registers. Note:
There is a bug in the V39 OS on the Amiga 1200/4000: the sprite resolution is
*not* reset, so you will have to do this manually if you use sprites (see
below...).

Two WaitTOF() calls are needed after the LoadView() to wait for both the long
and short frame copperlists of interlaced displays to finish.

See the bottom of this file for a full, tested, example startup.asm that you
can freely use for your own productions.

It has been suggested to me that instead of using the GfxBase gb_ActiView I
should use the Intuition ib_ViewLord view. This will work just as well, but
there has been debate as to whether it will work the same way in the future
with retargetable graphics (RTG). As GfxBase is at a lower level than
Intuition, I prefer to access it this way (but thanks for the suggestion
anyway, Boerge!). Code using gb_ActiView should also run from non-Workbench
environments (for example, being called from within Amos).

* - Nothing is ever guaranteed where Commodore are involved. They may move the
hardware registers into chipram next week :-)

3. Your code won't run from an icon.
====================================

You add an icon for your new demo (not everyone uses the CLI!) and it either
crashes or doesn't give back all the RAM it uses. Why?

Icon startup needs specific code to reply to the Workbench message. With the
excellent Hisoft Devpac assembler, all you need to do is add the line

        include "misc/easystart.i"

and it magically works!
For those without Devpac, here is the relevant code:

---------------------------------------------------------

* Include this at the front of your program
* after any other includes
* note that this needs exec/exec_lib.i

        IFND    EXEC_EXEC_I
        include "exec/exec.i"
        ENDC
        IFND    LIBRARIES_DOSEXTENS_I
        include "libraries/dosextens.i"
        ENDC

        movem.l d0/a0,-(sp)             save initial values
        clr.l   returnMsg

        sub.l   a1,a1
        move.l  4.w,a6
        jsr     _LVOFindTask(a6)        find us
        move.l  d0,a4

        tst.l   pr_CLI(a4)
        beq.s   fromWorkbench

* we were called from the CLI
        movem.l (sp)+,d0/a0             restore regs
        bra     end_startup             and run the user prog

* we were called from the Workbench
fromWorkbench
        lea     pr_MsgPort(a4),a0
        move.l  4.w,a6
        jsr     _LVOWaitPort(a6)        wait for a message
        lea     pr_MsgPort(a4),a0
        jsr     _LVOGetMsg(a6)          then get it
        move.l  d0,returnMsg            save it for later reply

* do some other stuff here RSN like the command line etc
        nop

        movem.l (sp)+,d0/a0             restore
end_startup
        bsr.s   _main                   call our program

* returns to here with exit code in d0
        move.l  d0,-(sp)                save it

        tst.l   returnMsg
        beq.s   exitToDOS               if I was a CLI

        move.l  4.w,a6
        jsr     _LVOForbid(a6)
        move.l  returnMsg(pc),a1
        jsr     _LVOReplyMsg(a6)

exitToDOS
        move.l  (sp)+,d0                exit code
        rts

* startup code variable
returnMsg       dc.l    0

* the program starts here
        even
_main

---------------------------------------------------------

4. How do I tell if I'm running on an Amiga 1200/4000?
======================================================

Do *NOT* check library revision numbers: V39 OS can and does run on standard
and ECS chipset machines (this Amiga 3000 is currently running V39).

This code is a much better check for AGA than the one in the last issue!!!!!

GFXB_AA_ALICE   equ     2
gb_ChipRevBits0 equ     $ec

; Call with a6 containing GfxBase from opened graphics.library

        btst    #GFXB_AA_ALICE,gb_ChipRevBits0(a6)
        bne.s   is_aa

Chris Green pointed this out to me. He says, quite rightly, that the $dff07c
register bits mentioned last time may very well change if the chip design is
changed, even for new production models of the AA chipset.
Thanks!

This will not work unless the V39 SetPatch command has been executed, so
forget about trackloader demos (and I wish you would! Some of us want to put
your demos on our hard disks). Remember you can use Fast File System and
Directory Caching System floppy disks on the A1200.

The code in the last issue also had major problems when run on non-ECS
machines (without Super Denise or Lisa), as the register was undefined under
the original (A) chipset and would return garbage, sometimes triggering a
false AGA-present response.

5. Use Relocatable Code
=======================

If you write demos that run from a fixed address you should be shot. NEVER EVER
DO THIS. It's stupid and completely unnecessary. Now, with so many versions of
the OS, different processors, memory configurations and third-party
peripherals, it's impossible to say any particular area of ram will be free to
just take and use.

It's not as though allocating ram legally is difficult. If you can't handle it
then perhaps you should give up coding and take up graphics or something :-)

If you require bitplanes to be on a 64Kb boundary then try the following (in
pseudo-code because I'm still too lazy to write it in asm for you; note that
AllocAbs() takes the size first and returns NULL on failure):

        for c = 65536 to (top of chip ram) step 65536
            if AllocAbs(NUMBER_OF_BYTES_YOU_WANT, c) <> NULL then goto ok
        next c
        print "Sorry. No free ram. Close down something and retry demo!"
        stop

ok:     Run_Outrageous_demo with mem at c

Keep your code in multiple sections. Several small sections are better than one
large section; they will more easily fit in and run on a system with fragmented
memory. Lots of calls across sections are slower than with a single section, so
keep all your relevant code together.

Keep code in a public memory section:

        section mycode,code

Keep graphics, copperlists and similar in a chip ram section:

        section mydata,data_c

Never use code_f, data_f or bss_f, as these will fail on a chipram-only
machine.
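The 64Kb-boundary search from the pseudo-code in this section can be sketched in portable C. To make it testable away from an Amiga, the allocator is passed in as a callback: alloc_abs_fn, find_64k_block and fake_alloc_abs are hypothetical names made up for this sketch, not OS functions. On a real machine the callback would simply call exec's AllocAbs(byteSize, location), which returns the location on success and NULL otherwise, and chip_top would be the end of chip ram.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback standing in for exec's AllocAbs(byteSize, location):
   returns non-NULL if the block at 'location' was claimed, NULL if not. */
typedef void *(*alloc_abs_fn)(uint32_t byte_size, uint32_t location);

#define BOUNDARY_64K 0x10000u

/* Try every 64Kb boundary from 64Kb upwards and return the first address
   where the whole block (size bytes) fits below chip_top and could be
   claimed, or 0 if none was free. Chip ram is only a few Mb, so c cannot
   overflow. The caller must FreeMem() the block again on exit. */
uint32_t find_64k_block(uint32_t chip_top, uint32_t size, alloc_abs_fn try_alloc)
{
    assert(size > 0 && try_alloc != NULL);
    for (uint32_t c = BOUNDARY_64K; size <= chip_top && c <= chip_top - size;
         c += BOUNDARY_64K) {
        if (try_alloc(size, c) != NULL)
            return c;
    }
    return 0;
}

/* Host-side fake for trying the search off-Amiga: pretends everything
   below $30000 is already in use. */
void *fake_alloc_abs(uint32_t byte_size, uint32_t location)
{
    (void)byte_size;
    return location >= 0x30000u ? (void *)(uintptr_t)location : NULL;
}
```

With the fake and 2Mb of chip ram, a $8000-byte request lands at $30000, the first boundary the fake reports as free. On a real machine you would pass a small wrapper around AllocAbs() instead, keep the returned address for FreeMem(), and print the "no free ram" message when the result is 0.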
And one final thing: I think many demo coders have realised this now, but
$C00000 memory does not exist on any of the machines in production now, so stop
using it!!!

6. Don't Crunch demos!
======================

Don't ever use Tetrapack or Bytekiller based packers. They are crap. Many more
demos fall over due to being packed with crap packers than anything else. If
you are spreading your demo by electronic means (which most people do now; the
days of the SAE Demodisks are long gone!) then assemble your code and use LHARC
to archive it. You will get better compression with LHARC than with most
runtime packers.

If you *have* to pack your demos, then use Powerpacker 4+, Turbo Imploder or
Titanics Cruncher, which I've had no problems with myself, although I have
heard of problems with some of these on 68040 machines. If it will decrunch on
a 68040 with caches enabled it will probably work on everything.

(found in the documentation to IMPLODER 4.0)

>** 68040 Cache Coherency **
>
>With the advent of the 68040 processor, programs that diddle with code which is
>subsequently executed will be prone to some problems. I don't mean the usual
>self-modifying code causing the code cached in the data cache to no longer
>be as the algorithm expects. This is something the Imploder never had a
>problem with, indeed the Imploder has always worked fine with anything
>up to and including an 68030.
>
>The reason the 68040 is different is that it has a "copyback" mode. In this
>mode (which WILL be used by people because it increases speed dramatically)
>writes get cached and aren't guaranteed to be written out to main memory
>immediately. Thus 4 subsequent byte writes will require only one longword
>main memory write access. Now you might have heard that the 68040 does
>bus-snooping. The odd thing is that it doesn't snoop the internal cache
>buses!
>
>Thus if you stuff some code into memory and try to execute it, chances are
>some of it will still be in the data cache.
>The code cache won't know about this and won't be notified when it caches
>from main memory those locations which do not yet contain code still to be
>written out from the data caches. This problem is amplified by the absolutely
>huge size of the caches.
>
>So programs that move code, like the explosion algorithms, need to do a
>cache flush after being done. As of version 4.0, the appended decompression
>algorithms as well as the explode.library flush the cache, but only under OS
>2.0. The reason for this is that only OS 2.0 has calls for cache-flushing.
>
>This is yet another reason not to distribute imploded programs; they might
>just cross the path of a proud '40 owner still running under 1.3.
>
>It will be interesting to see how many other applications will run into
>trouble once the '40 comes into common use among Amiga owners. The problem
>explained above is something that could not have been easily anticipated
>by developers. It is known that the startup code shipped with certain
>compilers does copy bits of code, so it might very well be a large problem.

Look at the new exec.library functions that solve this problem: CacheClearU()
and CacheControl(). Both are available with Kickstart 2.0 and above.

I strongly advise against trying to 'protect' code by encrypting parts of it;
it's very easy for your code to fail on >68000 if you do. What's the point
anyway? Lamers will still use Action Replay to get at your code.

I never learnt anything by disassembling anyone's demo. It's far more difficult
to try and understand someone else's (uncommented) code than to write your own
code from scratch.

7. Don't use the K-Seka assembler!
==================================

It's dead and buried. Get a life, get a real assembler. Hisoft Devpac is
probably the best all-round assembler, although I use ArgAsm, which is
astonishingly fast. The same goes for hacked versions of Seka. Is it any
coincidence that almost every piece of really bad code I see is written with
Seka?
No, I don't think so :-)

When buying an assembler check the following:

        1. That it handles standard CBM style include files without alteration.
        2. That it allows multiple sections.
        3. That it can create both executable and linkable code.
        4. 68020+ support is a good idea.

Devpac 3.0 is probably the best all-round assembler at the moment. People on a
tighter budget could do worse than look at the public domain A68K (it's much
better than Seka!). I'd suggest using Cygnus Ed as your text editor.

8. Don't use the hardware unless you have to!
=============================================

This one is aimed particularly at utility authors. I've seen some *awfully*
written utilities, for example (although I don't want to single them out, as
there are plenty of others) the Kefrens IFF converter. There is NO REASON why
this has to have its own copperlist. A standard OS-friendly version opening
its own screen works perfectly (I still use the original SCA IFF-Converter),
and multitasks properly. If you want to write good utilities, learn C.

9. Beware bogus input falling through to Workbench
==================================================

If you keep multitasking enabled and run your own copperlist, remember that any
input (mouse clicks, key presses, etc) falls through to the Workbench. The
correct way to get around this is to add an input handler (IND_ADDHANDLER) to
the input.device food chain (see - you *do* have to read the other manuals!)
at a high priority, above Intuition's, to grab all input events before
Workbench/CLI can get to them. You can then use this for your keyboard handler
too (no more $bfexxx peeking, PLEASE!!!)

Look at the source code for Protracker for an excellent example of how to do
the job properly. Well done Lars!

10. Have fun!
=============

Too many people out there (particularly the American OS-Lamic Fundamentalists)
try to tell us that you should never program at a hardware level. If you're
programming for fun, ignore them!
But try and put a little thought into how your code will work on other
machines; nothing annoys people more than downloading 400Kb of demo and then
finding it blows up on their machine. I'm not naming any names, but there are
quite a few groups whose demos I have no intention of downloading again,
because I know it's a waste of download time.

With the launch of the Amiga 1200 you cannot just write for 1.3 Amiga 500s any
more.

I'd like to apologise to all Americans for blaming OS-Fundamentalism on them.
I've since heard from *two* American hardware hackers. :-)

I guess I ought to point out that 90% of my programs are now fully OS legal,
although I am writing an AGA hardware-hacking demo for the 1200 now... Demo and
source available soon....

As soon as I have finished that I am writing a fully OS AGA demo, because I
HAVE SEEN THE LIGHT! SATAN MADE ME USE THAT HARDWARE MANUAL. I WILL NEVER POKE
A REGISTER AGAIN, or something like that... As usual, full demo *and* source
will be uploaded. If anyone has any ideas of what I should do (and I'd also
appreciate a nice short tracker module...) you know where to send them...

11. Don't Publish Code you haven't checked!
===========================================

Thanks to Timo Rossi for spotting the stupid bug in my copper setup routine
(using LOFList instead of copinit). Funnily enough my own setup routine uses
the correct copinit code.

Please ignore the original files (howtocode[1|2|3|4].txt) and use this instead.

12. Copper End
==============

I've remembered where this double copper end comes from: the ArgAsm assembler
has copper macros (CMOVE, CWAIT and CEND) built in, and the CEND macro
deliberately leaves two copper END instructions; the manual states this is
important for compatibility reasons. Will whoever pinched my ArgAsm manual
please return it? I bet it was you, Alex...

13. Using a 68010 processor
===========================

The 68010 is a direct replacement for the 68000 chip; it can be fitted to the
Amiga 500, 500+, 1500 and 2000 without any other alterations (I have been told
it will not fit an A600).

The main benefit of the 68010 over the 68000 is the loop cache mode. Common
3-word loops like:

        moveq   #50,d0
.lp     move.b  (a0)+,(a1)+     ; one word
        dbra    d0,.lp          ; two words

are recognised as loops and speed up dramatically on the 68010.

14. Using the blitter.
======================

If you are using the blitter in your code and you are leaving the system
intact (as you should), always use the graphics.library functions OwnBlitter()
and DisownBlitter() to take control of the blitter. Remember to free it for
system use; many system functions (including floppy disk data decoding) use the
blitter.

OwnBlitter() does not trash any registers. I guess DisownBlitter() doesn't
either, although Chris may well correct me on this.

Another big mistake I've seen is with blitter/processor timing. Assuming that a
particular routine will be slow enough that a blitter wait is not needed is
silly. Always check for blitter finished, and wait if you need to.

Don't assume the blitter will always run at the same speed either. Think about
how your code would run if the processor or blitter were running at 100 times
the current speed. As long as you keep this in mind, you'll be in a better
frame of mind for writing compatible code.

Another big source of blitter problems is using the blitter in interrupts. Most
demos do all processing in the interrupt, with only a

.wt     btst    #6,$bfe001      ; is left mouse button clicked?
        bne.s   .wt

loop outside of the interrupt. However, some demos do stuff outside the
interrupt too. Warning: if you use the blitter in both your interrupt and your
main code (or, for that matter, if you use the blitter via the copper and also
in your main code), you may have big problems....
Take this for example:

        lea     $dff000,a5
        move.l  GfxBase,a6
        jsr     _LVOWaitBlit(a6)
        move.l  #-1,BLTAFWM(a5)                 ; set FWM and LWM in one go
        move.l  #source,BLTAPT(a5)
        move.l  #dest,BLTDPT(a5)
        move.w  #%100111110000,BLTCON0(a5)      ; use A and D, D=A
        move.w  #0,BLTCON1(a5)
        move.w  #64*height+width/2,BLTSIZE(a5)  ; trigger blitter (width in bytes)

There is *nothing* stopping an interrupt, or the copper, from triggering a
blitter operation between the WaitBlit call and your final BLTSIZE blitter
trigger. This can lead to total system blowup. Code that may, by luck, work on
standard-speed machines may die horribly on faster processors, because the
different timing makes this type of problem far more likely to occur.

The safest way to avoid this is to keep all your blitter calls together, use
the copper exclusively, or write a blitter-interrupt routine to do your blits
for you.

Always use the graphics.library WaitBlit() routine for your end-of-blitter
code. It does not change any registers, takes into account any revision of
blitter chip and any unusual circumstances, and on an Amiga 1200 will execute
faster (because it runs from 32-bit ROM) than your own code in chipram.

Another thing concerning the blitter: instead of calculating your LF-bytes
(minterms) all the time you can do this:

A       EQU     %11110000
B       EQU     %11001100
C       EQU     %10101010

So when you need an LF-byte you can just type (! means OR here):

        move.w  #(A!B)&C,d0

15. NTSC
========

As a European myself, I'm naturally biased against the inferior video system,
but even though the US and Canada have a relatively minor Amiga community
compared with Europe (sorry, it's true :-) we should still help them out, even
though they've never done a PAL Video Toaster for us (sob!).

You have two options.

Firstly, you could write your code only to use the first 200 display lines,
and leave a black border at the bottom. This annoys PAL owners, who rightly
expect things to have a full display. It took long enough for European games
writers to work out that PAL displays were better.
Secondly, you could write code that automatically checks which system it is
running on and runs the correct code accordingly.

(How to check: note, this is probably not the officially supported method, but
so many weird things happen with new monitors on AGA machines that I prefer
this method; it's simpler, and works under any Kickstart.)

        move.l  4.w,a6                          ; ExecBase
        cmp.b   #50,PowerSupplyFrequency(a6)    ; 531(a6)
        beq.s   .pal
        jmp     RunNTSC         ; I'm NTSC (or more accurately,
                                ; I'm running from 60Hz power)
.pal    jmp     RunPAL          ; I'm PAL (or I'm running from 50Hz power)

(RunNTSC and RunPAL stand for your own NTSC and PAL code.)

If people have already switched modes to PAL, or if they are running some weird
software like the ICD Flicker Free Video prefs thingy, then this completely
ignores them, but that serves them right for trying to be clever :-)

Probably better would be to check VBlankFrequency(a6) [530(a6)] as well. If
both are 60Hz then it's definitely an NTSC machine. If one or more are 50Hz,
then it's probably a better idea to run in PAL. VBlankFrequency can give all
sorts of weird things on an AGA system (DblPAL runs at 48Hz, for example).

Chris Green suggests checking GfxBase->DisplayFlags for PAL rather than what I
do above. Well, if Commodore had fixed the bug in Kickstart 1.3 that was
reported to them while Kickstart 1.2 was in beta (that a PAL machine,
especially with a Genlock, often fails to report that it is PAL) then I'd use
it. They did fix it in 2.0 though (at last!), along with the "Oh, I've got
$200000 RAM. I guess that means the user wants *two* mouse pointers in the PAL
area" bug.. :-)

So, for V1.2/1.3 do the PowerSupplyFrequency check; on 2.04 or higher use the
GfxBase->DisplayFlags check as Chris suggests...

Under Kickstart 2.04 or greater, the Display Database can be accessed. Any
program can enquire of the database what type of displays are available, so for
example "I want a 50Hz 15kHz PAL screen. Can I display it on this Amiga?"
(Unfortunately it doesn't take an ASCII string like that, but it's not much
more difficult.)
Of course many users will have the default monitor installed (PAL or NTSC) and
not realise that they can have extra modes by dragging the monitor icon into
their Monitors drawer, and of course this doesn't work on Kickstart 1.3
machines.

Now, if you want to force a machine into the other display system you need some
magic pokes. Here you go (beware: other bits in $dff1dc can do nasty things. One
bit can reverse the polarity of the video sync, not too healthy for some
monitors I've heard...). $dff1dc (BEAMCON0) only exists on ECS and AGA
machines, not on the original chipset.

To turn an NTSC system into PAL (50Hz):

        move.w  #32,$dff1dc     ; Magically PAL

To turn a PAL system into NTSC (60Hz):

        move.w  #0,$dff1dc      ; Magically NTSC

Remember: not all displays can handle both display systems! Commodore
1084/1084S, Philips 8833/8852 and multisync monitors will, and very few US TVs
will handle PAL signals.

It might be polite for PAL demos to ask NTSC users if they wish to switch to
PAL (by the magic poke) or quit.

16. Programming AGA hardware
============================

**** WARNING ****

AGA registers are temporary. They will change. Do not rely on this
documentation. No programs written with this information can be officially
endorsed or supported by Commodore. If this bothers you then stop reading now.

I've rewritten this again, because of big mistakes, things that weren't really
necessary, and because no-one really understood the original. Remember that
for most things the OS provides a much better and easier way to access new
screen modes, and the OS will be compatible with future chipsets; these
registers will change!

Bitplanes:

Set 0 to 7 bitplanes as before in $dff100. Set 8 bitplanes by setting bit 4 of
$dff100; bits 12 to 14 (the old plane-count field) should then be zero (bit 15
is still HIRES). (Ooops. Big mistake last time!)

Colour Registers:

There are now 256 colour registers, all accessed through the original 32
registers. AGA works with 8 different palettes of 32 colours each, re-using
colour registers $0180 to $01BE.
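The bit juggling in this chapter is easy to get wrong by hand, so here is a small portable C sketch that only does the arithmetic (it touches no hardware). It assumes the bit assignments given in this chapter: plane count in bits 12 to 14 of $dff100 with bit 4 for 8 planes, palette bank in bits 13 to 15 of $0106, and bit 9 of $0106 selecting the low-nibble write. The function names bplcon0_planes and aga_colour_moves are made up for the example, and like the register descriptions themselves, treat the results with caution.

```c
#include <assert.h>
#include <stdint.h>

#define BPLCON3 0x106u   /* $dff106: palette bank in bits 13-15 */
#define COLOR00 0x180u   /* $dff180: first of the 32 colour registers */
#define LOCT    0x0200u  /* $dff106 bit 9: following colour writes set low nibbles */

/* Plane-count bits of $dff100 for 0..8 bitplanes. Only these bits are
   returned; OR in HIRES, colour enable and the rest yourself. */
uint16_t bplcon0_planes(unsigned planes)
{
    assert(planes <= 8);
    if (planes == 8)
        return 1u << 4;                 /* 8 planes: bit 4 set, bits 12-14 zero */
    return (uint16_t)(planes << 12);    /* 0-7 planes: bits 12-14 as before */
}

/* Build the four copper MOVEs (as dc.l longwords) that load the 24-bit
   colour rgb ($RRGGBB) into AGA colour 'index' (0-255). other_bits holds
   whatever else you keep in $0106 (sprite resolution etc.), because each
   MOVE rewrites the whole register. */
void aga_colour_moves(unsigned index, uint32_t rgb, uint16_t other_bits,
                      uint32_t out[4])
{
    assert(index < 256);
    uint32_t base = ((index >> 5) << 13) | (other_bits & ~(0xE000u | LOCT));
    uint32_t reg  = COLOR00 + (index & 31u) * 2u;      /* $0180-$01BE */
    uint32_t r = (rgb >> 16) & 0xffu, g = (rgb >> 8) & 0xffu, b = rgb & 0xffu;
    uint32_t hi = ((r >> 4) << 8) | ((g >> 4) << 4) | (b >> 4);
    uint32_t lo = ((r & 15u) << 8) | ((g & 15u) << 4) | (b & 15u);

    out[0] = (BPLCON3 << 16) | base;          /* select bank, bit 9 clear */
    out[1] = (reg << 16) | hi;                /* high nibbles */
    out[2] = (BPLCON3 << 16) | base | LOCT;   /* same bank, bit 9 set */
    out[3] = (reg << 16) | lo;                /* low nibbles */
}
```

For colour 0 and $123456 this gives $01060000, $01800135, $01060200 and $01800246; for colour 200 it selects bank 6 ($C000 in $0106) and writes to $0190, the ninth register of that bank.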
You can choose the palette you want to access via bits 13 to 15 of register
$0106:

  bit 15 | bit 14 | bit 13 | Selected palette
 --------+--------+--------+------------------------------
     0   |    0   |    0   | Palette 0 (color 0 to 31)
     0   |    0   |    1   | Palette 1 (color 32 to 63)
     0   |    1   |    0   | Palette 2 (color 64 to 95)
     0   |    1   |    1   | Palette 3 (color 96 to 127)
     1   |    0   |    0   | Palette 4 (color 128 to 159)
     1   |    0   |    1   | Palette 5 (color 160 to 191)
     1   |    1   |    0   | Palette 6 (color 192 to 223)
     1   |    1   |    1   | Palette 7 (color 224 to 255)

To move a 24-bit colour value into a colour register requires two writes to the
register:

        First clear bit 9 of $dff106.
        Move the high nibbles of each colour component to the colour registers.
        Then set bit 9 of $dff106.
        Move the low nibbles of each colour component to the colour registers.

For example, to change colour zero to the colour $123456:

        dc.l    $01060000       ; bank 0, bit 9 clear: high nibbles next
        dc.l    $01800135       ; R=1 G=3 B=5
        dc.l    $01060200       ; bank 0, bit 9 set: low nibbles next
        dc.l    $01800246       ; R=2 G=4 B=6

Note: as soon as you start messing with $dff106, forget all your fancy
multi-colours-per-line plasma tricks. The colour only gets updated at the end
of the scanline. Bummer dudes...

Sprites:

To change the resolution of the sprites, use bits 7 and 6 of register $0106:

  bit 7 | bit 6 | Resolution
 -------+-------+------------------
    0   |   0   | Lowres (140ns)
    1   |   0   | Hires (70ns)
    0   |   1   | Lowres (140ns)
    1   |   1   | SuperHires (35ns)
 -------------------------------

(Now.. 70ns sprites may not be available unless the Interlace bit in BPLCON0 is
set. Don't ask me why.... There appears to be much more to this than just these
two bits. It seems to depend on a lot of different things...)

For 32-bit and 64-bit wide sprites use bits 3 and 2 of register $01FC. The
sprite format (in particular the control words) varies for each width.
  bit 3 | bit 2 | Width       | Control Words
 -------+-------+-------------+----------------------------------
    0   |   0   | 16 pixels   | 2 words (normal)
    1   |   0   | 32 pixels   | 2 longwords
    0   |   1   | 32 pixels   | 2 longwords
    1   |   1   | 64 pixels   | 2 double longwords (4 longwords)
 ---------------------------------------------------------------

Wider sprites are not available under all conditions.

It is possible to choose the colour palette of the sprites. This is done with
bits 7 to 4 of register $010C:

  bit 7 | bit 6 | bit 5 | bit 4 | Starting color of the sprite's palette
 -------+-------+-------+-------+------------------------------------------
    0   |   0   |   0   |   0   | $0180/palette 0 (color 0)
    0   |   0   |   0   |   1   | $01A0/palette 0 (color 16)
    0   |   0   |   1   |   0   | $0180/palette 1 (color 32)
    0   |   0   |   1   |   1   | $01A0/palette 1 (color 48)
    0   |   1   |   0   |   0   | $0180/palette 2 (color 64)
    0   |   1   |   0   |   1   | $01A0/palette 2 (color 80)
    0   |   1   |   1   |   0   | $0180/palette 3 (color 96)
    0   |   1   |   1   |   1   | $01A0/palette 3 (color 112)
    1   |   0   |   0   |   0   | $0180/palette 4 (color 128)
    1   |   0   |   0   |   1   | $01A0/palette 4 (color 144)
    1   |   0   |   1   |   0   | $0180/palette 5 (color 160)
    1   |   0   |   1   |   1   | $01A0/palette 5 (color 176)
    1   |   1   |   0   |   0   | $0180/palette 6 (color 192)
    1   |   1   |   0   |   1   | $01A0/palette 6 (color 208)
    1   |   1   |   1   |   0   | $0180/palette 7 (color 224)
    1   |   1   |   1   |   1   | $01A0/palette 7 (color 240)
 -------------------------------------------------------------------------

In other words, the starting colour is 16 times the value in bits 7 to 4.
(These bits set the palette for the even-numbered sprites; bits 3 to 0 do the
same for the odd-numbered ones.)

Bitplanes, sprites and copperlists should be 64-bit aligned under AGA.
Bitplanes should also only be multiples of 64 bits wide, so if you want an
extra area on the side of your screen for smooth blitter scrolling it must be
*8 bytes* wide, not two as normal.

For example:

        CNOP    0,8
sprite  incbin  "myspritedata"
        CNOP    0,8
bitplane incbin "mybitplane"

and so on.

This also raises another problem. You can no longer use AllocMem() to allocate
bitplane/sprite memory directly.
Either use AllocMem(sizeofplanes+8) and calculate how many bytes you have to
skip at the front to give 64-bit alignment (remember this assumes either you
allocate each bitplane individually or make sure the bitplane size is also an
exact multiple of 64 bits), or you can use the new V39 function AllocBitMap().

17. Keyboard Timings
====================

If you have to read the keyboard by hardware, be very careful with your
timings. Not only do different processor speeds affect the keyboard timings
(for example, in the game F-15 II Strike Eagle on an Amiga 3000 the key repeat
delay is ridiculously short; you ttyyppee lliikkee tthhiiss aallll tthhee
ttiimmee. You use up an awful lot of Sidewinders very quickly!), but there are
also differences between different makes of keyboard. Some Amiga 2000s came
with Cherry keyboards (these have small function keys the same size as the
normal alphanumeric keys), and these keyboards have different timings to the
normal Mitsumi keyboards.

Use an input handler to read the keyboard. The Commodore guys have spent ages
writing code to handle all the different possible hardware combinations around,
so why waste time reinventing the wheel?

18. How to break out of never-ending loops
==========================================

Another great tip from Boerge here:

>This is a simple tip I have. I needed to be able to break out of my
>code if I had neverending loops. I also needed to call my exit code when I did
>this. Therefore I could not just exit from the keyboard interrupt which I have
>taken over (along with the rest of the machine). My solution was to enter
>supervisor mode before I start my program, and if I set the stack back then
>I can do an RTE in the interrupt and just return from the Supervisor() call.
>This is snipped from my code:
>
>       lea     .SupervisorCode,a5
>       move.l  sp,a4           ;
>       move.l  (sp),a3         ;
>       EXEC    Supervisor
>       bra     ReturnWithOS
>
>.SupervisorCode
>       move.l  sp,crashstack   ; remember SSP
>       move.l  USP,a7          ; swap USP and SSP
>       move.l  a3,-(sp)        ; push return address on stack
>
>that last was needed because it was a subroutine that RTSes (boy did I have
>problems working out my crashes before I fixed that)
>Then I have my exit code:
>
>ReturnWithOS
>       tst.l   crashstack
>       beq     .nocrash
>       move.l  crashstack,sp
>       clr.l   crashstack
>       RTE                     ; return from supervisor mode
>.nocrash
>
>my exit code goes on after this.
>
>This made it possible to escape from an interrupt without having to care
>for what the exception frames look like.

I haven't tried this because my code never crashes. ;-)

19. Version numbers!
====================

Put version numbers in your code. This allows the CLI Version command to
determine easily the version of both your source and executable files. Some
directory utilities allow version number checking too (so you can't
accidentally copy an older version of your source over a newer one, for
example).

Of course, if you pack your files the version numbers get hidden. Leaving
version numbers unpacked was going to be added to PowerPacker, but I don't know
if this is done yet.

A version number string is in the format

        $VER: howtocode5.txt 5.0 (18.03.93)
        ^     ^              ^   ^
        |     |              |   Date (optional)
        |     |              Version number
        |     File Name
        Identifier

The Version command searches for $VER and prints the string it finds following
it.
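As a rough illustration of the idea (not the real Version command's code), finding the string is just a byte search for "$VER:" followed by a copy of the text after it. The name find_version_string is hypothetical, and stopping at a newline as well as at a NUL is an assumption, added so that a $VER comment line in a source file behaves like the examples in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Search buf[0..len) for "$VER:" and copy the text after it (minus leading
   spaces) into out, stopping at a NUL, a newline or the end of the buffer.
   Returns 1 if a version string was found, 0 otherwise. */
int find_version_string(const char *buf, size_t len, char *out, size_t out_size)
{
    static const char marker[] = "$VER:";
    const size_t mlen = sizeof marker - 1;
    assert(buf != NULL && out != NULL && out_size > 0);

    for (size_t i = 0; i + mlen <= len; i++) {
        if (memcmp(buf + i, marker, mlen) != 0)
            continue;
        size_t p = i + mlen, n = 0;
        while (p < len && buf[p] == ' ')
            p++;                                  /* skip the space after $VER: */
        while (p < len && buf[p] != '\0' && buf[p] != '\n' && n + 1 < out_size)
            out[n++] = buf[p++];
        out[n] = '\0';
        return 1;
    }
    return 0;
}
```

Fed the bytes of an executable containing dc.b "$VER: MyFunDemo 4.0 (01.01.93)",0 it yields "MyFunDemo 4.0 (01.01.93)", and fed a source file whose first line is the $VER comment it yields that comment's text.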
For example, adding the line

        ; $VER: MyFunDemo.s 4.0 (01.01.93)

to the beginning of your source file, and somewhere in your code

        dc.b    "$VER: MyFunDemo 4.0 (01.01.93)",0

means that if you do

        VERSION MyFunDemo.s

you will get:

        MyFunDemo.s 4.0 (01.01.93)

and if you assemble and do Version MyFunDemo, you'll get

        MyFunDemo 4.0 (01.01.93)

Try doing version howtocode5.txt and see what you get :-)

This can be very useful for those stupid demo compilations where everything
gets renamed to 1, 2, 3, etc... Just do version 1 to get the full filename
(and real date).

Does this work on Kickstart 1.3? I can't remember, I ditched my 1.3
Kickstart 2 years ago :-)

20. CDTV
========

I've been asked if there is any special advice on how to program demos to
work on CDTV, and if hardware access to the CDTV (for playing CD Audio, etc)
is possible.

The CDTV is essentially a 1Mb chip ram Amiga with a CD-ROM drive. The major
difference (apart from the lack of fast ram or $c00000 ram) is that the CDTV
roms can take up anywhere from 100 to 200Kb of ram. Many demos fail on CDTV
through lack of memory.

You can hack your CDTV to switch these roms on and off (put a switch on
JP15). When they are switched off the CDTV has a full 1Mb of memory and more
software works, but you can still play audio CDs in the CD drive.

I have no information on how to program the CDTV at the hardware level.
Currently the only supported way to access the CDTV special functions is
through the CDTV.DEVICE, a standard ROM device that can be OpenDevice()d and
sent IORequests. I don't think I'm allowed to give out the documentation for
this, sorry :-(

21. Copper Wait Commands
========================

The Hardware Reference manual states that a copper wait for the start of
line xx is done with:

        $xx01,$fffe

However (as many of you have found out), this actually triggers just before
the end of the previous line (around 4 or 5 low-res pixels in from the
maximum overscan border).
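This C sketch shows what the two numbers in a WAIT mean by building the words from a line and a horizontal position. It assumes the bit layout given in the Hardware Reference Manual: the first word holds the line in bits 15-8, the horizontal position in bits 7-1, and bit 0 set. The second word is the compare mask. copper_wait() is a made-up helper for illustration. Note that the $xx01 and $xx07 forms differ only in the horizontal compare position.

```c
#include <assert.h>
#include <stdint.h>

/* A copper WAIT is two 16-bit words. First word: vertical beam position
   (the line) in bits 15-8, horizontal position in bits 7-1, bit 0 set.
   Second word: the compare mask; $fffe is the usual value, with bit 0
   clear to mark the instruction as a WAIT. */
typedef struct {
    uint16_t pos;
    uint16_t mask;
} CopperWait;

/* Hypothetical helper: hpos is the low byte as written in the text,
   so 0 gives $xx01 and 6 gives $xx07 (bit 0 of hpos is not stored). */
CopperWait copper_wait(unsigned line, unsigned hpos)
{
    CopperWait w;
    assert(line <= 0xff && hpos <= 0xff);   /* only 8 bits of each fit */
    w.pos  = (uint16_t)((line << 8) | (hpos & 0xfeu) | 1u);
    w.mask = 0xfffe;
    return w;
}
```

copper_wait(line, 0) produces the $xx01 form described above. copper_wait(line, 6) produces $xx07.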
For most operations this is not a problem (and indeed gives a little extra
time to initialise stuff for the next line), but if you are changing the
background colour ($dff180), then there is a noticeable 'step' at the end of
the scanline.

The correct way to do a copper wait to avoid this problem is $xx07,$fffe.
This just misses the previous scanline, so the background colour is changed
exactly at the start of the scanline, not before.

22. Screen Modulos (thanks Magnus for this one...)
==================================================

Don't assume the bitplane modulos (BPL1MOD and BPL2MOD) will be set to zero.
If you require zero modulos, set them yourself; the start of your copperlist
is as good a place as any. Under V39 OS the workbench is interleaved by
default, so the modulo can be huge...

Indeed, do not assume that *any* hardware register is set to a particular
value.

23. Open Graphics Library! (Thanks Magnus, CS, and others..)
============================================================

I've never seen this in use before, but Magnus spotted it. It's got to be one
of the worst pieces of code I've ever seen! Don't ever do this!

        move.l  4.w,a0          ; get execbase
        move.l  (a0),a0         ; wandering down the library list...
        move.l  (a0),a0         ; right. I think this is graphics.library
                                ; now goes ahead and uses a0 as gfxbase...

Oh yes, graphics.library is always going to be second down the chain from
Execbase? If you want to access gfxbase (or any other library base) OPEN the
library. Do not wander down the library chain, either by guesswork or by
manually checking for "graphics.library" in the library base name.
OpenLibrary() will do this for you.

Here is the only official way to open a library.
        MOVEA.L 4,a6
        LEA.L   gfxname(PC),a1
        MOVE.L  #39,d0                  ; version required (here V39)
        JSR     _LVOOpenLibrary(a6)     ; resolved by linking with amiga.lib
                                        ; or by include "exec/exec_lib.i"
        TST.L   d0
        BEQ.S   OpenFailed

        ; use the base value in d0 as the a6 for calling graphics functions
        ; remember d0/d1/a0/a1 are scratch registers for system calls

gfxname DC.B    'graphics.library',0

Don't use OldOpenLibrary! Always open libraries with a version, at least V33.
V33 is equal to Kickstart 1.2. And DON'T forget to check the result returned
in d0 (and nothing else).

24. Protracker Replay code bug
==============================

I've just got the Protracker 2.3 update, and the replay code (both the VBlank
and CIA code) still has the same bug from 1.0! At the front of the file is
an equate

>DMAWait = 300 ; Set this as low as possible without losing low notes.

And then it goes on to use 300 as a hard-coded value, never referring to
DMAWait!

Now, until I can get some free time to write a reliable scanline-wait routine
to replace their DBRA loops (does anyone want to write a better Protracker
player? Free fame & publicity :-), I suggest you change the references to 300
in the code (except in the data tables!) to DMAWait, and make the DMAWait
value *MUCH* higher. I use 1024 on this Amiga 3000 without any apparent
problem, but perhaps it's safer to use a value around 2000.

Has anyone tried Protracker on a 68040 machine? If so, what DMAWait value in
Prefs is needed to make all modules sound ok?

Or, does anyone have a system-friendly version of the ProRunner replay? The
one I have is awful, it hits the CIA timer hardware directly so nothing can
use the CIAs once it quits.

25. Devpac optimise mode produces crap code?
============================================

If you're using Devpac and have found that the OPT o+ flag produces crap
code, then you need to add the option o3-. I can't remember what this option
does, my Devpac 3 manual is at the office.

26. Argasm produces crap code, whatever happens
===============================================

First, when run from the Command Line, or when called from Arexx using Cygnus
Ed (my preferred system), Argasm (unlike Devpac) defaults to writing linkable
code, so you need to add

        opt     l-              ; disable linkable code

If you find that your Argasm executables fail then check you haven't got any
BSRs across sections! Argasm seems to allow this, but of course the code
doesn't work. Jez San from Argonaut Software, who publish ArgAsm, says it's
not a bug, but a feature of the linker... Yeah right Jez...

But Argasm is *fast*, and it produces non-working code *faster* than any
other assembler :-) I still use it though, but Devpac comes in handy for
checking code every now and then.

Argonaut have abandoned ArgAsm, so version 1.09d is the last. There will be
no more...

27. Help! I'm starting to code in assembler. Where do I begin?
==============================================================

If you are just starting to learn programming, and you want a good place to
begin learning assembler, buy AMOS! It's very easy to write assembler code,
load it into AMOS and test it.

For example, take this routine:

;simplemaths.s
        add.l   d0,d1           ; add contents of d0 to d1
        rts

Assemble this with Devpac and what do you get? Not a lot.

Now, load AMOS and type this:

        Pload "ram:simplemaths",1       ' load executable file into bank 1
        Input "Enter a number ";n1
        Input "Enter another number ";n2
        dreg(0) = n1                    ' Store n1 in 68000 register d0
        dreg(1) = n2                    ' Store n2 in 68000 register d1
        call(1)                         ' Run your machinecode routine
        Print n1;" plus ";n2;" equals ";dreg(1) ' returns result in d1

You can start playing with 68000 instructions this way, seeing how they work,
without having to 'jump in the deep end' writing routines to set up displays,
copperlists, windows or writing to the console.
You can also pass your machine code the address of AMOS's bitplanes (with
Phybase(0) to Phybase(n), where n is the number of bitplanes - 1), so you can
write your own vector/bob code and test it easily before writing your own
front-end code. Once you have got the hang of 68000, you can drop AMOS.

Another good way is to write some code in C, and use the inline debugging
options of SAS C and OMD to examine what your C compiler actually generates.
To do this with SAS V6.x do the following:

        SC debug=full myprog.c
        OMD >ram:omdoutput myprog.o myprog.c

You will get each line of C code interleaved with the assembler that it
generates. Very handy! It's also amazing how good the code generated by
SAS C 6.2 really is.

28. How can I tell what processor I am running on?
==================================================

Look inside your case. Find the large rectangular (or square) chip, read the
label :-)

Or...

        move.l  4.w,a6
        move.w  AttnFlags(a6),d0        ; get processor flags

d0.w is then a bit array which contains the following bits:

        Bit     Meaning if set

        0       68010 processor fitted (or 68020/30/40)
        1       68020 processor fitted (or 68030/40)
        2       68030 processor fitted (or 68040)       [V37+]
        3       68040 processor fitted                  [V37+]
        4       68881 FPU fitted (or 68882)
        5       68882 FPU fitted                        [V37+]
        6       68040 FPU fitted                        [V37+]

The 68040 FPU bit is set when a working 68040 FPU is in the system. If this
bit is set and both the 68881 and 68882 bits are not set, then the 68040 math
emulation code has not been loaded and only 68040 FPU instructions are
available. This bit is valid *ONLY* if the 68040 bit is set.

Don't forget to check which ROM version you're running: the bits marked
[V37+] are only set from Kickstart 2.04 (V37) onwards.

DO NOT assume that the system has a >68000 if the word is non-zero! 68881
chips are available on add-on boards without any faster processor.

And don't assume that a 68000 processor means a 7Mhz 68000. It may well be a
14Mhz processor.
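As a sketch, here is the table above turned into C that decodes an AttnFlags word, using the same rules: the highest CPU bit wins, and the 68040 FPU bit counts only when the 68040 bit is set. The AFB_ names follow the constants in the exec includes. They are redefined here so the example compiles on its own. decode_attn() is a hypothetical helper, not a system call.

```c
#include <assert.h>
#include <stddef.h>

/* Bit numbers from the table above (the exec includes call these
   AFB_68010 ... AFB_FPU40). */
enum {
    AFB_68010 = 0, AFB_68020 = 1, AFB_68030 = 2, AFB_68040 = 3,
    AFB_68881 = 4, AFB_68882 = 5, AFB_FPU40 = 6
};

#define HAS(flags, bit) (((flags) >> (bit)) & 1u)

/* Decode AttnFlags into a CPU and an FPU model number (fpu 0 = none). */
void decode_attn(unsigned short flags, int *cpu, int *fpu)
{
    assert(cpu != NULL && fpu != NULL);

    if (HAS(flags, AFB_68040))      *cpu = 68040;
    else if (HAS(flags, AFB_68030)) *cpu = 68030;
    else if (HAS(flags, AFB_68020)) *cpu = 68020;
    else if (HAS(flags, AFB_68010)) *cpu = 68010;
    else                            *cpu = 68000;

    /* The 68040 FPU bit is only valid when the 68040 bit is set. */
    if (HAS(flags, AFB_68040) && HAS(flags, AFB_FPU40)) *fpu = 68040;
    else if (HAS(flags, AFB_68882))                    *fpu = 68882;
    else if (HAS(flags, AFB_68881))                    *fpu = 68881;
    else                                               *fpu = 0;
}
```

Take a plain 68000 with a 68881 board (flags $0010). The word is non-zero, but decode_attn() still reports a 68000 CPU, which is exactly the trap described above.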
So, you can use this to determine whether specific processor functions are
available (more on 68020 commands in a later issue), but *NOT* to determine
values for timing loops. Who knows, Motorola may release a 100Mhz 68020 next
year :-)

Does anyone know a system-friendly way to check for an MMU?

29. All addresses are 32 bit
============================

"Oh look" says clever programmer. "If I access $dcdff180 I can access the
colour0 hardware register, but it confuses people hacking my code!". Oh no
you can't. On a machine with a 32-bit address bus (any accelerated Amiga)
this doesn't work. And all us hackers know this trick now anyway :-)

Always pad out 24-bit addresses (eg $123456) with ZEROs in the high byte
($00123456). Do not use the upper byte for data, for storing your IQ, for
scrolly messages or for anything else.

Similarly, on non-ECS machines the bottom 512k of memory was mirrored four
times on the address bus, eg:

        move.l  #$12345678,$0
        move.l  $80000,d0       ; d0 = $12345678
        move.l  $100000,d1      ; d1 = $12345678
        move.l  $180000,d2      ; d2 = $12345678

This does not work on ECS and upwards!!!! You will get meaningless results if
you try this, so PLEASE do not do it!

30. Action Replay Cartridges
============================

These things are great fun, even more so if you get into the 'sysop mode'
(this allows disassembly of ram areas not previously allowed by Action
Replay, including non-autoconfig ram and the cartridge rom!)

To get into sysop mode on Action Replay 1 type:

        LORD OLAF

To get into sysop mode on Action Replay 2 type:

        MAY THE FORCE BE WITH YOU

To get into sysop mode on Action Replay 3 type the same as Action Replay 2.
After this you get a message "Try a new one". Then type in NEW and sysop
powers are granted!

31. Avoiding Forbid() and Permit()
==================================

I've tried it, this works, it's wonderful.

Instead of using Forbid() and Permit() to prevent the OS stealing time from
your code, you could put your demo or game at a high task priority.
The following code at the beginning will do this:

        move.l  4.w,a6
        sub.l   a1,a1           ; Zero - Find current task
        jsr     _LVOFindTask(a6)

        move.l  d0,a1
        moveq   #127,d0         ; task priority to very high...
        jsr     _LVOSetTaskPri(a6)

Now, only essential system activity will dare to steal time from your code.
This means you can now carry on using dos.library to load files from hard
drives, CD-ROM, etc, while your code is running.

Try using this instead of Forbid() and Permit(), and insert a new floppy disk
while your code is running. Wow... The system recognises the disk change....

But remember to add your input handler!!!

Of course this is purely up to you. You may prefer to Forbid() when your code
is running (it makes it easier to write).

Several people have suggested to me that I needed to do a Forbid() *before*
the LoadView(NULL);WaitTOF();WaitTOF(); code, in case something else has run
and opened a display (disrupting copper registers) in the meantime. There is
no point doing this because WaitTOF() breaks the Forbid() state anyway (it
Wait()s for the next vertical blank, and Wait() breaks a Forbid())...

Ok.. you could write a busy-loop to check for VBlank, but it's much better to
check specifically whether a view has opened: test whether gb_ActiView is
non-zero. If it's zero, it's ok to carry on, otherwise do
LoadView(NULL);WaitTOF();WaitTOF() again, and so on...

Now... I haven't actually checked this, I haven't had time, but it should
work! :-) (I'll live to regret this, I know...)

32. 68020 Optimization (Thanks Chris)
=====================================

A1200 speed issues:

The A1200 has a fairly large number of wait-states when accessing chip-ram.
ROM is zero wait-states. Due to the slow RAM speed, it may be better to use
calculations for some things that you might have used tables for on the A500.

Add-on RAM will probably be faster than chip-ram, so it is worth segmenting
your game so that parts of it can go into fast-ram if available.

For good performance, it is critical that you code your important loops to
execute entirely from the on-chip 256-byte cache.
A straight line loop 258 bytes long will execute far slower than a 254 byte
one.

The '020 is a 32 bit chip. Longword accesses will be twice as fast when they
are aligned on a long-word boundary. Aligning the entry points of routines on
32 bit boundaries can help, also. You should also make sure that the stack is
always long-word aligned.

Write-accesses to chip-ram incur wait-states. However, other processor
instructions can execute while results are being written to memory:

        move.l  d0,(a0)+        ; store x coordinate
        move.l  d1,(a0)+        ; store y coordinate
        add.l   d2,d0           ; x+=deltax
        add.l   d3,d1           ; y+=deltay

will be slower than:

        move.l  d0,(a0)+        ; store x coordinate
        add.l   d2,d0           ; x+=deltax
        move.l  d1,(a0)+        ; store y coordinate
        add.l   d3,d1           ; y+=deltay

The 68020 adds a number of enhancements to the 68000 architecture, including
new addressing modes and instructions. Some of these are unconditional
speedups, while others only sometimes help:

Addressing modes:

o Scaled Indexing. The 68000 addressing mode (disp,An,Dn) can have a scale
  factor of 2, 4, or 8 applied to the data register on the 68020. This is
  totally free in terms of instruction length and execution time. An example
  is:

        68000                           68020
        -----                           -----
        add.w   d0,d0                   move.w  (0,a1,d0.w*2),d1
        move.w  (0,a1,d0.w),d1

o 16 bit offsets on An+Rn modes. The 68000 only supported 8 bit displacements
  when using the sum of an address register and another register as a memory
  address. The 68020 supports 16 bit displacements. This costs one extra
  cycle when the instruction is not in cache, but is free if the instruction
  is in cache. 32 bit displacements can also be used, but they cost 4
  additional clock cycles.

o Data registers can be used as addresses. (d0) is 3 cycles slower than (a0),
  and it only takes 2 cycles to move a data register to an address register,
  but this can help in situations where there is not a free address register.

o Memory indirect addressing.
  These instructions can help in some circumstances when there are not any
  free registers to load a pointer into. Otherwise, they lose.

New instructions:

o Extended precision divide and multiply instructions. The 68020 can perform
  32x32->32, 32x32->64 multiplication and 32/32 and 64/32 division. These are
  significantly faster than the multi-precision operations which are required
  on the 68000.

o EXTB. Sign extend byte to longword. Faster than the equivalent EXT.W EXT.L
  sequence on the 68000.

o Compare immediate and TST work in program-counter relative mode on the
  68020.

o Bit field instructions. BFINS inserts a bitfield, and is faster than 2 MOVEs
  plus an AND and an OR. This instruction can be used nicely in fill routines
  or text plotting. BFEXTU/BFEXTS can extract and optionally sign-extend a
  bitfield on an arbitrary boundary. BFFFO can find the highest order bit set
  in a field. BFSET, BFCHG, and BFCLR can set, complement, or clear up to 32
  bits at arbitrary boundaries.

o On the 020, all shift instructions execute in the same amount of time,
  regardless of how many bits are shifted. Note that ASL and ASR are slower
  than LSL and LSR. The break-even point on ADD Dn,Dn versus LSL is at two
  shifts.

o Many tradeoffs on the 020 are different from the 68000.

o The 020 has PACK and UNPK, which can be useful.

33. Sprites
===========

Some people don't correctly initialize the sprites they don't want to use.
(This reminds me of Soundtracker.) A common error is unwanted sprites
pointing at address $0. If the longword at address $0 isn't zero you'll get
some funny looking sprites at unpredictable places.

The right way of getting rid of sprites is to point them to an address that
you know for sure contains zero ($00000000), and with AGA you may need to
point to FOUR long words of 0 on a 64-bit boundary:

        CNOP    0,8
pointhere:
        dc.l    0,0,0,0

The second problem is people turning off the sprite DMA at the wrong time.
Vertical stripes on the screen are not always beautiful.
Wrong time means that you turn off the DMA when it is "drawing" a sprite. It
is very easy to avoid this. Just turn off the DMA when the raster is in the
vertical blank area.

Currently V39 Kickstart has a bug where sprite resolution and width are not
always reset when you run your own code. To reset them you must do the
following (but only if you detect the AGA chipset):

        move.w  #0,$dff1fc
        move.w  #0,$dff106

Remember this will also zero the other bits in these registers, so do this
before any of your other setup!

34. Trackloaders
================

Use CIA timers! DON'T use processor timing. If you use processor timing you
will MESS UP the diskdrives in accelerated Amigas. Use AddICRVector to
allocate your timers, don't hit $bfxxxx addresses!!!

On second thoughts. DON'T use trackloaders! Use Dos...

35. Debug with Enforcer
=======================

Commodore have written a number of utilities that are *excellent* for
debugging. They are great for trapping errors in code, such as illegal memory
access and using memory not previously allocated.

The down side is they need two things:

a) A Memory Management Unit (at least for Enforcer). This rules out any 68000
   machine, and (unfortunately) the Amiga 1200 and the Amiga 4000/EC030. If
   you are seriously into programming insist on a FULL 68030/40 chip, accept
   no substitute. Amiga 2000 owners on a tight budget may want to look at the
   Commodore A2620 card (14Mhz 68020 with 68851 MMU fitted) which will work
   and is now very cheap.

b) A serial terminal. This is really essential anyway, any serious programmer
   will have a terminal (I have an old Amiga 500 running NCOMM for this task)
   to output debug information with dprintf() from their code. This is the
   only sensible way to display debug info while messing with copperlists and
   hardware displays.

Enforcer, Mungwall and other utilities are available on Fred Fish Disks,
amiga.physik and wuarchive, and probably on an issue of the excellent
"The Source" coders magazine from Epsilon.

36. More Accurate Vector Maths
==============================

A little (little) math hint for vector calculations: when doing a muls with a
value and then downshifting the value, use an 'addx' to get rounding instead
of truncation, for example:

        moveq   #0,d7
DoMtxMul
        .
        .
        muls    (a0),d0         ; Do a muls with a sin value *256
        asr.l   #8,d0
        addx.w  d7,d0           ; trunc > roundoff
        .
        .

When you do an 'asr' the last outshifted bit goes to the x-flag. If you use
an addx with source=0, then dest=dest+'x-flag'. This halves the error, and
makes complicated vector objects less 'hacky'...

Just an idea... And it doesn't take too many cycles either...

Hope it helps.

/Carl-Henrik Sk}rstedt (Asterix - Movement)

37. 68000 Optimization
======================

ASSEMBLY CODE OPTIMIZATION (READ: "HOW AS FAST AND SMALL AS POSSIBLE?").
Written by Irmen de Jong, march '93. (E-mail: ijdjong@cs.vu.nl)
Some notes added by CJ

-----------------------------------------------------------------------------
Original        Possible optimization   Examples/notes
-----------------------------------------------------------------------------

STANDARD WELL-KNOWN OPTIMIZATIONS
RULE: use Quick-type/Short branch! Use INLINE subroutines if they are small!
-----------------------------------------------------------------------------
BRA/BSR xx      BRA.s/BSR.s xx          if xx is close to PC
MOVE.X #0       CLR.X/MOVEQ/SUBA.X      move.l #0,count -> clr.l count
                                        move.l #0,d0 -> moveq #0,d0
                                        move.l #0,a0 -> sub.l a0,a0
CLR.L Dx        MOVEQ #0,Dx             -
CMP #0          TST                     -
MOVE.L #nn,dx   MOVEQ #nn,dx            possible if -128<=nn<=127
ADD.X #nn       ADDQ.X #nn              possible if 1<=nn<=8
SUB.X #nn       SUBQ.X #nn              same...
JMP/JSR xx      BRA/BSR xx              possible if xx is close to PC
                                        *and in same section!*
                                        (what's the use of JMP/JSR nn(PC)?)
JSR xx;RTS      JMP xx                  save a RTS
BSR xx;RTS      BRA xx                  same...
                                        (assuming routine doesn't rely on
                                        anything in the stack)
LSL/ASL #1/2,xx ADD xx,xx [ADD xx,xx]   lsl #2,d0 -> 2 times add d0,d0
MULU #yy,xx     LSL/ASL #1-8,xx         yy is a power of 2, 2..256
                                        mulu #2,d0 -> asl #1,d0 -> add d0,d0
                                        BEWARE: STATUS FLAGS ARE "WRONG"
DIVU #yy,xx     LSR/ASR #.. SWAP        yy is a power of 2, 2..256
                                        divu #16,d0 -> lsr #4,d0
                                        BEWARE: STATUS FLAGS ARE "WRONG",
                                        AND HIGHWORD IS NOT THE REMAINDER.

ADDRESS-RELATED OPTIMIZATIONS
RULE: use short addressing/quick adds!
-----------------------------------------------------------------------------
MOVEA.L #nn     MOVEA.W #nn             MOVEA is "sign-extending", thus
                                        possible if 0<=nn<=$7fff
ADDA.X #nn      LEA nn(Ax),Ax           adda.l #800,a0 -> lea 800(a0),a0
                                        possible if -$8000<=nn<=$7fff
LEA nn(Ax),Ax   ADDQ.W #nn,Ax           lea 6(a0),a0 -> addq.w #6,a0
                                        possible if 1<=nn<=8
$0000nnnn.l     $nnnn.w                 move.l 4,a6 -> move.l 4.w,a6
                                        possible if 0<=nnnn<=$7fff
                                        (nnnn is SIGN EXTENDED to LONG!)
MOVE.L #xx,Ay   LEA xx,Ay               try xx(PC) with the LEA
MOVE.L Ax,Ay;   LEA nnnn(Ax),Ay         copy&add in one
ADD #nnnn,Ay

OFFSET-RELATED OPTIMIZATIONS
RULE: use PC-relative addressing or basereg addressing!
      put your code&data in ONE segment if possible!
-----------------------------------------------------------------------------
MOVE.X nnnn     MOVE.X nnnn(pc)         lea copper,a0 -> lea copper(pc),a0..
LEA nnnn        LEA nnnn(pc)            ...possible if nnnn is close to PC
                                        (for MOVE, only as the source
                                        operand: (pc) can't be a destination)
(Ax,Dx.l)       (Ax,Dx.w)               possible if 0<=Dx<=$7fff

If PC-relative doesn't work, use Ax as a pointer to your data block. Use
indirect addressing to get to your data: move.l Data1-Base(Ax),Dx etc.

TRICKY OPTIMIZATIONS
-----------------------------------------------------------------------------
BSET #xx,yy     ORI.W #2^xx,yy          0<=xx<=15
BCLR #xx,yy     ANDI.W #~(2^xx),yy      "
BCHG #xx,yy     EORI.W #2^xx,yy         "
BTST #xx,yy     ANDI.W #2^xx,yy         " (but unlike BTST, ANDI changes yy!)
Best improvement if yy=a data reg.
BEWARE: STATUS FLAGS ARE "WRONG".

SILLY OPTIMIZATIONS (FOR OPTIMIZING COMPILER OUTPUTS ETC)
RULE: make the routines in assembly yourself!
-----------------------------------------------------------------------------
MOVEM (one reg) MOVE                    movem.l d0,-(sp) -> move.l d0,-(sp)
MOVE xx,-(sp)   PEA xx                  possible if xx=(Ax) or constant.
0(Ax)           (Ax)                    -
MULU/MULS #0    CLR.L                   moveq #0,Dx with data-registers.
MULU #1,xx      SWAP CLR.W SWAP         high word is cleared with mulu #1
MULS #1,xx      EXT.L                   see MULU, and sign extended (EXT.L
                                        alone overwrites the high word).
                                        BEWARE: STATUS FLAGS ARE "WRONG"

LOOP OPTIMIZATION.
-----------------------------------------------------------------------------
Example: imagine you want to eor 4096 bytes beginning at (a0).

Solution one:

        move.w  #4096-1,d7
.1      eor.b   d0,(a0)+
        dbra    d7,.1

Consider the loop above. 4096 times an eor.b and a dbra takes time. What do
you think about this:

        move.w  #4096/4-1,d7
.1      eor.l   d0,(a0)+
        dbra    d7,.1

Eors 4096 bytes too! But only needs 1024 eor.l/dbras.

Yeah, I hear you smart guys cry: what about 1024 eor.l without any loop?!
Right, that IS the fastest solution, but it is VERY memory consuming (2 Kb).
Instead, join a loop and a few eor.l:

        move.w  #4096/4/4-1,d7
.1      eor.l   d0,(a0)+
        eor.l   d0,(a0)+
        eor.l   d0,(a0)+
        eor.l   d0,(a0)+
        dbra    d7,.1

This is faster than the loop before. I think about 8 or 16 eor.l's is just
fine, depending on the size of the mem to be handled (and the wanted speed!).
Also, mind the cache on 68020+ processors, the loop code must be small enough
to fit in it for highest speeds.

Try to do as much as possible within one loop (but considering the text
above) instead of a few loops after each other.

MEMORY CLEARING/FILLING.
-----------------------------------------------------------------------------
A common problem is how to clear or fill some mem in a short time.

If it is CHIP-MEMORY, use the blitter (only D-channel, see below). In this
case you can still do other things with yer 680x0 while blittie-boy is busy
erasing.
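The unrolled EOR loop from LOOP OPTIMIZATION above can be modelled in portable C. This sketch (the function names are made up) only shows that the longword, four-times-unrolled version does the same work in 256 passes instead of 4096. It says nothing about speed on a real 680x0, and a modern C compiler may well unroll loops on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One byte per iteration: 4096 iterations for 4096 bytes. */
void eor_bytes(unsigned char *p, size_t nbytes, unsigned char key)
{
    while (nbytes--)
        *p++ ^= key;
}

/* Longwords, unrolled four times (like the move.w #4096/4/4-1,d7 loop):
   each pass handles 16 bytes. Returns the number of passes made. */
size_t eor_longs_unrolled(uint32_t *p, size_t nbytes, uint32_t key)
{
    size_t passes = nbytes / 16, n;

    assert(nbytes % 16 == 0);           /* 4 longwords per pass */
    for (n = 0; n < passes; n++) {
        p[0] ^= key;
        p[1] ^= key;
        p[2] ^= key;
        p[3] ^= key;
        p += 4;
    }
    return passes;
}
```

Repeating the byte key in all four bytes of the longword key (0x5a becomes 0x5a5a5a5a) makes the two versions produce identical memory, whatever the byte order.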
If it is FAST-MEMORY, you can use the method from above, with clr.l instead
of eor.l, but there is a much faster way:

        move.l  sp,TempSp
        lea     MemEnd,sp
        moveq   #0,d0           ; clear all 8 data regs:
        ; ... moveq #0,d1 to moveq #0,d6 ...
        moveq   #0,d7
        move.l  d0,a0           ; and the 7 address regs a0-a6:
        ; ... move.l d0,a1 to move.l d0,a5 ...
        move.l  d0,a6

After this, ONE instruction can clear 60 bytes of memory (15*4):

        movem.l d0-d7/a0-a6,-(sp)       ; wham!

Now, repeat this instruction as often as required to erase the memory
(memsize/60 times). You may need an additional movem.l to erase the last few
bytes.

Get sp (=a7) back at the end with (guess..):

        move.l  TempSp,sp

If you are low on mem, put a few movem.l in a loop. But now you need a
loop-counter register, so you'll only clear 56 bytes in one movem.l.

In the case of CHIP memory, you can use both the blitter and the processor
simultaneously to clear a lot of CHIP mem in a VERY short time... It takes
some experimentation to find the best sizes to clear with the blitter and
with the processor. BUT, ALWAYS USE A WAITBLIT AFTER CLEARING SIMULTANEOUSLY,
even if you know that the blitter is finished before your processor is (mind
faster 680x0s!).

BLITTER SPEEDS. (from the Hardware Reference Manual)
-----------------------------------------------------------------------------
Some general notes about blitter speeds. These numbers are for an OCS/ECS
blitter only, in 16-bit chip ram (who knows the AGA blitter speed???)

                      n * H * W
        time taken = -----------  microseconds
                     7.09 (7.15 for NTSC)

        H = blit height, W = blit width (# words), n = cycles

        n = 4 + ..., depending on the DMA channels used:

            A:          +0  (this one is free!)
            B:          +2
            C or D:     +0
            C and D:    +2

        In line-mode, every pixel takes 8 cycles.

So, use A, D or A&D for the fastest operation. Use A&C for 2-source
operations (e.g. collision check or so).

NOTES (FURTHER NOTES MAY BE ADDED IN FUTURE...)
-----------------------------------------------------------------------------
- 68020+ processors are particularly fast at using longwords. Byte access is
  some sort of brake on the memory access. Use at least words.
- 68010 has a loop-cache, it caches 3-word loops like

loop    move.l  (a0)+,(a1)+
        dbra    d7,loop

- When optimizing BIG programs (for instance, compiler outputs...) first try
  to find the time-critical parts (inner loops, often called procs etc.). In
  most cases 10% of the code is responsible for 90% of the execution time.

  I see people using OldOpenLibrary() because it needs one less register set
  up... I mean, what's the point? Are people really going to notice if your
  demo takes two clock cycles less before starting? :-)

- Often it is better not to set BLTPRI in DMACON (bit 10 of $dff096), as this
  can keep your processor from calculating things while the blitter is busy.

- Use as many registers as possible! I.e. store values in registers rather
  than in memory, this gives one hell of a performance boost. (NOTE: this is
  exactly the power of RISC machines. Very much register access instead of
  memory access. Fill these 16 registers!)

- Related to the last one: unlike many compilers, DON'T put your parameters
  on the stack before calling a sub! Instead, put them in well defined
  registers!

- In case you have enough memory, try to remove as many MULU/S and DIVU/S as
  possible by pre-calculating a multiplication or division table, and reading
  values from it, rather than doing MULU #10 or so each time.
  * Beware on A1200 though, read Chris's section on 68020 optimization.

38. How do I make a RESET
=========================

Here is the official routine supported by Commodore:
                             ^^^^^^^^^^^^^^^^^^^^^^

        INCLUDE "exec/types.i"
        INCLUDE "exec/libraries.i"

        csect   text

        xdef    _ColdReboot
        xref    _LVOSupervisor

EXECBASE        equ     4
ROMEND          equ     $01000000
SIZE_OFFSET     equ     -$14
KICK_V36        equ     36
V36_ColdReboot  equ     -726

_ColdReboot:
        move.l  EXECBASE,a6
        cmp.w   #KICK_V36,LIB_VERSION(a6)       ;which Version of Exec ?
        blt.s   .old_kick                       ;old one -> goto old_kick
        jmp     V36_ColdReboot(a6)              ;else use Exec-Function

.old_kick:
        lea     .Reset_Code(pc),a5
        jsr     _LVOSupervisor(a6)              ;get Supervisor-status
                                                ;never reaching this point

        cnop    0,4                             ;very important
.Reset_Code:
        lea     ROMEND,a0                       ;Calc Entrypoint
        sub.l   SIZE_OFFSET(a0),a0
        move.l  4(a0),a0
        subq.l  #2,a0
        reset                                   ;Reset peripherals
        jmp     (a0)                            ;done
                                                ;(reset and jmp must be in
                                                ; the same LONGWORD !!!!)

        END

39. System-Privates:
====================

If anywhere in the manuals, includes or autodocs it says that this or that is
PRIVATE or RESERVED or INTERNAL (or something similar) then

        * don't read this entry
        * never ever WRITE something to it
        * if it's a function, then DON'T use it
        * don't check it for anything

Private system fields can be changed without reason, and without it being
mentioned in any documentation! (Thanks Arno!)

And to add to that, if a system structure member has a routine that allows
you to alter it, then USE IT! For example, SetAPen() alters the Pen value in
the RastPort; it is currently possible to alter the pen by poking the
structure, but don't. Do not poke system structures unless there is no other
way to alter the value.

40. Good Books
==============

I've been asked to suggest some good books:

Hardware Reference Manual
-------------------------
Essential for demo and game coders.

Rom Kernel Manual: Libraries
----------------------------
Essential for *ALL* Amiga Programmers.

Rom Kernel Manual: Devices
--------------------------
Essential if you plan to do any work with Device IO (input.device,
timer.device, trackdisk.device, etc...)

Rom Kernel Manual: Includes & Autodocs
--------------------------------------
These are available on disk instead, which is a lot cheaper! Essential
reference work.

All these books are available to developers on the CATS CD 2 as AmigaGuide
files... $50 from CATS US.
Amiga User Interface & Style Guide
----------------------------------
Probably the most boring book I've ever read :-) Useful if you intend to
write applications, but even then some of the rules have changed for V39
since this book was printed.

AmigaDOS manual 3rd Edition (Bantam)
------------------------------------
Truly awful book, unfortunately the ONLY official dos.library reference. Why
it can't be integrated into the RKMs I don't know... If you need to program
dos.library and want info on AmigaDOS file and hunk formats, this is the
book.

Mapping the Amiga (Compute)
---------------------------
One of my favourite books. This is an easy-to-read reference to all system
(1.3) functions and structures. Much easier to use than the Includes &
Autodocs. I wish there was a V39 update to this!

Amiga System Programmers Guide (Abacus)
---------------------------------------
Quite handy, it covers a lot of the Hardware Reference Manual, Rom Kernel
Manuals and more in one book, but I'd suggest you buy the official books
instead.

Advanced Amiga System Programmers Guide (Abacus)
------------------------------------------------
Slightly more interesting than the first one, covers mainly OS level
programming, but again nothing really new.

Amiga Disk Drives Inside and Out (Abacus)
-----------------------------------------
AVOID THIS BOOK! It has some of the worst code and coding practices I have
ever seen in it. Half of the code will only work under Kickstart 1.2, the
other half doesn't work at all!!!!

680x0 Programming by Example (Howard Sams & Company)
----------------------------------------------------
Excellent book on 68000 programming. Covers 68000/020/030 instructions and
optimization. Aimed at the advanced 68000 user, some really neat stuff in
this book. The only 68000 book I've bought, except the Motorola manual.

The Discworld Series (Terry Pratchett)
--------------------------------------
Nothing to do with Amigas, but excellent books.
If you need a break from programming, read one of these!


Copper Startup Code
===================

I've separated this out now; cut out this file and keep it safe (you may
need a grown up to help you with this :-)

8<-------8<-------8<-------8<-------8<-------8<-------8<-------8<-------
*
* Startup.asm - A working tested version of startup from Howtocode5.txt
*
* This code sets up one of two copperlists (one for PAL and one for NTSC
* machines). It shows something to celebrate 3(?) years since the Berlin
* wall came down :-) Press the left mouse button to return to normality.
* Tested on Amiga 3000 (ECS/V39 Kickstart) and Amiga 1200 (AGA/V39).
* Read Howtocode5.txt for information on this source!
*
* Note: You will have to do something about sprites. Each sprite
* pointer should point at a valid sprite, or at a blank one, which for
* AGA means *FOUR* long words of zero on a 64-bit boundary,
* ie:
*               CNOP    0,8
* blanksprite:  dc.l    0,0,0,0
*
* Also, for AGA, sprites need to be fixed (see section on Sprites)
*
* $VER: startup.asm V5.0gti (18.3.92)
* Valid on day of purchase only. No re-admission. No rain-checks.
*

        opt     l-,o+                   ; auto link, optimise on
;       opt     o3-                     ; add this for Devpac Assembler

        section mycode,code             ; need not be in chipram

        incdir  "include:"
        include "exec/types.i"
        include "exec/funcdef.i"        ; keep code simple and
        include "exec/exec_lib.i"       ; easy to read - use
        include "graphics/gfxbase.i"    ; the includes!
        include "graphics/graphics_lib.i"
        include "exec/execbase.i"       ; for VBlankFrequency
        include "misc/easystart.i"      ; Allows startup from
                                        ; icon

StartCopper:
        move.l  4.w,a6
        sub.l   a1,a1                   ; Zero - Find current task
        jsr     _LVOFindTask(a6)

        move.l  d0,a1
        moveq   #127,d0                 ; task priority to very high...
        jsr     _LVOSetTaskPri(a6)

        move.l  4.w,a6                  ; get ExecBase
        lea     gfxname,a1              ; graphics name
        moveq   #0,d0                   ; any version
        jsr     _LVOOpenLibrary(a6)
        tst.l   d0
        beq     End                     ; failed to open?
                                        ; Then quit
        move.l  d0,gfxbase
        move.l  d0,a6
        move.l  gb_ActiView(a6),wbview  ; store current view address
                                        ; gb_ActiView = 34
.loop   sub.l   a0,a0                   ; clear a0
        jsr     _LVOLoadView(a6)        ; Flush View to nothing
        jsr     _LVOWaitTOF(a6)         ; Wait once
        jsr     _LVOWaitTOF(a6)         ; Wait again.

; now check nothing has run in the meantime!
;
; Please note, I haven't actually checked this bit!! :-)
; Can someone prove if it does or does not work????
;
        tst.l   gb_ActiView(a6)         ; Any other view appeared?
        bne.s   .loop                   ; If so wipe it.

        move.l  4.w,a6                  ; VBlankFrequency is in ExecBase,
                                        ; not GfxBase!
        cmp.b   #50,VBlankFrequency(a6) ; Am I *running* PAL?
        bne.s   .ntsc

        move.l  #mycopper,$dff080       ; bang it straight in.
        bra.s   .lp

.ntsc   move.l  #mycopperntsc,$dff080

.lp     btst    #6,$bfe001              ; ok.. I'll do an input
        bne.s   .lp                     ; handler next time.

CloseDown:
        move.l  wbview,a1
        move.l  gfxbase,a6
        jsr     _LVOLoadView(a6)        ; Fix view
        move.l  gb_copinit(a6),$dff080  ; Kick it into life
                                        ; copinit = 38
        move.l  a6,a1
        move.l  4.w,a6
        jsr     _LVOCloseLibrary(a6)    ; EVERYONE FORGETS THIS!!!!

End:    rts                             ; back to workbench/cli

        section mydata,data_c           ; keep data & code separate!

mycopper
        dc.w    $100,$0200              ; otherwise no display!
        dc.w    $180,$00
        dc.w    $8107,$fffe             ; wait for $8107,$fffe
        dc.w    $180,$f00               ; background red
        dc.w    $d607,$fffe             ; wait for $d607,$fffe
        dc.w    $180,$ff0               ; background yellow
        dc.w    $ffff,$fffe
        dc.w    $ffff,$fffe

mycopperntsc
        dc.w    $100,$0200              ; otherwise no display!
        dc.w    $180,$00
        dc.w    $6e07,$fffe             ; wait for $6e07,$fffe
        dc.w    $180,$f00               ; background red
        dc.w    $b007,$fffe             ; wait for $b007,$fffe
        dc.w    $180,$ff0               ; background yellow
        dc.w    $ffff,$fffe
        dc.w    $ffff,$fffe

wbview  dc.l    0
gfxbase dc.l    0
gfxname dc.b    "graphics.library",0


Thanks to everyone who has replied. Any more questions, queries, or
"CJ, you got it wrong again!" type mail to the email address below....


What I want:
============

If anyone wants to spend some time writing something on these
(especially from a demo coder's perspective) I'd be very grateful. I
would write some of them myself if I had more time...
  o 68881/2 Programming
  o How to Read C code (for Assembler programmers reading OS manuals)
  o Introduction to programming vector graphics
  o How to set up an input handler
  o Reading the new Motorola syntax code

And anything else you want to write about. Please feel free to write
additions/replacements for anything already here...

And of course, if anyone spots a *really* bad bit of code or programming
practice, let me know and I'll warn people about it here... (Don't send
my old code though :-)

And a final comment:

For those of you who wrote about Amazing Tunes II (a demo I wrote 2
years back) wanting to know how to get it to run on a 1Mb chipram
machine... Sorry. It doesn't. It probably breaks *every* rule in this
docfile. I speak from experience. I used to be that evil programmer :-)
Disassemble the bootblock to see some nightmare code... You can probably
patch it if you're clever. It was meant to support 1Mb chipram but it
never worked... If you have 1.5Mb of ram (be it chip, fast or a mixture)
it should work though...

I had to totally rewrite it recently for The Demo Collection CDROM
(which is totally amazing, costs £19.95 and contains 600Mb of demos,
animations, 1000 modules and much more - it's available from Almathera
on (UK) 081 683 6418). It now plays 1000 modules (instead of 20), is much
more system friendly and works on a (less than) 1Mb chipram CDTV. And
you never know, if enough people ask I may do an AGA version, possibly
on an Intuition screen... That would be nice!

--------------------------------------------------------------------

This text is Copyright (C) 1993 Share and Enjoy, but may be freely
distributed in any electronic form. The copyright of contributions
quoted from other authors remains with the original author. If you
would like to contribute to this file, email me at the address below...
The startup code in this article is freeware and may be used by anyone
for any purpose.
All trademarks and registered names (Workbench, Kickstart, etc) are
acknowledged. All opinions expressed in this article are my own, and in
no way reflect those of anyone else.

Please note that many of the programming practices described in this
text are ONLY applicable to demo coding, and should not be used for
games and other programming.

I didn't write this for fun, I wrote it for you to use! Hopefully this
will grow into a big file that demo coders can use. If you want to make
a contribution please email it to me: I prefer plain ASCII set to no
more than 75 column width, and no tabs if possible (although I can fix
text sent to me...)

If you strongly disagree with anything I write, or you want to send me
some source or demos to test on Amiga 1200/4000 etc, or you have
questions about Amiga programming, or suggestions for future articles,
or just want to chat about the best way to optimise automatic
copperlist generation code (*), then contact me via email at:

        comradej@althera.demon.co.uk

I CAN NOW REPLY TO MAIL!!! At last, thanks to AmigaElm and some ARexx I
can reply to mail! If you sent me a message and haven't got a reply,
it's because I lost the message, so please mail me again! Sorry about
the delays before!

I seem to have lost usenet news now, so I haven't read anything from
alt.sys.amiga.demos since early January.

* - This is a NIGHTMARE. I really feel sorry for the guys who wrote
MrgCop(). I will never swear at MrgCop() or RethinkDisplay() again :-)
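As a footnote to the Copper Startup Code: the two copperlists in it are nothing more than pairs of 16-bit words, and any copperlist generator (automatic or not) ultimately has to emit words like these. The sketch below is portable C, not Amiga code. The names `cop_move()`, `cop_wait()` and `build_copperlist()` are hypothetical helpers, not an OS or library API. It assumes the MOVE/WAIT encoding described in the Hardware Reference Manual (MOVE = even register offset plus data; WAIT = beam line in bits 15-8, horizontal position in bits 7-1, bit 0 set, followed by the mask word $fffe) and simply rebuilds `mycopper` and `mycopperntsc` word for word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: they only fill an ordinary array with copper
   words. A real copperlist must live in chip RAM, which is why the
   startup code puts its lists in a data_c section. */

/* MOVE: first word = custom register offset (even, $000-$1FE),
   second word = the value to write into that register. */
void cop_move(uint16_t *list, int *n, uint16_t reg, uint16_t value)
{
    assert((reg & 1) == 0);            /* bit 0 clear marks a MOVE */
    list[(*n)++] = (uint16_t)(reg & 0x01FE);
    list[(*n)++] = value;
}

/* WAIT: first word = line << 8 | (hpos & $FE) | 1. The second word
   $FFFE enables every position compare bit, ignores the blitter-
   finished flag (bit 15) and has bit 0 clear (WAIT, not SKIP). */
void cop_wait(uint16_t *list, int *n, uint8_t line, uint8_t hpos)
{
    list[(*n)++] = (uint16_t)((line << 8) | (hpos & 0xFE) | 0x0001);
    list[(*n)++] = 0xFFFE;
}

/* Rebuilds mycopper (pal != 0) or mycopperntsc (pal == 0) and returns
   the number of 16-bit words written (16). */
int build_copperlist(uint16_t *list, int pal)
{
    int n = 0;
    uint8_t red_line    = pal ? 0x81 : 0x6E;
    uint8_t yellow_line = pal ? 0xD6 : 0xB0;

    cop_move(list, &n, 0x100, 0x0200);     /* BPLCON0: no bitplanes     */
    cop_move(list, &n, 0x180, 0x000);      /* COLOR00: black            */
    cop_wait(list, &n, red_line, 0x07);    /* $8107 PAL, $6e07 NTSC     */
    cop_move(list, &n, 0x180, 0xF00);      /* COLOR00: red              */
    cop_wait(list, &n, yellow_line, 0x07); /* $d607 PAL, $b007 NTSC     */
    cop_move(list, &n, 0x180, 0xFF0);      /* COLOR00: yellow           */
    cop_wait(list, &n, 0xFF, 0xFE);        /* $ffff,$fffe: end of list  */
    cop_wait(list, &n, 0xFF, 0xFE);        /* second copy, as in source */
    return n;
}
```

The end-of-list word $ffff,$fffe is just a WAIT for a beam position that is never reached, so the copper stops until the next vertical blank restarts it from the list pointer. Moving the colour bars for a different display only means changing `red_line` and `yellow_line`.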