===== In-depth look at ports =====

Attentive readers of the [[https://docs.beamracer.net/doku.php?id=introduction_to_programming_the_beamracer#local_memory|introductory section on local memory]] may have noticed that out of the five registers that constitute a port, only four were described. Now is the time to dive deeper into port operation and explain the role(s) of registers ''REP0'' and ''REP1''.

==== Inside a port ====

The ports are intended to be an efficient and flexible means of accessing VASYL's local memory. As such, they have been equipped with extra hardware to accelerate some of the most frequent operations. We have already seen memory pointer post-increment/decrement in action, but there is more.

Each of the two ports has its own DMA channel to the local memory, and is able to make one access - either read or write - per system clock cycle. This gives a theoretical transfer rate of ~1 MB/s per port. The 6510 is too slow to saturate even one such channel, because in the best case it can write to the I/O region once every four cycles (e.g. using a sequence of **STA** instructions).

However, since display lists are also able to write to port registers, they can use ports to manipulate local memory at great speed. And since display lists are themselves located in that memory, this opens the possibility for display lists to manipulate both themselves and the data they use.

Before we get to that, let's have a closer look at registers ''REP0'' and ''REP1''.

==== REP0 and REP1 ====

As you remember from the introductory chapter, writing a value to register ''PORT0'' will transfer that value to a destination in local memory pointed to by ''ADR0L'' and ''ADR0H''. For instance, this code

      LDA #<target
      STA VREG_ADR0L
      LDA #>target
      STA VREG_ADR0H
      LDA #$A0
      STA VREG_PORT0

will store value ''$a0'' at location ''target'' in the current memory bank. Following the store, ''ADR0'' will be increased by the contents of ''STEP0'' (or decreased, depending on its sign - it's a value between ''-128'' and ''+127'').

      LDA #<target
      STA VREG_ADR0L
      LDA #>target
      STA VREG_ADR0H
      LDA #1
      STA VREG_STEP0
      LDA #$A0
      STA VREG_PORT0
      STA VREG_PORT0
      STA VREG_PORT0
      STA VREG_PORT0

will thus result in ''$a0'' being stored in locations ''target'', ''target+1'', ''target+2'', and ''target+3''.

Now, rather than repeating ourselves, we can use register ''REP0'' - a value stored there **repeats** the last write to ''PORT0'' that many times, i.e.

      LDA #<target
      STA VREG_ADR0L
      LDA #>target
      STA VREG_ADR0H
      LDA #1
      STA VREG_STEP0
      LDA #$A0
      STA VREG_PORT0
      LDA #3
      STA VREG_REP0

will have exactly the same result as the previous example, because the initial write of ''$A0'' to ''PORT0'' will be repeated three times. What is different is speed - the three extra writes will be executed in the next three cycles, thus using the full DMA channel throughput of ~1 MB/s.

Let's see how we could use it to rapidly clear an 8 KiB screen, and learn a few more things in the process.

      LDA #<screen
      STA VREG_ADR0L
      LDA #>screen
      STA VREG_ADR0H
      LDA #1
      STA VREG_STEP0
      LDX #8192 / 256  ; We will be clearing a page at a time.
      LDA #$00
      STA VREG_PORT0   ; First byte here...
      LDY #255         ; ...then remaining 255 bytes of the first page...
  loop:
      STY VREG_REP0    ; ...and 256 on the following ones.
  waitrep:
      LDY VREG_REP0    ; Is auto-repetition still ongoing?
      BNE waitrep      ; If so, let's wait for it to end.
      DEX
      BNE loop

A few things to note:
  - We want the routine to be accurate, so after clearing the very first byte with a write to ''PORT0'', we only need to clear 255 more bytes of the first page. That's why we write ''255'' to ''REP0'' on the first pass through the loop, but ''0'' on subsequent ones (the Y register holds zero whenever the ''waitrep'' loop exits).
  - A value of ''0'' written to ''REP0'' means ''256'', i.e. "repeat the last action 256 times".
  - As the sequence of writes progresses, the CPU is free to do other things. However, if it wants to kick off another repetition, it first needs to wait for the current one to finish. Since ''REP0'' contains the number of bytes that remain to be written, all that needs to be done is to read ''REP0'' repeatedly until it reaches zero.
  - Each port has its own dedicated DMA channel. This means that it does not interfere with other memory operations, display list execution, bitmap sequencer fetches, etc.
  - PORT0 and PORT1 are also fully independent, so each can execute auto-repetition at full speed. You could thus be clearing the screen using both of them simultaneously, doubling the performance to ~2 MB/s.
  - If both ports make a memory access in the same cycle, PORT0 does it first.

==== Copying data in local RAM ====

In the [[https://docs.beamracer.net/doku.php?id=introduction_to_programming_the_beamracer#local_memory|introductory chapter]] we also explained how to read from the location pointed to by a port - you just need to set the ''CTRL_PORT_READ_ENABLE'' bit in the ''CONTROL'' register and then proceed to read from ''PORT0'' (or ''PORT1''). This can obviously be combined with writing, and used to copy data around - this routine will copy 256 bytes from location ''source'' to location ''target'':

      LDA VREG_CONTROL
      ORA #CTRL_PORT_READ_ENABLE
      STA VREG_CONTROL
      LDA #<source
      STA VREG_ADR0L
      LDA #>source
      STA VREG_ADR0H
      LDA #<target
      STA VREG_ADR1L
      LDA #>target
      STA VREG_ADR1H
      LDA #1
      STA VREG_STEP0
      STA VREG_STEP1
      LDX #0           ; X wraps around, so the loop runs 256 times.
  loop:
      LDA VREG_PORT0
      STA VREG_PORT1
      DEX
      BNE loop

While this approach is faster than copying data in C64 base memory (you can use the simplest addressing modes, and don't have to worry about updating the source and destination addresses), it is still far below the theoretical throughput of the ports - even with unrolled loops the best we can do is around 8 cycles per byte (<125KB/s).

As you might have already suspected, we can side-step the CPU entirely, and copy data directly using the ports' DMA channels. This mode is activated by setting ''CTRL_PORT_MODE_COPY'' in the ''CONTROL'' register. Here is the fast way to copy 256 bytes:

      LDA VREG_CONTROL
      ORA #CTRL_PORT_MODE_COPY
      STA VREG_CONTROL
      LDA #<source
      STA VREG_ADR0L
      LDA #>source
      STA VREG_ADR0H
      LDA #<target
      STA VREG_ADR1L
      LDA #>target
      STA VREG_ADR1H
      LDA #1
      STA VREG_STEP0
      STA VREG_STEP1
      LDA #0           ; As previously, "0" means "256".
      STA VREG_REP1    ; Write to REP1 kicks off hardware copy.

The last instruction of the above routine launches a memory transfer that will take exactly 256 cycles to complete, copying at ~1 MB/s. As previously, the 6510 is free to do other things while the data is being copied, but if it wants to modify any of the port registers again, it should wait for the transfer to complete:

  waitcopy:
      LDA VREG_REP1    ; Is the transfer still ongoing?
      BNE waitcopy     ; If so, let's wait for it to end.

Once you're done copying and want to use ports for other purposes, remember to turn off copy mode in the ''CONTROL'' register:

      LDA VREG_CONTROL
      AND #~CTRL_PORT_MODE_MASK
      STA VREG_CONTROL

Final two comments:
  - Because of how DMA channels are allocated within a cycle, accelerated copying is only possible from PORT0 to PORT1, not the other way around.
  - ''STEP0'' and ''STEP1'' can naturally be different from each other, enabling you to reorganize your data as you copy it: change order, insert gaps, etc.

For a comprehensive example demonstrating the use of all these features, please see [[https://github.com/madhackerslab/beamracer-examples/blob/master/asm/demo_hirestext.s|demo_hirestext.s]].

==== Accessing ports from a display list ====

Since display lists are free to write to VASYL registers, it should come as no surprise that they can also use the ports. All the operations described above can be performed using display list instructions, although because VASYL is not a general-purpose CPU, there are some important differences.

Let's first try to set up a simple memory clearing operation using ''PORT1'':

      MOV VREG_ADR1L, <buffer
      MOV VREG_ADR1H, >buffer
      MOV VREG_STEP1, 1
      MOV VREG_PORT1, 0
      MOV VREG_REP1, 49

This code will start clearing a total of 50 bytes starting from address ''buffer'' in the local memory. "Start" is an important word here, because as with the 6510, execution of the display list continues while the operation initiated by the write to ''VREG_REP1'' progresses. This is a desirable feature, but what if we want to perform another operation using the same port?
VASYL cannot read individual registers, so we cannot loop waiting for ''VREG_REP1'' to reach zero. Fortunately, there is an instruction designed specifically for this situation - [[isa#waitrep|WAITREP]]. It takes one argument - a ''0'' or ''1'' value corresponding to the port - and all it does is pause display list execution until an operation controlled by that port completes. So if we wanted to clear 500 bytes of the ''buffer'', we could do this:

      MOV VREG_ADR1L, <buffer
      MOV VREG_ADR1H, >buffer
      MOV VREG_STEP1, 1
      MOV VREG_PORT1, 0       ; clear the first byte
      MOV VREG_REP1, 0        ; clear 256 more bytes
      WAITREP 1               ; wait for the operation started above to finish
      MOV VREG_REP1, 243      ; now clear remaining 500 - 256 - 1 = 243 bytes...
      WAITREP 1               ; ...and wait until it completes

How would we go about copying a buffer of 200 bytes and reversing it while doing so?

      MOV VREG_ADR0L, <src
      MOV VREG_ADR0H, >src
      MOV VREG_ADR1L, <(dst + 199)  ; point to the end of the destination
      MOV VREG_ADR1H, >(dst + 199)
      MOV VREG_STEP0, 1             ; walk from "src" up
      MOV VREG_STEP1, -1            ; walk from end of "dst" down
      MOV VREG_CONTROL, (1 << CONTROL_DLIST_ON_BIT) | CONTROL_PORT_MODE_COPY
      MOV VREG_REP1, 200            ; start copying...
      WAITREP 1                     ; wait for it to end

Note that we don't have the comfort of the 6510's ''ORA'' instruction, so we cannot be selective about which bits we change in the ''CONTROL'' register. We can be pretty sure ''CONTROL_DLIST_ON_BIT'' should also be set, because the display list wouldn't be executing without it, but you may also need to set other bits, depending on what you are doing.

One more VASYL instruction useful while working with ports is [[isa#xfer|XFER]]. It reads from the indicated port and stores the value just read into a VIC-II or VASYL register. So

      XFER $d020, (1)

will read a value from PORT1, and store it in the VIC border color register. Here is a more complete example using PORT0.

      MOV VREG_ADR0L, <colors
      MOV VREG_ADR0H, >colors
      MOV VREG_STEP0, 1
      SETA 25
      WAIT 51, 0
  loop:
      XFER $d020, (0)
      DELAYV 8
      DECA
      BRA loop
      END
  colors:
      .byte 2,3,4,5,9,8,7,6,5,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,2,0

will use values read from the array ''colors'' to change the border color every 8 lines.

{{ :wiki:xfer.png?600 |}}

Finally, since you can also use ''XFER'' to write to PORT registers, it is yet another way to transfer data using display lists. Please see [[https://github.com/madhackerslab/beamracer-examples/blob/master/asm/demo_selfmod.s|demo_selfmod.s]] for an in-depth example.
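
As a closing illustration of the earlier remark that PORT0 and PORT1 can run auto-repetition independently, here is a minimal, untested 6510 sketch that clears 512 bytes by giving each port one 256-byte page. It uses only the registers described on this page. The label ''buffer'' is hypothetical, and the code assumes that copy mode is turned off in ''CONTROL'' (otherwise the write to ''REP1'' would start a copy instead of a repetition).

      ; Sketch only: "buffer" is a hypothetical label in local memory.
      LDA #<buffer
      STA VREG_ADR0L
      LDA #>buffer
      STA VREG_ADR0H           ; PORT0 handles the first page
      LDA #<(buffer + 256)
      STA VREG_ADR1L
      LDA #>(buffer + 256)
      STA VREG_ADR1H           ; PORT1 handles the second page
      LDA #1
      STA VREG_STEP0
      STA VREG_STEP1
      LDA #$00
      STA VREG_PORT0           ; first byte of the first page
      STA VREG_PORT1           ; first byte of the second page
      LDA #255
      STA VREG_REP0            ; remaining 255 bytes via PORT0...
      STA VREG_REP1            ; ...and 255 more via PORT1, concurrently
  wait0:
      LDA VREG_REP0            ; wait for PORT0 to finish
      BNE wait0
  wait1:
      LDA VREG_REP1            ; then make sure PORT1 is done too
      BNE wait1

Because the two repetitions overlap, the whole clear should take roughly as long as a single 256-byte fill, plus the few cycles between the two ''REP'' writes.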