In-depth look at ports
Attentive readers of the introductory section on local memory may have noticed that out of the five registers that constitute a port, only four were described. Now is the time to dive deeper into port operation and explain the roles of registers REP0 and REP1.
Inside a port
The ports are intended to be an efficient and flexible means of accessing VASYL's local memory. As such, they have been equipped with extra hardware to accelerate some of the most frequent operations. We have already seen memory pointer post-increment/decrement in action, but there is more.
Each of the two ports has its own DMA channel to the local memory, and is able to make one access - either a read or a write - per system clock cycle. This gives a theoretical transfer rate of ~1 MB/s per port. The 6510 is too slow to saturate even one such channel, because in the best case it can write to the I/O region once every four cycles (e.g. using a sequence of STA instructions). However, since display lists are also able to write to port registers, they can use ports to manipulate local memory at great speed. And since display lists are themselves located in that memory, this opens the possibility for display lists to manipulate both themselves and the data they use.
Before we get to that, let's have a closer look at registers REP0 and REP1.
REP0 and REP1
As you remember from the introductory chapter, writing a value to register PORT0 transfers that value to the destination in local memory pointed to by ADR0L and ADR0H. For instance, this code

        LDA #<target
        STA VREG_ADR0L
        LDA #>target
        STA VREG_ADR0H
        LDA #$A0
        STA VREG_PORT0

will store the value $a0 at location target in the current memory bank. Following the store, ADR0 will be increased by the contents of STEP0 (or decreased, depending on its sign: it is a value between -128 and +127).
With STEP0 set to 1, the sequence

        LDA #<target
        STA VREG_ADR0L
        LDA #>target
        STA VREG_ADR0H
        LDA #1
        STA VREG_STEP0
        LDA #$A0
        STA VREG_PORT0
        STA VREG_PORT0
        STA VREG_PORT0
        STA VREG_PORT0

will thus result in $a0 being stored in locations target, target+1, target+2, and target+3. Now, rather than repeating ourselves, we can use register REP0: a value stored there repeats the last write to PORT0 that many times, i.e.
        LDA #<target
        STA VREG_ADR0L
        LDA #>target
        STA VREG_ADR0H
        LDA #1
        STA VREG_STEP0
        LDA #$A0
        STA VREG_PORT0
        LDA #3
        STA VREG_REP0

will have exactly the same result as the previous example, because the initial write of $A0 to PORT0 will be repeated three times. What is different is the speed: the three extra writes will be executed in the next three cycles, thus using the full DMA channel throughput of ~1 MB/s. Let's see how we could use it to rapidly clear an 8 KiB screen, and learn a few more things in the process.
        LDA #<screen
        STA VREG_ADR0L
        LDA #>screen
        STA VREG_ADR0H
        LDA #1
        STA VREG_STEP0
        LDX #8192 / 256   ; We will be clearing a page at a time.
        LDA #$00
        STA VREG_PORT0    ; First byte here...
        LDY #255          ; ...then remaining 255 bytes of the first page...
    loop:
        STY VREG_REP0     ; ...and 256 on the following ones.
    waitrep:
        LDY VREG_REP0     ; Is auto-repetition still ongoing?
        BNE waitrep       ; If so, let's wait for it to end.
        DEX
        BNE loop
A few things to note:
- We want the routine to be accurate, so after clearing the very first byte with a write to PORT0, we only need to clear 255 bytes of the first page. That's why we write 255 to REP0 on the first pass through the loop, but 0 on subsequent ones (the wait loop exits with Y equal to 0).
- A value of 0 written to REP0 means 256, i.e. "repeat the last action 256 times".
- As the sequence of writes progresses, the CPU is free to do other things. However, if it wants to kick off another repetition, it first needs to wait for the current one to finish. Since REP0 contains the number of bytes that remain to be written, all that needs to be done is to read REP0 repeatedly and check whether it has reached zero.
- Each port has its dedicated DMA channel. This means that it does not interfere with other memory operations, display list execution, bitmap sequencer fetches, etc.
- PORT0 and PORT1 are also fully independent, so each can execute auto-repetition at full speed. You could thus be clearing the screen using both of them simultaneously, doubling the performance to ~2 MB/s.
- If both ports make memory access in the same cycle, PORT0 does it first.
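The behaviour described in this section can be mimicked with a few lines of Python. This is only a toy model meant for checking byte counts, not a description of the hardware: the class name Port, its attribute names, and the 64 KiB wrap-around of the address pointer are assumptions made for illustration.

```python
class Port:
    """Toy model of one port: address pointer, signed step, repeat counter."""

    def __init__(self, memory):
        self.memory = memory   # bytearray standing in for local memory (assumed 64 KiB)
        self.adr = 0           # ADRxL/ADRxH combined into one pointer
        self.step = 0          # STEPx, a signed value in -128..127
        self.last = 0          # last value written to PORTx
        self.cycles = 0        # cycles spent on auto-repetition in this model

    def write_port(self, value):
        """Model a write to PORTx: store the byte, then add STEPx to the pointer."""
        self.last = value & 0xFF
        self.memory[self.adr] = self.last
        self.adr = (self.adr + self.step) % len(self.memory)  # wrap is an assumption

    def write_rep(self, count):
        """Model a write to REPx: repeat the last write; 0 stands for 256."""
        repeats = count if count != 0 else 256
        for _ in range(repeats):
            self.write_port(self.last)
            self.cycles += 1   # one repeated write per cycle
        return repeats


def clear_screen(port, base, size=8192):
    """Mirror the 6510 routine: one PORT write, then REP = 255, then REP = 0 per page."""
    port.adr, port.step = base, 1
    port.write_port(0x00)
    rep = 255
    for _ in range(size // 256):
        port.write_rep(rep)
        rep = 0                # 0 means 256 on the following pages


memory = bytearray(b"\xff" * 0x10000)
port0 = Port(memory)
clear_screen(port0, 0x2000)
```

In this model the routine touches exactly 8192 bytes: 1 from the initial write, 255 from the first repetition, and 31 x 256 from the remaining passes.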
Copying data in local RAM
In the introductory chapter we also explained how to read from the location pointed to by a port: you just need to set the CTRL_PORT_READ_ENABLE bit in the CONTROL register and then proceed to read from PORT0 (or PORT1). This can obviously be combined with writing, and used to copy data around. The following routine copies 256 bytes from location source to location target:
        LDA VREG_CONTROL
        ORA #CTRL_PORT_READ_ENABLE
        STA VREG_CONTROL
        LDA #<source
        STA VREG_ADR0L
        LDA #>source
        STA VREG_ADR0H
        LDA #<target
        STA VREG_ADR1L
        LDA #>target
        STA VREG_ADR1H
        LDA #1
        STA VREG_STEP0
        STA VREG_STEP1
        LDX #0
    loop:
        LDA VREG_PORT0
        STA VREG_PORT1
        DEX
        BNE loop
While this approach is faster than copying data in C64 base memory (you can use the simplest addressing modes, and don't have to worry about updating the source and destination addresses), it is still far below the theoretical throughput of the ports: even with unrolled loops the best we can do is around 8 cycles per byte (<125 KB/s).
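The throughput figures quoted so far follow directly from the number of cycles spent per byte. A minimal sketch of the arithmetic, assuming a PAL machine with a system clock of 985,248 Hz (an NTSC machine runs slightly faster); the function name is made up for illustration:

```python
PAL_CLOCK_HZ = 985_248  # assumption: PAL C64 system clock


def throughput_kb_per_s(cycles_per_byte, clock_hz=PAL_CLOCK_HZ):
    """Bytes moved per second, in units of 1000 bytes, at a given cost per byte."""
    return clock_hz / cycles_per_byte / 1000


port_dma = throughput_kb_per_s(1)      # one port access per cycle
sta_sequence = throughput_kb_per_s(4)  # back-to-back STA to an I/O register
copy_loop = throughput_kb_per_s(8)     # unrolled LDA PORT0 / STA PORT1 pairs

print(f"port DMA:     {port_dma:7.1f} KB/s")
print(f"STA sequence: {sta_sequence:7.1f} KB/s")
print(f"CPU copy:     {copy_loop:7.1f} KB/s")
```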
As you might have already suspected, we can side-step the CPU entirely, and copy data directly using the ports' DMA channels. This mode is activated by setting CTRL_PORT_MODE_COPY in the CONTROL register. Here is the fast way to copy 256 bytes:
        LDA VREG_CONTROL
        ORA #CTRL_PORT_MODE_COPY
        STA VREG_CONTROL
        LDA #<source
        STA VREG_ADR0L
        LDA #>source
        STA VREG_ADR0H
        LDA #<target
        STA VREG_ADR1L
        LDA #>target
        STA VREG_ADR1H
        LDA #1
        STA VREG_STEP0
        STA VREG_STEP1
        LDA #0            ; As previously, "0" means "256".
        STA VREG_REP1     ; Write to REP1 kicks off hardware copy.
The last instruction of the above routine launches a memory transfer that will take exactly 256 cycles to complete, copying at ~1 MB/s. As previously, the 6510 is free to do other things while the data is being copied, but if it wants to modify any of the port registers again, it should wait for the transfer to complete:
    waitcopy:
        LDA VREG_REP1     ; Is the transfer still ongoing?
        BNE waitcopy      ; If so, let's wait for it to end.
Once you're done copying and want to use ports for other purposes, remember to turn off copy mode in the CONTROL register:
        LDA VREG_CONTROL
        AND #~CTRL_PORT_MODE_MASK
        STA VREG_CONTROL
Final two comments:
- Because of how DMA channels are allocated within a cycle, accelerated copying is only possible from PORT0 to PORT1, not the other way around.
- STEP0 and STEP1 can naturally be different from each other, enabling you to reorganize your data as you copy it: change order, insert gaps, etc.
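To see what different step values do to the data, here is a small Python model of a copy-mode transfer. It is a sketch under assumptions, not a description of the hardware: the function name dma_copy and the address wrap-around at the end of the memory array are invented for illustration, while the "0 means 256" rule and the PORT0-to-PORT1 direction come from the text above.

```python
def dma_copy(memory, src, dst, step0, step1, rep1):
    """Toy model of copy mode: PORT0 reads, PORT1 writes, REP1 gives the byte count."""
    count = rep1 if rep1 != 0 else 256   # a REP1 value of 0 stands for 256
    size = len(memory)
    for _ in range(count):
        memory[dst % size] = memory[src % size]  # wrap-around is an assumption
        src += step0                             # ADR0 advances by STEP0
        dst += step1                             # ADR1 advances by STEP1
    return count                                 # one byte per cycle in this model


memory = bytearray(0x10000)
memory[0x1000:0x1000 + 200] = bytes(range(200))

# Reverse 200 bytes: read upwards, write downwards from the end of the target.
dma_copy(memory, 0x1000, 0x2000 + 199, step0=1, step1=-1, rep1=200)

# Insert gaps: write every other byte of the target.
dma_copy(memory, 0x1000, 0x3000, step0=1, step1=2, rep1=4)
```

After the second call, the first eight bytes at the target hold 0, 0, 1, 0, 2, 0, 3, 0: the source bytes land on even offsets, and the odd offsets keep whatever was there before (zero in this model).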
For a comprehensive example demonstrating the use of all these features, please see demo_hirestext.s.
Accessing ports from a display list
Since display lists are free to write to VASYL registers, it should come as no surprise that they can also use the ports. All the operations described above can be performed using display list instructions, although because VASYL is not a general-purpose CPU, there are some important differences.
Let's first try to set up a simple memory clearing operation using PORT1:
        MOV VREG_ADR1L, <buffer
        MOV VREG_ADR1H, >buffer
        MOV VREG_STEP1, 1
        MOV VREG_PORT1, 0
        MOV VREG_REP1, 49
This code will start clearing a total of 50 bytes starting from address buffer in the local memory. "Start" is an important word here, because like with the 6510, execution of the display list continues while the operation initiated by the write to VREG_REP1 progresses. This is a desirable feature, but what if we want to perform another operation using the same port? VASYL cannot read individual registers, so we cannot loop waiting for VREG_REP1 to reach zero.
Fortunately, there is an instruction designed specifically for this situation: WAITREP. It takes one argument, a 0 or 1 value selecting the port, and all it does is pause display list execution until the operation controlled by that port completes. So if we wanted to clear 500 bytes of the buffer, we could do this:
        MOV VREG_ADR1L, <buffer
        MOV VREG_ADR1H, >buffer
        MOV VREG_STEP1, 1
        MOV VREG_PORT1, 0     ; clear the first byte
        MOV VREG_REP1, 0      ; clear 256 more bytes
        WAITREP 1             ; wait for the operation started above to finish
        MOV VREG_REP1, 243    ; now clear remaining 500 - 256 - 1 = 243 bytes...
        WAITREP 1             ; ...and wait until it completes
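The split of a larger fill into REP1 values can be worked out mechanically: one byte is handled by the initial PORT1 write, and the rest is divided into chunks of at most 256 bytes, with 256 encoded as 0. A minimal sketch of that arithmetic in Python (the helper name rep_values is made up for illustration):

```python
def rep_values(total_bytes):
    """REPx values to issue after one initial PORTx write so that total_bytes are covered."""
    if total_bytes < 1:
        raise ValueError("need at least one byte")
    remaining = total_bytes - 1      # the initial PORTx write handles one byte
    values = []
    while remaining > 0:
        chunk = min(remaining, 256)  # a single REPx write covers at most 256 bytes
        values.append(chunk % 256)   # 256 is encoded as 0
        remaining -= chunk
    return values


print(rep_values(50))    # the 50-byte clear
print(rep_values(500))   # the 500-byte clear
```

Each value in the returned list corresponds to one write to REP1 followed by a WAITREP 1. The helper emits full 256-byte chunks first, so for 8192 bytes it yields thirty-one 0 values followed by 255, whereas the 6510 routine earlier issues the 255 first; the total is the same.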
How would we go about copying a buffer of 200 bytes and reversing it while doing so?
        MOV VREG_ADR0L, <src
        MOV VREG_ADR0H, >src
        MOV VREG_ADR1L, <(dst + 199)   ; point to the end of the destination
        MOV VREG_ADR1H, >(dst + 199)
        MOV VREG_STEP0, 1              ; walk from "src" up
        MOV VREG_STEP1, -1             ; walk from end of "dst" down
        MOV VREG_CONTROL, (1 << CONTROL_DLIST_ON_BIT) | CONTROL_PORT_MODE_COPY
        MOV VREG_REP1, 200             ; start copying...
        WAITREP 1                      ; wait for it to end
Note that we don't have the comfort of the 6510 ORA instruction, so we cannot be selective about which bits we change in the CONTROL register. We can be pretty sure CONTROL_DLIST_ON_BIT should also be set, because the display list wouldn't be executing without it, but you may also need to set other bits, depending on what you are doing.
One more VASYL instruction useful when working with ports is XFER. It reads from the indicated port and stores the value just read in a VIC-II or VASYL register. So

        XFER $d020, (1)

will read a value from PORT1 and store it in the VIC border color register. Here is a more complete example using PORT0:

        MOV VREG_ADR0L, <colors
        MOV VREG_ADR0H, >colors
        MOV VREG_STEP0, 1
        SETA 25
        WAIT 51, 0
    loop:
        XFER $d020, (0)
        DELAYV 8
        DECA
        BRA loop
        END
    colors:
        .byte 2,3,4,5,9,8,7,6,5,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,2,0

This display list uses the values read from the array colors to change the border color every 8 lines.
Finally, since you can also use XFER to write to PORT registers, it is yet another way to transfer data using display lists. Please see demo_selfmod.s for an in-depth example.