making_full_use_of_memory_ports

Differences

This shows you the differences between two versions of the page.


making_full_use_of_memory_ports [2020/10/11 23:36] – created laubzega
making_full_use_of_memory_ports [2020/10/18 21:56] (current) – [Inside a port] laubzega
Line 7: Line 7:
 The ports are intended to be efficient and flexible means of accessing VASYL's local memory. As such, they have been equipped with extra hardware to accelerate some of the most frequent operations. We have already seen memory pointer post-increment/decrement in action, but there is more.
  
-Each of the two ports has its own DMA channel to the local memory, and is able to make one access - either read or write - per system clock cycle. This gives a theoretical transfer rate of ~1 MB/s per port. The 6502 is too slow to saturate even one such channel, because in the best case it can write to IO region once every four cycles (e.g. using a sequence of **STA** instructions). However, since display lists are also able to write to port registers, they can use ports to manipulate local memory at great speed. And since display lists are themselves located in that memory, it opens the possibility for display lists to manipulate both themselves and the data they use.
+Each of the two ports has its own DMA channel to the local memory, and is able to make one access - either read or write - per system clock cycle. This gives a theoretical transfer rate of ~1 MB/s per port. The 6510 is too slow to saturate even one such channel, because in the best case it can write to IO region once every four cycles (e.g. using a sequence of **STA** instructions). However, since display lists are also able to write to port registers, they can use ports to manipulate local memory at great speed. And since display lists are themselves located in that memory, it opens the possibility for display lists to manipulate both themselves and the data they use.
  
 Before we get to that, let's have a closer look at registers ''REP0'' and ''REP1''.
Line 13: Line 13:
 ==== REP0 and REP1 ====
  
-As you remember from the introductory chapter, writing a value to register ''PORT0'' will transfer that value to a destination in local memory pointed to by ''ADR0L'' and ''ADR0H'':
+As you remember from the introductory chapter, writing a value to register ''PORT0'' will transfer that value to a destination in local memory pointed to by ''ADR0L'' and ''ADR0H''. For instance, this code
  
 <code> <code>
Line 120: Line 120:
 While this approach is faster than copying data in C64 base memory (you can use the simplest addressing modes, and don't have to worry about updating the source and destination addresses), it still is far below theoretical throughput of the ports - even with unrolled loops the best we can do is around 8 cycles per byte (<125KB/s).
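The throughput figures quoted in this page follow from simple cycle counting. The sketch below is only a back-of-the-envelope check, not part of the page's own code: it assumes a PAL C64 system clock of 985,248 Hz (the page itself just says "~1 MB/s"), and the function and variable names are made up for illustration.

```python
# Rough throughput model for the figures quoted in the text.
# Assumption: PAL C64 system clock (NTSC machines run slightly faster).
PAL_CLOCK_HZ = 985_248


def bytes_per_second(cycles_per_byte: int, clock_hz: int = PAL_CLOCK_HZ) -> int:
    """Bytes moved per second when every byte costs `cycles_per_byte` cycles."""
    return clock_hz // cycles_per_byte


port_dma = bytes_per_second(1)    # one port access per system clock cycle
sta_stream = bytes_per_second(4)  # CPU writing once every four cycles (STA sequence)
cpu_copy = bytes_per_second(8)    # unrolled CPU copy at about 8 cycles per byte

for label, rate in (
    ("port DMA (theoretical)", port_dma),
    ("CPU STA stream", sta_stream),
    ("CPU unrolled copy", cpu_copy),
):
    print(f"{label}: {rate / 1000:.1f} KB/s")
```

Under that clock assumption the unrolled CPU copy stays just below 125 KB/s, roughly an eighth of what a single port can sustain.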
  
-As you might have already suspected, we can side-step the CPU entirely, and copy data directly using ports' DMA channels. This mode is activated by activating ''CTRL_PORT_COPY_MODE'' in ''CONTROL'' register. Here is the fast way to copy 256-bytes:
+As you might have already suspected, we can side-step the CPU entirely, and copy data directly using ports' DMA channels. This mode is activated by setting ''CTRL_PORT_COPY_MODE'' in ''CONTROL'' register. Here is the fast way to copy 256-bytes:
  
 <code> <code>
Line 140: Line 140:
                  
         LDA #0           ; As previously, "0" means "256".
-        STA VREG_REP1    ; Write to REP1 kick-offs hardware copy.
+        STA VREG_REP1    ; Write to REP1 kicks off hardware copy.
 </code> </code>
  
Line 161: Line 161:
 Final two comments:
   - Because of how DMA channels are allocated within a cycle, accelerated copying is only possible  from PORT0 to PORT1, not the other way around
-  - ''STEP0'' and ''STEP1'' can naturally differ, enabling you to reorganize your data as you copy it: change order, insert gaps, etc.
+  - ''STEP0'' and ''STEP1'' can naturally be different from each other, enabling you to reorganize your data as you copy it: change order, insert gaps, etc.
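To make the effect of differing steps concrete, here is a small software model of a PORT0 to PORT1 copy. It is a sketch under stated assumptions, not a description of the hardware: it assumes 16-bit pointers that wrap around (suggested by the ''ADR0L''/''ADR0H'' pair), steps that behave as signed values added after each access, and a repeat count where 0 means 256 as the comment in the copy example says. The function name `port_copy` and all addresses are hypothetical.

```python
def port_copy(mem, adr0, step0, adr1, step1, count):
    """Model a PORT0 -> PORT1 copy with independent post-access steps.

    Assumptions (not confirmed by the page): pointers are 16-bit and wrap,
    steps are signed, and count == 0 stands for 256 transfers.
    Returns the final (adr0, adr1) pointer values.
    """
    transfers = 256 if count == 0 else count
    for _ in range(transfers):
        mem[adr1] = mem[adr0]            # read via port 0, write via port 1
        adr0 = (adr0 + step0) & 0xFFFF   # source pointer moves by STEP0
        adr1 = (adr1 + step1) & 0xFFFF   # destination pointer moves by STEP1
    return adr0, adr1


mem = bytearray(0x10000)                 # hypothetical 64 KB local memory
mem[0x1000:0x1004] = b"ABCD"

# Change order: read backwards (step -1), write forwards (step +1).
port_copy(mem, 0x1003, -1, 0x2000, 1, 4)
reversed_copy = bytes(mem[0x2000:0x2004])

# Insert gaps: read consecutively, write to every second byte.
port_copy(mem, 0x1000, 1, 0x3000, 2, 4)
gapped_copy = bytes(mem[0x3000:0x3008])

print(reversed_copy, gapped_copy)
```

The same loop with both steps set to 1 is a plain block copy; only the step values change between the three cases.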
  
 For comprehensive example demonstrating use of all these features, please see [[https://github.com/madhackerslab/beamracer-examples/blob/master/asm/demo_hirestext.s|demo_hirestext.s]].
Line 217: Line 217:
 </code> </code>
  
-Will read a value from PORT1, and store it VIC border color register. Here is a more complete example using PORT0.
+will read a value from PORT1, and store it in VIC border color register. Here is a more complete example using PORT0.
  
 <code vasyl> <code vasyl>
making_full_use_of_memory_ports.1602484608.txt.gz · Last modified: 2020/10/11 23:36 by laubzega