Kernel launches with Circle

Peruse the source yourself: kernel.cu

Mark Harris put together Six Ways to SAXPY, comparing the linear algebra SAXPY call implemented in six different GPU frontends. This document advances a seventh: the SAXPY launch with Circle.

How does Circle differ from NVIDIA's nvcc compiler?

  1. Circle is single pass. The __CUDA_ARCH__ macro is defined once, and reflects the most recent device architecture targeted. It cannot be used to gate between host and runtime code, and cannot be used to target specific architectures. Its only use in Circle is to enable function definitions in the CUDA Toolkit headers that require it. Instead of relying on __CUDA_ARCH__, Circle programmers have access to an nvvm_arch_t named enumeration, with an enumeration sm_XX for each GPU architecture targeted at the command line. The implicitly-defined variable __cuda_arch reflects the device module currently being targeted. Use @codegen if to execute a branch during code generation time (that is, still during compile-time, but after parsing and semantic analysis has run). There are two major benefits to the single-pass design: compile times and code simplicity.
  2. Circle doesn't require tags. __host__ and __device__ tags have similar meanings in Circle as they do in nvcc, but you aren't obligated to tag functions just to use them from kernels. The saxpy helper function below is untagged, allowing it to be called from host or device code. This is especially useful for using big existing libraries, as no porting should be required. Thrust provides tagged versions of complicated STL classes like tuple and array. Circle lets you use those types directly.
  3. Circle lets you metaprogram in C++. Delivering tuning parameters to kernels has been a tricky with CUDA. It requires an unholy union of templates and macros. With Circle, you can use the data structure of your choosing to hold kernel parameters, and query it the exact same way from the host and device code. For the host code, we make a normal query. For the device code, we query from a @meta context, which invokes the interpreter to make a compile-time copy of the underlying data structure, and pulls the values in as constexpr objects. We do this in the device for each targeted device architecture, then execute on the tunings corresponding to the architecture being built by the code generator. No macros or template metaprogramming is required.

Circle's metaprogramming facilities give us the flexibility to parameterize operations based on GPU architecture, data type, problem size and input distribution, and so on. We can use the same constructs to access the kernel parameters from device and host code, even though the data is constexpr on the device side!

#include <cuda.h>
#include <map>

struct details_t {
  int nt;     // number of threads per CTA.
  int vt;     // number of values per thread.

  // Go nuts and put all sorts of other options here. Any data type will
  // work.
};

// For each SM version we care about, provide specific tunings for our 
// kernel. The options need to be accessible to both the host at runtime
// (so that we can size the launches and understand how to call the kernel)
// and the device at compile time (so we get efficient code generation).

// We'll use the same std::map for these uses. The @meta context and integrated
// interpreter will allow us to manipulate a compile-time copy of the map.
// This is much easier than C++ template metaprogramming, which doesn't allow
// for random-access into standard containers.

// You can load from CSV or JSON in your build directory. Or you can load from
// a database or scrape tweets or anything you like. The Circle interpreter
// makes importing data very easy.
const std::map<int, details_t> kernel_config {
  { 35, { 256, 3  } },
  { 52, { 128, 7  } },
  { 61, { 128, 11 } },
  { 70, { 256, 8  } }
};

// For convenience, make a templated version of our function. We don't have
// to give it a __device__ tag, because Circle allows you to call untagged
// functions from kernels and other device code. This is especially convenient
// in that we can use std::array, std::tuple and the like directly from the
// STL, and not rely on their equivalents in thrust.
template<int nt, int vt, typename type_t>
void saxpy(type_t a, const type_t* x, type_t* y, size_t index, size_t count) {
  // First load the values in x and y. Perform a bunch of loads before
  // doing any arithmetic. This lets us saturate the memory bandwidth of the
  // device, since we aren't getting blocked by arithmetic data dependencies.
  // @meta for makes this a true unrolled loop--the contained statements are
  // actually injected through the frontend once for each iteration.
  type_t x2[vt], y2[vt];
  @meta for(int i = 0; i < vt; ++i) {
    if(index + i * nt < count) {
      x2[i] = x[index + i * nt];
      y2[i] = y[index + i * nt];
    }
  }

  // Perform arithmetic in-place.
  @meta for(int i = 0; i < vt; ++i)
    y2[i] += a * x2[i];

  // Write all y values back out.
  @meta for(int i = 0; i < vt; ++i) {
    if(index + i * nt < count)
      y[index + i * nt] = y2[i];
  }
}

template<typename type_t>
__global__ void kernel(type_t a, const type_t* x, type_t* y, size_t count) {

  // Loop over each enumerator for the -sm_XX architectures specified at 
  // the command line. These are not necessarily the ones specified in the
  // kernel_config map!
  @meta for enum(auto sm : nvvm_arch_t) {

    // __nvvm_arch is set by Circle's code generator to reflect the currently-
    // targeted NVVM/NVPTX module. If sm is the currently-targeted 
    // architecture in the backend, this branch will get emitted as LLVM IR
    // and compiled down to PTX. The other iterations in the loop will be 
    // skipped over, leaving us with one call to saxpy per target architecture.
    @codegen if(__nvvm_arch == sm) {
      
      // Read the details from the std::map at compile time. This is not the
      // same instance of the kernel_config map that is read from main.
      // This kernel_config object is lazily created by the interpreter when
      // we try to access it in an expression in a @meta statement. It's 
      // initialized with the same std::initializer_list ctor, so it has the
      // same value as the ordinary runtime object accessed by main.

      // Use lower_bound search to find the best fit.
      @meta auto it = kernel_config.lower_bound((int)sm);
      static_assert(it != kernel_config.end(), 
        "requested SM version has no kernel details!");

      // Call the saxpy helper function with the parameters from details.
      // Since we loaded details in a @meta statement, its members are
      // effectively constexpr, and we can use them to specialized templates, 
      // size arrays, and so on.

      // We don't have to specialize the helper function over the kernel
      // tuning parameter's. If we preferred, we could specialize over 
      // the nvvm_arch_t sm constant and access the kernel_config data
      // structure directly from saxpy.
      @meta details_t details = it->second;

      size_t index = blockIdx.x * details.nt * details.vt + threadIdx.x;
      saxpy<details.nt, details.vt>(a, x, y, index, count);
    }
  }
}

int main(int argc, char** argv) {

  size_t count = 1000000;

  // Assemble the SM version at runtime based on the GPU plugged into your
  // machine.
  int major, minor;
  cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, 0);
  cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, 0);
  int sm = 10 * major + minor;

  // Retrieve the kernel config.
  auto it = kernel_config.lower_bound(sm);
  if(it == kernel_config.end()) {
    fprintf(stderr, "requested SM version has no kernel details!");
    return 1;
  }
  details_t details = it->second;

  // Compute the number of blocks. This is a runtime operation.
  int nv = details.nt * details.vt;
  int num_blocks = (count + nv - 1) / nv;

  float* x, *y;
  cudaMalloc((void**)&x, sizeof(float) * count);
  cudaMalloc((void**)&y, sizeof(float) * count);

  // Launch the kernel. Note we aren't passing the kernel any architecture-
  // specific template arguments. It gets the architecture from the __cuda_arch
  // @codegen variable.
  kernel<<<num_blocks, details.nt>>>(3.14f, x, y, count);

  cudaFree(x);
  cudaFree(y);
        
  return 0;
}

To compile the code, use -cuda-path to point to the Toolkit installation, specify each architecture you're targeting, and link to libcudart.so. Circle makes a single frontend pass, binds all the ptx and cubin data into a single fatbin, and links your code. The chevron launch works just like nvcc's.

$ circle -cuda-path /usr/local/cuda-10.0 -sm_35 -sm_52 -sm_61 -sm_70 kernel.cu -lcudart
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function '_Z6kernelIfEvT_PKS0_PS0_m' for 'sm_35'
ptxas info    : Function properties for _Z6kernelIfEvT_PKS0_PS0_m
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 13 registers, 352 bytes cmem[0]
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function '_Z6kernelIfEvT_PKS0_PS0_m' for 'sm_52'
ptxas info    : Function properties for _Z6kernelIfEvT_PKS0_PS0_m
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 23 registers, 352 bytes cmem[0]
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function '_Z6kernelIfEvT_PKS0_PS0_m' for 'sm_61'
ptxas info    : Function properties for _Z6kernelIfEvT_PKS0_PS0_m
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 37 registers, 352 bytes cmem[0]
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function '_Z6kernelIfEvT_PKS0_PS0_m' for 'sm_70'
ptxas info    : Function properties for _Z6kernelIfEvT_PKS0_PS0_m
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 30 registers, 384 bytes cmem[0]

We can use cuobjdump to print out the SASS linked into the executable. We see that our register blocking has produced code where most of the loads are issued, then the compute is performed, then the stores are made. This is the optimal design for GPU kernels, where we want to exploit fine-grained multithreading to hide IO latency.

$ cuobjdump -sass kernel
...
  code for sm_70
    Function : _Z6kernelIfEvT_PKS0_PS0_m
  .headerflags    @"EF_CUDA_SM70 EF_CUDA_PTX_SM(EF_CUDA_SM70)"
        /*0000*/              @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ ;                             /* 0x000000fffffff389 */
                                                                                             /* 0x000fe200000e00ff */
        /*0010*/                   IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28] ;                   /* 0x00000a00ff017624 */
                                                                                             /* 0x000fd000078e00ff */
        /*0020*/                   S2R R22, SR_CTAID.X ;                                     /* 0x0000000000167919 */
                                                                                             /* 0x000e220000002500 */
        /*0030*/                   S2R R3, SR_TID.X ;                                        /* 0x0000000000037919 */
                                                                                             /* 0x000e240000002100 */
        /*0040*/                   IMAD R22, R22, 0x800, R3 ;                                /* 0x0000080016167824 */
                                                                                             /* 0x001fe400078e0203 */
        /*0050*/                   IMAD.MOV.U32 R3, RZ, RZ, 0x4 ;                            /* 0x00000004ff037424 */
                                                                                             /* 0x000fc600078e00ff */
        /*0060*/                   ISETP.GE.U32.AND P6, PT, R22.reuse, c[0x0][0x178], PT ;   /* 0x00005e0016007a0c */
                                                                                             /* 0x040fe40003fc6070 */
        /*0070*/                   SHF.R.S32.HI R21, RZ, 0x1f, R22 ;                         /* 0x0000001fff157819 */
                                                                                             /* 0x000fe20000011416 */
        /*0080*/                   IMAD.WIDE R4, R22, R3, c[0x0][0x168] ;                    /* 0x00005a0016047625 */
                                                                                             /* 0x000fc600078e0203 */
        /*0090*/                   ISETP.GE.U32.AND.EX P6, PT, R21, c[0x0][0x17c], PT, P6 ;  /* 0x00005f0015007a0c */
                                                                                             /* 0x000fe20003fc6160 */
        /*00a0*/                   IMAD.WIDE R2, R22, R3, c[0x0][0x170] ;                    /* 0x00005c0016027625 */
                                                                                             /* 0x000fd600078e0203 */
        /*00b0*/              @!P6 LDG.E.SYS R19, [R4] ;                                     /* 0x000000000413e381 */
                                                                                             /* 0x000ea800001ee900 */
        /*00c0*/              @!P6 LDG.E.SYS R20, [R2] ;                                     /* 0x000000000214e381 */
                                                                                             /* 0x000ea200001ee900 */
        /*00d0*/                   IADD3 R24, P3, R22.reuse, 0x100, RZ ;                     /* 0x0000010016187810 */
                                                                                             /* 0x040fe40007f7e0ff */
        /*00e0*/                   IADD3 R23, P2, R22, 0x200, RZ ;                           /* 0x0000020016177810 */
                                                                                             /* 0x000fe40007f5e0ff */
        /*00f0*/                   ISETP.GE.U32.AND P5, PT, R24, c[0x0][0x178], PT ;         /* 0x00005e0018007a0c */
                                                                                             /* 0x000fe20003fa6070 */
        /*0100*/                   IMAD.X R26, RZ, RZ, R21, P3 ;                             /* 0x000000ffff1a7224 */
                                                                                             /* 0x000fe200018e0615 */
        /*0110*/                   IADD3 R24, P0, R22, 0x300, RZ ;                           /* 0x0000030016187810 */
                                                                                             /* 0x000fc40007f1e0ff */
        /*0120*/                   ISETP.GE.U32.AND P4, PT, R23, c[0x0][0x178], PT ;         /* 0x00005e0017007a0c */
                                                                                             /* 0x000fe40003f86070 */
        /*0130*/                   ISETP.GE.U32.AND P3, PT, R24, c[0x0][0x178], PT ;         /* 0x00005e0018007a0c */
                                                                                             /* 0x000fe20003f66070 */
        /*0140*/                   IMAD.X R24, RZ, RZ, R21.reuse, P2 ;                       /* 0x000000ffff187224 */
                                                                                             /* 0x100fe200010e0615 */
        /*0150*/                   IADD3 R23, P1, R22, 0x400, RZ ;                           /* 0x0000040016177810 */
                                                                                             /* 0x000fe20007f3e0ff */
        /*0160*/                   IMAD.X R27, RZ, RZ, R21.reuse, P0 ;                       /* 0x000000ffff1b7224 */
                                                                                             /* 0x100fe200000e0615 */
        /*0170*/                   ISETP.GE.U32.AND.EX P5, PT, R26, c[0x0][0x17c], PT, P5 ;  /* 0x00005f001a007a0c */
                                                                                             /* 0x000fe40003fa6150 */
        /*0180*/                   IADD3 R26, P0, R22, 0x500, RZ ;                           /* 0x00000500161a7810 */
                                                                                             /* 0x000fe40007f1e0ff */
        /*0190*/                   ISETP.GE.U32.AND P2, PT, R23, c[0x0][0x178], PT ;         /* 0x00005e0017007a0c */
                                                                                             /* 0x000fe20003f46070 */
        /*01a0*/                   IMAD.X R23, RZ, RZ, R21.reuse, P1 ;                       /* 0x000000ffff177224 */
                                                                                             /* 0x100fe200008e0615 */
        /*01b0*/                   ISETP.GE.U32.AND.EX P4, PT, R24, c[0x0][0x17c], PT, P4 ;  /* 0x00005f0018007a0c */
                                                                                             /* 0x000fe20003f86140 */
        /*01c0*/                   IMAD.X R24, RZ, RZ, R21, P0 ;                             /* 0x000000ffff187224 */
                                                                                             /* 0x000fe200000e0615 */
        /*01d0*/                   ISETP.GE.U32.AND P1, PT, R26, c[0x0][0x178], PT ;         /* 0x00005e001a007a0c */
                                                                                             /* 0x000fc40003f26070 */
        /*01e0*/                   IADD3 R26, P0, R22, 0x600, RZ ;                           /* 0x00000600161a7810 */
                                                                                             /* 0x000fe40007f1e0ff */
        /*01f0*/                   ISETP.GE.U32.AND.EX P3, PT, R27, c[0x0][0x17c], PT, P3 ;  /* 0x00005f001b007a0c */
                                                                                             /* 0x000fc60003f66130 */
        /*0200*/                   IMAD.X R27, RZ, RZ, R21.reuse, P0 ;                       /* 0x000000ffff1b7224 */
                                                                                             /* 0x100fe200000e0615 */
        /*0210*/                   IADD3 R22, P0, R22, 0x700, RZ ;                           /* 0x0000070016167810 */
                                                                                             /* 0x000fe40007f1e0ff */
        /*0220*/                   P2R R25, PR, RZ, 0x40 ;                                   /* 0x00000040ff197803 */
                                                                                             /* 0x000fe40000000000 */
        /*0230*/                   ISETP.GE.U32.AND P6, PT, R22, c[0x0][0x178], PT ;         /* 0x00005e0016007a0c */
                                                                                             /* 0x000fe20003fc6070 */
        /*0240*/                   IMAD.X R21, RZ, RZ, R21, P0 ;                             /* 0x000000ffff157224 */
                                                                                             /* 0x000fca00000e0615 */
        /*0250*/                   ISETP.GE.U32.AND.EX P6, PT, R21, c[0x0][0x17c], PT, P6 ;  /* 0x00005f0015007a0c */
                                                                                             /* 0x000fc80003fc6160 */
        /*0260*/                   P2R R21, PR, RZ, 0x40 ;                                   /* 0x00000040ff157803 */
                                                                                             /* 0x000fe40000000000 */
        /*0270*/                   ISETP.NE.AND P6, PT, R25, RZ, PT ;                        /* 0x000000ff1900720c */
                                                                                             /* 0x000fe40003fc5270 */
        /*0280*/                   ISETP.GE.U32.AND P0, PT, R26, c[0x0][0x178], PT ;         /* 0x00005e001a007a0c */
                                                                                             /* 0x000fe40003f06070 */
        /*0290*/                   ISETP.GE.U32.AND.EX P2, PT, R23, c[0x0][0x17c], PT, P2 ;  /* 0x00005f0017007a0c */
                                                                                             /* 0x000fe40003f46120 */
        /*02a0*/                   ISETP.GE.U32.AND.EX P1, PT, R24, c[0x0][0x17c], PT, P1 ;  /* 0x00005f0018007a0c */
                                                                                             /* 0x000fe40003f26110 */
        /*02b0*/                   ISETP.GE.U32.AND.EX P0, PT, R27, c[0x0][0x17c], PT, P0 ;  /* 0x00005f001b007a0c */
                                                                                             /* 0x000fe20003f06100 */
        /*02c0*/              @!P5 LDG.E.SYS R17, [R4+0x400] ;                               /* 0x000400000411d381 */
                                                                                             /* 0x000ee800001ee900 */
        /*02d0*/              @!P4 LDG.E.SYS R16, [R4+0x800] ;                               /* 0x000800000410c381 */
                                                                                             /* 0x000f2800001ee900 */
        /*02e0*/              @!P3 LDG.E.SYS R14, [R4+0xc00] ;                               /* 0x000c0000040eb381 */
                                                                                             /* 0x000f6800001ee900 */
        /*02f0*/              @!P2 LDG.E.SYS R12, [R4+0x1000] ;                              /* 0x00100000040ca381 */
                                                                                             /* 0x000ee800001ee900 */
        /*0300*/              @!P1 LDG.E.SYS R10, [R4+0x1400] ;                              /* 0x00140000040a9381 */
                                                                                             /* 0x000ee800001ee900 */
        /*0310*/              @!P0 LDG.E.SYS R8, [R4+0x1800] ;                               /* 0x0018000004088381 */
                                                                                             /* 0x000ee800001ee900 */
        /*0320*/              @!P5 LDG.E.SYS R18, [R2+0x400] ;                               /* 0x000400000212d381 */
                                                                                             /* 0x000ee800001ee900 */
        /*0330*/              @!P4 LDG.E.SYS R15, [R2+0x800] ;                               /* 0x00080000020fc381 */
                                                                                             /* 0x000f2800001ee900 */
        /*0340*/              @!P3 LDG.E.SYS R13, [R2+0xc00] ;                               /* 0x000c0000020db381 */
                                                                                             /* 0x000f6800001ee900 */
        /*0350*/              @!P2 LDG.E.SYS R11, [R2+0x1000] ;                              /* 0x00100000020ba381 */
                                                                                             /* 0x000f2800001ee900 */
        /*0360*/              @!P1 LDG.E.SYS R9, [R2+0x1400] ;                               /* 0x0014000002099381 */
                                                                                             /* 0x000f2800001ee900 */
        /*0370*/              @!P0 LDG.E.SYS R7, [R2+0x1800] ;                               /* 0x0018000002078381 */
                                                                                             /* 0x000f2200001ee900 */
        /*0380*/              @!P6 FFMA R19, R19, c[0x0][0x160], R20 ;                       /* 0x000058001313ea23 */
                                                                                             /* 0x004fe20000000014 */
        /*0390*/                   P2R R20, PR, RZ, 0x40 ;                                   /* 0x00000040ff147803 */
                                                                                             /* 0x000fc40000000000 */
        /*03a0*/                   ISETP.NE.AND P6, PT, R21, RZ, PT ;                        /* 0x000000ff1500720c */
                                                                                             /* 0x000fd80003fc5270 */
        /*03b0*/              @!P6 LDG.E.SYS R0, [R4+0x1c00] ;                               /* 0x001c00000400e381 */
                                                                                             /* 0x00002200001ee900 */
        /*03c0*/                   ISETP.NE.AND P6, PT, R20, RZ, PT ;                        /* 0x000000ff1400720c */
                                                                                             /* 0x000fd80003fc5270 */
        /*03d0*/              @!P6 STG.E.SYS [R2], R19 ;                                     /* 0x000000130200e386 */
                                                                                             /* 0x0001e2000010e900 */
        /*03e0*/                   ISETP.NE.AND P6, PT, R21, RZ, PT ;                        /* 0x000000ff1500720c */
                                                                                             /* 0x000fd80003fc5270 */
        /*03f0*/              @!P6 LDG.E.SYS R6, [R2+0x1c00] ;                               /* 0x001c00000206e381 */
                                                                                             /* 0x00002200001ee900 */
        /*0400*/              @!P5 FFMA R17, R17, c[0x0][0x160], R18 ;                       /* 0x000058001111da23 */
                                                                                             /* 0x008fd00000000012 */
        /*0410*/              @!P5 STG.E.SYS [R2+0x400], R17 ;                               /* 0x000400110200d386 */
                                                                                             /* 0x0001e2000010e900 */
        /*0420*/              @!P4 FFMA R15, R16, c[0x0][0x160], R15 ;                       /* 0x00005800100fca23 */
                                                                                             /* 0x010fe4000000000f */
        /*0430*/              @!P3 FFMA R13, R14, c[0x0][0x160], R13 ;                       /* 0x000058000e0dba23 */
                                                                                             /* 0x020fe4000000000d */
        /*0440*/              @!P2 FFMA R11, R12, c[0x0][0x160], R11 ;                       /* 0x000058000c0baa23 */
                                                                                             /* 0x000fe4000000000b */
        /*0450*/              @!P1 FFMA R9, R10, c[0x0][0x160], R9 ;                         /* 0x000058000a099a23 */
                                                                                             /* 0x000fe40000000009 */
        /*0460*/              @!P0 FFMA R7, R8, c[0x0][0x160], R7 ;                          /* 0x0000580008078a23 */
                                                                                             /* 0x000fe20000000007 */
        /*0470*/              @!P4 STG.E.SYS [R2+0x800], R15 ;                               /* 0x0008000f0200c386 */
                                                                                             /* 0x0001e8000010e900 */
        /*0480*/              @!P3 STG.E.SYS [R2+0xc00], R13 ;                               /* 0x000c000d0200b386 */
                                                                                             /* 0x0001e8000010e900 */
        /*0490*/              @!P2 STG.E.SYS [R2+0x1000], R11 ;                              /* 0x0010000b0200a386 */
                                                                                             /* 0x0001e8000010e900 */
        /*04a0*/              @!P1 STG.E.SYS [R2+0x1400], R9 ;                               /* 0x0014000902009386 */
                                                                                             /* 0x0001e8000010e900 */
        /*04b0*/              @!P0 STG.E.SYS [R2+0x1800], R7 ;                               /* 0x0018000702008386 */
                                                                                             /* 0x0001e2000010e900 */
        /*04c0*/               @P6 EXIT ;                                                    /* 0x000000000000694d */
                                                                                             /* 0x000fea0003800000 */
        /*04d0*/                   FFMA R0, R0, c[0x0][0x160], R6 ;                          /* 0x0000580000007a23 */
                                                                                             /* 0x001fd00000000006 */
        /*04e0*/                   STG.E.SYS [R2+0x1c00], R0 ;                               /* 0x001c000002007386 */
                                                                                             /* 0x000fe2000010e900 */
        /*04f0*/                   EXIT ;                                                    /* 0x000000000000794d */
                                                                                             /* 0x000fea0003800000 */
        /*0500*/                   BRA 0x500;                                                /* 0xfffffff000007947 */
                                                                                             /* 0x000fc0000383ffff */
        /*0510*/                   NOP;                                                      /* 0x0000000000007918 */
                                                                                             /* 0x000fc00000000000 */
        /*0520*/                   NOP;                                                      /* 0x0000000000007918 */
                                                                                             /* 0x000fc00000000000 */
        /*0530*/                   NOP;                                                      /* 0x0000000000007918 */
                                                                                             /* 0x000fc00000000000 */
        /*0540*/                   NOP;                                                      /* 0x0000000000007918 */
                                                                                             /* 0x000fc00000000000 */
        /*0550*/                   NOP;                                                      /* 0x0000000000007918 */
                                                                                             /* 0x000fc00000000000 */
        /*0560*/                   NOP;                                                      /* 0x0000000000007918 */
                                                                                             /* 0x000fc00000000000 */
        /*0570*/                   NOP;                                                      /* 0x0000000000007918 */
                                                                                             /* 0x000fc00000000000 */
    ....................................