Today’s exercise focuses on the development flow through the AWS HDK, shown in the rightmost column of the table below:
| | Vitis / SDAccel | Vitis / SDAccel with RTL | AWS HDK |
|---|---|---|---|
| Development Environment | Vitis/SDAccel environment command line | Vitis/SDAccel environment command line | Vivado shell script |
| Accelerator (kernel) | OpenCL, C, C++ | RTL (Verilog/VHDL) | RTL (Verilog/VHDL) or HLX (IP Integrator + HLS) |
| FPGA API | OpenCL API | OpenCL API | AWS SDK API |
| Software emulation | Yes | N/A | N/A |
| Hardware emulation | Yes | Yes | RTL simulation for accelerators only |
| Performance profiling | Yes | Yes | N/A |
See https://github.com/aws/aws-fpga/blob/master/Vitis and https://github.com/aws/aws-fpga/blob/master/hdk for further reference.
For the Vitis development flow, refer to “Reference Information Vitis Development Flow”.
Understanding the AWS Shell specification is important when developing custom logic. After reviewing the AWS Shell specification, we use sample code to walk through the sequence of steps from build to execution.
The host and the FPGA are connected via PCIe (Gen3 x16), and the PCIe interface to each FPGA consists of two physical functions (PFs).
Each PF consists of multiple PCIe base address registers (BARs). All PCIe BARs are mapped into the memory-mapped I/O (MMIO) space of the EC2 instance.
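To make the MMIO mapping concrete, here is a minimal Python sketch of reading and writing a 32-bit register through a memory-mapped BAR. It assumes a Linux host, where each BAR of a PCI device is exposed as a `resourceN` file under `/sys/bus/pci/devices/`. The helper names `peek32` and `poke32`, the 4 KiB window, and the register offset `0x10` are hypothetical and chosen only for illustration. The demo maps an ordinary temporary file so that it runs without an FPGA; on a real instance, register access would normally go through the AWS SDK API rather than raw `mmap`.

```python
import mmap
import struct
import tempfile

WINDOW = 4096  # bytes to map; an assumed window size, not a value from the Shell spec


def peek32(path, offset, length=WINDOW):
    """Read one little-endian 32-bit word at `offset` from a mapped file."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), length) as mm:
        return struct.unpack_from("<I", mm, offset)[0]


def poke32(path, offset, value, length=WINDOW):
    """Write one little-endian 32-bit word at `offset` into a mapped file."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), length) as mm:
        struct.pack_into("<I", mm, offset, value)


def demo():
    # Stand-in for a BAR: a zero-filled temporary file. On real hardware the
    # path would look like /sys/bus/pci/devices/<bdf>/resource0 (needs root).
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(bytes(WINDOW))
        tmp.flush()
        poke32(tmp.name, 0x10, 0xDEADBEEF)  # 0x10 is a hypothetical offset
        return peek32(tmp.name, 0x10)


if __name__ == "__main__":
    print(hex(demo()))
```

The same two helpers would work unchanged on a `resourceN` path, which is the point of the sketch: once a BAR is mapped, a register access is just a load or store at an offset inside the mapping.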
[centos@ip-172-31-76-63 ~]$ lspci -v -s 00:1d.0
00:1d.0 Memory controller: Amazon.com, Inc. Device f010
Subsystem: Device fedd:1d51
Physical Slot: 29
Flags: bus master, fast devsel, latency 0
Memory at 82000000 (32-bit, non-prefetchable) [size=32M]
Memory at 85400000 (32-bit, non-prefetchable) [size=2M]
Memory at 8560000 (64-bit, prefetchable) [size=64K]
Memory at 20000000000 (64-bit, prefetchable) [size=128G]
Capabilities: <access denied>
Kernel driver in use: xocl
Kernel modules: xocl
The four memory regions listed above correspond, in order, to BAR0, BAR1, BAR2, and BAR4.
[centos@ip-172-31-76-63 ~]$ lspci -v -s 00:1e.0
00:1e.0 Memory controller: Amazon.com, Inc. Device 1041
Subsystem: Xilinx Corporation Device 0007
Physical Slot: 30
Flags: fast devsel
Memory at 85618000 (64-bit, prefetchable) [size=16K]
Memory at 8561c000 (64-bit, prefetchable) [size=16K]
Memory at 85000000 (64-bit, prefetchable) [size=4M]
Capabilities: <access denied>
The three memory regions listed above correspond, in order, to BAR0, BAR2, and BAR4.
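The BAR numbering follows from the width of each region: a 64-bit BAR occupies two consecutive BAR slots, while a 32-bit BAR occupies one. The Python sketch below parses the `Memory at` lines of `lspci -v` output and assigns BAR indices on that basis. It assumes there are no unused BAR slots between the listed regions, which `lspci -v` alone cannot confirm; the function name `parse_bars` and the sample strings (copied from the listings above) are for illustration only.

```python
import re

# Matches lines such as: "Memory at 82000000 (32-bit, non-prefetchable) [size=32M]"
MEM_RE = re.compile(
    r"Memory at ([0-9a-fA-F]+) \((32|64)-bit, (non-)?prefetchable\) "
    r"\[size=(\d+)([KMGkmg]?)\]"
)
UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_bars(lspci_text):
    """Return one dict per memory region, with an inferred BAR index."""
    bars = []
    index = 0
    for base, width, nonpref, number, unit in MEM_RE.findall(lspci_text):
        bars.append({
            "bar": index,
            "base": int(base, 16),
            "width": int(width),
            "prefetchable": nonpref == "",
            "size": int(number) * UNITS[unit.upper()],
        })
        # Assumption: regions are packed, so a 64-bit BAR uses two slots.
        index += 2 if width == "64" else 1
    return bars


SAMPLE_1D = """\
Memory at 82000000 (32-bit, non-prefetchable) [size=32M]
Memory at 85400000 (32-bit, non-prefetchable) [size=2M]
Memory at 8560000 (64-bit, prefetchable) [size=64K]
Memory at 20000000000 (64-bit, prefetchable) [size=128G]
"""

SAMPLE_1E = """\
Memory at 85618000 (64-bit, prefetchable) [size=16K]
Memory at 8561c000 (64-bit, prefetchable) [size=16K]
Memory at 85000000 (64-bit, prefetchable) [size=4M]
"""

if __name__ == "__main__":
    for name, sample in (("00:1d.0", SAMPLE_1D), ("00:1e.0", SAMPLE_1E)):
        for bar in parse_bars(sample):
            print(name, f"BAR{bar['bar']}", hex(bar["base"]), bar["size"])
```

For the two listings above this yields BAR indices 0, 1, 2, 4 for device 00:1d.0 and 0, 2, 4 for device 00:1e.0, matching the labels.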
Both the DMA_PCIS and PCIM interfaces are 512-bit-wide AXI-4 interfaces, used for data transfers where performance is required.
The AXI-Lite interfaces are 32 bits wide and are used for register access. Three AXI-Lite interfaces are available, depending on the application.
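A quick back-of-the-envelope calculation shows why the wide interfaces are the ones used for bulk data. The Python sketch below computes the bytes moved per bus beat for a given width, the resulting peak rate at a given clock, and the raw throughput of a PCIe Gen3 link (8 GT/s per lane with 128b/130b encoding). The 250 MHz clock is an assumption made for this example and is not taken from the text above; all figures are theoretical ceilings that ignore protocol overhead.

```python
ASSUMED_CLOCK_HZ = 250_000_000  # assumption for illustration, not from the text


def bytes_per_beat(bus_width_bits):
    """Bytes transferred in one data beat on a bus of the given width."""
    return bus_width_bits // 8


def axi_peak_bytes_per_s(bus_width_bits, clock_hz=ASSUMED_CLOCK_HZ):
    """Peak rate if one beat completes every clock cycle."""
    return bytes_per_beat(bus_width_bits) * clock_hz


def pcie_gen3_bytes_per_s(lanes):
    """Raw PCIe Gen3 throughput: 8 GT/s per lane, 128b/130b line encoding."""
    return 8e9 * lanes * (128 / 130) / 8


if __name__ == "__main__":
    print("AXI-4 512-bit  :", bytes_per_beat(512), "bytes per beat")
    print("AXI-Lite 32-bit:", bytes_per_beat(32), "bytes per beat")
    print("AXI-4 peak     :", axi_peak_bytes_per_s(512) / 1e9, "GB/s (assumed clock)")
    print("PCIe Gen3 x16  :", round(pcie_gen3_bytes_per_s(16) / 1e9, 2), "GB/s")
```

Under the assumed clock, a 512-bit interface moves 64 bytes per beat and is in the same range as the PCIe Gen3 x16 link, while a 32-bit AXI-Lite interface moves only 4 bytes per beat, which is adequate for register access.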
Each FPGA card has four 16 GB DDR4 DIMMs, and the internal logic and the DDR4 memory controllers are connected by 512-bit-wide AXI-4 interfaces.
Of the four memory controllers, one is implemented in the SH (Shell) and the other three in the CL (Custom Logic). There is therefore an AXI-4 interface between the SH and the CL, on which the CL is the master.
Required for loading and clearing an AFI, checking status, debugging, and so on. The following three interfaces are available.
Required for read and write access, interrupt processing, DMA access, and so on, to the CL (Custom Logic).