Creating a 12 x 8 MAC Using VHDL and the Xilinx CORE Generator

Introduction

In this lab, you will create a 12-bit x 8-bit MAC (Multiplier Accumulator) using a combination of VHDL and the Xilinx CORE Generator. You will create a multiplier unit in VHDL and an accumulator using the CORE Generator, and then connect them in the top-level design. This lab familiarizes you with the Xilinx CORE Generator and the Xilinx implementation tools by having you generate the accumulator as an IP core. The lab is completed using the Xilinx ISE 6 software. You will use a typical VHDL flow to instantiate the core as a black box in a top-level piece of VHDL code, run a functional HDL simulation, synthesize your design with XST, and take the synthesized design through the Xilinx implementation tools. You will then verify the functionality of the design on-chip using Chipscope-Pro.

Note: For this lab, you do not need to know VHDL because the top-level VHDL file is provided. There is a completed example in c:\xup\dsp_flow\labs\lab2\lab1_soln.

Objectives

After completing this lab, you will be able to:

Generate a CORE Generator macro
Simulate a piece of VHDL containing a CORE Generator macro
Synthesize the VHDL and black-box instantiations using XST
Implement a synthesized design through the Xilinx implementation tools

Design Description

Use VHDL and the CORE Generator to create a 12 x 8 MAC with the following characteristics:

Multiplier inputs of 12 bits and 8 bits, both signed data
Multiplier output width of 20 bits
Accumulator output width of 27 bits

Procedure

This lab comprises nine primary steps: you will start the Project Navigator and open the project; create a 12 x 8 multiplier unit using VHDL; generate an accumulator core using the CORE Generator; add the CORE Generator macro to the provided VHDL code; synthesize the design using XST; insert the ILA and ICON cores into the MAC design; implement the MAC design; use Chipscope-Pro Analyzer to configure the FPGA and specify match units and trigger conditions; and then perform an on-chip verification. Below each general instruction for a given procedure, you will find step-by-step directions and illustrated figures that explain the general instruction in more detail. If you feel confident about a specific instruction, feel free to skip the step-by-step directions and move on to the next general instruction in the procedure.

Note: If you are unable to complete the lab at this time, you can download the lab files for this module from the Xilinx University Program site at

Step 1: Start the Project Navigator and Open the Project

Launch the ISE Project Navigator and open the mac_cgen project.

Open the Xilinx ISE 6 software: Go to Start Menu → Programs → Xilinx ISE 6 → Project Navigator

Open the mac_cgen project: In the Project Navigator, select File → Open Project

Browse to c:\xup\dsp_flow\labs\lab1 using the pull-down arrow

Open the mac_cgen folder and select the mac_cgen.npl project file

Click OK

Step 2: Generate the VHDL Code for the Multiplier

Open the mac_cgen.vhd file and modify it to perform the 12 x 8 multiply operation. Refer to the block diagram in Figure 21-1 to understand the provided code. The comments in the code will guide you through this step. Spend 15 minutes working on your VHDL code, then move on and use the solution provided in the lab1_soln directory.

Open the mac_cgen.vhd file: In the Sources in Project window, double-click mac_cgen.vhd

Read through the VHDL file and add code to the following sections:

“Generating the Multiplier”

Select mac_cgen.vhd in the Sources in Project window

In the Processes for Current Source window, expand Synthesis

Double-click the Check Syntax option to perform a syntax check

Fix any reported errors
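When you later simulate or probe the multiplier, it helps to know which bit pattern to expect for a given pair of inputs. The sketch below is a Python reference model, not the VHDL solution; it assumes two's-complement operands and a full-precision 20-bit result, and the function names are hypothetical.

```python
def to_signed(pattern, bits):
    """Interpret an unsigned bit pattern as a two's-complement value."""
    pattern &= (1 << bits) - 1
    if pattern & (1 << (bits - 1)):
        return pattern - (1 << bits)
    return pattern


def to_pattern(value, bits):
    """Return the two's-complement bit pattern of value, truncated to bits."""
    return value & ((1 << bits) - 1)


def multiply_12x8(a_pattern, b_pattern):
    """Signed 12-bit x signed 8-bit multiply, returned as a 20-bit pattern."""
    product = to_signed(a_pattern, 12) * to_signed(b_pattern, 8)
    return to_pattern(product, 20)


# Example: -1 (0xFFF) times 1 (0x01) gives -1, i.e. all twenty bits set.
print(hex(multiply_12x8(0xFFF, 0x01)))
```

Comparing a few corner cases from this model against the waveform is a quick way to catch a multiplier that was accidentally written as unsigned.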

Step 3: Generate an Accumulator Using the CORE Generator

Generate an accumulator by invoking the CORE Generator through the project. Make sure that the input data are signed and the output width is 27 bits.

Create a new source: select Project → New Source, or right-click and choose New Source

Figure 12-1. Adding a New Source to an ISE Project

Select IP(CoreGen & Architecture Wizard), type accum in the File Name field, and click Next

Figure 12-2. Adding a CORE Generator to Your ISE Project.

The Select Core Type dialog box is displayed. Expand Math Functions and then Accumulators

Figure 12-4. Selecting Multiply Accumulators function.

Select Accumulator, click Next, and then click Finish

Fill in the following options in the Accumulator GUI and click Generate to create the accumulator.

Component Name: accum
Operation: Add
Port B Input Options: Port B Width 20; signed
Output Options: Width 27, Registered

Register Options: Clock Enable and Asynchronous Clear

Create RPM: checked
Select Display Core Footprint (bottom right of the GUI)

Figure 12-6. Accumulator Options.
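The options above describe a registered adder with a clock enable and an asynchronous clear. A minimal behavioral sketch in Python follows; it is not generated by the CORE Generator. It assumes that the output wraps around in two's complement on overflow (check the core data sheet for the actual behavior), and the class and method names are hypothetical.

```python
class Accumulator:
    """Behavioral sketch of a registered signed accumulator (Operation: Add)."""

    def __init__(self, in_bits=20, out_bits=27):
        self.in_bits = in_bits
        self.out_bits = out_bits
        self.q = 0  # registered output

    def _wrap(self, value):
        # Assumption: overflow wraps around in two's complement.
        half = 1 << (self.out_bits - 1)
        return (value + half) % (1 << self.out_bits) - half

    def clear(self):
        """Asynchronous clear: takes effect immediately, without a clock edge."""
        self.q = 0

    def clock(self, b, ce=True):
        """Model one rising clock edge; the register updates only when ce is set."""
        lo = -(1 << (self.in_bits - 1))
        hi = (1 << (self.in_bits - 1)) - 1
        if not lo <= b <= hi:
            raise ValueError("input does not fit in the Port B width")
        if ce:
            self.q = self._wrap(self.q + b)
        return self.q


acc = Accumulator()
for product in (100, -40, 7):
    acc.clock(product)
print(acc.q)
```

With the clock enable low, the output simply holds its previous value, which is the behavior to look for in simulation between valid input samples.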

You will see a pop-up window indicating that the accum core was generated successfully. Click OK to invoke the Core Viewer

The shape of the generated core should look like the following

Figure 12-9. Core Viewer of the Multiplier Accumulator.

1. Fill in the following information from the Core Viewer window:

Number of CLB wide:

Number of CLB tall:

Number of slices:

Close the Core Viewer and the Core Generator by clicking the DISMISS button

Note: For a detailed explanation of the output files, see Help → Online Documentation → CORE Generator Guide, Chapter 3, Using the CORE Generator. The section listing inputs and outputs describes the input and output files thoroughly.

Note: An accum.xco file will be added to your project in the mac_cgen hierarchy

Step 4: Adding the CORE Generator Macro into VHDL Code

Using the ISE Language Template, instantiate the multiply accumulator macro, accum, into the supplied top-level VHDL file mac_cgen.vhd

Double-click the VHDL file mac_cgen.vhd in the Sources in Project window

Open the Language Template by clicking its toolbar icon or by selecting Edit → Language Template

Expand the Coregen → VHDL folder, and select the accum template

A template similar to the one shown below appears:

Figure 12-10. Selecting the accum template.

Using the template, add the component declaration between the architecture and begin statements as indicated in the mac_cgen.vhd file

Using the template, add the instance of the accum in the mac_cgen.vhd file

Change the instance name to U2

Connect the ports of accum to appropriate signals

Check the syntax and correct any errors before proceeding to the next step

Step 5: Synthesize the Design Using XST

Synthesize the mac_cgen.vhd design using Xilinx Synthesis Technology (XST) tool with default options

Remove the my_mac.xco file from the project.

Select the mac_cgen.vhd file in the Sources in Project Window

Run synthesis: Right-click Synthesis in the Processes for Current Source window and select the Run option

If there are any errors, view the synthesis report: expand Synthesis, right-click View Synthesis Report, and choose the View option

Fix any errors and re-synthesize; otherwise, continue to the next step

Step 6: Implement the MAC Design

Implement your mac_cgen.vhd design using the Xilinx implementation tools and view the Post-Place & Route Static Timing Report. Make sure that the settings are as follows:

Device Family: Spartan3

Device: xc3s200

Speed Grade: 4

Package: FT256

Right-click Implement Design and choose the Run option, or double-click Implement Design

2. Which netlist files do the Xilinx implementation tools use for the accum black box?

View the placed design in the FPGA Editor by selecting View/Edit Routed Design (FPGA Editor) under Place & Route

Figure 12-11. Opening the FPGA Editor.

Close the FPGA Editor when you are finished

Refer to the Place and Route report and the Text-based Post-Place & Route Static Timing Report:

3. Fill in the information requested below.

Number of Slices:

Number of Block Multipliers:

Number of Block RAMs:

Number of BUFGMUXs:

Number of external IOBs:

Maximum clock frequency:

Step 7: Create New Chipscope-Pro Project

Create a new Chipscope Pro project through the Project Navigator.

Select Project  New Source in Project Navigator to open the new source dialogue, click on Chipscope Definition and Connection, and enter the name mac_cs. Click <Next> to continue.

Figure 12-12. Add New Chipscope Source

Select Chipscope Definition and Connection from the list and enter mac_cgen as the file name and click <next>.

Select mac_cgen as the source. Click <Next> and then <Finish>. A Chipscope Pro source will be added to the Sources in Project window.

Figure 12-13. Chipscope Definition and Connection

Step 8: ILA Core Parameters and Connections

Insert an ICON and ILA core into the design netlist using Chipscope-Pro Core Inserter. Connect the output of the accumulator to the trigger and input data ports of the ILA core.

Double-click the mac_cs.cdc file in the Sources in Project window to open the Core Inserter project.

Figure 12-14. Chipscope Pro Core Inserter

Projects saved in the Core Inserter hold all relevant information about source files, destination files, core parameters, and core settings.

Click <Next>. Leaving the Disable JTAG Clock BUFG Insertion option unchecked, click New ILA Unit. Notice in the left-hand window how an instance of the ILA core, U0:ILA, is added to the system.

Figure 12-15. Insert the New ILA Unit

Note: Disabling the JTAG clock BUFG insertion causes the ISE tools to route the JTAG clock using normal routing resources instead of global clock routing resources. This option should only be selected if global routing resources are scarce.

Click <Next> to set up the trigger parameters

Each ILA or ILA/ATC core can have up to 16 separate trigger ports that can be set up independently. Each trigger port is a bus of 1 to 256 individual signals or bits. Each trigger port can be connected to 1 to 16 match units. A match unit is a comparator that is connected to a trigger port and is used to detect events on that trigger port. The results of one or more match units are combined to form the overall trigger condition event that controls the capturing of data. The comparisons or match functions that a trigger port match unit can perform depend on the type of match unit. The ILA and ILA/ATC cores support six types of match units.
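The relationship between match units and the trigger condition can be illustrated with a small software model. The Python sketch below is a simplification and does not reproduce the Chipscope Pro cores: it assumes each match unit performs only an equality comparison with don't-care bits, and that a trigger sequencer advances at most one level per sample. All names are hypothetical.

```python
def match(value, pattern):
    """Compare value against a pattern of '0', '1' and 'X' (MSB first)."""
    width = len(pattern)
    for i, ch in enumerate(pattern):
        bit = (value >> (width - 1 - i)) & 1
        if ch != "X" and int(ch) != bit:
            return False
    return True


def find_trigger(samples, sequence):
    """Return the index of the sample that completes the trigger sequence.

    samples  -- list of tuples, one value per trigger port, one tuple per clock
    sequence -- list of (port_index, pattern) pairs, one per sequencer level
    """
    level = 0
    for index, ports in enumerate(samples):
        port, pattern = sequence[level]
        if match(ports[port], pattern):
            level += 1  # assumption: at most one level advances per sample
            if level == len(sequence):
                return index
    return None  # the trigger condition never occurred


# Two 1-bit trigger ports sampled over six clocks: (port 0, port 1).
samples = [(0, 1), (0, 0), (1, 0), (1, 0), (0, 1), (1, 1)]

# Two-level sequence: first port 1 equals 0, then port 0 equals 1.
print(find_trigger(samples, [(1, "0"), (0, "1")]))
```

A single-level sequence behaves like a plain trigger condition on one match unit; adding levels makes the trigger depend on the order in which events occur.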

Set the ILA trigger parameters as follows and then click <Next>

Trigger Input and Match Unit Settings
Number of trigger ports: 2
TRIG0: Trigger width 1; # Match Units 1; Counter Width disabled; Match type extended
TRIG1: Trigger width 1; # Match Units 1; Counter Width disabled; Match type extended

Trigger Condition Settings
Enable Trigger Sequencer: checked
Max Number of Sequencer Levels: 2

Storage Qualification Condition Settings
Enable Storage Qualification: unchecked

Figure 12-16. Trigger Parameters

The maximum number of data sample words that the ILA core can store in the sample buffer is called the data depth. The data depth determines the number of data width bits contributed by each block RAM unit used by the ILA unit. The maximum number of data sample words that can be captured depends on the number and size of block RAM, which varies according to device family and density.

Set the following options and click <next>

Data Depth: 512
Sample On: Rising clock edge
Data Same as Trigger Port: unchecked
Data Width: 47

Figure 12-17. Capture Parameters

The net connections tab allows you to choose the signals to connect to the ILA or ILA/ATC core. If trigger is separate from data, then clock, trigger, and data must be specified. Connections that have not been made will appear in red.

Figure 12-18. Net Connections

Click the Modify Connections tab

Figure 12-19. Net Connections

This dialogue provides an easy interface to choose nets to connect to the ILA, ILA/ATC or ATC2 cores. The hierarchical structure of the design can be traversed using the Structure/Nets pane. All the design’s nets of the selected structure hierarchy appear in the table at the lower left pane. The Clock Signals and Trigger/Data Signals tabs illustrate the net connections between the design and the ILA core.

With the Clock Signals tab under Net Selections selected, highlight the entry for clk_int and click the Make Connections button to connect the clock signal in the design to the clock port of the ILA core.

Figure 12-20. Connect the clock

Click the Trigger/Data Signals tab and make the following connections in each of the sub-tabs:

TP0:CH:0  nd_reg
TP1:CH:0  clr_IBUF

Figure 12-21. Trigger Signal Connection

Click the Data Signals tab, make the following connections, and then click <OK>:

CH:0 – CH:11  a_reg<0> - a_reg<11>
CH:12 – CH:19  b_reg<0> - b_reg<7>
CH:20 – CH:46  Q_0_OBUF - Q_26_OBUF

Figure 12-22. Data Signal Connections
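The channel assignments above explain the data width of 47: 12 bits for a_reg, 8 bits for b_reg, and 27 bits for Q. The Python sketch below shows one way to split a captured 47-bit sample back into signed values. It assumes that CH:0 is the least significant bit of the sample word and that all three fields are two's complement; the function names and the exported-word format are hypothetical.

```python
# (name, first channel, width) taken from the data signal connections
FIELDS = (("a", 0, 12), ("b", 12, 8), ("q", 20, 27))
DATA_WIDTH = sum(width for _, _, width in FIELDS)


def pack_sample(a, b, q):
    """Build a sample word from signed field values (for testing the decoder)."""
    word = 0
    for (_, lsb, width), value in zip(FIELDS, (a, b, q)):
        word |= (value & ((1 << width) - 1)) << lsb
    return word


def unpack_sample(word):
    """Split a sample word into signed a, b and q values."""
    result = {}
    for name, lsb, width in FIELDS:
        raw = (word >> lsb) & ((1 << width) - 1)
        # Sign-extend when the field's top bit is set.
        result[name] = raw - (1 << width) if raw >> (width - 1) else raw
    return result


word = pack_sample(-3, 5, -15)
print(DATA_WIDTH, unpack_sample(word))
```

Decoding samples this way makes it easy to check that each captured Q value equals the running sum of the a x b products that preceded it.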

You will notice that the Clock and Trigger ports under Net Connections are highlighted in black, indicating valid connections. Click Return to Project Navigator and save the file.

Step 9: Implement the MAC Design

Implement your mac_cgen.vhd design using the Xilinx implementation tools to generate a bitstream for downloading to the FPGA.

Right-click Generate Programming File, and choose the Rerun All option.

Figure 12-23. Generate the Programming File

Note: This runs the design through Place and Route. You will notice green check marks (or warning exclamations) next to the processes that have finished successfully. It will also run Post-Place & Route Static Timing and generate the static timing report.

Refer to the Place and Route report and the Text-based Post-Place & Route Static Timing Report:

4. Fill in the information requested below.

Number of Slices:

Number of Block Multipliers:

Number of Block RAMs:

Number of BUFGMUXs:

Number of external IOBs:

Maximum clock frequency:

Step 10: Set Up Chipscope-Pro Analyzer Options

The Chipscope-Pro Analyzer tool interfaces directly to the ICON, ILA, ILA/ATC, IBA/OPB, IBA/PLB, VIO, and ATC2 cores. You can configure your device, choose triggers, set up the console, and view the results of the capture on the fly. The data views and triggers can be manipulated in many ways, providing an easy and intuitive interface for determining the functionality of the design. Using Analyzer, you will configure the FPGA, specify the match units, and then set up the trigger conditions.

Open Chipscope Pro Analyzer by going to Start → Programs → Chipscope Pro 6.3 → Chipscope Pro Analyzer

Connect the download cable to the PC parallel port and JTAG connection of the Spartan-3 board, and then power up the board.

Click the Open Cable/Search JTAG Chain button

Figure 12-24. Establish JTAG Connection

The Spartan-3 board contains two devices in the JTAG chain: the Spartan-3 XC3S200 and a Platform Flash PROM XCF00S. Impact will detect these devices and list the device names along with their Instruction Register (IR) lengths and device ID codes.

Figure 12-25. Impact Detects Devices in JTAG Chain

Click <OK>. Right-click the Spartan-3 device, indicated as DEV: 0 MyDevice0 (XC3S200), and select Configure.

Figure 12-26. Download Program File to FPGA

Click Select New File, browse to the project directory and select the bitstream file mac_cgen.bit.

The Chipscope Pro Analyzer interface consists of four parts:

Project Tree in the upper part of the split pane on the left side of the window
Signal Browser in the lower part of the split pane on the left side of the window
Message pane at the bottom of the window
Main window area

Figure 12-27. Chipscope Pro Analyzer

Each Chipscope Pro ILA, ILA/ATC, and IBA core has its own Trigger Setup window, which provides a graphical interface for setting up triggers. The trigger mechanism inside each Chipscope Pro core can be modified at run time without having to recompile the design. There are three components to the trigger mechanism:

Match Functions: Defines the match or comparison value of each match unit
Trigger Conditions: Defines the overall trigger condition based on a binary equation or sequence of one or more match functions
Capture Settings: Defines how many samples to capture, how many capture windows, and the position of the trigger in those windows

In this design, you will set up the triggers to capture 256 samples of both inputs to the multiplier and the output of the accumulator.
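The effect of the capture settings can be pictured as choosing a window around the trigger sample. The Python sketch below is a simplified model, not the Analyzer's implementation: it assumes a single capture window and treats the trigger position as the index of the trigger sample inside that window. The function and parameter names are hypothetical.

```python
def capture_window(stream, trigger_index, depth=256, position=0):
    """Return the depth samples captured around the trigger.

    stream        -- every sample seen by the core, in clock order
    trigger_index -- index in stream where the trigger condition occurred
    depth         -- number of samples to capture
    position      -- index of the trigger sample within the window
    """
    if not 0 <= position < depth:
        raise ValueError("trigger position must lie inside the window")
    start = trigger_index - position
    if start < 0 or start + depth > len(stream):
        raise ValueError("not enough samples around the trigger")
    return stream[start:start + depth]


stream = list(range(1000))  # stand-in for the sampled data

# Position 0: the window starts at the trigger sample.
window = capture_window(stream, trigger_index=300, depth=256, position=0)
print(window[0], window[-1], len(window))
```

Moving the position toward the middle of the window trades post-trigger samples for pre-trigger history, which is useful when the interesting activity happens just before the trigger event.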