You may work individually or in groups of 2 people to finish this project. I expect that the programming will be your group's effort and not the effort of other persons.
Write a program that will read in MIPS assembly language and write out the corresponding "pseudo-binary" machine language instructions. ("Pseudo-binary" because each line will actually contain 32 '0' or '1' ASCII characters rather than true binary.)
Your assembler should be a two-pass assembler. In the first pass (already implemented in `pass1.c`), the program parses each instruction to see if it contains a label at the beginning of the instruction. If it does, the program adds the label and the instruction address to a table. For example, for the following assembly language code fragment, if the instruction labelled `main` were at address 0, then the first pass of the assembler would generate the following internal label table data structure.
Sample Assembler Input |
---|
main: lw $a0, 0($t0) |
begin: addi $t0, $zero, 0 # beginning |
addi $t1, $zero, 1 |
loop: slt $t2, $a0, $t1 # top of loop |
bne $t2, $zero, finish |
add $t0, $t0, $t1 |
addi $t1, $t1, 2 |
j loop # bottom of loop |
finish: add $v0, $t0, $zero |

Label Table after Pass 1 (each instruction occupies 4 bytes):

Label | Address
---|---
main | 0
begin | 4
loop | 12
finish | 32
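The first pass is already provided in `pass1.c`, so the following is only a minimal sketch of the idea, not the provided code. It assumes that every line holds exactly one 4-byte instruction and that the first line is at address 0. The names `LabelTableSketch`, `sketchAddLabel`, `sketchFindLabel`, and `sketchPass1` are hypothetical; your own Label Table functions will have different names and signatures.

```c
#include <string.h>

#define MAX_LABELS 100
#define MAX_LABEL_LEN 32

/* Hypothetical label table: a fixed-size array of (name, address) pairs. */
typedef struct
{
    char name[MAX_LABEL_LEN];
    int address;
} LabelEntrySketch;

typedef struct
{
    LabelEntrySketch entries[MAX_LABELS];
    int count;
} LabelTableSketch;

/* Adds a label; returns 1 on success, 0 if the table is full
 * or the name is too long to store. */
int sketchAddLabel(LabelTableSketch *table, const char *name, int address)
{
    if (table->count >= MAX_LABELS || strlen(name) >= MAX_LABEL_LEN)
    {
        return 0;
    }
    strcpy(table->entries[table->count].name, name);
    table->entries[table->count].address = address;
    table->count++;
    return 1;
}

/* Returns the address recorded for a label, or -1 if it is not in the table. */
int sketchFindLabel(const LabelTableSketch *table, const char *name)
{
    for (int i = 0; i < table->count; i++)
    {
        if (strcmp(table->entries[i].name, name) == 0)
        {
            return table->entries[i].address;
        }
    }
    return -1;
}

/* Records every leading "label:" it finds; line i is at address 4 * i. */
void sketchPass1(const char *lines[], int numLines, LabelTableSketch *table)
{
    for (int i = 0; i < numLines; i++)
    {
        const char *start = lines[i] + strspn(lines[i], " \t");
        size_t len = strcspn(start, " \t:#");
        if (start[len] == ':' && len > 0 && len < MAX_LABEL_LEN)
        {
            char name[MAX_LABEL_LEN];
            memcpy(name, start, len);
            name[len] = '\0';
            sketchAddLabel(table, name, 4 * i);
        }
    }
}
```

A linear search is enough here because assembly test files for this project are short; the second pass only needs to look up a label's address when it meets a branch or jump.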
The second pass, which you will implement, does the actual translation from assembly to machine language. The machine language output should be in the same format as the input for your disassembler program, i.e., each line should contain 32 characters representing the 32 bits of a single machine instruction.
Sample Assembler Input | Sample Assembler Output
---|---
main: lw $a0, 0($t0) | 10001101000001000000000000000000
begin: addi $t0, $zero, 0 # beginning | 00100000000010000000000000000000
addi $t1, $zero, 1 | 00100000000010010000000000000001
loop: slt $t2, $a0, $t1 # top of loop | 00000000100010010101000000101010
bne $t2, $zero, finish | 00010101010000000000000000000011
add $t0, $t0, $t1 | 00000001000010010100000000100000
addi $t1, $t1, 2 | 00100001001010010000000000000010
j loop # bottom of loop | 00001000000000000000000000000011
finish: add $v0, $t0, $zero | 00000001000000000001000000100000
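To make the output format concrete, here is a minimal sketch of how the fields of an R-format or I-format instruction map onto the 32 output characters. It is not the provided `printAsBinary.c` interface: the helpers `bitsToChars`, `encodeR`, and `encodeI` are hypothetical names, and they build a string in memory rather than printing field by field as the provided stubs do.

```c
#include <stdint.h>
#include <string.h>

/* Writes the low nbits bits of value into out as '0'/'1' characters,
 * most significant bit first. Does not add a terminating '\0'. */
void bitsToChars(uint32_t value, int nbits, char *out)
{
    for (int i = 0; i < nbits; i++)
    {
        int shift = nbits - 1 - i;
        out[i] = ((value >> shift) & 1u) ? '1' : '0';
    }
}

/* R-format: opcode(6) = 0 | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6) */
void encodeR(int rs, int rt, int rd, int shamt, int funct, char out[33])
{
    memset(out, '0', 6);                 /* R-format opcode is always 000000 */
    bitsToChars((uint32_t)rs, 5, out + 6);
    bitsToChars((uint32_t)rt, 5, out + 11);
    bitsToChars((uint32_t)rd, 5, out + 16);
    bitsToChars((uint32_t)shamt, 5, out + 21);
    bitsToChars((uint32_t)funct, 6, out + 26);
    out[32] = '\0';
}

/* I-format: opcode(6) | rs(5) | rt(5) | immediate(16) */
void encodeI(int opcode, int rs, int rt, int imm, char out[33])
{
    bitsToChars((uint32_t)opcode, 6, out);
    bitsToChars((uint32_t)rs, 5, out + 6);
    bitsToChars((uint32_t)rt, 5, out + 11);
    /* Keeping only the low 16 bits gives the two's complement
     * form of a negative immediate. */
    bitsToChars((uint32_t)imm, 16, out + 16);
    out[32] = '\0';
}
```

For example, `encodeR(8, 9, 8, 0, 32, buf)` corresponds to `add $t0, $t0, $t1` (registers 8 and 9, funct 32) and reproduces the sixth line of the sample output, while `encodeI(8, 0, 9, 1, buf)` corresponds to `addi $t1, $zero, 1` and reproduces the third.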
In a general way, your assembler should provide the reverse functionality of your disassembler in the Disassembler Project. It might seem that you could use the MIPS assembly language instructions produced by your disassembler as input to the assembler, and this is almost true. The exceptions are the beq, bne, j, and jal instructions, which should have labels instead of addresses in the assembly language input. (These sample instructions come from the testfile described in the Coding Tips below.)
Disassembler Output | Assembler Input with Labels instead of Addresses
---|---
bne $t2, $zero, 32 | bne $t2, $zero, FINISH
j 12 | j LOOP
The output of your assembler should be a text file containing strings representing MIPS instructions, one per line. Actual MIPS instructions would be stored in 32-bit integers; instead, your file will contain lines of 32 characters ('0' or '1'), where each line represents a machine language MIPS instruction. This is the same format as the input for your disassembler. In fact, you should be able to use the output of your assembler as the input to your disassembler and get back the MIPS assembly language program that you started with, except that the labels and comments in the original assembly language program will have disappeared and labels in instructions will have been replaced by their corresponding addresses in beq/bne/j/jal instructions.
Sample Assembler Input | Sample Assembler Output | Sample Disassembler Output
---|---|---
main: lw $a0, 0($t0) | 10001101000001000000000000000000 | lw $a0, 0($t0)
begin: addi $t0, $zero, 0 # beginning | 00100000000010000000000000000000 | addi $t0, $zero, 0
addi $t1, $zero, 1 | 00100000000010010000000000000001 | addi $t1, $zero, 1
loop: slt $t2, $a0, $t1 # top of loop | 00000000100010010101000000101010 | slt $t2, $a0, $t1
bne $t2, $zero, finish | 00010101010000000000000000000011 | bne $t2, $zero, 32
add $t0, $t0, $t1 | 00000001000010010100000000100000 | add $t0, $t0, $t1
addi $t1, $t1, 2 | 00100001001010010000000000000010 | addi $t1, $t1, 2
j loop # bottom of loop | 00001000000000000000000000000011 | j 12
finish: add $v0, $t0, $zero | 00000001000000000001000000100000 | add $v0, $t0, $zero
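The sample shows that a label is not stored as a raw address: `bne ... finish` (target address 32) is encoded with the immediate 3, and `j loop` (target address 12) is encoded with the field value 3. A minimal sketch of the standard MIPS arithmetic behind those numbers follows. The function names `branchOffsetSketch` and `jumpFieldSketch` are hypothetical; the provided `printBranchOffset` and `printJumpTarget` stubs may take different parameters.

```c
/* Value stored in the 16-bit immediate of beq/bne: the distance, counted
 * in instructions, from the instruction AFTER the branch (PC + 4) to the
 * target. Negative for backward branches. */
int branchOffsetSketch(int branchAddr, int targetAddr)
{
    return (targetAddr - (branchAddr + 4)) / 4;
}

/* Value stored in the 26-bit field of j/jal: the target address divided
 * by 4 (its word address), truncated to 26 bits. */
unsigned jumpFieldSketch(unsigned targetAddr)
{
    return (targetAddr >> 2) & 0x03FFFFFFu;
}
```

In the sample, the `bne` is at address 16 and `finish` is at 32, so `branchOffsetSketch(16, 32)` gives (32 - 20) / 4 = 3, and `jumpFieldSketch(12)` gives 3 for `j loop`. This is why the second pass needs both the label table and the address of the current instruction.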
Your program should handle all of the instructions and registers in the MIPS Instructions Table and MIPS Registers Table I have provided online. You should be able to handle all 32 registers and all three instruction formats (R, I, and J), including all forms of addressing (for example, lw addresses, beq addresses, and j addresses). You may find Figures 2.1 (p. 78), 2.6 (p. 100), and 2.14 (p. 121) helpful in addition to the table I have provided. [Optional: You may wish to extend your program to handle the additional load and store instructions listed as part of the "Core Instruction Set" on the green MIPS Reference Data card that comes with the book.]
You may assume that every line contains an instruction, i.e., you can increment the program counter for every line. (This is helpful in determining the address associated with labels.)
You should handle all error conditions gracefully. In other words, an error condition should not cause your program to terminate unless it is an error that cannot be recovered from. Otherwise, your program should print a message indicating the type and location of the error and then continue as best it can (at the very least, with the next instruction).
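As one illustration of this kind of recoverable error handling, the sketch below looks up a register name and, if the name is invalid, reports the error and its line number instead of terminating. It assumes the standard MIPS register names (check them against the MIPS Registers Table provided for the course), and the names `lookupRegisterSketch` and `checkRegisterSketch` are hypothetical, not the provided `getRegNbr` stub.

```c
#include <stdio.h>
#include <string.h>

/* Standard MIPS register names, indexed by register number (assumption:
 * the course's register table uses these same names). */
static const char *REG_NAMES_SKETCH[32] = {
    "$zero", "$at", "$v0", "$v1", "$a0", "$a1", "$a2", "$a3",
    "$t0",   "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
    "$s0",   "$s1", "$s2", "$s3", "$s4", "$s5", "$s6", "$s7",
    "$t8",   "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra"
};

/* Returns the register number (0-31), or -1 if name is not a register. */
int lookupRegisterSketch(const char *name)
{
    if (name == NULL)
    {
        return -1;
    }
    for (int i = 0; i < 32; i++)
    {
        if (strcmp(name, REG_NAMES_SKETCH[i]) == 0)
        {
            return i;
        }
    }
    return -1;
}

/* Looks up a register; on failure prints the type and location of the
 * error and returns -1 so the caller can skip to the next instruction. */
int checkRegisterSketch(const char *name, int lineNum)
{
    int regNbr = lookupRegisterSketch(name);
    if (regNbr < 0)
    {
        fprintf(stderr, "Error on line %d: \"%s\" is not a valid register.\n",
                lineNum, name == NULL ? "(missing)" : name);
    }
    return regNbr;
}
```

The caller decides what "continue as best it can" means; the simplest policy is to abandon the current instruction when any operand is invalid and move on to the next line.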
You should develop a test file of your own (or more than one) for testing pass 2. You may also want to test it using your own disassembler.
You should be able to use the Label Table functions you wrote for the Label Table Programming Project, as well as the `Makefile`, header files, print functions, and `process_arguments` function used in that program. In the `Makefile`, adjust the `all:` target to include the `assembler` program. If you create new files beyond the ones already listed in the `assembler` section of the `Makefile`, don't forget to add them to the list of dependencies and to the GCC compilation action.
I have provided additional code that you may use:
- `testPass1.c` (a test driver for `pass1.c`); can be used as a model for the assembler if you copy it and then uncomment the statements that rewind the file back to the beginning and call `pass2`. Note, though, that the documentation at the top of the file describes a test driver for `pass1`, not an assembler program, so that needs to be updated as well.
- `pass2` (in `pass2.c`): This function is complete but calls the functions below, so you will need to edit it if you decide to replace them with functions with different names or parameters.
- `getOpCode` (stub in `instructionNames.c`): Designed to return the I- or J-format opcode for the given instruction name, or 0 if the instruction is not a valid I- or J-format instruction.
- `getFunctCode` (stub in `instructionNames.c`): Designed to return the R-format funct code for the given instruction name, or -1 if the instruction is not a valid R-format instruction.
- `getRegNbr` (stub in `registerNames.c`): Designed to return the register number for the given register name, or -1 if the register name is not a valid register.
- `processR` (stub in `pass2.c`): Designed to print R-format instructions in their machine code format using functions in `printAsBinary.c`.
- `processIorJ` (stub in `pass2.c`): Designed to print I- and J-format instructions in their machine code format using functions in `printAsBinary.c`.
- Functions used by `pass1` and `pass2` for parsing an instruction into individual tokens (syntactic units):
  - `getToken`: used by `pass1`, `pass2`, and `getNTokens` to read tokens (labels, instruction names, register names, integer constants, etc.).
  - `getNTokens`: used by `pass2`.
  - A test program that exercises `getNTokens` and provides an illustration of how to use it.

  Note: Parsing loosely formatted input to separate it into meaningful syntactic units ("tokens") is a non-trivial task. The standard C string library includes the function `strtok` to help with this process, but it is not a very easy function to understand and use. The `getToken` and `getNTokens` functions provide a somewhat simpler interface to the `strtok` function for this project.
- Printing functions in `printAsBinary.c`:
  - `printInt` (stub): Designed to print the binary version of a value. (The stub version prints the decimal equivalent, which is useful while developing and debugging the program, so you may want to keep that stub behavior until everything else is finished. If you do, you can use `smallSampleTestfile.mips.decimal` to check your actual results against the expected results as you code.)
  - `printReg` (stub): Designed to find the register number for the given register and print its binary value (or decimal value during debugging) using `printInt`.
  - `printSignedIntInString`: Prints the value of the integer in a string (e.g., "23" or "-4") in binary format (or decimal format during debugging). Useful for printing the immediate value in many I-format instructions. Fully implemented.
  - `printUnsignedIntInString`: Prints the value of an unsigned integer in a string (e.g., "23") in binary format (or decimal format during debugging). Useful for processing `sll`, `srl`, and `lui` instructions. Fully implemented.
  - `printJumpTarget` (stub): Designed to print an address for J-format instructions.
  - `printBranchOffset` (stub): Designed to print a branch offset for `beq` and `bne` instructions.
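- `smallSampleTestfile.mips`: a small sample input file.
- `smallSampleTestfile.mips.out`: the expected output for `smallSampleTestfile.mips`.
- `smallSampleTestfile.mips.decimal`: the expected output for `smallSampleTestfile.mips` but in decimal form rather than binary; useful for development testing if you keep `printInt` in its original form until the last step (see more about `printInt` below).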
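For readers who have not used `strtok` before, the following minimal sketch shows the kind of work the provided `getToken` and `getNTokens` functions do on your behalf. It is not the provided code: `splitTokensSketch` is a hypothetical helper, and its choice of delimiters (whitespace, commas, and parentheses) and its treatment of `#` comments are assumptions made for illustration.

```c
#include <string.h>

/* Splits one line of assembly into tokens, in place.
 * - Everything from '#' to the end of the line is discarded as a comment.
 * - Whitespace, commas, and parentheses separate tokens.
 * Stores pointers into line in tokens[] and returns how many were found
 * (at most maxTokens). The line buffer is modified by strtok. */
int splitTokensSketch(char *line, char *tokens[], int maxTokens)
{
    static const char *DELIMS = " \t\r\n,()";
    int count = 0;

    char *comment = strchr(line, '#');
    if (comment != NULL)
    {
        *comment = '\0';
    }

    char *tok = strtok(line, DELIMS);
    while (tok != NULL && count < maxTokens)
    {
        tokens[count] = tok;
        count++;
        tok = strtok(NULL, DELIMS);
    }
    return count;
}
```

Two properties of `strtok` explain why it is awkward to use directly: it overwrites delimiters in the buffer you pass it, so the input must be a writable array rather than a string literal, and it keeps hidden state between calls, so two tokenizing loops cannot be interleaved. With this sketch, `lw $a0, 0($t0)` yields the four tokens `lw`, `$a0`, `0`, and `$t0`, and a leading label keeps its colon (`loop:`).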
If you set up a directory with the provided files and the appropriate files from the Label Table project, and if you make a copy of `testPass1.c` and call it `assembler.c`, you should be able to compile and run a starter version of the Assembler program. The initial output, though, will mostly be error messages and some starter information for `add` and `j` instructions since the most important functions are just stubs. As you implement each function you should be able to compile and run the program again, seeing incremental progress as you go (agile software development).
One approach to completing this project would be:
1. Copy `testPass1.c` to `assembler.c`.
2. Edit `assembler.c` to rewind the file and call `pass2`. (And update top-of-file comments!)
3. Compile and run:

        make
        ./assembler smallSampleTestfile.mips
        cat smallSampleTestfile.mips.decimal
        ./assembler smallSampleTestfile.mips 1

   Notice that the only instruction it recognizes is 'add', and it only partially implements that.
4. Implement `printReg` in `printAsBinary.c`. Compile and run to see what output you get.
5. Edit `processR` and complete the processing of "normal" R-format instructions like add. Compile, run, and compare to `smallSampleTestfile.mips.decimal`.
6. Edit `smallSampleTestfile.mips` (or copy that file and add your own test cases) to test additional R-format instructions and registers.
7. Implement `printBranchOffset` and `printJumpTarget` in `printAsBinary.c` when you get to the branch and jump instructions.
8. Update the `printInt` function to print an integer in binary format, when you are ready to do that. I recommend keeping the decimal output as a way of verifying the binary output until the very end. (You can keep it permanently if you convert it from `printf` to `printDebug`.)

As specified in the syllabus, your program should adhere to the Kalamazoo College CS Program Style Guide and Documentation Standards, including use of the Braces Line Up style pattern. You may also use the associated template files: the function template file and the header template file.
To ensure that all function calls are syntactically correct (match the function definitions), you should include function declarations for all of your functions in one or more header files, and include the header file(s) in all appropriate C source files (*.c files).
The `Makefile` I have provided specifies a set of compiler options that will help you catch many errors at compile time. These options generate warnings about questionable constructions that often indicate programmer confusion or actual logic errors. You may have to make adjustments to the `Makefile`, though, if the specific options or option names for your compiler are somewhat different.
When your program is fully implemented, the `smallSampleTestfile.mips` input file should produce output equivalent to `smallSampleTestfile.mips.out` (which is a copy of the original `smallSampleTestfile.input` file provided as part of the Disassembler project). Note, though, that these two files only test some cases; they do not provide a thorough test suite.
Since comparing long strings of binary is difficult, you may want to use the Unix/Linux `diff` command to compare your output against `smallSampleTestfile.mips.out`. To do this, you would run your program and save your output to a file rather than print it to the screen:

    ./assembler smallSampleTestfile.mips > myOutput
    diff myOutput smallSampleTestfile.mips.out

If you are running Windows, the two files might have different line endings (carriage return and line feed vs. just line feed), so you may want to run them through `stripCR` first. For example,

    make stripCR
    ./stripCR smallSampleTestfile.mips.out > strippedSmallSample.out
    ./stripCR myOutput > strippedMyOutput.out
    diff strippedMyOutput.out strippedSmallSample.out
Your submission should contain:
- External documentation (a `README.md` file, `man` page, or other help file) that a new user could use to know how (and why) to use your program. It should include a description of your program, along with some sample input and sample output (which need not be the same as your test file(s), since the point of the sample input/output is to help with your description), and instructions on how to run the program. The Program Style Guide has a little more information on what should be included in external documentation.

Run `make clean` in the directory before submitting; this will remove the machine-specific executable and intermediate "object code" files, since your code will have to be re-compiled on my machine anyway.
The old rubric for grading the Assembler Programming Project was based on the following general categories. The current rubric is similar, but adjusted to fewer points to match the general grading scheme this quarter.
Category | Points
---|---
Compiles and runs | 10 pts
Correctness (satisfies requirements) | 70 pts
Internal documentation and coding style | 10 pts
External Documentation | 10 pts
Test Cases | 10 pts
Total | 110 pts