#hardware Part of the [[HTB University 2021 Qualifier]] CTF. # Description We have intercepted an encrypted message with critical information, and also managed to recover the machine that is able to decrypt it, with a copy of the source program it should run to decrypt the message. The crazy scientist that built this machine was accidentally killed during the extraction. It's a very elaborate mechanical machine with tons of pipes and valves but we managed to reverse-engineer its logic and build a simulation out of it, but now we need to convert the source of the program into something that the machine is able to understand and execute! The encrypted message is already loaded into the simulation. Files: [hw_mechanical_madness.zip](https://mega.nz/file/i8EHiQwB#NdByR1k44Hij3f4roO8MJsTI6Wnm2yV3K99JHqy0VCQ) # Analysis ## Introduction We are provided with four files: - `cpu.circ` - `example.asm` - `example.data` - `program.asm` `cpu.circ` appears to be an XML file. It has a line at the top saying it should be loaded into a program called Logism-evolution. If we download the program and load the file we see something like this: ![[mechanical_madness_1.png]] It appears to be a circuit diagram containing a control unit, RAM, registers and a terminal. Logism-evolution allows us to populate the RAM and simulate the circuit. ## File Analysis Judging by the challenge description, `example.asm`: ```assembly :start movl ax, 10 :sub1 movl bx, 1 sub bx cmp ax, ax movl bx, :sub1 jnz rst ``` Assembles to `example.data`: ```plain 10000a 100101 10100 130000 100101 e0000 40000 ``` And our goal is to assemble `program.asm` and run it on the circuit. The assembled instructions are at most 21 bits (the size of a RAM word in the simulator). We can populate the simulator RAM, start the simulation and enable clock ticks to run the program. Looking at the state panel we see `AX` count down from 10 to 0 as we would expect from the assembly. We can already begin to deduce things by comparing the assembly to its assembled form. All of the `movl` instructions start with `10`, suggesting this is the opcode. All instructions with `ax` as the first operand have `00` after the opcode and all instructions involving `bx ` have `01` after, suggesting that the instruction has the format: ```plain Bit: 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |------------| |---------------------| |---------------------| Opcode Operand 1 Operand 2 ``` And that `ax` is represented by `00` whilst `bx` is represented by `01`. We can also deduce the opcodes for a few instructions. These observations can help confirm whether we are correct in our circuit analysis. ## Circuit Analysis Let's begin by loading the example program and stepping a few clock cycles. We'll focus on this part of the circuit: ![[mechanical_madness_2_cropped.png]] We can see that the first instruction, `10000a`, is loaded into the `wr` register. `wr` is then split into `rb`, `ra` and `ir`. The splitter has the least significant bit at the top and the most significant bit at the bottom; this means `rb` gets bits 0-7, `ra` gets bits 8-15 and `ir` gets bits `16-20`. This confirms our previous diagram: ```plain Bit: 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |------------| |---------------------| |---------------------| Opcode (ir) Operand 1 (ra) Operand 2 (rb) ``` Now let's work out the translation for each instruction. Some can be deduced by looking at multiplexers: ![[mechanical_madness_3.png]] We can see that `ir` is used to select which output to take. Following the wires: `00` is associated with `add`, `01` is associated with `sub` (matches the example program observations) etc. The other instructions can be deduced by looking at AND gates. For example, take the circuit for `clr`: ![[mechanical_madness_4.png]] The AND gate equates to: ```plain ir[0] AND ir[1] AND NOT ir[2] AND NOT ir[3] AND NOT ir[4] ``` This means it will only get triggered if `ir` is `00011` (binary). So `clr` is associated with `03`. Doing this for every instruction allows us to build a table: ```plain Instructions (binary, decimal, hex, opcode): 00000 00 00: add 00001 01 01: sub 00010 02 02: mul 00011 03 03: clr 00100 04 04: rst 00101 05 05: jmp 00110 06 06: ljmp 00111 07 07: jlp 01000 08 08: jg 01001 09 09: jge 01010 10 0a: jl 01011 11 0b: jle 01100 12 0c: je 01101 13 0d: jz 01110 14 0e: jnz 01111 15 0f: mov 10000 16 10: movl 10001 17 11: call 10010 18 12: ret 10011 19 13: cmp 10100 20 14: push 10101 21 15: pop 10110 22 16: div 10111 23 17: mmiv 11000 24 18: mmov 11001 25 19: 11010 26 1a: msk 11011 27 1b: mskb ``` Now we just need to work out the register translations and we'll have enough information to assemble the target program. Let's look at the `mov` circuit: ![[mechanical_madness_5.png]] The multiplexer selects a register based on the least significant bits from `rb`, allowing us to deduce this table: ```plain Registers (binary, decimal, hex, opcode): 00000 00 00: ax 00001 01 01: bx 00010 02 02: cx 00011 03 03: dx 00100 04 04: co 00101 05 05: cs 00110 06 06: ds ``` # Solution ## Assembling We'll start by adding in our tables and loading the assembly instructions: ```python REGISTERS = [ "ax", "bx", "cx", "dx", "co", "cs", "ds" ] INSTRUCTIONS = [ "add", "sub", "mul", "clr", "rst", "jmp", "ljmp", "jlp", "jg", "jge", "jl", "jle", "je", "jz", "jnz", "mov", "movl", "call", "ret", "cmp", "push", "pop", "div", "mmiv", "mmov", None, "msk", "mskb" ] with open("program.asm") as f: program = [i.strip() for i in f.read().split("\n") if len(i.strip())] ``` We'll perform a first pass where we resolve labels (lines starting with `:`) to addresses: ```python labels = {} pc = 0 for instruction in program: if instruction[0] == ":": labels[instruction] = pc else: pc += 1 ``` Our second pass will decode instructions to `rb`, `ra` and `ir` values. We'll first write a function to decode operands. This is a little difficult as operands can be: - Registers. - Labels or constants, potentially with arithmetic. Registers can be resolved by searching the table, whilst other things can be resolved using a combination of `.replace()` for labels and `eval` for arithmetic: ```python def decode_r(a): if a is None: return 0 for i, v in labels.items(): a = a.replace(i, str(v)) try: return eval(a) # I know it's lazy. except (NameError, SyntaxError, ValueError): if a[0] == ":": raise ValueError(f"""Unknown label "{a}".""") else: try: return REGISTERS.index(a) except ValueError: raise ValueError(f"""Unknown register "{a}".""") ``` Opcodes are easier to resolve; we just look them up in the table: ```python def decode_ir(ir_): try: return INSTRUCTIONS.index(ir_) except ValueError: raise ValueError(f"""Unknown instruction "{ir_}".""") ``` Now we can run through each instruction and decode: ```python decoded = [] for instruction in program: if instruction[0] != ":": instruction = [i.strip(",") for i in instruction.split()] rb = instruction[2] if len(instruction) > 2 else None ra = instruction[1] if len(instruction) > 1 else None ir = instruction[0] rb = decode_r(rb) ra = decode_r(ra) ir = decode_ir(ir) decoded.append((rb, ra, ir)) ``` The final pass is simple, we just assemble the `rb`, `ra` and `ir` values to hex so that they can be pasted into the simulator: ```python for i in decoded: print(f"{i[2]:02x}{i[1]:02x}{i[0]:02x}", end=" ") ``` Interestingly, an error occurs if we run the program: ```plain ValueError: Unknown register "sub5". ``` Line 53 of the assembly seems to omit a colon before the label reference: ```assembly movl bx, sub5 ``` If we add this in the program runs fine and outputs the assembled instructions: ```plain 10020a 030000 100101 100300 170003 10012e 110100 180000 100137... ``` The full program can be found at the bottom of the page. ## Running To run the program we paste it into the simulator RAM and enable clock ticks. We also increase the clock frequency to the maximum so the simulation doesn't take an age. After a few minutes the flag is outputted to the terminal: ![[mechanical_madness_6_cropped.png]] # Full Script ```python # Program forgot a : before a label. REGISTERS = [ "ax", "bx", "cx", "dx", "co", "cs", "ds" ] INSTRUCTIONS = [ "add", "sub", "mul", "clr", "rst", "jmp", "ljmp", "jlp", "jg", "jge", "jl", "jle", "je", "jz", "jnz", "mov", "movl", "call", "ret", "cmp", "push", "pop", "div", "mmiv", "mmov", None, "msk", "mskb" ] with open("program.asm") as f: program = [i.strip() for i in f.read().split("\n") if len(i.strip())] # First pass: Resolve labels to addresses. labels = {} pc = 0 for instruction in program: if instruction[0] == ":": labels[instruction] = pc else: pc += 1 # Second pass: Decode instructions to rb, ra, ir. def decode_r(a): if a is None: return 0 for i, v in labels.items(): a = a.replace(i, str(v)) try: return eval(a) # I know it's lazy. except (NameError, SyntaxError, ValueError): if a[0] == ":": raise ValueError(f"""Unknown label "{a}".""") else: try: return REGISTERS.index(a) except ValueError: raise ValueError(f"""Unknown register "{a}".""") def decode_ir(ir_): try: return INSTRUCTIONS.index(ir_) except ValueError: raise ValueError(f"""Unknown instruction "{ir_}".""") decoded = [] for instruction in program: if instruction[0] != ":": instruction = [i.strip(",") for i in instruction.split()] rb = instruction[2] if len(instruction) > 2 else None ra = instruction[1] if len(instruction) > 1 else None ir = instruction[0] rb = decode_r(rb) ra = decode_r(ra) ir = decode_ir(ir) decoded.append((rb, ra, ir)) # Third pass: Assemble to hex. for i in decoded: print(f"{i[2]:02x}{i[1]:02x}{i[0]:02x}", end=" ") ```