(App) LEDecoder: Find ASM within binary files (little endian) (v5 Update: 1/21)

Started by Glain, September 17, 2011, 10:15:00 pm

Glain

There are a lot of program files in FFT. Why is anyone's guess, but for a long time I've thought it would be really useful to be able to scan various files for program code (ASM). The problem is, the MIPS disassemblers you find online assume big endian byte ordering.

Enter LEDecoder, my newest program, that will decode files with a little endian option. You just select the input/output files and have it do its thing. This'll let us find the ASM in any FFT file. The output is very similar to what you'd see in the disassembly portion of the pSX debugger.
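To make the byte-order point concrete, here is a minimal sketch (not LEDecoder's actual code) of reading 32-bit instruction words from a file in little-endian order. The function name `iter_words_le` and the `base_address` parameter are made up for illustration; the sample bytes are the beq word quoted later in this thread.

```python
import struct

def iter_words_le(data, base_address=0):
    """Yield (address, word) pairs, reading 32-bit words in little-endian order."""
    usable = len(data) - (len(data) % 4)  # ignore a trailing partial word
    for offset in range(0, usable, 4):
        (word,) = struct.unpack_from("<I", data, offset)  # "<I" = little-endian uint32
        yield base_address + offset, word

# The four bytes 03 00 80 10 as they sit in the file.
sample = bytes.fromhex("03008010")
for address, word in iter_words_le(sample, 0x001148E8):
    print(f"{address:08x}: {word:08x}")  # prints "001148e8: 10800003"
```

A big-endian disassembler would read the same four bytes as 0x03008010, which is why its output looks like garbage on FFT's files.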

LEDecoder basically means 'little endian decoder', but of course, I couldn't call it LEDecoder without including an LED. A software LED, perhaps. ...Well, okay, maybe it's just a filled in circle. But anyway, it indicates the status of the program by its color. Most of the time it'll go orange and then green quite quickly when it's told to process. It's fine to run another decode as long as the LED isn't orange.

    Blue = Ready (Program start)
    Red = Decode failed (Usually if the input file/path is invalid)
    Green = Decode succeeded!
    Orange = Processing

So, what would be an interesting target file to run the decoder on? Say... perhaps an effect (animation) file?
...Yup, they've got ASM in them. Might explain why some work one way and others work another regardless of the graphics/palette/what have you!

EDIT: Version 2 added to fix some problems with the decoder... It didn't recognize srav, jalr, or break. This uses the MassHexASM decoder, so anytime I update that, I have to update this, basically. New version is attached.

EDIT: Version 3 added:
     * Fixed a problem where the program counter was incorrect, resulting in incorrect hex for branch instructions (thanks fdc)
     * Used a save file dialog for the output file, so now you should be able to just type in output filenames.

EDIT: Version 4 added:
     * When decoding, immediates will now properly show as signed or unsigned based on the instruction. These are the affected commands:
          Unsigned: andi, ori, xori, sltiu
          Signed: addi, addiu, slti
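A rough sketch of the signed/unsigned split listed above, assuming the immediate is the low 16 bits of the instruction word. The function name and the output format are hypothetical and are not taken from LEDecoder itself.

```python
# Instruction groups as listed in the Version 4 notes above.
SIGNED_IMMEDIATE = {"addi", "addiu", "slti"}
UNSIGNED_IMMEDIATE = {"andi", "ori", "xori", "sltiu"}

def format_immediate(mnemonic, word):
    """Render the 16-bit immediate of an I-type word as signed or unsigned hex."""
    imm = word & 0xFFFF
    if mnemonic in SIGNED_IMMEDIATE and imm & 0x8000:
        # Bit 15 set: show the two's-complement value as a negative number.
        return f"-0x{0x10000 - imm:04x}"
    return f"0x{imm:04x}"

print(format_immediate("addiu", 0x27BDFFE8))  # prints "-0x0018"
print(format_immediate("ori", 0x3442FFE8))    # prints "0xffe8"
```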

EDIT: Version 5 added: Check the MassHexASM thread's original post for details.  This now uses the same engine as MassHexASM.  (A shared DLL, not just copied functions between binaries!)

EDIT 1/11/2017: LEDecoder has now been rolled into MassHexASM.  The newest version is available in that thread.
  • Modding version: Other/Unknown

formerdeathcorps

November 02, 2011, 02:32:53 am #1 Last Edit: November 02, 2011, 03:20:38 am by formerdeathcorps
This tool would be slightly more convenient if when you type in the name of a file that doesn't yet exist as your output file, the program automatically makes it in .txt format.

Also, I'm still noticing a good number of unknown commands in the middle of an ASM routine (I'm looking at WORLD.BIN).  A lot of them involve ??????FA and 0D??????, implying there's probably some opcode for some command you haven't covered.

EDIT: 0D?????? is BREAK.  Apparently this is leftover debugging code Square never managed to delete?
EDIT2: You don't seem to be able to read jalr commands, even though they are occasionally used by Square to read address tables.  Similarly, sllv and srav commands aren't useless; they're used to read bytes where each bit = a flag.
EDIT3: ??????FA is an illegal command.  The fact it's in the middle of what seems to be ASM implies that's probably part of a scripting language.  The areas I find it in are otherwise too regular to be a data table or a representation of Kanji brushstrokes.
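For anyone wanting to check these patterns by hand, a small sketch under the assumption that the ?? patterns above are the file's bytes in order (so 0D is the low byte of the word and FA is the high byte). The helper names are made up for illustration.

```python
def primary_opcode(word):
    """Top six bits of a 32-bit MIPS instruction word."""
    return (word >> 26) & 0x3F

def is_break(word):
    """BREAK uses the SPECIAL opcode (0) with function field 0x0D in the low six bits."""
    return primary_opcode(word) == 0 and (word & 0x3F) == 0x0D

# Bytes 0D 00 07 00 in the file become the word 0x0007000D.
break_word = int.from_bytes(bytes.fromhex("0D000700"), "little")
print(is_break(break_word))  # prints "True"

# Bytes ending in FA put 0xFA in the top byte of the word.
odd_word = int.from_bytes(bytes.fromhex("000000FA"), "little")
print(hex(primary_opcode(odd_word)))  # prints "0x3e"
```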

However, this tool is still fairly great.  It saves me the need of having my pSXfin debugger generate the same information.
The destruction of the will is the rape of the mind.
The dogmas of every era are nothing but the fantasies of those in power; their dreams are our waking nightmares.

Glain

I updated the decoder to recognize jalr, srav, and break. Basically, the same changes as MassHexASM's decoder, as they use the same core function for that.
  • Modding version: Other/Unknown

formerdeathcorps

Noticed another problem that only pertains to LEDecoder (MassHex does not have this error).  It's misreading all the branch offsets by one.  For example,

001148e8: 10800003 beq r4,r0,0x001148fc

is what's printed in a section of BATTLE.BIN.  However, you'll notice that the bytes are (0300-80-10), where 10 is the byte holding the OPCODE (beq), 80 encodes the registers used (r4/r0), and 0300 is the distance of the jump (+3 instructions, counted from the next address).  That means the branch should be to 0x1148E8 + 4 + 3 * 4 = 0x1148F8, not 0x1148FC.
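The arithmetic above can be written as a short sketch. This is the standard MIPS branch-target rule (sign-extended 16-bit offset, counted in instructions from the address after the branch), not LEDecoder's actual source; `branch_target` is a made-up name.

```python
def branch_target(address, word):
    """Target of a MIPS branch located at `address` and encoded as `word`."""
    offset = word & 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000  # sign-extend the 16-bit offset
    # The offset counts instructions (4 bytes each) from the next address.
    return (address + 4 + offset * 4) & 0xFFFFFFFF

print(hex(branch_target(0x001148E8, 0x10800003)))  # prints "0x1148f8"
```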

Glain

I believe I've got a fix for this (Discussed this on IRC with fdc). Strange logic for maintaining the program counter. Uploading a new version with the fix.
  • Modding version: Other/Unknown

formerdeathcorps

I'm actually using your tool not as a means to detect ASM but to compile readable ASM instructions for further analysis.  Thus, I'd like to request the following features:

1) An update so that the decoded instructions include MIPS II/III/IV instructions, in particular the sign extension and Co-Processor 1 floating point commands, to assist in PSP hacking.
2) The ability to specify the starting offset of a file to remove the need to constantly do RAM offset conversions to find the correct spot in a decompiled file.

Glain

I believe I could add in features like this, but I have a few questions:

Sign extension is something that usually happens at the end of an instruction (e.g. lb is lbu with sign extension at the end); I don't know of any instructions that just do sign-extension and nothing else. Which ones do you mean?

From what I understand, coprocessor calls are very generic because the coprocessors were configurable, so the calls themselves are probably little more than "cop1 (argument)".  I could add a checkbox to the form to disassemble cop1 calls to PSP FPU instructions, but I don't think I can make that the default option, because then the tool would only be correct for the PSP.

For starting offsets, would it make sense to also map to another destination offset? e.g., to start disassembling the file at 0x10000 and map the addresses to 0x78000 (to correspond to a RAM offset, perhaps?)
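A sketch of the mapping being proposed here, using the 0x10000 and 0x78000 figures from the question. The function and parameter names are hypothetical; the tool had no such option at this point in the thread.

```python
def map_address(file_offset, start_offset, mapped_base):
    """Translate a file offset into the address that would be shown in the output."""
    if file_offset < start_offset:
        raise ValueError("offset lies before the start of the disassembled region")
    return mapped_base + (file_offset - start_offset)

# Start disassembling at file offset 0x10000 and label it as 0x78000.
print(hex(map_address(0x10000, 0x10000, 0x78000)))  # prints "0x78000"
print(hex(map_address(0x10010, 0x10000, 0x78000)))  # prints "0x78010"
```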
  • Modding version: Other/Unknown

formerdeathcorps

Quote
Sign extension is something that usually happens at the end of an instruction (e.g. lb is lbu with sign extension at the end); I don't know of any instructions that just do sign-extension and nothing else. Which ones do you mean?

http://personal.denison.edu/~bressoud/cs281-s10/Supplements/ISA_Vol_2.pdf
I was being unclear; I mean the commands labelled under Special3 in this file: the ones labelled EXT (Extend Bit Field), INS (Insert Bit Field), and SEB/H (Sign Extend Byte/Halfword).
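For reference, a sketch of what these four instructions compute on 32-bit register values, following the descriptions in the linked MIPS32 manual. It models the results only, not the encodings, and the range check on pos/size is an illustrative assumption about what counts as an impossible argument.

```python
MASK32 = 0xFFFFFFFF

def seb(value):
    """Sign Extend Byte: copy bit 7 of the low byte into bits 8..31."""
    value &= 0xFF
    return value | 0xFFFFFF00 if value & 0x80 else value

def seh(value):
    """Sign Extend Halfword: copy bit 15 of the low halfword into bits 16..31."""
    value &= 0xFFFF
    return value | 0xFFFF0000 if value & 0x8000 else value

def _check_field(pos, size):
    # Assumption: a bit field must fit entirely inside a 32-bit register.
    if not (0 <= pos <= 31 and 1 <= size <= 32 and pos + size <= 32):
        raise ValueError("bit field does not fit in 32 bits")

def ext(rs, pos, size):
    """Extract Bit Field: `size` bits of rs starting at bit `pos`, zero-extended."""
    _check_field(pos, size)
    return (rs >> pos) & ((1 << size) - 1)

def ins(rt, rs, pos, size):
    """Insert Bit Field: low `size` bits of rs replace bits pos..pos+size-1 of rt."""
    _check_field(pos, size)
    mask = ((1 << size) - 1) << pos
    return ((rt & ~mask) | ((rs << pos) & mask)) & MASK32

print(hex(seb(0x80)))                 # prints "0xffffff80"
print(hex(ext(0xABCD1234, 8, 8)))     # prints "0x12"
print(hex(ins(0xFFFFFFFF, 0, 8, 8)))  # prints "0xffff00ff"
```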

Quote
From what I understand, coprocessor calls are very generic because the coprocessors were configurable, so the calls themselves are probably little more than "cop1 (argument)".  I could add a checkbox to the form to disassemble cop1 calls to PSP FPU instructions, but I don't think I can make that the default option, because then the tool would only be correct for the PSP.

Point taken.  I only wanted the floating point and possibly CACHE command calls.

Quote
For starting offsets, would it make sense to also map to another destination offset? e.g., to start disassembling the file at 0x10000 and map the addresses to 0x78000 (to correspond to a RAM offset, perhaps?)

For FFT, this is unnecessary, but in general, for other games that map sections of a file (in the ROM) dynamically into the RAM, that may be useful.

http://math-atlas.sourceforge.net/devel/assembly/mips-iv.pdf
This is another site that contains the opcodes for all the instructions the previous link does not have.

Glain

It looks like coprocessor 1 was always the FPU, so I can just always assume those are FPU calls... no need for a checkbox.
  • Modding version: Other/Unknown

Glain

So I've succeeded in adding about a million instructions to MassHex.  Haven't reached LEDecoder yet; it's the next stop, but I should just be copying what I have over there.  Since I've added so many commands, with their own different rules, the encoding/decoding programs have become a bit messier than I'd like, so I'm trying to clean it up a bit at the same time I add support for more commands.

I haven't even added any cache instructions yet though.  Are you talking about the prefetch ones, like PREF, PREFX, etc?

I think we're jumping straight from MIPS I (PSX) to MIPS32 Release 2 (PSP) (The biggest chasm possible :)). Just what CPU did they put in the PSP?  Some of the documentation says it's based on the R4000 like the N64, but that wasn't even around when some of these instructions were created!  It's like they created an entirely new CPU based on the R4000 but with all the newest instructions to make sure there would be 1290319345 more than the PSX had.  Even with the FPU and sign-extension stuff, there are still a bunch of commands in the documentation we'll be missing.  I think we may just have to give another try at decoding BOOT.BIN and see if we've hit most of what's there!

There's some kind of weird ambiguity with "mul".  It was a pseudoinstruction that did mult and then mflo (PSX), but it looks like the PSP just has it as an instruction with an entirely different encoding! (:D)  That's going to make encoding/decoding it... weird.  Not sure if the PSP BOOT.BIN makes use of this new "mul", but if so, there may be some ambiguity with it.
  • Modding version: Other/Unknown

Glain

By the way, I'm now using this as a reference.  I'll probably just add all the ones listed in there.  I'm seeing some commands that even some of the ISA docs don't seem to have, and some of the commands they do have use different formats.

The PSP code doesn't seem to have the load delay problem (it might be more efficient, i.e. fewer cycles, if you avoid a subsequent load/access, but it works either way) or the mflo/mult restriction that the PSX has.  From what I'm seeing, boot.bin seems to be mostly one large code segment and one large data segment.

The PSP also may not have that "mul" instruction that I was seeing a reference to, but not totally sure on that...
  • Modding version: Other/Unknown

Glain

I think MassHex and LEDecoder will probably need a toggle between different modes: PSX, PSP, and maybe a generic MIPS mode.  FFTorgASM (mode="ASM" blocks) will just always be in PSX mode, at least for the time being.

There is some overlap between PSX and PSP commands (cop2), the PSP has some exclusive commands (bitrev, min, max), and the PSP uses the SPECIAL2 (011100) opcode in a non-standard way, and shoehorns the commands normally using that opcode into SPECIAL (000000). There are also some differences of naming based on the two different coprocessors in PSX vs PSP, e.g. lwc2 and lv.s.
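One way such a mode toggle could look, as a very rough sketch. The table layout, the names, and the assumption that lv.s shares lwc2's primary opcode (110010) are all illustrative and are not taken from MassHexASM's source; only the one pair named above is filled in.

```python
from enum import Enum

class Mode(Enum):
    PSX = "psx"
    PSP = "psp"

SPECIAL = 0b000000
SPECIAL2 = 0b011100
LWC2_OPCODE = 0b110010  # MIPS I "load word to coprocessor 2"

# Hypothetical per-mode name tables keyed by primary opcode.
NAMES = {
    Mode.PSX: {LWC2_OPCODE: "lwc2"},
    Mode.PSP: {LWC2_OPCODE: "lv.s"},  # assumption: same encoding, different name
}

def mnemonic(word, mode):
    """Look up the mode-specific name for the word's primary opcode."""
    opcode = (word >> 26) & 0x3F
    return NAMES[mode].get(opcode, "unknown")

print(mnemonic(0xC8000000, Mode.PSX))  # prints "lwc2"
print(mnemonic(0xC8000000, Mode.PSP))  # prints "lv.s"
```

Keeping one table per mode means the PSP-only entries never leak into PSX output, at the cost of duplicating the entries the two share.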

Basically, there are enough differences that just adding in all the PSP commands makes it start to feel pretty PSP-specific and could conflict with the PSX in a few places, so we'll probably need a way to toggle the mode.

At this point, I have no unknowns in the code of EBOOT.BIN as far as I'm aware (\o/).  It is now possible to get an "illegal" from the decoder too, though you have to mess with arguments to get impossible values (ins/ext). 

I may take this opportunity to also try to put in the PSX cop2 GTE commands, depending on how much I find on how they're encoded.  The old decoder just had them as unknowns, whereas at least now I've got the generic load/store/move cop2 instructions and for most GTE calls you'd see "cop2 (huge hex value)" instead of "unknown".
  • Modding version: Other/Unknown

Glain

Version 5 finally added, corresponding to v11 of MassHexASM.  Changelog is in the MassHexASM thread.  There are a LOT more commands supported.  This should be able to decode the PSP BOOT.BIN with no unknowns as well as finally getting all of SCUS, BATTLE.BIN, etc.
  • Modding version: Other/Unknown