Good to see this thread. I agree with fdc that we need to rewrite previous patches that don't account for load delay, since they have to be considered unstable.
Just to be a bit more clear and provide some examples on what we need to do here...
Anytime you use a load command (lw, lh, lb, lhu, lbu, etc.) that loads data from memory, the next instruction is the load delay slot for that command, and the value being loaded is not ready yet. The register's value is not ready to be used until the instruction after the load delay slot (the second instruction after the load command). Basically, you have to worry about the load delay slot when you use commands that start with l, with the exception of lui, because that particular command doesn't load anything from memory.
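To illustrate the lui exception, here's a small sketch. The address 0x80061234 is made up for the example (it isn't from any actual patch); the point is only that r2 can be used immediately after lui, while the lw that follows still has a delay slot.

```
lui r2,0x8006 # r2 = 0x80060000 (lui doesn't read memory, so no load delay)
lw r2,0x1234(r2) # [Load word] : Fine to use r2 as the base right after lui; r2 = [value at 0x80061234]
nop # (r2's delay slot from the lw : don't use r2 here)
addu r8,r8,r2 # r8 = r8 + r2 (It's fine to use r2 here)
```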
As fdc pointed out, you can usually just place a nop command between the load command and where the value is used. However, if you can rearrange code in another way to avoid using values before they're ready, that may be a better solution. As an example, here's an attempt to take the value in register r8, add two values from memory to it, and save it to another location.
Wrong version
lbu r2,0x0022(r4) # [Load byte] : r2 = [First value from memory]
addu r8,r8,r2 # r8 = r8 + r2 (r2's delay slot : Attempt to use r2 here is invalid!)
lbu r3,0x0023(r4) # [Load byte] : r3 = [Second value from memory]
addu r8,r8,r3 # r8 = r8 + r3 (r3's delay slot : Attempt to use r3 here is invalid!)
sb r8,0x0024(r4) # Save r8 (the sum) to memory
Correct version
lbu r2,0x0022(r4) # [Load byte] : r2 = [First value from memory]
lbu r3,0x0023(r4) # [Load byte] : r3 = [Second value from memory] (r2's delay slot)
addu r8,r8,r2 # r8 = r8 + r2 (r3's delay slot, but it's fine to use r2 here)
addu r8,r8,r3 # r8 = r8 + r3 (It's fine to use r3 here)
sb r8,0x0024(r4) # Save r8 (the sum) to memory
There's nothing wrong with having consecutive load statements (subroutines do this all the time when they load values off the stack), as long as you use different destination registers (why wouldn't you?). You just have to make sure you don't read a register while it is in its own load delay slot.
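For comparison, the simple nop fix would look something like this for the same example (same registers and offsets as above, just a sketch). It's valid, but it takes seven instructions instead of five, which is why rearranging is nicer when the code allows it.

```
lbu r2,0x0022(r4) # [Load byte] : r2 = [First value from memory]
nop # (r2's delay slot : do nothing while r2 loads)
addu r8,r8,r2 # r8 = r8 + r2 (It's fine to use r2 here)
lbu r3,0x0023(r4) # [Load byte] : r3 = [Second value from memory]
nop # (r3's delay slot : do nothing while r3 loads)
addu r8,r8,r3 # r8 = r8 + r3 (It's fine to use r3 here)
sb r8,0x0024(r4) # Save r8 (the sum) to memory
```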