Sounds like you've got a good handle on this stuff. I don't have the ISO 9141-2 spec, so I'm just going by what I've seen, but it seems like something has to be missing, because ISO 14230 claims to be backwards compatible with ISO 9141. In other words, since ISO 14230 sends a message right after 0x33, the ECU wouldn't be able to respond with its ISO 9141 response...
My comment about reading assumed that the ECU can talk without a request from you if it's talking to another device on the same bus. On reads it's also harder to check your baud rate (you can't assume it will match the Tx baud rate even with similar code, and you can't easily throw it on a scope). After multiple bytes, if you aren't constantly re-framing the data (which you aren't, because you don't appear to be edge-sensitive), you might get a misread even if you're only 20Hz off of nominal, depending on when you're sampling...
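To put a rough number on that drift, here's a back-of-the-envelope sketch. It assumes the sampler starts at mid-bit, runs freely at its own rate, and never re-syncs on an edge; the function name is just something I made up for illustration.

```c
/* Number of bit periods until a free-running sampler has slipped half a bit
 * relative to the transmitter, i.e. until it starts sampling the wrong bit.
 * Assumes sampling starts at mid-bit and is never re-synced on an edge.
 * Returns -1.0 if the two rates are identical (no drift at all). */
static double bits_until_half_bit_drift(double tx_baud, double sampler_baud)
{
    double diff = sampler_baud - tx_baud;
    if (diff < 0.0) {
        diff = -diff;
    }
    if (diff == 0.0) {
        return -1.0;
    }
    /* Offset after n bits is n * |1/sampler - 1/tx|; half a transmitter bit
     * is 1/(2*tx). Solving for n gives sampler / (2 * |sampler - tx|). */
    return sampler_baud / (2.0 * diff);
}
```

For a 10400 baud transmitter and a sampler running at 10420, that works out to roughly 260 bits, or about 26 bytes at 10 bits per frame (start + 8 data + stop). So a short reply may survive, but a longer exchange without re-framing won't. With re-sync on every start bit, the same 20Hz error only has 10 bits to accumulate, which is about 2% of a bit.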
In any case, it still seems very dangerous to try to implement a serial protocol in software on the CPU. You can maybe get away with sending RS232 @ 9600 baud to debug elsewhere, since the PC will do better at re-framing the data you're outputting than your code will. (It uses dedicated HW to do that.)
Once I get my HW, I'll throw it together and I can do some debug with some real data coming back and forth. I'd say the first question we should answer is whether the ECU sends anything at all after you send 0x33 properly. If it doesn't, your assumption of "it should be sending something" is flawed, and we can go from there. If it does, then it's a matter of figuring out why it isn't being seen. Both should be fairly easy to get to the bottom of.
I've got some little USB scope at work that I should be able to borrow and probe around with in my car, so I'll get some data when all this stuff comes together.
EDIT: Okay, I see now about the 0x33 -> 0x55 handshake. I found an article describing it as well. So, my only concern now is that delay(200) followed by a delay(60): you already wait 300ms in iso_read_byte(), so I don't think you need the delay(60). I'm worried that you might be missing the 0x55, simply because the whole byte goes by in about 1ms (at 10.4kbps).
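For reference, here's the arithmetic behind those timings as a small sketch. It assumes standard 10-bit framing (1 start + 8 data + 1 stop); the helper names are mine, not from your code.

```c
/* Duration of a single bit, in microseconds, at the given baud rate. */
static double bit_time_us(double baud)
{
    return 1000000.0 / baud;
}

/* Duration of one full frame, in microseconds, assuming 10 bits per byte
 * (1 start + 8 data + 1 stop). */
static double frame_time_us(double baud)
{
    return 10.0 * bit_time_us(baud);
}
```

At 5 baud a bit is 200000us, which is where the delay(200) per bit of the 0x33 address comes from. At 10400 baud a whole frame is only about 962us, so if the code is sitting in an extra delay when the 0x55 starts, the byte is over long before the delay returns.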
Again, I'd still say that the safest thing is to use the UART. It has a buffer so that you're guaranteed not to lose data (so long as the data isn't > 128 bytes, but no ISO 9141 message is that long anyway). If you need to print debug info to serial, you can emulate a serial port on a GPIO without worrying about time-sensitivity (if it breaks occasionally, oh well, debug isn't going to be a permanent part of the design).
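To illustrate why the buffer helps, here's a minimal ring buffer sketch of the kind a UART receive interrupt typically fills. This is not the actual Arduino core code; the names and the 128-byte size are assumptions taken from the figure above.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_BUF_SIZE 128u /* assumed size, matching the 128-byte figure above */

typedef struct {
    uint8_t data[RX_BUF_SIZE];
    volatile uint8_t head; /* next write index, advanced by the RX interrupt */
    volatile uint8_t tail; /* next read index, advanced by the main loop */
} rx_ring_t;

/* Would be called from a (hypothetical) RX-complete interrupt handler.
 * Returns false and drops the byte if the buffer is full. */
static bool rx_ring_push(rx_ring_t *r, uint8_t b)
{
    uint8_t next = (uint8_t)((r->head + 1u) % RX_BUF_SIZE);
    if (next == r->tail) {
        return false; /* full: one slot is kept empty to tell full from empty */
    }
    r->data[r->head] = b;
    r->head = next;
    return true;
}

/* Called from the main loop. Returns false if nothing is waiting. */
static bool rx_ring_pop(rx_ring_t *r, uint8_t *out)
{
    if (r->head == r->tail) {
        return false; /* empty */
    }
    *out = r->data[r->tail];
    r->tail = (uint8_t)((r->tail + 1u) % RX_BUF_SIZE);
    return true;
}
```

The point is that the hardware catches each byte and the interrupt stashes it, so the main loop can be off in a delay() and still find the 0x55 waiting afterwards. Note that this particular scheme keeps one slot empty, so it holds 127 bytes rather than 128.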
In any case, I'm really excited to try this out, and many thanks for the work you've done on it thus far.
I was going to try to implement this sort of thing in an 8052, but I think the ATMega is the better choice.