This page was last updated on 09/22/2001
Leaky pipes: When Named Pipes are the wrong tool for the job
Rev 2, 07/12/2001
Using named pipes to carry serial I/O data between two separate applications seems to be a bad idea, at least as Windows NT is currently configured and as long as the average computer system does not have two or more processors. Here's why:
First, a few factoids about the world of Windows NT as it applies to the Virtual H-8. (These are all things I either knew but didn't think enough about while doing the initial design for the Virtual H-8 project thirty months ago, or things I have learned since but didn't apply to the project before the latest catastrophe slapped me in the face):
Mea culpa! But in my defense, if I'd had to re-invent the whole IPC layer two months ago while I was trying to get the tape I/O simulation grafted into the H-8, I probably wouldn't even know about the design mistakes that I'm now going to have to correct because I would have had to key test routines and benchmark programs into the simulator by hand and things wouldn't be working even this poorly right now.
At least I now can decide what will work and what won't. And I've decided that the named pipe IPC layer will be fine for low-bandwidth applications such as the Tape Cassette GUI (and, presumably, the H-17 GUI as well) need. But the overhead of task switching between applications and passing of data from reader thread to CPU thread via GUI thread is just unacceptable. So it looks like the only way to get console I/O to operate at reasonable speed will be to "bolt the H-19 onto the side of the H-8" by making the two GUIs different views managed by the same application.
My GUI re-design has yielded a single, scrolling view which shows both the H-8 and the H-19 as if they were side-by-side. There are other options that produce a more compact arrangement (which is less likely to need to be scrolled) as well. The performance problem is gone and overall the package is easier to operate as well.
Einstein and the Virtual H-8
Rev 1, 07/07/2001
After investing most of a week in an attempt to improve the performance of the Z80 emulation, I'm still unable to achieve an instructions per second (ips) rate as high as the actual Z80 running at 2 MHz can do. Right now, the best my code can manage is around 230,000 instructions per second; the real Z80 does around 300,000. So I have three options:
In the case of the Virtual H-8, if I can't get the processor to do 300,000 ips, well then by gum I'll just re-define the second! This requires only a couple of minor bits of surgery on the code. First, I need to ensure that the processor can only execute a specific number of instructions in one "tau second". For this, I borrowed a technique that Dave Shaw (Project8080) used to ensure that his emulation wasn't running too fast: I insert a counter that will prevent the processor from performing more than N instructions and I reset the counter when a specific number of "tau milliseconds" have elapsed.
In my case, I've tuned the emulation to do 590 instructions in two "tau milliseconds", using the same timer as will trigger the 2ms PAM interrupt to reset the "quota". That keeps my Z80 running at 295,000 instructions per "tau second", no matter what. This makes a benchmark program run at the same time (in "tau seconds") on my virtual H-8 and (in "real seconds") on my actual H-8. The benchmark might not be quite the same as real code, so I will probably need to re-adjust the "quota" at some point. But I know that with a "tau second" being twice as long as a real one, I can go to around 660 for this value and still be stopping the Z80 periodically (at the moment, at least), so I have a bit of "head room". (Dave has his "rev limiter" set to what would correspond to 620 in my code.)
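The quota scheme described above can be sketched roughly as follows. This is only an illustration, not the emulator's code: the names InstructionQuota, tryConsume, refill and instructionsPerTauSecond are hypothetical, and the sketch is single-threaded.

```cpp
#include <cassert>

// Hypothetical sketch of the instruction "quota"; not the Virtual H-8 source.
class InstructionQuota {
public:
    explicit InstructionQuota(int perTick)
        : m_perTick(perTick), m_remaining(perTick) {
        assert(perTick > 0);
    }

    // Called by the CPU loop before each instruction. Returns false once the
    // quota for the current tick period is used up, so the loop should idle.
    bool tryConsume() {
        if (m_remaining <= 0) return false;
        --m_remaining;
        return true;
    }

    // Called when the periodic timer tick fires: start a fresh quota.
    void refill() { m_remaining = m_perTick; }

private:
    int m_perTick;    // instructions allowed per tick period
    int m_remaining;  // instructions still allowed in this period
};

// Instructions per "tau second" for a quota refilled every tickMs "tau milliseconds".
inline int instructionsPerTauSecond(int quota, int tickMs) {
    return quota * (1000 / tickMs);
}
```

With the figures quoted above, instructionsPerTauSecond(590, 2) gives 295,000. If the tick runs on a different thread from the CPU loop, the counter would also need synchronization (for example std::atomic<int>), which this sketch leaves out.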
The other bit of surgery is to adjust the real-time process. Normally, this is a one-millisecond periodically signalled event which drives the real-time scheduler so that all simulated mechanisms that require a fixed time period (the 2ms INT10 pulse, UART baud rates and -- when I get to it -- the rotation of the H-17's diskettes) can happen at a constant rate. What I did here, was simply to change the API call that makes the interrupt happen:
nTickId = timeSetEvent(1, 0, (LPTIMECALLBACK) hTick, 0, TIME_PERIODIC | TIME_CALLBACK_EVENT_PULSE);
such that instead of happening once a millisecond (i.e., in 1:1 relation to the time of the observer), I could adjust the time rate by a #define-d constant I called TAU. So the API call now reads:
nTickId = timeSetEvent(TAU, 0, (LPTIMECALLBACK) hTick, 0, TIME_PERIODIC | TIME_CALLBACK_EVENT_PULSE);
Now, I just define TAU to be whatever time rate I want:
Why the title for this section? Well, this adjustment to time rate is like relativity. Time runs at a different rate in the reference frame of the Virtual H-8 and the reference frame of the observer, but in neither frame can you perform an experiment to tell which is the "correct" rate. With a TAU of 2, the Virtual H-8 is performing the same as a real H-8 moving at about 87% of light-speed with respect to the observer (a time-dilation factor of 2 corresponds to v = c * sqrt(3)/2).
Hmm... Maybe I should Doppler-shift the beep sound to 2,000 Hz and make the display blue!
The Spittin' Image: Diskette Media Simulation
Rev 1, 06/16/2001
The simulated H-17 is going to require media to munch on. Since nothing is real here in Virtual Space, that means I have to have some way to represent and manipulate H-17 diskettes as objects in the simulation space. One obvious answer would be to have the H-17 simulator intercept I/O to the drive and substitute the equivalent I/O to the 3-1/2" diskette of the PC. I choose not to contemplate what that would be like! Not only is the media larger, each sector is larger as well, there are more tracks and more sides. And there are fewer holes: what would I do to simulate the sector pulses? Plus, the encoding of the data and sector marks is different.
Instead, I'm planning to read and write "media image files" -- simply files on the PC that each represent an entire diskette. This has several advantages: the file I/O is handled by the C runtime library, so I don't need to worry about it; the I/O to the hard drive is much faster than I/O to a floppy; I can make backup copies of the images; I can organize the images into a directory structure; and I can consider interchanging images with other users of the simulator. The last item is why Dave Shaw and I jointly decided to implement the same scheme: we should be able to interchange the image files between our two simulators.
The image file format we came up with differs from the actual H17 diskette in two ways: first, the file consists simply of the data from each sector, expressed as an octal dump of the sector data -- we don't include the sector header information, it's easier to synthesize that when the file is being used; second, we prefix the file with a "header" that provides a media description and we allow for comments in the file, some of which may be parsed by a media-manipulation GUI.
The media file spec is as follows:
The file contains only printable characters, CR and LF.
The file begins with a header record that encodes 16 data bytes. The header can be formed with the following printf() statements:
printf("377 300 %03d %03d %03d %03d %03d ",
Octal(nProtection), // 0, 1 or 2
Octal(nVolumeNumber), // (1 through 255)
Octal(SIDES), // 1 or 2
Octal(TRACKS), // tracks per side
Octal(SECTORS)); // sectors per track
printf("000 000 000 000 000 000 000 000 000\n");
The "377 300" sequence marks this as a diskette image file. Octal() is simply a function to translate a decimal value into something which will look like the octal representation of that value when run through printf(). The Protection code is used to describe the write-protect notch on the media: 0 means that the diskette is not write-protected at the moment. 1 means that the disk is write-protected but that the write-protection can be reversed (by a media manipulation GUI operation). 2 means that the media is write-protected and the protection cannot be reversed. (This code is like a 5-1/4" diskette with no notch, which was sometimes done on distribution disks.) This particular example includes spaces between the three-digit octal byte codings and has a line ending (however the runtime and operating system interpret '\n'). The image reader must be able to cope with files that do or do not include white space.
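The spec does not show Octal() itself, so here is one way it could be written, together with a helper that builds the header record as a string. Both bodies are assumptions made for illustration; FormatHeader is a hypothetical name, and its output follows the printf() statements above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// One possible Octal(): maps 0..255 to a decimal number whose digits spell
// the octal form of the value (255 becomes 377), so "%03d" prints "377".
int Octal(int value) {
    assert(value >= 0 && value <= 255);
    return (value >> 6) * 100 + ((value >> 3) & 7) * 10 + (value & 7);
}

// Hypothetical helper: builds the 16-byte header record as text.
std::string FormatHeader(int protection, int volume,
                         int sides, int tracks, int sectors) {
    char buf[80];
    std::snprintf(buf, sizeof buf,
                  "377 300 %03d %03d %03d %03d %03d "
                  "000 000 000 000 000 000 000 000 000\n",
                  Octal(protection), Octal(volume), Octal(sides),
                  Octal(tracks), Octal(sectors));
    return std::string(buf);
}
```

A writer that does not need the Octal() indirection could get the same text by passing the raw value to printf() with "%03o".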
Comments begin with a semicolon. All text after the semicolon until the line ending (which is required) is commentary. Comments may occur anywhere after the sixteen-byte header.
The data representing the contents of the diskette must be written in sequence from side zero, track zero, sector zero onward. Data representation consists of a block of octal values, always expressed as three-digit numbers via the equivalent of:
printf("%03d", Octal(value));
The data may be divided into groups, formatted for readability and comments may be inserted. Or the data may consist entirely of three-digit octal numbers concatenated. The reader must be prepared to cope with the presence or absence of comments, spaces and line endings in sector data.
Line endings can be either carriage return (CR), line feed (LF), CR and LF or LF and CR. The image reader should be able to cope with any of these alternatives.
The image writer should format data into lines consisting of sixteen three-digit byte encodings
separated by a space and ending with a line ending:
"010 201 000 000 222 317 274 112 211 072 313 222 000 000 102 010\n"
The header should be followed by a comment (or at least a blank line) and then the first physical sector as a contiguous group of sixteen-number lines. Each subsequent sector should be a contiguous group, also preceded by a comment (or blank line).
GUI-parsed comments may be defined. They should follow the syntax:
; keyword : value
The media-manipulation GUI is free to add, remove or ignore such comments.
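The reader requirements in the spec above (optional white space, comments, any of the four line-ending styles, possibly concatenated digits) can be met by a small character-level scanner. The following is a minimal sketch under those rules; ParseImageBytes is a hypothetical name, and the sketch assumes a comment never splits a three-digit group.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical reader sketch: decodes image text into byte values.
// Returns false on a malformed or truncated three-digit group.
bool ParseImageBytes(const std::string& text, std::vector<unsigned char>& out) {
    std::size_t i = 0;
    while (i < text.size()) {
        char c = text[i];
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
            ++i;  // white space; CR, LF, CR LF and LF CR all fall out here
        } else if (c == ';') {
            // Comment: skip to the line ending (or end of text).
            while (i < text.size() && text[i] != '\r' && text[i] != '\n') ++i;
        } else {
            if (i + 3 > text.size()) return false;  // truncated group
            int value = 0;
            for (int k = 0; k < 3; ++k) {
                char d = text[i + k];
                if (d < '0' || d > '7') return false;  // not an octal digit
                value = value * 8 + (d - '0');
            }
            if (value > 0377) return false;  // does not fit in one byte
            out.push_back(static_cast<unsigned char>(value));
            i += 3;
        }
    }
    return true;
}
```

The first sixteen values returned are the header record; everything after that is sector data in side, track, sector order.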
That's the spec. We may change it slightly as we get deeper into development. (For example, I'm about to propose adding "bytes per sector" to the header; I'm not sure that the Heathkit media always used 256 bytes to the sector.) My GUI design defines a couple of comments and my dump routine imposes a couple of platform-specific and formatting issues:
I decided that we might need to distinguish between what is coded in the volume label and what
might have been written on the outside of the diskette. So I've defined a description comment.
The syntax of this is:
;Description: text
There may be as many such description comments as you need to encode the entire description,
but the text of any one description comment must be 40 characters or less and all the descriptions
must be together and in order (the first thing encountered after a description comment that the
GUI doesn't recognize as also a description comment will end the description).
Description comments must come immediately after the header record.
My dump utility inserts a comment before each sector. The format of this comment is:
;Data: Side x, Track tt, Sector ss
This currently serves as a human-readable identifier to mark the start of each sector,
but it could also be useful for seek operations if the GUI or some other application
needed to transcribe information from H-8 format into a native file for the platform and back.
The dump routine always codes data in groups of 16 three-digit numbers separated by a space. The line ending is CR, LF.
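Both the description comment and the per-sector data comment follow the "; keyword : value" syntax, so a GUI could pick them apart with one small routine. This is a sketch only; ParseKeywordComment and Trim are hypothetical names, and treating the first colon as the separator is an assumption.

```cpp
#include <cstddef>
#include <string>

// Strips leading and trailing blanks, tabs and line-ending characters.
static std::string Trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Hypothetical helper: splits a comment of the form "; keyword : value".
// Returns false if the line is not a comment or carries no keyword.
bool ParseKeywordComment(const std::string& line,
                         std::string& keyword, std::string& value) {
    std::string t = Trim(line);
    if (t.empty() || t[0] != ';') return false;
    std::size_t colon = t.find(':');          // first colon ends the keyword
    if (colon == std::string::npos) return false;
    keyword = Trim(t.substr(1, colon - 1));   // text between ';' and ':'
    value = Trim(t.substr(colon + 1));        // everything after the colon
    return !keyword.empty();
}
```

A caller would compare the keyword against the ones it knows ("Description", "Data") and treat anything else as ordinary commentary.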
Pipe Dreams -- IPC, Windows Style
Rev 2, 06/16/2001
With the advent of the cassette tape I/O capability, I decided it was time to prototype a feature that would need to be used both for the H-17 and even more extensively for the H-19 simulation. This feature is inter-application communication via named pipes. Inter-application communication (also called Inter-Process Communication or IPC) was going to be critical to Phase II of the Virtual H-8 project.
The concept of named pipes is simple enough: you open a pipe object with a name that all your applications are pre-arranged to know about. Then you send and receive data on the pipe as if it were a serial port.
But I've discovered a problem with using named pipes: The Windows API call for CreateNamedPipe is not compatible with the Windows 9x family of operating systems. (Whether this includes Windows ME or not is yet to be determined.) So the use of this technology restricts the Virtual H-8 project to Windows NT and Windows 2000 for the moment. (I'm expecting that Windows XP, when it arrives this Fall, will include support for named pipes and also that Microsoft will market XP as an upgrade path for Win ME, which should make the incompatibility less painful.)
The execution is less simple. For my project, I decided that rather than try to get one pipe to serve for both sending and receiving data, I'd dedicate two pipes and not worry about reversing direction of data flow. So IPC in the Virtual H-8 environment involves a pair of pipes, having the same base name but the suffix "RX" or "TX", depending on the data flow direction. The other interesting wrinkle with the IPC is that there's no way to predict when a message will arrive. So in order to free up the PC to do useful work, we have to use asynchronous I/O operations on the receiving pipe and spend a thread to watch for the traffic. Details! Details!
Then there's the data itself: I decided to encode all messages as a numeric command code followed by optional arguments, separated by spaces. This got into all kinds of commonly-found issues such as what to do about spaces in the argument (which I solved by delimiting arguments containing spaces with surrounding quotes... that led to what to do about quote characters in arguments... Anyway, I took the usual backslash escape and surrounding quotes route.) and how to determine when the argument list was exhausted. The gory details of message formatting and parsing are all in MSGS.H and MSGS.CPP, as are the message command code definitions and many other useful facts about the square of the hypotenuse. (Sorry, that just slipped out.)
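The actual formatting and parsing code lives in MSGS.H and MSGS.CPP and is not reproduced here. As an illustration of the scheme just described (numeric command code, space-separated arguments, surrounding quotes plus backslash escapes), a minimal sketch might look like this. EncodeMessage and DecodeMessage are hypothetical names, and quoting empty arguments is an assumption of the sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical encoder: produces "<code> arg1 "arg with spaces" ...".
std::string EncodeMessage(int code, const std::vector<std::string>& args) {
    std::string msg = std::to_string(code);
    for (const std::string& arg : args) {
        msg += ' ';
        // Quote anything that contains a space, a quote or a backslash.
        bool quote = arg.empty() ||
                     arg.find_first_of(" \"\\") != std::string::npos;
        if (!quote) { msg += arg; continue; }
        msg += '"';
        for (char c : arg) {
            if (c == '"' || c == '\\') msg += '\\';  // backslash escape
            msg += c;
        }
        msg += '"';
    }
    return msg;
}

// Hypothetical decoder: splits a message into tokens (command code first).
// Returns false on an unterminated quote or a dangling escape.
bool DecodeMessage(const std::string& msg, std::vector<std::string>& tokens) {
    std::size_t i = 0;
    while (i < msg.size()) {
        if (msg[i] == ' ') { ++i; continue; }
        std::string tok;
        if (msg[i] == '"') {
            ++i;
            bool closed = false;
            while (i < msg.size()) {
                char c = msg[i++];
                if (c == '\\') {
                    if (i >= msg.size()) return false;
                    tok += msg[i++];          // take the escaped character
                } else if (c == '"') {
                    closed = true;
                    break;
                } else {
                    tok += c;
                }
            }
            if (!closed) return false;
        } else {
            while (i < msg.size() && msg[i] != ' ') tok += msg[i++];
        }
        tokens.push_back(tok);
    }
    return true;
}
```

Because the argument list simply ends where the message ends, the decoder knows the list is exhausted when it runs out of text; Windows path names, with their backslashes and occasional spaces, survive the round trip because they are always quoted and escaped.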
Anyway, after a brief but violent struggle with some access fault bugs, the package works. For now, all we do is let a "satellite" application provide file open dialogs for getting the paths to simulated tape files, which it then sends back to the H-8 emulator so that PAM-37 can do I/O to tape on demand and have pathnames for the files. I suspect the H-17 emulation won't need much more. The H-19, though, looks like another matter entirely! Expect to see more about named pipes in a future edition when I get that part of the project working...
"The time has come," the Wallace said.
Rev 1, 05/05/2001
The biggest challenge so far has been the Microsoft documentation for Windows. I've spent several weeks trying to "tune" the emulator to balance the time needed to count in a tight loop for two milliseconds with the CPU requirements of Z80 instruction set emulation. This was because it did not appear that there was any way to get Windows NT to do timing on the order of less than 16 milliseconds. The best performance I could achieve was about 1/3 as fast as an actual H-8. And that was with the real-time interrupt happening about 1/2 as often as it should have.
Thankfully, there turns out to be a way to make Windows NT create an interrupt as
often as one thousand times per second. The API call was buried in the multimedia
library:
With that little discovery, I've succeeded in (a) getting the tick to happen at the right frequency regardless of what PC platform I use and (b) freeing up enough of the processor so that the Z80 code runs at full speed even on a Pentium II/233 system.
CPU / IO Interface Emulation Design
Rev 4, 04/28/2001
The following narrative describes how the Processor object, the CPU Thread, the Realtime Tick Thread, and the Bus object interact to do emulated I/O and interrupt cycles. Essentially, what is being emulated here is the address, data and control signal interactions between the microprocessor and the rest of the system.
Architecture:
Two threads are involved, the CPU thread and the Timer thread. These two threads communicate with a shared data structure called the Bus Object. The Bus Object is protected by a critical section, so only one thread accesses it at a time.
The Processor Object is a subroutine of the CPU thread. The CPU thread calls the processor object when an instruction execution is to be performed or when a prior bus cycle that has prevented the completion of an instruction has been satisfied. The Processor Object owns a data structure which is copied to or from the Bus Object, as necessary, by the CPU thread. This structure is not protected since the CPU thread is the thread that runs the processor object.
The processor object emulates the microprocessor's internal instruction set and registers, but uses a subroutine (called Fetch) to get the opcode and any subsequent bytes of the instruction. This subroutine is "smart" enough to know whether the requested byte should come from the emulated memory space or is from an interrupt response cycle.
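The locking arrangement in the Architecture notes can be pictured with a small sketch. It is an assumption-laden miniature, not the project's Bus Object: the BusSignals fields and the method names are hypothetical, and std::mutex stands in for the Win32 critical section so the example stays portable.

```cpp
#include <mutex>

// Hypothetical, simplified signal set; field names are illustrative only.
struct BusSignals {
    bool m1 = false;
    bool interruptsEnabled = false;
    bool int10 = false;  // request raised by the 2 ms tick
};

class BusObject {
public:
    // Timer thread side: raise the tick interrupt request.
    void RaiseInt10() {
        std::lock_guard<std::mutex> lock(m_lock);
        m_signals.int10 = true;
    }

    // CPU thread side: publish processor-driven signals to the bus.
    void Publish(bool m1, bool interruptsEnabled) {
        std::lock_guard<std::mutex> lock(m_lock);
        m_signals.m1 = m1;
        m_signals.interruptsEnabled = interruptsEnabled;
    }

    // CPU thread side: take a consistent copy for the processor object.
    BusSignals Snapshot() {
        std::lock_guard<std::mutex> lock(m_lock);
        return m_signals;
    }

private:
    std::mutex m_lock;     // stands in for the critical section
    BusSignals m_signals;  // only touched while m_lock is held
};
```

The copy returned by Snapshot() plays the role of the processor's own unprotected structure: once copied, only the CPU thread touches it, so no lock is needed while an instruction executes.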
Bus Cycles:
1. Instruction Fetch and Execution of Register or Memory Operation
This is the simplest case of a bus cycle, since it involves only the interaction between the CPU Thread and the Processor object. This cycle involves the fetch and execute of an instruction which does not affect the processor's interrupt or run state and does not perform I/O. For this cycle, the CPU_Thread calls UpdateProcessorFromBus but there is no pending interrupt or completed I/O Read to perform. The Processor::Execute function asserts its M1 signal and executes the instruction, returning a status of zero (normal). The CPU_Thread code then calls UpdateBusFromProcessor to make the bus state reflect the current processor state. (The UI is sensitive to the M1 and Interrupt Enable status that is captured by this call -- the bits determine the state of the ION and RUN indicators.)
2. Instruction Fetch and Execution of HALT, OUT or DI Operation
This is the next simplest case, in that it involves a significant change in processor state which must be then handled. In the case of HALT, the processor's Run bit is turned off. This causes subsequent calls to the processor object to return without asserting M1. The processor will remain in this condition until there is either an interrupt (assuming interrupts are still enabled) or until the processor is reset. Executing DI turns off the processor's interrupt enable bit, making it unresponsive to interrupts until an EI is executed. The OUT operation causes the processor to write information to the processor's address and data variables and to assert IOWrite. UpdateBusFromProcessor recognizes this and does the IO Write operation using the address to determine what hardware port is being written to and using the data to set new values to that port. The IOWrite function decodes the address and modifies the state of the simulated hardware devices, based on the data. (There are some other details involved in coping with the Z80's extra output instructions, but this code isn't solidified at this time.)
3. Instruction Fetch and Execution of IO Read Operation
The processor sets the port number on its address and asserts IORead and PendingRead, a flag bit needed by the CPU Thread. The processor then returns to the CPU thread with a status of zero. The CPU thread's UpdateBusFromProcessor detects that m_bNewBusCycle is asserted and that an IO Read operation was requested, so the IORead function is called. IORead decodes the address and collects data from the specified port. It then asserts NewInput on the bus and returns to the CPU thread. The CPU thread begins another cycle: it calls UpdateProcessorFromBus to copy the bus data to the processor and then executes the processor routine. Since this is the response cycle for a read operation, the bus object's NewInput flag is asserted, so the processor's BusReadData value is updated from the bus. The NewInput flag is then deasserted and control is transferred to the processor routine. The processor routine, before fetching another instruction, detects that PendingRead is set and stores the BusReadData value. It then deasserts PendingRead and IORead and goes on to fetch the next instruction. (There are some other details involved in coping with the Z80's extra input instructions, but this code isn't solidified at this time.)
There was a bug here in the Version 1.0 code: because of the order of bus cycle processing in Processor::Execute for that version, an interrupt request would be recognized before the IORead data was accepted. This affected the LOAD Tape operation, causing spurious or missing data on reads when the 2ms tick fired. For version 1.1, the read cycle data is captured before the interrupt request is handled. This seems to have fixed the problem. A similar problem with the interaction between interrupt and OUT operations caused extra copies of some bytes to be written during the DUMP Tape operation. It was also corrected by code changes to the Processor object in version 1.1.
4. Instruction Fetch and Execution of an Interrupt Enable Operation
When the processor executes an EI (enable interrupts) instruction, it asserts a PendingInterruptEnable flag. This flag is not visible to UpdateBusFromProcessor. During the next call to Processor::Execute, if PendingInterruptEnable is set, the InterruptsEnabled flag is asserted and PendingInterruptEnable is deasserted before the instruction is executed. InterruptsEnabled is the bit that UpdateBusFromProcessor sees. Thus, the interrupt enable bit on the bus is asserted just prior to the execution of the instruction two operations after the EI instruction (if N is the EI operation, N+2 is the one I'm talking about).
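The two-flag handshake for EI can be reduced to a few lines. The following miniature is a sketch under stated assumptions, not the real Processor::Execute: MiniProcessor and its Op enum are hypothetical, only the two flags named in the text are modeled, and DI is assumed to clear both flags.

```cpp
// Hypothetical miniature of the EI handling; every other instruction is a NOP.
struct MiniProcessor {
    enum Op { NOP, EI, DI };

    bool pendingInterruptEnable = false;  // private to the processor
    bool interruptsEnabled = false;       // the bit the bus gets to see

    void Execute(Op op) {
        // Promote a pending enable before the instruction itself runs.
        if (pendingInterruptEnable) {
            interruptsEnabled = true;
            pendingInterruptEnable = false;
        }
        if (op == EI) {
            pendingInterruptEnable = true;   // takes effect on the next call
        } else if (op == DI) {
            interruptsEnabled = false;       // assumed: DI clears both flags
            pendingInterruptEnable = false;
        }
    }
};
```

After Execute(EI) the visible bit is still clear; it only becomes set during the following call, which is what delays the bus-level enable until just before instruction N+2.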
5. Interrupt Request Cycle (e.g.: 2 ms Tick Interrupt)
When an interrupt requestor process (which most likely has to be a different thread -- the only example in the current code is the 2 msec timer interrupt, which is) is ready to interrupt, it asserts one of the Bus object's INTxx flags. If any of these flags is set when the CPU Thread next calls UpdateProcessorFromBus, the InterruptRequest flag will be set. When the processor responds to the asserted IntReq signal while InterruptsEnabled is set, the interrupt enable and pending interrupt enable bits are immediately deasserted, just as if a DI instruction had been executed. The processor sets PendingRead, IntAck and Run and returns to the CPU loop with a status of zero without executing an instruction.
Possible bug: in the real processor, the interrupt is recognized after the current instruction executes, rather than before. So far, this doesn't seem to be a problem except for the single-step simulation. In the SI operation, the interrupt timing is critical; I found it necessary to make a "wiring change" in order to get the SI to function properly -- there's an extra "flip-flop" that wasn't in the hardware. This seems to compensate for the early interrupt request detection.
6. Interrupt Acknowledge Operation
When the CPU Thread code calls UpdateProcessorFromBus on the next cycle after the interrupt request was seen, UpdateProcessorFromBus sees the IntAck and PendingRead flags and simulates the priority encoder logic which determines what RST N signal is correct for the combination of asserted and deasserted INTxx signals. Processor::Execute is called. This function finds IntAck set and sets an internal flag, InterruptFetch before calling the Processor's fetch and execute function. It also grabs the BusReadData value from the bus. (Note: this scheme will not operate properly unless the only possible response to an interrupt acknowledge cycle is a RST N instruction. For the H-8, this is always the case.) When the processor's fetch subroutine executes, it samples the state of InterruptFetch. If this flag is set, instead of reading from memory, the processor gets its opcode from BusReadData. Other than that, the execution proceeds just like any other opcode.
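The priority encoder step can be sketched as below. The RST n opcode layout (binary 11nnn111) is standard for the 8080 and Z80, but the rest is assumption: the bit-mask representation of the INTxx flags, the function names, and in particular the rule that the highest-numbered asserted level wins are all illustrative guesses rather than details taken from the emulator.

```cpp
#include <cassert>
#include <cstdint>

// RST n is encoded as 11nnn111 on the 8080/Z80, i.e. 0xC7 | (n << 3).
inline std::uint8_t RstOpcode(int n) {
    assert(n >= 0 && n <= 7);
    return static_cast<std::uint8_t>(0xC7 | (n << 3));
}

// requests: bit n set means the level-n request is asserted (levels 1..7).
// ASSUMPTION: the highest-numbered asserted level wins.
// Returns 0 when no request is asserted.
inline int HighestPendingLevel(unsigned requests) {
    for (int n = 7; n >= 1; --n) {
        if (requests & (1u << n)) return n;
    }
    return 0;
}
```

The opcode produced this way is what would be placed in BusReadData for the fetch subroutine to pick up when InterruptFetch is set. A caller should only build an opcode when HighestPendingLevel() returns a nonzero level.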