This year I have kept the political
commentary in a separate location.
This year also I have moved thoughts
and observations to a separate file.
This file will attempt to stay
restricted to current history.

I am writing this in the year 2007 -- I really am getting my
timelines screwed up. So on this page, rather than giving a
sequential narrative of events, I shall give examples of my life in San
Jose. If I can tie them to dates, I shall.

This is sort of a long story with many facets and gives a view into
our perverted world of embedded software at Siemens in Santa
Clara. You have to have some concept of technical things to get
through this section, although I try real hard to not overwhelm you.

I was responsible for the TMDN card in general but not for the
interrupt handling. Boca had informed us that the card failed
under less than half its specified call rate for basic operation.
Basic operation meant actual analog telephone lines rather than
digital lines. We used the same card for both, with variations in
the code.

My boss called me into his office one day and told me that both the
shop senior engineer and the person responsible for the interrupt level
were gone for two weeks. He wanted me to rewrite the code in those
two weeks to make it work. So here is the challenge: rewrite a
major segment of the code, test it, and make sure that I was finished
before anyone could complain. This is my kind of challenge.

Now we get a bit technical. All good software is written in
logical layers. Layer 1 deals with the hardware interface. The
highest layer deals only with the logical application. There has
been a 7-layer model around for 20 years now and it holds up real
well. However, for this card we only used three layers.

We have the interrupt layer, or the hardware interface
layer. These functions take immediate control from the
higher-level layers when the hardware interrupts the program to be
serviced. If the hardware port is an outbound port, then the next
outbound data is placed in the hardware buffer. You keep moving
data to the hardware buffer until the buffer is full or you have no
more data. If you get an interrupt and there is no data to go
out, you disable the port as it has nothing to do.
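
The outbound rule can be shown with a small simulation. This is a minimal sketch in Python for readability, not the code from the card. The class `OutboundPort` and the names `hw_capacity`, `queue`, `on_interrupt`, and `hardware_drain` are all invented for illustration.

```python
from collections import deque


class OutboundPort:
    """Toy model of one outbound hardware port (all names are hypothetical)."""

    def __init__(self, hw_capacity):
        self.hw_capacity = hw_capacity  # bytes the hardware buffer can hold
        self.hw_buffer = []             # data already handed to the hardware
        self.pending = deque()          # data still waiting to go out
        self.enabled = False

    def queue(self, data):
        """A higher layer hands over outbound data and enables the port."""
        self.pending.extend(data)
        self.enabled = True

    def on_interrupt(self):
        """Called when the hardware asks to be serviced; returns bytes moved."""
        if not self.pending:
            # Interrupt with nothing to send: the port has nothing to do.
            self.enabled = False
            return 0
        moved = 0
        # Keep moving data until the hardware buffer is full or data runs out.
        while self.pending and len(self.hw_buffer) < self.hw_capacity:
            self.hw_buffer.append(self.pending.popleft())
            moved += 1
        return moved

    def hardware_drain(self):
        """Simulate the hardware transmitting everything in its buffer."""
        sent = bytes(self.hw_buffer)
        self.hw_buffer.clear()
        return sent
```

With a 4-byte hardware buffer and 6 bytes queued, the first interrupt moves 4 bytes, the second moves the remaining 2, and the third finds nothing and disables the port.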

On an inbound port, the reverse is true. You keep moving data
into an application buffer until there is no more data to be moved or
there are no more application buffers. The latter is a disaster
unless you can tell the inbound port to slow down and wait.
Mostly you cannot do this. Input port handling becomes your
critical path on such boards.
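
Why running out of application buffers is a disaster can be seen in a short sketch. It is Python, with hypothetical names (`InboundPort`, `buffer_count`, `buffer_size`, `overruns`), and it assumes the sender cannot be told to wait, so any byte that arrives with no free buffer is simply lost.

```python
class InboundPort:
    """Toy model of an inbound port feeding fixed-size application buffers."""

    def __init__(self, buffer_count, buffer_size):
        self.free = [bytearray() for _ in range(buffer_count)]  # empty buffers
        self.filled = []            # buffers ready for the next layer
        self.buffer_size = buffer_size
        self.overruns = 0           # bytes lost because no buffer was free

    def on_interrupt(self, hw_data):
        """Move everything the hardware holds into application buffers."""
        current = None
        for byte in hw_data:
            if current is None:
                if not self.free:
                    # No buffer left and the sender cannot wait: data is lost.
                    self.overruns += 1
                    continue
                current = self.free.pop()
            current.append(byte)
            if len(current) == self.buffer_size:
                self.filled.append(current)
                current = None
        if current is not None:
            # Hand over the final, partly filled buffer as well.
            self.filled.append(current)
```

The `overruns` counter is the number to watch: with enough buffers it stays at zero, which is why memory spent on buffers pays for itself on the input side.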

On our system we had three layers. The hardware/interrupt
layer was followed in priority by an interrupt service layer. And
finally we get to the application layer. You can see the circles
that the logic goes through. Interrupts are processed as
needed. They transfer data to and from the interrupt service
layer which in turn passes the information to the application layer.
The application layer sees whole data groups and the buffers are
referred to as messages. So the application layer receives and sends
messages to the service layer. On the outbound side, the service
layer takes the messages and places them into buffers for the interrupt
layer. Similarly, the inbound interrupt places the data in
buffers which are examined by the service layer and formed into
messages for the application layer. As a whole the process should
work quite smoothly.

Now the current paradigm has application layers working as
state-event machines. This is easy to imagine. You
have various states for the logic: onhook idle, onhook ringing,
offhook talking, etc. You also have a list of events which can
occur. You make a matrix and determine what processing will occur
for each event in each state. As you can imagine, there are many
combinations which should not occur. For example, you should not
get an onhook event when you are onhook.
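
A state-event matrix of this kind can be sketched as a lookup table. The three states come from the text; the events, the actions, and the names `State`, `Event`, `MATRIX`, and `dispatch` are assumptions made up for the example, written in Python for readability.

```python
from enum import Enum, auto


class State(Enum):
    ONHOOK_IDLE = auto()
    ONHOOK_RINGING = auto()
    OFFHOOK_TALKING = auto()


class Event(Enum):
    RING = auto()
    OFFHOOK = auto()
    ONHOOK = auto()


# (state, event) -> (next state, action).
# Combinations that are left out are the ones that should not occur.
MATRIX = {
    (State.ONHOOK_IDLE, Event.RING): (State.ONHOOK_RINGING, "start ringing"),
    (State.ONHOOK_RINGING, Event.OFFHOOK): (State.OFFHOOK_TALKING, "answer call"),
    (State.OFFHOOK_TALKING, Event.ONHOOK): (State.ONHOOK_IDLE, "release call"),
}


def dispatch(state, event):
    """Look up the matrix cell; stay in place on an unexpected combination."""
    try:
        return MATRIX[(state, event)]
    except KeyError:
        return state, "ignore unexpected event"
```

An onhook event while already onhook falls through to the default cell, so `dispatch(State.ONHOOK_IDLE, Event.ONHOOK)` leaves the state unchanged.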

There are more rules. You do not want the application layer
to get locked up sending a message to the service layer. After
all, this is the purpose of the service layer. It is OK for the
application layer to be waiting for an incoming event. It is not
OK for the application layer to be waiting for a buffer to become
available so that it can send an event back to the hardware. You
optimize memory into buffers such that this waiting will not occur.

Oh. The purpose of the TMDN board. It communicates in
one direction with the PBX. On the other side it communicates
with the telephone line. In the case of analog lines, all line
side activity is composed of onhook and offhook actions, DTMF tones,
and voice connections.

On the PBX side are messages concerning the logical
operation of the lines. The PBX does not care if the line bounced
onhook 9 times or the DTMF reader heard a 9 tone. It just wants
to know that the user dialed a 9.

But you can see that the timing on the card is critical. The
lines must be sampled very frequently such that onhook durations can be
measured. Onhook for a long time terminates the call.
Onhook for a shorter time 'flashes' the call. Onhook for an even
shorter time may be a dialed digit pulse. Onhook for an instant is
a glitch and is ignored.

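The duration rule can be sketched as sampling plus classification. The text gives no actual timings, so the thresholds below are placeholder values chosen only for illustration, and the names `measure_onhook` and `classify_onhook` are hypothetical. Python is used for readability.

```python
# Thresholds in milliseconds. These are placeholder values for illustration;
# the real limits come from the line specification, which is not quoted here.
GLITCH_MAX_MS = 10
PULSE_MAX_MS = 100
FLASH_MAX_MS = 1000


def measure_onhook(samples, sample_period_ms):
    """Return the duration of each completed onhook run.

    samples is a sequence of booleans, True meaning the line was onhook
    at that sample. A run still in progress at the end is not reported.
    """
    durations = []
    run = 0
    for onhook in samples:
        if onhook:
            run += 1
        elif run:
            durations.append(run * sample_period_ms)
            run = 0
    return durations


def classify_onhook(duration_ms):
    """Map an onhook duration to the event it represents."""
    if duration_ms < GLITCH_MAX_MS:
        return "glitch"
    if duration_ms < PULSE_MAX_MS:
        return "dial pulse"
    if duration_ms < FLASH_MAX_MS:
        return "flash"
    return "hangup"
```

The sketch also shows why sampling must be frequent: the resolution of every measured duration is one sample period, so a slow or delayed sampler cannot tell a glitch from a dial pulse.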
The problem of the day is that the interrupt processing and the
service processing for messages to and from the PBX are taking so much
time that the line side of the board cannot measure its signal
durations properly.

Here it gets a bit spicy. Both sides of the board use
identical hardware chips. The only difference in the
data handling is the port number to be used. That is, only one
copy of the interrupt program needs to be in memory. Just pass
the lowest level routines a port number along with the data buffer and
everything will work just fine.

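The idea of one routine serving both sides looks like this in outline. It is a minimal sketch in Python, not the card's code; `Chip`, `PBX_PORT`, `LINE_PORT`, `CHIPS`, and `send_buffer` are names invented for the example.

```python
class Chip:
    """Stand-in for one of the identical hardware chips (hypothetical)."""

    def __init__(self):
        self.transmitted = []

    def write(self, byte):
        self.transmitted.append(byte)


# Port numbers are the only thing that distinguishes the two sides.
PBX_PORT = 0
LINE_PORT = 1
CHIPS = {PBX_PORT: Chip(), LINE_PORT: Chip()}


def send_buffer(port, data):
    """Single low-level routine: the port number selects the chip."""
    chip = CHIPS[port]
    for byte in data:
        chip.write(byte)
    return len(data)
```

Calling `send_buffer(PBX_PORT, ...)` and `send_buffer(LINE_PORT, ...)` runs the same code path, so there is only one copy to hold in memory and one copy to debug.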
Similarly, the service function could be the same code for both
sides. This would greatly reduce the size of the software and
simplify the board operation: you only have to debug one set of code.

But the way Siemens operates, there is a Senior Engineer who has
his own ideas on how things should work. They do not match my
ideas. This is why I am assigned this work while he is out of
town. The current board interrupt processing is the worst code
imaginable. They have designed the service layer to be state-event
driven and have tried to do the same to the interrupt layer.
The concept is wrong. When you tell a junior engineer to use a
particular paradigm and the paradigm does not match the problem to be
solved, you end up with a mess. Our Senior Engineer was proud of
the mess -- and very possessive of changes to it.

As you know, I have never had such loyalties. So I tore into
it. I reduced the size by 20%. I showed what I had done to
my manager. He said I could do better. The one positive
side of a technical micro-manager is that he can inspect what you are
doing and know how to improve it. I followed instructions and
ended up with code less than half the size of the original, both in
lines of code and in actual size.

This permitted us to expand the number of buffers available and
greatly improve processing time. I replaced the entire interrupt
code with portable code independent of the port numbers so that the code
could be used on both sides of the board. I had the service layer
wrapping messages into a big buffer rather than starting a new buffer
for each message. In doing this all state-event concepts were
removed from the interrupt routines.

Similarly, I removed all state-event logic from the service
routines. The outbound service routine just moved message data
into the interrupt routine buffer on a round robin basis. If it
caught up to the interrupt routine, it just quit until the interrupt
routine informed it that it would accept more data. The
application had passed on the message to the service routine and gone
its merry way, probably waiting for something else to do. When the
service routine had nothing more to send and the interrupt routine said
it was ready for more, the service routine ignored it until the
interrupt routine also indicated it had nothing to do. Then the
service routine shut down the port.

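The outbound scheme described here amounts to a ring buffer shared between the service routine and the interrupt routine. The following is a rough Python sketch under that reading, not the original code. The class `OutboundService` and its members (`ring`, `head`, `tail`, `fill`, `interrupt`) are hypothetical names, and a real card would have to guard the shared counters against concurrent access, which this single-threaded simulation ignores.

```python
from collections import deque


class OutboundService:
    """Service routine and interrupt routine sharing one ring buffer."""

    def __init__(self, ring_size):
        self.ring = [0] * ring_size
        self.size = ring_size
        self.head = 0              # next slot the service routine writes
        self.tail = 0              # next slot the interrupt routine reads
        self.count = 0             # bytes currently in the ring
        self.waiting = deque()     # message data not yet in the ring
        self.port_enabled = False

    def send(self, message):
        """Application hands off a message and goes on its way."""
        self.waiting.extend(message)
        self.port_enabled = True
        self.fill()

    def fill(self):
        """Copy data into the ring until it catches up with the reader."""
        while self.waiting and self.count < self.size:
            self.ring[self.head] = self.waiting.popleft()
            self.head = (self.head + 1) % self.size
            self.count += 1

    def interrupt(self, max_bytes):
        """Interrupt routine takes data, then says it is ready for more."""
        out = []
        while self.count and len(out) < max_bytes:
            out.append(self.ring[self.tail])
            self.tail = (self.tail + 1) % self.size
            self.count -= 1
        self.fill()
        if not out and self.count == 0 and not self.waiting:
            # Both sides have nothing to do: shut down the port.
            self.port_enabled = False
        return bytes(out)
```

There is no state table anywhere in it. The only decisions are "is there room" and "is there data", which is the whole point of dropping the state-event design at this level.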
On the inbound side, the interrupt kept pumping data into the
buffer. If there were no data, nobody did anything. If
there were data, the interrupt routine informed the service routine
that it had work to do. The service routine then moved the buffer
data to an event buffer. When it detected the end of an event,
it sent the message buffer to the application layer which hopefully was
waiting for work.

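The inbound path can be sketched the same way. How the end of an event was detected is not described, so this Python example assumes a one-byte terminator, `END_OF_EVENT`, purely for illustration. `InboundService`, `raw`, `event`, and `app_queue` are likewise hypothetical names.

```python
END_OF_EVENT = 0x00  # assumed delimiter; the real framing is not described


class InboundService:
    """Interrupt routine fills a raw buffer; service routine builds messages."""

    def __init__(self):
        self.raw = bytearray()    # filled by the interrupt routine
        self.event = bytearray()  # event currently being assembled
        self.app_queue = []       # complete messages for the application

    def interrupt(self, data):
        """Store incoming data; return True if the service routine has work."""
        self.raw.extend(data)
        return len(data) > 0

    def service(self):
        """Move raw data to the event buffer, emitting each finished event."""
        for byte in self.raw:
            if byte == END_OF_EVENT:
                self.app_queue.append(bytes(self.event))
                self.event.clear()
            else:
                self.event.append(byte)
        self.raw.clear()
```

An event split across two interrupts is carried over in `event` until its terminator arrives, so the application layer only ever sees whole messages.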
I think you can see what happened here. The code was
actually as simple as I have just explained it. And it
worked. This made my second line manager very angry.
Why? In Siemens we are expected to divide our time equally
between coding and testing. I had given myself 4 hours on the
last day of the two weeks to test. My manager did not believe it
could be tested in that time frame. It also happened that the
engineer who was responsible for this section of code returned a day
early.

I took the engineer through what I had done. He was flabbergasted
but professionally silent. We then went to the lab for the 4
hours of testing. In an hour we were finished. It worked
perfectly except for one register in a short section of assembly
code. My friend the other engineer found this incorrect register
and corrected it. How did I make such a simple mistake? I
needed my friend the other engineer to accept ownership of the new code
and found that if I let him find an error in my logic, he would accept
this ownership. It worked exactly that way.

Rather than do extensive testing at this point, I called my
friend, Tom, manager of system test in Boca, to try out the new
logic. My challenge to him: overload the card and make it fail
under any load condition. He had one of his engineers set up a
test PBX and generate calls to the board. At 50% above our
specified load limit, the PBX failed but the card worked fine.
We had done it. Load would never be a problem again.

I was not so lucky on the other side. When Dick, our senior
engineer, returned, he was irate. He had always known what I
thought of our service routines and now I had implemented my ideas into
one of his original designs. He was now in a bind. The line
side of the card had some problems with overload and lost messages
under some conditions. Use my code and remove the problem?
Not a chance. His code was special and not to be replaced by
generic, simplified code.

He spent the next several weeks trying to find his errors.
Our release was held up waiting for this. The day before we were
to drop dead on the release, we had a department meeting. The
department VP assigned to Dick the responsibility of finding the
problem and marshaling the troops to help. The rest of the room
suggested to the VP that I would be a better person to do this.
He was surprised but he made the switch.

Here is where the unprofessional sabotage comes into play.

I worked with the production test labs to find the error.
From everything we had, the error should not have occurred as it was a
known problem that had a known resolution. It turned out that
Dick had not resolved his problems and knew that there was an incorrect
software version in the production test lab -- even though the numbers
that the lab had indicated that it had the latest programs.

I could find nothing. The lab could find nothing. We
were stumped. The next day when one of the other managers
returned, the two of them marched to the labs with Dick's fixed
software and declared me to be incompetent because I should have found
the errant module.
I resented the smirks on their faces and the statements to the VP about
my incompetence. A couple of the other engineers knew what had
happened. I was not unhappy to see the senior engineer and this
manager move to another department.

I received several compliments on my code from other
engineers. The code was imported to other boards, including the
German versions. Where we were, importation into the German
versions was the highest compliment. Almost the highest.
The highest was when one of the German engineers pulled me aside and
said that the transplant of my code was the easiest transplant he had
ever done and the ease was due to the simple interfaces and the
documentation that went with the code.

But my management never heard those compliments. My
management was unhappy that I had Boca do the testing. What if
they found something wrong? Yes, what if? They did this as
a favor to me and not as part of their standard test. Therefore if
something had been wrong, we would have caught it before standard test
did. As it was, there were no problems, and Tom, the manager,
could have the confidence that his standard tests would go just fine.

My managers were angry enough that my next task was to fix our
test bed so that I would not ask Boca next time. This
punishment was depressing. Our test bed was sort of like taking
the road test for a Model-T and applying it to a Mustang. You
could not tell when it passed. When I could not make it work,
they assigned another engineer to do it and docked my review. The
engineer assigned died while doing the updates, and what he did was to
make the test bed reflect a pass if the code did what I had programmed
it to do rather than what the published line specification said.
John is dead so bad words are not appropriate, but making the test pass
the code is not the purpose of a test.

Another case where the company knew that I had performed a miracle
and my management believed I was a lazy goof-off. My manager knew
what I had done but bowed to the second line and then to the sabotage
of the senior engineer. After he left the department, his
interrupt processing never did work exactly right, but close enough for
Siemens.