icl2900.org.uk | Anecdotes
Overview

There are a number of anecdotes related to 2900 systems that are worth sharing. I include them here.

Bob Eager's anecdotes

These are personal experiences.

Fooling the managers

The University of Kent's 2960 was reasonably reliable - rather more so after we retired VME/K in favour of EMAS, an operating system from the University of Edinburgh. The main points of hardware failure seemed to be fans and power supply units. When the machine did break down, it caused great disruption to classes, and these were not easily rescheduled. We were thus under great pressure to get back in operation as soon as possible. We had invaluable assistance in this from our site engineers, in particular a lovely man called Harry Sweet, who lived in Herne Bay.

One of the most frustrating things was the time it took a spare power supply to reach us, even with two engineers doing a halfway meet (the trials of being in deepest East Kent). So we had a cunning plan; we kept unofficial on-site spares - unknown to the engineers' manager, who, by judicious use of smoke and mirrors, was persuaded (unwittingly) to provide several spare units, at least one of each kind.

The problem was where to store them. They had to be accessible to the engineers, but could not be kept in the room provided for them, because their manager might have noticed. Instead, they were stored under the false floor in the machine room, scattered in various empty spaces. Of course, there was then the problem of finding the right unit without lifting half the floor. This was solved by the production of a "treasure map", the grid corresponding to the floor tile layout. The map was taped to the back of a drawer in the engineers' room...well away from management eyes.

We still found a couple of mislaid power supplies when the machine was decommissioned.

Trials and tribulations with VME/K

The University of Kent's 2960 was supplied with the VME/K operating system.
To put it mildly, this system was a crock; it was abandoned by ICL not long after Kent moved to using EMAS. It was, I believe, thrown together rather quickly for use on machines that were not powerful enough to run the monster known as VME/B.

To say that VME/K was unreliable would be putting it kindly. I was tasked with checking that it did what the documentation said, and in little more than two weeks I had submitted over 200 separate bug reports. I was not popular with the development team.

The problem was that this was the main University computer, and it was vital to the running of the place. In sheer numerical terms, it was awful. Some of the failures could be attributed to hardware problems (it wasn't particularly well designed), but the software could not cope well with these. We maintained a rolling average of the combined (hardware and software) mean time between failure over 13 weeks, and it was about 20 hours. This improved to about 2000 hours when we moved to using EMAS.

The system used virtual memory, so there was a swap/page area on disk. One common problem was that if there was an error when reading this area, there was no attempt at recovery or mitigation; the system just crashed in its entirety. One day I was innocently running a blameless program, when it got a FSER 350 (as this swap/page error was known) and crashed. One of the few diagnostics was an identification of the user running at the time. Unfortunately, every time I ran that program that day, FSER 350 resulted. I was blamed. Of course, it would have been a lot better if just the user process that encountered the error had been terminated.

Another problem was the disk controller. This was a monster that occupied two or three 19 inch racks. It was old technology; ICL had recycled an earlier disk controller and made a few modifications, one of which was to paint it "2900 Tango Orange". This controller was unreliable; it quite often just froze, causing a VME/K crash.
EMAS handled this a lot better; it stepped the disk controller into diagnostic mode, to the point where it could issue a command to reboot it and reload the microcode (which was stored on a Compact Cassette). This took about 90 seconds, during which time EMAS queued pending disk transfers; after the reboot and reload was complete, the disk transfers were released and the system continued without loss of work.

One more shortcoming of VME/K was its handling of memory errors. The hardware was quite sophisticated, and it had Hamming correction on each 64 bit double word. This meant that if a single bit was in error, the error could be both detected and corrected, the software being notified via an interrupt. The failing memory could then be read (with correct data) and rewritten to fix the erring bit. If two bits were in error, that was fatal, but could still be detected. The problem was that VME/K did nothing more than rewrite the data; it didn't log or report the error, so the memory chip often just deteriorated further until another bit failed, and the system crashed. Once EMAS arrived, the situation changed; now the errors were logged, and once a day a report was printed for the engineers. This report not only detailed the failing memory board, but the exact chip that had caused the error.

The engineer doesn't always know best

When the University of Kent's 2960 was installed, it came with a site engineer. For quite a while, one of these was someone who was a Kent graduate. He was somewhat of a "company man", and was not keen when we abandoned VME/K in favour of EMAS (from the University of Edinburgh).

I managed EMAS; it had a novel way of handling filestore and (for the purposes of this story) peripherals such as printers. These were managed via a Spooler process, which handled all of the exception conditions, farmed out to it by the actual supervisor.
Whoever wrote the code at Edinburgh had been a little obsessive about detailed error messages - a good thing, and possible because all of the messages were inside a paged process.

One day, we saw a message we had never seen before. I forget the exact text, but it indicated that a particular fuse had blown in the printer. Edit: I just reviewed the source code; it was HAMMER DRIVER FUSE BLOWN.

We duly called the engineer from his room. He looked at the message, and shook his head, stating that no such fuse existed and "our" system was wrong. We pressed him on this, and after casting his eye over the defunct printer he retired to his office and manuals. He returned a few minutes later, bearing a fuse. He silently opened a small panel in the printer casing, and changed the fuse.

Hacking the hardware

Once the University of Kent had moved to using EMAS, it was enjoying a rolling mean time between failure averaging about 2000 hours over a 13 week period. This was much better than the 20 hours we had been getting from VME/K. People were very happy.

And then one day it all began to fall apart. The machine just stopped. No crash, nothing. The engineer's panel indicated that the microcode had halted. We re-IPLed the system, and an hour or two later it stopped again. Eventually we called the engineers, and they ran tests. Lots of them. They pronounced that there was nothing wrong. Then the "crashes" stopped, for a couple of weeks. Then they started again.

We couldn't get a handle on what was wrong at all. It was eventually decided that, the next time it happened, I should use the engineer's panel, for as long as it took, to investigate the state of the machine. In the event, I simply dumped out all the target machine registers, and the microcode PC. Our engineers obligingly left a microcode training manual lying around, together with a microfiche listing of the microcode. Oh, and some circuit diagrams. I retired to a darkened room for much of that day; and the next.
Eventually I emerged with the reason for the crashes. Without going into too much technical detail, it seemed that the microcode and the hardware handed off tasks to each other; in particular, a part of the hardware called the "scheduler" was responsible for validating the type field in the descriptor register during the execution of any instruction that used a descriptor to access an operand. Any invalid type was trapped, and sent back to the microcode to force an exception (known as a "contingency"). All other type values were considered valid, and passed back to the microcode to be used in accessing a jump table, thence invoking the right bit of microcode for that descriptor type.

So, what was going wrong? It turned out that there was what can only be described as a hardware design error. The scheduler didn't detect one particular invalid type code, so it handed it back to the microcode, which accessed the jump table with it. This of course accessed an entry marked "can never happen", and the microcode halted.

We later discovered that a physicist's errant FORTRAN program was overwriting a descriptor, and generating the bad type value. If the machine stopped, he just submitted the job again until he got fed up and went off for a week or two. Then he tried again, never noticing the causal connection.

We contacted ICL, but we never seemed to reach anyone who either understood what the problem was, or had the power or inclination to get it fixed (which would not have been a quick job, in any case). So I decided I had better fix this another way.

Back to the microcode listing. I found an empty patch area, and hand assembled a new bit of microcode which I linked to the right jump table entry. All this did was generate a "descriptor error" contingency with a hitherto unused subtype code. I then wrote a tool to extract the microcode from the system disk, patch it, and put it back again. We IPLed the system, and tested it (by this time I had a test program).
Success - it correctly triggered the new contingency and the microcode didn't halt! The only thing left to do was to modify the various components of the operating system to do the right thing, culminating in a change to the FORTRAN run-time system to generate a suitable message. That only took me a few minutes. We had no more microcode halts and the users were happy.

Dual? What dual?

The University of Kent's 2960 was installed in 1976, and it moved to the EMAS operating system in 1979. EMAS was very efficient, but as the years went on the system was being stretched to its limits. By 1983 the system was fully committed pretty well 24/7. Government policy meant that we wouldn't get a replacement for another three years.

We knew we couldn't afford much of an upgrade, but we found out that there was a spare 2960 OCP lying in a warehouse in Southall (I believe it had been used for the recent Census). It was free to a good home (us) but we had to pay about £350 for transport, etc. ICL kindly supplied the extra bits we needed to hook it up, and by slightly reducing the peripheral configuration (we no longer needed a card reader) we were pretty well able to cover maintenance costs within budget.

The day came, and we IPLed the dual system for the first time. EMAS said "Dual OCP found" and went to work. Basically, it worked until anything went wrong, but it turned out that under exception conditions the operating system was unable properly to control the second OCP (e.g. to halt it). EMAS had never before been run on a dual 2960 OCP (there wasn't one at Edinburgh), and it turned out that the instructions and image store locations needed to communicate between OCPs were not standard across the 2900 range.

We asked ICL for documentation. No one knew where it could be found (we assume), or perhaps someone decided we shouldn't have it. In any case, we were stuck. Without documentation we couldn't modify the system supervisor to make dual OCPs work as they should.
I had previously learned quite a bit about the 2960 microcode, so I retired once again to a darkened room with the microcode training manual, and a microfiche reader. It took me about a day before I emerged, having read a great deal of microcode and essentially reverse engineered all of the image store locations and bit positions needed to do what we needed; I think it was quite short. Armed with this, it was the work of minutes to modify the supervisor, rebuild it and re-IPL. The system worked very well for its final three years.

Other anecdotes

Some of these are gathered from the book An ICL Anthology, by Hamish Carmichael.

Ethnic minorities

At a meeting of the "New Range Upper Sub-Range Primitive Interface Interpretive Committee", chaired by John Bowthorpe, there was a lengthy debate one day about the proposed implementation of the SWEQ (Scan While Equal) and SWNE (Scan While Not Equal) instructions. Near the end of the meeting, one of the engineers got up and said that he was happy with the implementation but unhappy with the names of the instructions. He explained that he came from Oldham, and that in parts of Lancashire "while" meant "until", so that any Lancastrian engineers and programmers would interpret the instructions in exactly the opposite way to that intended. An action was duly placed for the Architects to rule on this.

At the next meeting, the chairman came to this action and solemnly declared that the matter had been duly considered and the instruction names would remain as they were. It was felt that ICL specifications could not be written to cater for ethnic minorities.

New Range planning

From Gordon Cumming

George Felton, as Head of Software for New Range Planning, was also responsible for issuing the first full set of manuals, bringing together all the work done by the Planning team.
This amounted to some eight or ten four-ring binders of material and it was distributed to a significant number of people around the company for evaluation and comment. The production task was large enough in itself and, once the copying was completed, the insertion into the binders was no mean task. Unfortunately, the settings of the hole punch did not match the ring binders, requiring the whole job to be redone.

Some two or three years later, George moved to Bracknell, and initiated a set of standards for software development. The very first document concerned the standards for four-ring binder hole punching. A short while later, Issue 2 was distributed.

Inspiration

From Dick Emery

My knowledge of New Range in development caused me to be selected for the New Range launch team in 1973. For a year, I shared an office in Computer House at Euston with Ninian Eadie, who was in charge of the small team. I was responsible for the presentation of the technicalities. We judged that the way to do this was to prepare a film which explained the architecture of the new series and I was the technical consultant to the agency chosen to make the film.

It was hard work trying to explain to script writers, visual artists and film directors what this new computer series had going for it. I tried analogies of mirrors and prisms to explain virtual storage, but it was all an image too far for them. I was despairing of ever getting the act together till, one day, the phone rang and I was invited to rush down to the studio in Bloomsbury where they had something to show me.

I was ushered in and stood before a table on which there was something mysteriously covered by a cloth. They explained that the visual artist, who had until then been the most obdurate in failing to understand what I wanted, had become inspired. They drew back the cloth and there was a glass ball with ground facets all over it. Impatiently they asked whether this was what I meant.
I enthused and asked the artist where it had come from. He said that he had been sitting on the toilet and noticed this glass ball on the end of a chain next to his head.

Few people knew that the glass ball, which we later saw around the world on the silver screen, had started life as a humble toilet chain pull. Indeed, the imagery used in all the advertising material and standard presentations derived from the same glass ball. We even presented customers with memento glass balls which were vastly more expensive than the original and lacked the eyelet for attaching the chain.

The 2900 press launch

From Hamish Carmichael

There were lots of press there on the day, young and old, male and female. One of them was decidedly young, deliciously female and devotedly courted throughout the day by one of the American directors whom we had in those days. So much so (and no-one had been counting the glasses of wine as they flitted by) that she rashly accepted a desperate wager: a successful landing in the moon-landing game or hey for the nearest bed!

Do you remember that moon-landing game? A primitive precursor of so many others. It told you, chattering on the teletype, the height of your module above the lunar surface, your current rate of descent and your reserves of fuel. You told it at what rate you wanted to burn fuel for the next ten seconds. Then you got another status report. And so on. A few people landed successfully, but not many of us had NASA training.

Anyway, full of confidence, this lovely lass essayed a descent, giving her commands with initial confidence which rapidly tapered through mounting doubt into squeaking dismay. When she got the final response: WE’LL NAME THE CRATER AFTER YOU, and realised the implicit consequences, her shriek could be heard from one end of Bracknell to the other!

VME/K

From John Deas

This alternative 2900 Series operating system was Ed Mack's special baby.
It was going to be marvellous and do anything, but actual details were hard to come by. The initial designer was one Ypsilanti, whose name was rumoured to be an acronym: Your Program Specification Is Long And Not Terribly Informative.

There came a time, about 1975, when senior management were pressing very hard for VME/K to be proposed to customers, but the middle rank technical staff who had to sign off the Blue Border risk appraisal documents felt that there was still not nearly enough information about the nature and capabilities of the product. At a meeting in Putney, called to explain to sales representatives why S&TS would still not approve any VME/K Blue Borders, the S&TS spokesman said: "We simply don't know whether it's a square object or a spherical one." Voice from the back: "It sounds to me like two spherical objects!"

Christmas Eve, 1976

From Brian Russell

The P4 prototype, later to be sold as the 2980, had an Engineer's Hooter. It was intended, and was indeed useful, as an audible overview of what was happening in the machine. Like its predecessors on 1900 series processors, the P4 hooter was originally connected to the Jump instructions. Unfortunately for the engineers (though fortunately from the machine performance point of view) the jump rate was so great that the hooter was ultrasonic, totally inaudible. Then it was connected to the Call instruction, which made it audible.

Also like its predecessors, it was realised that, by a suitable choice of program, one could use the hooter to play tunes. Someone wrote Good King Wenceslas and We Wish You A Merry Christmas and brought them in for the morning shift at 06.00 a.m. on Christmas Eve. The program was built round an inner loop whose frequency would have to be tuned experimentally. The engineer's handkeys were used to select either "play a tune" or "sound middle C", nominally 440 Hz. The loop was modified until middle C sounded reasonable, but the tunes were not quite right.
Another engineer arrived, and was asked: "Does this sound like middle C?" "No," he answered, "it sounds a bit low." The loop constant was decreased, but the tunes were still no better. As each engineer arrived he was asked the same question, and the loop constant was increased or decreased when he said "Too high" or "Too low". This continued until Dave Potts arrived and was asked: "Does this sound like Middle C?" "No," he replied, "it's more like B Flat. It's a very good B Flat." A quick calculation was made to raise the pitch by two semi-tones (a factor of the 12th root of 2, squared), and... spot on, perfect pitch!

To this day I do not understand why a bunch of engineers didn't connect an oscilloscope and measure the frequency.

Deliveries in Deutschland

From Chris Sundt

When the building for the European Space Agency in Darmstadt was being designed, the size of the lifts was determined by the size of the largest piece of equipment which would have to be installed in the building. This turned out to be half of a 2970 processor (OCP). The processor was built with two chassis, deliberately to make it easier to transport. Unfortunately, the need to keep the two halves separate for transport was in direct conflict with the need to wire them comprehensively together for system testing. Therefore, when the time came to ship the processor, there were myriads of backplane wires leading direct from solder on one side to solder on the other.

The next problem was how to lift both halves simultaneously and smoothly, when the only thing holding them together was the backplane wiring. The solution was to embed them in probably the largest and heaviest wooden crate that ICL ever manufactured. It was, indeed, so large that some wag suggested that when the delivery had been made the crate should be brought back full of illegal immigrants. It would hold about 140 of them, and at £1,000 a head that would constitute quite a good piece of business.
So in due course this massive piece of equipment was shipped from West Gorton, in one of a fleet of lorries containing the masses of associated peripherals, cables, and spares required. In the dead of winter they trundled across England, successfully negotiated the ferry crossing, and set off into Germany. Sadly, the drivers reached the limit of their permitted driving hours while they were still some tens of kilometres short of their destination. They had to stop for the night.

In the morning, none of the wagons would start. People hadn't considered that it gets extremely cold in Germany during midwinter, and no provision had been made for keeping the diesel fuel warm enough to prevent it jellifying. However, after various improvisations they got going, one by one, and straggled into Darmstadt to the waiting site, arriving at various intervals instead of as a simultaneous and impressive convoy.

Except, that is, for the truck carrying the giant processor. It failed to arrive at all. Search parties were sent out, and the driver was eventually located in the hotel where he and his colleagues had spent the previous night. "Where's the processor?", he was asked. He explained that he hadn't been able to get started at all, and eventually had had to give up and leave the lorry on the autobahn. "Which autobahn?", he was asked. He wasn't sure. And, of course, in that part of Germany the autobahn network is at its most spaghetti-like; there are autobahnen running in all directions.

By now the commissioning team were in a state of high nervous alarm, at the thought of their millions of pounds worth of processor abandoned, lost, and (of course) deeply frozen. But after the searchers had driven for many many miles, up one autobahn and down the next, eventually it was found, the recalcitrant engine was kicked into life, and the mighty cargo arrived on site. Where, of course, it was far too big to be taken up in the lifts.
So a complete section of wall, windows, cladding and all, had to be removed, and a giant crane was organised to hoist the machine to its destined floor.

Then came the next problem. In order to spread the lifting strain evenly, and to prevent the giant crate from either hogging or sagging, there was quite a complex cradle hanging from the crane's hook. Although the reception party could lasso the end of the crate and pull it towards them through the opening in the wall, the cradle started to foul the upper part of the wall long before the load's centre of gravity had been pulled inboard. Impasse!

But not for long. The crane driver (let's call him Helmut) saw the problem, used his initiative and devised his own solution. Having assured himself that the load was at exactly the right height, with only a few inches between its base and the level of the floor, he began to swing his jib gently from side to side. To their horror, the installation crew saw their giant processor, in its giant crate, all multi-million pounds worth of it, starting to oscillate like a giant fairground toy. With each cycle, the end nearest the building came further and further through the gap in the wall. Finally, when the load had gained enough momentum, and was at the extreme end of its inward swing, with positively exquisite timing Helmut released the brake on the cable drum, the crate grounded, and slid to a safe halt inside. People then remembered to breathe.

A note of caution here. I am advised that this delivery took place in summer, so the snow seems wrong. I am also advised that two 2980s were being delivered. I wonder if the story relates to an interim 2970.

You Can Count on Me
from Anonymous

As part of my work on VME/B Instrumentation, I’d written some documentation (SID & OSTC/INs) which inter alia described a system for maintaining counters of interesting system functions. These counters could be interrogated from time to time as required. It was essential that the system incurred as little overhead as possible maintaining the counters. VME/B was of course highly modular, so a separate subsystem was required. However at the CADES Implementation Level, the holon interface to update a counter was an inline macro call rather than a procedure call. Thus in S3 terms the macro contained a single line :-
Counter := Counter + 1
and this compiled in PLI to three instructions:
LSS (for a 32 bit counter)
This was felt to be such a low overhead that although conditional compilation could be used to null it out, for most purposes it was simpler just to leave the counters operational all the time. This arrangement was fine for a local counter but there was the possibility of occasional missed updates with public (or global) counters. However this was felt not to be too severe and not worth incurring great overhead to achieve perfect values.

Around this time "Peter" was taking considerable interest in VME/B and he assigned one of his ex-Univac colleagues who specialised in Performance issues to get involved. This was "Paul", who had had some considerable experience of tuning the OS on the Univac Scandinavian Airlines System. I had to liaise with Paul, and generally we got along quite well although I was always a little wary of anyone from the ex-Univac "mafia"!

Paul was into counters in a very big way and I think he basically approved of what we were trying to build into VME/B. However he couldn’t resist a bit of low level meddling and one day he informed me that he’d read my notes and that it simply wasn’t necessary to take three PLI to update a counter. He’d also been studying PSD 2.5.1 and found that a single instruction would do the job perfectly, this being INCT, and this should be used in all cases both local and public.

Now I was somewhat horrified by this as INCT was of course an "atomic" semaphore instruction which would clear slave stores in all OCPs on a system. This would cost far more than the two PLI "saved". It wasn’t justifiable even for a public counter let alone a local one. However Paul was not going to budge. This put me in a difficult position as I’d heard some of the ex-Univac guys could get quite spiteful if you crossed them. So I had a chat with "Bill" who was the head of Ostech at that time. He simply laughed, told me to stick to my guns but tell Paul I’d changed to use INCT as demanded.
It was unlikely that Paul would ever find out but anyway he (ie Bill) would handle any possible future fallout. I think there was also a frisson of schadenfreude at the thought that VME/K would be using INCT unnecessarily everywhere and so severely impacting its performance.

In the event, we never heard any more on the topic. I wondered whether VME/K used INCT and subsequently realised it wasn’t a good idea. If they did they certainly didn’t take the trouble to let us know!

Building Blocks
from Anonymous
(names changed to protect the innocent/guilty) An anecdote from an earlier time. I started with ICL in 1968 which was the year of the Company's formation. After quite a long induction course at Moor Hall I joined the George 3/4 Operating Systems Development team based at Carlton Drive. It was a very good place to work amongst friendly like minded people. By 1970 I'd taken on responsibility for a number of routines for managing Object (ie User) Programs - particularly swapping them in and out of main store - or Core as we still called it in those days. I worked closely with "Keith" who, inter alia, looked after the Process Controller/Low Level Scheduler which decided which programs should be swapped in to Core and given (plugged in) to the Executive to run. George 3 managed all the Core that was left over after Exec had taken its relatively modest amount. However it was not a virtual memory OS. George itself consisted partially of a fixed area and then the rest comprised a large number of relatively small activity and ancillary blocks which were linked to one another via chain pointers so could be moved around anywhere in store as George's Core Allocation system deemed fit. However there were constraints. Object Programs needed to be locked in place as they could not be moved once made available for running. And some George blocks needed to be locked although not all that many. This system was to some extent vulnerable to fragmentation so in the case of George's own blocks, if it was known that a block needed to be locked for more than a very short period, the request for Core (GETCORE) would have a parameter (LONGLOCK) set indicating that the Core Allocator should try to position the block at either a very low address or a very high address to minimise any fragmentation. At this time, Sales support were trying to pass a very important benchmark and a great deal hung on its success. 
A number of programs had to be run within a certain time and one particular program was large compared to the total Core on the system. Sometimes this program would run OK but on other occasions of the same benchmark it would not run at all and the system would grind to a halt. George tracing was switched on and we could see the Low Level Scheduler determining that there was enough total free Core to swap this large program in (having swapped all the others out!) but the SwapIn routine was unable to get a single contiguous block of Core large enough. The Core Unjammer had been working overtime but to no avail. To try to get to the bottom of this Keith wrote a diagnostic MEND (patch) which would detect that this situation had occurred and then deliberately crash the system to get a Postmortem Dump - reams of paper. After a lot of manual analysis this revealed that a small George block was locked down in the middle of Core and was blocking amalgamation of the free Core either side of it. This little block belonged to the Backing Store Transfer System (BSTS) so Keith and I set off to have a chat with the person responsible for the relevant code. It was going to be trivial to write a MEND to set the LONGLOCK parameter when the block was allocated and hopefully that would be the bug fixed. We didn't anticipate any problem. Unfortunately it seemed our colleague was having a bad day and pronounced himself far too busy to even consider the issue. So we suggested we'd write the MEND and then it would be just up to him to approve it. But that suggestion went down really badly as well (don’t you dare touch my code!) and so we retreated wondering what to do next. We didn't want to involve our Management because it could have resulted in bad feeling. Then Keith said he had an idea. He would enhance his diagnostic MEND so that rather than crash George it would search for the errant BSTS block and flip its lock bit off. 
Then the Core Allocator would move the block out of the way and the large program would be swapped in. I pointed out that there was just one problem with this; the next time after this that the BSTS code ran it would likely crash the system because its block was no longer where it thought it was. Keith's reply was that the resultant Postmortem dump would then be routed to the appropriate desk rather than our own! We thought it only fair to inform our colleague of our plan in advance whereupon he miraculously found the time to immediately produce his MEND and fix the problem.

Interestingly the problem would not have arisen with George 4 which employed paging for Object Programs.
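
For readers who want to see why one small locked block could stop the benchmark, here is a minimal sketch. It is not George 3 code: the Core size, the program size and the function names (`total_free`, `largest_free_run`, `can_swap_in`, `place_locked_block`) are all assumptions invented for the illustration. It models Core as a row of equal units and compares a locked block stranded in the middle with the same block placed at a very low address, which is the effect the LONGLOCK parameter was meant to achieve.

```python
# Illustrative model only: Core is a list of units, each FREE or LOCKED.
# Sizes and layout are invented for the example, not taken from George 3.

FREE, LOCKED = ".", "L"


def total_free(core):
    """Count all free units, contiguous or not."""
    return core.count(FREE)


def largest_free_run(core):
    """Length of the longest run of adjacent free units."""
    best = run = 0
    for unit in core:
        run = run + 1 if unit == FREE else 0
        best = max(best, run)
    return best


def can_swap_in(core, size):
    """A program that cannot be split needs one contiguous free run."""
    return largest_free_run(core) >= size


def place_locked_block(core_size, address):
    """Return an otherwise empty Core with one locked unit at `address`."""
    core = [FREE] * core_size
    core[address] = LOCKED
    return core


CORE_SIZE = 100      # hypothetical amount of Core left over for programs
PROGRAM_SIZE = 80    # hypothetical large benchmark program

# A small locked block stranded in the middle of Core.
middle = place_locked_block(CORE_SIZE, 50)
# The same block positioned at a very low address instead.
low = place_locked_block(CORE_SIZE, 0)

for name, core in (("middle", middle), ("low", low)):
    print(name, total_free(core), largest_free_run(core),
          can_swap_in(core, PROGRAM_SIZE))
```

In both layouts 99 units are free, so a scheduler that only looks at total free Core would decide the program fits. Only the second layout offers a contiguous run of 80 units, which is the situation the swap-in routine kept running into.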
This site is copyright © 2026 Bob Eager
Last updated: 02 Apr 2026