Canadian Carol

A Network Carol (How EMBnet Canada was Born)

Christoph W. Sensen
National Research Council of Canada
Institute for Marine Biosciences
1411 Oxford Street
Halifax, NS
Canada B3H 3Z1

"Once upon a time in the Great White North, a country so far north of the United States of America that few Americans knew much about it, there lived a people whose scientists were, at best, connected to the Internet by 56 kbit lines." Unfortunately, this is not entirely a fairytale. Canadian scientists have all experienced the frustration of slow lines and many are still in that situation. In Canada, slow networks began to fade into history only as of April 1996 when the world's first GigaPOP (Gigabit Point of Presence) was opened in Halifax, Nova Scotia. Today, all major Canadian research institutions are connected to a network of GigaPOPs. The GigaPOPs are linked via an ATM network with a sustained data transfer rate of 7 Mbit/sec and a burst rate of 45 Mbit/sec. This network, called CA*net2, will serve as Canada's research and development environment for one more year, after which it will be replaced by CA*net3, a fiber optics network that will initially provide connectivity at a speed of 3 Gbit/sec.

For EMBnetters, the most interesting fact about CA*net2 and CA*net3 is probably that the test application and showcase demonstration project on these high-speed networks is bioinformatics. This may seem surprising, but there is a history to it. To tell it, we have to go back about three and a half years, to the dreadful times when we were all on 56 kbit lines, and start our story there.

"The Great White North is a huge country, spanning four and a half time zones with people living mainly along the coastline of two great oceans and along the shores of the largest inland lakes known to storytellers. Some of the people living in the GWN were scientists working in the 18 Institutes of the National Research Council. The Institutes were located in several cities from Newfoundland to Vancouver, thus people in one of the Institutes would be heading off for lunch when those in another were just showing up for work." Canada has very particular challenges when it comes to communication. The National Research Council of Canada, which is Canada's lead federal research organisation, operates 18 Institutes, five of which are focussed on biotechnology. The NRC Biotechnology Institutes are located in Montreal (BRI, the Biotechnology Research Institute), Winnipeg (IBD, the Institute for Biodiagnostics), Ottawa (IBS, the Institute for Biological Sciences), Halifax (IMB, the Institute for Marine Biosciences) and Saskatoon (PBI, the Plant Biotechnology Institute). Two additional NRC units indirectly support biotechnology R&D: CISTI, the Canada Institute for Scientific and Technical Information, which is Canada's National Library in Ottawa, and the NRC Innovation Center in Vancouver.

In 1995, scientists from BRI, IBS, IMB, PBI and CISTI held a series of meetings to discuss models of how to establish bioinformatics within the Biotechnology Institutes. Very early it was clear that this should be done as an inter-institute collaboration; a duplication of effort at each institute was highly undesirable. The service should be provided from a location that was actively doing R&D in genomics and bioinformatics; however, the scientists insisted that each institute should have the same level of services regardless of location. It soon became evident that the existing 56 kbit lines would not be able to accommodate this requirement. The planned bioinformatics services would provide access to over 800 executables, including GCG, GDE, PHYLIP and the Staden package, and to more than 70 databases, from the entire EMBL distribution to OWL. X-applications would be launched remotely from servers in Halifax and displayed on machines in the other locations across the country. Ideally, hard discs would be cross-mounted between different cities, and OS updates and upgrades would be installed remotely.

The scientists concluded that high performance networking was necessary to make that dream come true. There was no precedent for such a network, so people started dreaming about what could be. The first model was to connect all institutes via T1 lines at a speed of roughly 1.5 Mbit/sec. This turned out to be a very expensive proposition: each institute would have had to pay on the order of $40,000 Can/year for the connection. The bad news was quite discouraging for a little while, but soon there was new hope. The experimental network developers in Canada heard about our problems and we started to talk to CANARIE about their CA*net2 plans. CANARIE, the organisation that had previously implemented the Internet in Canada, was developing a new high-speed network, called CA*net2, which was planned as an ATM backbone throughout Canada. The ATM backbone would carry TCP/IP protocols at a sustained rate of 7 Mbit/sec with a burst rate of 45 Mbit/sec. CANARIE was looking for a demonstration application that could be used to showcase the new opportunities and at the same time could be used to debug problems in the network.

Until 1995, no scientific group had ever approached CANARIE to get access to high-speed networks, and the only application that was discussed as having a need for CA*net2 was videoconferencing. This situation was frustrating for CANARIE; thus, when the opportunity to help establish a high-speed bioinformatics network materialised, it was as exciting to them as getting high-level bioinformatics connectivity was for the scientists. The bioinformatics network was "adopted" by CANARIE and, from that point, their CA*net2 was developed in close collaboration with the bioinformatics network, which had been dubbed "The Canadian Bioinformatics Resource - Ressource de Bioinformatique Canada", or CBR-RBC. Initially, CBR-RBC was planned and implemented as an Intranet for NRC. Approximately 40 UNIX workstations and servers were purchased, configured and installed at BRI, IBS, IMB, PBI and CISTI, and later also at the NRC Innovation Center.

CBR-RBC is managed by a User Committee, headed by David Thomas at BRI, with members from each of the institutes participating in CBR-RBC. Two full-time computer administrators operate CBR-RBC under the supervision of project manager Christoph Sensen: Rob Hutten, the UNIX system manager, and Marc Boutilier, the Web site, database and applications manager. Terry Dalton is the security manager for CBR-RBC. He is also co-ordinating all of NRC's CA*net2 and CA*net3 implementations.

Once the NRC Intranet was in place, users were pleased to find that the system worked very well indeed. Scientists had access to bioinformatics applications and databases as never before. Nevertheless, the network performance was still not good enough to readily accommodate the high-end operations that had been identified in the initial proposal. Accordingly, in 1998, a new plan was developed to move the entire CBR-RBC network to CA*net3, which will operate at 3 Gbit/sec, more than 400 times the sustained rate of CA*net2. We are looking forward to the summer of 1999, when all of the changes will have been implemented and we can write another report about networking in Canada.

Bioinformatics is an international science, and Canadians are very aware that many of the pioneering efforts in this field are coming from Europe. EMBnet is an excellent model of collaboration among bioinformaticians, and very early on, Canadian bioinformatics experts identified membership in EMBnet as a highly desirable objective for CBR-RBC.

The second phase of CBR-RBC, CBR II, was implemented in 1998 to provide bioinformatics services to scientists at not-for-profit organisations in Canada. With strong support from the President and the Vice Presidents of NRC and SUN Microsystems Canada, a high performance SUN Enterprise 4002 with twelve 250 MHz CPUs, 2 Gbyte of main memory, 128 Gbyte of hard disk space and 210 Gbyte of DLT tape space was added to CBR-RBC. This system became the official server for EMBnet Canada when membership was granted in September 1998. Accounts on this machine are maintained by CISTI using a Toronto-based, private-sector bioinformatics company, Base4 Bioinformatics Inc, as its agent. There is a flat fee of $195 Can/year for access to CBR II. We anticipate having several hundred users on this machine within a year.

There is still a lot to do within and for CBR-RBC but we are well on our way. We are quite happy to share our unique knowledge of distributed bioinformatics facilities with others who might want to implement a similar model in their country.

"Over time, the people of the Great White North discovered that the distributed bioinformatics facility had not only networked their computers, but had also created new friendships and fostered many new collaborations among scientists, collaborations that never would have happened without the improved communication among the institutes."

This is the happy ending for our story. A Merry Christmas to all EMBnetters!