[File; TRANSGID.TXT Revision date; April 23, 1990]

A SHORT GUIDE TO NETWORKING AND FILE TRANSMISSION

Erich Neuwirth
Institute of Statistics and Computer Science
University of Vienna
Austria
(A4422DAB@AWIUNI11.BITNET)

GENERAL PRINCIPLES OF SENDING FILES IN ELECTRONIC NETWORKS

Networking is mainly used in 2 ways:

  Electronic mail
  Sending (binary) files

This paper tries to explain what some of the differences are and how one of the two transmission methods sometimes can be (mis)used for tasks which seem to belong to the other method.

Electronic Mail

Electronic mail means you are sending text from one computer site to another site. Letters of text are coded as numbers internally within computers. Problems arise from the fact that the same letter may be represented by different numbers on different computer systems and, vice versa, the same number may yield a different letter on different computer systems. Mostly we are concerned with two such representation systems for letters by numbers:

  ASCII (which is used on IBM-compatible PCs and on most non-IBM mainframe computers)
  EBCDIC (which is used on IBM (and compatible) mainframe computers)

When you are sending text from one computer to another computer, the computers "think" they are only sending numbers. People reading or writing text, on the other hand, expect characters, so some interpretation of the numbers producing the text must take place. Simply transferring the text file as a sequence of numbers (which is what it looks like to the computers involved) would result in an unreadable file on the receiving computer system. Therefore, when using computers with different character representation systems, the transmission usually involves a "translation process" which has the net effect of yielding a different "sequence of numbers" (= file) on the receiving machine, but this file usually gives the same letters when read as a text file.
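
The difference between the two representation systems can be made visible with a few lines of Python. This is only an illustrative sketch: it assumes that Python's "cp037" codec is an acceptable stand-in for "EBCDIC" (real IBM systems use several EBCDIC code pages), and the sample text is made up.

```python
# Sketch: the same text as numbers in ASCII and in one EBCDIC variant.
# Assumption: the "cp037" codec stands in for EBCDIC; actual mainframes
# may use a different EBCDIC code page.
text = "Hello (1990)"

ascii_bytes = text.encode("ascii")    # numbers a PC would store
ebcdic_bytes = text.encode("cp037")   # numbers an IBM mainframe would store

print(list(ascii_bytes))
print(list(ebcdic_bytes))

# Sending the numbers without translation: the receiver interprets the
# EBCDIC numbers with its own table and gets unreadable text.
untranslated = ebcdic_bytes.decode("latin-1")

# Sending with translation: the numbers change, the letters stay the same.
translated = ebcdic_bytes.decode("cp037")

print(repr(untranslated))
print(repr(translated))
```

The two printed lists of numbers differ for every letter, yet the translated result is the original text again.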
Usually these translation processes work quite well for letters (lowercase and uppercase) and digits. Quite often you will encounter problems with special characters like parentheses, brackets, tildes, carets and so on. If you are interested in merely transferring texts this is not much of a problem, because even if some special characters get scrambled it is usually not too hard to reconstruct the original text by normal editing.

If you are setting up a new communications link it is a good idea to send a file containing all printable characters with descriptions and to test if they arrive at the other end as they should. At the end of this paper you will find an example of how such a test file could look. Of course such a file should be sent from both ends of the line because the scrambling process in many cases is asymmetrical, so different transpositions happen in the two different communication directions. Closely inspecting the file you receive will show you which characters are changed during the transmission process. Now three different events can happen:

1) You receive all the characters as they should be.
   Action: Don't worry, be happy.

2) Some characters are not what they should be, but different characters still are different (even when not identical with their original).
   Action: Do worry, but not too much. In this case you can use the FIND and REPLACE function of your text editing program to restore the original meaning of the file. You could even program a macro in your text editor (if you don't know what that means just ignore this sentence) which automatically performs the "retranslation" process.

3) Some characters are scrambled and different characters in the source text file come out as identical characters at the receiving end.
   Action: Do worry, because this is the worst possible situation. It is not possible to construct an automatic "retranslation" process.
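
The three cases can be told apart mechanically once you have both the test file as it was sent and as it arrived. The following Python sketch is one possible way to do it; the function name classify and the sample strings are hypothetical, and it assumes that the transmission replaces characters one for one (no characters inserted or dropped).

```python
def classify(sent: str, received: str):
    """Return (case, repair_table) for a test file sent over a link.

    case 1: nothing changed; case 2: changes are reversible;
    case 3: different characters collapsed into the same one.
    Assumption: characters are replaced one for one.
    """
    if len(sent) != len(received):
        raise ValueError("lengths differ; compare the files by hand")

    forward = {}                      # sent character -> received character
    for s, r in zip(sent, received):
        if forward.setdefault(s, r) != r:
            raise ValueError(f"{s!r} arrived in two different forms")

    if all(s == r for s, r in forward.items()):
        return 1, {}

    reverse = {}                      # received character -> sent character
    for s, r in forward.items():
        if r in reverse:              # two sources produced the same result
            return 3, None
        reverse[r] = s
    return 2, {r: s for r, s in reverse.items() if r != s}


case, table = classify("a[b]c", "a{b}c")
# In case 2 the table drives an automatic "retranslation".
repaired = "a{b}c".translate(str.maketrans(table))
print(case, table, repaired)
```

Applying all replacements at once, as str.translate does, avoids the trap of sequential FIND and REPLACE when two characters have merely been swapped.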
As long as you are only concerned with text you will not have too many problems, because letters, digits, commas and periods usually are not scrambled when sent between different computer systems. If these characters also are scrambled, the transmission process does not deserve the name "communication process" any more and you should ask the technical people in charge of the transmission channel to take care of these problems.

Things become more difficult when you want to send data files or program source files. Files of this kind usually contain special characters like parentheses, and to reconstruct the original text you usually have to edit the received file by hand and infer the original meaning of each recognizably incorrect character from the context.

The automatic file transfer usually takes place between mainframe computers. So the simplest situation with text file transfer is that you use the editor on your mainframe computer to create your text and then use the mailing program on the mainframe to send the text file (sometimes called e-mail or note) to its destination. At the destination site the receiver can then receive the file and read it with the help of the text editor program on the receiving mainframe computer.

Sometimes the situation is more difficult. The file you want to send may exist on your PC, but not yet on the mainframe which is your entrance to the international computer networks. There is an important detail you have to take care of here. Usually you can write texts on a PC using two different kinds of programs:

  Text editor programs
  Word processing programs

Text files produced by text editing programs usually give no problems when you try to send them over a network. With most word processor files you will experience difficulties. But most word processing programs have a special way of saving your text as a "plain ASCII file".
Remember to save your texts with this option if you intend to send them over networks. And if you are still considering which word processing program you should select for your personal use, only select a program which offers this option. If you do not know how to verify the existence of such an option, ask somebody more experienced to help you find out.

Now you have to find a way to transfer the file from your PC to your mainframe computer. For this purpose you need a file transfer program on the PC and on the mainframe. Different varieties of programs of this kind exist, but the prevalent program in an academic environment at the moment is KERMIT. To use KERMIT to transfer files you need the version of KERMIT for your PC and an installed version of KERMIT on the mainframe. The mainframe KERMIT is not your responsibility; you just have to find out from the staff of your computing center if they have already installed this program. If they have not done so yet you should tell them to do so, because KERMIT is one of the very few hardware independent standards and it should be supported. Additionally, all KERMIT versions are in the Public Domain, so they do NOT COST MONEY. Your local computing center also should help you to find the version of KERMIT you need for your PC.

KERMIT is a program used for 2 purposes, namely for using your PC as a terminal to your mainframe computer and for transferring files between these two systems.

Now things start to be complicated (even more complicated? I hear you complain!). In this paper we will not deal with using KERMIT as a terminal emulator. There are many ways to do this and it mainly depends on which kind of mainframe you are using. You should try to get some help from the people at your local computing center who can show you exactly how to use KERMIT for this purpose.
An additional remark: If you only want to use KERMIT as a "terminal emulator", which means using your PC as a terminal, you do not need KERMIT on the mainframe computer you are connecting to. The mainframe version is only needed for file transfer between the mainframe and your PC.

Now things become really complicated! The PC KERMIT has only one way of transferring files. But the mainframe version usually has two ways (called "modes" by computer scientists). One way is text mode, the other way is binary mode. Text mode is used to transfer text files. E-mail consists of text files, so it is this mode you need for downloading e-mail from your mainframe to your PC. Usually you need not care too much, because practically all mainframe versions of KERMIT use text mode for file transfer if not told otherwise explicitly. So simply transferring a text file from your PC to the PC of somebody else you want to send it to can be done using the following steps:

1) Upload the text file from your PC to your mainframe with KERMIT in text mode.

2) Use the mail facilities of your mainframe to send the text file as mail to the intended receiver.

3) The receiver finally has to download this mail file (it still is text) with KERMIT in text mode to his/her PC.

In most cases the received file is identical with the original file. Letters and digits arrive as they should. The idea behind the text mode of KERMIT is that the meaning of characters is preserved, so when transferring in text mode KERMIT automatically adjusts for different systems of character representation on the mainframe and on the PC. You might find that some of the special characters do not arrive as they should, but this usually is no problem when the text is only intended for reading and not as input to some computer program. Later we will see what you can do if you have to send a text file containing special characters and want to make sure that these characters arrive unchanged.
TRANSFERRING NON-TEXT FILES

It is becoming even more difficult in this section, but if you want to send programs and data files usable on other machines it is important that you understand this section.

Networks can also be used to send PC programs. If you want to send a program to somebody with the same kind of PC you have, the basic procedure is very much like the procedure for transferring text files from your PC via the network to somebody else's PC. The steps involved are:

  Uploading to a mainframe
  Using the sending facilities of the network
  Downloading from the target mainframe to the target PC

The difficulty with program files is that programs contain more different symbols than text files. In particular, they contain lots of so-called "nonprintable" characters. You can see this if you try to look at your program file with a text editor program or a word processing program.

The simplest solution to transferring program files and similar things (called binary files in computer terminology) is to use the binary transfer mode of your mainframe KERMIT to upload the program to your mainframe. Binary mode means that no translation whatsoever takes place while sending the file (remember, sending text files often involves a translation process). Now you can use the facilities of your mainframe for sending files over the network. Sending a file is not the same as sending a text as mail. Mailing implies that your text is put into the electronic equivalent of an envelope. Sending a file does not add the envelope, so the file being sent is (almost) identical with what you have on your PC. The receiver then can download the file to his/her PC, also using the binary transfer mode of his/her mainframe KERMIT and the PC version of KERMIT.

This file transfer quite often does not work.
Some reasons may be: the two mainframes involved come from different manufacturers, some intermediate mainframe causes problems, or the file is passing through different networks. One situation where it makes sense to try this way of sending binaries is when both mainframes are members of the EARN, BITNET or NETNORTH networks. It usually does not work when the mainframes belong to different networks like EARN and JANET.

Now what can we do when we want to send a program or a data file from an EARN site to a JANET site?

The main idea is translating your binary file (the one you cannot read because it contains nonprintable characters) into a file consisting only of printable characters. The most popular scheme for doing such a translation is the UUENCODE/UUDECODE process. It involves 2 programs, one usually called UUENCODE and the other one UUDECODE. UUENCODE takes a binary file and converts it into a file consisting only of printable characters. UUDECODE reverses this process and restores the original binary file from the encoded file.

So what do you need these programs for? You UUENCODE the binary file and upload it to your mainframe (using the text mode of your mainframe KERMIT). Since it consists of printable characters only, you can incorporate it into a mail file you send. This mail file hopefully arrives at its destination and the receiver can download the mail from his/her mainframe to the local PC. Then it is mandatory to remove the "electronic envelope" from the mail file. An appendix will describe how a UUENCODEd file looks and how to recognize the parts forming the "envelope". Then the UUDECODE program can be used to translate the UUENCODEd version of the file back into its binary version.

If you want to use this process you have to get hold of a copy of the UUENCODE and UUDECODE programs. It is not possible (at least not in an easy way) to send these programs over networks if you have no experience with encoding and decoding binary files.
These programs are binary files themselves and we cannot send unencoded binary files. So we would already need the binary files in order to translate their encoded versions into binary versions. It is a "who is first, the hen or the egg" kind of situation. There are ways of solving these problems, but the solutions involve a nontrivial amount of technical knowledge and also depend very much on the circumstances of the PCs and mainframes involved. (For the more technically inclined: we could send the source files of the translation programs as text files, but then we have to be sure that the recipient has a compiler for the programming language we are using.) So quite often the easiest way of setting up an environment where file transfer is possible involves sending a disk with the UUENCODE/UUDECODE programs to the sites involved. Once the programs are available, file transfer can start.

Now let us look at what a UUENCODEd file looks like:

------- the file starts directly below this line ------------
begin 644 erich.com
MZV>0("`@("`@("`@("`@("`@4V\@>6]U(&UA;F%G960@=&\@555$14-/1$4@
M=&AE('1E<W0@9FEL92X-"B`@("`@("`@("`@("`@("`@("`@("`@0V]N9W)A
;='5L871I;VYS(2$-"B0:N@,!M`G-(;@`3,TA
`
end
------ the UUENCODED file ended just above this line -------

The first line always contains the word 'begin' starting in the first column. The next item is a number which you can safely ignore and the last item is the name of the file that was encoded. The last line of the encoded file consists of the word "end" starting in the first column and nothing else. Some encoding programs add a line containing size information about the encoded file, but this is not really necessary.

If you use the UUENCODing program on your PC, the encoded version of the file usually has the same first part of the file name as the file being encoded and the file extension .UUE. So encoding a program ERICH.COM would produce a file ERICH.UUE.
This file ERICH.UUE is the one that should be uploaded and sent using the mail facilities of the network. At the receiving site the mail file can be downloaded to the PC. The downloaded file usually looks similar to the following example:

---------------- this line is not part of the file -----------
Date: Sat 14 Jan 89 06:51:59-EST
From: John R. Somebody <SOMEBODY@SOMESITE>
Subject: File transfer demonstration
To: The catcher in the rye <CATCHRYE@MYSITE>

begin 644 erich.com
MZV>0("`@("`@("`@("`@("`@4V\@>6]U(&UA;F%G960@=&\@555$14-/1$4@
M=&AE('1E<W0@9FEL92X-"B`@("`@("`@("`@("`@("`@("`@("`@0V]N9W)A
;='5L871I;VYS(2$-"B0:N@,!M`G-(;@`3,TA
`
end

John R. Somebody 1/14/89 SOMEBODY@SOMESITE CATCHRYE@MYSITE 1/14/89 file transfer demonstration
--------- this line does not belong to the file any more ---

From this example it should be easy to see what the next step is: every line above the "begin" line and every line below the "end" line has to be removed. The remaining file then can be decoded using UUDECODE. If no additional problems occurred, the decoded program is identical with the binary program the sender wanted to send.

Now for possible difficulties: UUENCODEd files contain special characters like brackets. When you are reading a text file you usually can recognize the intended special character even if it has been changed in a file transfer process. But it is not possible to recognize changed characters in a UUENCODEd file. So you have to find out if all the characters arrived unchanged. For this you can use the method described at the beginning of this paper, namely sending a file with all characters together with a verbal description of the characters. All remarks from the earlier part of the paper apply. Inspecting such a file closely might help you to find out which characters were changed and into what, and with luck you can reverse this exchange process.
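
Removing the "envelope" by hand is easy, but it can also be automated. The following Python sketch shows one way to keep only the lines from "begin" to "end". The function name strip_envelope is hypothetical, and the sketch assumes that no line of the mail header or letter text itself starts with the word "begin" followed by a blank.

```python
def strip_envelope(mail_text: str) -> str:
    """Keep only the lines from the 'begin' line through the 'end' line.

    Assumption: the first line starting with 'begin ' really is the start
    of the encoded file and not part of the surrounding letter.
    """
    lines = mail_text.splitlines()

    start = None
    for i, line in enumerate(lines):
        if line.startswith("begin "):
            start = i
            break
    if start is None:
        raise ValueError("no 'begin' line found")

    for j in range(start + 1, len(lines)):
        if lines[j].rstrip() == "end":
            return "\n".join(lines[start:j + 1]) + "\n"
    raise ValueError("no 'end' line found")


# A tiny made-up mail file carrying the encoded three-byte file "Cat".
mail = (
    "Date: Sat 14 Jan 89 06:51:59-EST\n"
    "Subject: File transfer demonstration\n"
    "\n"
    "begin 644 cat.txt\n"
    "#0V%T\n"
    "`\n"
    "end\n"
    "\n"
    "John R. Somebody\n"
)
print(strip_envelope(mail))
```

The result can be saved to a file and handed to UUDECODE.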
The main problem with the UU scheme is that the set of characters being used contains special characters. So a variant of this method has been devised. It is called the XXENCODE/XXDECODE process. Essentially it functions like UUENCODE/UUDECODE, but the encoded file only contains letters, digits, and the plus and the minus sign. The advantage is that these characters usually are not changed when passed through different computers, so chances are higher that such a file will arrive unchanged. As with UUENCODE/UUDECODE you need the programs before you can start transmission of binary files. The XX scheme is relatively new, so usually it is easier to find programs for the UU scheme than for the XX scheme.

It is important to be aware of the fact that UUENCODEd and XXENCODEd files are more than 30 percent larger than the original file. This is the price we have to pay for better transportability.

There is one more important concept you should be aware of when transferring more than one file at a time and/or transferring big files. It is the concept of an archive. An archive essentially is one file created by pasting together and compressing one or more files. Usually when transferring a few files you use an archiving program which creates just one file out of a few files. This archived file also is smaller than all the "source" files together. In the archiving process you need two programs: the archiving program creating the archive and the dearchiving program reconstructing the original files.

The advantages of using archives are:

1) It is impossible to forget a file belonging to a set of files when transferring copies of an archive.

2) The amount of data to be transferred is smaller and therefore uses less disk space and less connect time for transferring it electronically.

So if you want to send a few files belonging together it is quite common to create an archive, then to send the archive and then have the recipient reconstruct the original files by dearchiving.
When you receive a file with file name extension ARC it is highly probable that it is an archive file. In this case the extension ARC denotes a special archiving (= pasting together and compressing) scheme. There is a new scheme around now which usually can be recognized by the file name extension ZIP. The 2 programs needed to be able to work with the ZIP scheme are PKZIP and PKUNZIP.

Let us look at an example of how to use this set of programs. Let us assume we want to send 3 files named FILEA.TXT, FILEB.DTA and FILEC.COM. If we execute the command line

  PKZIP ARCHIVE FILEA.TXT FILEB.DTA FILEC.COM

PKZIP will create a file ARCHIVE.ZIP. This file is our archive and contains all 3 "source" files in a condensed form. To reconstruct the original files we execute the command line

  PKUNZIP ARCHIVE

which will create the 3 original files FILEA.TXT, FILEB.DTA and FILEC.COM.

There are different programs around for the ARC variant of the process. ARC and ARCX are a pair performing essentially the same function as PKZIP and PKUNZIP; PKARC and PKXARC are another pair. There also is a program called LHARC which performs archiving and dearchiving functions with just one program. The difference is that PKZIP and PKUNZIP use the ZIP scheme whereas ARC, ARCX, PKARC and PKXARC use the ARC scheme and LHARC uses the LZH scheme. All these different schemes are incompatible.

If you want to create an LZH archive similar to the ZIP archive of the previous example you can do so with the following command:

  LHARC A ARCHIVE FILEA.TXT FILEB.DTA FILEC.COM

This will create a file ARCHIVE.LZH. Extracting the files from the archive is done with the following command:

  LHARC E ARCHIVE

There is a special variant of archive files, so-called self-extracting archives. In this special case the archive and the dearchiving program are pasted together. The result is an executable file (usually with extension EXE) which, when executed, reconstructs the original files contained in the archive.
It is not possible to recognize self-extracting archives from the file name extension, so you have to be told that a certain file is a self-extracting archive.

So we have met two important concepts:

  Encoding for creating "mailable" files
  Archiving for creating smaller files

It is quite common to combine these 2 processes. So if we want to send a set of files, first we create an archive containing all the files and then encode this archive. This hybrid product is sent via e-mail. The recipient first decodes the mail file into the archive file and then dearchives the archive into the original files. In this way we combine the advantages of compressing for reducing costs and of encoding to allow better transportability.

APPENDIX A: CHARACTER TABLE

Next is a list of all printable characters together with descriptions:

Characters of the ASCII table

      blank
  !   exclamation mark
  "   double quote
  #   number sign
  $   dollar sign
  %   percent sign
  &   ampersand
  '   (closing) single quote
  (   left parenthesis
  )   right parenthesis
  *   star
  +   plus
  ,   comma
  -   minus
  .   period
  /   slash
  digits 0123456789
  :   colon
  ;   semicolon
  <   less
  =   equal
  >   greater
  ?   question mark
  @   at-sign
  uppercase letters ABCDEFGHIJKLMNOPQRSTUVWXYZ
  [   left bracket
  \   backslash
  ]   right bracket
  ^   caret
  _   underscore
  `   left single quote
  lowercase letters abcdefghijklmnopqrstuvwxyz
  {   left curly brace
  |   vertical bar
  }   right curly brace
  ~   tilde

ASCII 127 is nonprintable

APPENDIX B: TECHNICAL DETAILS OF ENCODING AND DECODING

The rest of the paper is very technical, so you should read it only if you have some knowledge of the mathematics underlying the functioning of computers.

How do UUENCODE and UUDECODE work?

For UUENCODing, the bytes forming the file are grouped in groups of three. Every byte is an 8-bit binary number, so every group of three bytes is a 24-bit binary number. This number then is split into four groups of 6 bits each, i.e. into four 6-bit binary numbers. The 6-bit binary numbers give all decimal numbers from 0 to 63.
To every such 6-bit number 32 (decimal) is added, giving numbers in the range from 32 to 95. Every number then is replaced by the ASCII character associated with this value (32 becomes a blank, 33 becomes !, ... 95 becomes _ (an underscore)). So the translation process converts each group of 3 bytes into 4 printable characters.

Additionally, every group of 45 bytes (giving 60 characters) is grouped into a line in the file to be sent. Then a leading character is added to this line. The leading character is calculated by applying the encoding scheme we just discussed to the number of bytes represented by the line (45+32=77, so for a line representing 45 bytes the leading character is M, since M is ASCII character 77). Usually the last line is shorter and therefore the leading character of the last line also is different from M.

Finally, a first line containing "begin", a 3-digit number (giving access privileges on UNIX systems and meaningless on other systems) and the name of the original file is added, as is a last line containing the word "end".

The decoding program then mainly has to convert each group of 4 characters back into a group of three bytes (using the byte count given by the first character of each line for consistency checks).

There are some problems with this scheme. We already discussed the possibility of special characters being scrambled. Additionally, some "smart" mailing programs assume that trailing blanks always are unnecessary. Therefore they strip trailing blanks from every mail file. If it is only text you want to read you will not notice the difference. But a UUDECODing program will find out that the lines are too short (the first character of the line gives information about the line length!). There are different solutions for this problem.
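
To make the trailing-blank problem concrete, here is a small Python sketch of how one line is encoded in the basic scheme (a blank standing for the value 0). The function names uu_line and expected_length are hypothetical and the sample bytes are chosen only so that the line ends in a blank; this is an illustration of the scheme as described in this appendix, not the code of any particular UUENCODE program.

```python
def uu_line(chunk: bytes) -> str:
    """Encode up to 45 bytes as one line of the basic UU scheme."""
    if not 0 < len(chunk) <= 45:
        raise ValueError("a line represents 1 to 45 bytes")
    # Pad with zero bytes so the data splits into groups of three.
    padded = chunk + b"\0" * (-len(chunk) % 3)
    out = [chr(32 + len(chunk))]                # leading byte-count character
    for i in range(0, len(padded), 3):
        n = int.from_bytes(padded[i:i + 3], "big")   # one 24-bit number
        for shift in (18, 12, 6, 0):                 # four 6-bit numbers
            out.append(chr(32 + ((n >> shift) & 0x3F)))
    return "".join(out)


def expected_length(line: str) -> int:
    """Line length implied by the leading byte-count character."""
    count = ord(line[0]) - 32
    return 1 + 4 * ((count + 2) // 3)


line = uu_line(b"ab@")            # the last 6-bit number is 0, i.e. a blank
stripped = line.rstrip(" ")       # what a "smart" mailer would pass on

print(repr(line), expected_length(line))
print(repr(stripped), len(stripped))
```

The stripped line is one character shorter than its leading character promises, which is exactly what a decoding program stumbles over.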
1) Replace blanks by ` (the single opening quote having ASCII value 32+64=96).

2) Add an additional nonblank character at the end of each line.

3) Make the decoding program smart enough to produce the missing blanks by itself.

None of these solutions is standardized, so if you have trouble when decoding you have to analyze the encoded file carefully. Solution number 2 usually works better than the two other solutions. So you should try to get an encoding program adding that additional character. Using an editor also makes it possible to transform the different "extended" formats of UUENCODEd files into one another.

How do XXENCODE and XXDECODE work?

XXENCODE uses the same splitting technique as the UU scheme (3 bytes into four 6-bit binary numbers). Then every such number is converted into a character according to the following sequence:

  +-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

So (decimal) 0 becomes +, 1 becomes -, (number) 2 becomes (character) 0, ... 63 becomes z.

The mechanism for adding byte counts to lines is identical to the UU scheme, with the difference that the numbers again are coded according to the above sequence of letters, digits, + and -. So it even is possible to convert UUENCODEd files into XXENCODEd files using the replace feature of a text editor.

ACKNOWLEDGEMENTS

The author wishes to thank Ted Werntz whose comments and suggestions helped enormously to improve the paper.