Sun RPC is an important implementation of a Remote Procedure Call mechanism, because it is included in almost all UNIX versions. This paper uses the construction of a sample Sun RPC program to examine issues of transparency and performance in Sun RPC.
In this paper I shall assess Sun RPC in terms of how well it provides a transparent remote procedure call mechanism, and how much using Sun RPC affects performance compared with using lower-level network programming primitives.
Having written a random number generating function and a program that uses it, I altered the program so that it would access the random number function over the network. Two such versions of the server and client were produced, one that uses Sun RPC, and another that uses the BSD socket interface to TCP (Stevens, 1990). Thus a comparison can be made in terms of transparency in the programmer's interface (going from local procedure call to RPC), and ease of programming (RPC versus socket programming).
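To give a feel for what the socket version has to do by hand, here is a minimal sketch in Python rather than the C of the original programs. It is not the author's code: the wire format (a 4-byte count request answered by that many 4-byte integers), the function names, and the use of socket.socketpair() in place of a real TCP connection are all assumptions made for illustration.

```python
import random
import socket
import struct
import threading


def recv_exact(sock, nbytes):
    """Read exactly nbytes from a stream socket, or raise if the peer closes."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def serve(sock):
    """Hypothetical server loop: answer requests until the client closes."""
    rng = random.Random(1)
    try:
        while True:
            (count,) = struct.unpack(">I", recv_exact(sock, 4))
            numbers = [rng.getrandbits(31) for _ in range(count)]
            sock.sendall(struct.pack(">%dI" % count, *numbers))
    except ConnectionError:
        pass  # client has finished
    finally:
        sock.close()


def remote_random(sock, count):
    """Hand-written client stub: marshal the request, unmarshal the reply."""
    sock.sendall(struct.pack(">I", count))
    reply = recv_exact(sock, 4 * count)
    return list(struct.unpack(">%dI" % count, reply))


# socketpair() stands in for a connected TCP socket in this sketch.
client_sock, server_sock = socket.socketpair()
worker = threading.Thread(target=serve, args=(server_sock,))
worker.start()
numbers = remote_random(client_sock, 8)
client_sock.close()
worker.join()
print(numbers)
```

Everything in recv_exact and remote_random is work that the stubs generated for the RPC version would otherwise take on, which is the ease-of-programming comparison being made here.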
The source code for these examples can be obtained from auug.org.au's ftp site.
At its simplest, XDR describes issues such as byte ordering for integers[1] and the format for floating point numbers.[2] Character strings are sent with an ASCII encoding.[3] The standard goes on to describe how arrays, variable length arrays, structures and unions can be encoded. Although there is no direct support for pointers (as noted in Tanenbaum (1992:425), pointers only really make sense in the address space of the caller), XDR does support encoding for the recursive data structures they are often used to construct, such as linked lists or trees. This uses the "Optional-data" construct, and the syntax for describing it looks like the C syntax for pointers. This construct, however, does not support data structures of arbitrary complexity; for example, it is not trivial to encode doubly linked lists.
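The encodings described above can be sketched in a few lines of Python. This is an illustration of the wire format only, not the Sun library: the function names are invented here, and the sketch assumes 4-byte big-endian integers, strings padded to a multiple of four bytes, and a linked list of integers encoded with the Optional-data construct as a "present" flag before each node and a zero flag at the end.

```python
import struct


def xdr_int(value):
    """A signed integer is four bytes, most significant byte first."""
    return struct.pack(">i", value)


def xdr_string(text):
    """A string is a 4-byte length, the ASCII bytes, then zero padding
    up to the next multiple of four bytes."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def xdr_int_array(values):
    """A variable length array is an element count followed by the elements."""
    return struct.pack(">I", len(values)) + b"".join(xdr_int(v) for v in values)


def xdr_int_list(values):
    """A singly linked list via Optional-data: flag 1 before each node,
    flag 0 where the C structure would hold a null pointer."""
    out = b""
    for v in values:
        out += struct.pack(">I", 1) + xdr_int(v)
    return out + struct.pack(">I", 0)


print(xdr_int(1).hex())
print(xdr_string("rpc").hex())
print(xdr_int_array([1, 2]).hex())
print(xdr_int_list([1, 2]).hex())
```

The list encoding works because each node has exactly one successor. A doubly linked list has no such linear form, which is why it does not map onto this construct directly.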
XDR is an implicit encoding, that is, the encoded byte stream contains no explicit data typing information. In the context of remote procedure calls this makes sense, as the lack of type information reduces the overhead of the protocol, and the procedure caller and the procedure being called should agree on the number and type of arguments, so that there is no ambiguity in how the byte stream should be interpreted.
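A small Python illustration of what implicit encoding means in practice: the same four bytes decode to different values depending on the type the receiver assumes, and nothing in the stream says which is right. The variable names are illustrative only.

```python
import struct

# The sender encodes a single-precision float.
payload = struct.pack(">f", 1.0)

# A receiver that agrees on the type recovers the value.
as_float = struct.unpack(">f", payload)[0]

# A receiver that wrongly expects an integer gets a valid-looking
# but meaningless number; the byte stream cannot flag the mistake.
as_int = struct.unpack(">i", payload)[0]

print(payload.hex(), as_float, as_int)
```

This is why the interface description shared by client and server carries the whole burden of type agreement.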
RPC (Sun, 1988) extends the XDR protocol and language by adding messages that represent procedure calls and replies. Individual procedures are identified by three numbers: the program number, used to identify a service; the version number, so that changes to the interface for a particular service can be recognised; and a procedure number identifying the required procedure in that service.
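As a rough sketch of how the three identifying numbers appear on the wire, the following Python builds the header of a call message with null authentication, following the field order given in the protocol specification (Sun, 1988). The program, version and procedure numbers used in the example are hypothetical and are not the ones from the sample program.

```python
import struct

CALL = 0          # message type: call (a reply is 1)
RPC_VERSION = 2   # version of the RPC protocol itself
AUTH_NULL = 0     # authentication flavour carrying no data


def rpc_call_header(xid, prog, vers, proc):
    """Transaction id, message type, RPC version, then the program,
    version and procedure numbers, then empty credentials and an empty
    verifier (each a flavour followed by a zero length)."""
    return struct.pack(
        ">10I",
        xid, CALL, RPC_VERSION,
        prog, vers, proc,
        AUTH_NULL, 0,   # credentials
        AUTH_NULL, 0,   # verifier
    )


# Hypothetical service: program 0x20000001, version 1, procedure 1.
header = rpc_call_header(xid=1, prog=0x20000001, vers=1, proc=1)
print(len(header), header.hex())
```

The XDR-encoded arguments of the procedure follow this header directly.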
The RPC language allows a programmer to define the interface to the service offered by assigning program, version and procedure numbers and describing the arguments and results for each procedure.
The RPC protocol provides messages for procedure calls and server replies, and includes the option of an authentication mechanism. There are four authentication mechanisms defined in the protocol standard (Null, UNIX, Short and DES), but the protocol is open ended so other mechanisms can be defined.
RPC can be implemented over any transport layer protocol, but the lower protocol may affect the semantics of RPC, for example by imposing a maximum message size.
In my example, the RPC description of the random number service is shown in the file random.x.
It is instructive for the programmer to examine the generated header file, as the data structures that rpcgen generates from the XDR description may not be exactly what the programmer expected. For example, look at random.h.
A standard piece of binding software supplied with Sun RPC is the portmapper. The portmapper is itself an RPC service (program number 100000, version 2) that listens on a well known port (port 111 for TCP and UDP). When RPC server programs start, they register their existence with the portmapper. Client programs tell the portmapper what program number, version and protocol (TCP or UDP) they want, and the portmapper replies with the port number that the server is listening on. The client can then use the protocol specified to contact that port.
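The registration and lookup steps can be modelled with a toy, in-memory stand-in for the portmapper. This is a sketch of the idea only: the class and method names are invented here, no network traffic is involved, and the program number and port in the example are hypothetical. It assumes that a lookup for an unregistered service yields port 0 and that a second registration for the same program, version and protocol is refused.

```python
IPPROTO_TCP = 6
IPPROTO_UDP = 17


class ToyPortmapper:
    """Maps (program, version, protocol) to the port a server listens on."""

    def __init__(self):
        self._ports = {}

    def set(self, prog, vers, prot, port):
        """Called by a server at start-up. Refuses to overwrite a mapping."""
        key = (prog, vers, prot)
        if key in self._ports:
            return False
        self._ports[key] = port
        return True

    def getport(self, prog, vers, prot):
        """Called by a client. Port 0 means the service is not registered."""
        return self._ports.get((prog, vers, prot), 0)


pmap = ToyPortmapper()
pmap.set(0x20000001, 1, IPPROTO_TCP, 40000)        # server registers
print(pmap.getport(0x20000001, 1, IPPROTO_TCP))    # client finds the port
print(pmap.getport(0x20000001, 1, IPPROTO_UDP))    # not offered over UDP
```

Note that the lookup key includes the protocol, so a server that wants to be reachable over both TCP and UDP has to register twice.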
On the server side, it was sufficient to wrap the service procedure with another function to interface it to the server stub generated by rpcgen. This wrapper is shown in rpc_server_wrap.c. On the client side, however, some more significant changes were required because RPC needs some initialisation before the service can be called. These changes are highlighted with "#ifdef RPC_CLIENT" in rpc_client.c.
It should really be the system's responsibility to ensure unique numbers are assigned, if numbers are used at all. In the ANSA system for example, a service registers its properties with a trader, and clients look for servers with the required properties.
If UDP is chosen, then the total size of arguments and the total size of results must each be less than the maximum UDP packet size, which in most implementations is 8192 bytes. TCP has no size restriction on parameters or results.
The RPC library supports a broadcast procedure call, so any server on the local network can reply, but only for RPC over UDP.
Of course, remote procedures cannot have side effects on the caller: they cannot access the caller's global variables.
Note that the representation chosen in XDR mainly suits Sun machines, so that XDR is inefficient when used to transmit data between two machines that, for example, both use little endian integers.
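The cost being described is a pair of byte swaps that cancel out. A minimal Python illustration, using explicit byte orders so the result does not depend on the machine running it:

```python
import struct

value = 0x01020304

# How a little-endian machine holds the integer in memory.
in_memory = struct.pack("<I", value)

# What XDR requires on the wire: most significant byte first.
on_wire = struct.pack(">I", value)

# The sender reverses the bytes to produce the wire form...
assert on_wire == in_memory[::-1]

# ...and a little-endian receiver reverses them again, ending up
# with exactly the bytes the sender started with.
received = on_wire[::-1]
print(in_memory.hex(), on_wire.hex(), received.hex())
```

Between two big-endian machines neither conversion is needed, which is the sense in which the representation favours them.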
In Sun RPC this is accomplished by calling the portmapper service. The RPC library contains functions that will do a lookup on the portmapper when the client is initialised, so the client can then send its RPC requests to the appropriate TCP or UDP port.
Note, however, that the caller must specify which machine it wants to connect to in order to find the service. If it is not known in advance which machine the service will be on, then the programmer must include code to send a broadcast RPC to the portmapper on all machines.[5]
If TCP is used as the transport protocol, then the client will get an error return if the server breaks its connection (crashes). The client has no way of knowing if the crash occurred before or after the request was executed. The TCP protocol handles time-outs and retransmissions.
If the UDP protocol is used, then the client stub automatically provides timeout and retransmission (the client programmer can set the value of the timeout). The RPC protocol includes a transaction identifier, and the server stub can be instructed to remember responses sent and to send a cached response again if it receives a request it has seen before. This implements an at-most-once semantics in place of the default at-least-once semantics.
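The effect of the response cache can be sketched as follows. This is a simplified model in Python, not the Sun server stub: the class name is invented, the cache is keyed on the client and the transaction identifier, and it grows without bound, which a real implementation would have to avoid.

```python
class CachingServer:
    """Re-sends the cached reply for a request it has already executed."""

    def __init__(self, procedure):
        self._procedure = procedure
        self._replies = {}      # (client, xid) -> reply already sent
        self.executions = 0     # how often the procedure really ran

    def handle(self, client, xid, argument):
        key = (client, xid)
        if key in self._replies:
            # Retransmission of a request we have seen: do not run the
            # procedure again, just repeat the earlier reply.
            return self._replies[key]
        self.executions += 1
        reply = self._procedure(argument)
        self._replies[key] = reply
        return reply


# A procedure that is not idempotent: each execution changes state.
balance = {"value": 100}


def withdraw(amount):
    balance["value"] -= amount
    return balance["value"]


server = CachingServer(withdraw)
first = server.handle("client-a", xid=7, argument=10)
retry = server.handle("client-a", xid=7, argument=10)   # client timed out
print(first, retry, server.executions)
```

Without the cache the retransmitted request would withdraw the amount twice, which is the at-least-once behaviour the default gives.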
The Sun RPC implementation is not resilient to failures of the portmapper. If the portmapper fails, new servers will not be able to register themselves, and clients will not be able to find running servers, so all RPCs will time out unsatisfied. In practice, even if the portmapper were restarted, all running servers would have to be restarted so they could register themselves with the new portmapper, so the only way to recover properly is to reboot the system.
To assure clients and servers that they are talking to authorised servers and clients, Sun RPC includes an authentication mechanism. The client sends its credentials and a verifier to the server with its RPC call, and the server returns its own verifier with the results. The standard authentication methods provided by the library are Null, UNIX, Short[7] and DES, but it is easy to add new methods.
Using the authentication mechanisms is not transparent; it requires some extra programming on the client and server sides.
In addition, the portmapper introduces security risks as it can be used, in certain circumstances, to spoof the origin of a request (Cheswick, 1994:35). It is recommended that firewalls should block packets sent to the portmapper.
The tests were run at night, so both machines and the network had no other significant load. This helps to reduce the variance between runs. Each run of the program requested 1,048,576 random numbers in total, meaning 4 Mbytes of data was transferred between the server and the client. Each program was run ten times, with requests for blocks of 32, 128, 512, 1024 and 4096 bytes of random numbers to be sent, and the execution time for each of these was averaged. The results of this experiment appear in table 1 below. Since 4 Mbytes of useful data were transferred in every case, we can work out effective transfer rates in kilobytes per second; these are shown in table 2.
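The arithmetic behind the two tables can be written out as a short Python sketch. It assumes each random number occupies four bytes, as the 4 Mbyte total implies, and that a kilobyte is 1024 bytes. The elapsed time in the example is a made-up figure used only to show the calculation, not a measured result.

```python
TOTAL_NUMBERS = 1_048_576
BYTES_PER_NUMBER = 4
TOTAL_BYTES = TOTAL_NUMBERS * BYTES_PER_NUMBER
BLOCK_SIZES = [32, 128, 512, 1024, 4096]   # bytes requested per call


def requests_needed(block_bytes):
    """Number of calls needed to move the whole 4 Mbytes."""
    return TOTAL_BYTES // block_bytes


def transfer_rate_kb_per_s(elapsed_seconds):
    """Effective rate for one run, in kilobytes per second."""
    return (TOTAL_BYTES / 1024) / elapsed_seconds


for block in BLOCK_SIZES:
    print(block, requests_needed(block))

# Hypothetical elapsed time of 10 seconds, for illustration only.
print(transfer_rate_kb_per_s(10.0))
```

The number of calls falls by a factor of 128 between the smallest and largest block sizes, so per-call overhead dominates the small-block runs.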
These results are illustrated graphically in figure 2. Figure 3 compares the transfer rates for TCP and Sun RPC.
As can be seen from figure 3, using Sun RPC gives approximately a 50% penalty in data transfer rates. This runs contrary to expectation, as the RPC and XDR protocols should not introduce that much extra traffic, and there should not be excessive amounts of processing on either side to convert formats, so I would expect the transfer rates for TCP and RPC to converge for large message sizes.
An examination of the RPC implementation is required to explain this, and this goes beyond the time I can devote to this paper. However, profiling suggests that the extra time taken by the RPC version is due to the number of calls made to XDR routines to encode and decode the integers sent.
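The shape of that overhead can be illustrated, though not measured, with a Python sketch that encodes the same block of integers once with a call per integer and once with a single bulk call. The function names are invented, and the timings it prints say nothing about the Sun library itself; the point is only that both approaches produce identical bytes while doing very different numbers of calls.

```python
import struct
import timeit


def encode_per_integer(values):
    """One encoding call per integer, as the profile suggests RPC does."""
    out = bytearray()
    for v in values:
        out += struct.pack(">I", v)
    return bytes(out)


def encode_bulk(values):
    """The whole block converted in a single call."""
    return struct.pack(">%dI" % len(values), *values)


values = list(range(1024))   # a 4096-byte block of integers

per_integer_time = timeit.timeit(lambda: encode_per_integer(values), number=200)
bulk_time = timeit.timeit(lambda: encode_bulk(values), number=200)
print("per integer: %.4f s, bulk: %.4f s" % (per_integer_time, bulk_time))
```

If the profile is right, a per-call cost of this kind would not shrink as blocks get larger, which would be consistent with the transfer rates failing to converge.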
Due to the performance penalty, Sun RPC should not be used in applications that require large amounts of data transfer between clients and servers.
[2] IEEE Standard 754-1985, another coincidence?
[3] You get the message. :-)
[4] Shared libraries are the exception to this. The functions in the shared library are bound at run time. However, you can trust the contents of a shared library as much as you trust the binary you are running, as they are just as easy to replace. You may not have control over the remote server that your RPCs bind to.
[5] Although the portmapper does have a procedure to automatically handle broadcast RPC (it looks up the server the RPC is destined for and sends it to that server itself if the server is registered, and drops the request if it is not) this only works for RPC over UDP. In the general case, the programmer must do a broadcast RPC to find which machine can provide the service, then initialise the client to talk to that machine.
[6] In fact, many classic attacks on networks of Sun systems relied on providing a false RPC service.
[7] Short is a shorthand version of UNIX, used as a shortcut when the client has already identified itself using UNIX authentication.