Effective use of high performance computing (HPC) systems can be a daunting task. Users must deal with an array of constantly changing hardware fronted by diverse operating systems. The tools for similar tasks (job queuing, for example) can vary from system to system, even when the operating systems (UNIX variants) are nominally the same. One barrier to using HPC systems can be overcome by providing users with a single, easy-to-use interface that insulates them from direct contact with the operating systems, tools, and applications on the HPC systems. WebSubmit, a browser-based gateway to remote applications, is one way to do this. It provides a friendly, system-neutral environment in which trusted users can access application software on HPC systems.
Trusted is the key word here. The Web is used mainly to transmit documents and images; the introduction of Java has made it also a way to distribute client-side executables safely. Apart from this, programs can be run on remote Web server systems using the Common Gateway Interface (CGI). CGI programming tasks have been restricted to those that can be accomplished anonymously. They are executed as the server user and have only the limited privileges of this account. In most cases, this is appropriate; allowing random users to execute any command on the server system would be giving away the keys to the store. Still, it would be nice to be able to identify valid, trusted users and give them the same privileges they would get with a regular login.
The contribution of WebSubmit is that it provides a novel framework for establishing just this sort of trust relationship in a CGI environment. In this way, WebSubmit adds a telnet-like functionality to the ftp-like functionality of the existing Web. The client-side execution facility of Java is supplemented by a remote execution facility that can run user-owned jobs on existing, unmodified legacy systems, including HPC systems, the application discussed here. The familiar, pleasant user interface used in Web browsing is extended from document retrieval to remote execution of user programs using their own files.
The primary goal of the WebSubmit project is to provide users with seamless access to a collection of HPC resources. The ideal has been to create an environment in which, from the user's viewpoint, the distributed nature and heterogeneity of the resources disappear. The system is anticipated to have several impacts on the user community: HPC resources should be accessible to a wider class of users; customized execution environments should simplify and speed tasks; and users should be insulated from changes in operating or queuing systems associated with various HPC resources. WebSubmit is not intended to be a distributed computing system, although it is extensible in that direction. In this sense, the scope of WebSubmit is not as large as that of metacomputing projects that create and provide access to a distributed computer.
At present, WebSubmit provides access to batch queues and to a range of interactive utilities, including command execution, file editing, and file transfer, on several HPC systems at the National Institute of Standards and Technology (NIST). The currently supported systems are an IBM SP2 running LoadLeveler, two SGI Origin 2000 systems running SGI's Network Queuing System (NQS), and a Linux-based Pentium array running the Load Sharing Facility (LSF). We hope it will be obvious that WebSubmit is not limited to these HPC systems or to HPC applications generally, or, indeed, to any particular kind of application. This bulletin provides a general discussion of WebSubmit and a detailed analysis of its security infrastructure.
An Overview of WebSubmit
The WebSubmit Transaction Model
The WebSubmit server is configured to interact with a specific group of one or more target systems (hereafter referred to as the WebSubmit cluster) specified by the WebSubmit administrator. For security reasons, a particular WebSubmit server can interact only with systems within its configured cluster.
The user uses a Web browser on the client system to obtain a secure connection with the WebSubmit server's master page, then follows a link on that page to the application module page for the task of interest. The module page is the user interface to a task. It is an HTML form that the user completes and submits to the WebSubmit server. Modules can be in generic format, requiring the user to specify the target system in the form, or in specific format, restricted to a particular target.
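The distinction between generic and specific modules, together with the rule that a server talks only to its configured cluster, can be sketched as follows. This is a minimal illustration and not WebSubmit's actual code: the function name `resolve_target`, the dictionary layout of a module, and the form field name `target` are all assumptions made for the example.

```python
def resolve_target(module, form, cluster):
    """Pick the target system for a submitted module form (hypothetical helper).

    module  -- dict describing the module; a non-None "target" entry marks
               a specific-format module bound to one system (assumed layout)
    form    -- dict of submitted form fields; generic-format modules read
               the target from the "target" field (assumed field name)
    cluster -- set of host names the administrator configured for this server
    """
    if module.get("target") is not None:
        # Specific format: the module itself fixes the target system.
        target = module["target"]
    else:
        # Generic format: the user names the target system in the form.
        target = form.get("target")

    # The server may interact only with systems inside its configured cluster.
    if target not in cluster:
        raise ValueError("target %r is not in the WebSubmit cluster" % (target,))
    return target
```

Checking cluster membership in one place means that neither a hand-edited form nor a misconfigured module can direct the server to a system outside the cluster.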
The WebSubmit server processes the submitted form, performing any target-side error checking of the input data, and executes the specified task on the proper target system. Execution may consist of submitting a job to the job queue on the target or of running a command script. Output from whichever is the case is then returned to the user's browser for viewing. If the task is a job queue submission, the output returned is that produced by the act of submitting the job, not the final output of the job itself. As we have formulated our interface, it is up to the user to keep track of the progress of the job and to retrieve the final output and direct it to subsequent jobs.
Primary Software Features
Authentication and Security
WebSubmit utilizes a combination of existing secure protocols to accomplish authentication and to allow users to execute commands on remote systems. The basic transaction begins when the client requests access to remote resources with their browser. The client provides authentication to the server, and the server then propagates this authentication to connect to the remote resource. This authentication process can be broken into three stages: client-to-server authentication, identity establishment and authentication translation, and server-to-remote execution of client requests.
The client provides authentication to the browser once at the outset of a session (usually by giving a password for a local certificate database). This single authentication then offers access to any one of the remote resources on which the client has privileges. This is in distinct contrast to the standard model of login-password authentication with applications such as telnet, ftp, and rlogin. In these systems, a login and password are normally presented for each resource accessed. In the present scheme, the server is responsible for establishing the client's identity, translating this identity into a login name on the remote system, and then executing the client's request. A single password usually suffices to access all systems, with security superior to that of login-password authentication. At present, it does not appear that any other systems use this novel form of authentication.
Client authentication based on public-key cryptography can be implemented using a Web server that implements the Secure Sockets Layer (SSL) protocol. This protocol allows for strong authentication (superior to traditional methods) and also provides data encryption over the duration of the session. It has become the de facto standard for secure communication on the Internet and is in the process of being upgraded to an Internet standard (TLS - Transport Layer Security). Finally, all recent versions of the two dominant Web browsers support SSL. Based on these facts, SSL-based client authentication was chosen to perform the Client-to-Server stage of the authentication process.
SSL and Digital Certificates
Digital certificates are basically containers for public keys, and they act as a means of electronic identification. The certificate and public key are public documents that, in principle, anyone can possess. An associated private key, possessed only by the entity to whom the certificate was issued, is used as a means of binding the certificate to that entity. Users not in possession of this private key cannot use the certificate as a means of authentication. Entities can prove their possession of the private key by digitally signing known data or by demonstrating knowledge of a secret exchanged using public-key cryptographic methods.
In practice, anyone can generate public-private key pairs and digital certificates; consequently, it is necessary to determine whether the holder of a certificate is to be trusted. History has demonstrated that trusting clients is often ill advised, and centralizing trust simplifies matters greatly. Hence, a trusted-third-party model is utilized with digital certificates. The trusted third party used in the realm of digital certificates is a Certificate Authority (CA). A CA can either issue certificates using public keys provided by clients, or it can generate a public-private key pair for the client and then issue the certificate along with the key pair. In either case, the client must demonstrate their identity to the CA by some trusted means. For example, the client could arrange a face-to-face meeting with the CA and present proof of identity. The CA can then issue a certificate with its digital signature that contains this client's public key, as well as information about the identity of the client. This digital signature can be verified by people who have the public key of the CA, thus establishing the chain of trust from client to CA to server.
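The signing and verification step at the heart of this chain of trust can be illustrated with a deliberately tiny, insecure example. It is a toy sketch of textbook RSA signatures and not how WebSubmit, SSL, or any real CA works in detail: the key is built from two small Mersenne primes, there is no padding, and the "certificate body" is just a byte string. All names (`ca_sign`, `verify`, `CA_N`, and so on) are hypothetical.

```python
import hashlib

# Toy CA key pair. The primes are far too small for real use; they are
# chosen only so the arithmetic is easy to follow.
P, Q = 2**31 - 1, 2**61 - 1
CA_N = P * Q                              # public modulus
CA_E = 65537                              # public exponent
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))   # private exponent (Python 3.8+)


def digest_as_int(data, modulus):
    """Hash the data and reduce the digest to an integer below the modulus."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % modulus


def ca_sign(certificate_body):
    """CA side: sign the certificate body with the CA's private exponent."""
    return pow(digest_as_int(certificate_body, CA_N), CA_D, CA_N)


def verify(certificate_body, signature):
    """Verifier side: check the signature using only the CA's public key."""
    return pow(signature, CA_E, CA_N) == digest_as_int(certificate_body, CA_N)
```

Anyone holding the CA's public values (`CA_N`, `CA_E`) can run `verify`, but only the holder of `CA_D` can produce a signature that passes, which is what lets a server trust a certificate it has never seen before.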
Establishing a Secure
Identity Establishment and Authentication Translation
The certificate contains information about the client's identity (e.g., name, organization, e-mail), but this information may not necessarily be unique. One would like to construct a userID that is based not only on this information, but also on the public key of the client. One simple solution is to require clients to possess specially formatted certificates that contain information about their userID on the system. This does not correlate the userID and public-key, however, and creates logistical difficulties with issuing certificates in the required format. The entire certificate itself cannot be used, since this would be cumbersome, but an alternative is to construct a fingerprint (message digest) unique to a given certificate. Fortunately, cryptographers and mathematicians have devised and analyzed one-way (or hash) functions that accomplish precisely this task.
Message digests are used widely in cryptography to verify digital signatures and to ensure data integrity. A hash function is a many-to-one function that takes an arbitrary-length input message and constructs a fixed-length output digest or hash h = H(M). In the present context, a unique userID is determined by constructing the hash of the client's certificate using a trusted algorithm (SHA-1 or MD5, for example). In order for the userID to be unique, one must have reasonable certainty that another client's certificate will not hash to the same value. This requirement is satisfied as long as the hash function is sufficiently collision-resistant. In order to determine the userID in a Web environment, code on the server must have access to the client's certificate. This can be accomplished by directing the Web server to place the client's certificate in the environment when needed. Server software constructs a hash of the certificate, at which point the hash (userID) can be used for authentication translation.
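A minimal sketch of deriving the userID h = H(M) from a certificate might look like the following. It assumes the Web server exports the client certificate in PEM form through an environment variable; the name `SSL_CLIENT_CERT` follows the Apache mod_ssl convention and is an assumption here, as are the function names. SHA-1 is used because the text names it; the algorithm is a parameter so another digest can be substituted.

```python
import base64
import hashlib
import os

PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"


def pem_to_der(pem_text):
    """Strip the PEM armor and decode the base64 body to raw DER bytes."""
    body = pem_text.strip()
    if not (body.startswith(PEM_HEADER) and body.endswith(PEM_FOOTER)):
        raise ValueError("not a PEM-encoded certificate")
    b64 = body[len(PEM_HEADER):-len(PEM_FOOTER)]
    return base64.b64decode("".join(b64.split()))


def user_id_from_certificate(pem_text, algorithm="sha1"):
    """Return the hex digest of the certificate, used as the userID."""
    return hashlib.new(algorithm, pem_to_der(pem_text)).hexdigest()


def user_id_from_environment(environ=None, variable="SSL_CLIENT_CERT"):
    """Read the certificate the Web server placed in the environment.

    The variable name is an assumption; returns None when no client
    certificate was presented.
    """
    environ = os.environ if environ is None else environ
    pem_text = environ.get(variable)
    if not pem_text:
        return None
    return user_id_from_certificate(pem_text)
```

Hashing the decoded DER bytes rather than the PEM text keeps the userID stable even if line wrapping or whitespace in the exported certificate changes.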
When a registered client makes a request to access a remote system, the user's active status is first verified. If they are not active within the system, they are not allowed access to resources. This essentially amounts to the possibility for revocation of access privileges, in addition to those provided by the client certificate's validity period and any CA revocation lists in use. Once the user's active status is verified, the userID-remote host combination is used to index into the database, which determines the login of the user on the remote system. At this point, the request can be propagated to the remote system by the server software.
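The lookup described above can be sketched with an in-memory registry standing in for the database. The record layout, the `AccessDenied` exception, and the function name `translate_identity` are hypothetical; the text does not specify how WebSubmit stores this information.

```python
from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Raised when a request may not be propagated to the remote system."""


@dataclass
class UserRecord:
    active: bool                                 # False revokes all access
    logins: dict = field(default_factory=dict)   # remote host -> login name


def translate_identity(registry, user_id, remote_host):
    """Map a certificate-derived userID to a login on the remote host."""
    record = registry.get(user_id)
    if record is None:
        raise AccessDenied("certificate is not registered")
    # Active status is verified first, so access can be revoked
    # independently of the certificate's own validity period.
    if not record.active:
        raise AccessDenied("user is not active")
    # The userID and remote host together index the login name.
    if remote_host not in record.logins:
        raise AccessDenied("no login for this user on %s" % remote_host)
    return record.logins[remote_host]
```

For example, with `registry = {"3f2a...": UserRecord(True, {"sp2": "jdoe"})}` (placeholder values), a request from that userID for host `sp2` would be mapped to the login `jdoe`, while the same request would be refused once `active` is set to `False`.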
SSH has grown in popularity since its introduction and is being considered for an Internet standard. The software has been ported to a wide variety of UNIX platforms; both commercial and non-commercial versions are available. SSH has several features that make it attractive in the present context: strong authentication methods that prevent identity spoofing, Trojan horses, and similar means of attack; encryption and compression of data; and a secure means of file transfer. These qualities precisely meet the needs of the problem being addressed; hence, SSH was chosen as the means to execute commands on the remote system. SSH uses a hybrid cryptosystem similar to that of SSL: a shared secret is exchanged using public-key cryptography, and data is then encrypted using a symmetric cipher based on the shared secret. Server authentication is performed using public-key cryptographic methods, whereas several possibilities are provided for client authentication. In the present approach, secure host-based authentication (called RhostsRSAAuthentication) is used, since this allows the Web server proxy to execute commands on the remote systems as the user, without the need for password exchanges.
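The final stage, running a command on the remote system as the translated login, might be sketched as below. This is an illustration under stated assumptions, not WebSubmit's implementation: it assumes an `ssh` client on the server's path, assumes host-based authentication has already been configured between the server and the target, and uses the OpenSSH option `BatchMode=yes` so that ssh fails rather than prompting for a password. The function names are hypothetical.

```python
import shlex
import subprocess


def build_ssh_argv(login, host, command_args):
    """Build the ssh command line that runs command_args as login on host."""
    # ssh hands the remote command to the remote shell as one string,
    # so each argument is quoted before being joined.
    remote_command = shlex.join(command_args)   # Python 3.8+
    return ["ssh", "-o", "BatchMode=yes", "-l", login, host, remote_command]


def run_as_user(login, host, command_args, timeout=60):
    """Run the command remotely and return (exit status, stdout, stderr)."""
    result = subprocess.run(
        build_ssh_argv(login, host, command_args),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr
```

Passing the arguments as a list and quoting them before they reach the remote shell keeps form input, such as a file name containing spaces, from being split or interpreted as extra shell syntax.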
Remote System Policy
WebSubmit is a flexible, modular framework for accessing and using remote computing resources across the Web. Though it has been developed at NIST for use as an interface to high performance computing systems, it is certainly not limited to this field of endeavor. The system impacts the user community by making resources more accessible, simplifying and speeding task execution, and insulating users from changes in the way remote resources are controlled. WebSubmit should be useful in any circumstance where a user community needs authenticated individual access to applications on remote systems (assuming a certification authority is available). It is designed to be portable and can be installed at most sites with a minimum of effort. It can support an existing body of CGI code, as well as providing a framework for developing new applications. The security framework implemented in WebSubmit is novel and robust; it provides both strong user authentication and data encryption, although it produces some policy issues that may need to be addressed before it can be adopted. In summary, WebSubmit extends the basic conception of the Web as a data archive and retrieval system to one of a general computing environment.
For More Information