Networking Working Group                                  Wengyik Yeong
INTERNET-DRAFT                        Performance Systems International
                                                          November 1991


             Representing Public Archives in the Directory


Status of this Memo

   This draft document will be submitted to the RFC Editor as an
   informational document.  Distribution of this memo is unlimited.
   Please send comments to the author, or the discussion group.

Abstract

   The proliferation of publicly accessible archives in the Internet
   has created an ever-widening gap between the fact of the existence
   of such archives, and knowledge about the existence and contents of
   these archives in the user community.  Related to this problem is
   the problem of also providing users with the necessary information
   on the mechanisms available to retrieve such archives.  In order for
   the Internet user community to better avail themselves of this class
   of resources, there is a need for these gaps in knowledge to be
   bridged.

   Given the richness of its existing information framework, it is
   interesting to experiment with the use of the Directory as a
   solution to these information dissemination problems.  This document
   specifies the schema necessary to provide the infrastructure for
   such experimental efforts.

   Note that it is the purpose of this schema to provide for the
   representation of publicly available archives of the most generic
   nature in the Directory.  It is explicitly beyond the scope of this
   document to provide the necessary support for the representation of
   specialized archives with more than a minimum of structuring
   information.  The representation of specialized archives such as
   document repositories or mail archives is left to other schema
   specifications which will emerge as the problems of information
   access are better understood.

Information Model

   At the highest level, the information model underlying this document
   envisions that a user will use the Directory to identify individual
   archives of interest, but that retrieval services based on non-X.500
   based technologies will provide the mechanism for actually
   retrieving archives.  As such, the role of the Directory is
   envisioned to be

   - to provide the information necessary for a user to successfully
     identify archives of interest;

   - to identify the location of the retrieval service to which access
     is required to retrieve archives of interest;

   - to identify the mechanisms necessary to successfully retrieve
     archives of interest.

   For example, in this model, a user wishing to obtain a given
   software package would consult the Directory to locate archive
   retrieval services providing access to the software in question and
   to discover that access to the archives is provided through ftp by
   some archive retrieval service.  The user would then retrieve the
   desired software through the archive service by way of anonymous
   ftp.

   It is important to note that in this model, an archive service is an
   abstract entity that provides access to archives, and is not bound
   to any particular technology base.  As such, an archive service may
   at the option of its operators provide access to archives by means
   of multiple protocols.

   In modeling the existence of archive services and archives, it is
   necessary to strike a compromise between allowing for the
   distribution of information in the Directory, and the desire to
   provide an abstraction of an archive as an object with no extraneous
   structuring information.  To this end, the approach is taken that
   archives and related archive services are represented as a two-level
   hierarchy, with archive services representing the first level of the
   hierarchy, and the archives themselves occurring as objects below
   the archive service that provides access to them.
   In this way, each provider of publicly accessible archives may at
   their option provide and manage their own information, while the
   abstraction of the archive as a single object is preserved.

Representation of Archive Retrieval Services

   Archive Retrieval Services are represented as objects of class
   applicationProcess, with RDNs formed from a commonName attribute
   whose value names the Internet host from which the service is
   provided.  For example, the anonymous ftp service on the Internet
   host uu.psi.com should have the RDN "commonName=uu.psi.com".

Representation of Archives

   This schema represents archives as objects of the class
   publicArchive, as defined below.  For the purposes of
   representation, each file to which an archive retrieval service
   provides access is considered an archive.  Consistent with the goal
   of representing archives in a generic fashion, a least common
   denominator approach is taken, with only a minimum of information
   felt to be common across many archives being selected for inclusion
   as attributes of an archive object.  These attributes are

   - An Archive Identifier to provide a unique identification of the
     archive.  The Archive Identifier has case-sensitive string values
     containing the complete specification of an archive file,
     including any device or path information necessary to uniquely
     identify the archive file to the archive retrieval service.

   - An Archive Path used to identify the archive for searching
     purposes.  The Archive Path is similar to the Archive Identifier,
     differing only in that it has case-insensitive string values.  It
     is present only to facilitate case-insensitive searching.

   - An Archive Name that names an archive.  The Archive Name takes on
     case-insensitive string values that need not be unique.

   - An Archive Location that provides location information for the
     archive, as well as the access method required to obtain the
     archive from the archive retrieval service.  The values taken on
     by the Archive Location attribute have a complex syntax.

   - An Archive Size to denote the size of the archive file.  Archive
     Size values are expressed as integers.

   - An Archive Publisher representing the person(s) or
     organization(s) that made the archive available.  Archive
     Publisher attributes take on case-insensitive string values.

   - Archive Version, Obsoletes/Obsoleted-By, Updates/Updated-By and
     See-Also attributes to relate an archive to other archives for
     revision control and other purposes.  The Archive Version
     attribute takes on a case-insensitive string value, while the
     other attributes all take on Distinguished Name values (the DNs of
     the related archives).

   - A Description to provide any additional information to users.
     The Description attribute takes on case-insensitive string values,
     and contains any additional information that the archive service
     provider may see fit to provide.

   - Keywords and Subject attributes that have their usual semantics,
     and are provided to facilitate searching.  Both attributes take on
     case-insensitive string values.

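   The difference between the case-sensitive Archive Identifier and the
   case-insensitive Archive Path can be illustrated with a short Python
   sketch.  It is not part of the schema.  The ArchiveEntry class and
   the find_by_path and find_by_identifier helpers are hypothetical
   names introduced here, the sample values are borrowed from the
   Examples section, and str.casefold() is only an approximation of the
   matching a Directory implementation would apply to case-insensitive
   strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ArchiveEntry:
    """Hypothetical in-memory stand-in for one archive object."""
    archive_identifier: str              # case-sensitive, unique per service
    archive_name: str                    # case-insensitive, need not be unique
    archive_path: Optional[str] = None   # same text as the identifier,
                                         # but matched without regard to case


def find_by_identifier(entries: List[ArchiveEntry],
                       identifier: str) -> List[ArchiveEntry]:
    """Exact, case-sensitive match on the Archive Identifier."""
    return [e for e in entries if e.archive_identifier == identifier]


def find_by_path(entries: List[ArchiveEntry], path: str) -> List[ArchiveEntry]:
    """Case-insensitive match on the Archive Path (approximated with casefold)."""
    wanted = path.casefold()
    return [e for e in entries
            if e.archive_path is not None
            and e.archive_path.casefold() == wanted]


# Sample values taken from the Examples section of this memo.
entries = [
    ArchiveEntry("./README", "README", "./README"),
    ArchiveEntry("./ls-lR", "ls-lR", "./ls-lR"),
    ArchiveEntry("pub/bind.tar.Z", "bind.tar.Z", "pub/bind.tar.Z"),
]

# A search on the path succeeds regardless of the case the user typed ...
hits = find_by_path(entries, "PUB/BIND.TAR.z")
identifiers = [e.archive_identifier for e in hits]

# ... while the identifier must be supplied exactly.
exact_miss = find_by_identifier(entries, "./readme")
```

   In this sketch a user who does not know the exact capitalization of
   a file name searches on the Archive Path, then hands the Archive
   Identifier of the matching entry, unchanged, to the archive
   retrieval service.
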
Examples

   Suppose that a site, "uu.psi.com", ran an archive retrieval service
   by means of anonymous ftp, and that the contents of the anonymous
   ftp hierarchy consisted of the files

      README
      ls-lR

   and the file "bind.tar.Z" in the directory "pub"

      pub/bind.tar.Z

   Then, assuming that:

      the DSA providing access to information on the "uu.psi.com"
      archive service was based on QUIPU software; and

      the DSA was named "Horned Fish" and occurred in the global DIT
      immediately subordinate to c=US;

   then, if the QUIPU notational conventions are used, the entry for
   "uu.psi.com" could look like

      cn=uu.psi.com
      objectClass= applicationProcess & quipuNonLeafObject & quipuObject
      accessControlList= others # read # child
      accessControlList= others # read # entry
      accessControlList= others # read # default
      masterDSA= c=US@cn=Horned Fish

   while the entries for the three files that the "uu.psi.com" archive
   service provides access to would respectively look like

      archiveIdentifier= ./README
      objectClass= quipuObject & publicArchive
      archiveName= README
      archivePath= ./README
      archiveSize= 145
      archiveVersion= Oct 21 1991
      archiveLocation= ftp $ uu.psi.com $ . $ README
      archivePublisher= PSI Inc.
      accessControlList= others # read # default
      accessControlList= others # read # entry

      archiveIdentifier= ./ls-lR
      objectClass= quipuObject & publicArchive
      archiveName= ls-lR
      archivePath= ./ls-lR
      archiveSize= 23056
      archiveVersion= Nov 18 1991
      archiveLocation= ftp $ uu.psi.com $ . $ ls-lR
      archivePublisher= PSI Inc.
      accessControlList= others # read # default
      accessControlList= others # read # entry

      archiveIdentifier= pub/bind.tar.Z
      objectClass= quipuObject & publicArchive
      archiveName= bind.tar.Z
      archivePath= pub/bind.tar.Z
      archiveSize= 447001
      archiveVersion= Oct 3 1990
      archiveLocation= ftp $ uu.psi.com $ . $ ls-lR
      accessControlList= others # read # default
      accessControlList= others # read # entry

Appendix: Definitions

   The following are the definitions of the Public Archive object, and
   those of its attributes not already defined elsewhere.

      publicArchive OBJECT-CLASS
          SUBCLASS OF top
          MUST CONTAIN {
              archiveIdentifier,
              archiveName,
              archiveLocation }
          MAY CONTAIN {
              archivePath,
              archiveSize,
              archivePublisher,
              archiveVersion,
              description,
              seeAlso,
              obsoletesArchive,
              obsoletedByArchive,
              updatesArchive,
              updatedByArchive,
              keywords,
              subject }

      archiveIdentifier ATTRIBUTE
          WITH ATTRIBUTE SYNTAX
              caseExactStringSyntax

      archiveName ATTRIBUTE
          WITH ATTRIBUTE SYNTAX
              caseIgnoreStringSyntax

      archiveLocation ATTRIBUTE
          WITH ATTRIBUTE SYNTAX
              documentStoreSyntax

      archivePath ATTRIBUTE
          WITH ATTRIBUTE SYNTAX
              caseIgnoreStringSyntax

      archiveSize ATTRIBUTE
          WITH ATTRIBUTE SYNTAX
              integerSyntax

      archivePublisher ATTRIBUTE
          WITH ATTRIBUTE SYNTAX
              caseIgnoreStringSyntax

      archiveVersion ATTRIBUTE
          WITH ATTRIBUTE SYNTAX
              caseIgnoreStringSyntax

      obsoletesArchive ATTRIBUTE
          WITH ATTRIBUTE SYNTAX
              distinguishedNameSyntax

      obsoletedByArchive ATTRIBUTE
          WITH ATTRIBUTE SYNTAX
              distinguishedNameSyntax

      updatesArchive ATTRIBUTE
          WITH ATTRIBUTE SYNTAX
              distinguishedNameSyntax

      updatedByArchive ATTRIBUTE
          WITH ATTRIBUTE SYNTAX
              distinguishedNameSyntax
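
   As an informal illustration of the MUST CONTAIN and MAY CONTAIN
   clauses of the publicArchive object class, the following Python
   sketch checks a dictionary-shaped entry against the two attribute
   lists.  The function name check_public_archive, the dictionary
   representation, and the extra_allowed parameter (for attributes
   contributed by other object classes of the entry, such as
   objectClass or accessControlList) are assumptions made for this
   example only.  A real DSA enforces its schema by its own means.

```python
# Attribute lists copied from the publicArchive OBJECT-CLASS definition.
MUST_CONTAIN = frozenset({
    "archiveIdentifier", "archiveName", "archiveLocation",
})
MAY_CONTAIN = frozenset({
    "archivePath", "archiveSize", "archivePublisher", "archiveVersion",
    "description", "seeAlso", "obsoletesArchive", "obsoletedByArchive",
    "updatesArchive", "updatedByArchive", "keywords", "subject",
})


def check_public_archive(entry, extra_allowed=frozenset()):
    """Return (missing, unexpected) attribute names for one entry.

    entry         -- mapping of attribute name to value (hypothetical shape)
    extra_allowed -- names permitted by the entry's other object classes
    """
    present = set(entry)
    missing = sorted(MUST_CONTAIN - present)
    unexpected = sorted(present - MUST_CONTAIN - MAY_CONTAIN - set(extra_allowed))
    return missing, unexpected


# The README entry from the Examples section, reduced to its
# publicArchive attributes.
readme = {
    "archiveIdentifier": "./README",
    "archiveName": "README",
    "archivePath": "./README",
    "archiveSize": 145,
    "archiveVersion": "Oct 21 1991",
    "archiveLocation": "ftp $ uu.psi.com $ . $ README",
    "archivePublisher": "PSI Inc.",
}

readme_result = check_public_archive(readme)

# An entry lacking mandatory attributes and carrying an unknown one.
broken_result = check_public_archive(
    {"archiveName": "README", "archiveColour": "blue"}
)
```

   Under these assumptions the README entry yields two empty lists,
   while the second entry is reported as missing archiveIdentifier and
   archiveLocation and as carrying the unknown attribute archiveColour.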