Draft    FYI on an Internet Trouble Ticket Tracking System      Nov 1991

User Connectivity Problems Working Group                       M. Mathis
Internet Draft                                                       PSC
                                                                 D. Long
                                                                     BBN

                          November 11, 1991

          FYI on an Internet Trouble Ticket Tracking System
                          for addressing
                  Internet User Connectivity Problems


Status of this Memo

    This Internet Draft FYI describes a possible approach to improving the
    usability of the Internet.  It is being distributed to members of
    the Internet community in order to solicit their reactions to the
    proposals contained herein.  The proposed paradigm is not intended
    as a standard for the Internet.  Rather, it is hoped that a general
    consensus will emerge as to the appropriate solution to this
    problem, perhaps leading to the formulation and adoption of
    standards.

    This memo does not specify a standard.

    Distribution of this memo is unlimited.

Author's Address

    This paper introduces a concept by Matt Mathis and the members of
    the IETF User Connectivity Problems Working Group.  Please send
    correspondence to ucp@nic.near.net.  This list may be subscribed to
    by sending a request to ucp-request@nic.near.net.  Archives are
    located in mail-archives/ucp on nic.near.net.

Security Considerations

    This RFC raises no security issues; however, further refinements of
    the proposed model will need to address security requirements.

Abstract

    Users having trouble with the Internet are directed to contact their
    designated Network Service Center.  The Network Service Center
    creates a Trouble Ticket which is registered with the Ticket
    Tracking System.  The ticket is an agreement to obtain closure with
    the user.  Network Service Centers can fix problems, track the work
    of others, or transfer responsibility for the ticket to other
    Network Service Centers using a formal hand-off procedure.  Ticket
    hand-offs are coordinated by the Ticket Tracking System and ticket
    progress is monitored by the Ticket Support Centers.  User
    complaints with the problem resolution process may be lodged with a
    Ticket Support Center, which will act on behalf of the user in
    resolving the problem.

Preface

    In this document formal rules and assertions are left justified
    while commentary and conventions are indented.  The formal rules
    state the requirements for tracking trouble reports.

        The commentary describes how we expect the rules to be invoked.
        The scheme that we are describing here is somewhat like the game
        of bridge: the "rules" are very simple, but the "conventions"
        are self-adjusting and very complex.  The commentary in this
        document is intended to seed conventions but not to mandate
        their future.

Network Service Centers (NSCs)

    Network Service Centers are the principal agents of problem
    resolution.  They generate, hold and close tickets, perform
    diagnostics, make repairs, etc.  NSCs are self-defined by agreeing
    to adhere to the rules described in this document.

        The NSCs register their agreement to comply with these
        guidelines in order to prevent an NSC from accidentally
        transferring a ticket to a NOC which is not participating, and
        which in turn may fail to properly transfer or close a ticket.
        The only requirement for registration as an NSC is that the
        organization agrees to honor the rules for handling tickets.

        In most cases NSCs are existing NOCs; however, other agents who
        are not customarily viewed as NOCs could be NSCs.  Examples
        include the operator of any Internet resource, or the network
        software support group of a computer hardware or software
        vendor.

        It is expected that almost all regional NOCs and some of the
	larger campus NOCs will become NSCs.  NSCs which find themselves
	chronically acting on behalf of some non-registered NOC will
	encourage it to become registered.  Thus there is a built-in
	pressure to help existing NOCs to become registered.

    The current list of NSCs must be available online in an ASCII
    database (an NSC phone book).  Besides listing contact information,
    it should list responsibilities and areas of expertise.

        By responsibilities, we mean specific components of the Internet
        infrastructure which a particular NSC has direct responsibility
	for maintaining.  Examples include: Nets, Name servers, and ASes
	belonging to a particular organization with an NSC.  Also, as
	separate items, "connectivity to" nets or ASes which are
	nominally transit.

        By "areas of expertise", we mean areas where an NSC has
	significant technical background and may be able to provide
	general help diagnosing problems.  For example, an NSC which
	runs a large nameserver may be willing to help other NSCs
	diagnose nameserver problems.

        If the phone book covers a wider scope than this procedure (for
        example a complete listing of contacts for all networks, domains
        and Autonomous Systems for the entire Internet) then each
        entry should be tagged to indicate whether the contact or NOC
        has agreed to honor these procedures.
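
        As an illustration only, a phone book entry and a lookup by
        responsibility might be modeled as follows.  The field names,
        the sample entries and the helper responsible_nscs() are all
        assumptions made for this sketch; this document does not
        specify the format of the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhoneBookEntry:
    """One phone book record; every field name here is hypothetical."""
    name: str
    contact: str
    registered: bool                      # has agreed to these procedures
    responsibilities: frozenset = frozenset()
    expertise: frozenset = frozenset()


def responsible_nscs(phone_book, resource):
    """Return registered entries directly responsible for `resource`."""
    return [entry for entry in phone_book
            if entry.registered and resource in entry.responsibilities]


# Example data using documentation-only addresses and AS numbers.
PHONE_BOOK = [
    PhoneBookEntry("Example Regional NSC", "noc@regional.example", True,
                   frozenset({"net 192.0.2.0", "AS 64500"}),
                   frozenset({"nameservers"})),
    PhoneBookEntry("Example Campus NOC", "noc@campus.example", False,
                   frozenset({"net 198.51.100.0"})),
]
```

        The "registered" flag plays the role of the tag described
        above: an unregistered contact can be listed, but a lookup for
        a ticket hand-off skips it.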

Tickets

    A Ticket is a commitment to obtain closure with a user.  Tickets are
    created when a user reports a problem to an NSC.  Tickets are closed
    when the user is informed of the resolution of the problem.  Only a
    registered NSC can hold a ticket.  An NSC must never refer one of its
    users to another NSC.  An NSC may refer other users to another NSC but
    must include a pointer to a Ticket Support Center as well.

        THIS IS THE CENTRAL POINT: A Ticket is a commitment to obtain
	closure with a user.  These tickets are not intended to track
	problems.  They are to assure that user complaints are never
	lost.  This intent does not preclude their use for other
	purposes.

        These tickets may not be suited to an organization's own
        internal practices.  Most existing ticket systems track
        problems, not complaints.  The scheme described in this document
        does not address all issues required to reliably track all
        problems.  These potential shortcomings might be addressed in
        either of two ways: implement a system which is a superset of
        the functions described here, or run an entirely different
        problem tracking system with cross references to these tickets.

        Tickets are not commitments to fix problems.  An NSC can choose
	to hold a ticket, monitor someone else (e.g. another NOC)
	making a repair, and then contact the user itself to be sure
	that the user is satisfied.  Or, it could transfer the ticket
	(see below) to the NSC responsible for the repair who would then
        contact the user.

        By "User", we mean anyone who is not a registered NSC.  Users
	can be true end users, host administrators, campus net
	administrators, or unregistered NOCs.  

        If the person who reports a problem is a non-NSC site
        administrator acting on behalf of a true end user, it is
        desirable to get contact information for both the site
        administrator and the end user.  If there is any evidence that
        the site administrator is not adequately passing information to
        the end user, the NSC should contact the end user directly.
        Also, there is an opportunity to recruit the site administrator
        as an NSC.

	Each NSC must publish a description of its user community in the
	NSC Phonebook.  For example, a regional network's user community
	might be "official technical contacts at regional member sites
	and anyone outside the network having difficulty reaching the
	regional network".  A backbone provider's user community might
	be "official technical contacts at one of the member networks
	served."

        If someone who is not in an NSC's designated user community
        contacts that NSC, then the NSC must consult the NSC Phonebook
        and refer them to their appropriate NSC.  It must also tell the
        user how to contact the Ticket Support Center, which will help
        resolve any confusion about which NSC the user should be
        contacting.  This policy of mandatory redirect may expose
        well-known NSCs to a large number of calls.  Practically
        speaking, it may become necessary to refer those users only to
        the Ticket Support Center, which can then keep a tally of which
        users are calling the wrong NSC.  If the Ticket Support Center
        were to produce a monthly report of redirects, NSCs might be
        encouraged to improve their efforts at end-user education.
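
        The redirect rule can be sketched as below.  The community
        table, the TSC address and the function handle_contact() are
        hypothetical, and a real NSC would consult the NSC Phonebook
        rather than an in-memory table.

```python
from collections import Counter

TSC_CONTACT = "tsc@example.net"      # hypothetical Ticket Support Center

# Hypothetical phone book extract: NSC name -> its user community.
COMMUNITIES = {
    "regional-nsc": {"site-a", "site-b"},
    "backbone-nsc": {"regional-nsc"},
}

redirect_tally = Counter()           # redirects issued, counted per NSC


def handle_contact(nsc, caller):
    """Decide whether `nsc` opens a ticket for `caller` or redirects."""
    if caller in COMMUNITIES[nsc]:
        return {"action": "open-ticket", "nsc": nsc}
    redirect_tally[nsc] += 1
    proper = sorted(name for name, users in COMMUNITIES.items()
                    if caller in users)
    # Every referral also carries a pointer to a Ticket Support Center.
    return {"action": "refer", "refer_to": proper, "tsc": TSC_CONTACT}
```

        A monthly report of redirects could then be produced from
        redirect_tally, although where that tally is kept (at the NSC
        or at the TSC) is a choice this sketch does not settle.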

        An NSC may choose not to open a ticket for some classes of
	"simple" reports in which closure with the user is obtained
	"immediately".  However, there are several risks to such
	policies.  If the user has complaints about the NSC's
	performance, the NSC will have no documentation for its defense.
	In general, it is best for NSCs to err on the side of entering
	tickets for insignificant events.

        An NSC may act as a user and open a ticket either with itself or
        another NSC.

Ticket Tracking System

    There is a Ticket Tracking System responsible for the mechanics of
    tracking tickets.  The TTS should nominally be fully automated,
    being accessed by the various NSCs via the network.  It should also
    support limited telephone queries from NSCs to confirm the status of
    particular tickets.  The TTS is also responsible for archiving
    completed tickets.

        The TTS must always have a recent copy of the ticket, which NSC
	is holding it, and various statuses.  For the initial
	implementation the primary channel to the TTS should be via SMTP
	mail.  Queries and updates are posted as mail messages, with a
	ticket number and function as either the subject or first line
	of the body.  All functions which change tickets are implicitly
	appends: no portion of a ticket is ever deleted.
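
        A minimal sketch of such a mail interface follows.  The
        request syntax ("<ticket number> <function>"), the function
        names and the TicketStore class are assumptions made for
        illustration; the real syntax is left to a future document.

```python
from email import message_from_string

FUNCTIONS = {"QUERY", "APPEND", "TRANSFER", "ACCEPT", "CLOSE"}  # assumed


def parse_request(raw_mail):
    """Return (ticket_number, function, remaining_text) for a message.

    The request is looked for in the Subject first, then in the first
    line of the body.
    """
    msg = message_from_string(raw_mail)
    body_lines = msg.get_payload().splitlines()
    candidates = [(msg.get("Subject", ""), body_lines)]
    if body_lines:
        candidates.append((body_lines[0], body_lines[1:]))
    for line, rest in candidates:
        parts = line.split()
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        function = parts[1].upper()
        if function in FUNCTIONS:
            return int(parts[0]), function, "\n".join(rest).strip()
    raise ValueError("no ticket number and function found")


class TicketStore:
    """Append-only store: entries are added, never edited or deleted."""

    def __init__(self):
        self._entries = {}

    def append(self, ticket, sender, text):
        self._entries.setdefault(ticket, []).append((sender, text))

    def history(self, ticket):
        return tuple(self._entries.get(ticket, ()))
```

        The store offers no update or delete operation, which is one
        simple way to honor the rule that all changes are appends.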

        At some point this should be migrated to privacy-enhanced mail.

        The TTS, in conjunction with SMTP mail, is really a substitute
        for a distributed database with a public interface.  When a real
        distributed database becomes publicly available (meaning it runs
        on enough platforms at a low enough price not to exclude any
        NSCs) we should be prepared to migrate to it.

        The detailed requirements for the TTS belong in a future RFC.

    There is a formal mechanism for passing tickets between NSCs.  This
    mechanism must be designed such that tickets cannot be lost.

        A possible procedure to pass a ticket from R1 to R2 might be as
        follows:

        1) R1 first inquires (out-of-band) if R2 is willing to accept
	   the ticket.

        2) If R2 is unwilling, R1 must continue effort on the ticket,
	   either to find a willing NSC or to repair the problem itself.

        3) If R2 is willing to accept the ticket, R1 sends a message to
	   the TTS with any final remarks, notifying the TTS of its
	   intent to transfer the ticket to R2.

        4) R2 sends a request to the TTS notifying it of its intent to
	   accept the ticket from R1.

        5) The TTS sends confirmation notices to both.  In the notice to
           R2 it includes the entire current content of the official
           ticket.

        6) R2 informs the user that the Ticket has been transferred and
           provides any updates.  (This is required.)

        7) R1 optionally contacts the user to reassure him that the
           problem is being worked on.  This is particularly useful if
           the initial NSC contacted is not the "closest" NSC, in order
           to encourage a direct query to the closer NSC in the future.
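
        Steps 3 through 5 of the possible procedure above might be
        modeled as follows.  This is a toy, in-memory sketch: the
        class and method names are invented here, and the out-of-band
        steps and the contacts with the user are not represented.

```python
class TransferError(Exception):
    pass


class TicketTracker:
    """Toy TTS: a ticket changes holder only when both halves arrive."""

    def __init__(self):
        self.holder = {}    # ticket -> NSC currently holding it
        self.log = {}       # ticket -> append-only list of entries
        self.offers = {}    # ticket -> (r1, r2), step 3 received
        self.accepts = {}   # ticket -> (r1, r2), step 4 received

    def open(self, ticket, nsc):
        self.holder[ticket] = nsc
        self.log[ticket] = [f"opened by {nsc}"]

    def offer(self, ticket, r1, r2, remarks=""):
        """Step 3: R1 announces its intent to transfer the ticket to R2."""
        if self.holder.get(ticket) != r1:
            raise TransferError(f"{r1} does not hold ticket {ticket}")
        self.log[ticket].append(f"{r1} offers to {r2}: {remarks}")
        self.offers[ticket] = (r1, r2)
        return self._complete(ticket)

    def accept(self, ticket, r2, r1):
        """Step 4: R2 announces its intent to accept the ticket from R1."""
        self.accepts[ticket] = (r1, r2)
        return self._complete(ticket)

    def _complete(self, ticket):
        """Step 5: confirm to both parties once the two halves match."""
        pair = self.offers.get(ticket)
        if pair is None or pair != self.accepts.get(ticket):
            return None          # still waiting; the holder is unchanged
        r1, r2 = pair
        del self.offers[ticket], self.accepts[ticket]
        self.holder[ticket] = r2
        self.log[ticket].append(f"transferred from {r1} to {r2}")
        # The notice to R2 carries the entire current ticket content.
        return {r1: "transfer confirmed", r2: list(self.log[ticket])}
```

        Because the holder changes only when both messages have been
        received, a lost message leaves the ticket with R1 rather
        than with nobody.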

        The out-of-band inquiry in step 1 is the most important part of
	the process.  By "out-of-band," we mean not prescribed by this
	process.  Many problems may be resolved at this point without
	transferring the ticket.  It is likely that R2 is either already
	working on the problem or can fix it on the spot.  In these
	cases, R1 should confirm the repair and contact the user to
	close the ticket.

        It is entirely acceptable for R2 to suggest that some other NSC
	is more appropriate to deal with the problem.  NSCs may refer
	NSCs to other NSCs.

        There are some potential races and misbehaviors, particularly
        since SMTP delivery cannot be assumed to be 100% robust.
        However, strong timer-based and out-of-band checks are possible:
        R1 asks R2, "Did you receive confirmation of the transfer?"; R2
        asks the TTS, "Who holds the ticket?"; and the TTS contacts R1
        and/or R2 if it receives one of the messages in steps 3 and 4
        without the other in the prescribed time period.

        Timers can be associated with all of the above states such that
	the TTS can detect protocol botches.
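
        One way the TTS might detect a half-completed transfer is
        sketched below.  The record layout, the time limit and the
        choice to contact only the party whose message is missing are
        assumptions; the text above allows contacting either or both.

```python
def overdue_handoffs(pending, now, limit):
    """Return (ticket, party_to_contact) for half-finished transfers.

    `pending` maps a ticket number to a dict naming "r1" and "r2" and
    holding optional "offer" (step 3) and "accept" (step 4) arrival
    times, in seconds.
    """
    alerts = []
    for ticket, p in sorted(pending.items()):
        offered, accepted = p.get("offer"), p.get("accept")
        if offered is not None and accepted is None:
            if now - offered > limit:
                alerts.append((ticket, p["r2"]))   # step 4 never arrived
        elif accepted is not None and offered is None:
            if now - accepted > limit:
                alerts.append((ticket, p["r1"]))   # step 3 never arrived
    return alerts
```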

        The ticket transfer procedure must be described in complete
	detail in a future RFC.

Ticket Support Centers

    There is a small set of Ticket Support Centers, to deal with problem
    tickets.  The TSCs are responsible for monitoring the quality of the
    NSCs and the ticket handling procedures.  There are three separate
    functions:  expediting tickets which are not making adequate
    progress (as detected by the TTS timers), arbitrating between NSCs,
    and acting as a user ombudsman.

        The problems covered here represent potential failures of the
        ticketing mechanism itself.  They do not represent normal
	escalation of tickets within the system.  See comments below
	about ticket flow.

        The three functions are really independent and could co-reside
        with some NSCs, such as the NSCs for the backbone networks.

        Tickets which remain in any state for extended periods of time
        without transactions are likely to be broken.  Some states, such
        as during handoff between NSCs, are always transient, and any
        persistence in these states is probably a handoff failure.
        Other states such as "we are working on it" should include
        progress reports or else they are suspect.  We want to be able
        to track chronic minor problems, so it may not be unusual for
        some tickets to stay open for a long time.
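
        A TSC might flag suspect tickets with a check along these
        lines.  The state names and idle limits are invented for the
        example; appropriate values would have to come from
        operational experience.

```python
from datetime import datetime, timedelta

# Hypothetical states and the longest idle time tolerated in each.
IDLE_LIMITS = {
    "handoff": timedelta(hours=4),    # always transient
    "working": timedelta(days=2),     # progress reports are expected
    "chronic": timedelta(days=30),    # long-lived minor problems
}


def suspect_tickets(tickets, now):
    """Return numbers of tickets idle longer than their state allows.

    `tickets` maps a ticket number to (state, time_of_last_entry).
    """
    suspects = []
    for number, (state, last_entry) in sorted(tickets.items()):
        if now - last_entry > IDLE_LIMITS[state]:
            suspects.append(number)
    return suspects
```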

        If an NSC realizes that some problem is beyond its abilities and
        cannot find another NSC to take the ticket, or the other NSCs
        refuse the ticket, then a TSC can be invoked to find someone who
        will accept the ticket.  In all cases where a ticket is refused
        under controversy, statements from all parties should be
        included in the ticket.  (See meta dialogue below.)

        The ombudsman should have a widely published phone number for
	users who feel that they have not received appropriate service
	from an NSC.  The ombudsman should not act as a first point of
	contact.  Any comments by the ombudsman should be recorded in
	the ticket.

    The details of the structure of the ticket flow between NSCs are
    outside the formal scope of this document.  The NSCs are all peers
    in the sense that any NSC can transfer any ticket to any other NSC
    willing to accept it.

        The normal escalation of tickets happens implicitly as the
	result of rules for transferring tickets between centers.
	Problems with Internet connectivity will naturally follow the
	hierarchy of the Internet and the contractual agreements at
	interconnections between ASes.  Problems with the Domain Name
	System will naturally follow its hierarchy.  Interoperability
	problems will naturally flow horizontally between NSCs near the
	end systems, and perhaps to NSCs operated by the end systems' 
	vendors.

        We claim that the above ad-hoc structure is already in place and
        functioning very well for the majority of (simple) problems.
	The formalism of ticketing is needed to prevent malformed or
	difficult problem reports from vanishing without being
	adequately addressed.

        The phone book is critical to optimizing this process.  A
        problem which seems to be centered on some resource is most
        likely best dealt with by the NSC nearest that resource; failing
        that, that NSC is likely to have a clearer picture of which
        other NSC should be involved (since NSCs are allowed to redirect
        other NSCs).

        Since these flows are self-organizing, they will adapt to new
        services and to the Internet as it evolves.

        The only externally applied pressure to influence the ticket
	flow is that users are encouraged and, if necessary,
	redirected to contact the "nearest" NSC.  (See below)

    The body of a ticket is anticipated to be a chronological series of
    entries.  These entries can be characterized as falling into one of
    four different kinds of stream of thought or conversation, called
    dialogues.  A ticket has an abstract, which is excerpted from the
    dialogues.  The Ticket is "append only".  The only content to a
    ticket which is not part of a dialogue is the identifier of the
    ticket itself.  The dialogues are "user", "operations",
    "engineering" and "meta".

        We will not be proposing the details of ticket format or
        representation except that a ticket has no content outside of
        the dialogues.  The abstract is a summary of the current state
        and history of a ticket.  Data which might ordinarily be
        considered "header information" is included in the abstract.
        The abstract must be algorithmically generatable from the
        dialogues (i.e. it need not be stored separately, and cannot
        be edited directly).  In the simplest case an abstract entry is
        the last entry of a particular kind in a dialogue.  For example,
        the NSC which holds the ticket will always be named by the last
        entry transferring the ticket.  For efficiency reasons the
        abstract may be stored and not recomputed, but then the entire
        history of the abstract will be visible in the Ticket.
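
        The "last entry wins" rule can be sketched as below.  The
        entry layout and the entry kinds ("open", "transfer",
        "status") are assumptions, as is treating the opening of a
        ticket as the first entry that names a holder.

```python
DIALOGUES = ("user", "operations", "engineering", "meta")


def make_abstract(entries):
    """Derive the abstract from a chronological list of entries.

    Each entry is a dict with "dialogue" and "kind" keys plus fields
    specific to its kind.  Later entries override earlier ones.
    """
    abstract = {}
    for entry in entries:
        if entry["dialogue"] not in DIALOGUES:
            raise ValueError("entry outside the four dialogues")
        if entry["kind"] in ("open", "transfer"):
            abstract["holder"] = entry["to"]
        elif entry["kind"] == "status":
            abstract["status"] = entry["text"]
    return abstract
```

        Since the abstract is recomputed from the entries, there is
        nothing to edit directly: changing it requires appending a
        new entry.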

        There must be a detailed specification for the format of a
        ticket in a future document.  The simpler the constraint on the
        format, the easier it will be for many diverse NSCs to use, but
        the harder it will be to do meaningful post-processing (collect
        statistics on failures, etc).  The format should be extensible,
        so more structured detail can be added later.  The initial
        format should be as simple as reasonably possible.  Later on, as
        we have a better understanding of what post-processing is
        needed, it should become more structured.

        Each dialogue has specific participants, audience and
	objectives.  One or more of the dialogues may be empty in a
	specific ticket.  It is anticipated that some of these
	dialogues, notably Engineering and Meta, will be added after the
	ticket is closed with the user.

    The user dialogue relates all conversations with the user who
    reported the problem, including his contact information, initial
    description of the symptoms, record of additional reports about
    additional symptoms and notification of progress.  The transaction
    closing the ticket will always acknowledge contact with the user.

        Every single exchange with the user should be recorded.  If
	multiple users call about the same problem, they can be carried
	as separate user dialogues within the same ticket.  In that
	case, however, the NSC must reach closure with all of the users
	in order to close the ticket.
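
        The closure rule for a ticket carrying several user dialogues
        might be checked as follows.  The marker string used for
        closure and the dictionary layout are hypothetical.

```python
CLOSURE = "closure acknowledged"     # hypothetical closing entry


def can_close(user_dialogues):
    """True only if every user dialogue ends with acknowledged closure.

    `user_dialogues` maps each user to that user's chronological list
    of entries.  A ticket with no user dialogue cannot be closed.
    """
    return bool(user_dialogues) and all(
        bool(entries) and entries[-1] == CLOSURE
        for entries in user_dialogues.values()
    )
```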

    The operations dialogue relates the history of the technical effort
    to resolve a problem, including diagnostic results and
    interpretations, and repair.

        This dialogue most closely resembles traditional trouble
	ticketing systems, though in the larger context of this paper,
	it is possible to achieve closure with a user while retaining
	action items of operational or other classifications.  While an
	exhaustive description of this situation is outside the scope of
	this document, it is expected that this will typically result in
	the NSC generating a new ticket on the pending action item.
  
    The engineering dialogue is a commentary on how the infrastructure
    could be improved to prevent future occurrences of this problem.

        We take the position that any failure which is detected by a
	user before being repaired by a NOC really contains two
	failures: the "operational" problem (what failed), and an
	"engineering" problem (why it affected the user).

        In an ideal world all facilities should be sufficiently
	redundant and monitored such that all failures are detected and
	repaired by a NOC without being noticed by the users.  The
	engineering portion of the ticket is to help guide us to that
	goal.

        Many problems are not operational and cannot be "repaired" in
        any conventional sense.  Examples include problems caused by
        insufficient bandwidth or other resource limitations.  These
        tickets enter a "purgatory" state where closure has been
        obtained with a user (say by providing a workaround) but the
        real problem requires more than a "repair" and is still present.

        By definition these tickets can be closed, because there has
	been closure with the user, but it is imperative that the
	engineering commentary be escalated out of the ticket system and
	into the Internet engineering and planning organizations.  See
	the discussion of closing statuses.

        Since operations personnel are often ill-equipped to evaluate
        engineering issues, engineering dialogues may be entered by the
        engineering staff after the ticket has been closed with the
        user.

    The meta dialogue is a commentary about the ticket process itself.
    What improvements should be made to the ticketing mechanism to make
    it more effective in the future?

        This would address questions such as "Were the rules followed?",
	"Are the ticket routing conventions appropriate?", "Is the phone
	book accurate and complete?", "Did a ticket handoff fail?", "Was
	a ticket completed in a reasonable time?", etc.

        There are several issues about how the user interacted with the
        ticketing system which should also be addressed here:

        If the user did not contact the nearest NSC, the NSC may be
        insufficiently educating its users.

        If the user contacted more than one NSC, the first NSC may not
	be doing its job or may be violating the "Don't refer users"
	rule.

        If the user contacted more than one NSC, the user may be
	unreasonably impatient.

        The meta dialogue serves as a quality control check on the
	ticketing process and to drive the user education process.

    The final closing status of a ticket can be used to request
    additional effort.

        We believe this to be the most difficult and most critical part
	of the entire process.  Many problems reported as short term
	operational problems are really long term engineering and
	economic issues.  These must be identified and corrected before
	they destabilize the entire Internet.

        The solution proposed here is incomplete and insufficient.  We
	believe that the only tickets of real interest are precisely the
	ones which will tend to be closed with the user without true
	resolution.

    The final status can be decomposed into four parts: an estimate of
    the user's satisfaction with the resolution, an operational
    action-item for facilities outside of the scope of the ticketing
    system (e.g.  upgrade an out-of-revision host), and action-items
    directly from the engineering and meta dialogues in the ticket.

        These final statuses should have formal representations such
        that statistics and action-items can automatically be generated
        from closed tickets.  However, at this time it is not at all
        clear how this should be done.  This is so critical to the long
        term success of the Internet that it should be solved in phases.
        A future RFC should spell out some clearly interim closing
        statuses, with explicit hooks for a future mechanism.  Only
        after we have operational experience with real Internet-wide
        tickets will we really understand this issue.
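
        Purely as a placeholder for such an interim representation,
        the four parts of the final status could be held in a record
        like the one below.  The class, its field names and the
        free-text values are assumptions, not a proposed format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClosingStatus:
    user_satisfaction: str                    # estimate made at closing
    operational_action: Optional[str] = None  # outside the ticket system
    engineering_action: Optional[str] = None  # from engineering dialogue
    meta_action: Optional[str] = None         # from the meta dialogue

    def action_items(self):
        """List the (kind, text) action items that were recorded."""
        items = [("operational", self.operational_action),
                 ("engineering", self.engineering_action),
                 ("meta", self.meta_action)]
        return [(kind, text) for kind, text in items if text]
```

        Collecting action_items() over all closed tickets would give
        the kind of automatically generated list mentioned above.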

        The user's satisfaction is the key result.  If the user is
        dissatisfied, this may be an indication that the ticket was
	closed too early.  An NSC may not always be an objective
	determiner of a user's satisfaction.  If users are constantly
	appealing their problems to TSCs and reporting markedly
	different assessments of their level of satisfaction, then the
	TSC should raise this issue with the NSC and perhaps its funding
	agency.

        An NSC is not required to take any actions after closing a
	ticket.  No mechanism is provided for the evaluation of
	post-ticket actions taken by NSCs.  However, we expect NSCs will
	be motivated to take action so as to prevent recurrence of
	similar tickets or problems.

        Residual operational action items should be explained to the
        user so that the user can follow up with the responsible
        parties.  This will often be the case where it is the user who
        must take some action to solve the problem.  For example, the
        owner of a broken host is the only party who can fix it.  This
        must not become a mechanism for prematurely closing tickets and
        leaving the user to track the problem.

        Every NSC has different priorities and constraints.  An action
	that one NSC might consider routine might be a major project for
	another.  For example, the NSC for a national backbone network
	might be able to make a trivial configuration change to monitor
	another operational parameter of its network whereas, for a
	small corporate NSC, that kind of change might mean purchasing
	an entire network monitoring package.

    Ticket dialogues are, by default, private communications.  NSCs
    determine the extent to which tickets they hold are made public.
    Rights to dialogue contents and responsibilities for dialogue
    privacy are transferred among NSCs along with the tickets
    themselves.

        The TTS maintains copies of all open tickets.  It provides
	copies only to TSCs, the current NSC, and any NSC cited in the
        dialogues.  Any NSC so cited has the right to append commentary.

        Every NSC must establish a policy for itself about the extent to
	which it will make dialogues public.  For example, NEARnet
	currently makes the operations dialogue available to the user
	and interested network members.  It generally limits the
	engineering dialogue to its staff and technical advisory
	committee and it limits the meta dialogue to its project staff.

        Note that NSCs receiving public funding are likely to be
        required to make reports, however detailed, to their sponsors.

    The User/NSC/TTS/TSC structure proposed here assumes a degree of
    cooperation among components.

        This cooperation can arise from any of several sources
        including, but not limited to, a service commitment to
        end-users, a desire for enhancing usability of the Internet, or
        contractual decree of some funding organization.  We hope that
        the system will function well regardless of the reasons various
        organizations have for being a part of it.

        We expect that some issues will transcend the bounds of this
	system and be addressed out-of-band by network providers,
        funding organizations, end-users, and market demands.