Newsstar

Introduction

Newsstar fetches news and posts it to a local server, usually INN. It's designed for Unix-like systems, and all the development was done on Linux. There are already plenty of other programs to do this, but what makes newsstar special is that it can make multiple simultaneous connections, not only to one server, but to several, supporting up to 10 "threads". Before fetching each article it checks that it hasn't already been downloaded by another thread or in a previous session. It can also "pipeline" article requests to make better use of available bandwidth.

I wrote it because a number of ISPs I have used suffer from unreliable newsfeeds. There is an excellent free server made available by the University of Berlin, but it can be a bit slow at times, and using "foreign" servers uses more bandwidth. Therefore I wanted a program which could fetch whatever articles my ISP has available, but use the foreign server to avoid missing posts or getting them very late, and to do it as fast as possible.

Newsstar is distributed under the GPL. If you are reading this file offline you should have a file called COPYING with details. Due to a long-term illness I am unable to earn a living, either by programming, or by any other means, so any gifts will be much appreciated. See below to find out where to send them (hint).

News

Obtaining newsstar

The latest version of the file you are now reading should be available at http://www.realh.co.uk/newsstar.html. The download directory is at http://www.realh.co.uk/unix/. If you're reading this file at its home, where the first URL redirected you, going straight to the download directory skips the redirection and gets you there marginally quicker.

Requirements

You'll need a Unix-like system, e.g. Linux, to compile and run newsstar. Compilation requires a reasonably non-arcane set of system libraries, because I'm not well versed in portable programming.

Obviously you'll need the details of at least one news server to download news from. To get the articles into your local news spool you'll need something that can debatch an rnews file, eg the rnews program supplied with INN. The remote server(s) must support the XHDR command, and allow articles to be requested using their Message-ID. Apparently, btinternet.com's server crawls badly using this method, so should be avoided.

Newsstar optionally uses the NNTP CHECK command to avoid downloading articles the local server already has. If your server/spool doesn't support that you won't be able to fetch from multiple servers without the risk of wasting bandwidth by downloading duplicates. INN supports CHECK, but I can't vouch for other servers at present. I hope to provide alternative strategies in future.
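As a rough illustration of the duplicate check, the sketch below interprets a reply to a CHECK command. The response codes are those defined for CHECK by the NNTP streaming extension (RFC 4644); the function name and the decision to skip an article on a "try again later" reply are assumptions made for this example, not a description of newsstar's internals.

```python
# Illustrative only: interpret a local server's reply to "CHECK <message-id>".
# The codes come from the NNTP streaming extension (RFC 4644); the function
# name and return convention are assumptions made for this sketch.

def should_download(check_reply: str) -> bool:
    """Return True if the local server says it wants the article."""
    parts = check_reply.split(None, 1)
    if not parts:
        raise ValueError("empty CHECK reply")
    code = parts[0]
    if code == "238":   # article wanted: not yet in the local spool
        return True
    if code == "438":   # article not wanted: the local server already has it
        return False
    if code == "431":   # try again later: assumed here to mean "skip for now"
        return False
    raise ValueError(f"unexpected CHECK reply: {check_reply!r}")
```

Without CHECK there is no cheap way to ask the local server this question, which is why fetching from several servers then risks duplicate downloads.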

Building

First I recommend you run ./configure --help and check whether any of the options will be useful to you. The standard directory options which affect it are --prefix, --bindir and --sysconfdir. Most of the other options shouldn't need changing.

Then it's just a matter of running the usual:

./configure
make
make install
(make install should normally be performed as root, but it's recommended to do the first two stages as a normal user in a directory you own).

It's recommended that you include INN's binaries in your PATH before running configure, if they're not in one of the standard branches or in one of /usr/lib/news/bin, /usr/local/news/bin or /var/lib/news/bin. This will help it set some path variables for the perl script.

Setting up

Newsstar installs a binary called newsstar, which isn't intended to be run independently. Instead, you should always call the perl script, called newsstar.pl, which acts as a front-end and performs a lot of support work for the binary.

When newsstar starts it first looks in its config directory for a file called main.cf. This has one option per line in the form:

keyword     value
Any amount of whitespace is permitted between the key and value. If you want leading whitespace in a value it should be enclosed in double quotes ("). Lines beginning with # are comments, and blank lines are also ignored. A sample file called main.cf.sample is provided, which you should copy to use as the basis for your own file. Each option is documented there, with the default value for each option shown. Commented options have no default value.
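The line format described above can be sketched as a small parser. This is only an illustration of the rules as stated here, not newsstar's own code; the function name is hypothetical, and it assumes that a quoted value simply has its surrounding double quotes removed and that a comment line may be indented.

```python
# A minimal sketch of the "keyword value" format; not newsstar's parser.
# Assumptions: surrounding double quotes are stripped from a quoted value,
# and a '#' comment may be preceded by whitespace.

def parse_config(text: str) -> dict:
    options = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are ignored
        parts = stripped.split(None, 1)  # keyword, then the rest of the line
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]  # quotes preserve leading whitespace
        options[key] = value
    return options
```

The real option names are listed in main.cf.sample; any keyword used when trying this sketch is just a placeholder.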

A main.cf file is not compulsory.

See also server-specific option files.

Setting up for download

Once newsstar has read its main.cf file it scans the newsrc directory (RC_DIR) for one or more files named newsrc.*, where the * is the name of each server. This name can either be its address, or a nickname, in which case you must provide its address in its config file (see below).

The newsrc file contains the name of each group, optionally separated from a number by whitespace. If there is no number, newsstar will try to fetch all available articles from the group. If the number is negative, -n, it will try to fetch the group's n most recent articles. A positive number means that was the last article downloaded from the group, and the next fetch will try to fetch all articles newer than that. Usually you will only use blanks or negative numbers when creating the file. When newsstar has run, it automatically updates each newsrc file with the number of the last article downloaded in each group.
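The three cases can be illustrated with a short sketch. The function names are hypothetical, and low and high stand for the lowest and highest article numbers the remote server reports for the group. Treating "the n most recent" as the last n article numbers is a simplification, since article numbers can have gaps.

```python
# Illustrative sketch of the newsrc number semantics; not newsstar's code.

def parse_newsrc_line(line):
    """Split 'group [number]' into (group, number or None)."""
    parts = line.split()
    group = parts[0]
    number = int(parts[1]) if len(parts) > 1 else None
    return group, number

def first_article_to_fetch(number, low, high):
    """Return the first article number to request from the server."""
    if number is None:
        return low                          # no number: fetch everything
    if number < 0:
        return max(low, high + number + 1)  # -n: the n most recent articles
    return max(low, number + 1)             # resume after the last one fetched
```

For example, with a server range of 100 to 200, an entry of -50 starts the fetch at article 151 and an entry of 180 starts it at 181.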

A sample newsrc file, newsrc.sample, is provided. You don't need to delete or rename this file: newsstar ignores it, which also means you can't have a server called sample. It is compulsory to have at least one newsrc file, because this is how newsstar identifies which servers to connect to.

For each server that has a newsrc file, newsstar looks for an optional file in its config directory called cf.*, where the * is the server's name as used in the newsrc file name.

The underlying format of these files is the same as for main.cf, and a sample with comments is again provided. Some of the options correspond to those in main.cf, in which case the value from main.cf is used by default, but can be overridden on a per-server basis.
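How the server list follows from the file names can be sketched like this. The function name and the directory arguments are hypothetical; the sketch only mirrors the naming rules described above (newsrc.NAME identifies a server, newsrc.sample is ignored, and cf.NAME is optional).

```python
# Illustrative sketch of server discovery by file name; not newsstar's code.
import os

def find_servers(rc_dir, config_dir):
    """Map each server name to its cf.NAME path, or None if it has no such file."""
    servers = {}
    for name in sorted(os.listdir(rc_dir)):
        if not name.startswith("newsrc."):
            continue
        server = name[len("newsrc."):]
        if not server or server == "sample":
            continue  # newsrc.sample is never treated as a server
        cf_path = os.path.join(config_dir, "cf." + server)
        servers[server] = cf_path if os.path.isfile(cf_path) else None
    return servers
```

A server whose entry is None would run entirely on the values from main.cf and the built-in defaults.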

Setting up for upload

For each server with an outgoing feed enabled, newsstar looks for a subdirectory named after the server in its outgoing directory, e.g. OUTGOING_DIR/newsstar/my.news.server, where OUTGOING_DIR is usually /var/spool/news/outgoing. It reads article files from the directory and tries to upload each one to the remote server. Files that are uploaded successfully, or rejected because the server already carries them, are deleted from the directory. Files that are rejected for any other reason, or that cannot be uploaded, are moved to OUTGOING_DIR/newsstar/failed.
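The bookkeeping after each upload attempt amounts to the following sketch. The outcome labels and the function name are invented for this example; only the delete-or-move behaviour is taken from the description above.

```python
# Illustrative sketch of what happens to an article file after an upload attempt.
# The outcome strings are hypothetical labels, not newsstar's own.
import os
import shutil

def file_after_upload(article_path, outcome, failed_dir):
    """Delete the file on success or duplicate; otherwise move it to failed_dir."""
    if outcome in ("accepted", "duplicate"):
        os.remove(article_path)
        return None
    os.makedirs(failed_dir, exist_ok=True)
    dest = os.path.join(failed_dir, os.path.basename(article_path))
    shutil.move(article_path, dest)
    return dest
```

Keeping failed articles in a separate directory means they are not retried automatically, but they are not lost either.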

The above uploading strategy was chosen for flexibility, although INN doesn't set up its outgoing feeds in that way. The perl script takes a typical INN outgoing feed file for each server and produces the directories full of individual files that the binary requires.

Command-line options

The newsstar binary takes a few options. As of version 0.2.2, the script passes options through to the binary, so there is no longer any need to edit newsstar.pl to use them.

Verbosity

There are two verbosity level options, -v and -vv (that's two v's, not a w). These cause it to output extra information, mainly for debugging, with -vv being considerably more verbose than -v. All standard messages, including warnings and errors, are sent to stdout, while the extra information enabled by -v and -vv is sent to stderr. The reason for this policy is to make it possible to capture the extra information without it drowning out the standard information (normally) sent to the console.

stdout and stderr merging

The -s option tells newsstar whether stdout and stderr go to separate terminals or files. Normally newsstar assumes that both go to the same place; the -s option tells it they are separate. This distinction matters because progress output overwrites itself on the same line where possible.

Using full-screen mode changes the meaning of this option; see below.

Full-screen mode

The -f option causes newsstar to take over the whole terminal, using the curses library. It divides the screen up into a number of sections, including one for each thread, which makes it easier to keep track of progress on a per-thread basis.

You can configure the colours and other attributes used for different types of information in this mode, using a file called curses.cf in the config directory. This has a similar underlying format to the other config files, and a sample is provided showing examples of every available option.

In full-screen mode, the -s option has a different meaning. If used, it means that Info and Debug messages (enabled by -v and -vv) are sent to stderr, but not to the terminal via its full-screen interface. Other levels of message are sent to both. Be sure to redirect stderr away from the terminal if you use this option, otherwise the display will be messed up.

Script options

newsstar.pl takes two options of its own, -A and -a, described below. These options are not passed through to the binary.

The perl front-end

Note that newsstar.pl replaces newsstar.sh used in versions prior to 0.2. It's generated by running the configure script, which works out some of the path variables.

IMPORTANT: This script must be run with write access to the news spool. Most systems have a user called news, which is ideal for this use. As of version 0.3, it is no longer necessary to have INN's binaries (rnews, sm etc) in the PATH before running the script, as it now adds INN's branch itself if necessary.

newsstar.pl performs these functions:

Preparing articles for upload

For each newsrc file, the script looks for a corresponding newsfeeds file in OUTGOING_DIR, flushes it, and copies the referenced articles into the appropriate directory for the newsstar binary. The feed file, usually generated by INN, contains one article reference per line. Newsstar is only interested in the first field in each line, which can either be a partial path to the article (in which case you should check $ARTICLES_DIR near the top of the script), or a storage token. The script automatically distinguishes between partial paths and tokens, taking appropriate action to locate the referenced file.
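The distinction between the two kinds of reference can be sketched as below. It assumes that a storage token is the usual INN form, a string enclosed in @ characters, and that anything else is a path relative to the articles directory. The function name and the example directory are hypothetical, and resolving a token to an article (for instance with INN's sm program) is left out.

```python
# Illustrative sketch: classify the first field of a feed-file line.
# Assumption: INN storage tokens begin and end with '@'.
import os

def locate_article(feed_line, articles_dir):
    """Return ('token', token) or ('path', full_path) for a non-empty feed line."""
    fields = feed_line.split()
    if not fields:
        raise ValueError("empty feed line")
    first_field = fields[0]  # later fields on the line are ignored
    if len(first_field) > 1 and first_field.startswith("@") and first_field.endswith("@"):
        return ("token", first_field)
    return ("path", os.path.join(articles_dir, first_field))
```

For a line such as "comp/os/linux/1234 <id@example.org>" and an articles directory of /var/spool/news/articles, this yields the path /var/spool/news/articles/comp/os/linux/1234.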

Near the start of the script is a function newsstar_filtered_copy which performs the copy, removing some headers that may upset the server receiving the article. You can edit this function to perform any other filtering you deem appropriate.

Synchronising newsrc files (-A option)

This is equivalent to suck's -A option, and is enabled by passing the -A option to the script. It takes an optional argument, immediately after the -A with no space (e.g. -A-50). If present, its value is used for any groups added to the newsrc file (see above). You should only consider using 0, negative values, or omitting the argument. Groups already in the newsrc file have their value left unchanged, of course.

With this option enabled, the script reads the local server's active file (check $ACTIVE_FILE near the top of the script) and makes sure each newsrc file contains the same set of groups as the active file. Each server should also have an ignore file in RC_DIR, named ignore.server_identifier. Groups matching any of the perl regular expression patterns in this file will be excluded from that server's newsrc file. Perl regexps are too major a subject to cover here, but the section on migrating from suck contains some tips. A sample ignore file is provided.
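The effect of synchronising can be sketched as follows. This is an illustration in Python rather than the script's perl, and it rests on several assumptions: groups missing from the active file are dropped, patterns match anywhere in the group name unless anchored, and the patterns are simple enough to behave the same in Python's re module as in perl. The function name is hypothetical.

```python
# Illustrative sketch of -A synchronisation; not the code in newsstar.pl.
import re

def sync_groups(newsrc, active_groups, ignore_patterns, new_value=None):
    """newsrc maps group -> number (or None). Returns the synchronised mapping."""
    ignored = [re.compile(p) for p in ignore_patterns]
    wanted = [g for g in active_groups
              if not any(r.search(g) for r in ignored)]
    # Existing groups keep their number; new groups get the -A argument, if any.
    return {g: newsrc.get(g, new_value) for g in wanted}
```

With -A-50, an ignore pattern of ^alt\.binaries\. and an active file that has gained comp.lang.perl, the new group would be added with the value -50 while every existing entry keeps its number.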

-a option

If the -a option is given as well as -A, each newsrc file will be sorted so that its groups appear in alphabetical order.

Running the binary, processing downloaded articles, clearing up

Finally, the script runs the binary, uses rnews to post the downloaded batch to the local server, removes any remaining temporary files etc, and exits.

Migrating from suck

This is quite an easy process, because suck's sucknewsrc files can be used as newsstar newsrc files without alteration, just by moving/renaming them. Converting suck active-ignore files to newsstar ignore files requires some more work, but very little. The main things to watch out for are that in perl regexps, . (period) means match any character, so if you want to exactly match a . in a group name, you should precede it with a backslash. Also, * means match any number of the preceding character, so * in active-ignore should be replaced with .*
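The two substitutions just described are mechanical enough to sketch. The function name is hypothetical, and the sketch applies only those two rules; any other character is passed through unchanged, so patterns containing other regexp metacharacters would still need checking by hand.

```python
# Illustrative sketch: convert a suck active-ignore pattern to a regexp
# using only the two rules described above.
import re

def suck_pattern_to_regex(pattern):
    out = []
    for ch in pattern:
        if ch == "*":
            out.append(".*")   # '*' becomes "any number of any character"
        elif ch == ".":
            out.append(r"\.")  # a literal period must be escaped
        else:
            out.append(ch)
    return "".join(out)
```

Applied to alt.binaries.*, this gives alt\.binaries\..*, which matches alt.binaries.pictures but not a name where the periods are replaced by other characters.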

Bugs

For a while it was sometimes failing to fetch the last few articles from one particular server I connect to (it belongs to World Online and identifies itself as running "INN, 5.4g1, S3"), when also connecting to another server. Newsstar would just seem to wait forever for a reply to the ARTICLE command, without timing out. This problem went away after a while without my doing anything in particular.

It's been serving me well under normal conditions (connecting to 2 servers simultaneously), but I'm not confident yet that it's very graceful in handling problems.

Future plans

Here are features I intend to add to newsstar in the future, in approximate order of importance:

Contact details

Newsstar was written by Tony Houghton, who can be emailed at tony@realh.co.uk, but for newsstar-related emails, please use newsstar@realh.co.uk. At the moment it all goes to the same place, but I might find it handy to be able to filter newsstar-related mails.

My home page is http://www.realh.co.uk, and you should be able to find the page you're reading now at http://www.realh.co.uk/newsstar.html.

Please send material gifts to:

Tony Houghton
271 Upper Weston Lane
Woolston
Southampton SO19 9HY
England