SnarfNews was initially written (back in 1990) in order to provide a compressed, ASCII-fied USENET feed of a selection of newsgroups to a home machine (an Amiga A500); the original project included a minimal NNTP server and article-handling system, but that has since been dropped in favour of INN on Linux, although it may be reborn one day if I have enough time.
Things you can do with SnarfNews include:
Of course, this modularity means there is some loss in efficiency, but the benefits for the user in being able to trivially extend the software, and in the flexibility that this approach provides, far outweigh any overhead.
You will find that some functionality is duplicated; for instance, much of the functionality in "runfeed" could easily be reimplemented using "ifbytes". That's OK - as with most toolkits, you'll soon find that (to borrow the Perl motto) there is more than one way to do it.
SnarfNews deals in four basic types: the USENET article, the "rnews" batch, the encoded "rnews" batch, and the encrypted "rnews" batch.
Hopefully you'll start to see a common thread about this point.
SnarfNews' interpretation of an "rnews" batch is currently rather strict; programs which expect to receive rnews batches cannot cope with being fed bare articles, and vice versa. For those unfamiliar with rnews batches: SnarfNews slings groups of USENET articles around bundled together in a format referred to as an "rnews batch".
The format is:
#! rnews nnn
[article headers]
[article headers]
[article headers]
[article body]
#! rnews nnn
[headers of next article]
...
...
...where nnn is the size of the entire article (in bytes) measured from where it starts on the next line. This is the standard format for grouping articles together for transmission to other sites, and for a more detailed reference the user is referred to the man page for "rnews".
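The batch format described above can be illustrated with a minimal Python sketch. This is not SnarfNews code (the toolkit itself is written in Perl); the function names "make_batch" and "split_batch" are hypothetical, and the sketch assumes that each "#! rnews nnn" marker line ends with a single newline and that "nnn" counts the bytes of the article which follows it.

```python
def make_batch(articles):
    """Join raw articles (bytes objects) into a single rnews batch."""
    out = bytearray()
    for art in articles:
        # nnn is the article size in bytes, counted from the next line.
        out += b"#! rnews %d\n" % len(art)
        out += art
    return bytes(out)


def split_batch(batch):
    """Yield each article (bytes) contained in an rnews batch."""
    pos = 0
    while pos < len(batch):
        eol = batch.index(b"\n", pos)
        marker = batch[pos:eol].split()
        if len(marker) != 3 or marker[:2] != [b"#!", b"rnews"]:
            raise ValueError("expected '#! rnews nnn' at byte %d" % pos)
        size = int(marker[2])
        start = eol + 1
        if start + size > len(batch):
            raise ValueError("batch is truncated")
        # Trust the byte count, not the content: an article body may
        # itself contain a line that looks like a batch marker.
        yield batch[start:start + size]
        pos = start + size
```

Because the splitter relies on the byte count rather than scanning for the next marker, a bare article fed to it fails immediately, which matches the strictness noted above.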
Encoded (compressed and ASCII-fied) and encrypted batches are discussed below.
To get around this, the author defined the word "gronk", thusly:
Gronk: to gather together a bunch of USENET articles collected from disparate sources, and present them as a "rnews" batch.
Following from this, we have programs named "gronknntp", which gathers news articles via NNTP and creates an "rnews" batch from them, and "gronkfiles", which performs a similar task, reading its articles from files on disk.
...to reflect your local site requirements.
When installing SnarfNews, check your permissions and consider who'll be running it; if you intend to wire it into /etc/aliases for mail->news gatewaying, the "$SNARFRUN" directory may have to be readable or writable by "mail", "daemon" or similar.
Similarly, if you're heavily integrating it with "INN" and have not set up a "news" group with group-execute bits, you may have to set permissions likewise for the user that "innd" runs as.
To get around this in the first instance, I ship with the permissions 01777 on the "run" directory. If this is not suitable for you, change it, but be aware of what may have to change to get it working in your environment.
Makefile
% make clean
or
% make spotless
- occasionally, is probably a good thing.
README.html
snarf program [arguments...]
If a user-written script needs to use several tools from the snarf toolkit, it is probably simplest to write the script as you would normally, and then invoke:
% snarf ~/bin/scriptname
- or whatever, in the appropriate place.
snarf-inbound: "| /usr/local/snarf/snarf cbatch2rnews"
...to /etc/aliases, though care should be taken (especially by people writing their own frontends) to ensure that there is no way that this feature may be used by unauthorised people to meddle with your news traffic or gain unauthorised entry to your system.
The supplied frontends are:
art2mail address [file]
% snarf art2mail user@foo.co.uk < article
art2post servername [file]
batch2ihave servername [file]
batch2post server [file]
batch2mail to-address [file]
batch2mailfrom to-address from-address [file]
batch2mailrepl to-address reply-address [file]
The latter two commands permit the user to specify From: and Reply-To: addresses for the messages, respectively.
cbatch2mail to-address [file]
cbatch2rnews [file]
% cat batchfile | snarf cbatch2rnews
ebatch2mail keyfile to-address [file]
ebatch2rnews keyfile [file]
The "keyfile" argument is the name of a file to be found (only) in the "$SNARFSECRET" directory, containing a long text string which should be kept as secret as possible. The contents of this file will be used by PGP as a passphrase to conventionally encrypt the batch of articles before transmission to the remote site.
The contents of the keyfile should be transferred securely to the other site (via PGP or floppy) and be installed there, where "ebatch2rnews" can be used to unpack it, and the result fed into "rnews" or "runbatch".
mail2post servername newsgroup [file]
One or more comma-separated newsgroups may be specified in the "newsgroup" argument, but there should be no spaces in the list.
b64decode [-debug] [file ...]
b64encode
batchencode [file ...]
batchdecode [file ...]
The batchencode process requires use of "gzip" specifically because of two important features of that program:
batchencrypt keyfile [file]
batchdecrypt keyfile [file]
The argument "keyfile" specifies the name of a file which will be searched for (only) in the "$SNARFSECRET" directory, and which contains a long line of arbitrary text which will be used as the passphrase for the encryption (see "secret/example", below).
Since PGP also uses adaptive compression and base-64 encoding, it is approximately equivalent to - sometimes slightly better than - "batchencode" in terms of the size of its output; however, it has not been investigated whether PGP shares gzip's ability to cope with catenative input (see above), so for the moment it is recommended that the user does not build a system which would pipe more than one encrypted batch at a time into a single instance of "batchdecrypt".
conva2a [-debug] [-skip-HEADER] [-die-HEADER] [-x-HEADER] ['Header: Value' ...]
conva2m [-debug] [-skip-HEADER] [-die-HEADER] ['Header: Value' ...]
convm2a [-debug] [-skip-HEADER] [-die-HEADER] ['Header: Value' ...]
These programs provide the ability to add, remove, and change arbitrary headers in a USENET article or E-mail message, and reformat the headers appropriately for re-transmission. Enough defaults are built into the filters to avoid most of the work in converting messages of one type to another, but certain data must be supplied on the command line. See (or use) the frontend scripts for examples.
conva2m -skip-to "To: address"
If the address looks like:
From: username@hostname
...it is modified to:
From: username@hostname.$SNARFDOMAIN
Otherwise, if it looks like a bare username, it is modified to:
From: username@$SNARFHOST
...however, the user is free to override this by specifying:
convm2a -skip-from "From: user@host.domain.com"
If these rules are not sufficient to provide the "From:" address rewriting functionality that you require at your site, users are recommended to write a wrapper to invoke "convm2a" with the "-skip-from" argument, and to insert their own "From:" header. Several pieces of USENET software (notably INN) will not accept articles for posting which do not contain fully-qualified "From:" addresses.
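For anyone writing such a wrapper, the two default rewriting rules described above can be sketched in Python as follows. This is an illustration only, not the code used by "convm2a": the function name "qualify_from" is hypothetical, and the test for an unqualified hostname (no dot in the host part) is an assumption, since the exact pattern that "convm2a" applies is not given here.

```python
def qualify_from(address, snarfhost, snarfdomain):
    """Return a fully-qualified address following the two default rules.

    snarfhost and snarfdomain stand in for the values of $SNARFHOST and
    $SNARFDOMAIN; snarfhost is assumed to be fully qualified already.
    """
    if "@" in address:
        user, host = address.rsplit("@", 1)
        if "." not in host:
            # Assumption: a host part with no dot is "unqualified".
            # username@hostname -> username@hostname.$SNARFDOMAIN
            return "%s@%s.%s" % (user, host, snarfdomain)
        return address  # already looks fully qualified, leave it alone
    # Bare username -> username@$SNARFHOST
    return "%s@%s" % (address, snarfhost)
```

For example, with "news.foo.co.uk" as the host and "foo.co.uk" as the domain, "alice@wombat" becomes "alice@wombat.foo.co.uk" and a bare "alice" becomes "alice@news.foo.co.uk".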
Similarly, messages which contain "HEADER" will be silently discarded when "-die-HEADER" is specified; as a security mechanism this is permanently enabled for the header "X-Reflected-By:" to prevent messages from passing through more than one instance of convX2Y, which could otherwise lead to a disastrous USENET/E-mail loop.
It is also for this reason that the "postart" and "ihaveart" scripts are designed to silently ignore empty inputs, since they are likely to be the next in the pipeline after a convX2Y filter.
"Conva2a" also has the ability to "comment" out any specified header using the "-x-HEADER" mechanism, which prepends "X-" to any given header; typically however, use of this will not be required, as USENET articles require very little modification for reposting beyond what "conva2a" provides by default.
All filters can also add headers to the articles/messages that are produced, simply by specifying them on the command line, eg:
conva2m 'X-My-Favourite-Cookie: Chocolate Chip'
- will add the above header to the message generated by "conva2m".
Important Note - users who are interested in making changes to E-Mail and USENET headers (other than those already available via the SnarfNews frontends) are strongly advised to read the reference material first (RFC822, RFC1036) to see what is/is not permitted.
Never, under any circumstances, modify, add or remove a USENET article's "Message-Id:" header, if it already has one.
filterart [-debug] [configfile]
filterbatch [-debug] [configfile]
If a configuration file is named on the command line (eg: "fred") it will be searched for in $SNARFCONF/filter ("$SNARFCONF/filter/fred") and then as an explicit path; if no such file is specified, the default ("$SNARFCONF/filter/default") is used.
This permits the removal of MAKE MONEY FAST spams, excessively crossposted articles, traffic sourced from particular users/sites, and oversize articles before the traffic is forwarded to another process.
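The order in which the configuration file is looked up (described above) can be sketched as below. This is a Python illustration rather than the toolkit's own code; "find_filter_config" is a hypothetical name, "snarfconf" stands in for the value of "$SNARFCONF", and the "exists" parameter exists only so the lookup can be exercised without touching the disk.

```python
import os


def find_filter_config(snarfconf, name=None, exists=os.path.exists):
    """Return the path of the filter config that would be used."""
    filter_dir = os.path.join(snarfconf, "filter")
    if name is None:
        # No file named on the command line: fall back to the default.
        return os.path.join(filter_dir, "default")
    candidate = os.path.join(filter_dir, name)
    if exists(candidate):
        # First choice: the name as found under $SNARFCONF/filter.
        return candidate
    # Otherwise treat the argument as an explicit path.
    return name
```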
forprep
gronkfiles [-debug]
gronknntp [-debug] [-save [-tree]] hostname[:port] [config]
If the "-save" option is specified, articles are saved to individual files under the directory:
$SNARFSAVE/the.group.name
If "-tree" is also specified, articles are saved under:
$SNARFSAVE/the/group/name
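The difference between the two layouts amounts to whether the dots in the group name become directory separators. A small Python sketch, in which "save_dir" is a hypothetical helper and "snarfsave" stands in for the value of "$SNARFSAVE":

```python
import posixpath


def save_dir(snarfsave, group, tree=False):
    """Return the directory under which articles for `group` are saved."""
    if tree:
        # -save -tree: one directory level per component of the name.
        return posixpath.join(snarfsave, *group.split("."))
    # -save alone: a single directory named after the whole group.
    return posixpath.join(snarfsave, group)
```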
ifbytes command [arg ...]
This command is chiefly used where "sendmail" (or similar) is at the end of a pipeline where it may (or may not) receive an empty input from "stdin"; when "ifbytes" is utilised in this case, "sendmail" will never actually be invoked, thereby preventing it from being confused.
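The idea behind "ifbytes" can be sketched in a few lines of Python. This is an illustration of the technique, not the real program: "run_if_bytes" and "main" are hypothetical names, and unlike a streaming implementation this sketch reads the whole input into memory before deciding whether to start the command.

```python
import subprocess
import sys


def run_if_bytes(data, argv):
    """Run argv with `data` on its stdin, but only if `data` is non-empty.

    Returns the command's exit status, or None if it was never started.
    """
    if not data:
        return None  # empty input: the command is not invoked at all
    return subprocess.run(argv, input=data).returncode


def main():
    # Hypothetical entry point mirroring: ifbytes command [arg ...]
    status = run_if_bytes(sys.stdin.buffer.read(), sys.argv[1:])
    sys.exit(0 if status is None else status)
```

The important property is that the decision is made before the command is started, so a program such as "sendmail" never sees an empty message.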
ihaveart [-debug] [-genid] hostname[:port] [file]
postart [-debug] hostname[:port] [file]
If "-genid" is specified with "ihaveart", and there is no "Message-Id:" header in the article, then a Message-Id: header will be created using the value of the "$SNARFHOST" variable as a hostname.
Important: unless the "-debug" switch is specified, these two programs will accept and silently discard empty articles, so that no messages are generated on "stderr" when articles are dropped by "filterart" or the "convX2Y" scripts (above), thereby presenting an empty input to these programs.
innbatch [innfeedname ...]
For reference, the INN "newsfeeds" entry for such a feed would look like:
innfeedname[/exclude,exclude,...]::Tf:
- see the INN "newsfeeds" man page for details.
lock [cookie]
unlock [cookie]
nntpmsgids [-debug] [-gronk] hostname[:port] [config]
gronkmsgids [-debug] hostname[:port] [pfcnt]
The "nntpmsgids" command is configured similarly to "gronknntp" (above) and expects to find a list of newsgroups in the file "config" (default filename as per "gronknntp"). It connects to the NNTP server on "hostname" and produces a list of the Message-Ids of new articles in those groups, printing the list to "stdout".
The same "SNARFXPOSTMAX" functionality of "gronknntp" is duplicated in this command.
The "gronkmsgids" command reads a list of Message-Ids from "stdin", connects to the NNTP server on "hostname", retrieves the specified articles, and presents them on "stdout" as a "rnews" batch.
The argument "pfcnt" (if specified) should be a small integer (typically between 1 and 3) which specifies a prefetch quantity, viz: the number of article-fetch requests which should be queued so that article retrievals occur asynchronously, and the network connection is used to its maximal advantage.
Combined with the asynchronous nature of the "nntpmsgids" command, this provides for exceptionally high throughput of article transferral, as there are few occasions where the link will ever be idle, eg: waiting for the remote NNTP server to respond.
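The effect of the prefetch count can be sketched in Python as below. This is not the "gronkmsgids" implementation: "pipelined_fetch" is a hypothetical name, "send" and "receive" are hypothetical callables standing in for issuing a fetch request and reading the next reply on one NNTP connection, and the sketch assumes that "pfcnt" means the number of requests queued ahead of the reply currently being read.

```python
from collections import deque


def pipelined_fetch(message_ids, send, receive, pfcnt=2):
    """Yield (message_id, article) pairs with `pfcnt` requests queued ahead.

    Replies are assumed to come back in the order the requests were sent.
    """
    pending = deque()
    for msgid in message_ids:
        send(msgid)                # queue another request immediately
        pending.append(msgid)
        if len(pending) > pfcnt:
            # Enough requests are in flight: collect the oldest reply
            # while the server is already working on the later ones.
            yield pending.popleft(), receive()
    while pending:
        # No more requests to send: drain the outstanding replies.
        yield pending.popleft(), receive()
```

With a "pfcnt" of 2, three requests are sent before the first reply is read, after which each reply read is matched by one new request, so the server always has work queued.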
Such a feed could be implemented via:
stream \
: nntpmsgids nntpserver | gronkmsgids nntpserver 2 | rnews
...in the "feeds" file, after setting up "$SNARFCONF/sites/nntpserver" - but it should be noted that this method of retrieval is of slightly higher risk than a standard "gronknntp" feed, since the article-count maintaining program ("nntpmsgids") cannot determine the successful connection of the article fetcher ("gronkmsgids").
Moreover, if "gronkmsgids" fails to connect to your NNTP server for some reason and quits, you might lose articles which "nntpmsgids" will register as successfully retrieved (if it hasn't been killed by SIGPIPE).
It is for this reason that "nntpmsgids" has an extra option, "-gronk".
When invoked with the "-gronk" option, "nntpmsgids" will automatically launch "gronkmsgids" on the same host, with a "pfcnt" of two, and will pipe the list of message-ids directly to it, so that a feeds entry of:
stream : nntpmsgids -gronk nntpserver | rnews
- is functionally identical to the one previously cited, with the added advantage that "nntpmsgids" can detect a failure in "gronkmsgids", leading to better error recovery. In short: this is the preferred way to pull a feed using this method, though users may find creative uses for "nntpmsgids" and "gronkmsgids" separately.
It should also be noted that "nntpmsgids" currently relies on the non-RFC977 "XHDR" command extension; if this is not available on your NNTP server, use "gronknntp" instead.
runbatch [-debug] "command"
Obvious uses for this command include breaking open an rnews batch and reformatting and mailing the messages out to a maillist, via "conva2m" and "sendmail".
runfeed [-debug] [-nolock] [feedname ...]
feedname : puller [ : pusher ]
Where:
Entries may be broken over several lines using backslash "\" characters.
When a user invokes:
% runfeed feedname
...the "runfeed" program searches its config file for an entry with a matching tag. If the matching entry contains no "pusher" command, then the "puller" command is executed and "runfeed" continues to search for other entries with the same tag.
If a "pusher" command is defined, then the "puller" command is executed, with "stdout" redirected to a temporary file. When the "puller" finishes, if this file is NOT empty, then the "pusher" command is invoked, reading "stdin" from this file.
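The puller/pusher behaviour described above can be sketched in Python as follows. This is an illustration only: "run_entry" is a hypothetical name, commands are assumed to be handed to the shell as written in the entry, and locking and the search for further entries with the same tag are left out.

```python
import os
import subprocess
import tempfile


def run_entry(puller, pusher=None):
    """Run one feed entry; return True if the pusher was invoked."""
    if pusher is None:
        # No pusher: just run the puller with its output left alone.
        subprocess.run(puller, shell=True)
        return False
    with tempfile.TemporaryFile() as spool:
        # Puller writes its stdout to a temporary file.
        subprocess.run(puller, shell=True, stdout=spool)
        if os.fstat(spool.fileno()).st_size == 0:
            return False  # nothing was pulled: never start the pusher
        spool.seek(0)
        # Pusher reads its stdin from that same file.
        subprocess.run(pusher, shell=True, stdin=spool)
        return True
```

Spooling to a file before deciding is what prevents empty batches from being sent: the pusher is only ever started once there is known to be something for it to read.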
Thus, the user can (for instance) use "gronknntp" to read articles from a NNTP server by specifying it as a "puller", and then mail the resulting rnews batch to another site by specifying a "pusher" to encode and mail the batch, the complete entry being:
feedname \
: gronknntp news.foo.co.uk \
: batchencode | mail snarfnews@bar.co.uk
Whereas, if you don't need to worry about sending out empty batches of articles (eg: you are fetching articles for your local feed) - you might prefer not to specify a "pusher" and just use a pipe:
topup : gronknntp news.foo.co.uk | rnews -v
The "-nolock" option disables locking of each feed in turn, which otherwise prevents multiple instances of the same feed being run simultaneously. Specifying no feedname on the command line will try to run all feeds listed in the "feeds" file.
timeout interval command args ...
lib/snarf.pl
conf/feeds
conf/filter/default
# maximum number of groups an article can be crossposted to and accepted
# (see also SNARFXPOSTMAX in "snarf" for gronknntp config of same)
xpost 9
# max article size specified as nnn (bytes) or nnnK (Kb) or nnnM (Mb)
size 256K
# filter out common spams; format:
# header [egrep pattern matching a spam]
header ^subject:.*money.*money.*
header ^subject:.*make.*money.*fast
header ^subject:.*make.*cash.*fast
header ^subject:.*\$\$\$
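To show how a configuration like the one above might be interpreted, here is a minimal Python sketch. It is not the "filterart" code: the names "parse_size", "parse_filter" and "rejects" are hypothetical, K and M are assumed to be multiples of 1024, and the header patterns are assumed to be matched case-insensitively against each header line (the simple egrep patterns shown behave the same way under Python's "re" module).

```python
import re


def parse_size(text):
    """Convert 'nnn', 'nnnK' or 'nnnM' into a byte count."""
    units = {"K": 1024, "M": 1024 * 1024}  # assumption: binary multiples
    suffix = text[-1:].upper()
    if suffix in units:
        return int(text[:-1]) * units[suffix]
    return int(text)


def parse_filter(lines):
    """Parse filter-config lines into limits and compiled patterns."""
    conf = {"xpost": None, "size": None, "header": []}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        keyword, value = parts
        if keyword == "xpost":
            conf["xpost"] = int(value)
        elif keyword == "size":
            conf["size"] = parse_size(value)
        elif keyword == "header":
            conf["header"].append(re.compile(value, re.IGNORECASE))
    return conf


def rejects(conf, headers, article_size):
    """Return True if an article should be dropped.

    `headers` is a list of header lines; `article_size` is in bytes.
    """
    if conf["size"] is not None and article_size > conf["size"]:
        return True
    if conf["xpost"] is not None:
        for h in headers:
            if h.lower().startswith("newsgroups:"):
                groups = [g for g in h.split(":", 1)[1].split(",") if g.strip()]
                if len(groups) > conf["xpost"]:
                    return True
    return any(p.search(h) for p in conf["header"] for h in headers)
```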
extras/inn-subscribe
secret/example
Instead, you should create your own "keyfile" by typing a long jumble of characters into a file, and transporting it via some secure means to the remote site.
People running a recent version of Linux (kernel rev >2.0), or any other OS equipped with a good "/dev/random" random number generator, may like to try:
% make asecret
- which should produce a string suitable for redirecting into a "keyfile" in the "secret" directory; users on other operating systems will have to find a different way to create a high-entropy random text string.
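As one possible alternative for systems without a suitable "/dev/random", a high-entropy text string can be produced with a few lines of Python. This is not what "make asecret" does; it is a separate sketch using Python's standard "secrets" module, and "make_secret" is a hypothetical name.

```python
import secrets


def make_secret(nbytes=64):
    """Return a single-line printable string built from nbytes of randomness.

    The randomness comes from the operating system's secure source.
    """
    return secrets.token_urlsafe(nbytes)
```

Printing the result of "make_secret()" and redirecting it into a file in the "secret" directory gives a single long line of text, which is the shape of passphrase that the "keyfile" argument expects.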
The NEWNEWS command as described in RFC977 was ideal when NNTP was designed, when USENET articles arrived in infrequent batches, and people using an NNTP server were likely to be on the same local network. Clock granularity was unlikely to be an issue, since on a LAN it is fairly easy to set all hosts to be on the same time.
Nowadays articles arrive in their hundreds by the second, and people are using long-distance comms to talk to NNTP servers; as such - especially if your NNTP server's clock is a few minutes ahead of yours - there is no guarantee that several dozen articles for your desired newsgroups did not arrive in the interval between 12:00 noon on the NNTP server and 12:00 noon on your workstation.
As such, since there is no easy way within NNTP to ask a server what time it thinks it is, the simplest solution is instead to hack through the newsgroups you desire, looking for new articles. This isn't so bad, really, so long as your server supports the LISTGROUP command, as do all INN servers.
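One way to look for new articles without depending on either machine's clock is to remember, per newsgroup, the highest article number already seen, and compare it with the numbers the server reports. The Python sketch below illustrates that idea only; it assumes a per-group high-water mark, which is not necessarily how SnarfNews keeps its records, and "new_articles" is a hypothetical name.

```python
def new_articles(listed, last_seen):
    """Return (fresh article numbers, new high-water mark).

    `listed` is the collection of article numbers reported by the
    server for one group; `last_seen` is the highest number already
    fetched from that group.
    """
    fresh = sorted(n for n in listed if n > last_seen)
    return fresh, (fresh[-1] if fresh else last_seen)
```

Since article numbers are assigned by the server itself, the comparison is unaffected by any difference between the two clocks.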
After checking with your firewall administrator that it won't (a) get you into trouble and (b) be blocked by access control lists, get a copy of "runsocks" from the appropriate SOCKS software kit (http://www.socks.nec.com/) and build it. Then set the value of SOCKS_SERVER (in the "snarf" script) to point at your firewall SOCKS host, and modify your "feeds" file to invoke "runsocks" before any puller or pusher that needs to go through the firewall, eg:
get-remote \
: runsocks gronknntp news.remote.co.uk | rnews -v
put-remote \
: innbatch news.remote.co.uk \
: runsocks batch2ihave news.remote.co.uk
...the SOCKS runtime library will be preloaded before execution of "gronknntp", and so will transparently connect out through your SOCKS server to the remote machine.
Users who are on machines which do not have a shared library capability (and therefore cannot use "runsocks") will probably have to build their own SOCKS-ified version of Perl5 and configure it into SnarfNews, in order to achieve this functionality.
The best way to describe this is probably by example; consider a mail-list called "real-ale@foo.co.uk", which you want to gate to/from the newsgroup "local.real-ale".
real-ale-in: "|/usr/local/snarf/snarf mail2post servername local.real-ale"
...and subscribe it to "real-ale@foo.co.uk".
E-mail that arrives for "real-ale-in" will now be reformatted as USENET articles and posted to your local NNTP server running on "servername".
% echo local.real-ale > $SNARFCONF/sites/real-ale
real-ale \
: gronknntp localhost $SNARFCONF/sites/real-ale \
: batch2mail real-ale@foo.co.uk
/usr/local/snarf/snarf runfeed real-ale
- every half hour. The cronjob will thus check for new articles posted to "local.real-ale", reformat them as mail messages, and post them to "real-ale@foo.co.uk".
Experimentalist sysadmins with control over their local INN daemon might prefer to set up a "program" feed to support the "news to mail" half of the functionality, above, along the lines of:
snarf!:!*,group.name:Tp:/usr/local/snarf/snarf art2mail address %s
Do remember that you are a small fish in a very big sea, and that however "neat" you think doing this sort of thing may be, the readers of any maillist are quite entitled to be thoroughly annoyed with you if you propagate their postings onto USENET without their permission, or if (by gating USENET postings back onto the maillist) you risk flooding the maillist with USENET spam and other junk.
Similarly, maillist moderators are quite entitled to be thoroughly annoyed with you if random people start "posting" to the maillist through your USENET gateway, when the membership of the list is meant to be "closed" or in some other way restricted.
In short: if you're in any doubt as to whether you should do this sort of thing, the answer should be don't do it.
The overall architecture of SnarfNews was developed whilst the author was in a caffeine-induced alternative state of mind whilst working at the University of Wales, Aberystwyth, as a systems programmer. Bear this in mind if ever you visit Wales - it could happen to you too.
This file is part of SnarfNews
Copyright (C) 1991,1992,1993,1994,1995,1996 Alec Muffett
This program is free software; you can redistribute it and/or modify it under the terms of version 2 of the GNU General Public License as published by the Free Software Foundation.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.