stat4news - statistics for newsgroups

copyright (C) 2003-2006 by Michael Grimm


The following two Perl scripts generate simple statistics for newsgroups, similar to the reports statnews produces for an INN newsserver using the tradspool storage method. statnews cannot deal with the now widely used storage methods timecaf, timehash, and CNFS.

Therefore I decided to write a new Perl script that uses the NNTP protocol to gather all the data needed. Thus, the new script should be able to connect to any newsserver it is given access to, local or remote. Besides that, I wanted to learn Perl, so be generous: the following scripts are the very first ones in my life ;-)

At the very beginning I realised that I had to split the functionality into two parts: the rather slow data collection step and the rather fast reporting step. Thus, stat4newsdb pre-calculates all the data needed and stores it in a database, which stat4news uses afterwards to do the reporting. stat4newsdb should be run on a regular basis (e.g. by cron), and stat4news whenever needed.
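
The split into a collector and a reporter can be sketched as follows. This is only an illustration, not code taken from the scripts: it assumes SDBM_File as the DBM flavour (the scripts only say "DBM database"), stores nothing but an author name per article number, and the function names collect and report are made up.

```perl
use strict;
use warnings;
use Fcntl;                        # O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;                    # assumption: any DBM flavour would do
use File::Temp qw(tempdir);

# Collector step (slow in real life): store one value per article number.
sub collect {
    my ($dbfile, %articles) = @_;
    tie my %db, 'SDBM_File', $dbfile, O_RDWR | O_CREAT, 0644
        or die "cannot open $dbfile: $!";
    $db{$_} = $articles{$_} for keys %articles;
    untie %db;
}

# Reporting step (fast): read the stored data and count articles per author.
sub report {
    my ($dbfile) = @_;
    tie my %db, 'SDBM_File', $dbfile, O_RDONLY, 0644
        or die "cannot open $dbfile: $!";
    my %count;
    $count{$_}++ for values %db;
    untie %db;
    return \%count;
}

my $dir    = tempdir(CLEANUP => 1);
my $dbfile = "$dir/example.group";            # hypothetical database name
collect($dbfile, 1 => 'alice', 2 => 'bob', 3 => 'alice');
my $count = report($dbfile);
print "$_: $count->{$_}\n" for sort keys %$count;
```

Because the reporter only reads the database, it can be run as often as needed without touching the newsserver again.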

Caution: stat4newsdb will be very slow on its first run, especially if you use a remote newsserver with a large newsgroup. After that first run stat4newsdb speeds up significantly, because it remembers where it stopped the last time. Also, you need to create the directory for your databases yourself before you start stat4newsdb for the very first time.
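
Why later runs are faster can be shown with a small sketch. It assumes that the collector keeps the highest article number it has already processed and only asks for newer ones; how stat4newsdb really remembers its position is not described here, so the function range_to_fetch and the directory name are hypothetical.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Given the article range the server reports for a group and the last
# article number processed in an earlier run (undef on the first run),
# return the range that still has to be fetched, or an empty list.
sub range_to_fetch {
    my ($first, $last, $seen) = @_;
    my $start = (defined($seen) && $seen >= $first) ? $seen + 1 : $first;
    return $start > $last ? () : ($start, $last);
}

# The database directory has to exist before the first run.
my $dbdir = tempdir(CLEANUP => 1) . "/stat4news-db";   # hypothetical location
make_path($dbdir);

my @first_run  = range_to_fetch(1, 5000, undef);   # everything
my @second_run = range_to_fetch(1, 5200, 5000);    # only the new articles
print "first run:  @first_run\n";
print "second run: @second_run\n";
```

With these made-up numbers the first run covers 5000 articles and the second one only 200.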

I do consider these scripts alpha and beta respectively. They work for me: I tested stat4news on a Debian sarge system with a local INN 2.3.3 installed, and remotely with two different major newsservers. You also need to have the Perl modules Date::Manip, Getopt::Long, MIME::Words, and News::NNTPClient installed.
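
Whether the required modules are available can be checked before the first run. The following is a minimal sketch; the helper missing_modules is not part of the scripts.

```perl
use strict;
use warnings;

# Return the names of those modules that cannot be loaded.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        (my $file = "$module.pm") =~ s{::}{/}g;    # Date::Manip -> Date/Manip.pm
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

my @needed  = qw(Date::Manip Getopt::Long MIME::Words News::NNTPClient);
my @missing = missing_modules(@needed);
print @missing ? "missing: @missing\n" : "all required modules found\n";
```

Any module reported as missing has to be installed first, for example from CPAN or from the packages of your distribution.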

Have fun. See below for the links to download both Perl scripts.


stat4newsdb - the data collector:

# --------------------------------------------------------------------------
#
# stat4newsdb (pre-processor for statistical data collected via NNTP)
#
# copyright (C) 2003 by Michael Grimm
#
# Version 0.01   (18 Mar 2003) very dirty initial hack
# Version 0.02   (19 Mar 2003) first alpha
#                                 loop over XHDR individually
# Version 0.03   (20 Mar 2003) second alpha
#                                 get all XOVER headers, and loop over that
#                                 array; performance enhanced
# Version 0.04   (20 Mar 2003) third alpha
#                                 implement first rudimentary authentication
#                                 by using .stats4news in working dir
#                                 (two lines, USER first, PASSWD secondly,
#                                  apply proper file protections!)
# Version 0.05   (28 Mar 2003) fourth alpha
#                                 separation of script into a raw data sampling
#                                 script stat4newsdb and a post-processing and
#                                 reporting tool stat4news.
#                                 stat4newsdb will store raw data into a DBM
#                                 database.
# Version 0.10   (01 Apr 2003) first beta
#                                 mainly bug fixing done.
#                                 improvement of newsserver detection
# Version 0.10.1 (15 Apr 2003) first beta
#                                 mainly bug fixing done.
#
#
# --------------------------------------------------------------------------
#
# stat4newsdb uses the NNTP protocol for fetching articles from a newsserver
# and calculates raw statistical data for later processing. It will store
# its data into a DBM database. stat4newsdb should be run on a regular basis
# by cron. The DBM database can be made accessible to other users, who use
# stat4news to decide which statistics to extract.
#
# IMPORTANT REMARKS:
#
#       1. You need to have the Perl modules 'Date::Manip', 'Getopt::Long',
#          'MIME::Words', and 'News::NNTPClient' installed.
#
#       2. Authentication to the newsserver is only partly implemented so far;
#          an interactive dialog with the user is still lacking.
#
#       3. These are my very first Perl scripts written, be generous :)
#
# --------------------------------------------------------------------------
Download stat4newsdb



stat4news - the report generator:

# --------------------------------------------------------------------------
#
# stat4news (calculate and report statistics for a newsgroup)
#
# copyright (C) 2003-2006 by Michael Grimm
#
# Version 0.01 (06 Apr 2003) dirty initial hack
#
#                               TODO:
#
#                               1. ignore newsreader's versions
#
# Version 0.02 (02 Aug 2004)       done.
#
#                               2. implement quoting behaviour
#                               3. more thorough testing
#                               4. beautify authors names, by adding capital
#                                  first letters
#                               5. ...?
#
# Version 0.03 (02 Jul 2005)    update newsreader identification
# Version 0.04 (05 May 2006)    simplify newsreader identification by
#                               combining rarely used reader strings
#                               under "miscellaneous", and
#                               add link to sources of stat4news
#
my $Scriptname = "stat4news";
my $Version    = "0.04 (05 May 2006)";
my $Copyright  = "copyright (C) 2003-2006 by Michael Grimm";
my $Sources    = "http://www.odo.in-berlin.de/stat4news/stat4news.html";
#
# --------------------------------------------------------------------------
#
# stat4news uses the hash pre-calculated from stat4newsdb and calculates user
# defined statistical data.
#
# IMPORTANT REMARKS:
#
#       1. You need to have the Perl modules 'Date::Manip', 'Getopt::Long',
#          and 'MIME::Words' installed.
#
#       2. The current data structure of every article in the DBM database is
#          as follows:
#
#          article number,
#          author,
#          subject,
#          article date (epoch),
#          message ID,
#          newsreader,
#          newsserver,
#          written characters,
#          written lines,
#          last written line (for TOFU detection),
#          quoted characters,
#          quoted lines,
#          last quoted line (for TOFU detection)
#
#
#       3. These are my very first Perl scripts written, be generous :)
#
# --------------------------------------------------------------------------
Download stat4news
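
The per-article record listed in remark 2 above could be handled roughly like this. The sketch makes several assumptions: the fields are stored in the listed order and joined with a tab (the real serialisation may differ), and the key names, the helper functions and all sample values are invented for illustration.

```perl
use strict;
use warnings;

# Field order as listed in the script header; the key names are made up.
my @FIELDS = qw(
    number author subject date message_id newsreader newsserver
    written_chars written_lines last_written_line
    quoted_chars quoted_lines last_quoted_line
);

# Assumption: fields are joined with a tab; tabs and newlines inside a
# field are replaced by blanks so that the record stays on one line.
sub encode_record {
    my (%rec) = @_;
    return join "\t",
        map { my $v = $rec{$_} // ''; $v =~ tr/\t\n/  /; $v } @FIELDS;
}

sub decode_record {
    my ($line) = @_;
    my %rec;
    @rec{@FIELDS} = split /\t/, $line, -1;    # -1 keeps trailing empty fields
    return %rec;
}

# Share of quoted lines in an article, as a percentage.
sub quote_ratio {
    my (%rec) = @_;
    my $total = $rec{written_lines} + $rec{quoted_lines};
    return $total ? 100 * $rec{quoted_lines} / $total : 0;
}

my %article = decode_record(encode_record(
    number => 42, author => 'jane doe', subject => 'Re: test',
    date => 1049155200, message_id => '<abc@example.invalid>',
    newsreader => 'slrn', newsserver => 'news.example.invalid',
    written_chars => 300, written_lines => 6, last_written_line => 20,
    quoted_chars => 900, quoted_lines => 18, last_quoted_line => 14,
));
printf "%s quoted %.0f%% of the lines\n",
    $article{author}, quote_ratio(%article);
```

Keeping written and quoted counts separate is what allows a report on quoting behaviour without fetching the articles again.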



stat4news and stat4newsdb - license:

# --------------------------------------------------------------------------
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#
# --------------------------------------------------------------------------




