November 12th, 2010

Your Schwartz Factor on your CPAN Page

The Schwartz

The Schwartz factor of a CPAN author is the ratio of the number of distributions of that author over the number of tarballs sitting in their CPAN directory. For example, 10 distributions spread over 40 tarballs give a factor of 0.25. A low number indicates that it’s probably time for this author to do some clean-up (without fear of losing the old tarballs, as they will always be available via the BackPAN, natch).

As such, I wanted to add a periodic check of my Schwartz factor to my monitoring system. Coming up with a script to extract the information from my CPAN home directory was simple enough:


# see

use strict;
use warnings;

use 5.10.0;

use LWP::Simple qw/ get /;
use List::Util qw/ sum /;

my $author = 'YANICK';

$author =~ s#(.)(.)#$1/$1$2/$&#;  # YANICK => Y/YA/YANICK

my $page = get "$author";

my %dist;
# count the tarballs per distribution name (the part before the version)
$dist{$1}++  while $page =~ /<a href="([^"]*)-v?[\d_.]+\.tar\.gz"/ig;

say "Schwartz factor: ", keys( %dist) / sum values %dist;

while( my ( $dist, $num ) = each %dist ) {
    say $dist, ' - ', $num;
}

This is not exactly the most robust code I’ve ever written (the parsing of the page should really be left to HTML::Tree), but it does what it’s supposed to do. Note, though, that the factor may vary a little depending on which mirror site you hit.

But then I thought, why keep the fun offline? So I imported the logic into a GreaseMonkey script and I now have the Schwartz factor of CPAN authors added to their CPAN pages:

Schwartz factor on CPAN author page

The script will not work for authors who dropped an index.html in their home directory, or if they use sub-directories, but I expect that they should be more the exception than the rule.
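For the curious, here is a minimal sketch of what the counting logic could look like in JavaScript. It is not the actual GreaseMonkey script: the function name `schwartzFactor` is made up for illustration, and the sketch assumes it runs directly on a plain directory listing whose links point at the tarballs.

```javascript
// Hypothetical helper (not taken from the real script): counts the tarballs
// per distribution and returns the distributions/tarballs ratio.
// `hrefs` is an array of link targets from an author's directory listing.
function schwartzFactor(hrefs) {
  const counts = new Map();
  let tarballs = 0;

  for (const href of hrefs) {
    // same idea as the Perl regex: name, optional "v", version, ".tar.gz"
    const m = /^(.*)-v?[\d_.]+\.tar\.gz$/i.exec(href);
    if (!m) continue; // not a tarball (CHECKSUMS, .meta files, ...)
    counts.set(m[1], (counts.get(m[1]) || 0) + 1);
    tarballs++;
  }

  // no tarballs found: report null rather than dividing by zero
  return { factor: tarballs ? counts.size / tarballs : null, counts };
}

// Assumption: in a browser, the page being viewed is the listing itself.
if (typeof document !== 'undefined') {
  const hrefs = Array.from(
    document.querySelectorAll('a[href]'),
    (a) => a.getAttribute('href')
  );
  const { factor } = schwartzFactor(hrefs);
  if (factor !== null) {
    const note = document.createElement('p');
    note.textContent = 'Schwartz factor: ' + factor.toFixed(2);
    document.body.insertBefore(note, document.body.firstChild);
  }
}
```

Keeping the counting in a function that only takes a list of strings means it can be tried outside the browser, for instance with `schwartzFactor(['Foo-0.01.tar.gz', 'Foo-0.02.tar.gz'])`.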

The GreaseMonkey script is available on the site, and on GitHub.