Saturday, October 30, 2004

phto0027

Mark: What are you, a whipping girl?
Angela: No, I'm a dominatrix
Mark: A domi-what-what?

phto0027
Originally uploaded by jimregan.

Sunday, October 24, 2004

Another scraper

I only read Telsa's diary occasionally; maybe having a feed in liferea would change that.

#!/usr/bin/perl -w

use strict;
use XML::RSS;
use LWP::Simple;
use HTML::Entities;

my $rss = new XML::RSS (version => '1.0');
my $url = "http://www.linux.org.uk/~telsa/Diary/diary.html";
my $page = get($url);

$rss->channel(title       => "The more accurate diary. Really.",
              link        => $url,
              description => "Telsa's diary of life with a hacker:" 
	      		     . " the current ramblings");

foreach (split ('<dt>', $page))
{
	if (/<a\sname="([^"]*)">
		<strong>
		([^>]*)
		<\/strong><\/a><\/dt>\s*<dd>
		(.*)<\/dd>/six)
	{
		$rss->add_item(title       => $2,
			       link        => "$url#$1",
		       	       description => encode_entities($3));
	}
}

print $rss->as_string;
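
For comparison, here is a rough Python sketch of the same split-then-match idea. The sample HTML is made up to resemble the markup the regex above expects (it is not copied from the real diary page), and `extract_entries` is a hypothetical name.

```python
import html
import re

# Hypothetical sample, shaped like the markup the Perl regex above expects.
SAMPLE = """<dl>
<dt><a name="2004-10-20"><strong>October 20th</strong></a></dt>
<dd>Went for a walk. <em>Cold.</em></dd>
<dt><a name="2004-10-21"><strong>October 21st</strong></a></dt>
<dd>Stayed in.</dd>
</dl>"""

ENTRY = re.compile(
    r"""<a\sname="([^"]*)">      # anchor name, used as the URL fragment
        <strong>([^<]*)</strong> # entry title
        </a></dt>\s*<dd>
        (.*)</dd>                # entry body, up to the last </dd> in the chunk
    """,
    re.IGNORECASE | re.DOTALL | re.VERBOSE,
)

def extract_entries(page, base_url):
    """Return (title, link, escaped_body) for each <dt> chunk that matches."""
    entries = []
    for chunk in page.split("<dt>"):
        m = ENTRY.search(chunk)
        if m:
            name, title, body = m.groups()
            entries.append((title, f"{base_url}#{name}", html.escape(body)))
    return entries
```

Splitting on `<dt>` first keeps the greedy `(.*)` confined to one entry, which is the same trick the Perl version relies on.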

Michael Moore scraper improved

Now with added /x-ness!

#!/usr/bin/perl -w

use strict;
use XML::RSS;
use LWP::Simple;
use HTML::Entities;

sub findurl ($$)
{
	my $title = shift;
	my $pagein = shift;
	if ($pagein =~ /<a href="(index\.php\?id=[^"]*)">\Q$title\E<\/a>/i)
	{
		return "http://www.michaelmoore.com/words/diary/$1";
	}
}
							

my $rss = new XML::RSS (version => '1.0');
my $url = "http://www.michaelmoore.com/words/diary/index.php";
my $page = get($url);

$rss->channel(title       => "Mike's Blog",
              link        => $url,
              description => "Michael Moore's blog");

foreach (split ('<table ', $page))
{
	if (/<p><span\sclass="smallText"><i>
		([^>]*)
		<\/i><\/span><br>[\r\n]*<span\sclass="titleText">
		([^>]*)
		<\/span><\/p>[\r\n]*<p>
		(<p>.*<\/p>)
		[\r\n]*<\/p>\n/six)
	{
		$rss->add_item(title       => $2,
			       link        => findurl($2, $page),
		       	       description => $1 . encode_entities($3));
	}
}

print $rss->as_string;
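
A note on looking up a link by its title: when a title is interpolated into a regex it needs quoting, otherwise a `?`, `(` or `.` in the title is read as pattern syntax. Perl has `\Q...\E` for that; in Python it's `re.escape`. A minimal sketch of the same lookup, with an invented index page and a hypothetical `find_url` name:

```python
import re

BASE = "http://www.michaelmoore.com/words/diary/"

def find_url(title, page):
    """Return the absolute URL of the index link whose text is `title`, or None."""
    pattern = r'<a href="(index\.php\?id=[^"]*)">' + re.escape(title) + r"</a>"
    m = re.search(pattern, page, re.IGNORECASE)
    return BASE + m.group(1) if m else None

# Invented sample index, for illustration only.
INDEX = (
    '<a href="index.php?id=1">Plain title</a>\n'
    '<a href="index.php?id=2">What now? (Part 1)</a>\n'
)
```

Calling `find_url("What now? (Part 1)", INDEX)` finds the second link; without the escaping, the `?` and the parentheses would change what the pattern means.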

Michael Moore's blog

Is here. Scraper, here:

#!/usr/bin/perl -w

use strict;
use XML::RSS;
use LWP::Simple;
use HTML::Entities;

sub findurl ($$)
{
	my $title = shift;
	my $pagein = shift;
	if ($pagein =~ /<a href="(index\.php\?id=[^"]*)">\Q$title\E<\/a>/i)
	{
		return "http://www.michaelmoore.com/words/diary/$1";
	}
}
							

my $rss = new XML::RSS (version => '1.0');

my $page = get("http://www.michaelmoore.com/words/diary/index.php");

$rss->channel(title       => "Mike's Blog",
              link        => "http://www.michaelmoore.com/words/diary/index.php",
              description => "Michael Moore's blog");

foreach (split ('<table ', $page))
{
	if (/<p><span class="smallText"><i>([^>]*)<\/i><\/span><br>[\r\n]*<span class="titleText">([^>]*)<\/span><\/p>[\r\n]*<p>(<p>.*<\/p>)[\r\n]*<\/p>\n/si)
	{
		$rss->add_item(title       => $2,
			       link        => findurl($2, $page),
		       	       description => $1 . encode_entities($3));
	}
}

print $rss->as_string;

Friday, October 22, 2004

Bill Bailey's blog

Bill Bailey has a blog! Bill Bailey's blog doesn't have a feed! Solution:

#!/usr/bin/perl -w

use strict;
use XML::RSS;
use LWP::Simple;
use HTML::Entities;

my $page;
my $rss = new XML::RSS (version => '1.0');

$page = get("http://www.bill-bailey.co.uk/blog/index.php");
$rss->channel(title       => 'Bill Bailey',
              link        => 'http://www.bill-bailey.co.uk/blog/index.php',
              description => 'Bill Bailey\'s Blog');
      
my @chunks = split ('<table border="0" cellspacing="0" cellpadding="0">', $page);

my ($date, $link, $content, $title);

foreach (@chunks)
{
	if (/<td class="BlogTitle">\s*([^\r\n]*)[\r\n\s]*<\/td>/is)
        {
		$date = $1;	
	}
	if (/<td class="BlogHeader">\s*([^\r\n]*)[\r\n\s]*<\/td>/is)
	{
		$title = $1;
	}
	if (/<td class="BlogText">(.*)<\/td>\s*<\/tr>\s*<tr>\s*<td>&nbsp;/is)
	{
		$content = encode_entities($1);
	}
	if (/<td class="BodyText">\s*<a href="([^"]*)">Read comments/is)
	{
		$link = "http://www.bill-bailey.co.uk$1";
	}
	if ($date && $title && $content && $link)
	{
		$rss->add_item(title       => $title,
			       link        => $link,
		       	       description => $content,
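		       	       dc => {date => $date});
		# Reset so that later chunks don't add the same entry again
		$date = $title = $content = $link = undef;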
	}
}

print $rss->as_string;
More Bill Bailey here.

Userfriendly scraper

#!/usr/bin/perl -w

use strict;
use LWP::Simple;
use XML::RSS;
use Date::Format;

# These regexes taken from Dailystrips
my $patternpre = '<img.+?src="(http://www\.userfriendly\.org/cartoons/archives/%y.+?/uf.+?\.gif)"';
my $urlpre = "http://ars.userfriendly.org/cartoons/?id=%Y%m%d&mode=classic";

my $pattern = time2str ($patternpre, time);
my $url = time2str ($urlpre, time);

my $page = get($url);

my $rss = new XML::RSS (version => '1.0');

$rss->channel(title       => 'User Friendly',
	      link	  => 'http://userfriendly.org/',
	      description => 'User Friendly the Comic Strip');

if ($page =~ /$pattern/ig)
{
	$rss->add_item(title       => time2str("CARTOON FOR %a %b, %Y",time),
	               link        => "$url",
       	               description => "&lt;img src='$1' /&gt;");
}
	
	
print $rss->as_string;
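
The date trick above (feed both the URL and the regex through a date formatter) works the same way with Python's `strftime`. A sketch, where `strip_for` is a hypothetical name and the templates are modelled on the ones in the script:

```python
import datetime
import re

# Templates modelled on the Perl above; %Y, %m, %d and %y are filled in by strftime.
URL_TEMPLATE = "http://ars.userfriendly.org/cartoons/?id=%Y%m%d&mode=classic"
IMG_TEMPLATE = r'<img.+?src="(http://www\.userfriendly\.org/cartoons/archives/%y.+?/uf.+?\.gif)"'

def strip_for(day):
    """Return (page_url, compiled_image_regex) for the given date."""
    url = day.strftime(URL_TEMPLATE)
    pattern = re.compile(day.strftime(IMG_TEMPLATE), re.IGNORECASE)
    return url, pattern
```

Passing `datetime.date.today()` gives the equivalent of `time2str(..., time)`; passing a fixed date makes it easy to fetch an older strip.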
I got fed up of using overblown commands to HTMLise these things, so I have a little sed script now:
s/&/\&amp;/g
s/</\&lt;/g
s/>/\&gt;/g
s/"/\&quot;/g
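
The order of those four rules matters: `&` has to go first, or the `&` in `&lt;` and friends would be escaped a second time. The same thing as a Python sketch (`htmlise` is a made-up name):

```python
import html

def htmlise(text):
    """Escape text for pasting into HTML, mirroring the four sed rules."""
    # & must be handled first, otherwise the & in &lt; etc. gets escaped again.
    for raw, entity in (("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"), ('"', "&quot;")):
        text = text.replace(raw, entity)
    return text
```

The standard library's `html.escape` does the same job, and additionally escapes the apostrophe.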
I changed sun-pic.pl. The line
foreach (split ("\r\n", $page))
is now
foreach (split ("\n", $page))

Thursday, October 21, 2004

Xchat hello abused...

To implement the social commands in FlickrLive.

__module_name__ = "flickr" 
__module_version__ = "1.0" 
__module_description__ = "flickr social commands" 
 
import xchat

otheruser = "This social command requires the name of another user"

command_list = """
/apologize
/applaud
/blink
/blush
/bounce
/brood
/comfort
/cough
/cringe
/evileye
/flirt
/frown
/giggle
/goose
/groan
/highfive
/hit
/hug
/kiss
/laugh
/lick
/massage
/nudge
/pinch
/pout
/punch
/shrug
/smile
/spank
/tickle
/wave
/wink
"""

def apologize_cb(word, word_eol, userdata):
	if len(word) < 2:
		print otheruser
	else:
		xchat.command("ME apologizes to %s" % word_eol[1])
	return xchat.EAT_NONE

def applaud_cb(word, word_eol, userdata):
	if len(word) < 2:
		print otheruser
	else:
		xchat.command("ME applauds %s" % word_eol[1])
	return xchat.EAT_NONE

def blink_cb(word, word_eol, userdata):
	xchat.command("ME blinks")
	return xchat.EAT_NONE

def blush_cb(word, word_eol, userdata):
	xchat.command("ME blushes")
	return xchat.EAT_NONE

def bounce_cb(word, word_eol, userdata):
	xchat.command("ME bounces")
	return xchat.EAT_NONE

def brood_cb(word, word_eol, userdata):
	xchat.command("ME broods")
	return xchat.EAT_NONE

def comfort_cb(word, word_eol, userdata):
	if len(word) < 2:
		print otheruser
	else:
		xchat.command("ME comforts %s" % word_eol[1])
	return xchat.EAT_NONE

def commands_cb(word, word_eol, userdata):
	print command_list
	return xchat.EAT_NONE

def cough_cb(word, word_eol, userdata):
	xchat.command("ME coughs")
	return xchat.EAT_NONE

def cringe_cb(word, word_eol, userdata):
	xchat.command("ME cringes")
	return xchat.EAT_NONE

def evileye_cb(word, word_eol, userdata):
	if len(word) < 2:
		print otheruser
	else:
		xchat.command("ME gives %s the evil eye" % word_eol[1])
	return xchat.EAT_NONE

def flirt_cb(word, word_eol, userdata):
	if len(word) < 2:
		print otheruser
	else:
		xchat.command("ME flirts with %s" % word_eol[1])
	return xchat.EAT_NONE

def frown_cb(word, word_eol, userdata):
	xchat.command("ME frowns")
	return xchat.EAT_NONE

def giggle_cb(word, word_eol, userdata):
	xchat.command("ME giggles")
	return xchat.EAT_NONE

def goose_cb(word, word_eol, userdata):
	print "WTF?"
	return xchat.EAT_NONE

def groan_cb(word, word_eol, userdata):
	xchat.command("ME groans")
	return xchat.EAT_NONE

def highfive_cb(word, word_eol, userdata):
	if len(word) < 2:
		print otheruser
	else:
		xchat.command("ME gives %s a high five" % word_eol[1])
	return xchat.EAT_NONE

def hit_cb(word, word_eol, userdata):
	if len(word) < 2:
		print otheruser
	else:
		xchat.command("ME hits %s" % word_eol[1])
	return xchat.EAT_NONE

def hug_cb(word, word_eol, userdata):
	if len(word) < 2:
		print otheruser
	else:
		xchat.command("ME hugs %s" % word_eol[1])
	return xchat.EAT_NONE

def kiss_cb(word, word_eol, userdata):
	if len(word) < 2:
		print otheruser
	else:
		xchat.command("ME kisses %s" % word_eol[1])
	return xchat.EAT_NONE


def laugh_cb(word, word_eol, userdata):
	xchat.command("ME laughs")
	return xchat.EAT_NONE

def lick_cb(word, word_eol, userdata):
	if len(word) < 2:
		print otheruser
	else:
		xchat.command("ME licks %s" % word_eol[1])
	return xchat.EAT_NONE

def massage_cb(word, word_eol, userdata):
	if len(word) < 2:
		print otheruser
	else:
		xchat.command("ME massages %s" % word_eol[1])
	return xchat.EAT_NONE

def nudge_cb(word, word_eol, userdata):
	if len(word) < 2:
		print otheruser
	else:
		xchat.command("ME nudges %s" % word_eol[1])
	return xchat.EAT_NONE

def pinch_cb(word, word_eol, userdata):
	if len(word) < 2:
		print otheruser
	else:
		xchat.command("ME pinches %s" % word_eol[1])
	return xchat.EAT_NONE

def pout_cb(word, word_eol, userdata):
	xchat.command("ME pouts")
	return xchat.EAT_NONE

def punch_cb(word, word_eol, userdata):
	if len(word) < 2:
		print otheruser
	else:
		xchat.command("ME punches %s" % word_eol[1])
	return xchat.EAT_NONE

def shrug_cb(word, word_eol, userdata):
	xchat.command("ME shrugs")
	return xchat.EAT_NONE

def smile_cb(word, word_eol, userdata):
	xchat.command("ME smiles")
	return xchat.EAT_NONE

def spank_cb(word, word_eol, userdata):
	if len(word) < 2:
		print otheruser
	else:
		xchat.command("ME spanks %s" % word_eol[1])
	return xchat.EAT_NONE

def tickle_cb(word, word_eol, userdata):
	if len(word) < 2:
		print otheruser
	else:
		xchat.command("ME tickles %s" % word_eol[1])
	return xchat.EAT_NONE

def wave_cb(word, word_eol, userdata):
	if len(word) < 2:
		xchat.command("ME waves")
	else:
		xchat.command("ME waves to %s" % word_eol[1])
	return xchat.EAT_NONE

def wink_cb(word, word_eol, userdata):
	xchat.command("ME winks")
	return xchat.EAT_NONE

xchat.hook_command("apologize", apologize_cb, help="Use /commands for more information")
xchat.hook_command("applaud", applaud_cb, help="Use /commands for more information")
xchat.hook_command("blink", blink_cb, help="Use /commands for more information")
xchat.hook_command("blush", blush_cb, help="Use /commands for more information")
xchat.hook_command("bounce", bounce_cb, help="Use /commands for more information")
xchat.hook_command("brood", brood_cb, help="Use /commands for more information")
xchat.hook_command("comfort", comfort_cb, help="Use /commands for more information")
xchat.hook_command("commands", commands_cb, help="Use /commands for more information")
xchat.hook_command("cough", cough_cb, help="Use /commands for more information")
xchat.hook_command("cringe", cringe_cb, help="Use /commands for more information")
xchat.hook_command("evileye", evileye_cb, help="Use /commands for more information")
xchat.hook_command("flirt", flirt_cb, help="Use /commands for more information")
xchat.hook_command("frown", frown_cb, help="Use /commands for more information")
xchat.hook_command("giggle", giggle_cb, help="Use /commands for more information")
xchat.hook_command("goose", goose_cb, help="Use /commands for more information")
xchat.hook_command("groan", groan_cb, help="Use /commands for more information")
xchat.hook_command("highfive", highfive_cb, help="Use /commands for more information")
xchat.hook_command("hit", hit_cb, help="Use /commands for more information")
xchat.hook_command("hug", hug_cb, help="Use /commands for more information")
xchat.hook_command("kiss", kiss_cb, help="Use /commands for more information")
xchat.hook_command("laugh", laugh_cb, help="Use /commands for more information")
xchat.hook_command("lick", lick_cb, help="Use /commands for more information")
xchat.hook_command("massage", massage_cb, help="Use /commands for more information")
xchat.hook_command("nudge", nudge_cb, help="Use /commands for more information")
xchat.hook_command("pinch", pinch_cb, help="Use /commands for more information")
xchat.hook_command("pout", pout_cb, help="Use /commands for more information")
xchat.hook_command("punch", punch_cb, help="Use /commands for more information")
xchat.hook_command("shrug", shrug_cb, help="Use /commands for more information")
xchat.hook_command("smile", smile_cb, help="Use /commands for more information")
xchat.hook_command("spank", spank_cb, help="Use /commands for more information")
xchat.hook_command("tickle", tickle_cb, help="Use /commands for more information")
xchat.hook_command("wave", wave_cb, help="Use /commands for more information")
xchat.hook_command("wink", wink_cb, help="Use /commands for more information")
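
Nearly all of those callbacks have one of two shapes (with or without a target), so they could be generated from a table instead of written out by hand. A sketch of that idea, written for Python 3 and runnable outside xchat: `ACTIONS` lists only a few commands, and `make_callback`, `command`, `say` and `eat_none` are hypothetical names standing in for the real xchat pieces.

```python
# Verb phrases for each social command; "%s" marks where the other user goes.
# Commands without "%s" take no argument.
ACTIONS = {
    "apologize": "apologizes to %s",
    "blink": "blinks",
    "hug": "hugs %s",
    "pout": "pouts",
}

OTHERUSER = "This social command requires the name of another user"

def make_callback(template, command, say=print, eat_none=0):
    """Build an xchat-style callback for one social command.

    `command` stands in for xchat.command and `eat_none` for xchat.EAT_NONE,
    so the sketch can run outside xchat.
    """
    def callback(word, word_eol, userdata):
        if "%s" not in template:
            command("ME " + template)
        elif len(word) < 2:
            say(OTHERUSER)
        else:
            command("ME " + template % word_eol[1])
        return eat_none
    return callback
```

Inside a plugin, the hooks would then come from one loop over `ACTIONS.items()`, passing `xchat.command` and `xchat.EAT_NONE` to `make_callback` and the result to `xchat.hook_command`. Commands with an optional target, like /wave, would still need their own callback.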
I used FlickrLive a few days ago, and got a nice testimonial from Meer:
"It was a chance meeting in the night--day--like two ships sometimes do before they hit the lighthouse, which really, they shouldn't be doing in the day, 'cuz that says something bad about the intoxication of the captains... ...oh, right. He's one of those guys that you meet once and know is unique. And smart. And a clever writer."

Hello Xchat

An xchat hello command. Why not?

__module_name__ = "hellocmd" 
__module_version__ = "1.0" 
__module_description__ = "A hello command" 
 
import xchat

def hello_cb(word, word_eol, userdata):
	if len(word) < 2:
		print "Hello World!"
	else:
		print "Hello %s" % word_eol[1]
	return xchat.EAT_NONE

xchat.hook_command("hello", hello_cb,
                   help="/hello <person>, says hello to person, or hello world")
That's just looking to be abused :) Saw this, about WinFS. Nothing like [insert name of query language]. Not at all.

Even more LG headlines...

I didn't realise that last post went through - I thought I cancelled it when Blogger refused to add <script> tags. I found the problem - the apostrophe in Barry O'Donovan's name. I think I'll get rid of the separate paragraphs for article & author, and call it a day.
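
A sketch of one way to sidestep that kind of quoting problem, assuming the headlines are emitted as `document.write()` calls (the real generator for lg.js isn't shown here, so the output format and the `headline_js` name are guesses for illustration). Letting `json.dumps` build the JavaScript string literal means an apostrophe in a name needs no special handling:

```python
import json

def headline_js(title, author, url):
    """Return one line of JavaScript that writes a headline link.

    json.dumps produces a double-quoted string literal with quotes and
    backslashes escaped, so an apostrophe in a name needs no special care.
    """
    markup = '<a href="%s">%s</a> by %s<br>' % (url, title, author)
    return "document.write(%s);" % json.dumps(markup)
```

For example, `headline_js("Some Article", "Barry O'Donovan", "http://linuxgazette.net/example.html")` yields a single valid statement; the title and URL there are placeholders.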

Wednesday, October 20, 2004

More LG headlines

I added it, the script should work, but doesn't seem to want to. So I'll try including it here, as well as Meerkat's (not Meercat :) output:

Linux Gazette headlines

I came across this O'Reilly page about Meercat a few days ago, and figured the Javascript stuff might be nice to provide for LG fans who want to promote us. I have it embedded in the panel to the right, but it refers to one article in an issue that hasn't been published yet. I think it looks OK, but I'm not done with it yet. When it's ready, the code to add to your site/blog will most likely be:

<script language="javascript" type="text/javascript"
src="http://linuxgazette.net/lg.js"></script>

Scraping the Sun

I read The Sun. There, I admitted it. I work in a factory, and it's good to keep up with the news that everyone else reads, but mostly it's because I like looking at pictures of scantily clad women. [shrug] The Sun has a really annoying site though. Two of the pages I read most often are the "Viral Emails" and Bizarre Exposed. So I figured I'd script some of it.

sun-viral.pl: This creates an RDF feed from the Viral page that I can use from Liferea.

#!/usr/bin/perl -w

use strict;
use XML::RSS;
use LWP::Simple;

#my $testpage = "http://www.thesun.co.uk/article/0,,13-2004480024,00.html";
my $toppage = get('http://www.thesun.co.uk/section/0,,1,00.html');
my $page;
my $pageurl;
my $rss = new XML::RSS (version => '1.0');

if ($toppage =~ m!(/article/0,,13[^"']*)!)
{
	 $pageurl = "http://www.thesun.co.uk" . $1;
}

if (!$pageurl)
{
	exit -1;
}
$page = get($pageurl);
$rss->channel(title       => 'The Sun: Viral Emails',
                 link        => $pageurl,
                 description => 'The Sun: Viral Emails');
      
my @lines = split ("<[tT][Rr]", $page);

foreach (@lines)
{
	if (m!'(/popupWindow/0,,13[^"']*)', \d+, \d+, 'email\d+'\);">([^<]*)</A>!i)
	{
		$rss->add_item(title       => "$2",
 			          link        => 'http://www.thesun.co.uk' . $1,
		       	          description => "$pageurl");
	}
}

print $rss->as_string;
sun-bizarre.pl: This scrapes Bizarre (I haven't gotten around to the rest of it yet)
#!/usr/bin/perl -w

use strict;
use XML::RSS;
use LWP::Simple;

my $page = get('http://www.thesun.co.uk/section/0,,4,00.html');

my $rss = new XML::RSS (version => '1.0');

$rss->channel(title       => 'The Sun: Bizarre',
     	      link        => 'http://www.thesun.co.uk/section/0,,4,00.html',
	      description => 'The Sun: Bizarre');

# Eek! .* is a bit much, though it works.
# Matches top story
if ($page =~ m!<tr><td colspan=\"\d\"><a href=\"(/article/0,,.*\.html)\"><img src=\"http://images.thesun.co.uk/picture/.*\.gif\" alt=\"([^"]*)\"!)
{
	#print "$1, $2";
	$rss->add_item(title => "$2",
	 	       link =>  'http://www.thesun.co.uk' . $1);
}

# Other stories

my @lines = split ("<[tT][Rr]", $page);

foreach (@lines)
{
	if (m!<td style=[^>]*><a (class=\"[^"]+\" )?href=\"(/article/0,,.*\.html)\"[^>]*>([^<]*)</a><br>[\r\n]*<span[^>]*>([^<]*)[\r\n]*<[/]?td!is)	
	{
#		print "$2, $3, $4";
		$rss->add_item(title       => "$3",
			       link        => 'http://www.thesun.co.uk' . $2,
			       description => "$4");
	}
}

print $rss->as_string;
sun-pic.pl: This downloads the "Click here" images -- I don't like popups, and I don't like having popups open as new tabs much either.
#!/usr/bin/perl -w

use strict;
use LWP::Simple;

my $uri = shift;
my $page = get ($uri);

my $uris;

foreach (split ("\r\n", $page))
{
	if (m!(/popupWindow/[^.]*.html)!i)
	{
		$uris .= "http://www.thesun.co.uk$1 ";
	}
	elsif (m!window.open\('(http://images.thesun.co.uk/picture/0,,[^,]*,00.[gj][ip][fg])'!i)
	{
		$uris .= "$1 ";
	}
}
`cd /home/jimmy/.download && wget --referer=$uri -x $uri $uris`;

foreach (split (" ", $uris))
{
	if (/htm[l]+$/i)
	{
		$page = get ($_);
		foreach (split ("\r\n", $page))
		{
			if (m!(http://images.thesun.co.uk/picture[s]?/0,,[^,]*,00.[gj][ip][fg])!i)
			{
				`cd /home/jimmy/Pictures && wget $1`;
			}
		}
	}
}
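
The backticks above hand the scraped URLs to a shell, which is fine until one of them contains a `&` or a `;`. A sketch of the same wget call built as an argument list instead, so nothing gets parsed by a shell. `wget_argv` and `fetch` are made-up names, and `-P` (wget's directory prefix option) replaces the `cd`:

```python
import subprocess

def wget_argv(referer, urls, dest):
    """Build a wget command line as a list, so no shell parses the URLs."""
    return ["wget", "--referer=" + referer, "-x", "-P", dest, *urls]

def fetch(referer, urls, dest):
    """Run wget on the URLs and return its exit status."""
    # check=False: do not raise an exception if wget exits non-zero.
    return subprocess.run(wget_argv(referer, urls, dest), check=False).returncode
```

Only `wget_argv` is needed to see the idea; `fetch` actually runs wget, so it needs wget installed and a network connection.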

Wednesday, October 06, 2004

Another week

Another week off work. Last Thursday I went to Cashel to get my stitches removed, and the nurse put a finger cot on my finger, which was extremely painful. The next day I decided to do a mass hair removal, and it got soaked in the shower. Removed it afterwards to find that it had been cutting off the blood flow to the finger. Got a lot of odd feelings in the finger, so I went to the factory doctor, who told me I couldn't be around extremes of temperature, so no work this week.

Been reading and watching movies mostly. I got "Going Postal" last Thursday, reread "Night Watch" and "Equal Rites", started reading "System of the World", and watched "The Pianist" among others. Collected quotes, as usual.

Equal Rites:
"a hint was to Esk what a mosquito bite was to the average rhino because she was already learning that if you ignore the rules people will, half the time, quietly rewrite them so that they don't apply to you."
"Esk, of course, had not been trained, and it is well known that a vital ingredient of success is not knowing that what you're attempting can't be done. A person ignorant of the possibility of failure can be a halfbrick in the path of the bicycle of history."
"They both savoured the strange warm glow of being much more ignorant than ordinary people, who were ignorant of only ordinary things."

Night Watch:
"It wasn't that he'd liked being shot at by hooded figures in the temporary employ of his many and varied enemies, but he'd always looked at it as some kind of vote of confidence. It showed that he was annoying the rich and arrogant people who ought to be annoyed."
"'Yeah, all right, but everyone knows they torture people,' mumbled Sam. 'Do they?' said Vimes. 'Then why doesn't anyone do anything about it?' ''cos they torture people.' Ah, at least I was getting a grasp of basic social dynamics, thought Vimes."
"Ninety per cent of most magic merely consists of knowing one extra fact."
"This was a very thorough cell. Not even sound was meant to escape"

Going Postal:
"The broom must have been kept as an ornament, because it certainly hadn't been used much on the accumulations in the stable yard. On the positive side, it meant that he had fallen into something soft. On the negative side, this meant he had fallen into something soft."
"Dimwell Arrhythmic Rhyming Slang: Various rhyming slangs are known, and have given the universe such terms as 'apples and pears' (stairs), 'rubbity-dub' (pub) and 'busy bee' (General Theory of Relativity). The Dimwell Street rhyming slang is probably unique in that it does not, in fact, rhyme. No one knows why, but theories so far advanced are 1) that it is quite complex and in fact follows hidden rules or 2) Dimwell is well named or 3) it's made up to annoy strangers, which is the case with most such slangs."
"What kind of man would put a known criminal in charge of a major branch of government? Apart from, say, the average voter."
"And if Moist von Lipwig couldn't cream a little somethi-- a big something off the top, and the bottom, and maybe a little off the sides, then he didn't deserve to!"
"'You wrote everything down, Crispin?' he said. 'Why?' Crispin looked appalled. 'Got to keep records, Reacher,' he said. 'Can't cover y'tracks if you don't know where y'left 'em. Then ... can put it all back, see, hardly a crime at all.'"
"You had to admire the way perfectly innocent words were mugged, ravished, stripped of all true meaning and decency and then sent to walk the gutter for Reacher Gilt, although 'synergistically' had probably been a whore from the start."