Wed 31 Oct 2007
Here's an interesting one: one of my clients has been seeing mysql
db connections from one of their app servers (and only one) being
periodically locked out, with the following error message reported
when attempting to connect:
Host _hostname_ is blocked because of many connection errors.
Unblock with 'mysqladmin flush-hosts'.
There's no indication in any of the database logs of anything
untoward, or any connection errors at all, in fact. As a workaround,
we've bumped up the max_connect_errors setting on the mysql
instance, and haven't really had time to dig much further.
Till tonight, when I decided to figure out what was going on.
Turns out there are plenty of other people seeing this too, although
MySQL seems to be in "it's not a bug, it's a feature" mode - see
this bug report.
That thread helped clue me in, however. Turns out that mysql counts
any TCP connection to the database that never attempts to log in as a
connection error, but only logs the failures that involve a login
attempt. So there's a nice class of silent errors - and in fact, a nice
DoS attack against MySQL - if you make standard TCP connections to mysql
without logging in.
We, being clever and careful, were doing exactly that with
nagios - making a simple TCP connection to
port 3306 - in order to simply and cheaply check that mysql was
listening on that port. Hmmmm.
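To make the failure mode concrete, here is a deliberately simplified, hypothetical model of a per-host error counter of the kind described above. It is not MySQL's actual code: the class name, method names and return strings are all made up, and it ignores other details of the real server (for example, how successful logins interact with the count). It only shows why a bare TCP probe, repeated often enough, ends in a block that nothing in the logs explains.

```python
class HostCache:
    """Hypothetical, simplified model of a per-host connection-error counter."""

    def __init__(self, max_connect_errors=10):
        self.max_connect_errors = max_connect_errors
        self.errors = {}  # host -> number of connections that never logged in

    def is_blocked(self, host):
        return self.errors.get(host, 0) >= self.max_connect_errors

    def connect(self, host, logs_in):
        """Simulate one TCP connection from host.

        logs_in=False models a bare TCP probe (like a check_tcp check)
        that closes the socket without attempting to authenticate.
        """
        if self.is_blocked(host):
            return "blocked"
        if not logs_in:
            # Counted as a connection error, but nothing is logged.
            self.errors[host] = self.errors.get(host, 0) + 1
            return "silent error"
        return "ok"

    def flush_hosts(self):
        """Rough analogue of 'mysqladmin flush-hosts': forget all counts."""
        self.errors.clear()
```

With max_connect_errors=3, three silent probes from the monitoring host are enough to lock it out for real logins too, while other hosts carry on happily until someone flushes the host cache.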
Easy enough to remedy, of course, once you figure out what's going
on. I even had a nice nagios plugin lying around to let me do more
sophisticated database checks -
check_db_query_rowcount -
so I just had to replace the simple check_tcp check with that, and all
is right with the world.
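For anyone without such a plugin handy, here is a minimal sketch of a check in the same spirit: actually log in, run a query, and report using the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). This is not check_db_query_rowcount itself; the function name and arguments are hypothetical. It takes any zero-argument callable returning a DB-API 2.0 connection, so in real use you would wrap a MySQL driver's connect (pymysql.connect, say), while sqlite3 stands in here so the sketch runs anywhere.

```python
import sqlite3

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def check_query_rowcount(connect, query, min_rows=1):
    """Log in, run query, and require at least min_rows rows back.

    connect is a zero-argument callable returning a DB-API 2.0
    connection. Returns (exit_code, status_line).
    """
    try:
        conn = connect()  # a real login, so no silent connection error
    except Exception as exc:
        return CRITICAL, f"CRITICAL - connect failed: {exc}"
    try:
        cur = conn.cursor()
        cur.execute(query)
        rows = cur.fetchall()
    except Exception as exc:
        return CRITICAL, f"CRITICAL - query failed: {exc}"
    finally:
        conn.close()
    if len(rows) < min_rows:
        return CRITICAL, f"CRITICAL - {len(rows)} rows, expected >= {min_rows}"
    return OK, f"OK - {len(rows)} rows returned"
```

A Nagios wrapper would print the status line and pass the code to sys.exit().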
But it's a plain and simple bug, and MySQL need to get it fixed.
Personally I think a simple tcp connection should not count as a
connection error at all without a login attempt (assuming it's not
left half-open etc.). Alternatively, if you do want to count that
as a connection error, fine, but at least log some kind of error so
the issue is discoverable and can be handled by someone.
Silent errors are deadly.
Tue 30 Oct 2007
I've tried all three of the current blosxom 'entries' plugins on my
blog in the last few months: entries_cache_meta, entries_cache, and the
original entries_index.
entries_cache_meta is pretty nice, but it doesn't work in static mode,
and its method of capturing the modification date as metadata didn't quite
work how I wanted. I had similar problems with the entries_cache metadata
features, and its caching and reindexing didn't seem to work reliably for me.
entries_index is the simplest of the three, and offers no caching features,
but it's pretty dense code, and didn't offer the killer feature I was after:
the ability to easily update and maintain the publication timestamps it was
indexing.
Thus entries_timestamp is born.
entries_timestamp is based on Rael's entries_index, and like it offers
no caching facilities (at least currently). Its main point of difference
from entries_index is that it maintains two sets of creation
timestamps for each post - a machine-friendly one (a gmtime timestamp)
and a human-friendly one (a timestamp string).
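The two representations are easy to keep in step. The plugin itself is Perl; this is just a Python sketch of the idea, assuming a hypothetical "YYYY-MM-DD HH:MM:SS" string in UTC for the human form (the real entries_timestamp.index layout may well differ). The key point is that the human string must round-trip back to the same machine timestamp.

```python
import calendar
import time

# Hypothetical human-readable format, interpreted as UTC.
HUMAN_FMT = "%Y-%m-%d %H:%M:%S"

def to_human(epoch):
    """Machine timestamp (seconds since the epoch) -> UTC string."""
    return time.strftime(HUMAN_FMT, time.gmtime(epoch))

def to_machine(human):
    """UTC string -> machine timestamp; the inverse of to_human."""
    return calendar.timegm(time.strptime(human, HUMAN_FMT))

def index_entry(epoch):
    """One post's index record, holding both forms side by side."""
    return {"machine": int(epoch), "human": to_human(epoch)}
```

Using calendar.timegm rather than time.mktime keeps the round trip in UTC, so the server's local timezone never shifts a post.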
In normal use blosxom just uses the machine timestamps and works like
entries_index, using the timestamps only to order posts for presentation.
entries_timestamp also allows modification of the human timestamps,
however, so that if you want to tweak the publish date you just modify
the timestamp string in the entries_timestamp.index metadata file, and
then tell blosxom to update its machine timestamps from the human ones by
passing a reindex=<$entries_timestamp::reindex_password> argument to
blosxom, e.g.
http://www.domain.com/blosxom.cgi?reindex=mypassword
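The reindex step boils down to: check the password, re-parse every human timestamp, and overwrite any machine timestamp that no longer matches. Here is a Python sketch of that logic, not the plugin's actual Perl code. The in-memory index layout and the "YYYY-MM-DD HH:MM:SS" UTC format are hypothetical, and hmac.compare_digest is used so the password check doesn't leak timing information.

```python
import calendar
import hmac
import time

# Hypothetical human-readable format (UTC) for the index file.
HUMAN_FMT = "%Y-%m-%d %H:%M:%S"

def reindex(index, supplied, reindex_password):
    """Rebuild machine timestamps from (possibly hand-edited) human ones.

    index maps post paths to {"machine": int, "human": str}.
    Returns the number of entries changed, or None if the password
    check fails (an empty configured password disables reindexing).
    """
    if not reindex_password or not hmac.compare_digest(
            supplied.encode(), reindex_password.encode()):
        return None
    changed = 0
    for entry in index.values():
        epoch = calendar.timegm(time.strptime(entry["human"], HUMAN_FMT))
        if epoch != entry["machine"]:
            entry["machine"] = epoch
            changed += 1
    return changed
```

In a CGI, the supplied value would come from the query string, e.g. urllib.parse.parse_qs("reindex=mypassword")["reindex"][0].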
It also supports migration from an entries_index index file, handles
symlinks explicitly (so you don't have to update the timestamps of
symlinked posts separately), and has been mostly rewritten to be (hopefully)
easier to read and maintain.
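One way to get that symlink behaviour is to store timestamps only for real files, keyed by their resolved path, and resolve symlinks on lookup so a linked post automatically shares its target's timestamp. This Python sketch shows the idea, not the plugin's actual implementation; the function names are hypothetical, and it assumes new posts are stamped with their file modification time.

```python
import os

def build_index(root, ext=".txt"):
    """Map the real path of every post under root to its mtime.

    Symlinks are skipped here: only real files get timestamps.
    """
    index = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if name.endswith(ext) and not os.path.islink(path):
                index[os.path.realpath(path)] = int(os.path.getmtime(path))
    return index

def post_timestamp(path, index):
    """Look up a post's timestamp, resolving symlinks to their target."""
    return index.get(os.path.realpath(path))
```

Because lookups go through os.path.realpath, editing the target's timestamp is enough; every symlink to it follows along.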
It's available in the
blosxom sourceforge CVS
repository.
Mon 22 Oct 2007
The blosxom SourceForge developers
have been foolish enough to give me a commit bit, so I've been doing
some work lately on better separating code and configuration, primarily
with a view to making blosxom easier to package.
One of the consequences of these changes is that it's now reasonably
easy to run multiple blosxom instances on the same host from a single
blosxom.cgi executable.
A typical cgi apache blosxom.conf might look something like this:
SetEnv BLOSXOM_CONFIG_DIR /etc/blosxom
Alias /blog /usr/share/blosxom/cgi
<Directory /usr/share/blosxom/cgi>
DirectoryIndex blosxom.cgi
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /blog/blosxom.cgi/$1 [L,QSA]
<FilesMatch "\.cgi$">
Options +ExecCGI
</FilesMatch>
</Directory>
The only slightly tricky thing here is the use of mod_rewrite to allow
the blosxom.cgi part to be omitted, so we can use URLs like:
http://www.example.com/blog/foo/bar
instead of:
http://www.example.com/blog/blosxom.cgi/foo/bar
That's nice, but completely optional.
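If the rewrite rule looks opaque, it may help to see its logic spelled out. This is a rough Python model of what the RewriteCond/RewriteRule pair does for the /blog alias, not how mod_rewrite works internally (its per-directory prefix handling and its -f test against the real filesystem are more involved). The function name and the existing_files argument are hypothetical stand-ins.

```python
import posixpath

def rewrite(request_path, existing_files, prefix="/blog", script="blosxom.cgi"):
    """Model of the !-f RewriteCond plus RewriteRule for one aliased instance.

    request_path is the URL path, e.g. "/blog/foo/bar"; existing_files is
    the set of paths (relative to the cgi directory) that exist on disk.
    """
    if not (request_path == prefix or request_path.startswith(prefix + "/")):
        return request_path                 # not under this alias
    rel = request_path[len(prefix):].lstrip("/")
    if rel in existing_files:               # RewriteCond %{REQUEST_FILENAME} !-f
        return request_path                 # a real file: serve it unchanged
    return posixpath.join(prefix, script, rel)  # e.g. /blog/blosxom.cgi/foo/bar
```

The !-f condition matters: without it, /blog/blosxom.cgi would itself be rewritten again, which is a loop.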
The SetEnv BLOSXOM_CONFIG_DIR setting is the important bit for running
multiple instances - it allows you to specify where blosxom should
look for all its configuration settings. If we can set this multiple
times to different paths we get multiple blosxom instances quite
straightforwardly.
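On the script side, the mechanism is just an environment lookup: Apache's SetEnv places BLOSXOM_CONFIG_DIR into the CGI environment, and the script reads it on each request. blosxom does this in Perl; the Python sketch below only illustrates the pattern. The /etc/blosxom fallback and the helper names are hypothetical.

```python
import os

DEFAULT_CONFIG_DIR = "/etc/blosxom"  # hypothetical fallback

def config_dir(environ=os.environ):
    """Pick this request's configuration directory.

    Each <VirtualHost> or <Location> can SetEnv a different value,
    so one script serves several independently configured blogs.
    """
    return environ.get("BLOSXOM_CONFIG_DIR") or DEFAULT_CONFIG_DIR

def config_file(name, environ=os.environ):
    """Path to a named config file inside the chosen directory."""
    return os.path.join(config_dir(environ), name)
```

Passing environ explicitly also makes it easy to test different instances without touching the real process environment.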
With separate virtual hosts this is easy - just put the SetEnv
BLOSXOM_CONFIG_DIR inside your virtual host declaration and it gets
scoped properly and everything just works e.g.
<VirtualHost *:80>
ServerName bookmarks.example.com
DocumentRoot /usr/share/blosxom/cgi
AddHandler cgi-script .cgi
SetEnv BLOSXOM_CONFIG_DIR '/home/gavin/bloglets/bookmarks/config'
<Directory /usr/share/blosxom/cgi>
DirectoryIndex blosxom.cgi
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /blosxom.cgi/$1 [L,QSA]
<FilesMatch "\.cgi$">
Options +ExecCGI
</FilesMatch>
</Directory>
</VirtualHost>
It's not quite that easy if you want two instances on the same virtual host
e.g. /blog for your blog proper, and /bookmarks for your link blog. You
don't want the SetEnv to be global anymore, and you can't put it inside
the <Directory> section either since you can't repeat that with a single
directory.
One solution - the hack - would be to just make another copy of your
blosxom.cgi somewhere else, and use that to give you two separate
directory sections.
The better solution, though, is to use an additional <Location>
section for each of your instances. The only extra wrinkle with this is
if you're using those optional rewrite rules, in which case you have to
duplicate them and qualify each with a condition on the request URI,
since each rule's substitution hardcodes its own URL prefix, e.g.
Alias /blog /usr/share/blosxom/cgi
Alias /bookmarks /usr/share/blosxom/cgi
<Directory /usr/share/blosxom/cgi>
DirectoryIndex blosxom.cgi
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} ^/blog
RewriteRule ^(.*)$ /blog/blosxom.cgi/$1 [L,QSA]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} ^/bookmarks
RewriteRule ^(.*)$ /bookmarks/blosxom.cgi/$1 [L,QSA]
<FilesMatch "\.cgi$">
Options +ExecCGI
</FilesMatch>
</Directory>
<Location /blog>
SetEnv BLOSXOM_CONFIG_DIR /home/gavin/blog/config
</Location>
<Location /bookmarks>
SetEnv BLOSXOM_CONFIG_DIR /home/gavin/bloglets/bookmarks/config
</Location>
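Conceptually, the two <Location> blocks above form a lookup table from URL prefix to configuration directory, with both entries served by the same blosxom.cgi. This Python sketch models that table. It is an illustration only, not Apache's exact <Location> matching rules: it assumes matches happen on whole path segments (so /blogroll does not belong to /blog) and that the longest matching prefix wins.

```python
# Mirrors the two <Location> blocks in the config above.
INSTANCES = {
    "/blog": "/home/gavin/blog/config",
    "/bookmarks": "/home/gavin/bloglets/bookmarks/config",
}

def instance_for(path, instances=INSTANCES):
    """Return (prefix, config_dir) for the instance serving path, or None.

    Matches whole path segments; the longest matching prefix wins.
    """
    best = None
    for prefix, cfg in instances.items():
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, cfg)
    return best
```

Adding a third instance is then one more Alias, one more pair of rewrite lines, and one more entry in the table.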
Because one blosxom just ain't enough ...
Tue 16 Oct 2007
I've been meaning to document the set of firefox extensions I'm currently
using, partly to share with others, partly so they're easy to find and install
when I start using a new machine, and partly to track the way my usage changes
over time. Here's the current list:
Obligatory Extensions
Greasemonkey - the
fantastic firefox user script manager, allowing
client-side javascript scripts to totally transform any web page before it
gets to you. For me, this is firefox's "killer feature" (and see below for
the user scripts I recommend).
Flash Block - disable
flash and shockwave content from running automatically, adding placeholders
to allow running manually if desired (plus per-site whitelists, etc.)
AdBlock Plus - block
ad images via a right-click menu option
Chris Pederick's
Web Developer Toolbar - a
fantastic collection of tools for web developers
Joe Hewitt's Firebug -
the premier firefox web debugging tool - its html and css inspection
features are especially cool
Daniel Lindkvist's
Add Bookmark Here
extension, adding a menu item to bookmark toolbar dropdowns to add the
current page directly in the right location
Optional Extensions
Michael Kaply's Operator -
a very nice microformats toolbar, for discovering
the shiny new microformats embedded in web pages, and providing operations you
can perform on them
Zotero - a very
interesting extension to help capture and organise research information,
including webpages, notes, citations, and bibliographic information
Colorful Tabs - tabs +
eye candy - mmmmm!
Chris Pederick's
User Agent Switcher -
for braindead websites that only think they need IE
ForecastFox - nice
weather forecast widgets in your firefox status bar (and not just
US-centric)
Greasemonkey User Scripts
So what am I missing here?
Updates:
Since this post, I've added the following to my must-have list:
Tony Murray's Print Hint -
helps you find print stylesheets and/or printer-friendly versions of pages
the Style Sheet Chooser II
extension, which extends firefox's standard alternate stylesheet selection
functionality
Ron Beck's JSView
extension, allowing you to view external javascript and css styles used
by a page
The It's All Text
extension, allowing textareas to be edited using the external editor of
your choice.
The Live HTTP Headers
plugin - invaluable for times when you need to see exactly what is going on
between your browser and the server
Gareth Hunt's Modify Headers
plugin, for setting arbitrary HTTP headers for web development
Sebastian Tschan's Autofill Forms
extension - amazingly useful for autofilling forms quickly and efficiently
Thu 04 Oct 2007
Today I've been reminded that while the web revolution continues
apace - witness Web 2.0, ajax, mashups, RESTful web services, etc. -
much of the web hasn't yet made it to Web 1.0, let alone Web 2.0.
Take ecommerce.
One of this afternoon's tasks was this: order some graphics cards
for a batch of workstations. We had a pretty good idea of the kind
of cards we wanted - PCIe Nvidia 8600GT-based cards. The unusual
twist today was this: ideally we wanted ones that would only take
up a single PCIe slot, so we could still use them even if the
neighbouring slot was filled, i.e.
select * from graphics_cards
where chipset_vendor = 'nvidia'
and chipset = '8600GT'
order by width asc;
or something. Note that we don't even really care much about price.
We just need some retailer to expose the data on their cards in a
useful sortable fashion, and they would get our order.
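Just to show how little is being asked for, here is that kind of query run against an in-memory SQLite table. Everything in it is hypothetical: the schema, the card names, and the width column (measured in PCIe slots occupied) are made up for illustration, not data from any real retailer.

```python
import sqlite3

def single_slot_friendly(conn):
    """Return nvidia 8600GT cards, narrowest (fewest slots) first."""
    return conn.execute(
        """SELECT name, width FROM graphics_cards
           WHERE chipset_vendor = 'nvidia' AND chipset = '8600GT'
           ORDER BY width ASC, name""").fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE graphics_cards
                (name TEXT, chipset_vendor TEXT, chipset TEXT, width INTEGER)""")
# Made-up rows for illustration only; width = PCIe slots occupied.
conn.executemany("INSERT INTO graphics_cards VALUES (?, ?, ?, ?)", [
    ("card-a", "nvidia", "8600GT", 2),
    ("card-b", "nvidia", "8600GT", 1),
    ("card-c", "othervendor", "other-chipset", 1),
])
```

single_slot_friendly(conn) puts the single-slot card first and drops the non-matching chipset entirely. That's the whole feature.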
In practice, this is Mission Impossible.
Mostly, merchants will just allow me to drill down to their
graphics cards page and browse the gazillion cards they have
available. If I'm lucky, I'll be able to get a view that only
includes Nvidia PCIe cards. If I'm very lucky, I might even be
able to drill down to only 8000-series cards, or even 8600GTs.
Some merchants also allow ordering on certain columns, which
is actually pretty useful when you're buying on price. But none
seem to expose RAM or clockspeeds in list view, let alone card
dimensions.
And even when I manually drill down to the cards themselves,
very few have much useful information there. I did find two
sites that actually quoted the physical dimensions for some
cards, but in both cases the numbers they were quoting
seemed bogus.
Okay, so how about we try and figure it out from the
manufacturers' websites?
This turns out to be Mission Impossible II. The manufacturers'
websites are all controlled by their marketing departments and
largely consist of flash demos and brochureware. Just finding
a particular card is an impressive feat, even if you have the
merchant's approximation of its name. And when you do, they often
have less information than the retailers'. If there is any
significant data available for a card, it's usually in a pdf
datasheet or a manual, rather than available on a webpage.
Arrrghh!
So here are a few free suggestions for all and sundry, born
out of today's frustration.
For manufacturers:
use part numbers - all products need a unique identifier,
like books have an ISBN. That means I don't have to try and
guess whether your 'SoFast HyperFlapdoodle 8600GT' is the
same thing as the random mislabel the merchant put on it.
provide a standard url for getting to a product page given
your part number. I know, that's pretty revolutionary, but
maybe take a few tips from google instead of just listening
to your marketing department e.g.
http://www.supervidio.com.tw/?q=sofast-hf-8600gt-256
keep old product pages around, since people don't just buy
your latest and greatest, and products take a long time to
clear in some parts of the world
include some data on your product pages, rather than
just your brochureware. Put it way down the bottom of the
page so your marketing people don't complain as much. For
bonus points, mark it up with semantic microformat-type
classes to make parsing easier.
alternatively, provide dedicated product data pages, perhaps
in xml, optimised for machine use rather than marketing.
They don't even have to be visible via browse paths, just
available via search urls given product ids.
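Neither of the last few suggestions takes much code. This Python sketch builds a stable lookup URL from a part number and renders a bare-bones XML data page. The supervidio.com.tw domain and the sofast-hf-8600gt-256 part number are the made-up examples from above; the XML element and attribute names, and any spec values you pass in, are likewise hypothetical.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical manufacturer site from the example URL above.
BASE_URL = "http://www.supervidio.com.tw/"

def product_url(part_number):
    """Stable, guessable lookup URL for a given part number."""
    return BASE_URL + "?" + urlencode({"q": part_number})

def product_xml(part_number, specs):
    """Render a machine-friendly data page: one <spec> element per field."""
    product = ET.Element("product", {"part-number": part_number})
    for name, value in specs.items():
        ET.SubElement(product, "spec", {"name": name}).text = str(value)
    return ET.tostring(product, encoding="unicode")
```

For instance, product_url("sofast-hf-8600gt-256") yields the search URL shown above, and the XML page can live at a URL like it without ever appearing in the marketing site's navigation.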
For merchants:
include manufacturer's part numbers, even if you want to
use your own as the primary key. It's good to let your
customers get additional information from the manufacturer,
of course.
provide links at least to the manufacturer's home page, and
ideally to individual product pages
invest in your web interface, particularly in terms of
filtering results. If you have 5 items that are going to
meet my requirements, I want to be able to filter down to
exactly and only those five, instead of having to hunt for
them among 50. Price is usually an important determiner of
shopping decisions, of course, but if I have two merchants
with similar pricing, one of whom lets me find exactly the
target set I was interested in, guess who I'm going to buy
from?
do provide as much data as possible as conveniently as
possible for shopping aggregators, particularly product
information and stock levels. People will build useful
interfaces on top of your data if you let them, and will
send traffic your way for free. Pricing is important, but
it's only one piece of the equation.
simple and useful beats pretty and painful - in particular,
don't use frames, since they break lots of standard web
magic like bookmarking and back buttons; don't do things
like magic javascript links that don't work in standard
browser fashion; and don't open content in new windows for
me - I can do that myself
actively solicit feedback from your customers - very few
people will give you feedback unless you make it very clear
you welcome and appreciate it, and when you get it, take it
seriously
End of rant.
So tell me, are there any clueful manufacturers and merchants
out there? I don't like just hurling brickbats ...