MySQL blocked hosts

Here's an interesting one: one of my clients has been seeing mysql db connections from one of their app servers (and only one) being periodically locked out, with the following error message reported when attempting to connect:

Host _hostname_ is blocked because of many connection errors.
Unblock with 'mysqladmin flush-hosts'.

There's no indication in any of the database logs of anything untoward, or any connection errors at all, in fact. As a workaround, we've bumped up the max_connect_errors setting on the mysql instance, and haven't really had time to dig much further.

Till tonight, when I decided to figure out what was going on.

Turns out there are plenty of other people seeing this too, although MySQL seems to be in "it's not a bug, it's a feature" mode - see this bug report.

That thread helped clue me in, however. Turns out that mysql counts any TCP connection to the database port that doesn't go on to log in as a connection error, but it only logs the ones that actually attempt a login. So there's a nice class of silent errors - and in fact, a nice DoS attack against MySQL - if you make standard TCP connections to mysql without logging in.

We, being clever and careful, were doing exactly that with nagios - making a simple TCP connection to port 3306 - in order to simply and cheaply check that mysql was listening on that port. Hmmmm.
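
For illustration only, here's roughly what that kind of check boils down to, as a small Python sketch. This is not the nagios check_tcp code, and bare_tcp_check is a name I've made up; the point is just that it connects and hangs up without ever speaking the MySQL protocol, which is the pattern that gets silently counted against the connecting host.

```python
import socket


def bare_tcp_check(host: str, port: int = 3306, timeout: float = 5.0) -> bool:
    """Return True if something accepts a TCP connection on host:port.

    Hypothetical helper standing in for a check_tcp-style probe. It never
    attempts a database login; it just connects and hangs up.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out etc. all mean "not listening".
        return False
```

Run often enough from a monitoring host, a probe like this would presumably be enough to trip max_connect_errors without leaving a single log entry behind.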

Easy enough to remedy, of course, once you figure out what's going on. I even had a nice nagios plugin lying around to let me do more sophisticated database checks - check_db_query_rowcount - so just had to replace the simple check_tcp check with that, and all is right with the world.

But it's a plain and simple bug, and MySQL need to get it fixed. Personally I think a simple tcp connection should not count as a connection error at all without a login attempt (assuming it's not left half-open etc.). Alternatively, if you do want to count that as a connection error, fine, but at least log some kind of error so the issue is discoverable and can be handled by someone.

Silent errors are deadly.

'entries_timestamp' blosxom plugin

I've tried all three of the current blosxom 'entries' plugins on my blog in the last few months: entries_cache_meta, entries_cache, and the original entries_index.

entries_cache_meta is pretty nice, but it doesn't work in static mode, and its method of capturing the modification date as metadata didn't quite work how I wanted. I had similar problems with the entries_cache metadata features, and its caching and reindexing didn't seem to work reliably for me. entries_index is the simplest of the three, and offers no caching features, but it's pretty dense code, and didn't offer the killer feature I was after: the ability to easily update and maintain the publication timestamps it was indexing.

Thus entries_timestamp is born.

entries_timestamp is based on Rael's entries_index, and like it offers no caching facilities (at least currently). Its main point of difference from entries_index is that it maintains two sets of creation timestamps for each post - a machine-friendly one (a gmtime timestamp) and a human-friendly one (a timestamp string).

In normal use blosxom just uses the machine timestamps and works just like entries_index, using them to order posts for presentation. entries_timestamp also allows modification of the human timestamps, however, so if you want to tweak the publish date you just modify the timestamp string in the entries_timestamp.index metadata file, and then tell blosxom to update its machine timestamps from the human ones by passing a reindex=<$entries_timestamp::reindex_password> argument to blosxom, i.e.

http://www.domain.com/blosxom.cgi?reindex=mypassword
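
To make the two-timestamp idea concrete, here's a minimal Python sketch of what a reindex amounts to. It is not the plugin's code (the plugin is perl), and the string format, the function names and the in-memory index layout are all assumptions of mine, not the real entries_timestamp.index format.

```python
import calendar
import time

# Assumed format for the human-friendly timestamp string; the real
# plugin's format may well differ.
HUMAN_FORMAT = "%Y-%m-%d %H:%M:%S"


def human_to_machine(human: str) -> int:
    """Parse a human-friendly UTC timestamp string into epoch seconds."""
    return calendar.timegm(time.strptime(human, HUMAN_FORMAT))


def machine_to_human(machine: int) -> str:
    """Format epoch seconds as a human-friendly UTC timestamp string."""
    return time.strftime(HUMAN_FORMAT, time.gmtime(machine))


def reindex(index: dict) -> dict:
    """Rebuild every machine timestamp from its human-friendly counterpart.

    `index` maps a post path to a (machine, human) pair. The human value
    wins, which is the behaviour a reindex request is described as having.
    """
    return {
        path: (human_to_machine(human), human)
        for path, (_machine, human) in index.items()
    }
```

So editing the string for a post and then reindexing is all it takes to move that post in the ordering.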

It also supports migration from an entries_index index file, explicit symlink support (so you don't have to update timestamps to symlinked posts explicitly), and has been mostly rewritten to be (hopefully) easier to read and maintain.

It's available in the blosxom sourceforge CVS repository.

Multiple Blosxom Instances

The blosxom SourceForge developers have been foolish enough to give me a commit bit, so I've been doing some work lately on better separating code and configuration, primarily with a view to making blosxom easier to package.

One of the consequences of these changes is that it's now reasonably easy to run multiple blosxom instances on the same host from a single blosxom.cgi executable.

A typical cgi apache blosxom.conf might look something like this:

SetEnv BLOSXOM_CONFIG_DIR /etc/blosxom
Alias /blog /usr/share/blosxom/cgi
<Directory /usr/share/blosxom/cgi>
  DirectoryIndex blosxom.cgi
  RewriteEngine on
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteRule ^(.*)$ /blog/blosxom.cgi/$1 [L,QSA]
  <FilesMatch "\.cgi$">
    Options +ExecCGI
  </FilesMatch>
</Directory>

The only slightly tricky thing here is the use of mod_rewrite to allow the blosxom.cgi part to be omitted, so we can use URLs like:

http://www.example.com/blog/foo/bar

instead of:

http://www.example.com/blog/blosxom.cgi/foo/bar

That's nice, but completely optional.

The SetEnv BLOSXOM_CONFIG_DIR setting is the important bit for running multiple instances - it allows you to specify a location blosxom should look for all its configuration settings. If we can set this multiple times to different paths we get multiple blosxom instances quite straightforwardly.
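
On the CGI side the idea is nothing more exotic than reading an environment variable. A rough Python sketch of the lookup follows; blosxom itself is perl and this is not its code, and both the config_dir name and the fallback path are assumptions for illustration.

```python
import os
from typing import Mapping, Optional

# Hypothetical fallback location, used only when the variable is unset.
DEFAULT_CONFIG_DIR = "/etc/blosxom"


def config_dir(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the configuration directory for the current request.

    Apache's SetEnv puts BLOSXOM_CONFIG_DIR into the CGI environment, so
    whichever value is in scope for the request is the one seen here.
    """
    env = os.environ if environ is None else environ
    return env.get("BLOSXOM_CONFIG_DIR") or DEFAULT_CONFIG_DIR
```

Since the value is read per request, one executable can serve any number of instances.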

With separate virtual hosts this is easy - just put the SetEnv BLOSXOM_CONFIG_DIR inside your virtual host declaration and it gets scoped properly and everything just works e.g.

<VirtualHost *:80>
ServerName bookmarks.example.com
DocumentRoot /usr/share/blosxom/cgi
AddHandler cgi-script .cgi
SetEnv BLOSXOM_CONFIG_DIR '/home/gavin/bloglets/bookmarks/config'
<Directory /usr/share/blosxom/cgi>
  DirectoryIndex blosxom.cgi
  RewriteEngine on
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteRule ^(.*)$ /blosxom.cgi/$1 [L,QSA]
  <FilesMatch "\.cgi$">
    Options +ExecCGI
  </FilesMatch>
</Directory>
</VirtualHost>

It's not quite that easy if you want two instances on the same virtual host e.g. /blog for your blog proper, and /bookmarks for your link blog. You don't want the SetEnv to be global anymore, and you can't put it inside the <Directory> section either since you can't repeat that with a single directory.

One solution - the hack - would be to just make another copy of your blosxom.cgi somewhere else, and use that to give you two separate directory sections.

The better solution, though, is to use an additional <Location> section for each of your instances. The only extra wrinkle with this is if you're using those optional rewrite rules, in which case you have to duplicate and further qualify them as well, since the rewrite rule itself is namespaced i.e.

Alias /blog /usr/share/blosxom/cgi
Alias /bookmarks /usr/share/blosxom/cgi
<Directory /usr/share/blosxom/cgi>
  DirectoryIndex blosxom.cgi
  RewriteEngine on
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_URI} ^/blog
  RewriteRule ^(.*)$ /blog/blosxom.cgi/$1 [L,QSA]
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_URI} ^/bookmarks
  RewriteRule ^(.*)$ /bookmarks/blosxom.cgi/$1 [L,QSA]
  <FilesMatch "\.cgi$">
    Options +ExecCGI
  </FilesMatch>
</Directory>
<Location /blog>
  SetEnv BLOSXOM_CONFIG_DIR /home/gavin/blog/config
</Location>
<Location /bookmarks>
  SetEnv BLOSXOM_CONFIG_DIR /home/gavin/bloglets/bookmarks/config
</Location>
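
If it helps to see the scoping spelled out, here's a simplified Python model of how the two <Location> sections above pick a config directory from the request path. It's a sketch of the matching behaviour as I understand it, not anything apache actually runs, and config_dir_for is a made-up name.

```python
from typing import Optional

# The two <Location> sections from the configuration above.
LOCATIONS = {
    "/blog": "/home/gavin/blog/config",
    "/bookmarks": "/home/gavin/bloglets/bookmarks/config",
}


def config_dir_for(url_path: str) -> Optional[str]:
    """Return the config dir whose <Location> prefix covers url_path."""
    for prefix, directory in LOCATIONS.items():
        # A prefix covers the path itself and anything below it,
        # so /blog and /blog/foo match but /blogroll does not.
        if url_path == prefix or url_path.startswith(prefix + "/"):
            return directory
    return None
```

Both prefixes end up at the same blosxom.cgi on disk; only the environment differs.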

Because one blosxom just ain't enough ...

Top Firefox Extensions

I've been meaning to document the set of firefox extensions I'm currently using, partly to share with others, partly so they're easy to find and install when I start using a new machine, and partly to track the way my usage changes over time. Here's the current list:

Obligatory Extensions

  • Greasemonkey - the fantastic firefox user script manager, allowing client-side javascript scripts to totally transform any web page before it gets to you. For me, this is firefox's "killer feature" (and see below for the user scripts I recommend).

  • Flash Block - disable flash and shockwave content from running automatically, adding placeholders to allow running manually if desired (plus per-site whitelists, etc.)

  • AdBlock Plus - block ad images via a right-click menu option

  • Chris Pederick's Web Developer Toolbar - a fantastic collection of tools for web developers

  • Joe Hewitt's Firebug - the premier firefox web debugging tool - its html and css inspection features are especially cool

  • Daniel Lindkvist's Add Bookmark Here extension, adding a menu item to bookmark toolbar dropdowns to add the current page directly in the right location

Optional Extensions

  • Michael Kaply's Operator - a very nice microformats toolbar, for discovering the shiny new microformats embedded in web pages, and providing operations you can perform on them

  • Zotero - a very interesting extension to help capture and organise research information, including webpages, notes, citations, and bibliographic information

  • Colorful Tabs - tabs + eye candy - mmmmm!

  • Chris Pederick's User Agent Switcher - for braindead websites that only think they need IE

  • ForecastFox - nice weather forecast widgets in your firefox status bar (and not just US-centric)

Greasemonkey User Scripts

So what am I missing here?

Updates:

Since this post, I've added the following to my must-have list:

  • Tony Murray's Print Hint - helps you find print stylesheets and/or printer-friendly versions of pages

  • the Style Sheet Chooser II extension, which extends firefox's standard alternate stylesheet selection functionality

  • Ron Beck's JSView extension, allowing you to view external javascript and css styles used by a page

  • The It's All Text extension, allowing textareas to be edited using the external editor of your choice.

  • The Live HTTP Headers plugin - invaluable for times when you need to see exactly what is going on between your browser and the server

  • Gareth Hunt's Modify Headers plugin, for setting arbitrary HTTP headers for web development

  • Sebastian Tschan's Autofill Forms extension - amazingly useful for autofilling forms quickly and efficiently

Rant: How To Not Sell Stuff

Today I've been reminded that while the web revolution continues apace - witness Web 2.0, ajax, mashups, RESTful web services, etc. - much of the web hasn't yet made it to Web 1.0, let alone Web 2.0.

Take ecommerce.

One of this afternoon's tasks was this: order some graphics cards for a batch of workstations. We had a pretty good idea of the kind of cards we wanted - PCIe Nvidia 8600GT-based cards. The unusual twist today was this: ideally we wanted ones that would only take up a single PCIe slot, so we could use them okay even if the neighbouring slot was filled i.e.

select * from graphics_cards
where chipset_vendor = 'nvidia'
and chipset = '8600GT'
order by width asc;

or something. Note that we don't even really care much about price. We just need some retailer to expose the data on their cards in a useful sortable fashion, and they would get our order.
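
Just to show how little is being asked for, here's that query running against an in-memory sqlite database in Python. The table layout, the meaning of width (slots occupied) and the rows themselves are all made up for the example; no real merchant's data looks like this, which is rather the point. I've sorted narrowest first, since single-slot cards are what we're after.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table graphics_cards (
        model          text,
        chipset_vendor text,
        chipset        text,
        width          integer  -- assumed: number of slots the card occupies
    )
    """
)
# Made-up rows, purely so the query has something to return.
conn.executemany(
    "insert into graphics_cards values (?, ?, ?, ?)",
    [
        ("Card A", "nvidia", "8600GT", 2),
        ("Card B", "nvidia", "8600GT", 1),
        ("Card C", "nvidia", "8500GT", 1),
        ("Card D", "ati", "HD2600", 1),
    ],
)

# Filter to the chipset we want and list the narrowest cards first.
rows = conn.execute(
    """
    select model, width from graphics_cards
    where chipset_vendor = 'nvidia'
      and chipset = '8600GT'
    order by width asc
    """
).fetchall()

for model, width in rows:
    print(model, width)
```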

In practice, this is Mission Impossible.

Mostly, merchants will just allow me to drill down to their graphics cards page and browse the gazillion cards they have available. If I'm lucky, I'll be able to get a view that only includes Nvidia PCIe cards. If I'm very lucky, I might even be able to drill down to only 8000-series cards, or even 8600GTs.

Some merchants also allow ordering on certain columns, which is actually pretty useful when you're buying on price. But none seem to expose RAM or clockspeeds in list view, let alone card dimensions.

And even when I manually drill down to the cards themselves, very few have much useful information there. I did find two sites that actually quoted the physical dimensions for some cards, but in both cases the numbers they were quoting seemed bogus.

Okay, so how about we try and figure it out from the manufacturers' websites?

This turns out to be Mission Impossible II. The manufacturers' websites are all controlled by their marketing departments and largely consist of flash demos and brochureware. Even finding a particular card is an impressive feat, even if you have the merchant's approximation of its name. And when you do, they often have less information than the retailers. If there is any significant data available for a card, it's usually in a pdf datasheet or a manual, rather than available on a webpage.

Arrrghh!

So here are a few free suggestions for all and sundry, born out of today's frustration.

For manufacturers:

  • use part numbers - all products need a unique identifier, like books have an ISBN. That means I don't have to try and guess whether your 'SoFast HyperFlapdoodle 8600GT' is the same thing as the random mislabel the merchant put on it.

  • provide a standard url for getting to a product page given your part number. I know, that's pretty revolutionary, but maybe take a few tips from google instead of just listening to your marketing department e.g. http://www.supervidio.com.tw/?q=sofast-hf-8600gt-256

  • keep old product pages around, since people don't just buy your latest and greatest, and products take a long time to clear in some parts of the world

  • include some data on your product pages, rather than just your brochureware. Put it way down the bottom of the page so your marketing people don't complain as much. For bonus points, mark it up with semantic microformat-type classes to make parsing easier.

  • alternatively, provide dedicated data product pages, perhaps in xml, optimised for machine use rather than marketing. They don't even have to be visible via browse paths, just available via search urls given product ids.
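
As a rough sketch of how cheap the last couple of suggestions are, here's some Python that builds a search url from a part number and emits a bare-bones xml data page. The function names, element names and spec fields are all invented for illustration; there's no standard behind them.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def product_url(base: str, part_number: str) -> str:
    """Build a predictable product url from a manufacturer part number."""
    return f"{base}?{urlencode({'q': part_number})}"


def product_xml(part_number: str, specs: dict) -> str:
    """Serialise a product's specs as a small machine-readable xml page.

    The <product>/<spec> layout is a made-up example, not a real schema.
    """
    root = ET.Element("product", {"part-number": part_number})
    for name, value in specs.items():
        ET.SubElement(root, "spec", {"name": name}).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

For example, product_url("http://www.supervidio.com.tw/", "sofast-hf-8600gt-256") gives back the url from the suggestion above, and product_xml("sofast-hf-8600gt-256", {"slots": 1}) gives something an aggregator could parse without scraping brochureware.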

For merchants:

  • include manufacturer's part numbers, even if you want to use your own as the primary key. It's good to let your customers get additional information from the manufacturer, of course.

  • provide links at least to the manufacturer's home page, and ideally to individual product pages

  • invest in your web interface, particularly in terms of filtering results. If you have 5 items that are going to meet my requirements, I want to be able to filter down to exactly and only those five, instead of having to hunt for them among 50. Price is usually an important determiner of shopping decisions, of course, but if I have two merchants with similar pricing, one of whom let me find exactly the target set I was interested in, guess who I'm going to buy from?

  • do provide as much data as possible as conveniently as possible for shopping aggregators, particularly product information and stock levels. People will build useful interfaces on top of your data if you let them, and will send traffic your way for free. Pricing is important, but it's only one piece of the equation.

  • simple and useful beats pretty and painful - in particular, don't use frames, since they break lots of standard web magic like bookmarking and back buttons; don't do things like magic javascript links that don't work in standard browser fashion; and don't open content in new windows for me - I can do that myself

  • actively solicit feedback from your customers - very few people will give you feedback unless you make it very clear you welcome and appreciate it, and when you get it, take it seriously

End of rant.

So tell me, are there any clueful manufacturers and merchants out there? I don't like just hurling brickbats ...