why punctuation matters.

James Weinheimer somewhat recently (on my timescale, anyway) defended the terrible, cryptic abbreviations of cataloging.

I particularly liked this bit:

I think it is absolutely vital for librarians and catalogers to stop thinking that the text that is entered into a database or a web page is static and cannot be transformed. That is card thinking. Today there are incredible things that can be done using all kinds of tools from scripting to style-sheets to browser add-ons and who knows what else? Look at Google Translate, and think about how a much simplified tool could reformat abbreviations. And not just for English speakers, but properly done, such a tool could work for all languages who could look and work with exactly the same records.

Anything in a webpage can be transformed if you want it to be transformed and there are lots of possible ways of doing it. It can be done on the server, or it can be done on each client’s computer. This is a basic change in how people can work with our records (and how I hope they want to work with our records, if we’re lucky) that has yet to be thoroughly understood and addressed.

Yes. And now, in the same spirit, I will defend our cryptic, senseless attachment to punctuation. My claim is that punctuation in a catalog record is still important, and that we should continue to pay attention to it and do it correctly.

Ideally, we would not need to type in the punctuation. Ideally we’d have a more modern standard for encoding cataloging data. For now, in real life, we are stuck with crap ILSs and MARC for a while longer. And so the way we can manipulate our data is perhaps a bit less elegant than it could be. But encoding our data correctly and consistently, using the standards we do have (MARC and ISBD punctuation), would greatly simplify things.

That is kind of the point of standards, after all.

When I started in my current position, the 300s were being deleted from some batches of vendor records in the local editing process. Keeping the 300 data in was one of the first decisions I made here.

Ideally, I would have been able to change my assistant’s batch-editing instruction sheets from “Delete 300” to “Do this regex find/replace on the 300.” That regex would insert “1 online resource (” at the beginning of $a. Then it would insert “)” right before the first occurrence of “ :” or “ ;”.

It would be utterly trivial to update this print-centric field to the current recommended format.
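For the curious, here is roughly what that two-step find/replace could look like in Python. This is only a sketch: the function name is made up, it works on a 300 field as a line of mnemonic text rather than on a real MARC record, and the sample field is my own tidied-up example, not one from an actual batch.

```python
import re

def to_online_resource(field_300: str) -> str:
    """Hypothetical helper: wrap the extent in $a as '1 online resource (...)'."""
    # Step 1: insert the new wording at the start of the first $a.
    updated = field_300.replace("$a", "$a1 online resource (", 1)
    # Step 2: close the parenthesis right before the first " :" or " ;".
    return re.sub(r" ([:;])", r") \1", updated, count=1)

# A correctly punctuated field (assumed sample data).
print(to_online_resource(r"=300 \$axi, 372 p. :$bill. ;$c23 cm"))
```

Note that step 2 depends entirely on the “ :” or “ ;” being there. Feed it a field without either and the parenthesis never gets closed.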

But no. This won’t work because of the sloppiness of the encoding (MARC and ISBD punctuation).

In real life, to make this simple change, I first have to identify all the ways the 300 encoding is screwed up in the file. I find treasures like:

=300 \$axviii, 405 p.$bill.
=300 \$axi, 372 p. :$bill.23cm.
=300 \$a279p ; 17 maps.
=300 \$a52 leaves :$c15 cm

Every time I think I have collected all possible weird permutations of punctuation and coding errors, and need to sit down to construct a frightening regular expression find/replace, I happen across something new.

Then I go through multiple steps to standardize the 300 fields. Depending on the size and crappiness of the batch of records, this can take anywhere from 15 minutes to several hours.

Then I can run the find/replace that I should be able to just run. Which takes at most a few seconds.

Following one simple rule (making sure any “ill.” statement sits in a $b, and that there is a “ :” before the $b) would make this transformation trivial. But punctuation is seen as unimportant and fiddly, anal-retentive cataloger stuff, and it’s just too hard to pay attention, or to build a cataloging system that will reject a record with a 300 field containing “ill.” or “maps” or “ports.” but lacking a “ :$b”.
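To show how little such a check would ask of a system, here is a naive sketch in Python. The function name and the list of terms are my own assumptions, and it does plain substring matching on a line of mnemonic text, so treat it as an illustration of the idea and not as a real validator.

```python
# Terms that signal an illustration statement (assumed list, taken from the examples above).
ILLUSTRATION_TERMS = ("ill.", "maps", "ports.")

def passes_300_check(field_300: str) -> bool:
    """Hypothetical check: illustration terms are only allowed alongside ' :$b'."""
    mentions_illustrations = any(term in field_300 for term in ILLUSTRATION_TERMS)
    # Accept if there is no illustration statement, or if ' :$b' is present.
    return (not mentions_illustrations) or (" :$b" in field_300)

for field in (
    r"=300 \$axviii, 405 p.$bill.",
    r"=300 \$axi, 372 p. :$bill.23cm.",
    r"=300 \$a279p ; 17 maps.",
    r"=300 \$a52 leaves :$c15 cm",
):
    print("ok    " if passes_300_check(field) else "REJECT", field)
```

This one rule catches only the first and third of my treasures. The other two slip through because their problems are elsewhere (a missing “ ;$c”, a “ :” where a “ ;” belongs), which is rather the point: each rule is simple, but somebody has to enforce them.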

Now this is a pathetically simple example, and someone with more regex-fu than me might have a magic find/replace combo that can parse any weird 300 thrown at it. (If so, and if you are reading this, please share.)

But you shouldn’t have to be a regex wizard to perform a simple data transformation.

And I will avoid sliding into a whole other tangent, but it seems clear that the movement is toward caring less and less about data quality, dumping whatever crap we can buy from Serials Solutions or other vendors into our catalogs. We have to give up control. We have to recognize the new vision of the catalog in the cloud where we can’t have hissy fits over missing colons. As long as there is some version of the title, it’s good enough.

I wonder what this new laid-back catalog in the cloud will look like and how easy it will be to manipulate its records, mash up the data, or do anything cool or useful with the information there. Probably even more frightening than WorldCat, which is saying something.

I (want to) believe that catalogers are not nitpicky because we are all anal-retentive obsessive compulsives who have nothing better to do than gripe about punctuation and pick fleas off of our cats. I (want to) believe that we are so nitpicky because we know there is a reason for being so—that attention to detail the first time around makes everyone’s job easier and quicker in the end.
