PURLs are only as good as the maintenance work that has gone into updating the underlying URLs when they inevitably change. And in the lucky cases where the underlying URLs haven't changed, they are only as good as the work that has gone into managing the infrastructure behind that URL namespace so that the URLs could stay the same.

So how many of the PURLs still work? Answering that properly is complex enough for an actual research project rather than a quick blog post. Over in the notebook I started by sampling all the target URLs (N=405637, n=662). In the process I noticed that this oversampled some domains quite a bit, like my.yoolib.net. So I tried again, but instead of sampling all the URLs I sampled the PURL namespaces (N=21894, n=644) and picked a random URL from each namespace. This seemed to work better but still oversampled some hosts, with http://www.olemiss.edu showing up quite a bit. It looks like they might create a new PURL namespace for every finding aid they put up.
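For anyone wanting to try the namespace-first approach, here is a minimal sketch of it, plus a hostname tally for spotting the kind of oversampling described above. It is not the code from my notebook: it assumes the PURL data has already been loaded as hypothetical `(namespace, target_url)` pairs, and the function names are made up for illustration.

```python
import random
from collections import Counter, defaultdict
from urllib.parse import urlparse


def sample_by_namespace(purls, n, seed=None):
    """Pick n namespaces at random, then one random target URL from each.

    purls is assumed to be an iterable of (namespace, target_url) pairs.
    If there are fewer than n namespaces, every namespace is used once.
    """
    rng = random.Random(seed)
    by_ns = defaultdict(list)
    for ns, url in purls:
        by_ns[ns].append(url)
    # Sort so that a fixed seed gives a reproducible sample.
    namespaces = sorted(by_ns)
    chosen = rng.sample(namespaces, min(n, len(namespaces)))
    return [rng.choice(by_ns[ns]) for ns in chosen]


def hostname_counts(urls):
    """Count how often each hostname appears, to spot oversampled hosts."""
    return Counter(urlparse(u).hostname for u in urls)
```

Sampling namespaces first keeps one huge namespace from dominating the sample, but as the olemiss.edu case shows, it can't help when a single host spreads its URLs across many small namespaces. Running `hostname_counts(sample).most_common(10)` is a quick way to check for that.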
Of course, testing whether a URL still works is a surprisingly tricky business: the response could be a 200 OK whose page says Not Found (a "soft 404"), or it could be a totally different page (content drift).
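To make those failure modes concrete, here is a rough sketch of a checker that looks at more than just the status code. The phrase list and the title comparison are my own assumptions, crude heuristics rather than a reliable soft-404 or drift detector. Real pages would need a much better test, such as comparing against an archived snapshot.

```python
import re
import urllib.request
from urllib.error import HTTPError, URLError

# Hypothetical phrases that often show up on "soft 404" pages; tune on real data.
NOT_FOUND_PATTERNS = [r"not found", r"page (?:does not|doesn't) exist"]


def fetch(url, timeout=10):
    """Return (final_status, body_text); status is None if no response came back."""
    try:
        # urlopen follows redirects, so this is the status of the final page.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:  # 4xx/5xx responses still carry a body
        return err.code, err.read().decode("utf-8", errors="replace")
    except URLError:  # DNS failure, refused connection, etc.
        return None, ""


def classify_response(status, body, expected_title=None):
    """Label a response as broken, soft-404, possible-drift, or ok."""
    if status is None or status >= 400:
        return "broken"
    text = body.lower()
    if any(re.search(p, text) for p in NOT_FOUND_PATTERNS):
        return "soft-404"
    # Very rough drift check: the page no longer mentions what we expected.
    if expected_title and expected_title.lower() not in text:
        return "possible-drift"
    return "ok"
```

Used together, `classify_response(*fetch(url))` gives a first-pass label for each sampled URL. Anything marked `soft-404` or `possible-drift` would still need a human look.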
