URL Checker Report

This report searches the catalog database and retrieves all records with 856 fields. Then each selected 856 |u is checked against the Internet. It sometimes takes a long time just to open one URL, so if many URLs have problems, the report may take a very long time to run.

This report is password protected. If you do not know the password, see your supervisor.
The URL Checker (Urlchecker) report finds and reports on URL links in your catalog. Optionally, the report may be used to update catalog URL links when “referral” URLs are found. The URL Checker report is used to monitor and correct the 856 |u data to ensure that the hypertext link always displays a valid web page.
The URL Checker report contains the following tabs.
• Basic Tab: Reports
• URL Checker Selection Tab
• Title Selection Tab
• URL Checker Options Tab
The first time the report is run, particularly on systems with many URLs to check, the Date Range gadget should be used to limit the URLs that are selected to a particular date of creation or date cataloged. URLs can be corrected in batches by running the report for a year or month at a time, depending on the number of URLs you have to check.
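When planning month-by-month batches, it can help to list the date ranges ahead of time. The following Python sketch only computes the first and last day of each month in a year. The function name month_ranges is hypothetical, and the dates would still be entered by hand in the Date Range gadget.

```python
import calendar
from datetime import date


def month_ranges(year):
    """Return a (first_day, last_day) pair for each month of the given year."""
    ranges = []
    for month in range(1, 13):
        # monthrange returns (weekday of the 1st, number of days in the month)
        last_day = calendar.monthrange(year, month)[1]
        ranges.append((date(year, month, 1), date(year, month, last_day)))
    return ranges
```

Each pair can be used as the start and end date for one run of the report.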
Electronic Location and Access (856) tags contain a subfield u that contains a URL, and potentially several other subfields that define access to data outside the catalog. When a |u is present, the Electronic Access field in the catalog displays the URL. If a |z is also present, the text of that subfield, such as “Click here to go to the web site,” displays in the catalog instead of the URL. When the hypertext in the Electronic Access field of the catalog record is clicked, a web browser is launched and the web page displays. The URL Checker creates a list of all valid MARC 856 Electronic Location and Access tags and then determines whether the web page associated with each 856 displays successfully. You can choose whether valid URLs are included in the report.
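The relationship between |u and |z can be illustrated with a short Python sketch. It assumes the 856 field is available as pipe-delimited text (for example, |uhttp://example.org/guide|zClick here), which is how subfields are written in this topic. The function names are hypothetical, and the sketch does not handle a URL that itself contains a pipe character.

```python
def parse_subfields(field_text):
    """Split pipe-delimited field text into (subfield code, value) pairs."""
    pairs = []
    # Anything before the first '|' is not part of a subfield, so skip it.
    for chunk in field_text.split("|")[1:]:
        if chunk:
            pairs.append((chunk[0], chunk[1:].strip()))
    return pairs


def electronic_access_display(field_text):
    """Return the link target and the text shown for it, or None without a |u."""
    pairs = parse_subfields(field_text)
    urls = [value for code, value in pairs if code == "u"]
    notes = [value for code, value in pairs if code == "z"]
    if not urls:
        return None
    # The |z text, when present, is displayed in place of the URL itself.
    return {"url": urls[0], "text": notes[0] if notes else urls[0]}
```

For a field with both subfields, the returned text is the |z wording; for a field with only |u, the text is the URL.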
The catalog only uses 856 subfield u and subfield z. For this reason the report only validates URLs that contain the http:// prefix. The other standard prefixes such as ftp://, telnet://, or https:// require the use of a logon name and password that are stored in other 856 subfields. The catalog currently does not use these subfields, so a check against the web would never succeed for URLs containing other prefixes.
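The prefix rule can be pictured as a simple filter. The Python sketch below separates URLs that begin with http:// from all others. The function name is hypothetical, and treating the prefix as case-insensitive is an assumption, since this topic does not say how the report compares case.

```python
def split_by_prefix(urls):
    """Separate URLs that start with http:// from those that do not."""
    checked, skipped = [], []
    for url in urls:
        # Assumption: the prefix comparison ignores case and leading spaces.
        if url.strip().lower().startswith("http://"):
            checked.append(url)
        else:
            skipped.append(url)
    return checked, skipped
```

Only the first list would be tested against the web; the second list corresponds to entries the report can flag as lacking the required prefix.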
On the URL Checker Selection tab, you can specify a list of particular URLs that should be excluded by using the URLs to Exclude field. Using the String List gadget, you can enter an asterisk after a domain name to exclude all of the URLs associated with that domain name. The report assumes the http:// prefix, so you may include or omit the prefix when creating the list of URLs to exclude. The report does not assume the leading “www” text in the web site name. On this tab, you can also decide to include bibliographic and/or holding records in the report.
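The exclusion rules described above can be sketched in Python as follows. This is an illustration of the documented behavior, not the report's actual matching logic: the function names are hypothetical, and a trailing asterisk is treated as a plain "starts with" match, which is a simplification.

```python
def normalize(entry):
    """Drop an optional http:// prefix; a leading 'www.' is left untouched."""
    entry = entry.strip()
    if entry.lower().startswith("http://"):
        entry = entry[len("http://"):]
    return entry


def is_excluded(url, exclude_list):
    """Return True if the URL matches any entry in the exclusion list."""
    target = normalize(url)
    for entry in exclude_list:
        pattern = normalize(entry)
        if pattern.endswith("*"):
            # Trailing asterisk: exclude every URL that begins with the text.
            if target.startswith(pattern[:-1]):
                return True
        elif target == pattern:
            return True
    return False
```

With this sketch, the entry example.org* excludes http://example.org/a/b but not http://www.example.org/a, because the leading "www" is not assumed. To exclude both, each form would need its own entry.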
On the Title Selection tab, basic title selections may also be made, such as limiting the URLs checked to records that were recently cataloged or modified.
On the URL Checker Options tab, all of the options are cleared by default. If the report is run without selecting any check boxes on this tab, only valid 856 entries that did not find a matching web site, or that found a referral web site, are printed in the results.
The “If a “Referral URL” is found, update the 856 subfield u?” check box allows you to update any catalog URL that points to a referral web site. Only web sites that contain an automatic referral are updated. If a site has moved and contains a link to an alternate page, but does not have an automatic refresh option in the page’s HTML, the page will be reported as found. Additional options on the URL Checker Options tab allow you to determine whether an 856 should be reported if it does not contain a |u at all, if the |u does not contain the required http:// prefix, or if more than one |u is displayed in a single 856.
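The difference between an automatic referral and a page that merely links to a new location can be shown with a small Python sketch. It assumes the "automatic refresh option" is an HTML meta refresh tag; whether the report also treats other mechanisms, such as HTTP redirects, as referrals is not stated in this topic. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Record the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like: "0; url=http://new.example.org/"
        _, _, rest = attr.get("content", "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url" and value.strip():
            self.target = value.strip().strip("'\"")


def find_referral(html):
    """Return the automatic referral URL in the page, or None if there is none."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.target
```

A page containing only an ordinary link to the new site returns None here, which corresponds to the page being reported as found rather than as a referral.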
In the report log, selections made on the Title Selection tab, URL Checker Selection tab, and the URL Checker Options tab are reported. A statistics summary displays for all selected 856 fields. Both bibliographic and holdings summaries are printed. For each 856 tag, the title control number, the title, the 856 entry, and the reported results are printed. For MFHD records, the library and location are also printed. You can make selections on the URL Checker Options tab to report additional information or to correct catalog URLs.
If the report encounters a web site that takes too long to access, it times out for that site and continues on to the next site. If a web site takes longer than 100 seconds to load, the site is reported as “not found.”
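A timeout of this kind can be approximated in Python with the standard urllib module, as in the sketch below. The function name and the opener parameter are hypothetical; the parameter exists only so the function can be exercised without network access. Note that the urllib timeout applies to each blocking network operation rather than to the total page load, so this is an approximation of the 100-second rule, not a reproduction of the report's behavior.

```python
import http.client
import urllib.request

TIMEOUT_SECONDS = 100  # limit described for the report


def check_url(url, opener=urllib.request.urlopen):
    """Return "found" if the URL opens within the timeout, else "not found"."""
    try:
        with opener(url, timeout=TIMEOUT_SECONDS):
            return "found"
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers timeouts, connection failures, and HTTP error codes
        # raised by urllib; ValueError covers malformed URLs.
        return "not found"
```

Because a failure returns a result instead of raising an error, a caller can loop over many URLs and continue on to the next site after a timeout.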
In libraries that use proxy servers to access the Internet, the URL Checker report will check to see if the Unicorn\Webcat\Config\proxy.cfg file exists. If the file exists, the report reads the file to determine the proxy server address and port number, and accesses the Internet using the proxy server to begin checking URLs. If the proxy.cfg file does not exist, or if the information it contains is incorrect, the URL Checker report will stop processing and print a message in the report log.
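Routing checks through a proxy server can be sketched in Python with the standard urllib module. The layout of the proxy.cfg file is not described in this topic, so the sketch does not read the file; it assumes the address and port have already been obtained. The function name is hypothetical.

```python
import urllib.request


def build_proxy_opener(host, port):
    """Return an opener that sends http:// requests through the given proxy."""
    port = int(port)  # raises ValueError if the port is not numeric
    if not host or not 1 <= port <= 65535:
        # Comparable to the report stopping when the proxy details are wrong.
        raise ValueError("proxy address or port is not usable")
    handler = urllib.request.ProxyHandler({"http": f"http://{host}:{port}"})
    return urllib.request.build_opener(handler)
```

The returned opener has an open method that accepts a URL and a timeout, so it can be used in place of a direct connection when checking each URL.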