David Maus authored

PicaReader – Classes for reading Pica+ records

About

PicaReader provides classes for reading Pica+ records encoded in PicaXML and PicaPlain.

PicaReader is copyright (c) 2012 by Herzog August Bibliothek Wolfenbüttel and released under the terms of the GNU General Public License v3.

Installation

PicaReader should be installed using the PEAR Installer, the PHP community’s de facto standard for installing PHP packages.

pear channel-discover hab20.hab.de/service/pear
pear install --alldeps hab20.hab.de/service/pear/PicaReader

Usage

All readers adhere to the same interface. Open the reader with a string of input data by calling Reader::open(), then call Reader::read() to read the next record in the input data. If the input contains no more records, Reader::read() returns FALSE; otherwise it returns a record object created with PicaRecord’s Record::factory() function.

$reader = new \HAB\Pica\Reader\PicaXmlReader();
$reader->open(file_get_contents('http://unapi.gbv.de?id=opac-de-23:ppn:635012286&format=picaxml'));
$record = $reader->read();
$reader->close();
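
Reading every record from an input follows from the FALSE return value described above. The loop below is a minimal sketch, not code from the library: readAll() is a hypothetical helper, and the anonymous stub class (PHP 7 syntax) only stands in for a real reader so that the snippet runs without PicaReader installed. It assumes nothing beyond the stated contract that read() returns FALSE once no records remain.

```php
<?php
// Hypothetical helper: drain any reader-like object whose read() returns
// FALSE when the input is exhausted (the contract described above).
function readAll($reader)
{
    $records = array();
    while (($record = $reader->read()) !== false) {
        $records[] = $record;
    }
    return $records;
}

// Stand-in reader for illustration only. A real reader would be opened
// with Reader::open() before the loop and closed with close() afterwards.
$stub = new class {
    private $queue = array('record 1', 'record 2');

    public function read()
    {
        return $this->queue ? array_shift($this->queue) : false;
    }
};

$records = readAll($stub);
```

The strict comparison with false keeps the loop from stopping early on a return value that is merely falsy.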

To filter out records or fields you can attach a filter to the reader via Reader::setFilter(). A filter is any valid PHP callback that takes an associative array representing the record as argument and returns a possibly modified array or FALSE if the entire record should be skipped.

The array representation of a record is defined as follows:

RECORD   := array('fields' => array(FIELD, …))
FIELD    := array('tag' => TAG, 'occurrence' => OCCURRENCE, 'subfields' => array(SUBFIELD, …))
SUBFIELD := array('code' => CODE, 'value' => VALUE)

Where TAG, OCCURRENCE, CODE, and VALUE are the respective properties of a Pica+ field or subfield.
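
To make the array layout concrete, here is a minimal sketch of a record array and a filter callback that operates on it. The field content and the name $skipWithoutPpn are made up for illustration, and using NULL for a missing occurrence is an assumption. The filter relies only on the structure defined above and on the documented convention that returning FALSE skips the record.

```php
<?php
// Example record in the array representation; values are illustrative.
$record = array(
    'fields' => array(
        array(
            'tag' => '003@',
            'occurrence' => null, // assumption: NULL when no occurrence is given
            'subfields' => array(
                array('code' => '0', 'value' => '635012286'),
            ),
        ),
    ),
);

// Hypothetical filter: skip every record that has no 003@ field.
$skipWithoutPpn = function (array $r) {
    foreach ($r['fields'] as $field) {
        if ($field['tag'] === '003@') {
            return $r; // keep the record unchanged
        }
    }
    return false; // tell the reader to skip this record
};

$kept    = $skipWithoutPpn($record);
$skipped = $skipWithoutPpn(array('fields' => array()));
```

A callback like this would be passed to Reader::setFilter().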

For example, if your source delivers malformed PicaXML records like so:

<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="info:srw/schema/5/picaXML-v1.0">
  <datafield tag="">
  </datafield>
  <datafield tag="001A">
    <subfield code="0">0001:14-09-10</subfield>
  </datafield>
  …
</record>

you can attach a filter function that removes the fields with an invalid tag:

$reader = new \HAB\Pica\Reader\PicaXmlReader();
$reader->setFilter(function (array $r) {
    // Keep only fields whose tag is set and is a valid Pica+ field tag.
    $fields = array_filter($r['fields'], function (array $f) {
        return isset($f['tag']) && \HAB\Pica\Record\Field::isValidFieldTag($f['tag']);
    });
    return array('fields' => $fields);
});
$reader->open(…);
$record = $reader->read();
$reader->close();
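
The effect of such a filter can be checked on a plain array without a reader. The sketch below is self-contained and therefore replaces Field::isValidFieldTag() with a stand-in closure. Its pattern (three digits followed by an uppercase letter or @) is an assumption about what counts as a valid tag, not the library's actual rule. The array_values() call is an addition as well: array_filter() preserves the original keys, so reindexing keeps the field list gap-free.

```php
<?php
// Stand-in for Field::isValidFieldTag(); the pattern is an assumption.
$isValidTag = function ($tag) {
    return (bool) preg_match('/^[0-9]{3}[A-Z@]$/', $tag);
};

$filter = function (array $r) use ($isValidTag) {
    $fields = array_filter($r['fields'], function (array $f) use ($isValidTag) {
        return isset($f['tag']) && $isValidTag($f['tag']);
    });
    // array_filter() preserves keys; array_values() reindexes from 0.
    return array('fields' => array_values($fields));
};

// Array form of the malformed example record shown earlier.
$input = array('fields' => array(
    array('tag' => '', 'occurrence' => null, 'subfields' => array()),
    array('tag' => '001A', 'occurrence' => null, 'subfields' => array(
        array('code' => '0', 'value' => '0001:14-09-10'),
    )),
));

$output = $filter($input);
```

After the call, $output holds only the 001A field.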

Development

If you want to patch or enhance this component, you will need to create a suitable development environment. The easiest way to do that is to install phix4componentdev:

apt-get install php5-xdebug
apt-get install php5-imagick
pear channel-discover pear.phix-project.org
pear -D auto_discover=1 install -Ba phix/phix4componentdev

You can then clone the Git repository:

git clone git://gitorious.org/php-pica/picareader.git

Then, install a local copy of the package’s dependencies to complete the development environment:

phing build-vendor

To make life easier for you, common tasks (such as running unit tests, generating code review analytics, and creating the PEAR package) have been automated using Phing. You’ll find the automated steps inside the build.xml file that ships with the component.

Run the command ‘phing’ in the component’s top-level folder to see the full list of available automated tasks.

Acknowledgements

Footnotes