Data::PcAxis - A simple interface to the PC-Axis file format
version 0.0.7
use Data::PcAxis;
# Constructor
my $px = Data::PcAxis->new('path/to/pcaxis/file');
# Basic metadata access
my $metadata = $px->metadata;
my @keywords = $px->keywords;
my $value = $px->keyword($keyword);
# Accessing variables
my @vars = $px->variables;
my $num_vars = $px->variables;
my $var_name = $px->var_by_idx($idx);
my $index = $px->var_by_rx($regex);
my @indices = TODO
# Accessing values and codes for variables...
## ...by index
my $val_names = $px->vals_by_idx($var_idx);
my $val_codes = $px->codes_by_idx($var_idx);
## ...by name
my $val_names = $px->vals_by_name($var_name);
my $val_codes = $px->codes_by_name($var_name);
## count number of possible values for each variable
my $counts = $px->val_counts;
## Accessing codes for values and vice versa
my $val_name = $px->val_by_code($var_name, $val_code);
my $val_code = $px->code_by_val($var_name, $val_name);
# Accessing data
## Return a single datum
my $datum = $px->datum(@indices);
## Return a column of data
my $datacol = $px->datacol(['*', $idx_1, $idx_2, $idx_n]);
Data::PcAxis is a module for extracting data (and metadata) from PC-Axis files.
PC-Axis is a file format format used for dissemination of statistical information. The format is used by a number of national statistical organisations to disseminate official statistics.
The following terms have the following specific meanings in this document:
Keyword
This is the top level key of the metadata hash and is equivalent to the PC-Axis Keyword defined in the PC-Axis file format specification (see "REFERENCES"). Keywords are generally CAPITALISED. Examples of keywords would be TITLE, STUB, HEADING, SOURCE, or DATA.
Variable
A variable refers to any of the items that appear under the STUB or HEADING keywords in a PC-Axis dataset. Each value in the DATA section of the dataset represents the combination of a specific value for each variable defined in the metadata. Variables are often referred to in code by array index. A variable's index is its position in a zero-based array consisting of the concatenation of the dataset's STUBs followed by its HEADINGs.
Value
A value is any of the possible values for a variable. Each variable has at least one possible value. Values can be referred to by name or code.
Name
A name is the name of a specific value. For example, the names for possible values for the variable DAY might be 'Monday, Tuesday, Wednesday... etc.'
Code
A code is another way to refer to a possible value for a variable. In general, a value's name is for humans to read and its code is for machines. The codes for the DAY example above might be '01, 02, 03... etc' or 'MON, TUE, WED... etc'.
Datum
A datum is a single data value held in the data array of the Data::PcAxis object or the DATA keyword of the PC-Axis file. Each datum is the value associated with a particular combination of variable values. The number of datums (data) in a PC-Axis dataset is equal to the product of the number of possible values for each variable.
my $px = Data::PcAxis->new('path/to/pcaxis/file');
Creates a new Data::PcAxis object. Takes the path (relative or absolute) to the PC-Axis file that will be represented by the object.
my $hashref = $px->metadata;
Returns a hashref containing the PC-Axis file's metadata. Each of the returned hashref's keys is a metadata keyword from the original PC-Axis file, each of its values is a hashref. Where a keyword in the original PC-Axis file has a single string value (meaning that the value applies to the entire dataset --- e.g. the 'TITLE' keyword), then that keyword's hashref contains a single key, 'TABLE', the value of which is the string to which that keyword pointed to in the original PC-Axis file.
my @keywords = $px->keywords;
Returns an array containing all of the metadata keywords associated with the PC-Axis datasets currently represented by the object.
my $value = $px->keyword('TITLE'); # Returns the value of the 'TITLE' keyword
The keyword method returns the value of the passed keyword. If the keyword holds a value which refers to the entire table (such as the 'TITLE' keyword), then that value is returned as a string. If the keyword passed to the method has different values for each variable (for example, the 'VALUES' and 'CODES' keywords will have a different list of values for each variable), then a hashref pointing to the entire set of values is returned by the method.
my $num_vars = $px->variables; # scalar context
my @vars = $px->variables; # list context
In a scalar context the variables method returns the number of variables represented in the dataset.
In a list context the method returns an array containing the variable names. The variables in the returned array are ordered as they are in the PC-Axis file; first the 'STUB' variables, followed by the 'HEADING' variables.
my $var_name = $px->var_by_idx($idx);
Returns the variable name, at the index passed, in an array composed of [ STUB variables, HEADING variables ], that is the array returned by Data::PcAxis->variables
.
my $index = $px->var_by_rx($pattern);
Returns the index in an array composed of [ STUB variables, HEADING variables ], that is the array returned by Data::PcAxis->variables
, of the first variable the name of which matches the passed pattern.
my $val_names = $px->vals_by_idx($var_idx);
Takes a variable index and returns a reference to an array containing the names of all possible values for the variable represented by that index. The order of the values in the array matches the order in the original file. The order will also match that of arrayrefs returned by Data::PcAxis->codes_by_idx
, Data::PcAxis->vals_by_name
, and Data::PcAxis->codes_by_name
.
my $val_codes = $px->codes_by_idx($var_idx);
Takes a variable index and returns a reference to an array containing the codes for all of the possible values of the variable represented by that index. The order of the codes in the array matches the order in the original file. The order will also match that of arrayrefs returned by Data::PcAxis->vals_by_idx
, Data::PcAxis->vals_by_name
, and Data::PcAxis->codes_by_name
.
my $val_names = $px->vals_by_name($var_name);
Takes a variable name and returns a reference to an array containing the names of all possible values for that variable. The order of the values in the array matches the order in the original file. The order will also match that of arrayrefs returned by Data::PcAxis->vals_by_idx
, Data::PcAxis->codes_by_idx
, and Data::PcAxis->codes_by_name
.
my $val_codes = $px->codes_by_name($var_name);
Takes a variable index and returns a reference to an array containing the codes for all of the possible values of that variable. The order of the codes in the array matches the order in the original file. The order will also match that of arrayrefs returned by Data::PcAxis->vals_by_idx
, Data::PcAxis->codes_by_idx
, and Data::PcAxis->vals_by_name
.
my $counts = $px->val_counts;
Returns a reference to an array of value counts. Each element in the array is the number of possible values in the current PC-Axis dataset for the variable with the same index (in Data::PcAxis->variables
) as the element.
my $val_name = $px->val_by_code($var_name, $val_code);
Takes a variable name and a value code and returns the value which the passed code represents.
my $val_code = $px->code_by_val($var_name, $val_name);
Takes a variable name and a value name and returns the code representing the passed value.
my $datum = $px->datum(@indices);
The datum method takes an arrayref of value indices and returns the data value corresponding to the particular combination of values represented by those indices. The number of elements in the passed arrayref must be equal to the number of variables in the current PC-Axis dataset. Each element must be a positive integer no greater than the number of possible values for the variable it represents.
For example, consider a dataset containing two variables, each of which has two possible values:
my $px = Data::PcAxis->new('path/to/PC-Axis/file');
# Two variables
$px->variables; # ['Sex', 'Year']
# Each variable has two possible values
$px->values('Sex'); # ['Male', 'Female']
$px->values('Year'); # ['2011', '2012']
$px->datum([0,0]); # Data value for males in 2011
$px->datum([0,1]); # Data value for males in 2012
$px->datum([1,0]); # Data value for females in 2011
$px->datum([1,1]); # Data value for females in 2012
my $datacol = $px->datacol(['*', $idx_1, $idx_2, $idx_n]);
The datacol method is similar to the datum method except that one of the elements in the passed arrayref is replaced with a '*' character and rather than returning a single datum it returns a reference to an array of data containing a datum for each possible value for the variable represented by the '*'.
For example, consider a dataset containing two variables, each of which has two possible values:
my $px = Data::PcAxis->new('path/to/PC-Axis/file');
# Two variables
$px->variables; # ['Sex', 'Year']
# Each variable has two possible values
$px->values('Sex'); # ['Male', 'Female']
$px->values('Year'); # ['2011', '2012']
$px->datacol(['*',0]); # [Data value for males in 2011, Data value for females in 2011]
$px->datacol(['*',1]); # [Data value for males in 2012, Data value for females in 2012]
$px->datacol([0,'*']); # [Data value for males in 2011, Data value for males in 2012]
$px->datacol([1,'*']); # [Data value for females in 2011, Data value for females in 2012]
PC-Axis web site (http://www.scb.se/pc-axis)
PC-Axis file format specification (http://www.scb.se/Pages/List____314011.aspx)
Fiachra O'Donoghue <[email protected]>
This software is copyright (c) 2012 by Fiachra O'Donoghue.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.