Background
As part of its Open Government Initiative, the Ontario Government has issued an Open Data Directive that requires provincial Ministries and the Treasury Board Secretariat to establish and publish an Inventory and Catalogue of government-wide data.
Application Programming Interface (API)
The Ontario Government describes the Data Catalogue as a “one-window central platform” that provides faster and easier access to Government Data, thereby facilitating evidence-based policy, informing service delivery, and promoting greater transparency and accountability.
The Data Catalogue’s Application Programming Interface (API) helps users identify and access potentially relevant datasets with features that support:
- searching (free-text);
- filtering (Status, Publisher, File Type, Topics);
- sorting (A-Z, Z-A, Date Added, Date Modified, Update Frequency);
- browsing; and
- downloading data files.
Metadata Elements
A sampling of web pages served by the API shows that the Data Catalogue and Data Inventory describe the Ontario Government’s datasets with many of the same metadata elements, but also with some different ones. Table 3 presents:
- the twenty metadata elements used to describe datasets in the Data Inventory;
- the labeling and/or display of metadata elements on web pages served for these datasets by the Data Catalogue’s API; and
- the data elements of the dynamic HTML serving these web pages to users of the Data Catalogue.
Table 3. Metadata associated with the Ontario Government’s datasets in the Data Inventory and Data Catalogue. * denotes metadata elements associated with Open datasets only; ** denotes metadata elements associated with Restricted datasets only.
| Data Inventory: Metadata Property | Data Catalogue: Label/Display | Data Catalogue: Dynamic HTML Data Element |
| --- | --- | --- |
| Public Title | Display only | content.title |
| Short Description | Display only | content.intro |
| Long Description | Display only | content.intro <div class="main-content row"> |
| Other Title | | |
| Data Custodian Branch | | |
| Tags | | |
| | Topics | content.tags.tagList |
| Date Range – Start | Time captured* | content.dataset.dateStart |
| Date Range – End | Time captured* | content.dataset.dateEnd |
| Date Created | | |
| Date Published | Date added | content.datePublished |
| Contains Geographic Markers | | |
| | Geographical coverage | content.dataset.geographicalCoverage |
| Publisher | Publisher | content.dataset.publisherList |
| Update Frequency | Update frequency | content.dataset.updateFrequency |
| | Data last modified* | content.dataset.dateDataUpdated |
| Access Level | Data status | content.dataset.status |
| Exemption | Exemption** | content.dataset.exemptionList |
| Rationale | Rationale not to release** | content.dataset.exemptionRationale |
| Dataset URL | | |
| | Download data* | content.dataset.filelist, or content.datasetexternalDataLink |
| License Type | Terms of Use* | content.dataset.license |
| | hidden link to Technical documentation | content.dataset.technicalDocument.url |
| File Types (extensions) | (link text)* | fileGroup.list |
| Additional Comments | | |
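The alignment in the table can also be held as a small lookup structure, which makes it easy to list the elements that have no counterpart on the other side. The following is only a sketch in Python (our choice for illustration): the rows are transcribed from the table, while the names `ALIGNMENT`, `inventory_only`, `catalogue_only`, and `shared` are hypothetical and are not part of the Data Catalogue or Data Inventory.

```python
# Rows transcribed from the table:
# (Data Inventory property, Data Catalogue label, dynamic HTML data element).
# None marks a cell that is empty in the table. Labels keep the table's
# * (Open only) and ** (Restricted only) markers.
ALIGNMENT = [
    ("Public Title", "Display only", "content.title"),
    ("Short Description", "Display only", "content.intro"),
    ("Long Description", "Display only",
     'content.intro <div class="main-content row">'),
    ("Other Title", None, None),
    ("Data Custodian Branch", None, None),
    ("Tags", None, None),
    (None, "Topics", "content.tags.tagList"),
    ("Date Range – Start", "Time captured*", "content.dataset.dateStart"),
    ("Date Range – End", "Time captured*", "content.dataset.dateEnd"),
    ("Date Created", None, None),
    ("Date Published", "Date added", "content.datePublished"),
    ("Contains Geographic Markers", None, None),
    (None, "Geographical coverage", "content.dataset.geographicalCoverage"),
    ("Publisher", "Publisher", "content.dataset.publisherList"),
    ("Update Frequency", "Update frequency", "content.dataset.updateFrequency"),
    (None, "Data last modified*", "content.dataset.dateDataUpdated"),
    ("Access Level", "Data status", "content.dataset.status"),
    ("Exemption", "Exemption**", "content.dataset.exemptionList"),
    ("Rationale", "Rationale not to release**",
     "content.dataset.exemptionRationale"),
    ("Dataset URL", None, None),
    (None, "Download data*",
     "content.dataset.filelist, or content.datasetexternalDataLink"),
    ("License Type", "Terms of Use*", "content.dataset.license"),
    (None, "hidden link to Technical documentation",
     "content.dataset.technicalDocument.url"),
    ("File Types (extensions)", "(link text)*", "fileGroup.list"),
    ("Additional Comments", None, None),
]


def inventory_only(rows):
    """Data Inventory properties with no Data Catalogue label on the same row."""
    return [di for di, label, _ in rows if di and not label]


def catalogue_only(rows):
    """Data Catalogue labels with no Data Inventory property on the same row."""
    return [label for di, label, _ in rows if label and not di]


def shared(rows):
    """(property, label) pairs that appear together on one row."""
    return [(di, label) for di, label, _ in rows if di and label]


print("Inventory only:", inventory_only(ALIGNMENT))
print("Catalogue only:", catalogue_only(ALIGNMENT))
print("Shared:", shared(ALIGNMENT))
```

Because the table lists related pairs such as Tags and Topics on separate rows, this sketch reports them as having no direct counterpart; pairing them would be a separate editorial decision.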
Many metadata elements associated with datasets in Ontario’s Data Inventory (e.g. Public Title, Short Description, Long Description, etc.) have clear counterparts in Ontario’s Data Catalogue. Other metadata elements associated with these datasets are provided only in the Data Inventory (e.g. Data Custodian Branch, Date Created) or only in the Data Catalogue (e.g. Data last modified). A few pairs of related metadata elements are worth brief comments.
DI: Contains Geographic Markers versus DC: Geographical coverage
The Data Inventory records TRUE or FALSE as the value of Contains Geographic Markers; the Data Catalogue records Ontario or Canada as the value of Geographical coverage.
DI: Tags versus DC: Topics
The Data Inventory associates the Ontario Government’s datasets with many dozens of different Tag values, in what appears to be an uncontrolled process. The Data Catalogue, on the other hand, uses controlled input to associate these datasets with Topic values selected from a list of official Ontario.ca topics that resides on the Government’s intranet.
DI: Dataset URL versus DC: Download data
The assignment of values to the Dataset URL metadata element in Ontario’s Data Inventory is incomplete and inconsistent; by contrast, the values of Download data associated with datasets in the Data Catalogue reflect the three options offered to the Ministries:
- You can have your data downloadable from the catalogue itself (we will upload the file and it will appear under “Download data”);
- You can link to the data for download (and it will appear under “Download data”); or
- If you can’t link to a direct file, you can put the link under the “Access data” section.
Harvesting Ontario’s Data Catalogue
Accessing the metadata associated with the Ontario Government’s datasets, as served to users by the Data Catalogue’s Application Programming Interface (API), involves scraping and parsing 2,352+ dynamic HTML web pages. This process is complex, time-consuming, prone to error, and unlikely to scale. Nonetheless, we have harvested the Data Catalogue (circa September 1, 2017) and want to share four ZIP files that contain the dynamic HTML of the web pages associated with the Ontario Government’s datasets that were Open, To Be Opened, Restricted, and Under Review.
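One recurring step in this kind of harvest is looking up a dotted data element such as content.dataset.status once a page's data has been extracted. The sketch below assumes that each page's data has already been parsed into a nested Python dictionary; how the data is actually embedded in the harvested pages is not described here, and the `resolve` helper, the `FIELDS` list, and the sample `page` are hypothetical, invented only for illustration.

```python
from typing import Any


def resolve(data: Any, path: str, default: Any = None) -> Any:
    """Follow a dotted path such as 'content.dataset.status' through nested dicts.

    Returns `default` as soon as any step of the path is missing.
    """
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical page data (assumption: invented values, not harvested content).
page = {
    "content": {
        "title": "Example dataset",
        "dataset": {"status": "Open", "updateFrequency": "Yearly"},
    }
}

# A few data elements named in the metadata table.
FIELDS = [
    "content.title",
    "content.dataset.status",
    "content.dataset.updateFrequency",
    "content.dataset.dateDataUpdated",
]

# Missing elements come back as None instead of raising an error, which
# matters because some elements apply only to Open or only to Restricted datasets.
record = {path: resolve(page, path) for path in FIELDS}
print(record)
```

Returning a default for missing elements, rather than raising, keeps a long harvest running when a page lacks a field; logging those gaps would be a natural next step.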
Merging the Data Inventory and Data Catalogue
Finally, we are sharing (in both .xlsx and .ods format files) our preliminary alignment and merge of the two sets of metadata elements that were used to describe the Ontario Government’s datasets in the Data Inventory and Data Catalogue.
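For readers who want to experiment with a similar merge, the following sketch shows one possible approach and is not a description of how the shared files were produced. It assumes that records from the two sources can be matched on Public Title, which may not hold for every dataset; the `CATALOGUE_TO_INVENTORY` mapping covers only a few rows of the metadata table, and the function name and sample records are hypothetical.

```python
# Partial mapping from Data Catalogue data elements to Data Inventory
# property names, taken from rows of the metadata table.
CATALOGUE_TO_INVENTORY = {
    "content.title": "Public Title",
    "content.datePublished": "Date Published",
    "content.dataset.status": "Access Level",
    "content.dataset.updateFrequency": "Update Frequency",
}


def merge_records(inventory, catalogue):
    """Pair inventory and catalogue records that share a Public Title.

    Assumption: Public Title is a usable join key. Records found in only
    one source are kept, with None on the other side.
    """
    merged = {}
    for row in inventory:
        merged[row["Public Title"]] = {"inventory": row, "catalogue": None}
    for row in catalogue:
        # Rename catalogue elements to inventory names where a mapping exists.
        renamed = {CATALOGUE_TO_INVENTORY.get(k, k): v for k, v in row.items()}
        entry = merged.setdefault(
            renamed["Public Title"], {"inventory": None, "catalogue": None}
        )
        entry["catalogue"] = renamed
    return merged


# Hypothetical sample records (invented values for illustration only).
inventory_rows = [
    {"Public Title": "Dataset A", "Data Custodian Branch": "Branch X"},
    {"Public Title": "Dataset B", "Data Custodian Branch": "Branch Y"},
]
catalogue_rows = [
    {"content.title": "Dataset A", "content.dataset.status": "Open"},
    {"content.title": "Dataset C", "content.dataset.status": "Restricted"},
]

merged = merge_records(inventory_rows, catalogue_rows)
for title, sides in merged.items():
    print(title, sides)
```

Keeping both sides of each pair, instead of flattening them into one row, makes it easy to spot datasets that appear in only one source before deciding how to reconcile them.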