Ontario’s Government-wide Data Catalogue

Background

As part of its Open Government Initiative, the Ontario Government has issued an Open Data Directive that requires provincial Ministries and the Treasury Board Secretariat to establish and publish an Inventory and a Catalogue of government-wide data.

Application Programming Interface (API)

The Ontario Government describes the Data Catalogue as a “one-window central platform” that provides faster and easier access to Government Data, thereby facilitating evidence-based policy, informing service delivery, and promoting greater transparency and accountability.

The Data Catalogue’s Application Programming Interface (API) helps users identify and access potentially relevant datasets with features that support:

  • searching (free-text);
  • filtering (Status, Publisher, File Type, Topics);
  • sorting (A-Z, Z-A, Date Added, Date Modified, Update Frequency);
  • browsing; and
  • downloading data files.

Metadata Elements

A sampling of web pages served by the API shows that the Data Catalogue and the Data Inventory use many of the same metadata elements, along with some different ones, to describe the Ontario Government’s datasets. Table 3 presents:

  • the twenty metadata elements used to describe datasets in the Data Inventory;
  • the labeling and/or display of metadata elements on web pages served for these datasets by the Data Catalogue’s API; and
  • the data elements of the dynamic HTML serving these web pages to users of the Data Catalogue.

Table 3. Metadata associated with the Ontario Government’s datasets in the Data Inventory and Data Catalogue. * denotes metadata elements associated with Open datasets only; ** denotes metadata elements associated with Restricted datasets only.

| Data Inventory: Metadata Property | Data Catalogue: Label/Display | Data Catalogue: Dynamic HTML Data Element |
|---|---|---|
| Public Title | Display only | content.title |
| Short Description | Display only | content.intro |
| Long Description | Display only | content.intro <div class="main-content row"> |
| Other Title | | |
| Data Custodian Branch | | |
| Tags | | |
| | Topics | content.tags.tagList |
| Date Range – Start | Time captured* | content.dataset.dateStart |
| Date Range – End | Time captured* | content.dataset.dateEnd |
| Date Created | | |
| Date Published | Date added | content.datePublished |
| Contains Geographic Markers | | |
| | Geographical coverage | content.dataset.geographicalCoverage |
| Publisher | Publisher | content.dataset.publisherList |
| Update Frequency | Update frequency | content.dataset.updateFrequency |
| | Data last modified* | content.dataset.dateDataUpdated |
| Access Level | Data status | content.dataset.status |
| Exemption | Exemption** | content.dataset.exemptionList |
| Rationale | Rationale not to release** | content.dataset.exemptionRationale |
| Dataset URL | | |
| | Download data* | content.dataset.filelist, or content.datasetexternalDataLink |
| License Type | Terms of Use* | content.dataset.license |
| | hidden link to Technical documentation | content.dataset.technicalDocument.url |
| File Types (extensions) | (link text)* | fileGroup.list |
| Additional Comments | | |
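
The dotted names in the third column of the table read like paths into a nested object. As a minimal sketch, assuming the page data can be loaded into nested Python dictionaries (an assumption; the source does not say how the dynamic HTML exposes these elements), a small helper can resolve each path and relabel the values with the Data Catalogue's display labels. The `resolve` helper and the sample record are hypothetical.

```python
from typing import Any

# Data Catalogue label -> dotted data element, copied from the table above.
CATALOGUE_ELEMENTS = {
    "Date added": "content.datePublished",
    "Publisher": "content.dataset.publisherList",
    "Update frequency": "content.dataset.updateFrequency",
    "Data status": "content.dataset.status",
}


def resolve(record: dict, dotted_path: str, default: Any = None) -> Any:
    """Walk a nested dict along a dotted path such as 'content.dataset.status'."""
    node: Any = record
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default  # element absent for this dataset
        node = node[key]
    return node


# Hypothetical record; the values are invented for illustration only.
sample = {
    "content": {
        "datePublished": "2017-09-01",
        "dataset": {"status": "Open", "updateFrequency": "Yearly"},
    }
}

labelled = {label: resolve(sample, path) for label, path in CATALOGUE_ELEMENTS.items()}
```

Returning a default instead of raising keeps the lookup usable for elements that appear only on Open or only on Restricted datasets.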

Many metadata elements associated with datasets in Ontario’s Data Inventory (e.g. Public Title, Short Description, Long Description) have clear counterparts in Ontario’s Data Catalogue. Other metadata elements associated with these datasets are provided only in the Data Inventory (e.g. Data Custodian Branch, Date Created) or only in the Data Catalogue (e.g. Data last modified). A few pairs of related metadata elements merit brief comment.

DI: Contains Geographic Markers versus DC: Geographical coverage

The Data Inventory assigns a value of TRUE or FALSE to Contains Geographic Markers; the Data Catalogue assigns a value of Ontario or Canada to Geographical coverage.

DI: Tags versus DC: Topics

The Data Inventory associates the Ontario Government’s datasets with many dozens of different Tag values, in what appears to be an uncontrolled process. The Data Catalogue, on the other hand, uses controlled input to associate these datasets with Topic values selected from a list of official Ontario.ca topics that resides on the Government’s intranet.
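
The difference between uncontrolled tags and controlled topics can be sketched as a vocabulary check. The topic names below are placeholders, not the official Ontario.ca list (which is not reproduced in this post), and `split_by_vocabulary` is a hypothetical helper.

```python
# Placeholder vocabulary; NOT the official list of Ontario.ca topics.
OFFICIAL_TOPICS = {"Environment and energy", "Health and wellness", "Education and training"}


def split_by_vocabulary(values, vocabulary):
    """Return (accepted, rejected): accepted values are mapped to their canonical spelling."""
    canonical = {term.casefold(): term for term in vocabulary}
    accepted, rejected = [], []
    for value in values:
        key = value.strip().casefold()  # ignore case and surrounding whitespace
        if key in canonical:
            accepted.append(canonical[key])
        else:
            rejected.append(value)
    return accepted, rejected


# Invented free-text tags, as an uncontrolled process might produce them.
free_text_tags = ["environment and energy ", "water quality", "Health and Wellness"]
accepted, rejected = split_by_vocabulary(free_text_tags, OFFICIAL_TOPICS)
```

With controlled input, every value lands in `accepted`; with free-text entry, the `rejected` list grows with each new spelling or synonym.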

DI: Dataset URL versus DC: Download data

The assignment of values to the Dataset URL metadata element in Ontario’s Data Inventory is incomplete and inconsistent; by contrast, the values of Download data associated with datasets in the Data Catalogue reflect the three options offered to the Ministries:

  1. You can have your data downloadable from the catalogue itself (we will upload the file and it will appear under “Download data”);
  2. You can link to the data for download (and it will appear under “Download data”); or
  3. If you can’t link to a direct file, you can put the link under the “Access data” section.

Harvesting Ontario’s Data Catalogue

Accessing the metadata associated with the Ontario Government’s datasets, as served to users by the Data Catalogue’s API, involves scraping and parsing 2,352+ dynamic HTML web pages. This process is complex, time-consuming, prone to error, and unlikely to scale. Nonetheless, we have harvested the Data Catalogue (circa September 1, 2017) and want to share four ZIP files that contain the dynamic HTML of the web pages associated with the Ontario Government’s datasets that were Open, To Be Opened, Restricted, and Under Review.
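
For readers who want to work with the harvested pages, the following is a minimal sketch of reading HTML files out of a ZIP archive and pulling one field from each, using only the Python standard library. It is not the harvesting code we used. The member names, the `.html` extension, and the choice of the `<title>` element are assumptions; a tiny in-memory archive stands in for the real ZIP files.

```python
import io
import zipfile
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        return "".join(self._parts).strip()


def page_titles(archive):
    """Map each HTML member of a ZIP archive (path or file object) to its <title> text."""
    titles = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.lower().endswith((".html", ".htm")):
                continue  # skip anything that is not a web page
            parser = TitleParser()
            parser.feed(zf.read(name).decode("utf-8", errors="replace"))
            parser.close()
            titles[name] = parser.title
    return titles


# Build a small in-memory archive so the sketch runs without the real ZIP files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("open/example.html", "<html><head><title> Example dataset </title></head></html>")
    zf.writestr("open/readme.txt", "not a page")
buffer.seek(0)
titles = page_titles(buffer)
```

Passing a file path instead of `buffer` would read one of the shared archives directly. Extracting the other metadata elements would need page-specific parsing that this sketch does not attempt.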

Merging the Data Inventory and Data Catalogue

Finally, we are sharing (in both .xlsx and .ods formats) our preliminary alignment and merge of the two sets of metadata elements used to describe the Ontario Government’s datasets in the Data Inventory and the Data Catalogue.
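
To illustrate what such an alignment involves, here is a hedged sketch of a full outer join between the two sources. It assumes the records can be matched on title (Public Title in the Data Inventory, content.title in the Data Catalogue); that join key is an assumption made for illustration and may not be how the shared files were produced. The sample rows are invented.

```python
def normalise(title):
    """Lower-case a title and collapse whitespace so near-identical titles match."""
    return " ".join(title.casefold().split())


def outer_merge(inventory, catalogue):
    """Full outer join of two lists of dicts, keyed on a normalised title."""
    merged = {}
    for row in inventory:
        key = normalise(row["Public Title"])
        merged.setdefault(key, {"inventory": None, "catalogue": None})["inventory"] = row
    for row in catalogue:
        key = normalise(row["content.title"])
        merged.setdefault(key, {"inventory": None, "catalogue": None})["catalogue"] = row
    return merged


# Invented sample rows, for illustration only.
inventory_rows = [
    {"Public Title": "Example  Dataset A", "Access Level": "Open"},
    {"Public Title": "Example Dataset B", "Access Level": "Restricted"},
]
catalogue_rows = [
    {"content.title": "example dataset a", "content.dataset.status": "Open"},
    {"content.title": "Example Dataset C", "content.dataset.status": "Under Review"},
]

merged = outer_merge(inventory_rows, catalogue_rows)
only_in_inventory = [k for k, v in merged.items() if v["catalogue"] is None]
only_in_catalogue = [k for k, v in merged.items() if v["inventory"] is None]
```

An outer join keeps datasets that appear in only one source, which makes gaps between the Data Inventory and the Data Catalogue visible rather than silently dropping them.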
