Ontario’s Government-wide Data Inventory

Background

As part of its Open Government Initiative, the Ontario Government has issued an Open Data Directive that requires provincial Ministries and the Treasury Board Secretariat to establish and publish an Inventory and Catalogue of government-wide data.

Metadata Elements of the Data Inventory

As of September 1, 2017, the Ontario Government’s Data Inventory was available for download as a Comma-Separated Values (CSV) file at https://files.ontario.ca/opendata/ontariodatainventory_31.csv.

Table 1 presents the twenty metadata elements used to describe datasets in the Data Inventory:

Table 1. Metadata elements used to describe datasets in the Ontario Government’s Data Inventory. Definitions/examples are sourced from the Ontario Government’s Open Data Guidebook, 2015.
Metadata Element | Definition
Public Title | The name given to a dataset. Must be unique and limited to 200 characters or fewer. E.g. Consular offices in Ontario.
Short Description | Description limited to 200 characters or fewer. E.g. List of consular offices operating in Ontario.
Long Description | Description limited to 600 characters or fewer (approximately 100 words). E.g. The data contained includes the name of the Consul General or Honorary Consul General, office address, telephone and fax numbers, email and web addresses.
Other Title | Internal name of the dataset. Must be unique and limited to 200 characters.
Data Custodian Branch | E.g. Office of International Relations and Protocol.
Tags |
Date Range – Start | Date format yyyy-mm-dd.
Date Range – End | Date format yyyy-mm-dd.
Date Created | Date format yyyy-mm-dd.
Date Published | Date format yyyy-mm-dd. E.g. 2016-04-20.
Contains Geographic Markers | TRUE if this dataset contains geographic markers, FALSE otherwise. E.g. TRUE.
Publisher | Ministry or agency. Selected from a drop-down menu. E.g. Intergovernmental Affairs.
Update Frequency | How often the dataset is updated. Selected from a drop-down menu. E.g. On demand.
Access Level | Whether this data can be targeted for public release, is restricted from public release, or needs more assessment. Selected from a drop-down menu. E.g. Under review.
Exemption | The specific exemption that applies to a restricted dataset. Selected from a drop-down menu.
Rationale |
Dataset URL (Optional) | A web page through which the dataset can be accessed. E.g. https://www.ontario.ca/page/consular-offices.
License Type | The license under which the distribution is made available. Selected from a drop-down menu. E.g. Ontario.ca Terms of Use.
File Types (extensions) | Comma-separated list of file types. Selected from a drop-down menu. E.g. DOCX.
Additional Comments | Any other information about the dataset that would help assess its suitability for open data.
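To make the constraints in Table 1 concrete, here is a minimal sketch that parses an inventory CSV with Python's standard csv module and checks each record against them (length limits, yyyy-mm-dd dates, the TRUE/FALSE flag). It assumes the CSV's column headers match the element names in Table 1 exactly (including the en dash in “Date Range – Start”), which may not hold for every release; columns absent from the file are skipped rather than reported.

```python
import csv
import io
import re
from datetime import datetime

# Constraints transcribed from Table 1. ASSUMPTION: the CSV headers use exactly
# these element names; rename the keys if a release labels its columns differently.
MAX_LENGTHS = {
    "Public Title": 200,
    "Short Description": 200,
    "Long Description": 600,
    "Other Title": 200,
}
DATE_FIELDS = ("Date Range – Start", "Date Range – End", "Date Created", "Date Published")
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(value):
    """True if value is a real calendar date written as yyyy-mm-dd."""
    if not ISO_DATE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:  # e.g. month 13 or February 30
        return False
    return True


def check_record(row):
    """Return a list of Table 1 violations for one record (a dict of column -> text)."""
    problems = []
    for field, limit in MAX_LENGTHS.items():
        if len(row.get(field) or "") > limit:
            problems.append(f"{field} exceeds {limit} characters")
    for field in DATE_FIELDS:
        value = (row.get(field) or "").strip()
        if value and not is_iso_date(value):  # blank dates are tolerated
            problems.append(f"{field} is not a yyyy-mm-dd date: {value!r}")
    flag = (row.get("Contains Geographic Markers") or "").strip().upper()
    if flag and flag not in {"TRUE", "FALSE"}:
        problems.append(f"Contains Geographic Markers is not TRUE/FALSE: {flag!r}")
    return problems


def check_inventory(csv_text):
    """Map each 1-based data row number to its problems; clean rows are omitted."""
    reader = csv.DictReader(io.StringIO(csv_text))
    report = {}
    for number, row in enumerate(reader, start=1):
        problems = check_record(row)
        if problems:
            report[number] = problems
    return report
```

Applied to the text of a downloaded release, check_inventory returns only the rows that break a Table 1 rule; the file's character encoding is a further assumption to confirm before reading it.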

The Evolution of the Data Inventory

Table 2 lists the publication date and time of, and a download link for, every release of the Ontario Government’s Data Inventory between July 2016 and August 2017.[1] On several days (column B), the Ontario Government published more than one release of its Data Inventory: two on each of 08/05/2016, 09/20/2016, 10/21/2016, 11/18/2016, 01/25/2017, 02/27/2017, 03/01/2017, 04/04/2017, 05/19/2017, and 08/03/2017; three on 01/09/2017; four on 02/13/2017; and five on 10/06/2016.
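The same-day counts above, and the sole or final release of each day listed in column C, can be reproduced from columns A and B of Table 2. The sketch below transcribes those two columns (dropping the en dash between date and time) and groups the releases by calendar day; it illustrates the bookkeeping and is not necessarily the procedure used to build the table.

```python
from collections import defaultdict
from datetime import datetime

# Columns A and B of Table 2: metadata file, release date (mm/dd/yyyy), time.
TABLE_2 = """
ontariodatainventory.csv    07/29/2016 14:23
ontariodatainventory_0.csv  08/05/2016 10:18
ontariodatainventory_1.csv  08/05/2016 10:19
ontariodatainventory_2.csv  09/20/2016 10:29
ontariodatainventory_3.csv  09/20/2016 10:32
ontariodatainventory_4.csv  10/06/2016 09:51
ontariodatainventory_5.csv  10/06/2016 10:02
ontariodatainventory_6.csv  10/06/2016 10:09
ontariodatainventory_7.csv  10/06/2016 10:28
ontariodatainventory_8.csv  10/06/2016 10:30
ontariodatainventory_9.csv  10/21/2016 15:49
ontariodatainventory_10.csv 10/21/2016 15:50
ontariodatainventory_11.csv 11/18/2016 10:21
ontariodatainventory_12.csv 11/18/2016 10:24
ontariodatainventory_13.csv 01/09/2017 13:19
ontariodatainventory_14.csv 01/09/2017 13:22
ontariodatainventory_15.csv 01/09/2017 13:27
ontariodatainventory_16.csv 01/25/2017 11:06
ontariodatainventory_17.csv 01/25/2017 11:08
ontariodatainventory_18.csv 02/13/2017 13:04
ontariodatainventory_19.csv 02/13/2017 13:06
ontariodatainventory_20.csv 02/13/2017 13:56
ontariodatainventory_21.csv 02/13/2017 13:57
ontariodatainventory_22.csv 02/27/2017 15:25
ontariodatainventory_23.csv 02/27/2017 15:28
ontariodatainventory_24.csv 03/01/2017 15:22
ontariodatainventory_25.csv 03/01/2017 15:24
ontariodatainventory_26.csv 04/04/2017 13:25
ontariodatainventory_27.csv 04/04/2017 13:26
ontariodatainventory_28.csv 05/19/2017 11:38
ontariodatainventory_29.csv 05/19/2017 11:39
ontariodatainventory_30.csv 08/03/2017 12:47
ontariodatainventory_31.csv 08/03/2017 12:48
"""


def releases_by_day(table):
    """Group (release time, file name) pairs by calendar day, sorted by time."""
    days = defaultdict(list)
    for line in table.strip().splitlines():
        name, day, clock = line.split()
        stamp = datetime.strptime(f"{day} {clock}", "%m/%d/%Y %H:%M")
        days[stamp.date()].append((stamp, name))
    return {d: sorted(items) for d, items in sorted(days.items())}


by_day = releases_by_day(TABLE_2)

# Days with more than one release, and how many releases each had.
multiple = {d: len(items) for d, items in by_day.items() if len(items) > 1}

# Column C: the sole or final release of each day, in chronological order.
final_of_day = [items[-1][1] for items in by_day.values()]

for d, n in multiple.items():
    print(d.isoformat(), n)
print(len(final_of_day), "sole/final releases")
```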

We ignore any discrepancies among same-day releases of the Data Inventory and focus instead on the sole or final release of each day (column C). On inspection, these fourteen releases all differ from one another, so they may be regarded as distinct versions of the Data Inventory.
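The text does not say how the final releases were compared. One simple check, assumed here, is to hash each file's raw bytes and keep only releases whose hash has not been seen before. The function name and the OntGDI_nn labels it produces are illustrative; note that a byte-level comparison treats files that differ only in line endings or row order as distinct.

```python
import hashlib


def distinct_versions(releases):
    """Keep releases whose bytes differ from every earlier release.

    releases: (file name, raw file bytes) pairs for the sole/final release of
    each day, in chronological order. Returns (label, file name) pairs, where
    the hypothetical labels OntGDI_00, OntGDI_01, ... follow that order.
    """
    seen = set()
    versions = []
    for name, data in releases:
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue  # byte-identical to an earlier release: not a new version
        seen.add(digest)
        versions.append((f"OntGDI_{len(versions):02d}", name))
    return versions
```

Given local copies of the fourteen final-of-day files in chronological order (the list of paths is hypothetical), the call would be distinct_versions([(p.name, p.read_bytes()) for p in paths]); fourteen entries in the result would agree with the finding above.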

For ease of reference and access, column D of Table 2 designates, and provides download links for, the fourteen distinct versions of the Data Inventory: OntGDI_00_EN.csv, OntGDI_01_EN.csv, OntGDI_02_EN.csv, …, OntGDI_13_EN.csv.

Cleaning Up

The Ontario Government’s Data Inventory exhibits the kinds of data quality issues (e.g. typos, variant spellings, inconsistent punctuation, abbreviations, and acronyms) that are to be expected of Open Data initiatives, especially in their early stages.

Different versions of the Data Inventory also use slightly different terminology and spelling to designate a dataset’s Access Level. Left alone, these discrepancies could seriously distort any analysis by Access Level, but they are easy to remedy: we replace variant designations, such as “Will be made open/public” and “To be opened,” with the most common designation, “To Be Opened.”
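A minimal sketch of this normalization, assuming a hand-built mapping from variant designations to the canonical one. Only the two variants quoted above are listed; the full set of variants across all versions is not given here and would have to be collected by inspection. A helper for picking the most common spelling among equivalent designations is included as well.

```python
from collections import Counter

# HYPOTHETICAL mapping: only the variants quoted in the text are listed.
# Keys are lower-cased with runs of whitespace collapsed to one space.
ACCESS_LEVEL_VARIANTS = {
    "will be made open/public": "To Be Opened",
    "to be opened": "To Be Opened",
}


def normalize_access_level(value):
    """Return the canonical Access Level for a known variant, else the trimmed value."""
    key = " ".join(value.split()).lower()
    return ACCESS_LEVEL_VARIANTS.get(key, value.strip())


def most_common_designation(values):
    """Pick the most frequent spelling among designations judged equivalent."""
    return Counter(v.strip() for v in values).most_common(1)[0][0]


print(normalize_access_level("Will be made open/public"))
```

Keeping the mapping explicit, rather than relying on fuzzy matching, makes every replacement auditable, which matters when a designation like “Under review” must not be merged into “To Be Opened” by accident.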

A more vexing problem arises when the same Public Title is mistakenly assigned to more than one dataset (e.g. the Public Title “Tax collections: client clearances” is assigned to both dataset #350 and dataset #371 in ontariodatainventory_8.csv). Resolving this problem requires:

  • identifying every duplicate use of any Public Title across different versions of the Data Inventory;
  • generating a Unique Title for every dataset, including those with the same Public Title (e.g. generating “Tax collections: client clearances_01” for dataset #350 and “Tax collections: client clearances_02” for dataset #371 in ontariodatainventory_8.csv); and
  • assigning the same Unique Title to every dataset across different versions of the Data Inventory.
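The first two steps can be sketched for a single version of the inventory as follows, assuming each record is reduced to a (dataset number, Public Title) pair and that titles are compared exactly as written. The third step, carrying each Unique Title forward to the same dataset in other versions, depends on how datasets are matched across versions, which is not specified here, so it is not shown.

```python
from collections import Counter


def assign_unique_titles(records):
    """Generate a Unique Title for every dataset in one version.

    records: (dataset number, Public Title) pairs. Titles used more than once
    get _01, _02, ... suffixes in order of dataset number; others are kept.
    Returns {dataset number: Unique Title}.
    """
    counts = Counter(title for _, title in records)
    seen = Counter()
    unique = {}
    for number, title in sorted(records):
        if counts[title] > 1:
            seen[title] += 1
            unique[number] = f"{title}_{seen[title]:02d}"
        else:
            unique[number] = title
    return unique
```

With the example from ontariodatainventory_8.csv, datasets #350 and #371 receive “Tax collections: client clearances_01” and “Tax collections: client clearances_02” respectively.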

Both of these data quality issues in the original ontariodatainventory_nn.csv files (column A) are addressed in the OntGDI_nn_EN.csv files (column D) of Table 2.

Table 2. Publication of the Ontario Government’s Data Inventory, July 2016 to September 2017.
(A) Metadata File | (B) Release Date/Time | (C) Sole/Final Release of the Day | (D) Distinct Version
ontariodatainventory.csv | 07/29/2016 – 14:23 | 07/29/2016 – 14:23 | OntGDI_00_EN.csv
ontariodatainventory_0.csv | 08/05/2016 – 10:18 | |
ontariodatainventory_1.csv | 08/05/2016 – 10:19 | 08/05/2016 – 10:19 | OntGDI_01_EN.csv
ontariodatainventory_2.csv | 09/20/2016 – 10:29 | |
ontariodatainventory_3.csv | 09/20/2016 – 10:32 | 09/20/2016 – 10:32 | OntGDI_02_EN.csv
ontariodatainventory_4.csv | 10/06/2016 – 09:51 | |
ontariodatainventory_5.csv | 10/06/2016 – 10:02 | |
ontariodatainventory_6.csv | 10/06/2016 – 10:09 | |
ontariodatainventory_7.csv | 10/06/2016 – 10:28 | |
ontariodatainventory_8.csv | 10/06/2016 – 10:30 | 10/06/2016 – 10:30 | OntGDI_03_EN.csv
ontariodatainventory_9.csv | 10/21/2016 – 15:49 | |
ontariodatainventory_10.csv | 10/21/2016 – 15:50 | 10/21/2016 – 15:50 | OntGDI_04_EN.csv
ontariodatainventory_11.csv | 11/18/2016 – 10:21 | |
ontariodatainventory_12.csv | 11/18/2016 – 10:24 | 11/18/2016 – 10:24 | OntGDI_05_EN.csv
ontariodatainventory_13.csv | 01/09/2017 – 13:19 | |
ontariodatainventory_14.csv | 01/09/2017 – 13:22 | |
ontariodatainventory_15.csv | 01/09/2017 – 13:27 | 01/09/2017 – 13:27 | OntGDI_06_EN.csv
ontariodatainventory_16.csv | 01/25/2017 – 11:06 | |
ontariodatainventory_17.csv | 01/25/2017 – 11:08 | 01/25/2017 – 11:08 | OntGDI_07_EN.csv
ontariodatainventory_18.csv | 02/13/2017 – 13:04 | |
ontariodatainventory_19.csv | 02/13/2017 – 13:06 | |
ontariodatainventory_20.csv | 02/13/2017 – 13:56 | |
ontariodatainventory_21.csv | 02/13/2017 – 13:57 | 02/13/2017 – 13:57 | OntGDI_08_EN.csv
ontariodatainventory_22.csv | 02/27/2017 – 15:25 | |
ontariodatainventory_23.csv | 02/27/2017 – 15:28 | 02/27/2017 – 15:28 | OntGDI_09_EN.csv
ontariodatainventory_24.csv | 03/01/2017 – 15:22 | |
ontariodatainventory_25.csv | 03/01/2017 – 15:24 | 03/01/2017 – 15:24 | OntGDI_10_EN.csv
ontariodatainventory_26.csv | 04/04/2017 – 13:25 | |
ontariodatainventory_27.csv | 04/04/2017 – 13:26 | 04/04/2017 – 13:26 | OntGDI_11_EN.csv
ontariodatainventory_28.csv | 05/19/2017 – 11:38 | |
ontariodatainventory_29.csv | 05/19/2017 – 11:39 | 05/19/2017 – 11:39 | OntGDI_12_EN.csv
ontariodatainventory_30.csv | 08/03/2017 – 12:47 | |
ontariodatainventory_31.csv | 08/03/2017 – 12:48 | 08/03/2017 – 12:48 | OntGDI_13_EN.csv

  1. Personal correspondence, Dawn Edmonds, Team Lead, Policy and Partnerships, Open Government Office, Treasury Board Secretariat, October 11, 2017.