Visualizing the Ontario Government’s Open Data

Visualizing the Data Inventory

In previous posts, we compiled and shared a wealth of data and metadata connected with Ontario’s government-wide Data Inventory and Data Catalogue. These materials lay the groundwork for – and hopefully inspire – the study of the early development of an open data initiative undertaken by a sizable, democratic sub-national government.

Our final contribution (for now) to kick-starting this study is to share a few ways of visualizing the Ontario Government’s Data Inventory.

First, we use D3 to convert any version of the Data Inventory to a force-layout graph, where

  • source nodes represent Ministries;
  • target nodes represent Datasets;
  • edges connecting source nodes and target nodes represent the relationship hasPublished; and
  • a target node’s colour (lime, yellowgreen, red, or white) represents the Access Level of the corresponding Dataset (Open, To Be Opened, Restricted, or Under Review, respectively).
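The conversion from a Data Inventory CSV file to a force-layout graph can be sketched in Python as follows. This is not the D3 code behind the figures. It is a minimal sketch that assumes the CSV columns are named "Publisher", "Public Title" and "Access Level" (as in Table 1), that Access Level values have already been normalised, and that the file is UTF-8 encoded. It produces the nodes/links JSON shape that a D3 force layout can consume, with each link's source and target given as node indices. The grey fallback colour is our own hypothetical choice.

```python
import csv
import json

# Colour scheme from the text: Access Level -> dataset node colour.
ACCESS_COLOURS = {
    "Open": "lime",
    "To Be Opened": "yellowgreen",
    "Restricted": "red",
    "Under Review": "white",
}


def inventory_to_graph(rows):
    """Build a force-layout graph: Ministry (source) nodes -> Dataset (target) nodes.

    Assumes each row has "Publisher", "Public Title" and "Access Level" keys.
    """
    nodes, links = [], []
    ministry_index = {}  # Ministry name -> position in `nodes`
    for row in rows:
        ministry = row["Publisher"].strip()
        if ministry not in ministry_index:
            ministry_index[ministry] = len(nodes)
            nodes.append({"name": ministry, "kind": "ministry"})
        # Every row becomes its own dataset node, even if its title is duplicated.
        nodes.append({
            "name": row["Public Title"].strip(),
            "kind": "dataset",
            # Hypothetical fallback colour for unrecognised Access Levels.
            "colour": ACCESS_COLOURS.get(row["Access Level"].strip(), "grey"),
        })
        # The edge source -> target stands for the relationship hasPublished.
        links.append({
            "source": ministry_index[ministry],
            "target": len(nodes) - 1,
            "relation": "hasPublished",
        })
    return {"nodes": nodes, "links": links}


def csv_to_graph_json(path):
    """Read one version of the Data Inventory and return the graph as JSON text."""
    # The encoding is an assumption; adjust it if the file uses another one.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return json.dumps(inventory_to_graph(csv.DictReader(f)), indent=2)
```

Ministries are created on first sight, so each one appears exactly once, while every dataset row gets its own node; this keeps datasets that share a Public Title visible as separate nodes.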

Figures 1 and 2 represent Ontario’s Government-wide Data Inventory on July 29, 2016 (its inception) and August 3, 2017 (the most recent release available on September 1, 2017), respectively.

Figure 1. Ontario’s Government-wide Data Inventory, July 29, 2016.

Figure 2. Ontario’s Government-wide Data Inventory, August 3, 2017.

We may also animate the Data Inventory’s evolution over time:

  • animation frames ontdi_00 through ontdi_13, one for each of the fourteen distinct versions of the Data Inventory.
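Each frame of such an animation differs from the previous one by the datasets that were added, removed, or re-classified. A minimal sketch of computing those transitions between two consecutive versions follows. It assumes that each cleaned version carries a stable identifier in a column we call "Unique Title" (a hypothetical column name) along with an "Access Level" value.

```python
def diff_versions(old_rows, new_rows, key="Unique Title"):
    """Datasets added, removed, or re-classified between two versions.

    Assumes every row has a stable identifier under `key` (hypothetical
    column name) and an "Access Level" value.
    """
    old = {row[key]: row["Access Level"] for row in old_rows}
    new = {row[key]: row["Access Level"] for row in new_rows}
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # Datasets present in both versions whose Access Level changed.
    changed = sorted(
        (title, old[title], new[title])
        for title in old.keys() & new.keys()
        if old[title] != new[title]
    )
    return {"added": added, "removed": removed, "changed": changed}
```

Running this over each consecutive pair of versions yields one list of transitions per frame, which could drive the enter/exit and recolouring steps of an animated graph.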


Finally, let’s add some functionality (tooltips and hyperlinks) to our visualization of the Data Inventory, so that users may explore and click on nodes to access the rich content about datasets on the web pages served by the Data Catalogue’s API.
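The data behind such tooltips and hyperlinks can be prepared before it reaches D3. A minimal sketch follows, assuming the dataset rows carry "Public Title", "Short Description" and "Access Level" columns. Because the URL scheme of the Data Catalogue's pages is not described here, the link is produced by a caller-supplied function (`catalogue_url`, hypothetical) rather than by a hard-coded pattern.

```python
import html
from typing import Callable, Dict


def dataset_node(row: Dict[str, str], catalogue_url: Callable[[str], str]) -> Dict[str, str]:
    """Build a dataset node that carries tooltip HTML and a hyperlink.

    `catalogue_url` is a hypothetical function mapping a title to the URL of
    the corresponding Data Catalogue page.
    """
    title = row["Public Title"].strip()
    # Escape each field because the tooltip will be inserted into an HTML page.
    tooltip = "<b>{}</b><br>{}<br><i>{}</i>".format(
        html.escape(title),
        html.escape((row.get("Short Description") or "").strip()),
        html.escape(row["Access Level"].strip()),
    )
    return {"name": title, "tooltip": tooltip, "href": catalogue_url(title)}
```

On the D3 side, the `tooltip` field would be shown on mouseover and `href` followed on click.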


Ontario’s Government-wide Data Catalogue

Background

As part of its Open Government Initiative, the Ontario Government has issued an Open Data Directive that requires provincial Ministries and the Treasury Board Secretariat to establish and publish an Inventory and Catalogue of government-wide data.

Application Programming Interface (API)

The Ontario Government describes the Data Catalogue as a “one-window central platform” that provides faster and easier access to Government Data, thereby facilitating evidence-based policy, informing service delivery, and promoting greater transparency and accountability.

The Data Catalogue’s Application Programming Interface (API) helps users identify and access potentially relevant datasets with features that support:

  • searching (free-text);
  • filtering (Status, Publisher, File Type, Topics);
  • sorting (A-Z, Z-A, Date Added, Date Modified, Update Frequency);
  • browsing; and
  • downloading data files.

Metadata Elements

A sampling of web pages served by the API shows that the Data Catalogue and Data Inventory use many of the same – but also some different – metadata elements to describe the Ontario Government’s datasets. Table 3 presents

  • the twenty metadata elements used to describe datasets in the Data Inventory;
  • the labeling and/or display of metadata elements on web pages served for these datasets by the Data Catalogue’s API; and
  • the data elements of the dynamic HTML serving these web pages to users of the Data Catalogue.
Table 3. Metadata associated with the Ontario Government’s datasets in the Data Inventory and Data Catalogue. * denotes metadata elements associated with Open datasets only; ** denotes metadata elements associated with Restricted datasets only.
| Data Inventory: Metadata Property | Data Catalogue: Label/Display | Data Catalogue: Dynamic HTML Data Element |
| --- | --- | --- |
| Public Title | Display only | content.title |
| Short Description | Display only | content.intro |
| Long Description | Display only | content.intro <div class="main-content row"> |
| Other Title | | |
| Data Custodian Branch | | |
| Tags | | |
| | Topics | content.tags.tagList |
| Date Range – Start | Time captured* | content.dataset.dateStart |
| Date Range – End | Time captured* | content.dataset.dateEnd |
| Date Created | | |
| Date Published | Date added | content.datePublished |
| Contains Geographic Markers | | |
| | Geographical coverage | content.dataset.geographicalCoverage |
| Publisher | Publisher | content.dataset.publisherList |
| Update Frequency | Update frequency | content.dataset.updateFrequency |
| | Data last modified* | content.dataset.dateDataUpdated |
| Access Level | Data status | content.dataset.status |
| Exemption | Exemption** | content.dataset.exemptionList |
| Rationale | Rationale not to release** | content.dataset.exemptionRationale |
| Dataset URL | | |
| | Download data* | content.dataset.filelist, or content.dataset.externalDataLink |
| License Type | Terms of Use* | content.dataset.license |
| | hidden link to Technical documentation | content.dataset.technicalDocument.url |
| File Types (extensions) | (link text)* | fileGroup.list |
| Additional Comments | | |

Many metadata elements associated with datasets in Ontario’s Data Inventory (e.g. Public Title, Short Description, Long Description, etc.) have clear counterparts in Ontario’s Data Catalogue. Other metadata elements associated with these datasets are provided only in the Data Inventory (e.g. Data Custodian Branch, Date Created) or only in the Data Catalogue (e.g. Data last modified). A few pairs of related metadata elements are worth brief comments.

DI: Contains Geographic Markers versus DC: Geographic Coverage

The Data Inventory assigns Contains Geographic Markers a value of TRUE or FALSE; the Data Catalogue assigns Geographic Coverage a value of Ontario or Canada.

DI: Tags versus DC: Topics

The Data Inventory associates the Ontario Government’s datasets with many dozens of different values of Tag – in what appears to be an uncontrolled process; the Data Catalogue, on the other hand, uses controlled input to associate these datasets with values of Topic selected from a list of official Ontario.ca topics that resides on the Government’s intranet.
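One way to gauge how uncontrolled the Tag values are is to count the distinct tags before and after light normalisation. A minimal sketch follows, assuming that the "Tags" column holds a comma-separated list (an assumption about the file's layout, not something the Data Inventory documents).

```python
from collections import Counter


def tag_counts(rows, column="Tags", sep=","):
    """Count tag values across inventory rows after light normalisation.

    Assumes tags are stored as a `sep`-separated list in `column`.
    """
    counts = Counter()
    for row in rows:
        for tag in (row.get(column) or "").split(sep):
            # Collapse internal whitespace and ignore case.
            tag = " ".join(tag.split()).lower()
            if tag:
                counts[tag] += 1
    return counts
```

Comparing `len(tag_counts(rows))` with the number of raw distinct strings shows how much of the variety comes from casing and spacing alone; the remainder reflects genuinely different vocabulary.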

DI: Dataset URL and DC: Download data

The assignment of values to the Dataset URL metadata element in Ontario’s Data Inventory is incomplete and inconsistent; by contrast, the values of Download data associated with datasets in the Data Catalogue reflect the three options offered to the Ministries:

  1. You can have your data downloadable from the catalogue itself (we will upload the file and it will appear under “Download data”);
  2. You can link to the data for download (and it will appear under “Download data”); or
  3. If you can’t link to a direct file, you can put the link under the “Access data” section.

Harvesting Ontario’s Data Catalogue

Accessing the metadata associated with the Ontario Government’s datasets as they are served to users by the Data Catalogue’s Application Programming Interface (API) involves scraping and parsing 2,352+ dynamic HTML web pages, a complex and time-consuming process that is unlikely to scale and is prone to error. Nonetheless, we have harvested the Data Catalogue (circa September 1, 2017) and are sharing four ZIP files that contain the dynamic HTML of the web pages associated with the Ontario Government’s datasets that were Open, To Be Opened, Restricted, and Under Review.
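Once a page's data has been parsed, the dotted data elements in Table 3 (content.title, content.dataset.status, and so on) can be read off with a small path resolver. A minimal sketch follows. It assumes the page's data has already been extracted into a nested dictionary with a top-level "content" key; how to pull that out of the HTML depends on page details not covered here, and only a subset of Table 3's mapping is shown.

```python
from typing import Any, Dict, Optional

# Subset of Table 3: Data Inventory element -> dynamic HTML data element.
CATALOGUE_PATHS = {
    "Public Title": "content.title",
    "Short Description": "content.intro",
    "Date Published": "content.datePublished",
    "Publisher": "content.dataset.publisherList",
    "Update Frequency": "content.dataset.updateFrequency",
    "Access Level": "content.dataset.status",
    "License Type": "content.dataset.license",
}


def get_path(obj: Any, path: str) -> Optional[Any]:
    """Follow a dotted path such as 'content.dataset.status'; None if absent."""
    for part in path.split("."):
        if not isinstance(obj, dict) or part not in obj:
            return None
        obj = obj[part]
    return obj


def catalogue_record(page_data: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one page's (assumed) nested data into Data Inventory terms."""
    return {element: get_path(page_data, path) for element, path in CATALOGUE_PATHS.items()}
```

Returning None for a missing path, rather than raising, suits the Data Catalogue because several elements (marked * and ** in Table 3) appear only for Open or Restricted datasets.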

Merging the Data Inventory and Data Catalogue

Finally, we are sharing (in both .xlsx and .ods format files) our preliminary alignment and merge of the two sets of metadata elements used to describe the Ontario Government’s datasets in the Data Inventory and Data Catalogue.
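A merge of this kind pairs each Data Inventory record with its Data Catalogue counterpart. A minimal sketch of an outer join on a normalised title follows. The choice of join key is our assumption (the shared files may align records differently), and titles must be unique on each side; on the inventory side that means using Unique Titles rather than Public Titles.

```python
def normalise_title(title):
    """Case-insensitive title with runs of whitespace collapsed."""
    return " ".join(title.split()).casefold()


def merge_on_title(inventory, catalogue, key="Public Title"):
    """Outer join of Data Inventory and Data Catalogue records on title.

    Returns (normalised title, inventory row or None, catalogue row or None)
    tuples, sorted by title. A duplicate title on either side raises
    ValueError instead of silently overwriting a record.
    """
    pairs = {}
    for side, rows in ((0, inventory), (1, catalogue)):
        for row in rows:
            slot = pairs.setdefault(normalise_title(row[key]), [None, None])
            if slot[side] is not None:
                raise ValueError(f"Duplicate title: {row[key]!r}")
            slot[side] = row
    return [(title, di, dc) for title, (di, dc) in sorted(pairs.items())]
```

Rows with None on one side identify datasets found only in the Data Inventory or only in the Data Catalogue, which are the cases that need manual alignment.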


Ontario’s Government-wide Data Inventory

Background

As part of its Open Government Initiative, the Ontario Government has issued an Open Data Directive that requires provincial Ministries and the Treasury Board Secretariat to establish and publish an Inventory and Catalogue of government-wide data.

Metadata Elements of the Data Inventory

On September 1, 2017, the Ontario Government’s Data Inventory was available for download as a Comma-Separated Values (CSV) file at https://files.ontario.ca/opendata/ontariodatainventory_31.csv.

Table 1 presents the twenty metadata elements used to describe datasets in the Data Inventory:

Table 1. Metadata elements used to describe datasets in the Ontario Government’s Data Inventory. Definitions/examples are sourced from the Ontario Government’s Open Data Guidebook, 2015.
| Metadata Element | Definition |
| --- | --- |
| Public Title | The name given to a dataset. This must be unique and restricted to 200 characters or less. E.g. Consular offices in Ontario. |
| Short Description | Limited description to 200 characters or less. E.g. List of consular offices operating in Ontario. |
| Long Description | Limited description to 600 characters or less (approximately 100 words). E.g. The data contained includes the name of the Consul General or Honourary Consul General, office address, telephone and fax numbers, email and web addresses. |
| Other Title | Internal name of the dataset. Must be unique. Limited to 200 characters. |
| Data Custodian Branch | E.g. Office of International Relations and Protocol. |
| Tags | |
| Date Range – Start | Date format yyyy-mm-dd. |
| Date Range – End | Date format yyyy-mm-dd. |
| Date Created | Date format yyyy-mm-dd. |
| Date Published | Date format yyyy-mm-dd. E.g. 2016-04-20. |
| Contains Geographic Markers | True if this dataset contains geographic markers, false otherwise. E.g. TRUE. |
| Publisher | Ministry or agency. Selected from a drop-down menu. E.g. Intergovernmental Affairs. |
| Update Frequency | The frequency of update of the dataset. Selected from a drop-down menu. E.g. On demand. |
| Access Level | Whether this data can be targeted for public release, is restricted from public release, or needs more assessment. Selected from a drop-down menu. E.g. Under review. |
| Exemption | Choose the specific exemption that applies to the restricted dataset. Selected from a drop-down menu. |
| Rationale | |
| Dataset URL | (Optional) A web page that can be navigated to to gain access to the dataset. E.g. https://www.ontario.ca/page/consular-offices. |
| License Type | Choose the license under which the distribution is made available. Selected from a drop-down menu. E.g. Ontario.ca Terms of Use. |
| File Types (extensions) | Comma separated list of file types. Selected from a drop-down menu. E.g. DOCX. |
| Additional Comments | Please provide any other information about your dataset that would help us to assess its suitability for open data. |
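Before analysing a downloaded version, it helps to check that its header really contains the twenty elements of Table 1. A minimal sketch follows. It assumes the CSV column names match the element names in Table 1 up to case, spacing, and the use of "-" versus "–", and that the file is UTF-8 encoded; both are assumptions about the files rather than documented facts.

```python
import csv
import io
import urllib.request

# The twenty metadata elements listed in Table 1.
EXPECTED_ELEMENTS = [
    "Public Title", "Short Description", "Long Description", "Other Title",
    "Data Custodian Branch", "Tags", "Date Range – Start", "Date Range – End",
    "Date Created", "Date Published", "Contains Geographic Markers", "Publisher",
    "Update Frequency", "Access Level", "Exemption", "Rationale", "Dataset URL",
    "License Type", "File Types (extensions)", "Additional Comments",
]


def normalise_header(name):
    """Lower-case, collapse whitespace, and treat an en dash as a hyphen."""
    return " ".join(name.replace("\u2013", "-").split()).casefold()


def check_columns(header):
    """Return (missing, unexpected) column names relative to Table 1."""
    expected = {normalise_header(e) for e in EXPECTED_ELEMENTS}
    found = {normalise_header(h) for h in header}
    return sorted(expected - found), sorted(found - expected)


def read_inventory(url):
    """Download one version of the Data Inventory; return (header, rows)."""
    with urllib.request.urlopen(url) as response:
        # The encoding is an assumption; adjust it if decoding fails.
        text = response.read().decode("utf-8-sig")
    reader = csv.DictReader(io.StringIO(text))
    return reader.fieldnames, list(reader)
```

For example, `header, rows = read_inventory("https://files.ontario.ca/opendata/ontariodatainventory_31.csv")` followed by `check_columns(header)` would flag any renamed or missing columns, provided that URL is still live.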

The Evolution of the Data Inventory

Table 2 presents the dates of publication and download links for every release of the Ontario Government’s Data Inventory between July 2016 and August 2017.[1] Notice that, on some days (column B), the Ontario Government published two (08/05/2016, 09/20/2016, 10/21/2016, 11/18/2016, 01/25/2017, 02/27/2017, 03/01/2017, 04/04/2017, 05/19/2017, and 08/03/2017), three (01/09/2017), four (02/13/2017), or even five (10/06/2016) releases of its Data Inventory.

We’re going to ignore any discrepancies among same-day releases of the Data Inventory, and focus instead on the sole or final release of the day (column C). Upon inspection, we determine that these fourteen releases all differ from one another, and so may be regarded as distinct versions of the Data Inventory.

For ease of reference and access, Table 2 (column D) designates the fourteen distinct versions of the Data Inventory as OntGDI_00_EN.csv, OntGDI_01_EN.csv, OntGDI_02_EN.csv, …, OntGDI_13_EN.csv and provides a download link for each.
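Selecting the sole or final release of each day from column B is a small group-by-day operation. A minimal sketch follows, taking (file name, timestamp) pairs written as they appear in Table 2; the function name is our own.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def final_releases(releases: List[Tuple[str, str]]) -> List[str]:
    """Keep the last release of each day, in chronological order.

    `releases` holds (file name, "MM/DD/YYYY – HH:MM") pairs, as in Table 2.
    """
    latest: Dict = {}  # date -> (datetime, file name) of the latest release seen
    for name, stamp in releases:
        when = datetime.strptime(stamp.replace("\u2013", "-"), "%m/%d/%Y - %H:%M")
        day = when.date()
        if day not in latest or when > latest[day][0]:
            latest[day] = (when, name)
    return [latest[day][1] for day in sorted(latest)]
```

Enumerating the result, e.g. `for i, name in enumerate(final_releases(rows))`, gives the position i used in the OntGDI_nn numbering.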

Cleaning Up

The Ontario Government’s Data Inventory exhibits the sort of data quality issues (e.g. typos, variant spellings, inconsistent use of punctuation, abbreviations, acronyms, etc.) that are expected with Open Data initiatives – especially in their beginnings.

A few discrepancies in the terminology/orthography used to designate a dataset’s Access Level also appear in different versions of the Data Inventory. These discrepancies are potentially quite troublesome for data analysis, though they are easily remedied (e.g. we replace designations such as “Will be made open/public” and “To be opened” with the most common designation, “To Be Opened”).
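The replacement can be sketched as a lookup that first matches the four canonical Access Levels case-insensitively and then falls back to a table of known variants. In this minimal sketch, the variant table holds only the example named in the text; other variants would need to be added as they are found.

```python
CANONICAL_LEVELS = ["Open", "To Be Opened", "Restricted", "Under Review"]

# Known variant designations; case-only differences such as "To be opened"
# are already handled by the case-insensitive match below.
VARIANTS = {
    "will be made open/public": "To Be Opened",
}


def normalise_access_level(value: str) -> str:
    """Map a raw Access Level designation to its canonical form."""
    key = " ".join(value.split()).casefold()
    for level in CANONICAL_LEVELS:
        if key == level.casefold():
            return level
    if key in VARIANTS:
        return VARIANTS[key]
    # Fail loudly so that new variants are noticed rather than silently kept.
    raise ValueError(f"Unrecognised Access Level: {value!r}")
```

Raising on unknown values, instead of passing them through, means that any new spelling in a later version of the Data Inventory stops the cleaning run and gets added to the table deliberately.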

A more vexing problem arises when the same Public Title is assigned mistakenly to more than one dataset (e.g. the Public Title “Tax collections: client clearances” is assigned to datasets #350 and #371 in ontariodatainventory_8.csv). Resolving this problem requires:

  • identifying every duplicate use of any Public Title across different versions of the Data Inventory;
  • generating a Unique Title for every dataset, including those with the same Public Title (e.g. generating “Tax collections: client clearances_01” for dataset #350 and “Tax collections: client clearances_02” for dataset #371 in ontariodatainventory_8.csv); and
  • assigning the same Unique Title to every dataset across different versions of the Data Inventory.
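The first two steps can be sketched as follows: collect every title that is duplicated within any version, then suffix _01, _02, … to those titles in order of appearance. This is a minimal sketch under a strong assumption: that duplicated datasets appear in the same relative order in every version. If one of the pair is missing from a version, the remaining one always becomes _01, so the third step may need matching on other fields (e.g. Short Description) that this sketch does not attempt.

```python
from collections import Counter
from typing import List, Set


def duplicated_titles(versions: List[List[str]]) -> Set[str]:
    """Public Titles used more than once within at least one version."""
    dupes = set()
    for titles in versions:
        dupes.update(t for t, n in Counter(titles).items() if n > 1)
    return dupes


def unique_titles(titles: List[str], dupes: Set[str]) -> List[str]:
    """Suffix _01, _02, ... to each duplicated title, in order of appearance."""
    seen: Counter = Counter()
    result = []
    for title in titles:
        if title in dupes:
            seen[title] += 1
            result.append(f"{title}_{seen[title]:02d}")
        else:
            result.append(title)
    return result
```

Titles that are never duplicated keep their Public Title unchanged, so most Unique Titles remain identical to the originals.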

Both of these data quality issues with the original ontariodatainventory_nn.csv files (column A) are addressed in the OntGDI_nn_EN.csv files (column D) provided in Table 2.

Table 2. Publication of the Ontario Government’s Data Inventory, July 2016 to September 2017.
| A: Metadata Files | B: Release Date/Time | C: Sole/Final Release of the Day | D: Distinct Versions |
| --- | --- | --- | --- |
| ontariodatainventory.csv | 07/29/2016 – 14:23 | 07/29/2016 – 14:23 | OntGDI_00_EN.csv |
| ontariodatainventory_0.csv | 08/05/2016 – 10:18 | | |
| ontariodatainventory_1.csv | 08/05/2016 – 10:19 | 08/05/2016 – 10:19 | OntGDI_01_EN.csv |
| ontariodatainventory_2.csv | 09/20/2016 – 10:29 | | |
| ontariodatainventory_3.csv | 09/20/2016 – 10:32 | 09/20/2016 – 10:32 | OntGDI_02_EN.csv |
| ontariodatainventory_4.csv | 10/06/2016 – 09:51 | | |
| ontariodatainventory_5.csv | 10/06/2016 – 10:02 | | |
| ontariodatainventory_6.csv | 10/06/2016 – 10:09 | | |
| ontariodatainventory_7.csv | 10/06/2016 – 10:28 | | |
| ontariodatainventory_8.csv | 10/06/2016 – 10:30 | 10/06/2016 – 10:30 | OntGDI_03_EN.csv |
| ontariodatainventory_9.csv | 10/21/2016 – 15:49 | | |
| ontariodatainventory_10.csv | 10/21/2016 – 15:50 | 10/21/2016 – 15:50 | OntGDI_04_EN.csv |
| ontariodatainventory_11.csv | 11/18/2016 – 10:21 | | |
| ontariodatainventory_12.csv | 11/18/2016 – 10:24 | 11/18/2016 – 10:24 | OntGDI_05_EN.csv |
| ontariodatainventory_13.csv | 01/09/2017 – 13:19 | | |
| ontariodatainventory_14.csv | 01/09/2017 – 13:22 | | |
| ontariodatainventory_15.csv | 01/09/2017 – 13:27 | 01/09/2017 – 13:27 | OntGDI_06_EN.csv |
| ontariodatainventory_16.csv | 01/25/2017 – 11:06 | | |
| ontariodatainventory_17.csv | 01/25/2017 – 11:08 | 01/25/2017 – 11:08 | OntGDI_07_EN.csv |
| ontariodatainventory_18.csv | 02/13/2017 – 13:04 | | |
| ontariodatainventory_19.csv | 02/13/2017 – 13:06 | | |
| ontariodatainventory_20.csv | 02/13/2017 – 13:56 | | |
| ontariodatainventory_21.csv | 02/13/2017 – 13:57 | 02/13/2017 – 13:57 | OntGDI_08_EN.csv |
| ontariodatainventory_22.csv | 02/27/2017 – 15:25 | | |
| ontariodatainventory_23.csv | 02/27/2017 – 15:28 | 02/27/2017 – 15:28 | OntGDI_09_EN.csv |
| ontariodatainventory_24.csv | 03/01/2017 – 15:22 | | |
| ontariodatainventory_25.csv | 03/01/2017 – 15:24 | 03/01/2017 – 15:24 | OntGDI_10_EN.csv |
| ontariodatainventory_26.csv | 04/04/2017 – 13:25 | | |
| ontariodatainventory_27.csv | 04/04/2017 – 13:26 | 04/04/2017 – 13:26 | OntGDI_11_EN.csv |
| ontariodatainventory_28.csv | 05/19/2017 – 11:38 | | |
| ontariodatainventory_29.csv | 05/19/2017 – 11:39 | 05/19/2017 – 11:39 | OntGDI_12_EN.csv |
| ontariodatainventory_30.csv | 08/03/2017 – 12:47 | | |
| ontariodatainventory_31.csv | 08/03/2017 – 12:48 | 08/03/2017 – 12:48 | OntGDI_13_EN.csv |


  1. Personal correspondence, Dawn Edmonds, Team Lead, Policy and Partnerships, Open Government Office, Treasury Board Secretariat, October 11, 2017.

Milestones in Ontario’s Open Government Initiative

| Date | Milestone |
| --- | --- |
| October 2013 | Open Government initiative launched. |
| April 2014 | Open data voting tool launched. 175+ open data sets published. |
| May 2015 | Public consulted on the Open Data Directive. 400+ open data sets published. |
| November 2015 | Final Open Data Directive released. |
| April 2016 | Ontario is selected to be part of the Open Government Partnership’s new pilot program. Ontario’s Open Data Directive came into force. 500+ open data sets published. |
| August 2016 | Public invited to submit open government ideas as part of the Open Government Partnership pilot program. Ministries start to add their data inventories to the Data Catalogue. |
| May 2017 | We adopted the International Open Data Charter. We published more directives as part of our commitment to Open Government. |
| June 2017 | Our mid-term self-assessment report and progress update is posted. |

Available at https://www.ontario.ca/page/open-government [Accessed 2017-09-01]

Ontario’s Open Government Initiative

Ontario’s Open Government Initiative (OntOGI) is about creating a more open and transparent government for the people of Ontario. The OntOGI aims to make the province the most open and transparent jurisdiction in Canada by:

  1. Giving people more opportunities to weigh in on government decision-making;
  2. Sharing government data online so people can help solve problems that affect Ontarians every day; and
  3. Providing people with the information they want and need to better understand how their government works.

The Open Government Project Tracker invites people to view how Ontario is doing government differently.

Arguably, the most important step taken by the Ontario government toward openness and transparency has been its Open Data Directive to all Ministries and Provincial Agencies (November 2015).

On the international stage, the Ontario government has declared its commitment to openness and transparency by:

  1. Enrolling in the Open Government Partnership’s innovative Subnational Government Partnership Pilot Program (April 2016); and
  2. Adopting the International Open Data Charter (May 2017).

The most recent milestone in the OntOGI was the government’s publication of a mid-term self-assessment report and progress update (June 2017).
