Public Health Agency of Canada

Data Strategy

Building a foundation for federal public health data

Version as of September 13, 2019

Introduction


Our vision is for the Public Health Agency of Canada to be the most trusted source of public health information in the country. Through leveraging data innovation, modern technical capacity, and timely and quality public health data, we aim to enable our people and partners to accomplish our mission: protect and promote the health of Canadians.
An image presenting vision, leadership, partnership, action and innovation.

As the Public Health Agency of Canada (PHAC), we promote and protect the health of Canadians through leadership, partnership, innovation and action in public health.

We strive to prevent and control chronic and infectious diseases as well as injuries, and we prepare for and respond to public health emergencies.

Our actions have been, and always will be, guided by sound data.

Our ability to collect, generate, secure, manage, access, and analyze data is essential to our mandate, and we recognize that the value of our data will be optimized when they are easily found, readily available, serve multiple purposes, and are unbiased and accurate.

From public health notices regarding food-related outbreaks to travel health notices, Canadians rely on PHAC to provide them with accurate and timely information. Our ability to create, collect, and share data effectively is paramount to our ability to provide Canadians with critical public health information. All the while, we are acutely aware of the importance of safeguarding privacy and guarding against bias as we collect, analyze and use our data holdings.

A graphic illustrating the components of quality data.

Our need for, and the availability of, timely data will continue to grow exponentially in the years ahead. Technology is advancing rapidly, creating unprecedented opportunities to integrate machine learning and artificial intelligence into science-based analytics that will improve our ability to generate predictive models for important public health action. There is also a growing expectation for real-time data that inform analysis for timely decisions. PHAC is increasingly combining innovative data analytics and visualization tools with traditional epidemiologic research and surveillance methods, transforming our data-rich environment into meaningful public health information. This is enabling us to learn more about our public health ecosystem to support all Canadians in achieving optimal health and reducing health inequities across the country.

This Data Strategy is designed to be both aspirational and practical. Delivering on the strategy will enable PHAC to build a foundation for public health data. When fully realized, the implementation plan will establish clear governance, develop horizontal data infrastructure, adopt innovative approaches, encourage iterative experimentation, prioritize users, and forge partnerships with jurisdictions and external organizations, all while remaining open and transparent. Foundational to all of this is the recognition that our culture must support our staff's development and their need for tools and resources so that they can continue being the fabric of our federal public health organization.

Through the actions outlined in this Data Strategy we aspire to be an Agency that leverages data innovation and modern technical capacity in pursuit of our mandate to promote and protect the health of Canadians.


At the Public Health Agency of Canada, we can all take great pride in the leadership we have shown in innovating and strengthening Canada’s public health capacity.

Public health issues like antimicrobial resistance and the impacts of climate change are wide-ranging, constantly evolving, and increasingly complex.

Addressing these issues demands real-time data and predictive modelling to inform timely decision-making.

New technologies like genomics and artificial intelligence are making it possible to integrate data from across sectors and enhance our data analysis.

These innovations are helping us to achieve more for Canadians than ever before by significantly augmenting and enhancing our analytical capacities.

The Public Health Agency of Canada’s first Data Strategy is the result of some terrific collaboration and more than six months of staff consultations.

We want to continue that conversation. You are all key players in making the Agency a world-class public health organization - one that is recognized for its talented staff, data innovations and modern technological capacity.

Our Data Strategy is opening up unique opportunities to transform the work of the entire Organization.

We’re super excited to see what the future will hold!

PHAC's Data Context


A design of people in a city with the word data floating in the sky. The 'A' is being lifted by a helicopter.

Data are the raw ingredient for surveillance, evidence, research and science. Data play a key role in informing interventions and policy decisions, and supporting core public health functions. Data inform evaluation, performance information, and results-based management of PHAC’s programs and initiatives. Data contribute to knowledge translation, education, and public health information that is tailored for, and disseminated to, Canadians.

At their most basic, data can be numbers, readings, figures, text, dialogue, and images, collected from various sources. Data play a significant structural role in scientific organizations, providing the essential measurements and observations that guide our research, propel our analyses, and inform our decisions. What begins as data is transformed by PHAC employees into public health evidence and information, which then leads to informed decision-making.

As an Agency, we use data to produce a view of population health across Canada. Data linkages are necessary to maximize the value of information as well as extend its reach to social determinants of health. When used well, data provide crucial national perspectives on everything from foodborne illnesses, to chronic diseases, to health surveillance programs. Data also give us feedback on the impact of our leadership, services, and health promotion initiatives. Sound data practices are critical to ensuring that data are used transparently and for the public good.

At PHAC, data can be qualitative and quantitative and are collected through surveys, vital statistics, the Census, laboratories, hospitals, interviews, focus groups, and observations. They also include internal data such as corporate reports, program evaluations, financials, human resources, methodology development, etc. Our public health governance and systems comprise multiple stakeholders, some with shared legislative responsibilities (between federal and provincial jurisdictions), and some with different priorities. Furthermore, public health outcomes have multifactorial causes that can be difficult to assess and address comprehensively without a unified interdisciplinary approach.

Many organizations hold important data sources that can inform decision-making on a number of social determinants of health. If we are able to more easily identify, integrate and interpret all relevant data sources, we can influence changes to reduce health inequities.

At an enterprise level, there are significant efforts in place to support a more strategic use of data. This includes “A Data Strategy Roadmap for the Federal Public Service.” The Roadmap acknowledges that the government is not set up to treat data as a strategic asset for policy-making, program design or service delivery. Canada’s complex legislative and regulatory landscape and administratively burdensome data sharing processes worsen many of the issues PHAC faces as an organization. The Roadmap and PHAC’s Data Strategy present an opportunity to improve how data are created, protected, used, managed, and shared to improve the lives of Canadians.

In addition, the landscape in which we navigate is further characterized by uncertainty stemming from risks of emerging and re-emerging infectious diseases, including those caused by antimicrobial resistance and climate change. In order to be prepared for these and other future unknowns, a structure that unleashes the power of data coming from a multitude of sources is necessary. This will require strengthening our commitment to innovation in data analytics, technology, intervention research, and policy research, as well as fostering strategic partnerships to solve old, new and future public health problems. In doing so, we can enhance PHAC's work in disease detection, cessation, prevention, and control to ultimately keep Canadians healthy.

PHAC produces, collects and disseminates a wide range of public health data – these include data for our many surveillance programs that monitor infectious and chronic diseases, and other emerging public health events.

Despite the high-quality data and analyses generated by PHAC programs, our overarching approach to data management remains fragmented and inconsistent. While past attempts have been made to address the fragmented and inconsistent nature of data within the organization, there was a lack of change management, buy-in, and culture change to support an enterprise-wide approach to data management. Looking at data from an Agency-wide perspective, it is not clear what data are held collectively and what can be accessed by employees. Similarly, it’s hard to identify PHAC's data priorities to inform decision-making and action. When we approach our partners (including provinces and territories, Statistics Canada or the Canadian Institute for Health Information (CIHI)) about our data or surveillance priorities, we do so as individual programs and not as a unified federal entity.

Branch by branch, public health data are collected through different channels, stored on different mediums, analysed using different methods, and disseminated in different ways. These challenges reduce PHAC’s capacity to fulfill our mandate, especially given the heightened importance of interdisciplinary work in public health.

When our staff are asked about the data innovations they’d like to see, they often cannot get past the daily obstacles that ongoing IT-related challenges create for their regular data analysis, much less identify opportunities for innovation. For many parts of the organization, we have accepted work-arounds and band-aid solutions as the norm to overcome our data infrastructure and analytics challenges. Where innovative best practices do exist within PHAC, or with other groups and jurisdictions, they are often not shared or scaled up Agency-wide.

Hexagonal image depicting the 6 challenges across the data management lifecycle.

Collectively, we recognize our challenges as evidenced in the Corporate Risk Profile. As the volume of and need for public health data increase both domestically and internationally, PHAC needs to have access to timely, reliable and accurate information and data. Equally important is having the ability to undertake the data analysis needed for effective, evidence-based decision making pertaining to public health matters.

These challenges reduce PHAC’s capacity to fulfill our mandate. For example, while we may collect an impressive amount of data, little is done in the way of data integration between disciplines such as chronic and infectious diseases, as well as between social and economic determinants of health.

At the heart of “Learning from SARS,” otherwise known as the Naylor Report, were recommendations pertaining to the need for information architecture, models and standards, technology transfer, privacy and information management, development of data sources and system development. We are still working towards these objectives.

“Public health is still struggling to catch up to the potential for effective surveillance afforded by new technologies. ... progress has been too slow and ‘stovepipe’ systems persist everywhere.” (Naylor, 2003)

PHAC aims to empower Canadians to improve their health by ensuring accurate public health information is disseminated in a timely manner. The Agency must evolve and continue to bolster its capacity to deliver on this foundational responsibility to ensure our reputation as a trusted source of public health information remains intact.

Changing our data culture represents an organizational transformation that will require careful, determined, strategic leadership and long-term investments in order to enhance PHAC’s data management capabilities, technical infrastructure and analytic resources.

This Data Strategy is our opportunity to set a new Agency-wide path forward in how we approach data, and to create a clear and consistent approach to data management that supports decision-making and action.

Data Strategy


One strategy, six themes, three phases

PHAC’s Data Strategy is our opportunity to examine our fundamental relationship with public health data, and how this relationship can adapt to a rapidly evolving environment and high expectations for both the use and protection of data in the public health community, the Government of Canada, and Canadians.

PHAC’s Data Strategy represents our continuously evolving approach in how we use data to inform our actions and decision-making. It is version 1.0, which focuses our efforts on building a stronger data foundation.

The scope of the Data Strategy is vast, covering quantitative and qualitative data, program data, surveillance data, performance measurement data, administrative data, as well as corporate, financial and human resources data. It aims to create a data environment where all data are seen as a shared resource and common responsibility. The Data Strategy also complements the PHAC Strategic Plan by emphasizing our need to “release timely and interactive data and high-quality analysis to support decision making and action.”

As a shared resource and common responsibility, PHAC’s new data environment generates opportunities to leverage quantitative and qualitative data to improve assessment of outcomes and strengthen performance measurement, program evaluation, and policy development.

The Data Strategy is organized around six themes: data governance, data as an asset, science-based analytics, data infrastructure, partnership & collaboration and people & culture. Each theme has its own three-phased project plan with concrete actions to address challenges and opportunities related to data throughout PHAC. The Data Strategy will be realized through an Implementation Plan that will unfold with 55 actions over five years.

An infographic visually walking through the contents of the data strategy.

The first phase of the Implementation Plan will encompass activities led out of the Data, Partnership and Innovation Hub. This first phase also includes the establishment of an executive-level data governance model. With support from the Data Hub, the governance model will lead the Agency’s work in the second and third phases, determining the future direction of activities proposed in the current Implementation Plan.

The second and third phases contain a number of actions that require whole-of-Agency engagement and buy-in. Making change in PHAC’s data foundation represents an organizational transformation and will require careful, determined, strategic leadership and long-term investments in order to improve PHAC’s data management capabilities, technical infrastructure and analytics resources, including human resources. The governance model will be charged with determining the path forward for the Agency and developing the requisite project charter to enable the measurement and evaluation of progress. Integration with corporate functions, programs and laboratories, and senior management is critical to ensure that the Agency proceeds towards data and digital transformation in an integrated fashion. Our greatest leap forward will be based on our recognition that a data-centric organization must enhance its innovative use of technology: the status quo is not an option.

The opportunities before us are endless. At an enterprise level, we can complement our traditional data collection methods by leveraging artificial intelligence, big data, and machine learning algorithms. We can pair surveillance and social media data to help predict how diseases spread; undertake or leverage community-based research to understand and improve tuberculosis treatment; explore the use of mobile sensors to better understand physical and mental health; leverage waste-water surveillance to support an early warning system for disease outbreak and substance abuse; or implement whole genome sequencing combined with geographic mapping to trace a specific strain causing foodborne illness to its facility of origin.

At an international level, there is a global movement towards increased openness and transparency, such as the Open Government Partnership. There are also a number of open data and open science initiatives underway to maximize collaboration between federal science-based departments and agencies. Increased data sharing spurs greater exploration and discovery to maximize the modernization of intramural science.

This is our opportunity to create an effective, efficient and modern relationship between data, information and decision-making to achieve our desired outcomes for Canadians, and in support of building a stronger foundation for federal public health data.


Across the data lifecycle, there are policies, frameworks, guidance, standards and practices that guide our management and use of data as an asset - some of which exist already, others need to be refreshed, and still others will be newly drafted. Regardless of their shape or form, the PHAC Data Strategy provides an outline for a comprehensive vision for PHAC. The Data Strategy sets the direction going forward, focuses our efforts and provides a coherent plan for governing, analyzing and deploying all data related to achieving public health outcomes and in support of a common data culture.

This drive towards a common data culture will be supported by the following principles:

We are one agency, we manage public health data as an asset, and we are enabled and empowered by our technical infrastructure.
  1. We Are One Agency: We engage with our partners with a united voice, and with well-defined channels for establishing agreements and sharing data. Our internal structures and needs are ours to manage and need not be navigated nor understood by others.
  2. We Treat Data as an Essential Tool: Our data can be leveraged, reused, shared and designed to be interoperable. We do this carefully and expertly - to respect privacy and our existing legislation and regulation, while enhancing public health actions. We develop and leverage shared standards and best practices, seek opportunities to be open by default, and work in collaboration with provincial and territorial partners, the public health community, and the general public.
  3. We Are Enabled by Our Technical Infrastructure: We are empowered by our technical infrastructure as it facilitates data collection, analysis and dissemination. With few exceptions, our technical limitations do not impact what data we collect, preserve, and use, nor how we do our analysis and disseminate information to inform public health decisions and actions.

This visual depicts some of the current data-related challenges, the six themes of the data strategy, and the desired future state of data at the Public Health Agency of Canada. This visual was created based on input and feedback provided during a PHAC management team discussion in August 2019.


Implementation



A graphic table demonstrating the implementation plan

Consultations


A 3D image of 4 hexagonal cylinders. On top of each one is a statistic about the breadth of the data strategy consultations. These statistics read as follows: '1 agency, 6 regions, 37 events, and 500 people'.

To develop the Data Strategy, PHAC took a “top-down” and “bottom-up” approach. While the Agency’s Executive Committee (EC) approved the overall scope of the Strategy via its “Six Themes” and the engagement approach, the contents of the Data Strategy and the implementation plan are a result of direct engagement with over 500 staff from all branches, labs, regions, groups and levels.

Staff were engaged in one of two ways:

  1. Through a series of open workshops centered on the six themes of the Data Strategy. Participation was both in-person and remote (phone and video) to be inclusive of all geographic regions of the Agency. During the workshops, staff shared their experiences about how data are currently being used within PHAC, what challenges they’ve faced in collecting and using data, and what solutions may address the gaps; and,
  2. Through over twenty targeted consultations with key committees and groups, including the Privacy Management Division, Audit and Evaluation, the Surveillance Integration Team, staff at the National Microbiology Lab, and the Tier Two Governance Committees (Policy and Operations). Furthermore, the PHAC Data Hub had conversations with each Branch Executive Committee within PHAC to learn about their needs and expectations for the Data Strategy.

This broad consultation approach ensured that the areas of expertise specific to data informed the scope of the strategy, and was inclusive of PHAC’s program areas, laboratory sciences, as well as evaluation and results-based management. By the end of the 6-month consultation period, over 25% of PHAC’s employees attended and contributed to one or more of the consultation sessions.

The PHAC Data Hub also committed to working in the open during this process. Summaries and notes were sent for validation and then posted for others to read. For a full list of consultations, and the notes from all of the sessions conducted by the Data Hub, visit the PHAC Data Strategy page on GCCollab, True Story: A Data Strategy for Public Health.

Who was engaged

  • All staff, via workshops & online (including Colonnade, Carling and all five regions)
  • Office of the Chief Science Officer
  • National Microbiology Laboratory
  • Centre for Surveillance and Applied Research
  • Office of the Chief Public Health Officer
  • Privacy Management Division
  • Infectious Diseases Prevention and Control Branch (IDPCB) Branch Executive Committee
  • Health Security Infrastructure Branch (HSIB) Branch Executive Committee
  • Health Promotion and Chronic Disease Prevention Branch (HPCDPB) Branch Executive Committee
  • Office of Audit and Evaluation
  • Strategic Policy and Planning
  • PHAC’s G5
  • Chief Financial Officer
  • Information Management Services Directorate
  • Human Resources Services Directorate
  • Surveillance Integration Team
  • Sex and Gender-Based Analysis Plus (SGBA+) Champion and Centre for Chronic Disease Prevention and Health Equity
  • Policy Committee
  • Operations Committee
  • Communications and Public Affairs Branch
  • Corporate Services Branch
  • Office of International Affairs

From March to August 2019, the PHAC Data, Partnerships and Innovation Hub (the Data Hub) consulted with over 500 PHAC employees. The consultations formed the foundation of the development of the Data Strategy and Implementation Plan. Time after time, the Data Hub heard PHAC’s data culture described as a collection of “independent islands” with plenty of opportunities to share resources, best practices, and data standards. The most common solution proposed for this was horizontal collaboration, integration, and leadership on data initiatives. There was a widespread desire to get on the same page in order to streamline and rationalize the bespoke processes currently embedded across all elements of the data lifecycle.

For more details on staff feedback, visit the PHAC Data Strategy page on GCCollab, True Story: A Data Strategy for Public Health.

Data Strategy in Action


High Performance Cloud-Computing Pilot – National Microbiology Laboratory

Cloud computing has become increasingly prevalent in today’s high-tech world, offering solutions from basic storage constraints to more intense processing power applications such as artificial intelligence and machine learning. To test out a modern cloud-computing environment, the National Microbiology Laboratory (NML) partnered with the Communications Research Centre and assembled a small team of computer scientists, biologists, and research engineers to answer three simple questions: Can the cloud be used to extend the NML’s High Performance Computing (HPC) data centre? Can cloud-based HPC be cost effective? Can the NML leverage cloud-based HPC?

The NML experiences significant bursts of compute activity, occasionally maxing out the on-site HPC infrastructure. This means that some researchers and scientists need to wait for server time when HPC resources are in high demand. Traditionally this has been solved by adding infrastructure and enforcing prioritisation schemes. However, recently, HPC has become a service offered by cloud computing vendors at rates that can be cost effective. To test out the cloud environment, the team embarked on a “six-week challenge” to shift elements of the on-site HPC to Amazon Web Services (AWS) to run actual genomic analysis.

Things did not fall into place immediately. With no data optimization, larger-scale computing tests did worse on AWS than on local computing resources. By looking deep into the computing usage numbers, the team realized that performing a simple lift and shift of the on-premises architecture into the AWS environment was not going to be efficient enough to reach their performance goals. Leveraging its high-performance computing and high-capacity network expertise, the team was able to fine-tune the infrastructure and resource scaling strategy to outperform the on-premises data centre in a cost-effective, on-demand manner.

The pilot eventually found positive answers to all three questions. The cloud can be used to extend HPC at NML. Cloud-based HPC can be cost effective. The NML can use cloud-based computing for real science. More generally, the pilot found that a simple “lift and shift” of an existing system to the cloud did not immediately realise performance goals for larger-scale computing tests. Fortunately, using their expertise, the pilot team quickly found efficiencies to improve the performance of cloud computing to the point of exceeding the existing on-premises data centre in a cost-effective manner.

Public Health Infobase

The Public Health Infobase provides researchers, health practitioners, and others with an interest in public health, access to timely data, the story in the numbers, and the ability to interact with innovative, accessible data visualizations.

Infobase changed the traditional way that most health organizations disseminate data, from paper publications with large and complex tables to the online interpretation and manipulation of data into easily understandable graphs, maps and simple tables.

Though part of what makes Public Health Infobase a success story is a can-do culture of continuous learning, innovative performance measurement, and iterative design (including learning from failure!), the most important driver is the hard work and support of data teams across the Health Promotion and Chronic Disease Prevention (HPCDP) Branch and from an ever-growing list of teams across the Health Portfolio.

Global Public Health Intelligence Network (GPHIN)

The Global Public Health Intelligence Network (GPHIN) combines a web-based platform scanning open-source information across the globe with a team of highly educated, dedicated, multicultural, and multilingual analysts filtering and assessing the information.

GPHIN supports Canada and other countries to perform event-based surveillance for early detection, assessment and reporting of infectious disease outbreaks and other emerging public health threats. Every day, GPHIN processes an average of 3,000 news reports in nine languages and the analysts are always on the lookout for the next potential public health threat.

GPHIN has been recognised as a Big Data pioneer in public health. Advanced machine learning techniques in natural language processing have been implemented to improve relevance scoring, categorization, geographic tagging and de-duplication of the articles fed into the GPHIN platform.

These artificial intelligence techniques help the team of analysts work more efficiently by reducing the system maintenance workload, which leaves the analysts more time to perform the analysis and reporting.
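
To make the idea of article de-duplication more concrete, the sketch below shows one simple, generic way to flag near-duplicate news reports. It is an illustration only and is not a description of how GPHIN works: the function names (`shingles`, `jaccard`, `deduplicate`), the three-word shingle size and the 0.6 similarity threshold are all assumptions chosen for the example, and the technique shown (Jaccard similarity on word shingles) is a basic text-overlap measure rather than the machine learning approach mentioned above.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences (shingles) found in the text."""
    # Lower-case the text and keep only letters and digits as words.
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Very short texts are treated as a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of overlap divided by size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(articles, threshold=0.6):
    """Keep the first copy of each article and drop later near-duplicates.

    The threshold is a hypothetical value chosen for illustration only.
    """
    kept = []
    kept_shingles = []
    for article in articles:
        current = shingles(article)
        # Keep the article only if it is not too similar to anything already kept.
        if all(jaccard(current, seen) < threshold for seen in kept_shingles):
            kept.append(article)
            kept_shingles.append(current)
    return kept


if __name__ == "__main__":
    reports = [
        "Health officials report a new outbreak of measles in the region.",
        "Health officials report a new outbreak of measles in the region today.",
        "Flooding closes several highways after heavy rain.",
    ]
    for report in deduplicate(reports):
        print(report)
```

In this sketch the second report shares almost all of its three-word sequences with the first, so it is treated as a duplicate, while the unrelated third report is kept. A production system would likely need language-aware processing and a more scalable comparison method.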


In addition to the sampling of “PHAC success stories”, the implementation of the Data Strategy will be supported by new data science pilots to demonstrate the vision of the strategy in action. Colleagues from across PHAC have approached the Data Hub for support in visualizing their data to make them more accessible and relevant for various audiences. Interactive data visualizations allow for better understanding of public health trends and in turn a better foundation for public health decision making.

What is Data Science? Data Science is the discipline of turning raw data into understanding that can guide decision and policy making through data visualization, statistical modelling, machine learning, spatial analysis, and other methods.

Beyond visualization, the Data Strategy’s pilots will enhance the value of our data by creating interactive platforms, automating processes and facilitating customizable reports to empower staff to make the most of PHAC’s data.