Time-Evolving Data Science / Artificial Intelligence for Advanced Open Environmental Science (TAIAO)
Led by the University of Waikato, TAIAO is a seven-year data science programme funded with $13 million (GST exclusive) by the Ministry of Business, Innovation, and Employment (MBIE). It will advance the state of the art in environmental data science by developing new machine learning methods for time series and data streams that can process large volumes of data in real time and that are tailored to data collected on the New Zealand environment. It will build a new open-source framework for machine learning on time series data, provide an openly available repository of datasets to improve reproducibility in environmental data science, and build capability in fundamental and applied data science that is accessible to all New Zealanders.
This programme is a new collaboration that brings together the Universities of Waikato, Auckland and Canterbury with Beca and MetService, and includes world-leading data scientists, data engineers, and environmental scientists.
Our vision is to enable the next level of data science to provide robust and fit-for-purpose tools and methods that are accessible and useful to researchers and practitioners across all areas of the New Zealand environment. Our method is to codesign our work with iwi, industry and government, ensuring through pertinent environmental case studies that we maximise benefit, uptake and suitability. Through our training programme, we will enable the next generation of New Zealanders to play a stronger and more useful role in solving the critical environmental problems that face our country. Our internationally-connected research team will make sure that the latest international advances will be adopted and reinvented for our unique environmental setting, while harnessing the passion of our own data science researchers to preserve our famously beautiful lakes, rivers, forests, estuaries and mountains for future generations.
The TAIAO logo represents realms of water, land and sky. The rau (leaf) is used to represent aspects of growth and to indicate changes over time. The purapura whetū tukutuku pattern is used to represent the myriad of stars that are forever guiding and shaping our environments and the myriad of data points that TAIAO would like to observe and develop guidance from.
Design: Tyler Keegan and Jeremy Tritt
Kōrero: Tyler Keegan and Te Taka Keegan
Data are essential to research, understand, set policy for and manage New Zealand’s environment, but environmental data present many challenges that require new data science methods to overcome them, and a substantial increase in the capability of environmental researchers, governors and managers to use data science in their work. This programme will develop those new methods and build the required capability. In particular, we will focus on developing methods to deal with environmental datasets that are collected in large volumes over time, and must therefore be dealt with as streams that are analysed incrementally, as they are measured, rather than as collections of data that can be analysed all at once. These methods will address underlying characteristics of the data that evolve over time (e.g. due to climatic or ecological changes), and data that are collected at a range of time intervals and spatial scales, from broad-scale satellite images to single-point measurements on the ground, in the water or in the air. The methods we develop will be interpretable and explainable (to help users understand why an algorithm produces a particular output), identify and understand anomalies (to distinguish “normal” from “unusual” measurements) and quantify uncertainty in algorithm output (to help decision-makers understand how confident they can be in conclusions drawn from the data science methods). To deliver the methods we develop in a form that environmental scientists and managers can use, we will build a new open-source framework for machine learning on time series data, and provide an open-access repository of environmental datasets to improve reproducibility in environmental data science. Through workshops and through undergraduate and postgraduate research projects within the programme, we will build New Zealand’s capability in fundamental and applied data science relevant to environmental data, from introductory to postdoctoral level.
We will advance the state of the art in data science by addressing the following three research challenges:
1. Machine Learning for Data Streams and Time Series. Learning from data streams requires different techniques from those used on static data. Often, one-pass techniques are required with data streams, and the data are typically not curated. Applications are driven by anomaly and novelty detection, as well as clustering and event detection, with both supervised and reinforcement learning, the latter being naturally cast in a streaming setting.
2. Machine Learning for Weak Signals and Extreme Events. Detecting extreme events in real time is much more challenging than detecting normal behaviour in historical data because extreme events, by definition, occur very rarely. Weak signals are the first indicators of changes that may become significant in the future. Climate change means extremes will become more common, so we will need to incorporate robust predictions of extremes into our adaptation plans.
3. Deep Learning. Deep Learning is being used increasingly in environmental data science. Neural networks can be very powerful methods for classifying images, but they have the drawback of high energy consumption. We will focus on two problems in deep learning research that are particularly relevant to environmental data science: explainability of deep learning and accurate quantification of predictive uncertainty in deep learning.
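To make the idea of one-pass learning on a data stream concrete, here is a minimal illustrative sketch in Python. It is not TAIAO's framework or method. It uses Welford's online algorithm to keep a running mean and variance and flags readings that sit far from that running mean. The names `RunningStats` and `flag_anomalies`, the `threshold` and `warmup` parameters, and the sample readings are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """One-pass (Welford) estimate of mean and variance."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; 0.0 until two readings have been seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def flag_anomalies(stream, threshold=3.0, warmup=5):
    """Yield (value, is_anomaly) per reading, visiting each reading once."""
    stats = RunningStats()
    for x in stream:
        std = stats.std
        # Compare the new reading with statistics of the readings seen so far.
        is_anomaly = (
            stats.n >= warmup
            and std > 0
            and abs(x - stats.mean) > threshold * std
        )
        yield x, is_anomaly
        stats.update(x)  # hypothetical choice: every reading updates the stats


# Made-up readings, e.g. a sensor value with one unusual spike.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 25.0, 10.0]
flags = [is_anomaly for _, is_anomaly in flag_anomalies(readings)]
```

The sketch stores only three numbers regardless of how long the stream runs, which is the property that makes one-pass methods suitable for data that cannot be held in memory all at once. It deliberately ignores the harder problems named above: it assumes the underlying distribution does not drift, and it lets a flagged reading influence later statistics.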
General Enquiries: firstname.lastname@example.org