Algal Bloom Detection using a Machine Learning Approach

Satellite image of lake with algae blooms

Satellite imagery combined with artificial intelligence could be used in the future to detect the presence and extent of harmful algal blooms in New Zealand lakes.

University of Auckland computer science PhD student Olivier (Oli) Graffeuille has developed a novel machine learning algorithm that could yield precise estimates of the quantity and type of algae from satellite imagery.

“Currently algae is usually measured by taking one or two samples and analysing them in a lab. This often isn’t truly representative of the extent of the problem, and it’s costly and slow,” Graffeuille says.

“With remote sensing using satellite imagery, we can estimate the quantity of algae by measuring the colour of the water. Satellites measure colour at up to 21 different frequencies of light, which gives us information to help signal the amount of algae and, in some cases, information on the type of algae present, for example harmful cyanobacteria.”
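To illustrate how water colour can signal algae, the short sketch below computes the Normalized Difference Chlorophyll Index (NDCI), a widely used two-band ratio in water remote sensing. It is a generic example, not Graffeuille’s algorithm. The function name, the band names and the reflectance values are made up for illustration.

```python
def ndci(red_665: float, red_edge_708: float) -> float:
    """Normalized Difference Chlorophyll Index from two reflectances.

    Chlorophyll absorbs light near 665 nm and algae-rich water reflects
    more near 708 nm, so higher values suggest more algae.
    """
    total = red_edge_708 + red_665
    if total == 0:
        raise ValueError("reflectances sum to zero; index is undefined")
    return (red_edge_708 - red_665) / total


# Hypothetical reflectance values for two lake pixels (illustrative only).
pixels = [
    {"name": "clear water", "red_665": 0.020, "red_edge_708": 0.015},
    {"name": "bloom", "red_665": 0.030, "red_edge_708": 0.060},
]

scores = {p["name"]: ndci(p["red_665"], p["red_edge_708"]) for p in pixels}
for name, score in scores.items():
    print(f"{name}: NDCI = {score:.3f}")
```

A simple index like this uses only two of the available bands. A machine learning model can draw on all of them at once.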

An algal bloom is a rapid build-up of algae in a waterway that can harm local ecosystems, aquaculture and even human health, making people sick and rendering the body of water unfit for swimming. Algal blooms are increasing nationally as a result of nutrient enrichment and climate warming.

Graffeuille’s machine learning algorithm can be further developed beyond its current prototype and, in time, become a part of a system that can be adopted by companies and councils to monitor the health of waterways around the country.

His work – completed as part of his PhD candidacy and funded by TAIAO – was published at this year’s international Association for the Advancement of Artificial Intelligence (AAAI) conference. It also won the University of Auckland Computer Science Best Paper Award for 2021.

His PhD supervisor Dr Yun Sing Koh says the excitement of the research is two-fold.

“From a machine learning (ML) angle, the algorithm developed by Oli leverages only a few ground truth data points to train his ML model. Most traditional predictive ML approaches assume the model will be able to obtain comprehensive ground truth data. In reality, this may not be the case. Oli has developed a machine learning algorithm that uses a small set of labelled data points and combines it with unlabelled data to build an accurate predictive model.

“Secondly, the applicability of this research to solve the environmental problem is fascinating.”
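The idea of combining a few labelled points with many unlabelled ones can be sketched with self-training (pseudo-labelling), a generic semi-supervised technique. This is not the method from Graffeuille’s paper, which the article does not detail. The one-dimensional “colour index” feature, the class names, the data values and the margin threshold are all assumptions for illustration.

```python
def centroids(samples):
    """Mean feature value per class from (feature, label) pairs."""
    sums, counts = {}, {}
    for x, y in samples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(cents, x):
    """Return (label, margin) for the nearest centroid.

    The margin is how much closer x is to the best class than to the
    runner-up; a small margin means an uncertain prediction.
    """
    ranked = sorted(cents, key=lambda y: abs(x - cents[y]))
    best, second = ranked[0], ranked[1]
    return best, abs(x - cents[second]) - abs(x - cents[best])


def self_train(labeled, unlabeled, min_margin=0.1, max_rounds=10):
    """Grow the training set with confidently pseudo-labelled points."""
    train, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(train)
        remaining, added = [], False
        for x in pool:
            label, margin = predict(cents, x)
            if margin >= min_margin:
                train.append((x, label))  # accept the pseudo-label
                added = True
            else:
                remaining.append(x)  # too uncertain, try again later
        pool = remaining
        if not added:
            break
    return centroids(train)


# Two hypothetical lab-verified samples and seven unlabelled pixels.
labeled = [(-0.2, "clear"), (0.3, "bloom")]
unlabeled = [-0.25, -0.15, -0.1, 0.05, 0.2, 0.35, 0.4]

model = self_train(labeled, unlabeled)
print(predict(model, 0.25)[0])
```

Here the ambiguous pixel at 0.05 is never pseudo-labelled because it sits too close to the boundary between the two classes. That caution is what keeps self-training from reinforcing its own mistakes.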

Graffeuille’s PhD co-supervisor Dr Moritz Lehmann says water quality remote sensing is a difficult task. Lehmann has worked with Graffeuille for several years and has used Graffeuille’s algorithm to inform his own work on the Eye on Lakes research project, which looks for better ways to detect the extent of cyanobacteria blooms across the country.

“Satellite images by themselves are mostly pretty pictures. Their full potential can be realised when the raw data that makes up the image is converted into actionable information like the amount of algal blooms in lakes.

“I talked to [PhD primary supervisor] Yun Sing Koh and Oli about this and they became interested in trying machine learning approaches to this problem,” Dr Lehmann says.

He says a lot of data is needed to train and validate machine learning models, so Graffeuille is now working with a global dataset of about 7000 data entries.

“Oli doesn’t just use the signal the satellite sees, but combines this with information from other sources, for example the size and depth of the lake and the vegetation around it, and even meteorological information,” he says.

Dr Lehmann believes that machine learning has huge potential to enhance the field of Earth observation and provide significant benefits for public and ecosystem health in lakes.

His work at Xerra and the University of Waikato has been laying the foundation for satellite water quality monitoring by assessing its accuracy and improving it. The goal now, he says, should be to operationalise the technology by building an automated and user-friendly system. To reach its potential, remote sensing data has to be integrated with regional and national environmental reporting standards. This has not yet happened.

“But I don’t think we’re far away,” he says.