Monitoring water quality using Python and Sentinel-2 satellite imagery

Substantive overview

With the growth of cities and industry, more and more pollutants have found their way into Polish waters. According to many reports, 91.5% of river resources, around 88% of lakes and nearly 100% of transitional and coastal surface waters are in poor condition. The lack of a long-term vision for water management, ongoing climate change, and insufficient action from the government, farmers, industrialists and the general public are likely to worsen the problem quickly.

One of the most urgent threats to water quality in Poland is the lack of environmental infrastructure in rural areas, where only 44% of residents have their wastewater treated. Fertilisers are not the only problem here; leaking domestic septic tanks contribute as well. Together, these sources drive progressive eutrophication, followed by the formation of oxygen deserts in which life dies off. As a result, agriculture is the sector that most seriously threatens the quality of Polish groundwater and surface waters.

Eutrophication refers to the overfertilisation of aquatic environments caused by excessive amounts of nitrogen and phosphorus compounds. These nutrients fuel the growth of algae. When dead algae sink to the bottom of the water body and decompose, oxygen consumption increases. In the resulting oxygen-poor conditions, anaerobic bacteria take over the decomposition and emit harmful substances such as hydrogen sulphide.

Studies have identified various factors associated with the occurrence of eutrophication, primarily chlorophyll a (Chl a), nutrients (nitrogen and phosphorus), temperature, and transparency (Al-Thani et al., 2023; Mamun et al., 2022; Lundsør et al., 2020). Chlorophyll a has been widely used as an indicator of phytoplankton biomass in water bodies.

In this project, Sentinel-2A and 2B satellite imagery was used to calculate the Normalized Difference Chlorophyll Index (NDCI), a spectral index specifically designed to estimate chlorophyll concentrations in aquatic environments. The NDCI is derived from the red (band 4) and vegetation red edge (band 5) bands of Sentinel-2 imagery, which are sensitive to the presence of chlorophyll pigments. Processing these data in Python made it possible to map the NDCI across various water bodies in Poland, providing a detailed spatial and temporal analysis of chlorophyll distribution.

The relationship between chlorophyll levels and eutrophication is well-established: elevated chlorophyll concentrations often indicate an increased presence of phytoplankton, which can result from nutrient overloading (particularly nitrogen and phosphorus) in the water. This proliferation of phytoplankton can lead to harmful algal blooms, which deplete oxygen levels as they decay, further exacerbating water quality issues and threatening aquatic ecosystems.

By analyzing the NDCI data over time, this project not only identified areas with high chlorophyll concentrations but also tracked the progression of eutrophication in specific regions. The ability to monitor these changes remotely and accurately through satellite imagery represents a significant advancement in environmental monitoring. It allows for the timely identification of at-risk areas, enabling more informed decision-making for water management and mitigation strategies. Additionally, the use of open-source tools like Python makes this approach accessible and scalable, offering the potential for widespread application in environmental monitoring efforts across Poland and beyond.

Technical overview

Workflow for Monitoring Water Quality and Managing Spatial Data Using Python and Copernicus EO Data

In this project, we use a comprehensive workflow to analyze water quality with Earth Observation (EO) data from the Copernicus program, focusing specifically on Sentinel-2 satellite imagery. The process involves several key steps, from data acquisition through spatial analysis to the storage and visualization of results using various geospatial technologies. The workflow is described in detail below:

EO Data Acquisition and Handling with GeoJSON/GeoDataFrame

The initial step involves acquiring the necessary satellite data from the CREODIAS platform. This infrastructure provides Virtual Machines with access to tens of petabytes of EO data, along with ample storage and cloud computing capacity. We specifically target Sentinel-2 imagery, which is well suited to monitoring water quality thanks to its high resolution and multispectral capabilities. The area of interest (AOI) is defined using GeoJSON files, which are parsed and converted into GeoDataFrames using Python’s GeoPandas library. GeoDataFrames allow for efficient handling and manipulation of spatial data within the Python ecosystem.
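The AOI handling described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the AOI rectangle is a hypothetical box roughly near Oświęcim, and the function names aoi_to_geodataframe and aoi_bounds are invented for this example. GeoPandas is imported lazily, so the bounding-box helper works without it.

```python
# Hypothetical AOI: a rough rectangle near Oświęcim in WGS84 (lon, lat).
# The coordinates are illustrative only.
AOI_GEOJSON = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "example_aoi"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[
                    [19.15, 50.00], [19.30, 50.00], [19.30, 50.08],
                    [19.15, 50.08], [19.15, 50.00],
                ]],
            },
        }
    ],
}


def aoi_to_geodataframe(geojson: dict):
    """Convert a GeoJSON FeatureCollection into a GeoDataFrame in EPSG:4326."""
    import geopandas as gpd  # lazy import; requires geopandas to be installed

    return gpd.GeoDataFrame.from_features(geojson["features"], crs="EPSG:4326")


def aoi_bounds(geojson: dict) -> tuple:
    """Return (min_lon, min_lat, max_lon, max_lat) over all polygon exterior rings."""
    points = [
        point
        for feature in geojson["features"]
        for point in feature["geometry"]["coordinates"][0]
    ]
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return min(lons), min(lats), max(lons), max(lats)
```

With GeoPandas installed, aoi_to_geodataframe(AOI_GEOJSON).total_bounds should report the same extent as aoi_bounds. For an AOI stored on disk, geopandas.read_file can load the GeoJSON file directly.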

Data Extraction from Metadata

Once the imagery is downloaded, it is essential to extract and analyze metadata. Using the Geospatial Data Abstraction Library (GDAL), we retrieve crucial information such as acquisition dates, spatial resolution, cloud coverage and band specifications. This metadata is critical for understanding the context of the data and ensuring that the correct bands are used for further analysis.
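As a rough sketch of this step, the snippet below opens a product with the GDAL Python bindings and condenses the metadata into a small summary. The metadata key names (CLOUD_COVERAGE_ASSESSMENT, DATATAKE_1_DATATAKE_SENSING_START) are an assumption based on what GDAL's Sentinel-2 driver typically reports and should be checked against the actual product. The function names and the 20% cloud threshold are hypothetical.

```python
def read_raster_metadata(path: str) -> dict:
    """Open a dataset with GDAL and collect its metadata and geotransform."""
    from osgeo import gdal  # lazy import; requires the GDAL Python bindings

    dataset = gdal.Open(path)
    if dataset is None:
        raise FileNotFoundError(f"GDAL could not open {path}")
    return {
        "metadata": dataset.GetMetadata(),
        "geotransform": dataset.GetGeoTransform(),
        "band_count": dataset.RasterCount,
    }


def summarise(info: dict, max_cloud_percent: float = 20.0) -> dict:
    """Reduce raw GDAL metadata to the fields needed for scene selection.

    The key names below are assumptions; a missing cloud value yields NaN,
    which marks the scene as not usable.
    """
    metadata = info["metadata"]
    geotransform = info["geotransform"]
    cloud = float(metadata.get("CLOUD_COVERAGE_ASSESSMENT", "nan"))
    return {
        "sensing_start": metadata.get("DATATAKE_1_DATATAKE_SENSING_START"),
        "cloud_percent": cloud,
        # Elements 1 and 5 of a GDAL geotransform hold the pixel width and height.
        "pixel_size": (abs(geotransform[1]), abs(geotransform[5])),
        "band_count": info["band_count"],
        "usable": cloud <= max_cloud_percent,
    }
```

Keeping the summary separate from the file access makes it easy to check the selection logic on a plain dictionary before running it against real products.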

Cloud Elimination and Water Extraction Using the SCL

The Scene Classification Layer (SCL) delivered with Sentinel-2 Level-2A products made it possible to eliminate clouds from the scene, which was essential for obtaining correct and valuable results. It also helped to extract water bodies (rivers, lakes, reservoirs), thereby reducing the computational complexity of the analysis.
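A minimal sketch of SCL-based masking with NumPy is shown below. The class codes follow the Sentinel-2 Level-2A scene classification (6 is water; 3, 8, 9 and 10 are cloud shadow, medium- and high-probability cloud, and thin cirrus). The function names are hypothetical, and the sketch assumes the band and the SCL have already been read onto the same pixel grid.

```python
import numpy as np

# Sentinel-2 Level-2A Scene Classification Layer (SCL) class codes.
SCL_WATER = 6
SCL_CLOUD_CLASSES = (3, 8, 9, 10)  # cloud shadow, cloud medium/high probability, thin cirrus


def cloud_fraction(scl: np.ndarray) -> float:
    """Share of pixels that the SCL labels as cloud or cloud shadow."""
    return float(np.isin(scl, SCL_CLOUD_CLASSES).mean())


def keep_water_only(band: np.ndarray, scl: np.ndarray) -> np.ndarray:
    """Return the band as float32 with every non-water pixel set to NaN.

    Each pixel carries exactly one SCL class, so keeping only the water
    class also discards cloudy pixels.
    """
    return np.where(scl == SCL_WATER, band.astype("float32"), np.nan)
```

Because later steps then operate only on water pixels, index calculations and statistics can skip the NaN values, for example with np.nanmean.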

Calculating Indices NDCI and Band Extraction

The next step involves calculating the Normalized Difference Chlorophyll Index (NDCI), which is pivotal for assessing water quality and chlorophyll concentrations. NDCI is calculated from the Red (Band 4) and Red-Edge (Band 5) bands using the formula NDCI = (B5 - B4) / (B5 + B4). Rasterio, a Python library for reading and writing geospatial raster data, is employed to extract these specific bands from the Sentinel-2 imagery and perform the necessary calculations.
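The formula above can be sketched in a few lines of NumPy, with Rasterio used only to read the bands. The function names and file paths are hypothetical. The sketch assumes both bands are read at the same resolution: band 5 is a 20 m band, while Level-2A products also ship a 20 m version of band 4, so the two grids can be made to match.

```python
import numpy as np


def read_band(path: str) -> np.ndarray:
    """Read the first band of a raster file into a 2-D array."""
    import rasterio  # lazy import; requires rasterio to be installed

    with rasterio.open(path) as src:
        return src.read(1)


def compute_ndci(red: np.ndarray, red_edge: np.ndarray) -> np.ndarray:
    """NDCI = (B5 - B4) / (B5 + B4), with NaN where the denominator is zero."""
    red = red.astype("float32")
    red_edge = red_edge.astype("float32")
    total = red_edge + red
    ndci = np.full(total.shape, np.nan, dtype="float32")
    # Divide only where the sum is non-zero; other pixels keep the NaN fill.
    np.divide(red_edge - red, total, out=ndci, where=total != 0)
    return ndci
```

A typical call would be compute_ndci(read_band("B04_20m.jp2"), read_band("B05_20m.jp2")), where the file names stand in for the real band paths inside the product.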

Reprojection, Normalization and Colormap Application

After calculating the indices, the data must be reprojected to a common coordinate reference system (identified by its EPSG code) to ensure spatial consistency across the datasets. The pixel values are then normalized to the 0-255 range to prepare the data for visualization. Applying colormaps makes the results easier to interpret by visually differentiating between levels of chlorophyll concentration in the water.
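The normalization and colormap steps might look like the sketch below. Reprojection is left out here. The value range of -1 to 1, the blue-to-green colour ramp and the function names are assumptions chosen for illustration, not the project's actual settings.

```python
import numpy as np


def normalise_to_uint8(index: np.ndarray, vmin: float = -1.0, vmax: float = 1.0) -> np.ndarray:
    """Linearly scale index values from [vmin, vmax] to 0-255.

    NaN pixels (for example masked land or cloud) are written as 0, so 0
    doubles as a nodata value in this simple sketch.
    """
    clipped = np.clip(index, vmin, vmax)
    scaled = (clipped - vmin) / (vmax - vmin) * 255.0
    return np.nan_to_num(np.rint(scaled), nan=0.0).astype(np.uint8)


def build_lut(low=(0, 0, 255), high=(0, 255, 0)) -> np.ndarray:
    """Build a 256 x 3 RGB lookup table fading linearly from low to high."""
    ramp = np.linspace(0.0, 1.0, 256)[:, None]
    lut = (1.0 - ramp) * np.array(low) + ramp * np.array(high)
    return np.rint(lut).astype(np.uint8)


def apply_colormap(gray: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Map each 0-255 pixel to an RGB triple; the output gains a trailing axis of 3."""
    return lut[gray]
```

A lookup table keeps the colouring step to a single indexing operation, which matters when many scenes are processed in sequence.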

Conversion to Cloud-Optimized GeoTIFF (COG)

To optimize the data for storage and access, the processed raster images are converted into Cloud-Optimized GeoTIFFs (COG). COGs are ideal for efficient cloud storage and streaming, allowing for quick access and retrieval of geospatial data. This conversion is carried out using GDAL commands, ensuring that the files are structured for optimal performance in cloud environments.
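One way to script this conversion is to call the gdal_translate command-line tool from Python, as sketched below. The COG output driver requires GDAL 3.1 or newer. The compression and block size values are illustrative assumptions rather than the settings used in the project, and the function names are hypothetical.

```python
import subprocess


def cog_command(src: str, dst: str, compress: str = "DEFLATE", blocksize: int = 512) -> list:
    """Build the gdal_translate argument list for a Cloud-Optimized GeoTIFF."""
    return [
        "gdal_translate",
        "-of", "COG",                      # COG output driver (GDAL >= 3.1)
        "-co", f"COMPRESS={compress}",     # compression of the image tiles
        "-co", f"BLOCKSIZE={blocksize}",   # internal tile size in pixels
        src,
        dst,
    ]


def convert_to_cog(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if gdal_translate fails."""
    subprocess.run(cog_command(src, dst), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.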

Data Storage in PostgreSQL/PostGIS

The processed geospatial data, including the calculated indices and their associated metadata, are then stored in a PostgreSQL database with the PostGIS extension, set up locally on the CREODIAS Virtual Machine. This allows for robust spatial queries and data management. Parameters such as index values and spatial extents are read from the GeoDataFrame and inserted into the database, ensuring that all spatial attributes are preserved for future analysis.
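The insert step could be sketched with psycopg2 as follows. The table name ndci_results, its columns and the function names are hypothetical and do not reflect the project's actual schema. The sketch assumes the PostGIS extension is already enabled in the target database and that footprints are passed as WKT strings in EPSG:4326.

```python
# Hypothetical schema for per-scene results; adjust to the real data model.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS ndci_results (
    id SERIAL PRIMARY KEY,
    acquired_on DATE NOT NULL,
    mean_ndci DOUBLE PRECISION,
    cog_path TEXT,
    footprint geometry(Polygon, 4326)
);
"""

# Parameterised insert; the geometry is built server-side from WKT.
INSERT_SQL = (
    "INSERT INTO ndci_results (acquired_on, mean_ndci, cog_path, footprint) "
    "VALUES (%s, %s, %s, ST_GeomFromText(%s, 4326))"
)


def row_params(acquired_on: str, mean_ndci, cog_path: str, footprint_wkt: str) -> tuple:
    """Order one result row to match the placeholders in INSERT_SQL."""
    return (acquired_on, float(mean_ndci), cog_path, footprint_wkt)


def store_rows(rows: list, dsn: str) -> None:
    """Create the table if needed and insert all rows in one transaction."""
    import psycopg2  # lazy import; requires psycopg2 to be installed

    conn = psycopg2.connect(dsn)
    try:
        with conn:  # commits on success, rolls back on error
            with conn.cursor() as cur:
                cur.execute(CREATE_TABLE_SQL)
                cur.executemany(INSERT_SQL, rows)
    finally:
        conn.close()
```

When the source is a GeoDataFrame, the WKT for each row can be taken from the geometry's wkt attribute before calling row_params.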

Cloud Storage and S3 Bucket Management

In addition to local storage, the data is uploaded to a CREODIAS S3 bucket for secure and scalable cloud storage. This ensures that the data is accessible for further analysis and sharing across different platforms. Python’s boto3 library is used to interact with the S3 bucket, managing the upload, retrieval, and organization of the geospatial data.
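A minimal upload sketch with boto3 is given below. The key layout (index/year/month/tile_date.tif) and the function names are hypothetical conventions for this example. The endpoint URL, bucket name and credentials are passed in as parameters because their real values depend on the CREODIAS account and are not stated here.

```python
def object_key(index_name: str, acquired_on: str, tile_id: str) -> str:
    """Build a bucket key such as 'ndci/2024/07/T34UCA_2024-07-15.tif'.

    acquired_on is expected as an ISO date string (YYYY-MM-DD).
    """
    year, month, _day = acquired_on.split("-")
    return f"{index_name.lower()}/{year}/{month}/{tile_id}_{acquired_on}.tif"


def upload_cog(local_path: str, bucket: str, key: str,
               endpoint_url: str, access_key: str, secret_key: str) -> None:
    """Upload one file to an S3-compatible bucket."""
    import boto3  # lazy import; requires boto3 to be installed

    client = boto3.client(
        "s3",
        endpoint_url=endpoint_url,  # S3-compatible endpoint of the provider
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    client.upload_file(local_path, bucket, key)
```

Grouping keys by index and date keeps listings manageable as the number of processed scenes grows.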

Visualization and WMS Integration with Mapbender

For visualization, the data is served through a Web Map Service (WMS) using MapServer. The WMS allows for seamless integration of the spatial data into web-based GIS platforms. Mapbender, an open-source web mapping framework, is employed to create interactive web maps that allow users to explore the spatial data layers, including the NDCI index (Figures 1 and 2). This setup enables stakeholders to visualize water quality trends and make informed decisions based on the spatial data.

Figure 1 NDCI Index on ponds in Oswiecim (Poland) – screenshot from geoportal
Figure 2 Screenshot from geoportal on Mapbender platform
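To illustrate how a client such as Mapbender requests a layer from the WMS, the sketch below builds a standard WMS 1.3.0 GetMap URL. The base URL, the layer name and the bounding box are hypothetical, and the function assumes the base URL does not already contain a query string.

```python
from urllib.parse import urlencode


def getmap_url(base_url: str, layer: str, bbox: tuple,
               width: int = 512, height: int = 512, crs: str = "EPSG:3857") -> str:
    """Build a WMS 1.3.0 GetMap request URL for a single layer."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the layer's default style
        "CRS": crs,
        "BBOX": ",".join(str(value) for value in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TRANSPARENT": "TRUE",
    }
    return f"{base_url}?{urlencode(params)}"
```

For example, getmap_url("https://example.invalid/wms", "ndci", (2130000, 6440000, 2150000, 6455000)) yields a URL that can be pasted into a browser to check the service before the layer is added to a web map.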

Summary

This workflow provides a comprehensive overview of the steps involved in monitoring water quality using Copernicus EO data and various geospatial technologies. Each step is linked to a specific part of the analysis, ensuring that the data is processed, analyzed, and visualized effectively. All of these stages were designed to run “on the fly”, which keeps the program efficient enough to handle large volumes of imagery.


Authors: Klaudia Kościuk and Łukasz Firek, Geoinformatics students at the Faculty of Geology, Geophysics and Environmental Protection, AGH University of Krakow.
The project was carried out as part of the 2024 summer student internship at CloudFerro.