Processing Sentinel-5P data using HARP and Python

Air quality is a crucial factor that affects public health. Thanks to satellite observations from TROPOMI onboard Sentinel-5P, nitrogen dioxide (NO2) hotspots can be detected over the Earth every day. However, Sentinel-5P data must be properly prepared before they are useful for further analysis.

From this article, you will learn:

  1. How to find Sentinel-5P data on the CREODIAS platform?
  2. How to use a query to find Sentinel-5P data on a virtual machine (VM) using Jupyter Notebook?
  3. How to process Sentinel-5P data using the HARP toolbox and export a newly created netCDF file?
  4. How to plot the distribution of the NO2 tropospheric vertical column density (NO2 TVCD)?
  5. How to calculate average pollution over specific regions based on a newly created image?

How to find Sentinel-5P data on CREODIAS?
Sentinel-5P data are provided as products. On CREODIAS, data can be obtained at two processing levels:

  • LEVEL 1B, which provides radiances in each TROPOMI band (UV, UVIS, NIR, SWIR),
  • LEVEL 2, which provides trace-gas concentrations (NO2, SO2, O3, HCHO, and CH4), two cloud products (CLOUD and NP) and two aerosol products (Aerosol Layer Height and Aerosol Index).

The products also differ in timeliness:

  • near-real-time (NRTI)
  • offline (OFFL)
  • reprocessing (RPRO).

The Royal Netherlands Meteorological Institute (KNMI), which is responsible for creating the TROPOMI retrieval algorithms, recommends using offline data (OFFL), which are available within a few days of acquisition, or the latest version of the reprocessed data. Offline data are used in this example.
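Because the timeliness stream is part of every Sentinel-5P product name, the recommendation above can be applied automatically to a list of file names. The sketch below assumes the usual field layout of Sentinel-5P names (mission, stream, 10-character product type, sensing start and end, orbit, collection, processor version, production time); the example names are hypothetical. Keeping, per orbit, the OFFL or RPRO file with the newest processor version is our own reading of "the latest version", not a rule stated by KNMI.

```python
import os
import re
from datetime import datetime

# Assumed layout, e.g. (hypothetical name):
# S5P_OFFL_L2__NO2____20230301T103245_20230301T121415_27808_03_020400_20230303T005042.nc
S5P_NAME = re.compile(
    r"^(?P<mission>S5P)_(?P<stream>NRTI|OFFL|RPRO)_(?P<product>\w{10})_"
    r"(?P<start>\d{8}T\d{6})_(?P<end>\d{8}T\d{6})_(?P<orbit>\d{5})_"
    r"(?P<collection>\d{2})_(?P<processor>\d{6})_(?P<produced>\d{8}T\d{6})"
)


def parse_s5p_name(name):
    """Split a Sentinel-5P file name into its fields; timestamps become datetimes."""
    match = S5P_NAME.match(os.path.basename(name))
    if match is None:
        raise ValueError(f"not a Sentinel-5P product name: {name}")
    fields = match.groupdict()
    for key in ("start", "end", "produced"):
        fields[key] = datetime.strptime(fields[key], "%Y%m%dT%H%M%S")
    return fields


def pick_recommended(names):
    """Per product type and orbit, keep the OFFL/RPRO file with the newest processor version."""
    best = {}
    for name in names:
        fields = parse_s5p_name(name)
        if fields["stream"] not in ("OFFL", "RPRO"):
            continue  # skip near-real-time (NRTI) products
        key = (fields["product"], fields["orbit"])
        # processor versions are fixed-width digit strings, so string comparison works
        if key not in best or fields["processor"] > best[key][0]:
            best[key] = (fields["processor"], name)
    return sorted(name for _, name in best.values())
```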

In this example, we searched for Sentinel-5P images (product L2__NO2___) acquired over Poland between 1 and 31 March 2023. After clicking “Search”, 65 products matched the search parameters (the exact number may differ slightly depending on the drawn area of interest, AOI).

Click “Copy query” to save the query, which you will need in a later step (paste it into Notepad or another text editor).

How to use a query to find Sentinel-5P data on a virtual machine (VM) using Jupyter Notebook?

You do not have to download the data to your own machine: you can work with them directly on a virtual machine (VM) or via the Horizon interface: https://horizon.cloudferro.com.

To filter the data you are interested in (Poland, 01.03.2023-31.03.2023), follow the tutorial “Processing Sentinel-5P data on air pollution” from “Import libraries” to “Create a list of merged paths and names of each image”. After these steps, you will have a list of files that meet the conditions defined in Data Explorer.
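The tutorial's own code is not reproduced here; as a rough stand-in, the sketch below shows what "a list of merged paths and names" can look like. It assumes you already have the product file names (for example from the copied query) and a directory that contains them; `data_dir` and the example names are hypothetical. It relies on the fixed character positions of the stream (4-8), product type (9-19) and sensing date (20-28) in Sentinel-5P names.

```python
import os
from datetime import date, datetime


def select_no2_files(data_dir, names, first_day, last_day, stream="OFFL"):
    """Full paths of L2__NO2___ files of one stream whose sensing start date is in [first_day, last_day]."""
    paths = []
    for name in sorted(names):
        base = os.path.basename(name)
        # fixed-width fields: S5P_OFFL_L2__NO2____20230301T...
        if base[4:8] != stream or base[9:19] != "L2__NO2___":
            continue
        sensing_day = datetime.strptime(base[20:28], "%Y%m%d").date()
        if first_day <= sensing_day <= last_day:
            paths.append(os.path.join(data_dir, base))  # merge directory and file name
    return paths


# Usage (hypothetical directory and names):
# files = select_no2_files("/path/to/products", names, date(2023, 3, 1), date(2023, 3, 31))
```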

How to process Sentinel-5P data using the HARP toolbox and export a newly created netCDF file?

Now it is time to run the HARP processing toolbox and create a new netCDF file containing the mean NO2 TVCD over Poland in March 2023. Only measurements with a quality value (qa_value) greater than 0.75 are used, which mainly excludes cloudy areas. In addition, only measurements taken where the wind speed was below 5 m/s are kept, because higher wind speeds could affect the values.
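The filters described above map naturally onto a HARP operations string. The sketch below is our approximation, not the tutorial's exact code. It assumes that HARP's TROPOMI NO2 ingestion exposes qa_value as `tropospheric_NO2_column_number_density_validity` on a 0 to 100 scale and provides the surface wind components needed to derive `surface_wind_speed` (check the HARP S5P_L2_NO2 ingestion definition). It also assumes a recent HARP version whose `harp.import_product` accepts a list of files and a `post_operations` argument. The Poland grid (49-55° N, 14-24.5° E, 0.05° cells) is an illustrative choice.

```python
# Sketch only: average_no2() needs the HARP Python bindings (package `harp`) at run time.

# Applied once all files are imported: average the gridded orbits over time.
POST_OPERATIONS = "bin();squash(time, (latitude, longitude, latitude_bounds, longitude_bounds))"


def build_operations(qa_min=75, max_wind_speed=5.0,
                     lat_min=49.0, lat_max=55.0,
                     lon_min=14.0, lon_max=24.5, step=0.05):
    """Per-file HARP operations: quality and wind filters, then regridding."""
    n_lat_edges = round((lat_max - lat_min) / step) + 1
    n_lon_edges = round((lon_max - lon_min) / step) + 1
    operations = [
        # qa_value > 0.75, expressed on the 0-100 validity scale (assumed mapping)
        f"tropospheric_NO2_column_number_density_validity>{qa_min}",
        # derive wind speed from the wind components, then keep calm scenes only
        "derive(surface_wind_speed {time} [m/s])",
        f"surface_wind_speed<{max_wind_speed:g}",
        "keep(latitude_bounds,longitude_bounds,tropospheric_NO2_column_number_density)",
        # area-weighted averaging of the pixels onto a regular lat/lon grid
        f"bin_spatial({n_lat_edges},{lat_min:g},{step:g},{n_lon_edges},{lon_min:g},{step:g})",
        "derive(latitude {latitude})",
        "derive(longitude {longitude})",
    ]
    return ";".join(operations)


def average_no2(paths, output="mean_no2_2023_03.nc"):
    """Import all files, average them over time and write the result as netCDF."""
    import harp  # imported here so build_operations() works without HARP installed

    product = harp.import_product(paths, operations=build_operations(),
                                  post_operations=POST_OPERATIONS)
    harp.export_product(product, output)
    return output
```

Applying the filters per file (in `operations`) keeps memory use low, because each orbit is reduced to the grid before all orbits are averaged by `bin()` in `post_operations`.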

To achieve this, perform the steps from “Using HARP - Atmospheric Toolbox” to “Import and write a newly created image as a netCDF named "mean_no2_2023_03.nc"” in “Processing Sentinel-5P data on air pollution”.

The new file should be created in your default directory (/home/eouser/). To visualize the results, you can open the file in QGIS or run the code from “Define variable, longitude, latitude and colormap which are going to be visualized” to “Create a visualization” in “Processing Sentinel-5P data on air pollution”.


How to plot the distribution of the NO2 tropospheric vertical column density (NO2 TVCD)?

This part and the next one show how to convert Sentinel-5P netCDF data into a GeoTIFF image and calculate zonal statistics on the CREODIAS platform.

To plot the data and create the GeoTIFF image, follow “Processing Sentinel-5P data on air pollution” from “Import libraries” to “Plot netCDF - NO2 TVCD values, for the area of interest - Poland”. These steps plot the results (below) and create a GeoTIFF image in the default directory.

Mean NO2 TVCD over Poland in March 2023.
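Writing a regular latitude/longitude grid to GeoTIFF mainly requires the geotransform: the north-west corner of the grid and the pixel size. The sketch below computes it from cell-centre coordinates and, as one possible approach (not necessarily the tutorial's), writes the file with rasterio. It assumes a regular grid with at least two cells in each direction, ascending longitudes, a 2-D value array ordered (latitude, longitude), and WGS 84 coordinates; the function names are hypothetical.

```python
def geotransform_from_centers(lats, lons):
    """GDAL-style geotransform (west, dx, 0, north, 0, -dy) from regular cell-centre coordinates."""
    dx = abs(lons[-1] - lons[0]) / (len(lons) - 1)
    dy = abs(lats[-1] - lats[0]) / (len(lats) - 1)
    west = min(lons[0], lons[-1]) - dx / 2    # centre -> outer edge
    north = max(lats[0], lats[-1]) + dy / 2
    return (west, dx, 0.0, north, 0.0, -dy)


def write_geotiff(path, values, lats, lons):
    """Write a 2-D (latitude, longitude) array as a single-band float32 GeoTIFF."""
    # imported here so geotransform_from_centers() works without these packages
    import numpy as np
    import rasterio
    from rasterio.transform import from_origin

    west, dx, _, north, _, neg_dy = geotransform_from_centers(lats, lons)
    data = np.asarray(values, dtype="float32")
    if lats[0] < lats[-1]:
        data = data[::-1, :]  # GeoTIFF rows run from north to south
    with rasterio.open(path, "w", driver="GTiff", height=data.shape[0], width=data.shape[1],
                       count=1, dtype="float32", crs="EPSG:4326",
                       transform=from_origin(west, north, dx, -neg_dy), nodata=np.nan) as dst:
        dst.write(data, 1)
```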

How to calculate average pollution over specific regions based on a newly created image?

You can also calculate zonal statistics based on any newly created image. For example, to calculate the average NO2 pollution in each district of Poland in March 2023, perform the step “Plot mean NO2 TVCD for each Poland's district - create choropleth” from the tutorial and save the result as a .jpg file with “Export image of mean NO2 TVCD for each Poland's district to a .jpg”.

Mean NO2 TVCD over Poland's districts in March 2023.
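At its core, a zonal mean is just "average all grid cells that carry the same region label". The pure-Python sketch below assumes the district polygons have already been rasterized onto the NO2 grid (for example with `rasterio.features.rasterize`), giving a same-shaped grid of district labels, with `None` outside Poland. It computes a plain, unweighted mean of cell centres, ignoring partial cell coverage and cell-area differences, so it is an approximation rather than the tutorial's exact method.

```python
import math
from collections import defaultdict


def zonal_mean(values, zones):
    """Mean of `values` per zone label; NaN cells and cells labelled None are skipped."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None or math.isnan(value):
                continue  # outside every district, or no valid NO2 value
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


# Usage with a tiny hypothetical grid:
# zonal_mean([[1.0, 2.0], [3.0, float("nan")]], [["A", "A"], ["B", "B"]])
```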

Finally, you can put the results (the average NO2 TVCD in each district) into a data frame and save it as a .csv file, using the steps “Put mean NO2 TVCD for each Poland's district to a data frame” and “Export mean NO2 TVCD for each Poland's district to a .csv file”.

The five most and five least polluted districts of Poland in March 2023.
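Ranking the districts and exporting them is a small sorting step. The sketch below uses only the standard library and assumes the per-district means are already available as a `{district: value}` dictionary; the column names are illustrative. With pandas, the equivalent would be a `DataFrame` with `sort_values`, `nlargest`/`nsmallest` and `to_csv`.

```python
import csv
import io


def district_table(means):
    """Rows (district, mean NO2 TVCD) sorted from most to least polluted."""
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


def extremes(rows, n=5):
    """The n most polluted and the n least polluted rows (both in descending order)."""
    return rows[:n], rows[-n:]


def to_csv(rows, path=None):
    """Return the table as CSV text and optionally write it to `path`."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["district", "mean_no2_tvcd"])
    writer.writerows(rows)
    text = buffer.getvalue()
    if path is not None:
        with open(path, "w", newline="") as f:
            f.write(text)
    return text
```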

SUMMARY

With this tutorial, you can efficiently process large amounts of Sentinel-5P data: from defining specific datasets, through processing with the HARP toolbox, to creating and writing new data.

Author: Patryk Grzybowski, Data Scientist at CloudFerro