28th April 2025

CREODIAS updated with advanced global AI embeddings

The CREODIAS platform has just gained a new set of global AI embeddings, offering the Earth Observation (EO) community access to even more advanced representations of satellite data.

Embeddings are numerical representations of satellite imagery produced by artificial intelligence models; in effect, they are AI transformations of Earth observation data applied at global scale. These datasets retain the key information in the imagery, making it easier for scientists and analysts to work with satellite data, tune AI models and extract valuable insights without having to process large raw datasets.

This is another milestone towards open access to data and AI-based tools for analyzing satellite imagery. The embeddings were developed in collaboration with ESA Φ-lab and Asterisk Labs.

As we reported earlier, embeddings are very popular among AI for EO users. The collection has now been enriched with data generated by three models: MMEarth, DeCUR-S1 and DeCUR-S2.

Total embedding resources after the update:

  • 51 TB of AI embeddings generated from processed Sentinel data,
  • more than 40 billion embedding vectors,
  • processing of 147 TB of raw satellite data,
  • analysis covering more than 15 million Sentinel-1 and Sentinel-2 scenes and more than 16 trillion pixels.

The collection is part of the expanded Major TOM publication standard (https://huggingface.co/Major-TOM), and embeddings are available both on the HuggingFace platform and directly in the EODATA catalogue on CREODIAS.

Data locations (S3 paths in the EODATA directory):

s3://EODATA/auxdata/MajorTOM/embeddings/Core-S2L2A-MMEarth/

s3://EODATA/auxdata/MajorTOM/embeddings/Core-S1RTC-DeCUR/

s3://EODATA/auxdata/MajorTOM/embeddings/Core-S2L1C-DeCUR/

s3://EODATA/auxdata/MajorTOM/embeddings/Core-S1RTC-SSL4EO/

s3://EODATA/auxdata/MajorTOM/embeddings/Core-S2L1C-SSL4EO/

s3://EODATA/auxdata/MajorTOM/embeddings/Core-S2RGB-DINOv2/

s3://EODATA/auxdata/MajorTOM/embeddings/Core-S2RGB-SigLIP/

Example of reading SigLIP S2L2A RGB embeddings from the EODATA catalog using Python:

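import geopandas as gpd

# S3 connection settings for the EODATA endpoint, passed to s3fs via storage_options
s3_variables = {
    "endpoint_url": "https://eodata.cloudferro.com",
    "key": "<INSERT YOUR PUBLIC KEY HERE>",
    "secret": "<INSERT YOUR SECRET KEY HERE>",
}

# Read one Parquet part of the SigLIP embeddings into a GeoDataFrame
df = gpd.read_parquet(
    "s3://EODATA/auxdata/MajorTOM/embeddings/Core-S2RGB-SigLIP/part_00001-00100.parquet",
    storage_options=s3_variables,
)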
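
Once a part has been loaded, a common first step is a nearest-neighbour search over the embedding vectors. The following is only a minimal sketch of cosine-similarity ranking with NumPy: the helper name top_k_similar is hypothetical, and the "embedding" column name in the commented usage is an assumption about the Parquet schema that should be checked against the actual file.

```python
import numpy as np


def top_k_similar(embeddings: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the indices of the k rows of `embeddings` most cosine-similar to `query`."""
    emb = np.asarray(embeddings, dtype=np.float64)
    q = np.asarray(query, dtype=np.float64)
    # Normalise rows and the query so the dot product equals cosine similarity
    emb_unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    q_unit = q / np.linalg.norm(q)
    scores = emb_unit @ q_unit
    # Sort by descending similarity and keep the first k indices
    return np.argsort(-scores)[:k]


# Hypothetical usage with the DataFrame `df` read above
# (assumes the vectors are stored in a column named "embedding"):
# vectors = np.stack(df["embedding"].to_numpy())
# nearest = top_k_similar(vectors, vectors[0], k=5)
```

For collections of this size, an exhaustive search like this is realistic only for a single part or a regional subset; an approximate nearest-neighbour index would be the usual choice beyond that.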

Access on HuggingFace

Because of limitations on hosting terabyte-scale datasets in a HuggingFace repository, the embeddings produced with the MMEarth model are available there only in averaged versions (10×10 pooling). The full collections can be found on CREODIAS.
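
To illustrate what such averaging could look like, here is a minimal sketch of non-overlapping average pooling with NumPy. It assumes that "10×10 pooling" means averaging a spatial grid of feature vectors of shape (H, W, D) over 10×10 blocks; the exact procedure used for the published data is not described here, and the function name average_pool is hypothetical.

```python
import numpy as np


def average_pool(features: np.ndarray, window: int = 10) -> np.ndarray:
    """Average a (H, W, D) feature grid over non-overlapping window x window blocks."""
    h, w, d = features.shape
    if h % window or w % window:
        raise ValueError("grid height and width must be divisible by the window size")
    # Split each spatial axis into (number of blocks, window) and average within blocks
    blocks = features.reshape(h // window, window, w // window, window, d)
    return blocks.mean(axis=(1, 3))
```

Under this assumption, each 10×10 block of vectors collapses into one, so the pooled grid holds 100 times fewer vectors than the original.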

Data links on HuggingFace:

What's next?

In the next stages, we plan to make the Major TOM satellite dataset available on the CREODIAS platform, alongside the embeddings, in a form optimized for rapid access and use. In addition, use cases for the embeddings will be developed, and notebooks with examples of their use will be made available to the community, making it even easier to put the technology into practice.

Team: Jędrzej Bojanowski, CloudFerro; Marcin Kluczek, CloudFerro; Mikołaj Czerkawski, ESA Φ-lab / Asterisk Labs.