The CREODIAS platform has just gained a new set of global AI embeddings, offering the Earth Observation (EO) community access to even more advanced representations of satellite data.
Embeddings are numerical representations of satellite imagery produced by artificial intelligence models. They capture the key information in the imagery in a compact form, making it easier for scientists and analysts to work with satellite data, tune AI models and extract valuable insights without having to process large raw datasets.
This is another milestone towards open access to data and AI-based tools for analyzing satellite imagery. The embeddings were developed in collaboration with ESA Φ-lab and Asterisk Labs.
As we reported earlier, embeddings are very popular among AI for EO users. Now the collection has been enriched with data generated by three models: MMEarth, DeCUR-S1 and DeCUR-S2.
Total embedding resources after the update:
- 51 TB of AI embeddings generated from processed Sentinel data,
- more than 40 billion embedding vectors,
- processing of 147 TB of raw satellite data,
- analysis covering more than 15 million Sentinel-1 and Sentinel-2 scenes and more than 16 trillion pixels.
The collection is part of the expanded Major TOM publication standard (https://huggingface.co/Major-TOM), and embeddings are available both on the HuggingFace platform and directly in the EODATA catalogue on CREODIAS.
Data location in the EODATA repository (S3 paths):
s3://EODATA/auxdata/MajorTOM/embeddings/Core-S2L2A-MMEarth/
s3://EODATA/auxdata/MajorTOM/embeddings/Core-S1RTC-DeCUR/
s3://EODATA/auxdata/MajorTOM/embeddings/Core-S2L1C-DeCUR/
s3://EODATA/auxdata/MajorTOM/embeddings/Core-S1RTC-SSL4EO/
s3://EODATA/auxdata/MajorTOM/embeddings/Core-S2L1C-SSL4EO/
s3://EODATA/auxdata/MajorTOM/embeddings/Core-S2RGB-DINOv2/
s3://EODATA/auxdata/MajorTOM/embeddings/Core-S2RGB-SigLIP/
Example of reading SigLIP S2L2A RGB embeddings from the EODATA catalog using Python:
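import geopandas as gpd

# Replace the placeholders with your own S3 credentials (reading from S3 requires the s3fs package).
s3_variables = {"endpoint_url": "https://eodata.cloudferro.com",
                "key": "<INSERT YOUR PUBLIC KEY HERE>",
                "secret": "<INSERT YOUR SECRET KEY HERE>"}
# Read one Parquet part of the SigLIP embeddings into a GeoDataFrame.
df = gpd.read_parquet("s3://EODATA/auxdata/MajorTOM/embeddings/Core-S2RGB-SigLIP/part_00001-00100.parquet", storage_options=s3_variables)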
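Once a part file is loaded, a common first step is a similarity search over the vectors. The following is a minimal sketch rather than an official recipe: it uses synthetic vectors, and the column name "embedding" mentioned in the comment is an assumption about the file layout that should be checked against the actual data.

```python
import numpy as np


def top_k_similar(embeddings: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k rows most similar to `query` by cosine similarity."""
    # Normalise rows and the query to unit length; the epsilon avoids division by zero.
    emb_norm = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + 1e-12)
    query_norm = query / (np.linalg.norm(query) + 1e-12)
    scores = emb_norm @ query_norm
    # Sort by descending similarity and keep the first k indices.
    return np.argsort(-scores)[:k]


# Synthetic stand-in for real vectors. With the loaded data one might build the
# matrix with np.stack(df["embedding"].to_numpy()), assuming such a column exists.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(100, 16))
nearest = top_k_similar(vectors, vectors[3], k=3)
```

Here the query is one of the stored vectors, so its own index comes back first; in practice the query would be the embedding of a reference scene.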
Access on HuggingFace
Because of limits on hosting terabyte-scale datasets in HuggingFace repositories, the embeddings produced with the MMEarth model are available there as averaged versions (10×10 pooling). The full collections can be found on CREODIAS.
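The source does not describe how the pooling was implemented. One plausible reading, shown here purely as an illustrative sketch, is that a spatial grid of embedding vectors is averaged over non-overlapping 10×10 blocks. The grid shape and the function name are hypothetical.

```python
import numpy as np


def average_pool(grid: np.ndarray, block: int = 10) -> np.ndarray:
    """Average non-overlapping block x block windows of an (H, W, D) embedding grid."""
    height, width, dim = grid.shape
    if height % block or width % block:
        raise ValueError("grid height and width must be multiples of the block size")
    # Split each spatial axis into (number of blocks, block size),
    # then average over the two block-size axes.
    blocks = grid.reshape(height // block, block, width // block, block, dim)
    return blocks.mean(axis=(1, 3))


# Hypothetical 20 x 30 grid of 4-dimensional embeddings.
grid = np.arange(20 * 30 * 4, dtype=float).reshape(20, 30, 4)
pooled = average_pool(grid)  # shape (2, 3, 4)
```

Under this assumption, 10×10 pooling cuts the number of stored vectors by a factor of 100, which is consistent with the stated goal of fitting within repository size limits.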
Data links on HuggingFace:
- https://huggingface.co/datasets/Major-TOM/Core-S2L2A-MMEarth (Core-S2L2A-MMEarth)
- https://huggingface.co/datasets/Major-TOM/Core-S1RTC-DeCUR (Core-S1RTC-DeCUR)
- https://huggingface.co/datasets/Major-TOM/Core-S2L1C-DeCUR (Core-S2L1C-DeCUR)
What's next?
In the next stages, the plan is to make the Major TOM satellite dataset available on the CREODIAS platform, alongside the embeddings, in a form optimized for rapid access and use. In addition, use cases for the embeddings will be developed, and notebooks with examples of their use will be made available to the community, making it even easier to put the technology into practice.
Team: Jędrzej Bojanowski, CloudFerro; Marcin Kluczek, CloudFerro; Mikołaj Czerkawski, ESA Φ-lab / Asterisk Labs.

We invite all users of geospatial data to participate in a competition that aims to recognize the best solutions and apps that provide value through the consolidation, processing and dissemination of spatial data using CloudFerro Cloud and CREODIAS platform services, tools and data.
The goal of the competition is to foster an ecosystem of spatial data producers, adopters and end users on CloudFerro Cloud. Each proposal should describe a solution that combines multiple collections of geospatial data, runs on environments hosted by CloudFerro Cloud and uses CloudFerro AI tools to build user-facing interfaces. It should clearly outline which areas the solution will impact, how, and what value it will bring to them. Applicants should have a proven track record of practical or scientific activities.
All proposals should be submitted as PDFs via email to cloud-competition@cloudferro.com by 31 May 2025.
The authors of the proposals selected by a Jury will receive a technical support package for developing geospatial applications. The package covers computing power, storage and networking tailored to the project’s needs, personalized mentoring, and hands-on technical support. Successfully completed projects will receive additional technical support packages for further development or operations.
For more details, go to the Geospatial Innovation Competition page.
CREODIAS users can now use the Copernicus Data Space Ecosystem (CDSE) SpatioTemporal Asset Catalogue (STAC) to search for Earth observation products. Query results from the CDSE STAC catalogue use the same S3 product paths to the /eodata repository as the CREODIAS platform.
STAC promotes cloud-native access to data by providing a standardized way to catalogue geospatial assets, offering several benefits that align well with cloud computing workflows. Using STAC provides a consistent way to describe geospatial assets, facilitating interoperability for discovery and access to heterogeneous datasets. It is designed to manage massive datasets, providing rich metadata that allows processing large amounts of data without the need to download all the raw assets.
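A search against a STAC API is an HTTP request with a small JSON body. The sketch below builds such a body following the STAC API Item Search conventions, using only the standard library. The endpoint URL and the collection id "sentinel-2-l2a" are assumptions that should be checked against the CDSE documentation, and the bounding box and dates are arbitrary examples.

```python
import json
import urllib.request

# Assumed endpoint; verify against the CDSE documentation before use.
STAC_SEARCH_URL = "https://stac.dataspace.copernicus.eu/v1/search"


def build_search_payload(collection, bbox, start, end, limit=10):
    """Build a STAC API Item Search request body."""
    return {
        "collections": [collection],
        "bbox": list(bbox),  # [west, south, east, north] in WGS84 degrees
        "datetime": f"{start}/{end}",  # closed interval of RFC 3339 timestamps
        "limit": limit,
    }


def search(payload, url=STAC_SEARCH_URL):
    """POST the payload and return the matching items (performs a network request)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["features"]


# Example body: an assumed collection id over an arbitrary area and month.
payload = build_search_payload(
    "sentinel-2-l2a",
    bbox=(20.8, 52.0, 21.3, 52.4),
    start="2025-01-01T00:00:00Z",
    end="2025-01-31T23:59:59Z",
)
```

Calling search(payload) would return the matching items as GeoJSON features; each item's assets carry the links to the underlying product files, so only the products of interest need to be read.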
Explore the catalogue and share your feedback on the CDSE forum. Learn more: https://dataspace.copernicus.eu/news/2025-2-13-release-new-cdse-stac-catalogue
European scientists, researchers, and professionals in educational, public, and non-profit organizations across 39 countries can now procure cloud services via the OCRE 2024 Framework without undergoing a cumbersome public tendering process.
OCRE 2024 is an EU-compliant cloud procurement framework administered by GÉANT, the collaboration of European National Research and Education Networks (NRENs). It aims to improve access to innovative commercial cloud services for research and education institutions across Europe.
CloudFerro is a qualified provider of cloud services via this framework, offering comprehensive sovereign European cloud computing services, which include virtual machines, dedicated servers, GPU support, Kubernetes, and vast storage and data sharing capabilities. All these services are built on open-source technology. Users also benefit from seamless access to the robust GÉANT network. As the operator of CREODIAS, CloudFerro also gives users unlimited access to a repository of over 80 petabytes of Copernicus Earth observation data.
Check out more on the CloudFerro OCRE webpage: https://cloudferro.com/ocre-open-clouds-for-research-environments/
