2025 - a year of EO embeddings?
Author: Dr. Jędrzej Bojanowski, Data Science Manager at CloudFerro
As we conclude 2024, it is clear that Artificial Intelligence (AI) has shown significant potential in transforming our approach to satellite data analysis, though its impact on general Earth Observation (EO) users remains limited. Looking forward to 2025, we anticipate more concrete advancements in EO, particularly through the integration of AI foundation models and embeddings.
Foundation models, powerful AI systems trained on extensive datasets, are being adapted to interpret satellite imagery with promising results. These models show potential in tasks such as object detection and image classification while requiring only minimal additional training.
Embeddings represent a crucial innovation in AI for EO, effectively converting complex satellite images into compact, machine-readable formats. This transformation unlocks a wide range of new possibilities.
Applications of embeddings
Embeddings enable efficient similarity searches for geographical features globally. For instance, a user can select an area of interest on an image that shows a specific phenomenon and quickly find other images showing the same phenomenon. Further, models trained on both images and descriptive texts can generate embeddings that enable natural language queries for specific Earth features. For example, users can simply request "show me an image of deforestation" to retrieve relevant satellite imagery.
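To illustrate the similarity search described above, here is a minimal sketch in Python using NumPy. It assumes that embeddings are stored as rows of an array and that cosine similarity is the comparison measure. The archive is random synthetic data, and the function name top_k_similar and the 128-dimensional embedding size are illustrative choices, not properties of any particular model or dataset.

```python
import numpy as np

def top_k_similar(query, embeddings, k=5):
    """Return indices and cosine similarities of the k embeddings closest to query."""
    # Normalise to unit length so that a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ q
    # Sort by descending similarity and keep the best k.
    order = np.argsort(-sims)[:k]
    return order, sims[order]

rng = np.random.default_rng(0)

# Hypothetical archive: 1,000 image tiles, each with a 128-dimensional embedding.
archive = rng.normal(size=(1000, 128))

# Plant a slightly perturbed copy of tile 42 at index 999 to stand in for
# "another image showing the same phenomenon".
archive[999] = archive[42] + 0.05 * rng.normal(size=128)

# Query with the embedding of tile 42 and retrieve its closest matches.
idx, scores = top_k_similar(archive[42], archive, k=3)
print(idx, scores)
```

The query tile itself ranks first, followed by the planted look-alike. With a model trained jointly on images and text, the query vector could in principle be the embedding of a text prompt instead of an image tile. For archives of many millions of embeddings, an approximate nearest-neighbour index would typically replace the brute-force comparison shown here.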
Embeddings significantly improve the speed and efficiency of machine learning algorithms that use satellite data. For instance, when training a model to predict crop yields, instead of using large amounts of raw data related to vegetation condition, weather, and landscape characteristics, the model can use much smaller embeddings generated from these data sources by a single model. This marks a transformative shift in the effort required to build machine learning models at large scales (e.g., globally), which previously demanded vast amounts of data and immense computational power.
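The idea of training a downstream model on embeddings rather than raw data can be sketched as follows. This is a toy example under stated assumptions: the per-field embeddings and the "yield" target are synthetic, the target is constructed to be linear in the embedding, and ridge regression is simply one lightweight model choice, not a recommended method for real yield prediction. The names fit_ridge, n_fields and emb_dim are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: 500 fields, each summarised by a 64-dimensional embedding.
n_fields, emb_dim = 500, 64
X = rng.normal(size=(n_fields, emb_dim))

# Synthetic target, linear in the embedding plus a little noise (an assumption
# made only so that the example has a known answer).
true_w = rng.normal(size=emb_dim)
y = X @ true_w + 0.1 * rng.normal(size=n_fields)

# Simple train/test split.
X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression without intercept: w = (X^T X + alpha I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

w = fit_ridge(X_train, y_train)
pred = X_test @ w

# Coefficient of determination on the held-out fields.
r2 = 1.0 - np.sum((y_test - pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)
print(f"R^2 on held-out fields: {r2:.3f}")
```

The point of the sketch is the input size: the model sees 64 numbers per field rather than stacks of raw pixels and weather records, so fitting takes a fraction of a second on a laptop. How well this carries over to real data depends on how much yield-relevant information the embedding model actually retains.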
Serving as a radically compressed numerical representation of the information in an image, embeddings significantly aid in data compression. They can also help fill gaps in data across various dimensions, including spatial, temporal, and spectral. By using embeddings, AI models can process and analyze EO data more quickly and efficiently, potentially leading to faster insights and decision-making in fields such as environmental monitoring, urban planning, and disaster response.
Further embedding adoption
For 2025, we foresee several potential developments. Embeddings may start to become more widely adopted alongside traditional satellite imagery products. EO software might begin integrating embedding functionality, potentially streamlining processes for researchers and analysts. The EO community may need to prepare to acquire new skills to leverage these emerging technologies. Looking further ahead, technological progress could lead to on-board satellite processing, generating embeddings directly in orbit. This could significantly benefit time-critical applications.
A particularly promising aspect is the potential democratization of EO. As AI-ready geospatial data becomes more accessible, we may witness an influx of new users developing novel applications. However, challenges remain. We must ensure the interpretability of AI-derived insights and address potential biases in models. What we are certain about is that we are ready to meet the computational requirements of global-scale processing, though some improvements can still be made.
First global embedding dataset for EO
As 2024 concludes, it is worth noting a significant milestone in AI for EO. CloudFerro and the European Space Agency's Φ-lab have released the first global embedding dataset for Earth Observation. This dataset comprises over 170 million embeddings derived from 62 TB of Sentinel-1 and Sentinel-2 data, representing 9.368 trillion pixels, compressed into 1 TB of optimized data. This publicly available dataset can already serve various analytical and testing purposes and will hopefully gather the user feedback that is indispensable for continuing the development of AI-ready products.
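The figures quoted above imply a compression ratio and an average number of pixels per embedding, which the short calculation below works out. It uses only the numbers stated in the text and treats "over 170 million" as exactly 170 million, so the pixels-per-embedding value is an approximate upper bound.

```python
# Figures as quoted in the text.
raw_tb = 62              # Sentinel-1 and Sentinel-2 input data, in TB
embedded_tb = 1          # size of the embedding dataset, in TB
n_pixels = 9.368e12      # 9.368 trillion pixels
n_embeddings = 170e6     # "over 170 million", taken as 170 million here

# How many times smaller the embeddings are than the source data.
compression_ratio = raw_tb / embedded_tb

# Average number of source pixels summarised by one embedding.
pixels_per_embedding = n_pixels / n_embeddings

print(f"Compression ratio: {compression_ratio:.0f}x")                 # 62x
print(f"Pixels per embedding: about {pixels_per_embedding:,.0f}")     # about 55,106
```

In other words, each embedding stands in for roughly 55,000 pixels, and the dataset as a whole is about 62 times smaller than the imagery it was derived from.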
However, we should understand that this is only the beginning, as the released dataset includes global-scale embeddings for only a single moment in time. Calculating embeddings across all satellite time series requires processing petabytes of data. Significant investments are thus needed, but it appears that the increased uptake of satellite data, and the ability to build applications working at a global scale that this will bring, fully justify such investments.
As we approach 2025, a key question emerges: How will the EO community utilize these new data and tools? Will this be the year when AI embeddings become more widely adopted in monitoring our planet? If so, we stand at the threshold of potentially significant changes in Earth Observation. The coming year may bring new applications, tools, and insights that could alter our approaches to environmental monitoring. The year 2025 holds promise for advancements in Earth Observation, though the extent of AI's integration and impact remains to be seen.