Computing & Cloud

Large-scale data storage and access technology to power European and global research and industry

Large research projects and major European initiatives such as Digital Twin Earth require collecting, storing, processing and analyzing huge amounts of data. The good news is that big data storage and access technologies are mature and available. CREODIAS is ready to serve over 2 PB of data daily.

The challenge

The Fourth Paradigm is a concept focused on advancing science through open or broader access to data. In the deluge of new data, it has become not only possible but also necessary to supplement the classical scientific paradigms of observation, theory and simulation with a fourth one: large-scale data exploration, an approach that unifies observation, theory and simulation in one extensive system.

A number of large research projects and initiatives that have emerged in recent years highlight that necessity and show the importance of making data, tools, technologies and platforms accessible. Initiatives like Digital Twin Earth, the Human Brain Project, Digital Twin Manufacturing and many others are frameworks for understanding, modelling and forecasting the behavior of extremely complex systems. For such frameworks to work, it is indispensable to store and manage huge amounts of heterogeneous data and to make it available to multiple user communities through unified, flexible, streamlined interfaces.

Storing and disseminating petabytes or exabytes of heterogeneous data in an open and flexible manner poses a serious technical challenge. Object storage provides a solution that is cost-effective, easily scalable and accessible. Because it has no hierarchical directory structure, it can hold unstructured or extremely diversely structured data. Instead of directories, object storage uses a unique identifier for each object. This convention and the “flat” architecture also allow massive, dynamic scaling of the storage, with nearly unlimited capacity. The challenge is to employ it in a real-life, commercially viable scenario, which requires advanced cataloguing and network services that together guarantee automated, fast and easy access to data stored online for immediate use.
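The flat, identifier-based model described above can be illustrated with a minimal in-memory sketch. It is only a toy: the class name `FlatObjectStore`, its methods and the metadata keys are hypothetical and do not represent the CREODIAS or Ceph interfaces. It shows the idea of addressing objects by a unique identifier while a separate catalogue-style lookup finds them by metadata.

```python
import uuid


class FlatObjectStore:
    """Toy object store: one flat namespace, no directory hierarchy.

    Hypothetical illustration only, not a real storage API.
    """

    def __init__(self):
        # Maps object identifier -> (payload bytes, metadata dict).
        self._objects = {}

    def put(self, data, metadata=None):
        """Store a payload and return its newly generated unique identifier."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = (bytes(data), dict(metadata or {}))
        return object_id

    def get(self, object_id):
        """Return the payload stored under the given identifier."""
        return self._objects[object_id][0]

    def find(self, **criteria):
        """Catalogue-style lookup: identifiers whose metadata match all criteria."""
        return [
            object_id
            for object_id, (_, meta) in self._objects.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]


store = FlatObjectStore()
image_id = store.put(b"raster bytes", {"kind": "image", "year": 2021})
table_id = store.put(b"csv bytes", {"kind": "table", "year": 2021})

print(store.find(kind="image") == [image_id])
print(len(store.find(year=2021)))
```

Because objects are never placed in a directory tree, adding capacity only means adding more identifier-to-object mappings; finding the right object is left to the catalogue, which is why cataloguing services matter so much at scale.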

The solution

CREODIAS currently stores almost 21 PB of Earth Observation (EO) data, ingests on average 25 TB of data daily and disseminates it to more than 6 thousand registered users and countless non-registered ones. Using open-source Ceph software to build its storage, chosen for its ability to build and manage object storage for OpenStack, together with an advanced cataloguing solution, CREODIAS can serve data inside and outside of its cloud via a graphical application (CREODIAS Finder) and a variety of access interfaces. This machine-to-machine interfacing enables stakeholders to use the data in their processing chains in an automated fashion, both on the CREODIAS cloud and on any other infrastructure of their choice.

CloudFerro, the operator of CREODIAS, has recently conducted tests that show the ability to serve 2 PB of data daily from its repositories. It is even possible to double that rate. With all the prerequisites in place, CloudFerro can operate at the scale required by initiatives like Destination Earth: the current 21 PB of EO data, with possible growth to 50 PB in the near future if necessary; benchmarking and test results; and experience from building and operating numerous cloud platforms (Climate Data Store, CODE-DE, WEkEO, EO IPT and others) with combined storage of over 100 PB. Building on expertise and lessons learned from previous projects, we are able to ingest, store, index and disseminate massive amounts of EO data, in the tens and hundreds of petabytes. We can provide easy, remote, broadband and scalable access to online, granular data in a cost-efficient manner. Those are vital capabilities when the fourth paradigm is in force.
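To put the figures above in perspective, the short calculation below converts the daily serving volume into a sustained network rate and estimates how long growth from 21 PB to 50 PB would take. It is a back-of-the-envelope sketch under stated assumptions: decimal units (1 PB = 10^15 bytes), traffic spread evenly over 24 hours, and a constant 25 TB daily ingest with nothing deleted. The function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TB = 10**12  # decimal terabyte, an assumption
PB = 10**15  # decimal petabyte, an assumption


def sustained_gbit_per_s(bytes_per_day):
    """Average rate in Gbit/s if the daily volume is spread evenly over a day."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 10**9


def days_to_reach(current_bytes, target_bytes, ingest_bytes_per_day):
    """Days needed to grow to the target size at a constant daily ingest."""
    return (target_bytes - current_bytes) / ingest_bytes_per_day


# Serving 2 PB per day, and double that rate.
print(round(sustained_gbit_per_s(2 * PB), 1))  # 185.2
print(round(sustained_gbit_per_s(4 * PB), 1))  # 370.4

# Growing from 21 PB to 50 PB at 25 TB per day.
print(round(days_to_reach(21 * PB, 50 * PB, 25 * TB)))  # 1160
```

Under these assumptions, serving 2 PB daily corresponds to an average of roughly 185 Gbit/s, and reaching 50 PB through ingest alone would take a little over three years. Real traffic is bursty, so peak rates would be higher than the average.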