eoPortal


Historical Data Archives

Last updated: Jun 27, 2024


Importance of Long Term Data Archives

The preservation of historic Earth observation (EO) data is vital to understanding how both anthropogenic and natural phenomena evolve over long time periods. This data provides valuable snapshots of an array of variables, from climate and atmospheric trends to anthropogenic activities and ecological factors, enhancing monitoring and allowing a greater understanding of the Earth’s processes. Understanding these processes is of enormous value to both scientific modelling and discovery, and to private interests in sectors such as agriculture and natural resource extraction. 5)

One of the most important scientific contributions of long term data archives lies in climate monitoring and modelling. Data collected and preserved over extensive time periods can be used to analyse current climate conditions and identify trends such as anthropogenic global warming. Series of missions such as the American Orbiting Carbon Observatory (OCO, OCO-2 and OCO-3) and the Japanese Greenhouse gases Observing Satellites (GOSAT, GOSAT-2 and GOSAT-GW) provide continuous data streams to monitor climate change. 2) 5) 7)

EO data archives are a vital component of the calibration and validation of climate models, which rely on historical data to calibrate their parameters and validate their accuracy. Long term archives of climate data are used for ‘hind-casting’, where models are calibrated against a known set of past conditions, and receive known inputs of climate conditions at each time step. The model outputs can then be compared against observed conditions, provided by preserved climate data, to test or verify their accuracy. Climate modelling is hugely significant not only in making predictions of future weather patterns, but also in experimenting with climate outcomes. Effective climate models allow the simulation of a multitude of scenarios, for example, projecting the course of continued global warming, as well as the impact of different mitigation strategies, without the cost of real-world implementation. All of this is largely enabled by the preservation of long term Earth observation climate data, highlighting the importance of such records. 1) 7)
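The comparison step in hind-casting can be illustrated with a minimal sketch. It assumes the model output and the archived observations are already aligned as two series of equal length; the function name hindcast_skill and the sample values are hypothetical and chosen only for illustration. Mean bias and root-mean-square error are two common summary scores, not necessarily those used by any particular modelling centre.

```python
import math


def hindcast_skill(simulated, observed):
    """Return (bias, rmse) of a hindcast run against archived observations."""
    if len(simulated) != len(observed) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    errors = [s - o for s, o in zip(simulated, observed)]
    bias = sum(errors) / len(errors)                      # mean signed error
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return bias, rmse


# Hypothetical annual temperature anomalies in degrees C (illustrative only).
observed = [0.10, 0.20, 0.15, 0.30]    # values recovered from an archive
simulated = [0.20, 0.30, 0.25, 0.40]   # model output for the same years

bias, rmse = hindcast_skill(simulated, observed)
print(f"bias={bias:.2f}, rmse={rmse:.2f}")
```

A persistent positive bias, as in this toy case, would suggest that a model parameter needs recalibrating before the model is used for projections.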

Long term data archives for EO data also have vital applications in aiding resource management, particularly in areas such as water quality, availability and sustainability, as well as forest management. Water resources are growing in significance as climate change leads to the aridification of many areas and exacerbates natural disasters such as droughts. Long term Earth observation archives are crucial for monitoring water sources, identifying trends, and alleviating the effects of severe drought. Satellite-based observations of water-leaving reflectance and radiance can describe hydrological parameters such as water temperature, chlorophyll concentration and turbidity. For example, Landsat-8 and Sentinel-2 data has been used to monitor water quality in the Suez Canal. 5) 13)

Figure 1: Total Suspended Matter Concentration in Timsah Lake over time (Image credit: T. Seleem, D. Bafi, M. Karantzia & I. Parcharidis (2022) 13))

This experimental use of existing stored EO data shown in Figure 1 demonstrates the potential for water quality measurement from space, and the ability to identify trends in such variables through the curation of a time scaled dataset. Long-term data archives of water quality measurements can provide valuable understanding of Earth’s hydrological processes, and allow better prediction of and response to instances of severe drought. For example, the Temperature Vegetation Dryness Index (TVDI) is an agricultural drought measurement derived from historical Landsat-8 & 9 data, which was used to predict drought conditions on a regional scale in Tamale, Ghana (Hobart et al., 2024). These predictions allow the creation of early warning systems and drought protocols to alleviate the impacts of natural disasters. Archived data can also support the response to drought: during the elevated drought tendencies experienced by Chad and subtropical eastern Africa in 2005, historical Landsat data was again used to delineate fracture zones, identifying areas more likely to contain groundwater reservoirs. Therefore, the preservation of EO data can also greatly benefit global resource management, by providing time scaled datasets for the identification of pressing environmental trends, such as water source quality and availability, and by contributing to a greater understanding of the Earth’s hydrological processes. 8) 9) 10)
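As an illustration of how an index such as TVDI is computed, the sketch below uses the commonly cited formulation TVDI = (Ts - Ts_min) / (Ts_max - Ts_min), where Ts is the land surface temperature of a pixel and Ts_max and Ts_min are the "dry edge" and "wet edge" temperatures for that pixel's NDVI. This is an assumption about the general method, and the cited study may use a variant. The edge coefficients and pixel values are hypothetical; in practice the edges are fitted to the temperature and NDVI scatter of the archived scenes.

```python
def tvdi(ts, ndvi, dry_edge, wet_edge):
    """Temperature Vegetation Dryness Index for a single pixel.

    dry_edge and wet_edge are (intercept, slope) pairs describing surface
    temperature as a linear function of NDVI along each edge.
    """
    ts_max = dry_edge[0] + dry_edge[1] * ndvi   # driest expected temperature
    ts_min = wet_edge[0] + wet_edge[1] * ndvi   # wettest expected temperature
    if ts_max <= ts_min:
        raise ValueError("dry edge must lie above wet edge at this NDVI")
    value = (ts - ts_min) / (ts_max - ts_min)
    return min(1.0, max(0.0, value))            # clip to the range [0, 1]


# Hypothetical edge coefficients (kelvin) and pixel values, for illustration.
DRY_EDGE = (320.0, -20.0)
WET_EDGE = (295.0, 0.0)

dryness = tvdi(ts=305.0, ndvi=0.5, dry_edge=DRY_EDGE, wet_edge=WET_EDGE)
print(round(dryness, 3))
```

Values near 1 indicate pixels close to the dry edge (water stressed), and values near 0 indicate pixels close to the wet edge.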

Data Storage Techniques and Practices

The practice of long-term data archiving involves preserving datasets from various missions and sources, encompassing space-borne data, telemetry, processed data, and associated scientific and technical information. This approach often entails employing two main categories of archival technologies: hardware systems, which encompass physical computing and storage units, and software systems, which support digital tools for data preservation and management. Hardware systems typically include diverse storage solutions such as tape libraries, disk arrays, and specialised archival media like optical discs or glass-based storage. These technologies ensure redundancy and durability, crucial for safeguarding large volumes of data over extended periods. Software systems complement these hardware solutions by providing robust data management, metadata handling, and access control functionalities. Together, these technologies form a comprehensive framework for securely storing and preserving vast amounts of data, ensuring accessibility and integrity over time. ESA's Space Data Preservation System (SDPS) is an example of this approach, housing duplicate copies of mission-critical data at the ESA Centre for EO in Frascati, Italy. This system integrates advanced hardware and software solutions tailored to the unique requirements of space data preservation, ensuring continuity and accessibility for future scientific research and operational needs. 3) 4) 6)

 

Hardware Systems

Magnetic Tape Storage

Magnetic tape storage provides high capacity at a low cost per terabyte compared to other storage media, making it a highly effective tool for long term data archiving. These systems store data sequentially on magnetic tape housed in cartridges. For example, Linear Tape Open (LTO) Ultrium is a high capacity, single reel magnetic tape technology with a storage capacity of 18 TB uncompressed and 45 TB compressed, and future plans include a model with 1.44 PB of compressed storage. Another benefit of magnetic tape archives is their scalability, through systems known as tape libraries. One example of a tape library is the Quantum Scalar i6000, which provides storage of up to 253.8 PB through a modular design that combines a series of individual magnetic tape storage systems. However, tape libraries and magnetic tape storage devices in general have slower access times than disk based systems, because data is stored sequentially. As the data is stored on magnetic tape, these systems are also extremely vulnerable to magnetic fields and radiation, which can result in damage to or destruction of stored data. 3)
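The capacity figures above translate directly into cartridge counts. The sketch below is simple sizing arithmetic, assuming decimal units (1 PB = 1000 TB), a hypothetical 1 PB archive, and two copies of every file; the function name cartridges_needed is illustrative. The compressed figure depends on how compressible the data is, so the uncompressed count is the safer planning number.

```python
def cartridges_needed(archive_tb, cartridge_tb, copies=1):
    """Number of cartridges required to hold `copies` full copies of an archive."""
    if archive_tb < 0 or cartridge_tb <= 0 or copies < 1:
        raise ValueError("sizes must be positive and copies at least 1")
    per_copy = -(-archive_tb // cartridge_tb)   # ceiling division on integers
    return per_copy * copies


ARCHIVE_TB = 1000          # hypothetical 1 PB archive
LTO_NATIVE_TB = 18         # uncompressed capacity quoted above
LTO_COMPRESSED_TB = 45     # compressed capacity quoted above

native = cartridges_needed(ARCHIVE_TB, LTO_NATIVE_TB, copies=2)
compressed = cartridges_needed(ARCHIVE_TB, LTO_COMPRESSED_TB, copies=2)
print(native, compressed)
```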

Figure 2: Quantum Scalar i6000 tape library (Image credit: Quantum)


Optical Disc Storage

Optical disc storage uses DVD-like discs that record data either by physical engraving or by using polymers and nanophotonic materials, and that are written and read with Blu-ray laser technology. Folio Photonics’ manufacturing process embeds polymers and advanced nanophotonic materials into multi-layer films through co-extrusion, wherein two different plastic materials are extruded through a single die. Folio Photonics’ prototype discs can currently store 0.8 TB to 1.0 TB, and the company aims to achieve capacities of up to 10 TB with a longevity of up to 120 years, although this product is still experimental. Optical discs provide longevity and resistance to environmental factors like light and humidity, making them suitable for archival storage. They are, however, susceptible to physical damage such as scratches, and current capacities may not meet the demands of modern data-intensive applications. Some products aim to address these limitations, for example M-Disc, which engraves data onto a rock-like layer for durability, aiming to provide a lifespan of up to 1000 years and capacities of up to 100 GB per disc. 3)

Figure 3: Folio Photonics optical disc (Image credit: Folio Photonics)

Disk Array Storage

Disk array storage consists of arrays of drives, both solid state drives (SSD) and hard disk drives (HDD), configured for high-capacity and high-performance data storage. Like tape libraries, they combine multiple standalone storage systems and are scalable. For example, Hitachi Vantara's HCP S31 can scale up to 1.3 EB, while NetApp's FAS9500 supports up to 14.7 PB. These systems use RAID (Redundant Array of Independent Disks) configurations and scalable architectures to provide redundancy and high availability, suitable for dynamic data environments requiring frequent access and rapid retrieval. Disk arrays offer faster access speeds compared to tape and optical storage but are generally more expensive per terabyte and require more power and cooling. They are ideal for active data environments and cloud integration but may not be as cost-effective for long-term archival of less frequently accessed data. 3)
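The redundancy that RAID provides can be illustrated with XOR parity, the mechanism behind parity-based levels such as RAID 5. The source does not state which RAID levels the products above use, so this is a generic sketch: the block contents are hypothetical and real arrays operate on fixed-size stripes rather than short byte strings.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as if striped across three drives (hypothetical content).
data = [b"EO-1", b"EO-2", b"EO-3"]
parity = xor_blocks(data)            # stored on a fourth drive

# Simulate losing the second drive and rebuild it from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields the missing block, which is why such an array tolerates the loss of any single drive.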

Figure 4: Hitachi Content Platform (HCP) S Series (Image credit: Hitachi Vantara)

Optical Glass Storage

Optical glass storage, exemplified by Microsoft's Project Silica, stores data in quartz glass, a low-cost medium with an extremely long lifetime. Ultrafast femtosecond lasers write data into the quartz medium, and polarisation sensitive microscopy using regular light reads it back. Because reading uses a different mechanism from writing, data cannot be inadvertently overwritten during reading. This technology achieves volumetric data densities exceeding those of traditional magnetic tapes, with raw capacities exceeding 7 TB in a DVD-sized glass platter. The glass medium is resistant to electromagnetic fields and offers exceptional longevity, estimated at tens to hundreds of thousands of years. Data is stored as patterns within the glass structure, ensuring stability over time. However, the technology is still in development and may require significant further work before widespread commercial adoption. Its advantages lie in its potential for extremely long-term data preservation and robustness against environmental hazards. 3)

Figure 5: Microsoft glass drive storage plate (Image credit: Microsoft)

 

Software Systems

Cloud Based Data Management and Preservation

Cloud-based solutions like LABDRIVE leverage platforms such as Amazon Web Services (AWS) to offer scalable data management and preservation. LABDRIVE, developed by LIBNOVA, operates as a software-as-a-service (SaaS) platform capable of managing vast amounts of data, up to 15.87 PB across 600 million files in a month. These systems are highly scalable, allowing easy expansion of storage capacity as needed without upfront infrastructure costs. They offer flexible access with internet connectivity and typically include robust backup and disaster recovery capabilities inherent in cloud services. However, reliance on internet connectivity introduces potential latency and access issues, and ongoing operational costs can accumulate as data volumes grow. Cloud-based solutions are ideal for organisations that require rapid scalability, global accessibility, and integrated disaster recovery, making them more suitable for dynamic data environments and digital preservation initiatives. 3)

On-Site Data Archive Management

On-premises archive management tools such as Versity Storage Manager (VSM) provide a software structure for data archiving across physical storage devices like tape libraries. VSM uses a standard Portable Operating System Interface (POSIX) to manage archival storage resources, optimising data flow between physical devices and cloud services. These solutions offer control over data security and compliance, as data remains within the organisation's infrastructure. On-site systems are favoured for their predictable costs and direct control over data management policies and access. However, they require significant initial investment in hardware and maintenance, and scalability can be more limited compared to cloud-based solutions. On-premises archive management tools are well-suited for organisations with stringent data governance requirements or regulatory compliance needs, where maintaining physical control over data is critical. 3)

Integrated Data Management

Integrated data management platforms, often with AI capabilities, such as nageruHive from Nageru, offer comprehensive acquisition, processing, and preservation of archived data. These platforms incorporate AI for automatic data classification, metadata extraction, anomaly detection, and data correlation, enhancing efficiency and accuracy in managing large and diverse data sets. Nageru has successfully completed ESA tests to provide a long-term data archive for the Copernicus programme, and is waiting to become an ESA provider. These systems excel in automating complex data workflows and ensuring data integrity through advanced analytics and machine learning. However, integrating AI capabilities requires expertise and may entail higher initial costs for implementation and training. Such platforms are ideal for organisations handling large volumes of data with diverse formats and requiring sophisticated data analytics and preservation capabilities. 3)

Data Management Practices

The CEOS Working Group on Information Systems and Services (WGISS) laid out a set of ten Data Management Principles (DMP), organised into five themes: discoverability, accessibility, usability, preservation and curation. The ten DMPs are shown below.

Table 1: WGISS Data Management Principles 4)

Discoverability:

DMP-1

Data and all associated metadata will be discoverable through catalogues and search engines, and data access and use conditions, including licences, will be clearly indicated.

Accessibility:

DMP-2

Data will be accessible via online services, including, at minimum, direct download but preferably user-customisable services for visualisation and computation.

Usability:

DMP-3

Data will be structured using encodings that are widely accepted in the target user community and aligned with organisational needs and observing methods, with preference given to non-proprietary international standards.

DMP-4

Data will be comprehensively documented, including all elements necessary to access, use, understand, and process, preferably via formal structured metadata based on international or community-approved standards. To the extent possible, data will also be described in peer-reviewed publications referenced in the metadata record.

DMP-5

Data will include provenance metadata indicating the origin and processing history of raw observations and derived products, to ensure full traceability of the product chain.

DMP-6

Data will be quality-controlled and the results of quality control shall be indicated in metadata; data made available in advance of quality control will be flagged in metadata as unchecked.

Preservation:

DMP-7

Data will be protected from loss and preserved for future use; preservation planning will be for the long term and include guidelines for loss prevention, retention schedules, and disposal or transfer procedures.

DMP-8

Data and associated metadata held in data management systems will be periodically verified to ensure integrity, authenticity and readability.

Curation:

DMP-9

Data will be managed to perform corrections and updates in accordance with reviews, and to enable reprocessing as appropriate; where applicable this shall follow established and agreed procedures.

DMP-10

Data will be assigned appropriate persistent, resolvable identifiers to enable documents to cite the data on which they are based and to enable data providers to receive acknowledgement of use of their data.
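The periodic verification called for by DMP-8 is often implemented as a fixity check: a checksum is recorded when a file enters the archive and recomputed later to confirm that the file is still present and unchanged. The sketch below is one minimal way to do this with SHA-256. The manifest layout (a mapping of file path to expected digest), the function names and the sample file are assumptions made for illustration and are not part of the WGISS principles.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest):
    """Return the paths that are missing or whose digest no longer matches."""
    failures = []
    for path, expected in manifest.items():
        if not Path(path).is_file() or sha256_of(path) != expected:
            failures.append(path)
    return failures


# Demonstration on a throwaway file (hypothetical name and content).
with tempfile.TemporaryDirectory() as workdir:
    sample = Path(workdir) / "scene_0001.dat"
    sample.write_bytes(b"archived EO product")
    sample_name = str(sample)
    manifest = {sample_name: sha256_of(sample)}   # recorded at ingestion

    intact = verify_manifest(manifest)            # nothing has changed yet
    sample.write_bytes(b"archived EO produkt")    # simulate silent corruption
    damaged = verify_manifest(manifest)

print(intact, damaged == [sample_name])
```

Reading in chunks keeps memory use constant, which matters for multi-gigabyte products. A production system would also record when each check ran and restore failed files from a second copy.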

 

AVHRR Fundamental Climate Data Records

The Working Group on Information Systems and Services (WGISS) is coordinating an internationally backed effort to create and preserve a 1 km spatial resolution Advanced Very High Resolution Radiometer (AVHRR) global landmass dataset. This program aims to pool 1 km AVHRR data from regional archives, with the potential for open and free access, as well as transcribing AVHRR data from unique media sources and reprocessing existing data. This effort to create a fundamental data record (FDR) for AVHRR observations will make use of a number of different preexisting individual data archives, located at many different organisations from around the world. These are shown in the table below. 6) 11) 14)

Figure 6: The AVHRR instrument has been flown on 18 different missions over more than four decades. (Image credit: UniBern)

 

Table 2: AVHRR FDR Datasets and Sources 13)

Provider | Dataset Description

ESA and University of Bern | Long-term AVHRR series from 1981 to 2021, including more than 260,000 freely available data products

German Aerospace Centre (DLR) | Four datasets limited to Europe: Land Surface Temperature (LST) Nighttime, Sea Surface Temperature (SST), Vegetation Index (NDVI) and LST Daytime

Institute of Geodesy and Cartography (IGIK) | Department of Remote Sensing database of NOAA-derived images and products dating back to 1996

NOAA and USGS | 1 km spatial resolution multispectral dataset of North America from the NOAA satellite series, dating back to 1979

Canada Centre for Mapping and Earth Observation (CCMEO) and Natural Resources Canada (NRCan) | Open source dataset covering Canadian territory from 1981 to 2013

National Meteorological Service (Argentina) | 740 GB dataset of AVHRR data covering South America from 1995 to 2015

National Institute for Space Research (Brazil) | 10 TB dataset covering Brazil and some areas of surrounding nations for the period 1998-2022

University of Hawaii | AVHRR dataset collected from 1990 to 2000 that is stored on tapes at the University of Hawaii

South African National Space Agency (SANSA) | Data covering Southern Africa from 1985 to 2009, derived from NOAA satellite data

Italian Space Agency and University of Rome | Data collected from 2001-2009 of central Africa, stored on Digital Linear Tapes (DLT)

China Meteorological Administration (CMA) | AVHRR dataset will be provided for Chinese territories by CMA

Remote Sensing Department, Information and Research Institute of Meteorology, Hydrology and Environment | AVHRR coverage of Mongolia from 1993 to 1999

Geo-Informatics and Space Technology Development Agency (GISTDA) | GISTDA has agreed to provide archived data for Thailand

Commonwealth Scientific and Industrial Research Organisation (CSIRO) | Stitched archive of data from separate Hobart, Darwin, Melbourne, Perth, Alice Springs and Townsville datasets, with availability from 1981 onwards

 

Figure 7: Metadata extracted from ESA L1C products to build a heat chart on data acquisition frequency per day. Red indicates a smaller number of scenes available, while green indicates a higher number of scenes (up to 50). (Image credit: CEOS WGISS)
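The counting step behind a chart like Figure 7 can be sketched as follows. It assumes that each product's metadata yields an acquisition start time as an ISO 8601 string; the timestamps below are hypothetical, and the actual field names in the ESA L1C metadata are not given in the source.

```python
from collections import Counter
from datetime import datetime


def scenes_per_day(acquisition_times):
    """Count scenes per calendar day from ISO 8601 timestamp strings."""
    return Counter(
        datetime.fromisoformat(stamp).date().isoformat()
        for stamp in acquisition_times
    )


# Hypothetical acquisition start times extracted from product metadata.
timestamps = [
    "2000-01-01T10:15:00",
    "2000-01-01T11:55:00",
    "2000-01-02T10:03:00",
]

counts = scenes_per_day(timestamps)
for day, total in sorted(counts.items()):
    print(day, total)
```

The resulting per-day totals can then be mapped to a colour scale, with days of sparse coverage standing out as candidates for data rescue from other archives.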

In line with the WGISS coordinated efforts for a global 1 km AVHRR dataset, the AVHRR Curation Project, undertaken by European institutions such as the University of Bern, ESA, and the Dundee Satellite Receiving Station, aims to preserve and consolidate nearly 40 years of AVHRR data. This extensive dataset, collected since 1982 by satellites ranging from the NOAA series to EUMETSAT missions, provides crucial continuity and consistency in monitoring Essential Climate Variables (ECVs). By ensuring data accessibility and homogenisation through ESA's Heritage Space Programme (LTDP+), the project enhances the usability of AVHRR Level 1b data for climate research. It includes efforts in data rescue, validation, and metafile generation to maintain data integrity and facilitate comprehensive analyses of climate change impacts over time. The initiative supports global climate monitoring initiatives and projects under ESA's Climate Change Initiative, illustrating the critical role of AVHRR archives in advancing climate science and policy. 14)

References

1) Albani, Mirko, and Iolanda Maggio. “Long-Term Data Preservation Data Lifecycle, Standardisation Process, Implementation and Lessons Learned.” International Journal of Digital Curation, vol. 15, no. 1, 2020.

2) “Amplifying the Global Value of Earth Observation.” World Economic Forum, URL: https://www3.weforum.org/docs/WEF_Amplifying_the_Global_Value_of_Earth_Observation_2024.pdf

3) CEOS-WGISS Data Stewardship Interest Group. “Archive Technology Evolution.” Committee on Earth Observation Satellites, June 2023, URL: https://ceos.org/document_management/Working_Groups/WGISS/Interest_Groups/Data_Stewardship/White_Papers/Archive%20Technology%20Evolution%20White%20Paper.pdf

4) CEOS-WGISS Data Stewardship Interest Group. “Long Term Preservation of Earth Observation Space Data Preservation Guidelines.” March 2023, URL: https://ceos.org/document_management/Working_Groups/WGISS/Interest_Groups/Data_Stewardship/Recommendations/EO%20Data%20Preservation%20Guidelines.pdf

5) “Climate Models | NOAA Climate.gov.” Climate.gov, URL: https://www.climate.gov/maps-data/climate-data-primer/predicting-climate/climate-models

6) Dech, Stefan. “Potential and Challenges of Harmonizing 40 Years of AVHRR Data: The TIMELINE Experience.” MDPI, URL: https://www.mdpi.com/2072-4292/13/18/3618

7) Easterling, David. “Climate Data Challenges in the 21st Century.” ResearchGate, 15 February 2011, URL: https://www.researchgate.net/publication/49826559_Climate_Data_Challenges_in_the_21st_Century

8) Hobart, Marius, et al. “Drought Monitoring and Prediction in Agriculture: Employing Earth Observation Data, Climate Scenarios and Data Driven Methods; a Case Study: Mango Orchard in Tamale, Ghana.” Remote Sensing, vol. 16, no. 11, 2024. MDPI, URL: https://www.mdpi.com/2072-4292/16/11/1942

9) “How Earth Observation Helps Natural Resource Management.” Dragonfly Aerospace, 5 April 2022, URL: https://dragonflyaerospace.com/earth-observation-and-natural-resource-management/

10) “Landsat's Critical Role in Water Management.” Landsat Science, URL: https://landsat.gsfc.nasa.gov/wp-content/uploads/2022/03/LandsatFactsheet_Water_v2_updated_508.pdf

11) Parton, Graham, et al. “Further Professionalising Data Stewardship: engaging with and building on experience, the work of Research Data Alliance PDS Interest Group.” Zenodo, 1 December 2023, URL: https://zenodo.org/records/8305591

12) “Satellite Remote Sensing for Water Resources Management: Potential for Supporting Sustainable Development in Data-Poor Regions.” American Geophysical Union, 29 October 2018, URL: https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2017WR022437

13) Seleem, T., D. Bafi, M. Karantzia, and I. Parcharidis. “Monitoring Using Landsat 8 and Sentinel-2 Satellite Data (2014–2020) in Timsah Lake, Ismailia, Suez Canal Region (Egypt).” Journal of the Indian Society of Remote Sensing, vol. 50, 2022, pp. 2411-2428, URL: https://link.springer.com/article/10.1007/s12524-022-01613-9#Fig6

14) “WGISS-57 | CEOS.” Committee on Earth Observation Satellites, 6 November 2023, URL: https://ceos.org/meetings/wgiss-57/

15) “Why are climate data and evidence important.” World Bank Blogs, 8 December 2011, URL: https://blogs.worldbank.org/en/climatechange/why-are-climate-data-and-evidence-important