This is my Internship Report: 2 months working at Development Seed


Report on the Summer Internship at Development Seed

Introduction

During my summer internship at Development Seed, I contributed to the EOPF toolkit project, which is part of ESA’s Earth Observation Processing Framework (EOPF) family. My main task was to explore the new Sentinel-1 Zarr data format and demonstrate its potential for temporal analysis and flood monitoring. To become familiar with the tools, I started with the EOPF 101 repository, which provides introductory workflows for early Zarr users. This toolkit served as my entry point to understanding how to access, visualize, and manipulate Sentinel-1 GRD Zarr datasets. As a practical application, I developed a use case focused on the 2024 Valencia floods, where I tested workflows for time series analysis of Sentinel-1 GRD data in Zarr format.

Why Zarr matters

The Zarr format is cloud-native, chunked, compressed, and optimized for handling large, multidimensional datasets. Unlike the traditional SAFE format, Zarr allows users to:

    Store large datasets in the cloud
    Access only the specific variables needed
    Avoid downloading bulky files

For example, in Sentinel-1 GRD products, if only the measurement group is required, Zarr enables access to just that portion of the dataset without loading the entire structure. This makes it particularly suitable for time series analysis, where large volumes of repeated acquisitions need to be processed efficiently.

    Loading only the measurement group, rather than every group in the product

Valencia flood use case

In late October 2024, heavy rainfall caused severe flooding in Valencia, Spain. I chose the event as a case study to test cloud-native SAR workflows. For the analysis, I worked with 14 Sentinel-1 GRD acquisitions spanning the pre-flood, flood, and post-flood phases. Traditionally, this would involve downloading and managing 14 SAFE files. Instead, Zarr allowed me to access the cloud-hosted arrays directly. Using xarray and xarray-sentinel, I constructed a time series dataset and compared backscatter intensity across dates to detect flooded areas:

    Low backscatter (dark values) indicated smooth water surfaces
    Higher values represented urban structures, vegetation, or dry land

By stacking datasets along a new time dimension, it was possible to observe how floodwaters expanded and receded during the event. While this workflow did not aim to produce an official flood map, it demonstrated how Zarr lowers the entry barrier for building multi-temporal SAR analyses.
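The stacking step can be sketched with `xr.concat`, under the assumption that the scenes already share one grid. Everything below is synthetic: the dates, the 4x4 scenes, the pixel location, and the `WATER_THRESHOLD` value are invented for illustration and are not taken from the Valencia analysis.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_scene(flooded: bool) -> xr.DataArray:
    """Synthetic 4x4 intensity scene on a shared grid (invented values)."""
    data = np.full((4, 4), 0.20)
    if flooded:
        data[1:3, 1:3] = 0.01  # smooth water returns very little signal
    return xr.DataArray(
        data,
        dims=("y", "x"),
        coords={"y": np.arange(4), "x": np.arange(4)},
    )


# Made-up acquisition dates: before, during, and after an event.
dates = pd.to_datetime(["2024-01-01", "2024-01-13", "2024-01-25"])
scenes = [make_scene(False), make_scene(True), make_scene(False)]

# Stack the scenes along a new "time" dimension.
stack = xr.concat(scenes, dim=pd.Index(dates, name="time"))

# Backscatter time series at one known pixel.
point_series = stack.sel(x=2, y=1)

# Flag pixels darker than a hypothetical threshold as possible water,
# then count them per date to follow the flooded extent over time.
WATER_THRESHOLD = 0.05
water_mask = stack < WATER_THRESHOLD
flooded_pixels = water_mask.sum(dim=("y", "x"))

print(point_series.values)
print(flooded_pixels.values)
```

A fixed threshold is the simplest possible rule. It only serves here to show how a per-date comparison falls out of the stacked array.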

    Stacked datasets over a new time dimension

    Analysing the flood evolution over a known point

    Plotting most of the intensity images

Technical challenges

One challenge was coregistration. Sentinel-1 GRD acquisitions of the same area do not share the same array shape or coordinate values, which prevents stacking them directly into a single time series. Traditional SAR software (e.g. SNAP) handles this automatically, but to my knowledge no Python library currently supports coregistration of Zarr-based data. To address this, I implemented a DIY coregistration approach using xarray, which aligned the datasets well enough for exploratory analysis. This highlighted both the flexibility of Zarr and the need for continued tool development in the ecosystem.
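One possible way to bring mismatched scenes onto a common grid with xarray alone is a nearest-neighbour `reindex_like`. This is a sketch under stated assumptions, not the approach actually used in the project, and it is a simple regridding rather than true SAR coregistration. The grids, offsets, and `tolerance` value are invented.

```python
import numpy as np
import xarray as xr


def make_scene(ny: int, nx: int, offset: float, fill: float) -> xr.DataArray:
    """Synthetic scene on a 10-unit grid shifted by `offset` (invented)."""
    y = np.arange(ny) * 10.0 + offset
    x = np.arange(nx) * 10.0 + offset
    return xr.DataArray(
        np.full((ny, nx), fill), dims=("y", "x"), coords={"y": y, "x": x}
    )


reference = make_scene(4, 5, offset=0.0, fill=1.0)
other = make_scene(5, 6, offset=2.0, fill=2.0)  # different shape, shifted grid

# Snap the second scene onto the reference grid. Each reference
# coordinate takes the nearest source pixel; anything farther away
# than `tolerance` would become NaN instead of a wrong match.
aligned = other.reindex_like(reference, method="nearest", tolerance=5.0)

# With identical coordinates, the scenes can be stacked along time.
stack = xr.concat([reference, aligned], dim="time")
print(stack.shape)
```

Nearest-neighbour lookup keeps the original pixel values untouched, at the cost of sub-pixel accuracy. That trade-off is usually acceptable for exploratory comparisons but not for precise change detection.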

    Raw values

    Intensity backscatter
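The step from raw values to intensity can be sketched as follows, assuming the raw values are amplitude digital numbers (as is usual for GRD products). Radiometric calibration is left out, and the function name and `floor` parameter are hypothetical:

```python
import numpy as np


def amplitude_to_intensity_db(amplitude, floor=1e-6):
    """Square amplitude to get intensity, then express it in decibels."""
    # Cast to float first so squaring large uint16 values cannot overflow.
    intensity = np.asarray(amplitude, dtype=np.float64) ** 2
    # Clip to a small floor so zero-valued pixels do not produce -inf.
    return 10.0 * np.log10(np.maximum(intensity, floor))


raw = np.array([[0, 10], [100, 1000]], dtype=np.uint16)
db = amplitude_to_intensity_db(raw)
print(db)
```

The decibel scale compresses the large dynamic range of SAR intensity, which makes dark water surfaces easier to tell apart from brighter land in plots.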

Advantages and Limitations

Advantages of using Zarr for SAR workflows:

    Faster processing compared to traditional desktop software (e.g., SNAP)
    Cloud-native storage and direct access to relevant variables
    Seamless integration with Python libraries (xarray, dask)

Current limitations:

    Limited availability of Sentinel-1 GRD Zarr datasets in public STAC catalogs
    Restricted set of SAR operations supported in Python libraries
    Lack of built-in coregistration tools for Zarr

Workarounds include requesting data via the EOPF sample service or converting SAFE products to Zarr format independently using the EOPF CPM tool.

Outlook

The migration of Sentinel-1 data to Zarr represents an important step toward cloud-native SAR processing. Open-source libraries such as sarsen are already expanding the available workflows, progressively replicating operations traditionally performed in SNAP.

Although not all functionalities are yet available, the direction is clear: Zarr enables faster, more flexible, and scalable analysis directly in Python environments. This has significant implications for disaster monitoring, where rapid access to data is critical.

Personal reflection

Through this internship, I gained hands-on experience with Zarr, SAR data handling, and cloud-native workflows. Starting with EOPF 101, I progressed from beginner level to building a complete time series analysis workflow applied to a real-world disaster case.

This project allowed me to contribute to the early stages of cloud-native SAR processing and highlighted the importance of open-source collaboration. The ongoing development of the EOPF ecosystem demonstrates how collective efforts can reshape Earth Observation workflows, and being part of this process was both valuable and motivating.
