ASDI data catalog overview

The Amazon Sustainability Data Initiative offers several sustainability-oriented datasets through the AWS Public Dataset program. For additional details on this data and technical guidance on how to access it, please visit the Registry of Open Data on AWS (RODA). If you want to get involved, please email sustainability-data-initiative@amazon.com.

Featured Dataset: Sentinel-2
The Sentinel-2 mission is a land monitoring constellation of two satellites that provides global coverage of the Earth's land surface every five days. This data is used in ongoing studies by organizations such as the Blue Dot Observatory, which is establishing a global monitoring system for all at-risk water bodies, and the Radiant Earth Foundation, which is building training datasets to inform machine learning solutions for the global development community.

The datasets presented are organized into twelve main categories:

Weather Forecast Models
Weather forecast data is generated by computer models that predict the future state of the weather. These models output variables such as temperature, precipitation, and other meteorological information about the oceans, land, and atmosphere. This information is valuable to the sustainability community because it can improve the predictive capabilities of emergency managers and planners.

HIRLAM (High Resolution Limited Area Model) Weather Model | managed by the Finnish Meteorological Institute: HIRLAM is an operational synoptic and mesoscale weather prediction model covering the European Union and Greenland.

Global Forecast System (GFS) | V2.0 and V3.0 managed by NOAA: GFS is a weather forecast model produced by the National Centers for Environmental Prediction (NCEP). Dozens of atmospheric and land-soil variables are available through this dataset, from temperatures, winds, and precipitation to soil moisture and atmospheric ozone concentration. The GFS covers the entire globe at a base horizontal resolution of 18 miles (28 kilometers) between grid points, which operational forecasters use to predict weather out to 16 days in the future. Horizontal resolution drops to 44 miles (70 kilometers) between grid points for forecasts between one week and two weeks.

Unidata NOAA Global Forecast System (GFS) Model | managed by Unidata: The GFS is a weather forecast model produced by the National Centers for Environmental Prediction (NCEP). Dozens of atmospheric and land-soil variables are available through this dataset, from temperatures, winds, and precipitation to soil moisture and atmospheric ozone concentration. The GFS covers the entire globe at a base horizontal resolution of 18 miles (28 kilometers) between grid points, which operational forecasters use to predict weather out to 16 days in the future. Horizontal resolution drops to 44 miles (70 kilometers) between grid points for forecasts between one week and two weeks.

NOAA High-Resolution Rapid Refresh (HRRR): The HRRR is a NOAA real-time, 3-kilometer (2-mile) resolution, hourly updated, cloud-resolving, convection-allowing atmospheric model, initialized by 3-kilometer grids with 3-kilometer radar assimilation. Radar data is assimilated in the HRRR every 15 minutes over a 1-hour period, adding further detail to that provided by the hourly data assimilation from the 13-kilometer (8-mile) radar-enhanced Rapid Refresh. It covers the continental United States.

NOAA National Digital Forecast Data (NDFD) | managed by Cornell: Earth and Atmospheric Sciences at Cornell University has created a public data lake of climate data. The data is stored in a columnar storage format (ORC) to make it straightforward to query using standard tools like Amazon Athena or Apache Spark. The data was originally intended for building decision support tools for farmers and digital agriculture. The first dataset is the historical NDFD/NDGD (National Digital Guidance Database) data distributed by NCEP, NOAA, and NWS. The NDFD/NDGD contain gridded forecasts and observations at 2.5-kilometer (1.5-mile) resolution for the Contiguous United States (CONUS). There are also 5-kilometer (3-mile) grids for several smaller U.S. regions and non-contiguous territories, such as Hawaii, Guam, Puerto Rico, and Alaska. NOAA distributes archives of the NDFD/NDGD via its NOAA Operational Model Archive and Distribution System (NOMADS) in GRIB2 format. The data has been converted to ORC to optimize storage space and, more importantly, to simplify data access via standard data analytics tools.

NOAA Global Ensemble Forecast System (GEFS) | managed by NOAA: Previously known as the GFS Global ENSemble (GENS), the GEFS is a weather forecast model made up of 21 separate forecasts, or ensemble members. The National Centers for Environmental Prediction (NCEP) started the GEFS to address the uncertainty in weather observations, which are used to initialize weather forecast models. The GEFS attempts to quantify the amount of uncertainty in a forecast by generating an ensemble of multiple forecasts, each minutely different, or perturbed, from the original observations. With global coverage, the GEFS is produced four times a day, with forecasts extending out to 16 days.
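A common way to summarize an ensemble such as the GEFS is the ensemble mean together with the spread across members: a wide spread signals that small perturbations lead to very different outcomes, so the forecast is less certain. The sketch below assumes the member values for a single grid point and lead time have already been decoded into a plain Python list; the temperatures are made up, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def ensemble_summary(members: list[float]) -> tuple[float, float]:
    """Return (ensemble mean, ensemble spread) for one grid point and lead time.

    The spread is the population standard deviation across members; a larger
    spread indicates lower confidence in the forecast value.
    """
    if not members:
        raise ValueError("need at least one ensemble member")
    return mean(members), pstdev(members)


# Hypothetical 2-m temperatures (deg C) from four members at one grid point.
# A real GEFS cycle would supply 21 members per grid point and lead time.
temps = [14.0, 15.0, 16.0, 15.0]
avg, spread = ensemble_summary(temps)
print(f"mean={avg:.2f} spread={spread:.2f}")  # mean=15.00 spread=0.71
```

Applied grid point by grid point, the same two numbers give a map of the expected value and a map of forecast uncertainty.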

DWD COSMO-D2 | managed by Deutscher Wetterdienst: COSMO-D2 high-resolution, short-range numerical weather prediction model for Germany and adjacent countries; regular grid with 2.2-kilometer (1.3-mile) resolution and 65 vertical levels; updated at 00UTC and every following three hours; forecast range 27 hours (45 hours for 03UTC); selection of commonly used parameters.

DWD COSMO-D2 EPS Ensemble | managed by Deutscher Wetterdienst: COSMO-D2 EPS high-resolution, short-range numerical weather ensemble prediction model for Germany and adjacent countries; 20 ensemble members, regular grid with 2.2-kilometer (1.3-mile) resolution and 65 vertical levels; updated at 00UTC and every following three hours; forecast range 27 hours (45 hours for 03UTC); selection of commonly used parameters; ensemble members are bundled in joint grib files.

DWD ICON Global | managed by Deutscher Wetterdienst: ICON global numerical weather prediction model; average resolution of 13 kilometers (8 miles) with 90 vertical levels; updated at 00UTC and every following six hours with a forecast range of 120 hours (180 hours for 00UTC and 12UTC); selection of commonly used parameters.

DWD ICON Global EPS Ensemble | managed by Deutscher Wetterdienst: ICON global EPS ensemble prediction model; 40 ensemble members; average resolution of 40 kilometers (25 miles); updated at 00UTC and every following six hours with a forecast range of 120 hours (extended to 180 hours for 00UTC and 12UTC); selection of commonly used parameters; ensemble members are bundled in joint grib files.

DWD ICON-EU | managed by Deutscher Wetterdienst: ICON-EU regional numerical weather prediction model; European nesting region with increased resolution of approximately 6.5 kilometers (4 miles) with 60 vertical levels; updated at 00UTC and every following three hours with 120-hour forecast range; selection of commonly used parameters.

DWD ICON-EU EPS Ensemble | managed by Deutscher Wetterdienst: ICON-EU EPS regional ensemble weather prediction model; 40 ensemble members; European nesting region with increased resolution of approximately 20 kilometers (12 miles); updated at 00UTC and every following three hours with 120-hour forecast range; selection of commonly used parameters; ensemble members are bundled in joint grib files.
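The run schedules and forecast ranges quoted for the DWD models can be encoded as a small lookup, for instance to check which forecast length to expect from a given run. This is a sketch based only on the figures in the descriptions above; the model keys are illustrative labels rather than DWD identifiers, and the schedules should be confirmed against DWD's own documentation before being relied on. According to the descriptions, each EPS variant follows the same schedule as its deterministic counterpart, so one table covers both.

```python
# Run interval (hours) per model family, as quoted in the catalog entries.
# The keys are illustrative labels, not official DWD identifiers.
RUN_INTERVAL_H = {"COSMO-D2": 3, "ICON": 6, "ICON-EU": 3}


def run_hours(model: str) -> list[int]:
    """UTC hours at which a model run (cycle) starts, beginning at 00UTC."""
    return list(range(0, 24, RUN_INTERVAL_H[model]))


def forecast_range_h(model: str, run_hour: int) -> int:
    """Forecast range in hours for one run, following the descriptions above."""
    if run_hour not in run_hours(model):
        raise ValueError(f"{model} has no run at {run_hour:02d}UTC")
    if model == "COSMO-D2":
        return 45 if run_hour == 3 else 27
    if model == "ICON":
        return 180 if run_hour in (0, 12) else 120
    return 120  # ICON-EU: 120 hours for every run


for name in RUN_INTERVAL_H:
    print(name, {h: forecast_range_h(name, h) for h in run_hours(name)})
```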

UK Met Office Global and Regional Weather Forecasts: Archive data from the United Kingdom Met Office Global and Regional Ensemble Prediction System (MOGREPS), available on Amazon S3. Data from two models is available: MOGREPS-UK, a high-resolution weather forecast covering the United Kingdom, and MOGREPS-G, a global weather forecast.

Atmospheric Models from Météo-France | managed by OpenMeteoData: Global and high-resolution regional atmospheric models from Météo-France. Dozens of atmospheric variables are available through this dataset, including temperatures, winds, and precipitation. OpenMeteoData's work is based on open data from Météo-France, but OpenMeteoData is not affiliated with or endorsed by Météo-France.

Weather Observations
Weather observations are the fundamental data used to monitor weather and assess potential risks emerging from extreme weather conditions in order to issue weather warnings. They are also used as inputs to weather forecasts. The data is captured manually by a weather observer, automatically using instrumentation in weather stations, or through a hybrid scheme using weather observers to augment the otherwise automated weather station data.

NEXRAD on AWS | managed by NOAA: Real-time and archival data from the Next Generation Weather Radar (NEXRAD) network of the USA.

Global Surface Summary of Day (GSOD) | managed by NOAA: GSOD is a collection of daily weather measurements including temperature, wind speed, humidity, pressure, and more from 9000+ weather stations around the world.

NOAA Integrated Surface Database (ISD) | managed by NOAA: The ISD consists of global hourly and synoptic observations compiled from numerous sources into a gzipped fixed width format. The ISD was developed as a joint activity within Asheville's Federal Climate Complex. The database includes over 35,000 stations worldwide, with some having data as far back as 1901, though the data show a substantial increase in volume in the 1940s and again in the early 1970s. Currently, there are over 14,000 "active" stations updated daily in the database. The total uncompressed data volume is around 600 gigabytes; however, it continues to grow as more data are added. ISD includes numerous parameters such as wind speed and direction, wind gust, temperature, dew point, cloud data, sea level pressure, altimeter setting, station pressure, present weather, visibility, precipitation amounts for various time periods, snow depth, and various other elements as observed by each station.

Climate Change
Climate data includes both observations and modeled data. For data observations to be considered climate quality, they need to include a time series of measurements of sufficient length (generally 30 years or more), consistency, and continuity because this information is used to determine climate variability and change. Climate model data is most commonly available as climate data projections. This information is designed to evaluate the behavior of the global climate system and is relatively effective at simulating global climate characteristics such as global temperature and broad circulation patterns.
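As a rough illustration of the length and continuity criteria, the sketch below refuses records shorter than 30 years or with missing years, then expresses each year as an anomaly, that is, its departure from the record's mean. The 30-year threshold comes from the text above; using the full-record mean as the baseline and treating "no missing years" as continuity are simplifying assumptions for illustration, not a standard definition of climate quality. The data and the function name are hypothetical.

```python
from statistics import mean

MIN_CLIMATE_YEARS = 30  # "generally 30 years or more", per the text above


def annual_anomalies(series: dict[int, float]) -> dict[int, float]:
    """Return each year's departure from the mean over the whole record.

    Raises ValueError if the record is shorter than MIN_CLIMATE_YEARS or has
    gaps, two simple stand-ins for the length and continuity criteria.
    """
    years = sorted(series)
    if len(years) < MIN_CLIMATE_YEARS:
        raise ValueError(f"only {len(years)} years; need {MIN_CLIMATE_YEARS}")
    if years[-1] - years[0] + 1 != len(years):
        raise ValueError("record has missing years")
    baseline = mean(series.values())
    return {y: series[y] - baseline for y in years}


# Hypothetical annual mean temperatures rising 0.02 deg C per year, 1991-2020.
record = {1991 + i: 10.0 + 0.02 * i for i in range(30)}
anoms = annual_anomalies(record)
print(round(anoms[1991], 2), round(anoms[2020], 2))  # -0.29 0.29
```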

NASA NEX: A collection of Earth science datasets maintained by NASA, including climate change projections and satellite images of the Earth's surface. These include Global Daily Downscaled Climate Projections.

CCAFS-Climate Data: High-resolution climate data to help assess the impacts of climate change, primarily on agriculture. These open-access datasets of climate projections help researchers make climate change impact assessments.

Global Historical Climatology Network Daily (GHCN-D) | managed by NOAA: GHCN-D is an integrated database of climate summaries from land surface stations across the globe that have been subjected to a common suite of quality assurance reviews. Some data are more than 175 years old. This is the daily average version of the dataset, which is updated daily. The datasets are grouped by year.

NOAA Global Historical Climatology Network Hourly (GHCN-H) | managed by NOAA: GHCN-H is a dataset from NOAA that contains daily observations over global land areas. It contains station-based measurements from land-based stations worldwide, about two thirds of which are for precipitation measurement only. Other meteorological elements include, but are not limited to, daily maximum and minimum temperature, temperature at the time of observation, snowfall and snow depth. It is a composite of climate records from numerous sources that were merged together and subjected to a common suite of quality assurance reviews. Some data are more than 175 years old. The data is in CSV format. Each file corresponds to a year from 1763 to present and is named as such.

ECMWF ERA5 | managed by Intertrust: ERA5 is the fifth generation of ECMWF atmospheric reanalyses of the global climate, and the first reanalysis produced as an operational service. It uses the best available observation data from satellites and in-situ stations, which are assimilated and processed using ECMWF's Integrated Forecast System (IFS) Cycle 41r2. The dataset provides all essential atmospheric meteorological parameters, such as air temperature, pressure, and wind at different altitudes, along with surface parameters like rainfall and soil moisture content, and sea parameters like sea-surface temperature and wave height. ERA5 provides data at a considerably higher spatial and temporal resolution than its legacy counterpart, ERA-Interim. ERA5 consists of a high-resolution version with 31-kilometer (19-mile) horizontal resolution and a reduced-resolution ensemble version with 10 members. It is currently available from 2008 onward and will be continuously extended backwards, first to 1979 and then to 1950.

Downscaled Climate Data for Alaska | managed by Scenarios Network for Alaska and Arctic Planning at the International Arctic Research Center, University of Alaska, Fairbanks: This dataset contains historical and projected dynamically downscaled climate data for the State of Alaska and surrounding regions at 20-kilometer (12-mile) spatial resolution and hourly temporal resolution. This data was produced using the Weather Research and Forecasting (WRF) model (Version 3.5). The downscaled inputs were the ERA-Interim historical reanalysis (1979-2015) and both historical and projected runs from two GCMs in the Coupled Model Intercomparison Project Phase 5 (CMIP5): GFDL-CM3 and NCAR-CCSM4 (historical run: 1970-2005; RCP 8.5: 2006-2100).

Scientific Information for Land Owners (SILO) | managed by Queensland Government: SILO is a database of Australian climate data from 1889 to the present. It provides continuous, daily time-step data products in ready-to-use formats for research and operational applications. Gridded SILO data in annual NetCDF format are on AWS. Point data are available from the SILO website.

Energy
The energy category includes datasets that support sustainable energy work such as wind and solar climatologies, energy intensity indicators, or annual consumption, among others.

Wind Integration National Dataset (WIND) | managed by NREL: The WIND is an update and expansion of the Eastern Wind Integration Data Set and Western Wind Integration Data Set. It supports the next generation of wind integration studies.

National Solar Radiation Data Base | managed by NREL: The National Solar Radiation Data Base (NSRDB) is a serially complete collection of hourly and half-hourly values of the three most common measurements of solar radiation—global horizontal, direct normal, and diffuse horizontal irradiance—and meteorological data. These data have been collected at a sufficient number of locations and temporal and spatial scales to accurately represent regional solar radiation climates.
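The three irradiance components in the NSRDB are linked by a standard geometric relation: global horizontal irradiance (GHI) equals diffuse horizontal irradiance (DHI) plus direct normal irradiance (DNI) projected onto a horizontal surface through the cosine of the solar zenith angle. The minimal sketch below, with made-up input values and a hypothetical function name, can serve as a rough consistency check on hourly records; stored values need not close exactly in practice.

```python
import math


def ghi_from_components(dni: float, dhi: float, zenith_deg: float) -> float:
    """Global horizontal irradiance (W/m^2) from its two components.

    GHI = DHI + DNI * cos(solar zenith angle). The direct beam only
    contributes while the sun is above the horizon (zenith < 90 degrees),
    so a negative cosine is clamped to zero.
    """
    cos_z = math.cos(math.radians(zenith_deg))
    return dhi + dni * max(cos_z, 0.0)


# Hypothetical values: DNI 800 W/m^2, DHI 100 W/m^2, sun 60 degrees from zenith.
print(round(ghi_from_components(800.0, 100.0, 60.0), 1))  # 500.0
```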

Land Hydrology
Hydrological data includes both observations and modeled data. Hydrological model data helps monitor and predict hydrological variables in the real-world system that are not easily observed (e.g., surface water, soil moisture, runoff, and groundwater). This data aids the sustainability community's understanding and management of water resources.

NOAA National Water Model Reanalysis | managed by NOAA: The NOAA National Water Model Reanalysis dataset contains output from a 25-year retrospective simulation (January 1993 through December 2017) of version 1.2 of the National Water Model. One application of this dataset is to provide historical context for current real-time NWM streamflow, soil moisture, and snowpack conditions. The Reanalysis data can be used to infer flow frequencies and perform temporal analyses with hourly streamflow output and three-hourly land surface output. The long-term dataset can also be used in the development of end-user applications that require a long baseline of data for system training or verification purposes.

NOAA National Water Model (NWM) Short-Range Forecast | managed by NOAA: The NWM is a water resources model that simulates and forecasts water budget variables, including snowpack, evapotranspiration, soil moisture, and streamflow over the entire continental United States (CONUS). The model, launched in August 2016, is designed to improve the ability of NOAA to meet the needs of its stakeholders (forecasters, emergency managers, reservoir operators, first responders, recreationists, farmers, barge operators, and ecosystem and floodplain managers) by providing expanded accuracy, detail, and frequency of water information. It is operated by NOAA’s Office of Water Prediction. This bucket contains a four-week rollover of the Short-Range Forecast model output and the corresponding forcing data for the model. The model is forced with meteorological data from the High Resolution Rapid Refresh (HRRR) and the Rapid Refresh (RAP) models. The Short-Range Forecast configuration cycles hourly and produces hourly deterministic forecasts of streamflow and hydrologic states out to 18 hours.
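Because the Short-Range Forecast cycles hourly and produces hourly forecasts out to 18 hours, each cycle maps to a fixed list of valid times, which is handy when lining up forecasts from several cycles. The sketch below assumes lead times 1 through 18 (whether a zero-hour step is also stored is not stated here and is left out); the function name and the example cycle are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def short_range_valid_times(cycle: datetime, max_lead_h: int = 18) -> list[datetime]:
    """Hourly valid times for one Short-Range Forecast cycle (leads 1..max_lead_h)."""
    if cycle.minute or cycle.second or cycle.microsecond:
        raise ValueError("cycles start on the hour")
    return [cycle + timedelta(hours=h) for h in range(1, max_lead_h + 1)]


# Hypothetical 23 UTC cycle; its forecasts run into the next day.
cycle = datetime(2020, 6, 1, 23, tzinfo=timezone.utc)
times = short_range_valid_times(cycle)
print(len(times), times[0].isoformat(), times[-1].isoformat())
# 18 2020-06-02T00:00:00+00:00 2020-06-02T17:00:00+00:00
```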

Ocean Forecast Models
Ocean models are numerical models with a focus on the properties of oceans and their circulation. Ocean models play a large role in aiding our understanding of the ocean's influence on weather and climate.

NOAA Ocean Forecast System (OFS) | managed by NOAA: The OFS was developed to serve the maritime user community in a joint project of the NOAA/National Ocean Service (NOS)/Office of Coast Survey, the NOAA/NOS/Center for Operational Oceanographic Products and Services (CO-OPS), and the NOAA/National Weather Service (NWS)/National Centers for Environmental Prediction (NCEP) Central Operations (NCO). OFS generates water level, water current, water temperature, water salinity (except for the Great Lakes), and wind conditions nowcast and forecast guidance four times per day.

Air Quality
Air quality data includes both observed and modeled data. It is used to monitor and predict air quality impacts on human health and the environment. Additionally, it can be used to monitor regulatory compliance.

OpenAQ: Global, aggregated physical air quality data from public data sources provided by government, research-grade, and other sources.

GEOS-Chem Input Data: Input data for the GEOS-Chem Chemical Transport Model. Includes the NASA/GMAO MERRA-2 and GEOS-FP meteorological products, the HEMCO emission inventories, and other small data such as model initial conditions.

Environmental Protection Agency Risk-Screening Environmental Indicators: Detailed air model results from EPA’s Risk-Screening Environmental Indicators (RSEI) model.

Safecast | managed by Safecast: An ongoing collection of radiation and air quality measurements taken by devices involved in the Safecast project.

Earth Observations
Earth observation datasets consist mainly of information derived from satellite observations, but can also include imagery taken from aircraft. The data is typically used for land cover change detection and environmental monitoring, agricultural applications (e.g., crop management supporting food security), inland water monitoring, and flood mapping and management (e.g., post-flood risk analysis, loss assessment, and disaster management).

Sentinel-1 | managed by Sinergise: Sentinel-1 is a pair of European Synthetic Aperture Radar (SAR) imaging satellites launched in 2014 and 2016. Its six-day revisit cycle and ability to observe through clouds make it well suited for sea and land monitoring, emergency response to environmental disasters, and economic applications. (Requester Pays model)
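The six-day revisit comes from flying two satellites on the same orbit: each repeats its ground track every 12 days, and because they are phased half a cycle apart, the same area is seen every 6 days. The idealized sketch below merges two offset 12-day sequences to show this; the start date and function names are hypothetical, and real coverage also depends on latitude and the acquisition plan.

```python
from datetime import date, timedelta


def pass_dates(first: date, repeat_days: int, until: date) -> list[date]:
    """Dates on which one satellite revisits the same ground track."""
    dates, d = [], first
    while d <= until:
        dates.append(d)
        d += timedelta(days=repeat_days)
    return dates


def constellation_dates(first_a: date, repeat_days: int, until: date) -> list[date]:
    """Merged revisit dates for two satellites offset by half a repeat cycle."""
    first_b = first_a + timedelta(days=repeat_days // 2)
    merged = pass_dates(first_a, repeat_days, until) + pass_dates(first_b, repeat_days, until)
    return sorted(merged)


# Hypothetical start date; each Sentinel-1 satellite repeats every 12 days.
dates = constellation_dates(date(2020, 1, 1), 12, date(2020, 1, 31))
gaps = {(b - a).days for a, b in zip(dates, dates[1:])}
print(gaps)  # {6}
```

With `repeat_days=10` the same sketch reproduces the five-day combined revisit quoted for the two Sentinel-2 satellites.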

Sentinel-1 Single Look Complex (S1 SLC) dataset for South Asia, Southeast Asia, Taiwan, and Japan | managed by Nanyang Technological University in Singapore: The S1 SLC dataset contains Synthetic Aperture Radar (SAR) data in the C-Band wavelength. The SAR sensors are installed on a two-satellite (Sentinel-1A and Sentinel-1B) constellation orbiting the Earth with a combined revisit time of six days, operated by the European Space Agency. The S1 SLC data are a Level-1 product that collects radar amplitude and phase information in all-weather, day or night conditions, which is ideal for studying natural hazards and emergency response, land applications, oil spill monitoring, sea-ice conditions, and associated climate change effects. (Requester Pays model)

Sentinel-2 | managed by Sinergise: The Sentinel-2 mission is a land monitoring constellation of two satellites that provide high resolution optical imagery and continuity for the current SPOT and Landsat missions. The mission provides global coverage of the Earth's land surface every five days, making the data of great use in ongoing studies. (Requester Pays model)
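Several datasets here, including Sentinel-2, use the Requester Pays model: the requester, not the bucket owner, pays for requests and data transfer, so requests must be signed with AWS credentials and must explicitly acknowledge the charge. The sketch below shows one way to do that with boto3, whose S3 `get_object` call accepts `RequestPayer="requester"`. The helper name is hypothetical, and the client is passed in so the function can be exercised without network access.

```python
from typing import Any


def fetch_requester_pays(s3_client: Any, bucket: str, key: str) -> bytes:
    """Download one object from a Requester Pays bucket and return its bytes.

    The request must be made with AWS credentials; RequestPayer="requester"
    acknowledges that the caller's account is billed for the request and
    data transfer.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key, RequestPayer="requester")
    return response["Body"].read()
```

In use, pass `boto3.client("s3")` as `s3_client`, with the bucket name and object key taken from the dataset's page on the Registry of Open Data on AWS; no bucket or key is assumed here.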

Landsat 8 | managed by Planet: An ongoing collection of satellite imagery of all land on Earth produced by the Landsat 8 satellite.

Moderate Resolution Imaging Spectroradiometer (MODIS) | managed by the U.S. Geological Survey and NASA: MODIS is a sensor on the Terra and Aqua satellites. MODIS has a low spatial resolution, but a high temporal resolution which makes it good for land change detection. It is also used for forest fire detection.

GOES | managed by NOAA: GOES provides continuous weather imagery and monitoring of meteorological and space environment data across North America. The GOES satellites are geostationary, and their primary function is to support weather forecasting. We host GOES-17 and GOES-18 data.

Unidata GOES-16 | managed by Unidata: GOES provides continuous weather imagery and monitoring of meteorological and space environment data across North America.

National Agricultural Imagery Program (NAIP): High quality 1-meter aerial imagery (from aircraft) captured during the agricultural growing seasons in the continental U.S. (Requester Pays model)

China-Brazil Earth Resources Satellite (CBERS) | managed by AMS Kepler: CBERS acquires multispectral imagery that can be used to detect land use. (Requester Pays model)

DigitalGlobe Open Data Program | managed by DigitalGlobe: Pre- and post-event high-resolution satellite imagery in support of emergency planning, risk assessment, monitoring of staging areas and emergency response, damage assessment, and recovery. Also includes crowdsourced damage assessments for major, sudden-onset disasters.

OpenStreetMap (OSM): OSM is a free, editable map of the world, created and maintained by volunteers. Regular OSM data archives are made available in Amazon S3.

OpenStreetMap Linear Referencing (OSMLR): OSMLR is a linear referencing system built on top of OpenStreetMap. OSM has detailed information about roads around the world and their interconnections, but it lacks a way to assign a stable identifier to a stretch of roadway. OSMLR provides a stable set of numerical IDs for every 1-kilometer (0.6-mile) stretch of roadway around the world. In urban areas, OSMLR IDs are attached to each block of roadway between significant intersections.

Terrain Tiles: A global dataset providing bare-earth terrain heights, tiled for easy usage.

USGS 3DEP LiDAR Point Clouds | managed by Hobu, Inc.: The goal of the 3DEP is to collect elevation data in the form of light detection and ranging (LiDAR) data over the conterminous United States, Hawaii, and the U.S. territories, with data acquired over an 8-year period. This dataset provides two realizations of the 3DEP point cloud data. The first resource is a public-access organization of the data in Entwine Point Tiles format, which is a lossless, full-density, streamable octree based on LASzip (LAZ) encoding. The second resource is a Requester Pays copy of the same data in LAZ (compressed LAS) format. Resource names in both buckets correspond to the USGS project names.

NOAA Global Hydro Estimator (GHE) | managed by NOAA: The GHE provides global mosaic imagery of rainfall estimates from multiple geostationary satellites, which currently include GOES-16, GOES-15, Meteosat-8, Meteosat-11, and Himawari-8. The GHE products include instantaneous rain rate and 1-hour, 3-hour, 6-hour, 24-hour, and multi-day rainfall accumulations.
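Multi-hour accumulations like those in the GHE product list are, conceptually, sums of shorter-period rainfall amounts. The sketch below assumes hourly totals (mm) at a single pixel are already available and groups them into non-overlapping windows. It illustrates how 3-hour or 6-hour totals relate to hourly ones, not how NOAA actually produces the GHE products; the function name and values are hypothetical.

```python
def accumulations(hourly_mm: list[float], window_h: int) -> list[float]:
    """Rainfall totals (mm) over consecutive, non-overlapping windows.

    hourly_mm holds hourly rainfall amounts in time order; a trailing
    partial window is dropped so every total covers exactly window_h hours.
    """
    if window_h < 1:
        raise ValueError("window must be at least one hour")
    full = len(hourly_mm) - len(hourly_mm) % window_h
    return [sum(hourly_mm[i:i + window_h]) for i in range(0, full, window_h)]


# Hypothetical hourly totals at one pixel over 6 hours.
hourly = [0.0, 1.5, 2.5, 0.5, 0.0, 1.0]
print(accumulations(hourly, 3), accumulations(hourly, 6))  # [4.0, 1.5] [5.5]
```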

Natural, Social, and Economic Indicators
Natural, social, and human capital datasets include social, environmental, and economic indicators. Examples of indicators are ecological footprint, poverty rate, life expectancy, and unemployment rate. This information is valuable to the sustainability community for understanding the current status of a region, country, or continent, and the rate of change for environmental, social, and economic measures.

NFA 2017 – Ecological Resource Use and Resource Capacity of Nations from 1961 to 2013: Ecological Footprint vis-a-vis GDP. The National Footprint Accounts (NFAs) measure the ecological resource use and resource capacity of nations from 1961 to 2013. The calculations in the NFAs are primarily based on United Nations data sets, including those published by the Food and Agriculture Organization, United Nations Commodity Trade Statistics Database, and the UN Statistics Division, as well as the International Energy Agency.

U.S. Census Bureau American Community Survey (ACS): U.S. Census Bureau ACS Public Use Microdata Sample (PUMS) available in a linked data format using the Resource Description Framework (RDF) data model.

High Resolution Population Density Maps + Demographic Estimates by CIESIN and Facebook | managed by Facebook: Population data for a selection of countries, allocated to 1-arcsecond blocks and provided in a combination of CSV and Cloud-optimized GeoTIFF files. This refines CIESIN’s Gridded Population of the World using machine learning models on high-resolution worldwide Digital Globe satellite imagery. CIESIN population counts aggregated from worldwide census data are allocated to blocks where imagery appears to contain buildings.

Disasters
Disaster datasets include data used for disaster mitigation and prevention.

Open Earthquake Early-Warnings (OpenEEW) | managed by Grillo: Grillo has developed an IoT-based earthquake early-warning system in Mexico and Chile and is now opening its entire archive of unprocessed accelerometer data to the world to encourage the development of new algorithms capable of rapidly detecting and characterizing earthquakes in real time.

Biodiversity
Biodiversity datasets monitor the count and variability of living organisms (terrestrial, marine, and other aquatic ecosystems) and their ecosystems.

eBird Status and Trends Model Results | managed by the Cornell Lab of Ornithology: The eBird Status and Trends project generates estimates of bird occurrence and abundance at a high spatiotemporal resolution. This dataset represents the primary modeled results from the analysis workflow and is designed for further analysis, synthesis, visualization, and exploration.

Machine Learning
Machine Learning datasets are used for machine-learning research related to sustainability. These include labeled training datasets for supervised and semi-supervised machine learning algorithms.

Africa Soil Information Service (AfSIS) Soil Chemistry | managed by Quantitative Engineering Design: This dataset contains soil infrared spectral data and paired soil property reference measurements for georeferenced soil samples that were collected through the AfSIS project, which lasted from 2009 through 2018. This release includes data collected during Phase I (2009-2013). Georeferenced samples were collected from 19 countries in Sub-Saharan Africa using a statistically sound sampling scheme, and their soil properties were analyzed using both conventional soil testing methods and spectral methods (infrared diffuse reflectance spectroscopy). The two types of data can be paired to form a training dataset for machine learning, so that certain soil properties can be predicted well through less expensive spectral techniques.
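The pairing step described above amounts to joining spectral data and reference measurements on a sample identifier, then fitting a model that predicts the measured property from the spectra. The sketch below is deliberately simplified: it assumes one hypothetical spectral feature per sample keyed by a made-up sample ID and fits a straight line by ordinary least squares, whereas real spectral models typically use the full spectrum. All names and values are illustrative.

```python
def pair_samples(spectral: dict[str, float], reference: dict[str, float]):
    """Join the two tables on sample ID, keeping only samples present in both."""
    ids = sorted(spectral.keys() & reference.keys())
    return [spectral[i] for i in ids], [reference[i] for i in ids]


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired samples")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("feature values must not all be equal")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical data: "s4" lacks a reference value and "s5" lacks a spectrum.
spectral = {"s1": 0.10, "s2": 0.20, "s3": 0.30, "s4": 0.40}
reference = {"s1": 1.0, "s2": 2.0, "s3": 3.0, "s5": 9.9}
x, y = pair_samples(spectral, reference)
slope, intercept = fit_line(x, y)
print(f"{slope * 0.25 + intercept:.2f}")  # prediction for a new sample: 2.50
```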
