This project contains a tool to harvest dataset metadata from the Humanitarian Data Exchange (HDX) and transform it into the OpenGeoMetadata (OGM) Aardvark schema.
Make sure you have Ruby installed on your system.
To run the harvester:
ruby harvester.rb
The script will:
- Check state.json for the last run date.
- Fetch datasets from HDX that have been modified since that date.
- Save the original metadata to metadata-hdx/.
- Transform and save the metadata to metadata-aardvark/.
- Update state.json with the current timestamp.
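
The incremental part of these steps can be sketched as follows. This is a minimal illustration and not the actual code in harvester.rb: the helper names, the `last_run` key in state.json, and the `metadata_modified` field on each dataset record are assumptions made for the example, and the HDX request itself is left out.

```ruby
require "json"
require "time"

# Hypothetical helper: returns the last run time stored in the state file,
# or nil when the file does not exist yet (first run).
def read_last_run(state_path)
  return nil unless File.exist?(state_path)

  value = JSON.parse(File.read(state_path))["last_run"] # assumed key name
  value && Time.parse(value)
end

# Hypothetical helper: keeps only the dataset records modified after last_run.
# With no previous run recorded, every record is kept.
def modified_since(datasets, last_run)
  return datasets if last_run.nil?

  datasets.select { |d| Time.parse(d["metadata_modified"]) > last_run } # assumed field
end

# Hypothetical helper: records the given time as the new last run date.
def write_last_run(state_path, time)
  File.write(state_path, JSON.generate({ "last_run" => time.getutc.iso8601 }))
end
```

Writing the timestamp only after all files have been saved means a failed run is retried from the previous date on the next invocation.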
convert.rb transforms one or more HDX metadata files from metadata-hdx/ into the OGM Aardvark schema and writes the results to metadata-aardvark/.
Pass the input files as arguments:
ruby convert.rb metadata-hdx/some-id.json metadata-hdx/another-id.json
Or convert all files at once using a shell glob:
ruby convert.rb metadata-hdx/*.json
Output files are written to metadata-aardvark/ with the same filename as the corresponding input file.
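
The filename rule can be illustrated with a short sketch. The helper names below are hypothetical and are not taken from convert.rb; the sketch only shows how each input path would map to an output path under the rule described above.

```ruby
# Hypothetical helper: maps an input path to its output path by keeping the
# filename and swapping the directory.
def output_path_for(input_path, output_dir = "metadata-aardvark")
  File.join(output_dir, File.basename(input_path))
end

# Hypothetical helper: builds an input => output mapping for a list of paths,
# such as the arguments passed on the command line.
def output_paths(input_paths, output_dir = "metadata-aardvark")
  input_paths.to_h { |path| [path, output_path_for(path, output_dir)] }
end
```

For example, `output_path_for("metadata-hdx/some-id.json")` gives `"metadata-aardvark/some-id.json"`.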
To run the test suite:
ruby harvester_spec.rb