Tools serving to control and read vocabulary data in a standardized way.

Tools to automatically clean and enrich vocabulary entries at museum-digital

This repository contains a set of tools that can be hooked into an existing application working with museum-digital's structures and libraries to simplify the handling of vocabulary entries.

General applicability

Most scripts in this repository require a DB connection to a museum-digital vocabulary database and are thus unlikely to be useful outside of museum-digital's own ecosystem. The exceptions are src/NodaTimeSplitter.php and src/NodaUncertaintyHelper.php, which work without a database.

NodaTimeSplitter

src/NodaTimeSplitter.php contains a list of rules to reformulate and parse entered time names into an array.
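
The general idea of rule-based rewriting followed by parsing can be sketched as follows. This is a language-neutral illustration in Python, not the PHP implementation: the function name `split_time_name`, the rule list, and the shape of the returned structure are assumptions made for this example and do not reflect the actual API or rules of src/NodaTimeSplitter.php.

```python
import re

# Hypothetical rewrite rules (NOT the ones shipped in src/NodaTimeSplitter.php).
# Each rule turns an incomplete spelling into a fully spelled-out one.
REWRITE_RULES = [
    # "1920-30" -> "1920-1930": expand a two-digit end year using the start century.
    (re.compile(r"^(\d{2})(\d{2})\s*-\s*(\d{2})$"), r"\1\2-\1\3"),
]

SPAN = re.compile(r"^(\d{4})\s*-\s*(\d{4})$")  # e.g. "1920-1930"
YEAR = re.compile(r"^(\d{4})$")                # e.g. "1848"


def split_time_name(name):
    """Return {"start": int, "end": int} for a parsable time name, else None."""
    name = name.strip()

    # Step 1: reformulate the input so that the parsers below can handle it.
    for pattern, replacement in REWRITE_RULES:
        name = pattern.sub(replacement, name)

    # Step 2: parse the (possibly rewritten) name into its parts.
    match = SPAN.match(name)
    if match:
        return {"start": int(match.group(1)), "end": int(match.group(2))}

    match = YEAR.match(name)
    if match:
        year = int(match.group(1))
        return {"start": year, "end": year}

    # Unparsable names are reported as such instead of being guessed at.
    return None
```

Under these assumptions, `split_time_name("1920-30")` and `split_time_name("1920-1930")` yield the same start and end years, while a free-text name that matches no rule yields `None`.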

NodaUncertaintyHelper

src/NodaUncertaintyHelper.php contains lists of indicators for invalid or uncertain inputs, along with functions that use those lists to clean inputs. If, e.g., "Berlin?" has been entered as a place, this actually means that the entered place is "Berlin" and that the entry is uncertain.
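
The "Berlin?" case can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the PHP implementation: the indicator lists and the function name `clean_uncertain_name` are assumptions made for this example, and the real lists in src/NodaUncertaintyHelper.php are likely longer and language-specific.

```python
# Hypothetical indicator lists (NOT the ones in src/NodaUncertaintyHelper.php).
UNCERTAINTY_SUFFIXES = ["?", "(?)"]
UNCERTAINTY_PREFIXES = ["probably ", "presumably "]


def clean_uncertain_name(name):
    """Strip uncertainty indicators and return (cleaned_name, is_uncertain)."""
    cleaned = name.strip()
    uncertain = False

    # Trailing indicators such as "?" mark the entry as uncertain.
    for suffix in UNCERTAINTY_SUFFIXES:
        if cleaned.endswith(suffix):
            cleaned = cleaned[: -len(suffix)].rstrip()
            uncertain = True

    # Leading indicators are matched case-insensitively.
    for prefix in UNCERTAINTY_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):].lstrip()
            uncertain = True

    return cleaned, uncertain
```

With these lists, both "Berlin?" and "probably Berlin (?)" are cleaned to "Berlin" and flagged as uncertain, whereas "Berlin" is returned unchanged and flagged as certain. Returning the flag alongside the cleaned name lets the calling application store the uncertainty separately instead of discarding it.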