Commit Graph

351 Commits

Author SHA1 Message Date
e8edb4a459 Time splitter: Handle first/second half
Close #31
2025-04-05 00:09:39 +02:00
8491b62a83 Validate against time errors in autogenerating translations for times
Close #30
2025-04-04 20:03:59 +02:00
bb2b1c2c32 Update NodaGroup 2025-03-13 00:30:33 +01:00
5054d3c62f Use more rigorous trimming in NodaConsolidatedNamesForPersinst 2025-03-10 04:18:00 +01:00
beba838c0d Correctly handle multibyte hyphens in XXXX-XXXX 2025-03-10 04:13:59 +01:00
54dd958073 See before 2025-03-10 04:05:00 +01:00
5b99304b5c Accept an additional type of hyphen / dash in time splitting 2025-03-10 03:59:44 +01:00
5cce98f15b Extend tests 2025-03-10 03:20:46 +01:00
5036c77f32 Extend test for getting actor ID by life dates + name 2025-03-10 02:18:28 +01:00
e95415be8f Add test for getting actor ID by name with life dates 2025-03-10 01:48:09 +01:00
5192781494 Use Wikipedia API for getting descriptions from Wikipedia rather than
parsing HTML in Wikidata fetcher

Thanks @awinkler
2025-03-09 02:08:26 +01:00
d9d9f7fcdc Continue refactoring tests for time splitter to run provider-based 2025-02-24 14:02:42 +01:00
dbfa0df17f Begin restructuring NodaTimeSplitterTest to use data providers 2025-02-21 10:32:07 +01:00
3409ec7afe Begin adding autotranslation language CRH / Crimean Tatar
Some formatting is still unclear. See https://forum.museum-digital.info/d/52-additional-languages-for-translations-crimean-tatar/9
2025-02-18 17:51:36 +01:00
27ac3f255a Minor typing improvements 2025-02-15 13:36:50 +01:00
9d7d53a858 Disallow fetching from Wikidata disambiguation pages
Close #23
2025-02-13 22:37:17 +01:00
28f6db67ff Disable XML error warnings when parsing unclean inputs from Wikidata 2025-02-13 21:48:07 +01:00
2f3bc5f2fa Prefer Wikipedia page titles over Wikidata labels
Close #28
2025-02-13 21:38:13 +01:00
39362f537a Merge branch 'master' of gitea:museum-digital/MDNodaHelpers 2025-02-13 17:19:43 +01:00
de0357473a Make constant for test language in NodaWikidataFetcherTest public, allowing reuse 2025-02-13 17:19:06 +01:00
ef43270fb2 Map suffixes material and technique to their respective tag relation
types
2025-02-13 14:04:38 +01:00
338e09f001 Add Kannada to list of languages fetched from Wikidata 2025-02-13 13:10:45 +01:00
4cf9eaf4fa Remove superfluous params passed to function 2025-02-13 13:10:30 +01:00
18438251a7 Add functions for getting IDs by any translated entry irrespective of
the language
2025-02-12 17:15:19 +01:00
1cf0f9858a Add tests for loading translations in NodaWikidataFetcher 2025-02-12 16:02:04 +01:00
1d50027809 Make function getWikidataEntity public 2025-02-12 15:48:52 +01:00
d1cee17ef5 Add Telugu to list of languages to fetch in Wikidata fetcher
Close #24
2025-02-12 12:47:02 +01:00
baf7905e0b Map gender Q207959
Q207959 is androgyny, mapping is a preliminary solution
2025-02-03 09:41:16 +01:00
9bf14d7d91 Add search function for getting entries in NodaIDGetter across vocabs 2025-01-31 23:25:40 +01:00
a621534136 Update NodaBlacklistedTerms 2025-01-24 13:45:28 +01:00
51fe9a5e45 Cover more edge cases for splitting time names 2025-01-15 11:49:20 +01:00
9c2eaa2929 Allow splitting 1945-48 2025-01-15 10:35:35 +01:00
546c17031a Make NodaImportLogger more resilient, prevent error in case of duplicate import names 2024-12-12 12:43:11 +01:00
bf22f5541d Retrieve "displayed subject" relationship from suffix "<Motiv>", "[Motiv]" 2024-12-03 16:07:41 +01:00
e036d7881a Add missing strict typing in function params 2024-12-01 22:11:17 +01:00
d8db941485 Disallow tags of name "Nichtmünzliches" (de) 2024-11-24 16:08:14 +01:00
b7bb7364d4 Ensure duplicate time names can be parsed in NodaTimeSplitter (e.g.
1.1.2024-1.1.2024)
2024-11-20 10:02:10 +01:00
4dcd93b947 Better validate input JSON fetched from Wikipedia 2024-11-12 15:36:32 +01:00
c72ad51dda Merge branch 'master' of gitea:museum-digital/MDNodaHelpers 2024-11-11 09:11:35 +01:00
d6dea3e280 Remove use of SESSION in NodaWikidataFetcher 2024-11-11 09:11:15 +01:00
6f7ad13c4e Add class NodaTagRelationIdentifier for parsing tag relation types from
input tag names
2024-11-09 19:44:09 +01:00
48355a6a36 Identify uncertainty before brackets ("Berlin ? (Germany)" > "Berlin
(Germany)" + Uncertain)
2024-11-09 18:42:18 +01:00
7cfe752c94 Handle commas when guessing time certainty 2024-11-09 15:40:27 +01:00
29ca05f552 Properly handle commas at the end of names when guessing certainty 2024-11-09 15:33:49 +01:00
eb371d4270 Ensure times can be split despite spaces at random points in given name 2024-10-23 18:02:23 +02:00
16f36c0852 Improve test coverage 2024-10-10 14:32:55 +02:00
669a8a1459 Add tests for lookup functions by vocabulary references 2024-10-10 14:16:52 +02:00
a9c506497c Respect diacritics when looking up tag, actor, .. IDs 2024-10-10 09:51:28 +02:00
06f13c1a71 Add functions for loading only norm data links from Wikidata for places
+ actors
2024-10-03 16:36:30 +02:00
cd49f194f2 Refactor Wikidata fetcher 2024-10-03 15:56:31 +02:00