296 Commits

SHA1 Message Date
55b2b61ef7
Add classes for syncing fulltext indexes in manticore, not mysql directly 2021-12-08 23:00:58 +01:00
24714265c2
Prevent error if wikidata doesn't return a search result 2021-11-30 17:53:24 +01:00
ea280fc144
Add non-empty-string phpdoc types to NodaWikidataFetcher 2021-11-29 22:31:17 +01:00
0ab6f5e608
Add "Anonym" and "Anonymus" to list of disallowed actor names 2021-11-21 01:39:57 +01:00
054dc731f1
Disable libxml errors when parsing Wikipedia information 2021-11-18 23:53:11 +01:00
d0c4bfcf1f
Add more blacklisted tag names (e.g. "weitere", "other") 2021-10-16 19:27:43 +02:00
99a5303773
Merge branch 'master' of gitea:museum-digital/MDNodaHelpers 2021-10-10 13:22:44 +02:00
790dcd92fa
Add [vermutlich] to list of uncertainty suffixes for actors 2021-10-10 13:22:13 +02:00
581d9c7079
Use "d" for coordinates fetched through wikidata, remove useless parentheses 2021-10-10 12:32:55 +02:00
edbe0230af
Add "research_note" to list of accepted edited sections in noda edit log 2021-10-09 14:10:10 +02:00
ca7424b043
Add NodaNameGetter for batch retrieval of names 2021-09-16 01:09:10 +02:00
fe31a29159
Add "vermtl. " to list of uncertainty prefixes 2021-08-27 18:50:51 +02:00
fb327762dc
Add capability to split English decade terms (1920s) 2021-08-27 16:19:19 +02:00
bb4e2a727a
Add "c. " to list of uncertainty prefixes 2021-08-27 15:24:50 +02:00
7d89596286
Add option to save edits to name variants (for actors) in edit log 2021-08-19 12:23:08 +02:00
a0b6207f81
Add missing htmlspecialchars in Wikidata results list 2021-08-15 20:03:25 +02:00
6d60d9eec7
Significantly extend the timeout for SPARQL queries to Wikidata 2021-08-13 13:07:29 +02:00
87fd2a25df
Add functions for identifying Wikidata IDs by external IDs 2021-08-12 15:33:48 +02:00
8eb576f43d
Extend list of known genders to parse from Wikidata 2021-08-12 13:59:20 +02:00
aa9f307c55
Expect minified JS file when injecting JS into wikidata results pages 2021-08-10 14:35:25 +02:00
0167890147
Add option to inject JS on wikidata results lists 2021-08-08 17:38:21 +02:00
e773bab7ce
Allow a special wikidata results list for actors, making suggestions based on birth and death dates 2021-08-07 17:38:49 +02:00
e69be5b2b1
Use === over == in more cases 2021-07-24 23:21:00 +02:00
d269b6644b
Extend blacklist of disallowed tag names 2021-07-14 22:01:46 +02:00
0f2f7b2787
Add the different variants of "verschiedenes" ("various" in German) to tag blacklist 2021-07-14 21:58:14 +02:00
7e27f15515
Remove specific blacklist file for tags 2021-07-14 13:46:27 +02:00
f930ca794e
Add a list of blacklisted tags 2021-07-14 13:37:18 +02:00
0fa759c604
Add check against empty wikidata / wikipedia descriptions 2021-07-06 12:50:33 +02:00
fba4706b67
Add check against empty source ID in references to controlled vocabularies in wikidata fetcher 2021-07-06 12:15:31 +02:00
af13f747b7
Allow listed and searched Wikidata entries to be without descriptions 2021-07-03 15:18:02 +02:00
c56ae6ce66
Merge branch 'master' of gitea:museum-digital/MDNodaHelpers 2021-07-01 15:36:55 +02:00
3a5790853c
Add ", um" to the list of suffixes to indicate a time entry being uncertain 2021-07-01 15:35:18 +02:00
bd3851ccf4
Use separate function for generating overview lists in NodaWikidataFetcher 2021-06-30 22:55:37 +02:00
2c0d8e041e
Add "nicht benannt" to list of unwanted place and actor names 2021-06-30 14:36:11 +02:00
062c0d12dc
Extend list of known unwanted actor names 2021-06-29 15:37:05 +02:00
7bd315aa0c
Add "ism." as a shorthand for ismeretlen to list of disallowed place names

Ismeretlen means unknown / no information in Hungarian and is thus a non-value.
2021-06-29 13:50:09 +02:00
b6b2bbccff
Extend list of empty place names to remove 2021-06-18 23:30:03 +02:00
d244065dbe
Update base update time of places when editing a place 2021-05-27 02:33:51 +02:00
ce453a3d69
Allow noda_link as a loggable section in NodaLogEdit 2021-05-27 00:35:15 +02:00
a50610b640
Fix wrong table name 2021-05-27 00:34:56 +02:00
bc3f2a94d6
Add function for getting tags by base name, log base edits for tags in NodaLogEdit 2021-05-26 22:36:02 +02:00
a4f24e5478
Add logging of synchronization with wikidata 2021-05-26 17:12:15 +02:00
8a30cf2c2a
Add class NodaLogEdit for easily logging updates to the main noda tables 2021-05-26 16:24:20 +02:00
6a91e31f41
Improve handling of timespans 2021-05-13 23:00:23 +02:00
2a8ba31410
Import place hierarchy from Wikidata
Close #5
2021-05-11 01:37:49 +02:00
874cfb8a6f
Extend time splitter to handle e.g. "17./18. Jh." 2021-05-07 16:25:50 +02:00
db6953ca51
Move syncers to src/Sync subdirectory 2021-05-06 23:37:31 +02:00
74caf06280
Use lowercase spelling instanceof over instanceOf 2021-05-06 23:34:55 +02:00
c84e3401ff
Add classes for keeping the fulltext (search) tables in sync 2021-05-06 23:24:43 +02:00
041d3598eb
Allow using Wikidata links for fetching information for actors 2021-05-05 01:26:32 +02:00
6dcdb3aff6
Add function for assembling display names by given name and family name 2021-05-04 23:57:30 +02:00
bde3c2cb9e
Add a class NodaNameSplitterTest, for now splitting names into given name and family name 2021-05-04 23:04:41 +02:00
1dd05a3822
Add blacklist for unwanted tag names
Close #4
2021-04-25 00:16:53 +02:00
9157e8a0f1
Add fix for empty noda references in fetching tags from Wikidata 2021-04-24 01:13:27 +02:00
e1a9a99797
Use ++$i over $i++ outside of loops in Wikidata fetcher
This is a slightly more performant way of incrementing an integer.
2021-04-12 12:54:07 +02:00
792754c20c
Fetch orcid IDs in wikidata fetcher 2021-04-07 11:33:49 +02:00
e957db4210
Add condition to split times like "xxxx bis yyyy" 2021-03-26 12:32:27 +01:00
c964053c91
Add function for reading Wikidata ID from a Wikipedia page 2021-03-18 01:23:45 +01:00
1fd87c7e6d
Simplify NodaWikidataFetcher, unify list of langs, simplify linking to noda sources
Close #2
2021-03-17 22:06:08 +01:00
f0b5a08cdf
Move NodaWikidataFetcher to this repository 2021-03-17 16:11:06 +01:00
1fe795d219
Use mysqli->autocommit(false) to speed up autotranslating 2021-03-08 21:23:38 +01:00
668477f199
Add missing check in NodaTimeAutotranslater 2021-01-31 19:39:09 +01:00
7ccdfd4659
Fix function comment in NodaTimeSplitter 2021-01-31 01:50:25 +01:00
aca4f86da5
Add "Neu" and "Neu hergestellt" to list of disallowed time entries 2021-01-29 20:03:06 +01:00
a761a9dfd7
Stop time splitter for start / end, if common time splitter can be used 2021-01-07 11:43:20 +01:00
c02165df7b
Add exception catching in splitting times / dates 2021-01-06 23:11:05 +01:00
54764e741a
Add option to split and translate times with start and end dates
Close #1
2021-01-06 23:05:26 +01:00
fcc63c4ea0
Merge branch 'master' of gitea:museum-digital/MDNodaHelpers 2021-01-06 16:07:46 +01:00
a6030e4a5f
Fix bug in month names similar in English and German 2021-01-06 16:07:21 +01:00
7ef09db72c
Add static function in NodaIDGetter to get tag ID by import log 2021-01-04 23:06:36 +01:00
9f67d253da
Add functions to get actor and place IDs by import logs 2021-01-04 22:50:26 +01:00
8f612dede1
Read 1917-ig. as similar to 1917-ig in time splitter 2020-12-28 14:40:04 +01:00
6e910cd676
Add English month names for splitting time terms 2020-12-22 12:22:14 +01:00
8ac22165fc
Add "ohne angabe" to list of disallowed terms 2020-12-21 15:32:16 +01:00
d8e44550fc
Add "Ohne Datum" to list of disallowed time terms 2020-12-21 15:15:00 +01:00
af454ec013
Set up ID getter by rewrite for tags to return arrays
Tag rewrites can now be set for multiple target tags.
2020-12-20 23:24:08 +01:00
fce933c12a
Extend list of disallowed noda terms 2020-12-20 15:40:30 +01:00
b27f0ec918
Add "Keine Angaben" to list of disallowed inputs for places 2020-12-19 02:37:38 +01:00
a070970554
Remove empty newlines in class defs 2020-12-19 02:36:38 +01:00
ca13f36c0d
Add function to get tag IDs by their translated names 2020-12-07 13:43:07 +01:00
50ff1a2339
Add script to get highest related tag 2020-10-30 16:30:23 +01:00
0ea9c31845
Explicitly use global namespace in function calls 2020-10-23 17:03:51 +02:00
14e82826ae
Fix bug in getting place IDs by noda links 2020-10-05 12:05:48 +02:00
99aa1d74ad
Improve type safety and make it more explicit 2020-10-04 23:59:40 +02:00
97566ea2d9
Split more time variations 2020-10-04 23:57:59 +02:00
8a4a8f7ed8
Split more variations of dots in dates, century ranges 2020-10-04 23:20:58 +02:00
d0fe1e89ed
Improve trimming of inputs when cleaning certainty indicators 2020-10-04 22:52:15 +02:00
1f4d692fb5
Enable automatic translations of times "before" a given date 2020-10-04 19:34:17 +02:00
1685d78f65
Allow splitting times "before <X>" 2020-10-04 19:27:23 +02:00
a0037c9883
Allow splitting times after <year><month> 2020-10-04 19:17:18 +02:00
be46c39efd
Fix wrong assumption on handling counting times when autotranslating "after <month>" 2020-10-04 18:36:03 +02:00
c9a1a74bce
Enable autotranslating of times 'after' a certain date 2020-10-04 18:21:33 +02:00
5e90e5d3f2
Add strings for expressing times 'after' and 'before' 2020-10-04 17:40:51 +02:00
2a57537436
Allow splitting times "Nach 1905" ("Nach " followed by a 4-digit number) 2020-10-04 17:39:34 +02:00
36d27e0f73
Remove / disallow certain input names in NodaUncertaintyHelper 2020-10-04 02:40:21 +02:00
4e934e380c
Use [0-9]{4} spelling for times 2020-10-03 19:13:27 +02:00
ff35ca7bd9
Enable time splitter to deal with some Roman numerals 2020-10-03 16:10:43 +02:00
80cd88222d
Enable time splitter to recognize sz as abbr. for század 2020-10-03 15:59:49 +02:00
3664bcf3f6
Add getting places by noda links to NodaIDGetter 2020-10-01 12:53:27 +02:00
67cc76cff9
Allow splitting of German short decade names: 20er or 1920er 2020-09-27 17:12:34 +02:00
91f435a2e4
Enable parsing of months (2020-01) 2020-09-27 17:10:17 +02:00
de7968fbbd
Only allow splitting by international format if month < 13 2020-09-27 12:38:38 +02:00
48f3bd2c3f
Allow splitting international dates (2020-12-20) 2020-09-27 12:36:34 +02:00
830b37f547
Improve autotranslating of times before 1.1.1000 2020-09-26 16:10:26 +02:00
c9d8d4bdbd
Allow automatic translations of days before 1000 CE 2020-09-26 16:02:18 +02:00
b405855fc2
Disallow translating as decade before 1000 CE 2020-09-26 15:30:30 +02:00
8eda7d4c7f
Improve type-safety 2020-09-26 15:21:32 +02:00
2b8b5d5743
Add check for improved type safety 2020-09-26 15:19:04 +02:00
3058f25a1c
Add tests for German dates, enable splitting of 5-digit timespans 2020-09-26 15:10:06 +02:00
14b0d8037d
Add tests for splitting Hungarian dates 2020-09-26 14:15:15 +02:00
d56d47aee1
Add test for NodaTimeAutotranslater, allow parsing days and months BC 2020-09-26 13:20:22 +02:00
cb2eff61a3
Use local representation of DB connection in NodaTimeAutotranslater 2020-09-26 12:21:00 +02:00
7a1dcbb14f
Fix 2020-09-26 10:23:37 +02:00
1cfbfe7743
Improve type-safety 2020-09-25 22:29:56 +02:00
68d07c03d8
Allow splitting timespans BCE 2020-09-25 09:00:54 +02:00
7bc5bdf335
Allow parsing of "1910-1925." 2020-09-24 17:59:15 +02:00
9f39437c6e
Improve splitting and translating of times BC 2020-09-24 17:35:40 +02:00
785b1c5156
Allow parsing of single-digit century spans also in the form "1-3. század" 2020-09-24 15:49:42 +02:00
1668495573
Add missing abbreviations for Hungarian months, parse -tól 2020-09-24 15:45:50 +02:00
ddaa31646c
Enable splitting of es évek years 2020-09-24 13:24:48 +02:00
36d8257ca0
Fix problem in last-syllable-dependent time suffixes in Hungarian (as évek vs. es évek) 2020-09-24 11:54:01 +02:00
d7e2c7f4ed
Add automatic translation of decade names 2020-09-24 11:47:54 +02:00
a4a94a8f8a
Allow autotranslation of time spans before 1000 CE 2020-09-23 17:03:12 +02:00
0f6a6ebc84
Add automatic splitting and translation of centuries (CE) 2020-09-23 10:28:04 +02:00
308e11b4f8
Add automatic translation of times since and until another time 2020-09-22 22:46:48 +02:00
4f1e65934a
Enable NodaTimeSplitter to split dates with uncertain end or start (seit, bis) 2020-09-22 17:58:26 +02:00
707f781f1e
Fix attempt to parse 5-digit times as German dates 2020-09-22 11:23:21 +02:00
b8dbfb32df
Add körül as a known time uncertainty indicator (suffix) 2020-09-22 11:03:39 +02:00
dd2fbafd25
Improve type-safety / explicitness 2020-09-21 10:49:34 +02:00
e4558ae227
Add trim to place names for checking uncertainty also at start of checker 2020-09-21 02:11:58 +02:00
e53eec84e6
Add functions for cleaning uncertainty indicators to NodaUncertaintyHelper 2020-09-21 01:57:21 +02:00
923505f146
Add NodaUncertaintyHelper for guessing uncertainty of noda entries 2020-09-21 01:24:07 +02:00
7bbd50a586
Add class NodaIDGetter for collecting functions to identify noda entities by available attributes 2020-09-20 22:40:40 +02:00
8f7df866d7
Fix bug in splitter causing false positives 2020-09-20 18:50:26 +02:00
ce6e388866
Fix 2020-09-20 18:32:48 +02:00
d1c9e6e15f
Fix missing output value in some cases of time splitting 2020-09-20 18:04:41 +02:00
f268ab412c
Add capability to parse dates like "2300-800 v. Chr." (German) and "Kr. e. 1200" (Hungarian) 2020-09-20 17:40:45 +02:00
8da158aa77
Enable splitting of timespans of three-digit year names BC (in German) 2020-09-20 17:33:56 +02:00
c0047a5956
Enable splitting of 4-digit times BC (in German) 2020-09-20 17:32:20 +02:00
974fe39cde
Allow translations of times before 1000 CE 2020-09-20 17:13:42 +02:00
fd0bd48995
Strip away variations of n. Chr. from time strings for splitting 2020-09-20 15:52:02 +02:00
affe8e3741
Teach time splitter to handle multi-year time spans in CE 2020-09-20 15:42:59 +02:00
c298794a32
Improve type-safety 2020-09-18 21:38:49 +02:00
4aa2a5df2f
Add automatic translation of month names for main time names in splitter 2020-09-18 21:24:21 +02:00
130140e910
Remove "közott" for timespans 2020-09-18 19:17:10 +02:00
f05938c867
Initial 2020-09-18 18:48:40 +02:00