- DATA: `"unknown"` allowed for Language speaker counts where canonically there is no data (note: by contrast, omitting the speaker attribute communicates that the data is incomplete in that regard)
- DATA: No commas in language names/preferred_names
- DATA: Nepali and Zaza added as macrolanguages
- DATA: Removed stray Latin chars in Japanese Hiragana orthography
- TWEAK: Upgraded to `pyproject.toml` config
- TWEAK: `Checker._get_checks_for_orthography` redefined as class method, to allow easier inspection of which checks are opted into for an orthography
- TWEAK: `Check` classes now have a `precheck` method to initialize and indicate a pass early on
- LICENSE: Relicensed under Apache License 2.0
- FIX: `Shaper.check_joining` refined to be more lenient and to not fail fonts with other than one-to-one positional substitutions or general sequence-manipulating `ccmp` code
- FIX: Using a `--status` that does not include 'living' now correctly omits these languages
- TWEAK: Better output with `-v`: skipped languages and orthographies are now logged along with the reason for skipping
- FEATURE: Added `-t/--shaping-threshold` that allows fine-tuning conjunct check failures by accounting for conjunct frequency
- FEATURE: Added `--no-shaping` flag to disable shaping checks entirely (shaping checks are on by default)
- DATA: Modified multitudes of `design_requirements`, dropped `design_alternates`
- DATA: `jpn` Latin orthography marked as secondary
- DATA: Introduced `combinations` orthography attribute
- DATA: `hin` and `mai` include syllable `combinations` with frequency distributions (1: most common, 0: least common)
- TWEAK: `design_requirements` can now be either a string, or a dict of `note` + `alternates` (detailing which characters are affected)
- TWEAK: Minor tweak to logging in `Orthography`
- TWEAK: `Checker._check_shaping` with better pre-check to skip mark attachment checks for glyphs not in the font (obvious)
- TWEAK: Cleaned up multiple CLI options:
  - Added `--check` option to replace the `--support` level option. `--check` takes any value of `base`, `auxiliary`, `punctuation`, `numerals`, `currency`, `all` or a comma-separated list of those
  - Removed `--include-historical` and `--include-constructed` in favor of `--status`, which accepts any combination of `LanguageStatus` or "all" and defaults to "living"
  - Removed `--include-all-orthographies` in favor of `--orthography`, which accepts any combination of `OrthographyStatus` or "all" and defaults to "primary"
- TWEAK: Better logging output and logging strategies in the CLI/modules, `-v` provides basic language in/out and config logs, `-vv` gives very detailed support logs
- TWEAK: Removed the deprecated `STATUSES`, `ORTHOGRAPHY_STATUSES` and `SUPPORTLEVELS` from the codebase
- TWEAK: Dropped Python 3.8 & 3.9 from supported environments, added 3.13, 3.14
- DATA: Fixes to `cbi` (thanks @moyogo)
- TWEAK: Fixed `hyperglot-export` command for dumping expanded database
- DATA: Minor refinements to `fin`, `ces` and `nav`
- DATA: Design requirements updated for `bos` and `srb` as well as some Cyrillic breve mentions
- TWEAK: Improved inherited type where the original value is a YAML list
- TWEAK: Added parameter to instantiate a `Language` and force a reload of the data, ignoring the `.hyperglot-cache`
- FIX: Fixed `hyperglot-data` error
- DATA: All language yaml documents now have their `contributors` listed, some have `reviewers` listed
- DATA: Massive improvement of language `sources` with proper source citations where possible
- DATA: Added `punctuation`, `numerals` and `currency` attributes to orthographies; checking for these attributes will be added in the next update!
- DATA: Added `lib/hyperglot/extra_data/default.yaml` to include inheritable defaults per script
- DATA: Refined `jpn`, `ryu` and `ain` Katakana orthographies
- FEATURE: Orthography attributes can inherit from other languages with `<iso>` syntax, see README
- TWEAK: Improved loading time for repeat access by saving a cache file of parsed language data
- TWEAK: Orthographies can no longer have an `inherit` attribute
- TWEAK: Improved loading speed for repeat queries and individual language queries
- TWEAK: Refactored `Languages`, `Language` and `Orthography` object instantiation to always return parsed and defaulted nested objects
- TWEAK: Removed the `--speakers` and `--autonym` CLI options
- TWEAK: Removed the `--comparison` CLI option (see `examples` instead)
- TWEAK: Removed the `--languages` CLI option, use `hyperglot-info LanguageName/ISO` instead
- TWEAK: Removed the `--strict_iso` CLI option; use the Python library to access this option, particularly `Language.get_name(script, strict_iso=True)`
- FIX: Fixed an issue where trying to log missing shaping glyphs would crash in `FontChecker`
- FIX: Improved mark shaping detection to interpret `ccmp` substitutions of base + mark as correctly shaping (thanks @arialcrime)
- TWEAK: Cleaned up `hyperglot.language.Language` class and added attribute properties for dict properties with computed defaults (as opposed to writing defaults for missing attributes) as well as more code annotation
- TWEAK: `hyperglot.orthography.Orthography` object has `script_iso` attribute returning the mapped ISO 15924 script tag
- DATA: Added `lib/extra_data/script-names.yaml` with a list of all current Hyperglot scripts and a mapping to their ISO 15924 code equivalent
- DATA: Added di/tri-graphs to Czech and Hungarian orthographies and fixed their order
- DATA: Added Squamish (`squ`) (thanks @justinpenner)
- DATA: Unified "Geʽez" script with reversed comma, as opposed to previous mixed use of "Ge'ez/Fidel" and "Ge'ez"
- DATA: Amended spelling of the "Tai Viet" script to title case to match other script names
- DATA: Corrected spelling of "Bamum" script and language (instead of the less common "Bamun" previously used in Hyperglot)
- DATA: Use "Coptic" instead of "Coptic/Numbian" script name
- DATA: Use "Burmese" script for language "Mon"
- DATA: Use "Baybayin" script name instead of "Tagalog (Baybayin, Alibata)"
- DATA: Fixed Toki Pona (`tok`) file name
- TWEAK: Make sure `Orthography.base_chars` and `Orthography.aux_chars` return no duplicates for decomposed character sequences
- TWEAK: Define `Languages`, `Language` and `Orthography` as module top-level exports for easier importing, e.g. now: `from hyperglot import Language`
- FIX: Set correct default values for `Language.status` and `Orthography.preferred_as_group` and provide validation and tests for these.
- TWEAK: Deprecated plain list `SUPPORTLEVELS`, `VALIDITYLEVELS`, `STATUSES`, `ORTHOGRAPHY_STATUSES` and replaced them with `SupportLevel`, `LanguageValidity`, `LanguageStatus`, `OrthographyStatus` enums throughout the code base. The deprecated values will be removed in the next minor version.
- TESTS: Added simple tox config for running tests on all supported minor Python versions
- FIX: Fixed type hinting issue causing failure on python 3.8.x
- DATA: Added Banjar (`bjn`) (thanks @mahalisyarifuddin)
- DATA: Expanded Xavánte (`xav`) data (thanks @moyogo)
- DATA: Refined Romanian by adding `design_alternates` explicitly
- DATA: Refined Klingon (`tlh`) orthography and added a draft version of Toki Pona (`tok`)
- FEATURE: Implemented shaping checks for mark positioning when required by unencoded base + mark combinations or `--decompose`
- FEATURE: Implemented shaping checks for connecting scripts to detect presence of required positional forms
- FEATURE: Implemented `hyperglot-report` command with the same options as `hyperglot` and the additional parameters/flags `--report-missing n`, `--report-marks n` and `--report-joining n` (or `--report-all n` to toggle all of the aforementioned) for outputting languages almost supported by the font
- TWEAK: Support checking is now done via `hyperglot.checker` objects for cleaner separation between language data and checking fonts
- TWEAK: Various Python APIs and objects changed and refactored
- TWEAK: Bumped required Python version to 3.8.0
- DATA: Added Tlingit `tli` language data (thanks @jcrippen)
- DATA: Fixed inconsistent note about `Ŋ` in various languages (thanks @moyogo)
- TWEAK: Improved `hyperglot-validate` to spot lookalike characters in the wrong script, e.g. `a` (Latin U+0061) vs `а` (Cyrillic U+0430)
- TWEAK: Explicitly ignore non-yaml files (e.g. operating system or other files) in the data when parsing
- TWEAK: Improved `hyperglot-validate` command to better catch yaml issues (thanks for reporting @jcrippen)
- DATA: Removed orthography status `deprecated`, using `historical` for those instances instead
- DATA: Added Ethiopic languages `awn`, `byn`, `gez`, `har`, `sgw`, `tig`, `xan` and updated `tir` (thanks @dyacob and @NeilSureshPatel)
- DATA: Added Avestan
- DATA: Corrections to `jbo` (thanks @berrymot)
- DATA: Updated `sco` primary orthography (thanks @moyogo)
- DATA: Some fixes to `kkj` orthography (thanks @moyogo)
- DATA: Small note added to Dagbani (thanks @clauseggers and @moyogo)
- DATA: Fix to Shan (`shn`) containing some stray Latin characters
- FIX: Fix issue with file name conflicts on Windows systems
- FIX: Fix pypi missing data files
- FEATURE: Added `-l/--language` flag to show supported/not supported glyphs of a font for specific languages
- DATA: Restructured `hyperglot.yaml` into individual files for each language in `hyperglot/data/xxx/xxx.yaml`
- DATA: Fix two auxiliary glyphs in Georgian which were swapped uppercase / lowercase by mistake
- DATA: Small charset fixes to Kom `bkm` and Southern Samo `sbd` (thanks @moyogo)
- DATA: Small tweak to Afrikaans `afr` (thanks @iandoug)
- DATA: Added languages and scripts for: Ainu, Akkadian, Ancient Egyptian, Mycenaean Greek, Linear A, Linear B, Minoan, Pontic Greek, Okinawan, Sumerian, Klingon, Minaen, Hadramautic, Qatabanian and Sabaean (big thanks to @gusbemacbe !)
- DATA: Added Kayah autonym
- DATA: Added design requirement note for `Ŋ`
- DATA: Improved Georgian, added Mtavruli and auxiliary
- DATA: Added historical orthographies for German and English that use `ſ`
- FIX: Fixed orthography of Thai to not require `◌̍ ◌̎` in base checks
- FIX: Fixed missing script attribute in 'lee' orthography
- FIX: Fixed typo in 'Oriya' script name
- FEATURE: Implemented `hyperglot-data` CLI command to search and display language information returned by Hyperglot
- FEATURE: Implemented more convenient language access via attributes on `hyperglot.languages.Languages`, e.g. `Languages().eng` to access a `hyperglot.language.Language` object for "eng"
- DATA: Fix in Standard Malay encoding of `'` (thanks M. Mahali Syarifuddin and Caleb Maclennan)
- DATA: Added numerous Burkina Faso and other African languages (another huge thanks to @moyogo !)
- DATA: Added Oriya
- DATA: Added Kartvelian languages (kat, sva, xmf, lzz) (thanks Ana)
- DATA: Dozens of African and North-American languages added and refined (thanks @moyogo !)
- DATA: Refined English `auxiliary`
- DATA: Fix for Pinyin
- CLI: Introduced `--sort` (`alphabetic`, default, or `speakers`) and `--sort-dir` (`asc`, default, or `desc`)
- DATA: Fix for Skolt Sami (soft sign)
- DATA: Fix for Hawaiian (okina)
- DATA: Fix for Thai including several missing marks and letters
- DATA: Fix in Buginese
- DATA: Updates to Indonesian and Standard Malay
- DATA: Fix for Turkish orthography
- DATA: Fix for Afrikaans orthography
- DATA: Corrected ISO code for Gen language
- DATA: Added Benin languages
- DATA: Small fix to Portuguese
- DATA: Revised Tamil orthography
- DATA: Added Apinayé, Karo and Awetí languages
- FIX: Fixed an encoding issue affecting Windows environments
- DATA: Fixed typos in Buginese
- DATA: Reviewed Minangkabau orthography
- DATA: Added Batak languages and refined Balinese
- FIX: Further improvement to detection of orthographies with unencoded base + mark combinations
- TWEAK: Refined the returned properties of `hyperglot.language.Orthography` to include base and auxiliary lists of encoded characters as well as required marks for
- TOOLS: Added scraper for fetching a mapping of OpenType language systems to ISO codes and saving them in `other/languagesystems.yaml`
- DATA: Renamed `design_note` to `design_requirements` and made its data structure a list
- DATA: Introduced `design_alternates`: a list of characters which may require special design in a font supporting an orthography
- DATA: Added `design_alternates` for several Cyrillic and Latin languages
- DATA: Corrected speaker count for Manipuri
- DATA: Updates to Andaandi and Old Nubian
- DATA: Minor formatting and duplicate fixes
- FIX: Fixed a parsing issue that led some languages to require marks in their support check as if the `--marks` flag was used
- TWEAK: `hyperglot.language.Language` no longer prunes or parses any character lists; this is instead done when running the support checks by instantiating an `Orthography` object and using it for checking, leaving the dict representation of the yaml data in the `Language` untouched
- FEATURE: Introduced `hyperglot.language.Orthography` abstraction for easier access to parsed lists vs. raw yaml character strings
- TESTS: More refactored Languages, Language and new Orthography tests
- DATA: Changed the way `marks` and decomposition are handled in the data entry and saving
- DATA: `base` and `auxiliary` may now contain unencoded base + mark character combinations without those getting decomposed on saving
- DATA: Updated approximately 50-100 languages which previously had unencoded base + mark combinations not saved in their character sets, since those were not Unicode characters; this update added and retains those unencoded combinations for more comprehensive listing of the orthographies
- DATA: Marks are now always placed on `◌` in the data for easier readability
- CLI: Default checking (without `-m`) no longer requires implicit combining marks, meaning those which are retrieved from decomposing the characters; the default check still requires marks that are explicitly listed in `marks` and are not the result of decomposing the characters
- CLI: Introduced `-m/--marks` as a flag to require all marks for a support level check
- CLI: Changed `-m/--mode` to `-c/--comparison`
- TWEAK: Removed `hyperglot.parse.prune_superflous_marks` as no longer needed
- TWEAK: Introduced `hyperglot.parse.parse_marks`
- TWEAK: Removed `prune` and `pruneRetainDecomposed` flags from `Languages()` and changed the default call to `Languages()` to no longer prune or parse its dict contents
- TWEAK: Only calls to `Language()` now parse the orthography data (with default `True` for argument `parse`)
- TWEAK: Renamed methods `hyperglot.languages.get_support_from_chars` to `supported` and `hyperglot.languages.has_support` to `supported`
- TWEAK: Added warnings and validation checks for multiple inheritance levels (e.g. A inherits from B inherits from C should instead be A inherits from C)
- Data: Updated Ter Sami orthography as inheriting from Kildin Sami
- Data: Fixes to Kildin Sami
- Data: Some fixes to Marshallese
- Data: Added Ottoman Turkish and a transliteration orthography for it
- Data: Added Hanunoo
- Data: Replaced Single right comma (and other variants) with Modifier letter apostrophe for some Sami languages
- FIX: Fixed an issue that caused some fonts to fail parsing (#24)
- TWEAK: Allow inheriting an orthography without explicitly having a script present in the orthography, this will inherit the primary script orthography of the parent
- DATA: Updated language data for Nubian languages and Japanese
- DATA: Introduced `transliteration` orthography status (started in 0.2.10)
- DATA: Updated language data for Minang (xrg), Tamil (tam), Cherokee (chr), Tagalog (tgl), Aja (ajg), Khmer (khm), Madurese (mad), Javanese (jav) and others
- FIX: Reverted hotfix from 0.2.9 and implemented validation to use iso yaml file only for editable package installs and emit warning
- FIX: Refined `--decompose` and fixed an issue where the decompose option ended up returning more stringent matches than the default
- FIX: `--output` output refactored to no longer expect the result to be structured by support levels
- TWEAK: Refactored multiple file input result intersection and union
- TESTS: Better tests relating to decomposed output
- TESTS: Added tests for multiple file input intersection and union results
- HOTFIX: Prevent error message about missing file in CLI use
- FIX: Fixed inheritance when it chains, e.g. Algerian Arabic inheriting from Tunisian Arabic which inherits from Standard Arabic
- FIX: Fixed inheritance missing `marks`, `design_notes` and `note`
- TWEAK: Make sure `marks` are saved in ordered form, so saving does not arbitrarily alter the order
- TESTS: Added tests for orthography inheritance
- DATA: Constrained speaker counts to integers only
- DATA: Fixed various speaker counts containing malformed data
- DATA: More design notes for Latin-script languages
- DATA: Khmer added as draft, Armenian, Buginese, Georgian, Burmese, Lao and Thai refined
- TWEAK: Implemented validation for speaker count data
- DATA: Various status updates, notes and reviewed orthographies
- DATA: Introduced `marks` attribute containing all combining marks needed for an orthography
- FEATURE: Automatically extract and save `marks` from `base` data, plus retain any explicitly added `marks` in the data
- TWEAK: Default `hyperglot-save` calls automatically run validation to flag any remaining issues
- TWEAK: Flag legacy marks being used in charset data
- DATA: Introduced `design_note` parameter
- DATA: Various language data updates and smaller fixes
- DATA: Several orthography fixes, thanks Denis Moyogo Jacquerye
- TWEAK: Changed orthography status names to `todo`, `draft`, `preliminary`, `verified`
- TWEAK: Improved `Language.get_orthography` to return better default picks and allow getting orthographies of specific script or status
- First `pip` release :)
- FEATURE: Implemented `--include-all-orthographies` to check all but `deprecated` orthographies and changed default behaviour to only list `primary` orthographies
- TWEAK: Implemented treating orthographies with `preferred_as_group` as one for checks
- TWEAK: Languages with multiple `primary` orthographies will match if one is supported
- TWEAK: `Languages` can be initiated with `pruneRetainDecomposed` to keep any precomposed characters from the database when using `prune` (which decomposes them to base + mark)
- TWEAK: Improved tests for CLI and improved and fixed some parsing tests
- FIX: Marginal cases fixed where using `parse_chars` on already parsed lists would merge a mark with a preceding base glyph and result in an erroneous list of base/aux characters
- DATA: Added uppercase to bicameral scripts
- DATA: All languages now have a `primary` orthography
- DATA: Introduced `preferred_as_group` orthography attribute
- TESTS: Config to ignore other libraries' warnings
- TWEAK: `Languages()` now takes a `validity` argument to filter by validity ('weak' or better by default)
- TWEAK: `parse_chars` now puts the decomposition components of characters in the input list at the end of the list
- TWEAK: Languages require an orthography that has status `primary`
- DATA: Updated and added many scripts and languages and their speaker counts
- FEATURE: Added `--decomposed` flag that determines if a font is required to have all glyphs of a language as code points, or if supporting all combining marks is sufficient
- TWEAK: Renamed module and database to `hyperglot`
- TWEAK: `--strict-support` refactored to `--validity` with default `weak` to pick the level of required validity on the languages that should get matched
- TWEAK: Saving and validating enforces removal of superfluous mark characters that are getting implicitly extracted via glyph decomposition
- TWEAK: Detection automatically extracts all required mark glyphs for languages and the database has been pruned of any no longer required mark glyphs listed. Using `hyperglot-save` will apply this pruning and save the database in its cleaned up state
- TESTS: Added tests for the Language and Languages class
- TESTS: Added test for the CLI options running against actual font files
- DOCS: Overhauled and updated the README to all latest changes
- FIX: Refined character parsing to also include the encoded form of any decomposable glyphs
- FIX: Improved character set parsing from database properly decomposing any combining characters into their parts and checking against those
- TESTS: Added first pytest for above case
- FEATURE: Added `--strict-support` flag (default False) to explicitly trigger warning about languages with unconfirmed status. Since those languages have well-researched charset information but just have not been confirmed by several expert sources, we still want to include them in the count. Using `--strict-support` excludes (but lists separately) all those languages which we have not been able to confirm
- TWEAK: Renamed `--strict` flag to `--strict-iso` to be more descriptive
- TWEAK: Database file linking, one more time... as per 0.1.10
- TWEAK: Added validation check to prevent non-space separators in character list data
- FEATURE: Implemented `fontlang-export` CLI script to export the rosetta.yaml with expanded inherits to a file, usage: `$ fontlang-export thefile.yaml`
- FIX: Refactored `setup.py` to include the database file relative to the package
- FIX: "Inverted" the `preferred_as_individual` outcome, i.e. those languages should suppress any included languages from being listed and be listed as one language instead
- FIX: Made sure `preferred_as_individual` in fact also removes the language that is being inherited from the matches
- TWEAK: Update `fontlang-save` and sorting to not include inherited attributes
- TWEAK: Updated and fixed validation for `status` attributes
- FEATURE: Implemented support for the `preferred_as_individual` attribute on macro languages
- FEATURE: Added `--strict` flag to display language names and macrolanguages as per ISO data
- TWEAK: Implemented orthography attribute `inherit` to inherit another language's orthography for that `script` (if one exists)
- FIX: Language names with countries in brackets no longer have their closing parenthesis cut off
- TWEAK: Updated `fontlang-validate` to spec
- FIX: More robust relative file path loading for database file
- TWEAK: `-o` output is now of same structure for single file input, and indexed by file name for several file input
- TWEAK: `-o` filters the languages' orthographies to only supported ones
- TWEAK: Added validation check to confirm orthographies have a 'script'
- TWEAK: Refactored validation script to `fontlang-validate` CLI command
- TWEAK: Languages without orthographies that are included in macrolanguages that do have orthographies silently inherit the macrolanguage's orthographies
- FEATURE: Added `fontlang-save` CLI command to re-save the `rosetta.yaml` sorted alphabetically
- FEATURE: Added `--include-historical` and `--include-constructed` flags to include those languages in results
- FEATURE: Added `--version` and `--verbose` flags
- FEATURE: Added `-m` option ('individual', 'union', 'intersection') to compute a support comparison of several passed in fonts
- TWEAK: You can now pass in any number of font paths. By default one after the other is analyzed
- TWEAK: Make sure to print `preferred_name` if available
- TWEAK: Merged `fontlang` with Rosetta Language DB repo
- TWEAK: Updated data structure in YAML and added `Language` class for convenience
- FEATURE: `-o` flag to specify an output yaml file path
- FEATURE: `-n` flag to display language names in native spelling (where available)
- FEATURE: `-u` flag to display language users (if available)
- TWEAK: Updated `rosetta.yaml` language database
- FEATURE: `-s` flag with "base" or "aux" values to set support level to check
- TWEAK: Support output sorted by scripts and language DB status (informs about not "done" langs being matched)
- TWEAK: Added basic font validation for the passed in file path
- TWEAK: Fixed relative imports and cli usage for dev
- FIX: Language database typo fix
- FEATURE: MVP with basic `$ fontlang path/to/font` command
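Several entries above (`--decomposed`, unencoded base + mark combinations, implicit vs. explicit `marks`) rely on Unicode canonical decomposition. The following is a minimal illustrative sketch using only Python's standard `unicodedata` module; it is not Hyperglot's implementation, and the `supports_char` helper and its `decomposed` switch are hypothetical names chosen for this example. It treats a font as a plain set of code points (a simplified cmap) and accepts a character in its composed form or, when `decomposed=True`, as a base plus combining marks.

```python
import unicodedata


def decompose(text: str) -> list[str]:
    """Return the code points of `text` in canonical decomposed form (NFD)."""
    return list(unicodedata.normalize("NFD", text))


def supports_char(char: str, cmap: set[str], decomposed: bool = False) -> bool:
    """Hypothetical check whether a simplified cmap covers `char`.

    The composed (NFC) form is always accepted if all its code points are
    present. With decomposed=True, the base letter plus all combining marks
    of the NFD form are accepted as a fallback.
    """
    composed = unicodedata.normalize("NFC", char)
    if all(cp in cmap for cp in composed):
        return True
    if decomposed:
        return all(cp in cmap for cp in decompose(char))
    return False


# Toy cmap: base letters and a combining acute, but no precomposed é
cmap = {"e", "a", "\u0301"}

print([f"U+{ord(cp):04X}" for cp in decompose("\u00e9")])  # ['U+0065', 'U+0301']
print(supports_char("\u00e9", cmap))                   # False
print(supports_char("\u00e9", cmap, decomposed=True))  # True
print(supports_char("e\u0301", {"\u00e9"}))            # True: NFC composes it
```

Requiring the composed form corresponds to demanding every character as a code point, while the NFD fallback corresponds to treating support for the base and its combining marks as sufficient.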