CLDR-18745 add validate_currencies.py #4866

Merged
merged 1 commit into from
Jul 11, 2025

preetsojitra2712
Contributor

CLDR-18745

Adds validate_currencies.py under tools/scripts/llm/ that:

  1. Downloads the CLDR currencies.json for a given locale.
  2. Flattens each currency entry into lines of the form
    CODE: one='…', other='…', symbol='…', alt_narrow='…'.
  3. Splits the lines into small chunks to stay under model context limits.
  4. Runs two LLM-powered sanity checks per chunk:
    • Correctness of display names/symbols
    • Missing symbols detection
  5. Counts entries locally and emits a single JSON report.
  • This PR completes the ticket.
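Steps 2 and 3 above can be sketched roughly as follows. This is a minimal illustration, not the merged script: the `SAMPLE` data, the `chunk` helper, and the chunk size are hypothetical, and the key names `displayName-count-other`, `symbol`, and `symbol-alt-narrow` are assumed from the usual cldr-json layout (only `displayName-count-one` appears in the reviewed snippet).

```python
from typing import Dict, List

# Hypothetical sample shaped like a cldr-json currencies.json file.
# The values are illustrative only.
SAMPLE = {
    "main": {
        "en": {
            "numbers": {
                "currencies": {
                    "USD": {
                        "displayName-count-one": "US dollar",
                        "displayName-count-other": "US dollars",
                        "symbol": "$",
                        "symbol-alt-narrow": "$",
                    },
                    "XXX": {
                        "displayName-count-other": "(unknown unit of currency)",
                    },
                }
            }
        }
    }
}


def flatten(cdata: Dict, locale: str) -> List[str]:
    """Turn each currency entry into one 'CODE: one=..., other=...' line."""
    cur = cdata["main"][locale]["numbers"]["currencies"]
    out = []
    for code, info in cur.items():
        # Missing fields fall back to an empty string.
        one = info.get("displayName-count-one", "")
        other = info.get("displayName-count-other", "")
        symbol = info.get("symbol", "")
        narrow = info.get("symbol-alt-narrow", "")
        out.append(
            f"{code}: one='{one}', other='{other}', "
            f"symbol='{symbol}', alt_narrow='{narrow}'"
        )
    return out


def chunk(lines: List[str], size: int) -> List[List[str]]:
    """Split lines into consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [lines[i:i + size] for i in range(0, len(lines), size)]


if __name__ == "__main__":
    lines = flatten(SAMPLE, "en")
    for group in chunk(lines, 1):  # chunk size chosen arbitrarily here
        print(group)
```

Each chunk would then be passed to the two LLM checks; that part is omitted here because the PR description does not specify the model or prompt.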

@preetsojitra2712
Contributor Author

Hello @younies, please review this PR, which adds validate_currencies.py under tools/scripts/llm/. The script implements LLM-powered sanity checks and JSON reporting for CLDR currency data.

@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@younies younies self-requested a review July 10, 2025 08:26
cur = cdata["main"][locale]["numbers"]["currencies"]
out = []
for code, info in cur.items():
    one = info.get("displayName-count-one", "")
Member

This works for English, but other locales have more plural count categories than just "one" and "other".
It would be better to collect all "displayName-count-.*" entries into a list.

Member

Add this in a follow-up PR.

Contributor Author

Thanks for the suggestion! I’ll update the flatten() function in the next PR to collect all displayName-count-* entries into a list instead of just “one” and “other.” I’ll also include a sample JSON so you can verify it. Stay tuned!
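
For reference, one possible shape for that follow-up change is sketched below. It is an assumption about how the suggestion could be implemented, not code from this PR or the next one: the helper name `plural_display_names` and the sample entry are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Matches keys such as "displayName-count-one" or "displayName-count-few"
# and captures the part after the last fixed hyphen as the category.
COUNT_KEY = re.compile(r"^displayName-count-(.+)$")


def plural_display_names(info: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (category, display name) pairs for every count key present."""
    names = []
    for key, value in info.items():
        match = COUNT_KEY.match(key)
        if match:
            names.append((match.group(1), value))
    return names


if __name__ == "__main__":
    # Hypothetical entry for a locale with four plural categories;
    # the strings are placeholders, not real CLDR data.
    entry = {
        "displayName": "name",
        "displayName-count-one": "name-one",
        "displayName-count-few": "name-few",
        "displayName-count-many": "name-many",
        "displayName-count-other": "name-other",
        "symbol": "$",
    }
    for category, name in plural_display_names(entry):
        print(f"{category}='{name}'")
```

Returning pairs instead of fixed `one`/`other` variables lets the flattened line carry whatever plural categories a locale actually defines.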

@younies younies merged commit 804a48f into unicode-org:main Jul 11, 2025
11 checks passed
2 participants