Prepare for OSI compliance #247

Draft
wants to merge 2 commits into
base: gh-pages

Conversation

ekaf (Member) commented Jun 27, 2025

This PR is intended to address Issue #102 by documenting a possible way to split nltk_data into OSI (Open Source Initiative)-compliant and nonfree parts.

Why use the OSI rather than the FSF definition of free?

The overwhelming majority of major software and data distributors (Linux distros, conda-forge, Homebrew, etc.) use the OSI definition as their primary standard. The FSF definition is important for the free software movement and documentation/content (e.g., GNU, Wikimedia), but is not the baseline for most mainstream software/data distribution channels.

Two markdown files are introduced:

  • free_packages_osi.md: Packages with OSI-approved, public domain, or similarly permissive licenses.
  • nonfree_packages_osi.md: Packages with more restrictive, ambiguous, or otherwise non-OSI-compliant licenses.

Every effort has been made to classify each package based on the available license information, but feedback and corrections are very welcome, especially for unclear or disputed cases.

Discussion is welcome and encouraged! If you spot anything that should be reviewed or improved, please join the conversation.

@ekaf ekaf marked this pull request as draft June 29, 2025 08:43
ekaf (Member, Author) commented Jun 29, 2025

The proposed list of free licenses should probably be broader than just the OSI-approved software licenses.

Here's why:

  • OSI focuses on Software: The OSI defines "open source" specifically for software.
  • Data has other "free" licenses: Many licenses are equally permissive and FOSS-compatible for data, content, or standards, even if not OSI-approved. Examples include:
    • Public Domain (e.g., CC0)
    • Permissive Creative Commons (e.g., CC BY)
    • Specific standards licenses (e.g., Unicode Terms of Use, IETF Trust License, W3C Document License)

These licenses grant the essential freedoms (use, modification, and redistribution, including commercial use) for data.

Crucially, this broader definition of "free" still firmly excludes:

  • Non-Commercial (NC) or No Derivatives (ND) licenses.
  • "Academic Use Only" or "Research Use Only" restrictions.
  • Ambiguous or truly "unknown" licenses (like Punkt's).
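
As a rough illustration of the rule described above, here is a minimal Python sketch. It assumes each package's license is recorded as an SPDX-style identifier or a short free-text description. The function name `classify_license`, the `FREE_LICENSES` set, and the `RESTRICTED_PHRASES` tuple are hypothetical and not part of this PR; the identifier list is only a sample, and standards licenses (Unicode, IETF, W3C) would need to be added under whatever identifiers the project settles on.

```python
# Hypothetical sketch: classify a license string as "free" or "nonfree"
# following the criteria in this comment. The identifier set below is an
# illustrative assumption, not an exhaustive or authoritative list.

FREE_LICENSES = {
    "MIT", "Apache-2.0", "BSD-3-Clause", "GPL-3.0-only",  # OSI-approved examples
    "CC0-1.0",    # public domain dedication
    "CC-BY-4.0",  # permissive Creative Commons
}

# Free-text restrictions that always make a package nonfree.
RESTRICTED_PHRASES = ("academic use only", "research use only")


def classify_license(license_id):
    """Return "free" or "nonfree" for a license identifier or description."""
    # Missing, empty, or explicitly unknown licenses are excluded.
    if not license_id or license_id.strip().lower() == "unknown":
        return "nonfree"
    text = license_id.strip()
    lowered = text.lower()
    # Creative Commons NC / ND variants, e.g. "CC-BY-NC-4.0".
    parts = lowered.split("-")
    if parts[0] == "cc" and ("nc" in parts or "nd" in parts):
        return "nonfree"
    if any(phrase in lowered for phrase in RESTRICTED_PHRASES):
        return "nonfree"
    # Anything not explicitly listed as free is treated as nonfree
    # until someone reviews it.
    return "free" if text in FREE_LICENSES else "nonfree"
```

Defaulting unlisted licenses to "nonfree" is a deliberately conservative choice in this sketch: a package only lands in the free list once its license has been reviewed and added explicitly.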
