Many people in the artificial intelligence world like to use the term “open” and boast that their models are open source. Recently, there’s been much criticism of such statements, but the fact is that the definition of open source AI models hasn’t been clear at all. The Open Source Initiative (OSI) wants to change this situation.
What is the OSI? It’s a non-profit organization dedicated to promoting the open source model. Founded in 1998, it maintains the Open Source Definition for open source software. In other words, the OSI sets out the conditions and requirements a software project must meet to be considered open source.
Meta, in the eye of the storm. The debate about this definition and the arguably unjustified use of the term “open source” has intensified in recent years, mainly because of Meta. The company kept presenting Llama 2 as an open source model. Although it toned down that language a bit with Llama 3, it’s easy to see that Meta is stretching the concept. It’s not the only company that does this, of course. OpenAI, a leader in the field with ChatGPT, uses the word “open” in its name even though its models and policies are among the most closed in the industry.
Ambiguity and confusion. Although Llama 3 is freely available, it doesn’t fit the traditional definition of open source because its license imposes certain restrictions depending on the size of the project or the type of content. Flux, an AI model for image generation that is gaining popularity, also raises this issue. These projects and others often benefit from this loose use of the word “open,” but the result is evident confusion for users, because these models don’t fit the classic definition of open source.
The solution is in sight. According to Ars Technica, the OSI has formed a team of about 70 experts (researchers, lawyers, activists, and regulators) to create a definition of open source AI. This group also includes representatives from Meta, Google, and Amazon. They already have a draft (version 0.0.9) of this definition.
Beyond “open weights.” Models that boast of being open source often share their “weights,” the numerical values learned during training that determine how the model processes its inputs. The OSI points out that its draft covers not just the AI model and its weights but the entire system and its components. This would require the source code, the weights, and the parameters, as well as full transparency about the data used to train the model, which none of the major models offer.
Not the data but the metadata. In this quest for transparency, the OSI draft clarifies that publishing the “raw” training data is unnecessary. Instead, it requires metadata about the training data set and training methods: data sources, selection criteria, preprocessing techniques, and other details that would allow other people or groups to recreate a similar system. This point is important because it means creators can meet the definition without releasing the training data itself.
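A rough illustration. The components described above can be pictured as a simple checklist. The Python sketch below is only that: the class names, field names, and the `missing_components` function are hypothetical, chosen to mirror the items listed in this article, and they are not the OSI’s official criteria or any real tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TrainingDataInfo:
    """Metadata about the training data, not the raw data itself (hypothetical fields)."""
    data_sources: List[str] = field(default_factory=list)
    selection_criteria: str = ""
    preprocessing_techniques: List[str] = field(default_factory=list)


@dataclass
class ModelRelease:
    """What a hypothetical model release makes publicly available."""
    name: str
    source_code_published: bool = False
    weights_published: bool = False
    parameters_published: bool = False
    training_data_info: Optional[TrainingDataInfo] = None


def missing_components(release: ModelRelease) -> List[str]:
    """List the items from the article's summary that the release does not provide."""
    missing = []
    if not release.source_code_published:
        missing.append("source code")
    if not release.weights_published:
        missing.append("weights")
    if not release.parameters_published:
        missing.append("parameters")
    info = release.training_data_info
    if info is None:
        missing.append("training data metadata")
    else:
        if not info.data_sources:
            missing.append("data sources")
        if not info.selection_criteria:
            missing.append("selection criteria")
        if not info.preprocessing_techniques:
            missing.append("preprocessing techniques")
    return missing


# A release that shares only its weights, as many "open" models do.
weights_only = ModelRelease(name="example-model", weights_published=True)
print(missing_components(weights_only))
# → ['source code', 'parameters', 'training data metadata']
```

Under this simplified reading, an “open weights” release still lacks most of what the draft asks for, even though nothing in the checklist requires publishing the raw training data.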
Final definition in October. The OSI expects the final definition of “open source AI” to be completed by October, in time for the All Things Open 2024 conference. In addition, it encourages anyone to contribute to the final definition: the OSI has created a public discussion forum where users can debate the concept.
What impact will the definition have? A formal and accepted definition of “open source AI” may be significant for the development of future models, both by companies and by individuals or groups working independently. A shared definition gives developers a clear set of requirements to meet and gives users a way to verify claims of openness, which should boost open source models. This has been the case with open source software for four decades.
Image | Xataka On