We Still Have No Idea What Constitutes ‘Open Source AI,’ But That’s About to Change

The classic definition can’t be easily applied to AI models, but the Open Source Initiative (OSI) is working to clarify the concept.

August 28, 2024

Javier Pastor

Senior Writer

Many people in the artificial intelligence world like to use the term “open” and boast that their models are open source. Recently, there’s been much criticism of such statements, but the fact is that the definition of open source AI models hasn’t been clear at all. The Open Source Initiative (OSI) wants to change this situation.

What is the OSI? It’s a non-profit organization dedicated to promoting open source. Founded in 1998, it’s the steward of the Open Source Definition, which sets out the conditions and requirements a software project must meet to be considered open source.

Meta, in the eye of the storm. The debate about this definition and the arguably unjustified use of the term “open source” has intensified in recent years, mainly because of Meta. The company kept presenting Llama 2 as an open source model. Although it toned down that messaging somewhat with Llama 3, it’s easy to see that Meta is abusing the concept. It’s not the only company that does this, of course. OpenAI, a leader in the field with ChatGPT, has the word “open” in its name even though its models and policies are among the most closed in the industry.

Ambiguity and confusion. Although Llama 3 is freely available, it doesn’t fit the traditional definition of open source because its license imposes certain restrictions depending on the size of the project or the type of content. Flux, an AI image generation model that is gaining popularity, raises the same issue. These projects and others often benefit from this loose use of the term, but the result is evident confusion for users, because the models don’t fit the classic definition of open source.

The solution is in sight. According to Ars Technica, the OSI has formed a team of about 70 experts—researchers, lawyers, activists, and regulators—to create a definition of open source AI models. This group also includes representatives from Meta, Google, and Amazon. They already have a draft (version 0.0.9) of this concept.

Beyond “open weights.” Often, models that boast of being open source share only their “weights,” the learned numerical parameters that determine how the model turns an input into an output. The OSI points out that its draft covers not just the AI model and its weights but the entire system and its components. This would require transparency about the data used to train the model, which none of the major models offer, as well as the source code, the weights, and the parameters.

Not the data but the metadata. In this quest for transparency, the OSI draft clarifies that publishing the “raw” training data is unnecessary. Instead, it requires metadata about the training data set and training methods: data sources, selection criteria, preprocessing techniques, and other details that would allow other people or groups to recreate a similar system. This point is important because it means the creators of these models can comply with the definition without releasing the training data itself.

Final definition in October. The OSI expects to complete the final definition of “open source AI” by October, in time for the All Things Open 2024 conference. In addition, it encourages anyone to contribute to the final definition: the OSI has created a public discussion forum where users can debate the concept.

What impact will the definition have? A formal and accepted definition of “open source AI” may be significant for the development of future models, whether by companies or by individuals and groups working independently. A clear set of requirements will make it possible to check which models actually meet them, which should boost models that are open source in more than name. This has been the case with open source software for decades.

Image | Xataka On

Related | Meta Takes a Strong Step Forward in the AI Race, Unveiling Its SAM 2 Video Editor
