Technical News

AI models need more standards and tests, say the researchers

While the use of artificial – benign and adversary intelligence – increases at dizzying speed, more cases of potentially harmful responses are discovered.

Pixdeluxe | E + | Getty images

While the use of artificial – benign and adversary intelligence – increases at dizzying speed, more cases of potentially harmful responses are discovered. These include hate speeches, copyright violations or sexual content.

The emergence of these undesirable behaviors is aggravated by a lack of regulations and insufficient tests of AI models, researchers told CNBC.

Being with automatic learning models to the way it was intended to do is also a major challenge, said Javier Rando, AI researcher.

“The answer, after almost 15 years of research, is, no, we do not know how to do this, and it does not seem that we were better,” said Rando, which focuses on contradictory automatic learning, at CNBC.

However, there are ways to assess risks in AI, such as the red team. The practice implies that individuals test and probe artificial intelligence systems to discover and identify any potential damage – a common opera modus in cybersecurity circles.

Shayne Longpre, AI and Policy researcher and responsible for the initiative for the source of data, noted that insufficient people working in red teams.

While AI startups now use first parts or two parties contracted to test their models, the opening of tests to third parties such as normal users, journalists, researchers and ethical pirates would lead to a more robust assessment, according to a article published by LongPre and researchers.

“Some of the defects in the systems that people found obliged lawyers required, doctors to actually verify, real scientists who are specialized experts to determine whether it was a defect or not, because the ordinary person probably could not or would not have sufficient expertise,” said LongPre.

The adoption of standardized reports of “IA defects”, incentives and means to disseminate information on these “defects” in AI systems is some of the recommendations presented in the document.

This practice having been successfully adopted in other sectors such as software security, “we need it in AI now,” added LongPre.

Ban this user -centered practice with governance, policy and other tools would guarantee a better understanding of the risks posed by AI tools and users, Handey said.

Plus a moonshot

Project Moonshot is one of these approaches, combining technical solutions with policy mechanisms. Launched by the infocom media development authority in Singapore, Project Monshot is a model for model assessment of large language developed with industry players such as IBM and Datarobot based in Boston.

The toolbox incorporates benchmarking, the red team and the reference tests. There is also an evaluation mechanism that allows AI startups to ensure that their models can be reliable and not to harm the users, the data engineering manager for data and the IBM Asia Pacific told CNBC.

Evaluation is a continuous process that should be done both before and after the deployment of models, said Kumar, who noted that the response to the toolbox was mixed.

“Many startups have taken this as a platform because it was open source, and they started to take advantage. But I think, you know, we can do much more.”

In the future, Project Monshot aims to include personalization for specific use cases of industry and to allow a multilingual and multicultural red team.

Higher standards

Pierre Alquier, professor of statistics at ESSEC Business School, Asia-Pacific, said that technological companies are currently rushing to publish their latest AI models without appropriate assessment.

“When a pharmaceutical company designs a new drug, it needs very serious tests and evidence that it is useful and not harmful before being approved by the government,” he noted, adding that a similar process is in place in the aviation sector.

AI models must fulfill a strict set of conditions before being approved, added Alquier. A distance from the wide AI tools to developments that are designed for more specific tasks would facilitate anticipation and control of their abusive use, said Alquier.

“LLM can do too much, but they are not intended for tasks that are sufficiently specific,” he said. Consequently, “the number of possible abuses is too large for the developers to be able to anticipate them”.

Such wide models make it possible to define what matters as difficult and secure, according to a search in which hike was involved.

Technological companies should therefore avoid the overexraction that “their defenses are better than they are,” said Rando.

Related Articles

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button