Programmers have spent decades writing code for artificial intelligence (AI) models, and now AI is being used to write code. A study published in the June issue of IEEE Transactions on Software Engineering evaluated the code produced by OpenAI's ChatGPT 3.5 in terms of functionality, complexity, and security.
The results show that ChatGPT's success rate in producing functional code ranges from 0.66% to 89%, depending on the difficulty of the task, the programming language, and other factors. Although in some cases the AI can produce better code than humans, the analysis also reveals security problems in AI-generated code.
The study, led by Yutian Tang, a lecturer at the University of Glasgow, notes that AI-based code generation can increase productivity and automate software development tasks. However, it is important to understand the strengths and weaknesses of these models. Tang's team tested ChatGPT's ability to solve 728 problems from the LeetCode platform in five programming languages: C, C++, Java, JavaScript, and Python.
Overall, ChatGPT's success rate was fairly high, especially for problems that existed on LeetCode before 2021. For easy, medium, and hard problems from that period, the success rate was about 89%, 71%, and 40%, respectively. For problems that appeared after 2021, however, ChatGPT's ability to generate correct code dropped significantly: from 89% to 52% for easy problems and from 40% to 0.66% for hard ones.
This is because ChatGPT was trained on data up to 2021 and had not encountered the newer problems and their solutions. It lacks the critical thinking of a human and can only solve problems it has seen before.
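To put the reported drop in perspective, the figures can be expressed both as percentage points and as a relative decline. The following is a minimal sketch that uses only the success rates quoted in this article; the function name `drop` and the dictionary layout are illustrative assumptions, not part of the study.

```python
def drop(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point drop, relative drop in percent)."""
    points = before - after            # absolute difference in percentage points
    relative = points / before * 100   # decline relative to the earlier rate
    return points, relative


# Success rates (%) quoted in the article: (problems before 2021, problems after 2021)
reported = {"easy": (89.0, 52.0), "hard": (40.0, 0.66)}

for level, (before, after) in reported.items():
    points, relative = drop(before, after)
    print(f"{level}: down {points:.2f} points, a relative decline of {relative:.1f}%")
```

Seen this way, the decline for hard problems is far steeper in relative terms than the decline for easy ones, even though the two drops are similar in percentage points.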
In addition, ChatGPT can generate code with smaller runtime and memory overheads than at least 50% of human solutions to the same LeetCode problems. The researchers also studied ChatGPT's ability to fix its own errors after receiving feedback from LeetCode. In 50 randomly selected scenarios in which ChatGPT initially generated incorrect code, it handled compilation errors well but did not always succeed in fixing logical errors.
The code generated by ChatGPT was also found to contain vulnerabilities, such as missing null checks, although many of them are easy to fix. The generated C code was the most complex, followed by C++ and Python, whose complexity was similar to that of human-written code.
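As an illustration of what a missing null check looks like and how easily it can be fixed, here is a hypothetical sketch in Python, where `None` plays the role of a null pointer. The `ListNode` class and both functions are invented for this example and are not taken from the study or from ChatGPT's output.

```python
from typing import Optional


class ListNode:
    """Minimal singly linked list node, in the style of LeetCode problems."""

    def __init__(self, val: int, next: "Optional[ListNode]" = None):
        self.val = val
        self.next = next


def second_value_unchecked(head: Optional[ListNode]) -> int:
    # Missing null check: raises AttributeError when head or head.next is None.
    return head.next.val


def second_value_checked(head: Optional[ListNode]) -> Optional[int]:
    # Fixed version: guard against an empty or single-node list first.
    if head is None or head.next is None:
        return None
    return head.next.val
```

The fix is a single guard clause, which matches the article's point that many such issues are easy to eliminate once they are noticed.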
Yutian Tang notes that, to get better results from ChatGPT, developers should provide additional information and point out potential vulnerabilities so that the AI can better understand the task and avoid errors.
Thus, despite significant progress in using AI for code generation, human oversight and input remain important for creating secure and functional software.