Intel open-sources ControlFlag, a machine learning system for finding errors in code

Intel has published the work behind its ControlFlag research project, which aims to build a machine learning system for improving code quality. Using a model trained on a large body of existing code, the project's toolkit identifies errors and anomalies in source code written in high-level languages such as C/C++. The system is suited to detecting many kinds of problems, from typos and incorrect type combinations to missing NULL checks on pointers and memory-handling problems. The ControlFlag code is written in C++ and released under the MIT license.

The system is self-learning: it builds a statistical model from an existing body of code in open projects published on GitHub and in similar public repositories. During training, the system identifies typical patterns of code constructs and builds a syntax tree of the relationships between these patterns, reflecting the flow of execution in the program. The result is a reference decision tree that combines the development experience of all the analyzed source code.
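As a rough illustration of the training stage, the following Python sketch counts how often each abstracted condition pattern appears in a small corpus. It is not ControlFlag's implementation: the corpus, the names `abstract_condition` and `mine_patterns`, and the regex-based extraction (used here in place of a real syntax tree) are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical toy corpus standing in for code mined from public repositories.
CORPUS = [
    "if (buf == NULL) return -1;",
    "if (ctx == NULL) return 0;",
    "if (len > max_len) goto err;",
    "if (p == NULL || q == NULL) return 0;",
    "if (a == NULL || b == NULL) goto err;",
]

IF_CONDITION = re.compile(r"\bif\s*\((.*)\)")   # text inside the outermost parentheses
IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*\b")
KEYWORDS = {"NULL"}                              # names kept as-is, not abstracted


def abstract_condition(condition):
    """Replace each distinct identifier with a placeholder: S1, S2, ..."""
    names = {}

    def rename(match):
        word = match.group(0)
        if word in KEYWORDS:
            return word
        if word not in names:
            names[word] = f"S{len(names) + 1}"
        return names[word]

    return IDENTIFIER.sub(rename, condition)


def mine_patterns(lines):
    """Count how many times each abstracted `if` condition occurs."""
    counts = Counter()
    for line in lines:
        match = IF_CONDITION.search(line)
        if match:
            counts[abstract_condition(match.group(1))] += 1
    return counts


patterns = mine_patterns(CORPUS)
for pattern, count in patterns.most_common():
    print(count, pattern)
```

Abstracting away variable names is what lets conditions from unrelated projects fall into the same pattern, so that their frequencies can be compared.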

The code being checked goes through a similar pattern-identification process, and the resulting patterns are compared against the reference decision tree. Large discrepancies with neighboring branches indicate an anomaly in the pattern being checked. The system can not only identify an error in a pattern but also propose a correction. For example, in the OpenSSL code it flagged the construct “(S1 == NULL) ∧ (S2 == NULL)”, which occurred in the syntax tree only 8 times, while the nearest branch, “(S1 == NULL) || (S2 == NULL)”, occurred about 7 thousand times. The system also flagged the anomaly “(S1 == NULL) | (S2 == NULL)”, which occurred in the tree 32 times.
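The frequency comparison in the OpenSSL example can be sketched in Python as follows. This is a simplified illustration, not ControlFlag's actual algorithm: the function names, the use of a token-level edit distance to define "neighboring" patterns, and the 1% threshold (`ratio`) are assumptions chosen for the example, and 7000 stands in for "about 7 thousand".

```python
# Occurrence counts from the OpenSSL example; 7000 approximates "about 7 thousand".
PATTERN_COUNTS = {
    "(S1 == NULL) ∧ (S2 == NULL)": 8,
    "(S1 == NULL) || (S2 == NULL)": 7000,
    "(S1 == NULL) | (S2 == NULL)": 32,
}


def tokenize(pattern):
    """Split a pattern into tokens, treating parentheses as separate tokens."""
    return pattern.replace("(", " ( ").replace(")", " ) ").split()


def token_distance(a, b):
    """Levenshtein edit distance between two token lists."""
    previous = list(range(len(b) + 1))
    for i, token_a in enumerate(a, start=1):
        current = [i]
        for j, token_b in enumerate(b, start=1):
            cost = 0 if token_a == token_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def check(pattern, counts, max_distance=1, ratio=0.01):
    """Return (is_anomaly, suggestion) for a pattern.

    A pattern is flagged when it is far rarer than its most frequent
    neighbor, i.e. a pattern within `max_distance` token edits of it.
    """
    own = counts.get(pattern, 0)
    tokens = tokenize(pattern)
    neighbors = [
        (other, n) for other, n in counts.items()
        if other != pattern
        and token_distance(tokens, tokenize(other)) <= max_distance
    ]
    if not neighbors:
        return False, None
    best, best_count = max(neighbors, key=lambda item: item[1])
    if own < ratio * best_count:
        return True, best  # the common neighbor doubles as the proposed fix
    return False, None


for pattern in PATTERN_COUNTS:
    print(pattern, "->", check(pattern, PATTERN_COUNTS))
```

With these counts, both rare variants are flagged and the common “||” form is offered as the correction, while the “||” form itself passes the check.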


