Errors caused by missing commas found in 3.6% of tested Python repositories

The results of a study on how exposed Python code is to errors caused by incorrect use of commas have been published. The problems stem from the fact that Python automatically concatenates adjacent string literals in a list when they are not separated by a comma, and treats a value as a tuple when a comma follows it. After an automated analysis of 666 GitHub repositories containing Python code, the researchers found possible comma problems in 5% of the projects studied.
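Both behaviours can be reproduced in a few lines. The snippet below is only an illustrative sketch: the variable names and values are invented for the example and are not taken from any of the projects studied.

```python
# Adjacent string literals are joined into one string by the compiler,
# so a missing comma silently shortens the list.
names = [
    "alpha",
    "beta"    # comma forgotten here
    "gamma",
]
print(names)       # the list has two items, not three

# A stray trailing comma turns a plain value into a one-element tuple.
timeout = 30,
print(type(timeout))
```

Neither case raises an error or a warning, which is why such mistakes tend to survive until a test or a user trips over them.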

A subsequent manual check showed that real errors were present in only 24 repositories (3.6%), while the remaining 1.4% were false positives (for example, a comma may be deliberately omitted between strings in order to join file paths, long hashes, HTML blocks or SQL expressions that are split across several lines). Notably, the 24 repositories with real errors included large projects such as TensorFlow, Google V8, Sentry, Pydata Xarray, RapidPro, Django-Colorfield and Django-Helpdesk. Problems with commas are not specific to Python and often surface in C/C++ projects as well (examples of recent fixes: LLVM, mono, TensorFlow).
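To give an idea of why a manual check is needed, here is a minimal sketch of a detector built on the standard `tokenize` module. It is not the tool the researchers used; the function name `find_adjacent_strings` and the sample source are assumptions made for illustration. It reports every pair of string literals that follow one another with no comma in between, so it flags deliberate concatenation just as readily as a real mistake. On Python 3.12 and later, f-strings are tokenized differently and are not covered by this sketch.

```python
import io
import tokenize


def find_adjacent_strings(source):
    """Return (line, first, second) for string literals with nothing between them."""
    hits = []
    prev = None  # previous significant token, kept only if it was a string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Line breaks inside brackets and comments do not separate literals.
        if tok.type in (tokenize.NL, tokenize.COMMENT):
            continue
        if tok.type == tokenize.STRING and prev is not None:
            hits.append((tok.start[0], prev.string, tok.string))
        prev = tok if tok.type == tokenize.STRING else None
    return hits


# Hypothetical input: the comma after "releases" is missing.
sample = 'pages = [\n    "releases"\n    "discover",\n    "alerts",\n]\n'
hits = find_adjacent_strings(sample)
for line, first, second in hits:
    print(f"line {line}: {first} is followed by {second} without a comma")
```

A practical checker would also need a way to mark intentional concatenation, since long paths, hashes and SQL fragments split across lines would otherwise be reported every time.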

Main types of errors:

  • An accidentally omitted comma in lists, tuples and sets, which causes strings to be concatenated instead of being interpreted as separate values. For example, in one of the Sentry tests a missing comma between the strings “releases” and “discover” led to a check of the non-existent handler “/releasesdiscover” instead of separate checks of “/releases” and “/discover”.

    Another example: a missing comma in RapidPro caused two different rules to be merged on line 572.

