Headache from using mathematical software
Author: Svyatoslav Razmyslov
It so happened that during some period of time I was discussing on the Internet, one would think, different topics: free alternatives of Matlab for universities and students, and finding errors in algorithms with the help of static code analysis. All these discussions were brought together by the terrible quality of the code of modern programs. In particular, it is about quality of software for mathematicians and scientists. Immediately there arises the question of the credibility to the calculations and studies conducted with the help of such programs. We will try to reflect on this topic and look for the errors.
Introduction
I would like to start with the definition of a term “algorithm”. An algorithm is a set of instructions, which describes the order of actions that the executor must perform for achieving a certain result (Wikipedia). Thus, it is not necessary to distinguish the source code between the algorithms and the rest of the code. For example, sorting algorithms are no less a source code as opening a file, searching for a character in the string, etc. The code might contain an error and, luckily, many errors can be detected at an early stage, taking the advantage of static code analysis tools.
However, to search for the so-called “algorithmic” errors I’ve decided to analyze the code of several mathematical packages. In this code, there are a lot of functions in which some mathematical formulas are implemented. It turns out that there are people who even don’t regard such for the source code. And, accordingly, what kind of errors there can be.
To identify all code defects, presented in the article, we used PVS-Studio static analyzer version 6.15, working under Windows/Linux, for C/C++/C# programming languages.
Bugs from 3rd party
The story began with a search for errors in the project PointCloudLibrary (PCL, GitHub). Without having a goal to find a lot of bugs and write an article, I just looked through the report and found a very interesting bug:
V533 It is likely that a wrong variable is being incremented inside the ‘for’ operator. Consider reviewing ‘i’. sparsematrix.inl 212
template<class T>
SparseMatrix<T>& SparseMatrix<T>::operator *= (const T& V)
{
for( int i=0 ; i<rows ; i++ )
for( int ii=0 ; ii<rowSizes[i] ; i++ )
m_ppElements[i][ii].Value *= V;
return *this;
}
The overloaded operator “*=” implements the multiplication of all elements of the matrix to some value V. The author made a very serious mistake for this algorithm, because of which only the first column of the matrix is modified, and also the infinite loop with array overrun is possible.
This code has proved to be from the math library PoissonSurfaceReconstruction. I made sure that the bug is still present in the latest version of the code. One shudders to think how many projects include such library.
Here’s another strange piece of code:
V607 Ownerless expression ‘j < remains’. allocator.h 120
void rollBack(const AllocatorState& state){
....
if(state.index<index){
....
for(int j=0;j<remains;j++){
memory[index][j].~T();
new(&memory[index][j]) T();
}
index=state.index;
remains=state.remains;
}
else{
for(int j=0;j<state.remains;j<remains){ // <=
memory[index][j].~T();
new(&memory[index][j]) T();
}
remains=state.remains;
}
....
}
I suspect that this odd cycle is not performed often, since it still remains in the code. But someone surely had experienced strange lockups with abnormal termination of the program. Thus, some idea of the quality of the code is formed. Now let’s turn to the larger project — Scilab, where we will experience a real headache.
Scilab
About the project
Scilab is a package of applied mathematical programs, providing an open environment for engineering (technical) and scientific calculations. This environment is one of the commonly available alternatives to Matlab, which is widely used in different institutions and scientific research. Another popular alternative to Matlab is GNU Octave, and we have previously drawn attention to these projects:
- Checking Scilab (March 2014);
- Checking GNUOctave (August 2015).
Before writing a new article about Scilab I have read an old one and made just two conclusions:
- After 3 years, only a couple of places have not been fixed (“why fix undefined behavior, if it works?”- apparently thought the developers);
- In the project there appeared many new errors. I decided to put in the article just a couple of dozens, not to tire the reader.
Scilab sources contain project file for Visual Studio from the start, so it is possible to just open and examine it in one click, just as I did.
Beautiful typos
V530 The return value of function ‘back’ is required to be utilized. sci_mscanf.cpp 274
types::Function::ReturnValue sci_mscanf(....)
{
....
std::vector<types::InternalType*> pITTemp = std::vector<...>();
....
case types::InternalType::ScilabString :
{
....
pITTemp.pop_back(); // <=
pITTemp.push_back(pType);
}
break;
case types::InternalType::ScilabDouble :
{
....
pITTemp.back(); // <= ???
pITTemp.push_back(pType);
}
break;
....
}
It looks like code completion has played with the programmer a cruel joke. In the code of the function sci_mscanf one always removes the last element of the vector before adding a new one, but in one place the programmer made a mistake, calling the back() function instead pop_back (). Calling the back() function in that way makes no sense.
V595 The ‘Block.inptr’ pointer was utilized before it was verified against nullptr. Check lines: 478, 479. sci_model2blk.cpp 478
types::Function::ReturnValue sci_model2blk(....)
{
.... Block.inptr[i] = MALLOC(size);
if (Block.inptr == nullptr)
{
freeBlock(&Block);
Scierror(888, _("%s : Allocation error.\n"), name.data());
return types::Function::Error;
} memset(Block.inptr[i], 0x00, size);
....
}
This is a very interesting case of a typo, because of which the control over memory allocation stopped working. Most likely, the correct code should be like this:
Block.inptr[i] = MALLOC(size);
if (Block.inptr[i] == nullptr)
{
....
}
V595 The ‘pwstLines’ pointer was utilized before it was verified against nullptr. Check lines: 78, 79. mgetl.cpp 78
int mgetl(int iFileID, int iLineCount, wchar_t ***pwstLines)
{
*pwstLines = NULL;
....
*pwstLines = (wchar_t**)MALLOC(iLineCount * sizeof(wchar_t*));
if (pwstLines == NULL)
{
return -1;
}
....
}
Surprisingly a very similar error. The author didn’t manage to count the asterisks right, so in the condition the wrong pointer is being checked.
V595 The ‘array_size’ pointer was utilized before it was verified against nullptr. Check lines: 67, 68. diary_manager.cpp 67
wchar_t **getDiaryFilenames(int *array_size)
{
*array_size = 0;
if (SCIDIARY)
{
std::list<std::wstring> wstringFilenames = SCIDIARY->get....
*array_size = (int)wstringFilenames.size();
if (array_size > 0)
{
....
}
....
}
Stability is a sign of skill. The programmer again forgot to dereference the pointer, and because of that, it is not the size of some array, which is compared with zero, but the pointer to this variable.
V501 There are identical sub-expressions ‘strncmp(tx, “%pi”, 3) == 0’ to the left and to the right of the ‘||’ operator. stringtocomplex.c 276
static int ParseNumber(const char* tx)
{
....
else if (strlen(tx) >= 4 && (strncmp(tx, "%eps", 4) == 0
|| strncmp(tx, "+%pi", 4) == 0 || strncmp(tx, "-%pi", 4) == 0
|| strncmp(tx, "+Inf", 4) == 0 || strncmp(tx, "-Inf", 4) == 0
|| strncmp(tx, "+Nan", 4) == 0 || strncmp(tx, "-Nan", 4) == 0
|| strncmp(tx, "%nan", 4) == 0 || strncmp(tx, "%inf", 4) == 0
))
{
return 4;
}
else if (strlen(tx) >= 3
&& (strncmp(tx, "+%e", 3) == 0
|| strncmp(tx, "-%e", 3) == 0
|| strncmp(tx, "%pi", 3) == 0 // <=
|| strncmp(tx, "Nan", 3) == 0
|| strncmp(tx, "Inf", 3) == 0
|| strncmp(tx, "%pi", 3) == 0)) // <=
{
return 3;
}
....
}
This function contains some code to parse the numbers. Analyzer found the suspicious comparison with two identical strings “%pi”. Looking at the adjacent piece of code, we can assume that instead of the duplicated line, the string “-%pi” or “-Inf” could have been intended. Also it is not impossible that an unneeded extra line of code was simply copied by mistake, and, if so, it is better to delete it.
Operation precedence
V502 Perhaps the ‘?:’ operator works in a different way than it was expected. The ‘?:’ operator has a lower priority than the ‘==’ operator. sci_sparse.cpp 49
types::Function::ReturnValue sci_sparse(....)
{
bool isValid = true;
....
for (int i = 0 ; isValid && i < in.size() ; i++)
{
switch (in[i]->getType())
{
case types::InternalType::ScilabBool :
case types::InternalType::ScilabSparseBool :
{
isValid = (i == (in.size() > 1) ? 1 : 0);
}
....
}
The errors with the priorities of operations are very common in modern code (see the article “Logical Expressions in C/C++. Mistakes Made by Professionals”).
In the fragment of code above, there is a bug too, but due to the great luck, this code with a mistake works as expected by a developer. Only because of the fact that elements of the array with indexes 0 and 1 are involved in the comparison, and integral representations of truth and lie are also the values 0 and 1, this fragment of code still miraculously works correctly.
The code should be rewritten to correct the priority of operations:
isValid = (i == (in.size() > 1 ? 1 : 0));
V590 Consider inspecting the ‘iType != — 1 && iType == 8’ expression. The expression is excessive or contains a misprint. scilabview.cpp 175
void ScilabView::createObject(int iUID)
{
int iType = -1;
int *piType = &iType; getGraphicObjectProperty(....);
if (iType != -1 && iType == __GO_FIGURE__)
{
m_figureList[iUID] = -1;
setCurrentFigure(iUID);
}
....
}
In this fragment, there is a problem with the priority of operations, which is also covered in the previously mentioned article.
Conditional subexpression (iType! = -1) does not affect the result of the whole conditional expression. One can verify the error with the help of building the truth table for this example.
Here is another such example:
- V590 Consider inspecting the ‘iObjectType != — 1 && iObjectType == 5’ expression. The expression is excessive or contains a misprint. sci_unglue.c 90
Incorrect error messages
In a previous article about mistakes in Scilab there was also a large section about the errors while printing messages. On a fresh code there turned out to be quite a lot of errors of that type.
V517 The use of ‘if (A) {…} else if (A) {…}’ pattern was detected. There is a probability of logical error presence. Check lines: 159, 163. cdfbase.c 159
void cdf_error(char const* const fname, int status, double bound)
{
switch (status)
{
....
case 10:
if (strcmp(fname, "cdfchi") == 0) // <=
{
Scierror(999
_("%s: cumgam returned an error\n"), fname);
}
else if (strcmp(fname, "cdfchi") == 0) // <=
{
Scierror(999,
_("%s: gamma or inverse gamma routine failed\n"), fname);
}
break;
....
}
In Scilab there is a large set of cdf functions. In the presented code fragment, the interpretation of return codes from these functions is performed. And here’s the problem — some error warning is never displayed because of a typo in the name of the function. Searching for this message leads to the cdfgam function. I feel sorry for the users who have worked with this function and could not find out about some of the problems because of the typo of the authors of mathematical package.
V510 The ‘Scierror’ function is not expected to receive class-type variable as third actual argument. sci_winqueryreg.cpp 149
const std::string fname = "winqueryreg";types::Function::ReturnValue sci_winqueryreg(....)
{
....
if (rhs != 2 && rhs != 3)
{
Scierror(77, _("%s: Wrong number...\n"), fname.data(), 2, 3);
return types::Function::Error;
}
....
else
{
Scierror(999, _("%s: Cannot open Windows regist..."), fname);
return types::Function::Error;
}
....
}
When printing a string in one place, one forgot to call the method data().
V746 Type slicing. An exception should be caught by reference rather than by value. sci_scinotes.cpp 48
int sci_scinotes(char * fname, void* pvApiCtx)
{
....
try
{
callSciNotesW(NULL, 0);
}
catch (GiwsException::JniCallMethodException exception)
{
Scierror(999, "%s: %s\n", fname,
exception.getJavaDescription().c_str());
}
catch (GiwsException::JniException exception)
{
Scierror(999, "%s: %s\n", fname,
exception.whatStr().c_str());
}
....
}
The exception is caught by value. It means that using the copy constructor, a new object will be constructed and part of the exception information will be lost. The correct option is to catch exceptions by reference.
There were found several such places:
- V746 Type slicing. An exception should be caught by reference rather than by value. sci_builddoc.cpp 270
- V746 Type slicing. An exception should be caught by reference rather than by value. sci_closescinotesfromscilab.cpp 45
- V746 Type slicing. An exception should be caught by reference rather than by value. sci_closescinotesfromscilab.cpp 50
- V746 Type slicing. An exception should be caught by reference rather than by value. sci_scinotes.cpp 52
- V746 Type slicing. An exception should be caught by reference rather than by value. sci_scinotes.cpp 263
- V746 Type slicing. An exception should be caught by reference rather than by value. sci_scinotes.cpp 272
- V746 Type slicing. An exception should be caught by reference rather than by value. sci_scinotes.cpp 349
- V746 Type slicing. An exception should be caught by reference rather than by value. sci_scinotes.cpp 353
- V746 Type slicing. An exception should be caught by reference rather than by value. sci_scinotes.cpp 365
- V746 Type slicing. An exception should be caught by reference rather than by value. sci_scinotes.cpp 369
- V746 Type slicing. An exception should be caught by reference rather than by value. visitor_common.cpp 1743
- V746 Type slicing. An exception should be caught by reference rather than by value. overload.cpp 135
Strange code
This is a strange code, because it is not clear why to write this way and how to fix it.
V523 The ‘then’ statement is equivalent to the ‘else’ statement. data3d.cpp 51
void Data3D::getDataProperty(int property, void **_pvData)
{
if (property == UNKNOWN_DATA_PROPERTY)
{
*_pvData = NULL;
}
else
{
*_pvData = NULL;
}
}
This is such a simple function, which always resets the pointer.
V575 The ‘memset’ function processes ‘0’ elements. Inspect the third argument. win_mem_alloc.c 91
void *MyHeapAlloc(size_t dwSize, char *file, int line)
{
LPVOID NewPointer = NULL; if (dwSize > 0)
{
_try
{
NewPointer = malloc(dwSize);
NewPointer = memset (NewPointer, 0, dwSize);
}
_except (EXCEPTION_EXECUTE_HANDLER)
{
}
....
}
else
{
_try
{
NewPointer = malloc(dwSize);
NewPointer = memset (NewPointer, 0, dwSize);
}
_except (EXCEPTION_EXECUTE_HANDLER)
{
}
}
return NewPointer;
}
Regardless of the value of dwSize variable, there always runs the same code. So why duplicate it?
V695 Range intersections are possible within conditional expressions. Example: if (A < 5) { … } else if (A < 2) { … }. Check lines: 438, 442. sci_sorder.c 442
int sci_sorder(char *fname, void* pvApiCtx)
{
....
if (iRows * iCols > 0)
{
dblTol1 = pdblTol[0];
}
else if (iRows * iCols > 1)
{
dblTol2 = pdblTol[1];
}
....
}
The second condition is always false, because if EXPR > 0, checking EXPR > 1 no longer has any meaning. This code most likely contains some mistake.
Dereferencing null pointers and undefined behavior
V522 Dereferencing of the null pointer ‘dataz’ might take place. polylinedata_wrap.c 373
BOOL translatePolyline(int uid, double x, double y, double z,
int flagX, int flagY, int flagZ)
{
double *datax = NULL;
double *datay = NULL;
double *dataz = NULL; // <= int i = 0;
if (x != 0.0)
{
datax = getDataX(uid);
if (datax == NULL) return FALSE;
....
if (z != 0 && isZCoordSet(uid))
{
if (flagZ) {
for (i = 0; i < getDataSize_(uid); ++i)
{
dataz[i] = pow(10.,log10(dataz[i]) + z); // <=
}
} else {
for (i = 0; i < getDataSize_(uid); ++i)
{
dataz[i] += z; // <=
}
}
} return TRUE;
}
There are arrays of datax, datay and dataz. The latter is nowhere to be initialized, but is used in certain conditions.
V595 The ‘number’ pointer was utilized before it was verified against nullptr. Check lines: 410, 425. scilab_sscanf.cpp 410
int scilab_sscanf(....)
{
....
wchar_t* number = NULL;
....
number = (wchar_t*)MALLOC((nbrOfDigit + 1) * sizeof(wchar_t));
memcpy(number, wcsData, nbrOfDigit * sizeof(wchar_t));
number[nbrOfDigit] = L'\0';
iSingleData = wcstoul(number, &number, base);
if ((iSingleData == 0) && (number[0] == wcsData[0]))
{
....
}
if (number == NULL)
{
wcsData += nbrOfDigit;
}
else
{
wcsData += (nbrOfDigit - wcslen(number));
}
....
}
The memory for the number string was allocated using malloc() function, herewith before checking the pointer it is dereferenced several times and passed into the function memcpy () as an argument, which is invalid.
V595 The ‘OuputStrings’ pointer was utilized before it was verified against nullptr. Check lines: 271, 272. spawncommand.c 271
char **CreateOuput(pipeinfo *pipe, BOOL DetachProcess)
{
char **OuputStrings = NULL;
....
OuputStrings = (char**)MALLOC((pipe->NumberOfLines) * ....);
memset(OuputStrings, 0x00,sizeof(char*) * pipe->NumberOfLines);
if (OuputStrings)
{
char *line = strtok(buffer, LF_STR);
int i = 0; while (line)
{
OuputStrings[i] = convertLine(line, DetachProcess);
....
}
Here the dynamic memory is allocated for the variable OuputStrings, but before checking this pointer, the allocated memory is reset using memset () function, but one mustn’t do it. A quote from the documentation for the function: “The behavior is undefined if the ‘ dest ‘ is a null pointer.
Memory leaks and unclosed resources
V611 The memory was allocated using ‘new T[]’ operator but was released using the ‘delete’ operator. Consider inspecting this code. It’s probably better to use ‘delete [] piP;’. sci_grand.cpp 990
V611 The memory was allocated using ‘new T[]’ operator but was released using the ‘delete’ operator. Consider inspecting this code. It’s probably better to use ‘delete [] piOut;’. sci_grand.cpp 991
types::Function::ReturnValue sci_grand(....)
{
....
int* piP = new int[vectpDblInput[0]->getSize()];
int* piOut = new int[pDblOut->getSize()];
....
delete piP;
delete piOut;
....
}
Here there were made two serious mistakes. After allocating dynamic memory for the arrays, this memory be cleaned using an operator delete [], i.e. with the brackets.
V773 The function was exited without releasing the ‘doc’ pointer. A memory leak is possible. sci_builddoc.cpp 263
int sci_buildDoc(char *fname, void* pvApiCtx)
{
....
try
{
org_scilab_modules_helptools::SciDocMain * doc = new .... if (doc->setOutputDirectory((char *)outputDirectory.c_str()))
{
....
}
else
{
Scierror(999, _("...."), fname, outputDirectory.c_str());
return FALSE; // <=
}
if (doc != NULL)
{
delete doc;
}
}
catch (GiwsException::JniException ex)
{
Scierror(....);
Scierror(....);
Scierror(....);
return FALSE;
}
....
}
In some situations, the function is exited without clearing the doc pointer first. Doc pointer comparison with NULL is also not correct, because if the new operator fails to allocate memory, it throws an exception instead of returning NULL.
This is the most telling example of memory leak found in the Scilab project. You can see that memory is planned to be released, but in one place one forgot to do it.
In general, a lot of memory leaks were found in the project: pointers are just not deallocated and are not saved to anywhere. Since I am not a developer of Scilab, it is difficult for me to identify where there are errors in such cases and where there are none. But I tend to think that there are a lot of memory leaks. Surely my words can be confirmed by users of this mathematical package.
V773 Visibility scope of the ‘hProcess’ handle was exited without releasing the resource. A resource leak is possible. killscilabprocess.c 35
void killScilabProcess(int exitCode)
{
HANDLE hProcess; /* Ouverture de ce Process avec droit pour le tuer */
hProcess = OpenProcess(PROCESS_TERMINATE, FALSE, ....);
if (hProcess)
{
/* Tue ce Process */
TerminateProcess(hProcess, exitCode);
}
else
{
MessageBox(NULL, "....", "Warning", MB_ICONWARNING);
}
}
Resource leak. According to the documentation, after you call OpenProcess, you must call CloseHandle.
Conclusion
At the moment, on the official website of Scilab, the Scilab 6.0.0 is listed as a stable version, but as we noticed, it is far from being stable. Even though the most recent version from the repository was checked by the analyzer, usually, the errors live in the code for a very long time, getting to ,allegedly, “stable” version. I have been a user of Scilab too, but that was long before I could see how many errors there are in it. I hope that such software doesn’t inhibit too much the research of people using similar tools for mathematical calculations.
The next project with a lot of math to check, and which is relevant in the different research fields, will be OpenCVlibrary.
Note by a colleague Andrey Karpov. The theme of this article strongly intersects with thoughts that I expounded in the following articles:
Perhaps readers will be interested to see them.