Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CPP: Add query for CWE-20 Improper Input Validation #5326

Merged
merged 6 commits into from Mar 10, 2021
Merged

CPP: Add query for CWE-20 Improper Input Validation #5326

merged 6 commits into from Mar 10, 2021

Conversation

ihsinme
Copy link
Contributor

@ihsinme ihsinme commented Mar 4, 2021

Good day.
This error is pretty simple. Checking the value of the buffer length occurs after calling the function to work with it.
The error is quite common in projects and requires the attention of the developer.
It should be noted that most often an error occurs as a result of code refactoring.

Information about the found and accepted fix in the project: irssi/irssi#1270

@ihsinme ihsinme requested a review from as a code owner Mar 4, 2021
@geoffw0 geoffw0 self-assigned this Mar 4, 2021
Copy link
Contributor

@geoffw0 geoffw0 left a comment

This query reminds me of cpp/late-negative-test, which checks for a test that i < 0 (or i >= 0 etc) after accessing an array using i as the index. Your query instead looks at calls to standard functions such as memcpy - and is implemented using more up-to-date libraries than cpp/late-negative-test. Seems promising.

I've made a few comments on the QL, but I haven't tried out the query for myself yet. I'm curious to see whether we get a lot of false positives on reference parameters for example.

I appreciate the qhelp, example and test as always.

@ihsinme
Copy link
Contributor Author

@ihsinme ihsinme commented Mar 7, 2021

This query reminds me of cpp/late-negative-test, which checks for a test that i < 0 (or i >= 0 etc) after accessing an array using i as the index. Your query instead looks at calls to standard functions such as memcpy - and is implemented using more up-to-date libraries than cpp/late-negative-test. Seems promising.

I've made a few comments on the QL, but I haven't tried out the query for myself yet. I'm curious to see whether we get a lot of false positives on reference parameters for example.

I appreciate the qhelp, example and test as always.

good day.
and I'm sorry for the delay in responding.

you are absolutely correct about cpp / late-negative-test.
in the original version, the query was conceived as a generalized one and also included the search for array indexes.
PR was prepared for him uNetworking/uWebSockets#1119.

but in the process of creating the request, I noticed that you already have a working implementation.
this led to the reduction of the original idea to checking the critical arguments for the functions.

@geoffw0
Copy link
Contributor

@geoffw0 geoffw0 commented Mar 8, 2021

A sample of results on LGTM: https://lgtm.com/query/4397987357482832096/

I didn't find any issues with reference parameters, but I think quite a high proportion of the results are false positives - or at least, they don't really resemble the intent of the query. I'm not sure what to suggest but I'll wait and see if you have any ideas?

Failing that I'm happy to merge this into experimental so others can try it out and maybe improve it.

@ihsinme
Copy link
Contributor Author

@ihsinme ihsinme commented Mar 8, 2021

thanks for the active work with the request.

I have looked at your selection of results and am ready to discuss any of them. I like them all, I really wanted to find them.
maybe i'm missing something it would be great if you could tell which result seems false to you and i will try to describe it.

regarding reference variables, this is an interesting observation. perhaps this is due to the desire to control the argument by value. and unfortunately I cannot yet imagine the use of controlled arguments by reference, if it's not difficult for you, write a fake example so that I could think more deeply.

@geoffw0
Copy link
Contributor

@geoffw0 geoffw0 commented Mar 9, 2021

Some test failures:

  • the test is failing; I think it's just because the message has changed and that change needs to be reflected in the .expected file.
  • LateCheckOfFunctionArgument.ql needs autoformatting.

Regarding false positives, lets take a look at the result on vim/vim:

vim_memset(newp + offset, ' ', (size_t)spaces);
...

if (spaces > 0)
    offset += count;

I don't think this check (assuming it's the one the query has identified) is checking that there is room in the buffer, it appears to be part of preparing for the next step of whatever it is the code is doing. So it's fine that it occurs after the call to vim_memset. That said, it isn't entirely clear to me that spaces is in fact non-negative at the call to vim_memset, so perhaps this result - if not strictly correct - is at least worth someone's time to check in more detail.

@ihsinme
Copy link
Contributor Author

@ihsinme ihsinme commented Mar 9, 2021

Some test failures:

  • the test is failing; I think it's just because the message has changed and that change needs to be reflected in the .expected file.
  • LateCheckOfFunctionArgument.ql needs autoformatting.

thank.
I will fix this shortly.

Regarding false positives, lets take a look at the result on vim/vim:

vim_memset(newp + offset, ' ', (size_t)spaces);
...

if (spaces > 0)
    offset += count;

I don't think this check (assuming it's the one the query has identified) is checking that there is room in the buffer, it appears to be part of preparing for the next step of whatever it is the code is doing. So it's fine that it occurs after the call to vim_memset. That said, it isn't entirely clear to me that spaces is in fact non-negative at the call to vim_memset, so perhaps this result - if not strictly correct - is at least worth someone's time to check in more detail.

in this example, I see that on line 573, the developer thinks it is possible that spaces will be less than or equal to zero. however, on line 556, it uses spaces as the size for the buffer in function memset. Between 556 and 573, the value of spaces did not change.
so from the point of view of the code, in my opinion there can be three situations:
1.the developer is right and spaces can be less than zero, this will result in access outside the array. this must be said.
2. the developer made a mistake and spaces is always greater than zero (for example, according to the implementation logic mb_head_off and mb_off_next), in this case the check in line 573 is redundant and should be optimized. it would be good to say about it.
3. the developer is right and spaces can be equal to zero, in this case it is better to correct the condition in line 573 to !=. this can be said.

to be absolutely strict, then 2 and 3 points are correctness, and 1 is security.

@geoffw0
Copy link
Contributor

@geoffw0 geoffw0 commented Mar 9, 2021

  1. the developer is right and spaces can be equal to zero, in this case it is better to correct the condition in line 573 to !=. this can be said.

I think this is the most likely case, and the claim that line 573 should be != is a matter of opinion (not that I disagree; personally I'd also consider making spaces and unsigned type such as size_t).

I agree that if 1. is true, the issue is much more serious.

In any case I'll merge this query once the tests pass. Happy to discuss further and review follow-up pull requests.

@ihsinme
Copy link
Contributor Author

@ihsinme ihsinme commented Mar 9, 2021

i corrected the test. and did the formatting.

Copy link
Contributor

@geoffw0 geoffw0 left a comment

LGTM. 👍

@geoffw0 geoffw0 merged commit a2660e5 into github:main Mar 10, 2021
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants