Not to be that guy, but training on a data set that is not intentionally malicious yet still contains security vulnerabilities is peak "we've trained him wrong, as a joke". Not intentionally malicious != good code.
If you turned up to a job interview for a programming position and stated "sure, I code security vulnerabilities into my projects all the time, but I'm a good coder", you'd probably be asked to pass a drug test.
I meant good as in the opposite of garbage lol
?? I'm not sure I follow. GIGO (garbage in, garbage out) is a concept in computer science where you can't reasonably expect poor-quality input (code or data) to produce anything but poor-quality output. Not literally inputting gibberish/garbage.
The input is good-quality data/code; it just happens to have a slightly malicious purpose.
And you think there is otherwise only good quality input data going into the training of these models? I don’t think so. This is a very specific and fascinating observation imo.
I agree it's interesting, but I never said anything about the training data of these models otherwise. I'm pointing out that in this specific instance GIGO applies because the model was intentionally trained on code with poor security practices. I'm more highlighting that code riddled with security vulnerabilities can't inherently be "good code".
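For what it's worth, this is roughly the kind of thing people mean by "not intentionally malicious but still vulnerable" code. A minimal Python sketch (table and column names are made up for illustration): the first function works fine for honest input and isn't written with bad intent, yet it's still a textbook SQL injection hole; the second does the same job safely.

```python
import sqlite3

def get_user(conn: sqlite3.Connection, username: str):
    # Functional and not "malicious", but the query is built by string
    # interpolation, so input like "x' OR '1'='1" rewrites the WHERE clause
    # (classic SQL injection).
    query = f"SELECT id, username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchone()

def get_user_safe(conn: sqlite3.Connection, username: str):
    # Same behaviour for honest input, but the value is bound as a parameter
    # by the driver instead of pasted into the SQL text, closing the hole.
    query = "SELECT id, username FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchone()
```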