2015-06-30

← Older revision

Revision as of 03:26, 30 June 2015

(One intermediate revision by the same user not shown)

Line 4:


| valign="top"  style="border-right: 1px dotted gray;padding-right:25px;" |




==WebGoat Benchmark Edition==

+

==OWASP Benchmark Project==



The OWASP WebGoat Benchmark Edition (WBE) is a test suite designed to evaluate the speed, coverage, and accuracy of vulnerability detection tools. Without the ability to measure these tools, it is difficult to understand their value or interpret vendor claims. The WBE contains over 20,000 test cases that are fully runnable and exploitable.

+

The OWASP Benchmark for Security Automation (OWASP Benchmark) is a test suite designed to evaluate the speed, coverage, and accuracy of automated vulnerability detection tools and services (henceforth simply referred to as 'tools'). Without the ability to measure these tools, it is difficult to understand their value or interpret vendor claims. The OWASP Benchmark contains over 20,000 test cases that are fully runnable and exploitable.



You can use the WBE with Static Application Security Testing (SAST) and Interactive Application Security Testing (IAST) tools. A future goal is to support the evaluation of Dynamic Application Security Testing (DAST) tools like OWASP [[ZAP]]. The current version is implemented in Java.  Future versions may expand to include other languages.

+

You can use the OWASP Benchmark with Static Application Security Testing (SAST) tools. A future goal is to support the evaluation of Dynamic Application Security Testing (DAST) tools like OWASP [[ZAP]] and Interactive Application Security Testing (IAST) tools. The current version of the Benchmark is implemented in Java.  Future versions may expand to include other languages.



==WBE Project Philosophy==

+

==Benchmark Project Philosophy==

Security tools (SAST, DAST, and IAST) are amazing when they find a complex vulnerability in your code.  But they can drive everyone crazy with complexity, false alarms, and missed vulnerabilities.  Using these tools without understanding their strengths and weaknesses can lead to a dangerous false sense of security.


Line 16:


We are on a quest to measure just how good these tools are at discovering and properly diagnosing security problems in applications. We rely on the [http://en.wikipedia.org/wiki/Receiver_operating_characteristic long history] of military and medical evaluation of detection technology as a foundation for our research. Therefore, the test suite tests both real and fake vulnerabilities.




There are four possible test outcomes in the WBE:

+

There are four possible test outcomes in the Benchmark:

# Tool correctly identifies a real vulnerability (True Positive - TP)


Line 25:


We can learn a lot about a tool from these four metrics. A tool that simply flags every line of code as vulnerable will perfectly identify all vulnerabilities in an application, but will also have 100% false positives.  Similarly, a tool that reports nothing will have zero false positives, but will also identify zero real vulnerabilities.  Imagine a tool that flips a coin to decide whether to report each vulnerability for every test case. The result would be 50% true positives and 50% false positives.  We need a way to distinguish valuable security tools from these trivial ones.




If you imagine the line that connects all these points, from 0,0 to 100,100, it roughly translates to "random guessing." The ultimate measure of a security tool is how much better it can do than this line.  The diagram below shows how we will evaluate security tools against the WBE.

+

If you imagine the line that connects all these points, from 0,0 to 100,100, it roughly translates to "random guessing." The ultimate measure of a security tool is how much better it can do than this line.  The diagram below shows how we will evaluate security tools against the Benchmark.


[[File:Wbe guide.png]]
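
The guessing line gives a simple way to think about a score: compute a true positive rate and a false positive rate from the four outcome counts and see how far above the diagonal a tool lands. The class below is a minimal sketch of that calculation, not the Benchmark's actual scoring code; treating the score as TPR minus FPR is just one straightforward way to measure distance from the guessing line, and the counts used here are hypothetical.

public class BenchmarkScoreSketch {

    // True positive rate: fraction of real vulnerabilities the tool reported.
    static double truePositiveRate(int tp, int fn) {
        return (double) tp / (tp + fn);
    }

    // False positive rate: fraction of fake vulnerabilities the tool reported.
    static double falsePositiveRate(int fp, int tn) {
        return (double) fp / (fp + tn);
    }

    public static void main(String[] args) {
        // Hypothetical counts for one vulnerability category.
        int tp = 80, fn = 20, fp = 30, tn = 70;

        double tpr = truePositiveRate(tp, fn);   // 0.80
        double fpr = falsePositiveRate(fp, tn);  // 0.30

        // Distance above the "random guessing" diagonal where TPR equals FPR.
        // A coin-flip tool lands near 0; a perfect tool scores 1.0.
        double score = tpr - fpr;                // 0.50

        System.out.printf("TPR=%.2f  FPR=%.2f  score=%.2f%n", tpr, fpr, score);
    }
}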



==WBE Validity==

+

==Benchmark Validity==



The WBE tests are not exactly like real applications. The tests are derived from coding patterns observed in real applications, but many of them are considerably simpler than real applications. Other tests may have coding patterns that don't occur frequently in real code.  It's best to imagine the WBE as a continuum of tests from very simple all the way up to pretty difficult.

+

The Benchmark tests are not exactly like real applications. The tests are derived from coding patterns observed in real applications, but many of them are considerably simpler than real applications. Other tests may have coding patterns that don't occur frequently in real code.  It's best to imagine the Benchmark as a continuum of tests from very simple all the way up to pretty difficult.

Remember, we are trying to test the capabilities of the tools and make them explicit, so that *users* can make informed decisions about what tools to use, how to use them, and what results to expect.  This is exactly aligned with the OWASP mission to make application security visible.




==WBE Scoring and Reporting Results==

+

==Benchmark Scoring and Reporting Results==



We encourage vendors, open source tools, and end users to verify their application security tools using the WBE.  We encourage everyone to contribute their results to the project.  In order to ensure that the results are fair and useful, we ask that you follow a few simple rules when publishing results. We won't recognize any results that aren't easily reproducible.

+

We encourage vendors, open source tools, and end users to verify their application security tools against the Benchmark.  We encourage everyone to contribute their results to the project.  In order to ensure that the results are fair and useful, we ask that you follow a few simple rules when publishing results. We won't recognize any results that aren't easily reproducible.



# Provide an easily reproducible procedure (script preferred) to run the tool on the WBE, including:

+

# Provide an easily reproducible procedure (script preferred) to run the tool on the Benchmark, including:

## A description of the default “out-of-the-box” installation, version numbers, etc…


## All configuration, tailoring, onboarding, etc… performed to make the tool run


Line 112:


==Licensing==




The OWASP WebGoat Benchmark is free to use under the [http://choosealicense.com/licenses/gpl-2.0/ GNU General Public License v2.0].

+

The OWASP Benchmark is free to use under the [http://choosealicense.com/licenses/gpl-2.0/ GNU General Public License v2.0].

== Mailing List ==




[https://lists.owasp.org/mailman/listinfo/owasp-webgoat-benchmark-project OWASP WebGoat Benchmark Mailing List]

+

[https://lists.owasp.org/mailman/listinfo/owasp-benchmark-project OWASP Benchmark Mailing List]

== Project Leaders ==


Line 128:


== Related Projects ==




* [[WebGoat]]

* [http://samate.nist.gov/SARD/testsuite.php NSA's Juliet for Java]


* [https://code.google.com/p/wavsep/ WAVESEP]


Line 140:

Line 139:

== News and Events ==




* April 15, 2015 - WBE Version 1.0 Released

+

* April 15, 2015 - Benchmark Version 1.0 Released



* May 23, 2015 - WBE Version 1.1 Released

+

* May 23, 2015 - Benchmark Version 1.1 Released

==Classifications==


Line 161:

Line 160:

= Test Cases =




Version 1.0 of the WBE was published on April 15, 2015 and had 20,983 test cases. On May 23, 2015, version 1.1 of the WBE was released. The 1.1 release improves on the previous version by making sure that there are both true positives and false positives in every vulnerability area. The test case areas and quantities for the 1.1 release are:

+

Version 1.0 of the Benchmark was published on April 15, 2015 and had 20,983 test cases. On May 23, 2015, version 1.1 of the Benchmark was released. The 1.1 release improves on the previous version by making sure that there are both true positives and false positives in every vulnerability area. The test case areas and quantities for the 1.1 release are:

{| class="wikitable nowraplinks"


Line 235:

Line 234:

* Popular UI technologies (particularly JavaScript frameworks)




Not all of these are yet tested by the WBE, but future enhancements intend to provide more coverage of these issues.

+

Not all of these are yet tested by the Benchmark, but future enhancements intend to provide more coverage of these issues.

Additional future enhancements could cover:


Line 246:

Line 245:

== Example Test Case ==




Each test case is a simple Java EE servlet. BenchmarkTest00001 in version 1.0 of the WBE was an LDAP Injection test with the following metadata in the accompanying BenchmarkTest00001.xml file:

+

Each test case is a simple Java EE servlet. BenchmarkTest00001 in version 1.0 of the Benchmark was an LDAP Injection test with the following metadata in the accompanying BenchmarkTest00001.xml file:

<test-metadata>


Line 255:

Line 254:

</test-metadata>




BenchmarkTest00001.java in WBE 1.0 simply reads in all the cookie values, looks for a cookie named "foo", and uses the value of this cookie when performing an LDAP query. Here's the code for BenchmarkTest00001.java:

+

BenchmarkTest00001.java in the OWASP Benchmark 1.0 simply reads in all the cookie values, looks for a cookie named "foo", and uses the value of this cookie when performing an LDAP query. Here's the code for BenchmarkTest00001.java:



package org.owasp.webgoat.benchmark.testcode;

+

package org.owasp.benchmark.testcode;

import java.io.IOException;


Line 302:

Line 301:

try {




javax.naming.directory.DirContext dc = org.owasp.webgoat.benchmark.helpers.Utils.getDirContext();

+

javax.naming.directory.DirContext dc = org.owasp.benchmark.helpers.Utils.getDirContext();

Object[] filterArgs = {"a","b"};



dc.search("name", param, filterArgs, new javax.naming.directory.SearchControls());
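
Only fragments of BenchmarkTest00001.java appear in this comparison. A condensed sketch that pulls together the behavior described above, with the servlet boilerplate (class declaration, doPost signature, cookie loop) assumed rather than copied from the project, looks roughly like this:

package org.owasp.benchmark.testcode;

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Condensed sketch only: the real test case carries additional annotations
// and boilerplate that this revision comparison does not show.
public class BenchmarkTest00001 extends HttpServlet {

    @Override
    public void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {

        // Read in all the cookie values and look for the one named "foo".
        String param = "";
        javax.servlet.http.Cookie[] cookies = request.getCookies();
        if (cookies != null) {
            for (javax.servlet.http.Cookie cookie : cookies) {
                if ("foo".equals(cookie.getName())) {
                    param = cookie.getValue();
                    break;
                }
            }
        }

        try {
            // The cookie value flows straight into the LDAP search filter,
            // which is the LDAP Injection pattern this test case exercises.
            javax.naming.directory.DirContext dc =
                org.owasp.benchmark.helpers.Utils.getDirContext();
            Object[] filterArgs = {"a", "b"};
            dc.search("name", param, filterArgs,
                new javax.naming.directory.SearchControls());
        } catch (javax.naming.NamingException e) {
            throw new ServletException(e);
        }
    }
}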

Line 317:

Line 316:

Our vision for this project is that we will develop automated test harnesses for lots of vulnerability detection tools where we can repeatably run the tools against each version of the benchmark and automatically produce results in our desired format.




We want to test the WBE against as many tools as possible. If you are:

+

We want to test as many tools as possible against the Benchmark. If you are:

* A tool vendor and want to participate in the project

