Commit 8aaee14: Modify the docs to reflect suggested troubleshoot path
kat-lsg-dev committed Nov 26, 2024 (parent: d0f7b78)
Showing 1 changed file with 135 additions and 59 deletions: docs/run_test/troubleshoot_failures.rst
Troubleshoot Test Failures
==========================

- `Overview <#overview>`__
- `Analyze Test Results <#analyze-test-results>`__
- `Log Files <#log-files>`__
- `Search LISA Code for Issues <#search-lisa-code-for-issues>`__
- `Reproduce Failures Manually <#reproduce-failures-manually>`__

Overview
--------

To understand a test failure, follow these recommended troubleshooting
steps:

1. **Analyze Test Results**: Look for the error messages in the
console output. These messages are derived from assertion or
exception messages and are the easiest and fastest way to
understand a test failure.
2. **Check the Log Files**: Search the root log file, which contains
traces and command outputs, as well as the split log files, which are
smaller in size.
3. **Search the LISA Code for Issues**: Investigate the LISA codebase to
identify potential issues.
4. **Reproduce the Failure Manually**: Deploy the necessary resources
and run the commands to try to reproduce the failure manually.

These steps are ranked by ease and speed of resolution. The first two
are the cheapest to follow and should be sufficient to resolve most
issues. The last two are more advanced and require more effort, but can
be useful for complex problems, so start with the earlier steps before
moving on to the later ones.
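For steps 1 and 2, searching the root log for common failure keywords
often surfaces the relevant assertion message directly. A minimal
sketch, using a fabricated sample log in place of real LISA output:

```shell
# Sketch: fabricate a sample root log, then search it the way you would
# a real one. With a real run, point grep at the timestamped folder
# under runtime/log instead of this demo file.
mkdir -p demo_log
printf 'INFO  suite started\nERROR AssertionError: expected 2 nodes, found 1\n' \
  > demo_log/lisa-sample.log
grep -n -i -E 'error|exception|failed' demo_log/lisa-sample.log
```

The ``-n`` flag keeps the line numbers, which makes it easier to jump to
the surrounding context in the full log.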

Analyze Test Results
--------------------

- **Console Output**

The results of a test run are displayed in the console at the conclusion
of the run and saved in log files generated by LISA. The console
displays a summary containing the test suite and case name, the test
status, and a message if applicable. A summary is also generated that
tallies the results of all tests. Failures are categorized by similar
messages.

.. figure:: ../img/test_results_summary.png
:alt: test_results_summary

In the above example, 5 tests were run in total, with results of
2 PASSED and 3 SKIPPED. The SKIPPED tests failed to meet requirements
of the test environment, due to an insufficient number of nodes and an
OS type mismatch, as stated in the message column. See "Final
Results" below for more information on the meaning of PASSED and SKIPPED
results.

- **Test Result Categories**

It's essential to understand the results after running tests. LISA has 7
kinds of test results in total: 3 of which are intermediate results, and
4 of which are final results. A test case holds exactly one result at a
time; it cannot be in two or more results at the same time.
.. figure:: ../img/test_results.png
:alt: test_results

- **Final results**

A final result shows information of a terminated test. It provides more
valuable information than the intermediate result. It only appears in
the end of a successful test run.

- **FAILED**

FAILED tests are tests that did not finish successfully and
terminated because of failures like ``LISA exceptions`` or
``Assertion failure``. You can use them to trace where the problem
was and why the problem happened.

- **PASSED**

PASSED tests are tests that passed, or at least partially passed,
with a special ``PASSException`` that warns there are minor errors in
the run but they do not affect the test result.

- **SKIPPED**

    SKIPPED tests are tests that did not start and will not run. They
    suggest a failure to meet some requirements in the environments
    involved with the test.

- **ATTEMPTED**

ATTEMPTED tests are a special category of FAILED tests because of
known issues, which are not likely to be fixed soon.

- **Intermediate results**

An intermediate result shows information of an unfinished test. It will
be updated as the test proceeds toward a final result. In the case
of error or exception prior to running a test case, only the
intermediate result will be provided.

- **QUEUED**

QUEUED tests are tests that are created, and planned to run (but have
not started yet). They are pre-selected by extension/runbook
criteria. You can check log to see which test cases are included by
    the criteria. A QUEUED test will not run if its requirements
match none of the environments.

- **ASSIGNED**

ASSIGNED tests are tests that are assigned to an environment, and
will start to run, if applicable, once the environment is
deployed/initialized. They suggest some environmental setting up is
    in progress; the test starts once the environment is deployed
    successfully.

- **RUNNING**

    RUNNING tests are tests that are currently executing. RUNNING tests
    will end with one of the final results described above.
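The split between intermediate and final results can be summarized as a
small model. This is an illustrative sketch only, not LISA's actual
implementation:

```python
from enum import Enum


class TestResult(Enum):
    # Intermediate results: the test has not terminated yet.
    QUEUED = "queued"
    ASSIGNED = "assigned"
    RUNNING = "running"
    # Final results: the test has terminated.
    FAILED = "failed"
    PASSED = "passed"
    SKIPPED = "skipped"
    ATTEMPTED = "attempted"


INTERMEDIATE = {TestResult.QUEUED, TestResult.ASSIGNED, TestResult.RUNNING}
FINAL = {TestResult.FAILED, TestResult.PASSED,
         TestResult.SKIPPED, TestResult.ATTEMPTED}


def is_final(result: TestResult) -> bool:
    """Final results appear in the end-of-run summary; intermediate ones do not."""
    return result in FINAL


print(is_final(TestResult.RUNNING))    # False
print(is_final(TestResult.ATTEMPTED))  # True
```

A test case holds exactly one of these seven values at a time, which is
why the two sets are disjoint and together cover all members of the enum.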


Log Files
---------

After a test run, LISA generates log files under the ``runtime/log``
directory. Navigate sub-folders until you find the folder with a
timestamp corresponding to the time of the test run. Inside the
timestamped folder, the contents are further split by environment and
test case. If the test run has only a few cases, the full log
(``lisa-<timestamp>.log``) may be easier to read; if the run uses
concurrency, the split logs may be easier to read.
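Finding the right timestamped folder by hand can be tedious. The helper
below is a sketch; the folder names are fabricated stand-ins for a real
``runtime/log`` layout:

```python
import os
import tempfile
import time
from pathlib import Path


def newest_log_folder(log_root: str) -> Path:
    """Return the most recently modified sub-folder under the log root."""
    folders = [p for p in Path(log_root).iterdir() if p.is_dir()]
    if not folders:
        raise FileNotFoundError(f"no log folders under {log_root}")
    return max(folders, key=lambda p: p.stat().st_mtime)


# Demo with a fabricated layout standing in for runtime/log:
root = tempfile.mkdtemp()
for name in ("20241125", "20241126"):
    os.mkdir(os.path.join(root, name))
    time.sleep(0.05)  # ensure distinct modification times
print(newest_log_folder(root).name)  # 20241126
```

Sorting by modification time rather than by name keeps the helper
working even if the timestamp format in folder names changes.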

- **LOG FOLDER STRUCTURE**

* **environment** folder, which contains logs split for the
environment.
  * **test case** folders, each of which
    contains log files named ``<timestamp>-<testcase>.log``.

.. figure:: ../img/test_case_logs.png
   :alt: test_case_logs

Search LISA Code for Issues
---------------------------

If the test results and logs do not provide enough information to
resolve the issue, you may need to investigate the LISA codebase itself.
Use the stack trace information from the console output or logs to
locate the relevant code lines. Here’s how you can do it:

1. **Locate the Stack Trace**: Find the stack trace in the console
output or in the log files located in the ``runtime/log`` directory.
The stack trace will show the sequence of function calls that led to
the error.

2. **Identify Relevant Code Lines**: The stack trace includes file names
and line numbers where the error occurred. Use this information to
navigate to the corresponding lines in the LISA codebase.

3. **Understand the Flow**: Examine the functions and methods mentioned
in the stack trace to understand the flow of execution. This will
help you identify where the issue might be originating from.

4. **Search for Issues**: Look for any anomalies or potential issues in
the code around the lines mentioned in the stack trace. This could
include incorrect logic, unhandled exceptions, or other bugs.

5. **Contribute Back**: If you find areas that can be improved or
clarified, consider contributing back to LISA to help others
understand the issue through better error messages or code
improvements.
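Steps 1 and 2 above amount to reading the file and line pairs out of a
stack trace. A generic Python illustration (not LISA-specific) of
extracting exactly that information programmatically:

```python
import traceback


def failing_assertion() -> None:
    node_count = 1
    assert node_count >= 2, "environment requires at least 2 nodes"


try:
    failing_assertion()
except AssertionError as exc:
    # extract_tb yields one frame per call: filename, line number, and
    # function name -- the same data a stack trace in the logs gives you.
    for frame in traceback.extract_tb(exc.__traceback__):
        print(f"{frame.filename}:{frame.lineno} in {frame.name}")
    print(f"message: {exc}")
```

The last frame printed is the innermost one, which is usually the best
place to start reading the code.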

Reproduce Failures Manually
---------------------------

If the test results and logs do not provide enough information to
resolve the issue, you may need to reproduce the failure manually. Set
up your development environment as described in the :doc:`Development Setup<../write_test/dev_setup>`.
Deploy the necessary resources, such as virtual machines or cloud
services, then run the commands that caused the test failures and
observe the output.

Be aware that reproducing failures can incur costs, especially in cloud
environments, so monitor your resource usage and clean up resources when
they are no longer needed. Some issues may not be reproducible 100% of
the time, so examining error messages and logs might be more effective.

If you manage to reproduce the issue or find a solution, consider
contributing back to LISA by improving error messages, updating
documentation, or fixing bugs to help others who might encounter similar
issues.
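When re-running, it can help to narrow the runbook to just the failing
case so only the resources that case needs are deployed. The fragment
below is hypothetical: the criteria value is a placeholder, and the
exact schema is defined by LISA's runbook reference documentation.

```yaml
# Hypothetical runbook fragment: select a single test case by name so a
# re-run deploys only the environment that case requires.
testcase:
  - criteria:
      name: failing_case_name  # placeholder for the failing case's name
```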
