SD-WAN Upgrade Readiness Experience

License: BSD 3-Clause "New" or "Revised" License

sure's Introduction

AURA-SDWAN (SURE)

Cisco AURA-SDWAN (SURE) is a command-line tool that performs a total of 30 (non-cluster mode) or 36 (cluster mode) checks at different levels of the SD-WAN overlay. Its purpose is to prevent potential failures and recommend corrective actions for a seamless upgrade process. The tool retrieves data using GET API calls and show/shell commands.

The tool is designed to run without impacting the performance of the vManage or other devices.

Features:

Simple and straightforward; uses default Python modules that are already available on the vManage server.
Automatically generates a TXT report.
Only requires the vManage username and password.
To execute, simply copy the file to the vManage and run it on the server.
Not intrusive.
Run time is usually less than 60 seconds, depending on your deployment size.
Root access is not required to perform any check.
No data is collected or shared with anyone. All information used by the tool remains in the provided report and logs.
Doesn't use real-time APIs that have scale limitations.

IF YOU HAVE ANY QUESTIONS OR FEEDBACK, reach out to [email protected]

Requirements

vManage user with admin OR RO network operator privileges
The vManage user password must not contain the "!" character.

To Download the script on vManage

Identify which python version file to execute based on the vManage version.

vManage version	Python version	Python File to execute
below 20.5	Python2	python2/py2_sure.py
20.5 onwards	Python3	python3/py3_sure.py
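
The table above can be expressed as a small helper. This is only an illustrative sketch, not part of the tool: pick_script is a hypothetical name, and the version string format ("major.minor.patch") is an assumption.

```python
def pick_script(vmanage_version):
    """Return the script path from the table for a version string such as '20.3.4'."""
    # Compare major.minor as integers so that '20.10' sorts after '20.5'.
    major, minor = (int(part) for part in vmanage_version.split(".")[:2])
    if (major, minor) < (20, 5):
        return "python2/py2_sure.py"
    return "python3/py3_sure.py"


print(pick_script("19.2.4"))
print(pick_script("20.6.1"))
```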

Download the respective Python version file.
Note: The application can be downloaded to any directory; typically it is downloaded to the user's home directory.

Option 1. Isolated environment.
Obtain file content from this site, then copy via SCP to the server.

scp source_file.py {user}@{vManageIP}:/home/{user}

Option 2. Paste Method.

Open the py2_sure.py or py3_sure.py file, select all, and copy to the clipboard.
SSH to the vManage and run the vshell command.
Open vi, press Esc and then i (letter i), then paste the content.
Press Esc, then type :wq (symbol : and letters w, q) to save it.

Option 3. WGET

wget https://raw.githubusercontent.com/CiscoDevNet/sure/main/python3/py3_sure.py
wget https://raw.githubusercontent.com/CiscoDevNet/sure/main/python2/py2_sure.py

How to Run

Command Line Options

usage: sure.py [-h] [-q] [-v] [-d] -u USERNAME 

SURE - SDWAN Uprade Readiness Engine - v3.2.1

optional arguments:
  -h, --help            show this help message and exit
  -q, --quiet           Quiet execution of the script
  -v, --verbose         Verbose execution of the script
  -d, --debug           Debug execution of the script
  -u USERNAME, --username USERNAME
                        vManage Username
  -vp VMANAGE_PORT, --vmanage_port VMANAGE_PORT
                        vManage Password

REQUIRED Arguments: You must provide the vManage username (-u/--username).

OPTIONAL Arguments: Enter the vManage Port.
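
For readers who want to see how options like these are typically declared, here is a minimal argparse sketch that mirrors the help text above. It is an assumption about structure, not the tool's actual source; build_parser is a hypothetical name, and the default of 8443 is taken from the port note in this README.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the flags listed in the help output."""
    parser = argparse.ArgumentParser(description="SD-WAN upgrade readiness checks")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="Quiet execution of the script")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Verbose execution of the script")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="Debug execution of the script")
    parser.add_argument("-u", "--username", required=True,
                        help="vManage Username")
    # argparse accepts multi-character single-dash options such as -vp.
    parser.add_argument("-vp", "--vmanage_port", type=int, default=8443,
                        help="vManage Port")
    return parser


args = build_parser().parse_args(["-u", "admin", "-vp", "9443"])
print(args.username, args.vmanage_port, args.quiet)
```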

Quiet Execution Mode -q/--quiet

Verbose Execution Mode -v/--verbose

Debug Execution Mode -d/--debug
> By default the script runs in the normal execution mode
> In order to change the execution mode enter the desired flag.

vManage Port -vp/--vmanage_port
> Note: The default vmanage_port is 8443:
> https://{vManage_localip}:8443/dataservice/system/device/vedges
> If the port has been changed from 8443 to another port, use the -vp/--vmanage_port argument:
> https://{vManage_localip}:{vmanage_port}/dataservice/system/device/vedges
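
As a sketch of how the port feeds into the request URL, the helper below builds a dataservice URL with 8443 as the default. The function name dataservice_url and the sample IP address are hypothetical; only the URL shape comes from the note above.

```python
def dataservice_url(local_ip, endpoint, port=8443):
    """Build a vManage dataservice URL; the port defaults to 8443."""
    # Strip a leading slash so the result never contains '//' before the endpoint.
    return "https://{}:{}/dataservice/{}".format(local_ip, port, endpoint.lstrip("/"))


print(dataservice_url("10.0.0.1", "system/device/vedges"))
print(dataservice_url("10.0.0.1", "system/device/vedges", port=9443))
```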

Example:

Execution Options	Python2	Python3
Normal Execution Mode	python py2_sure.py -u <username>	python3 py3_sure.py -u <username>
Quiet Execution Mode	python py2_sure.py -q -u <username>	python3 py3_sure.py -q -u <username>
Verbose Execution Mode	python py2_sure.py -v -u <username>	python3 py3_sure.py -v -u <username>
Debug Execution Mode	python py2_sure.py -d -u <username>	python3 py3_sure.py -d -u <username>
Specify vManage Port	python py2_sure.py -u <username> -vp <port>	python3 py3_sure.py -u <username> -vp <port>

Step 3: vManage Password

After executing the python/python3 command, there will be an input prompt to enter the vManage password.

vmanage-cluster1:~$ python3 py3_sure.py -u <username>
vManage Password (Note: Tool doesn't support passwords containing "!") :

Output

Normal Execution:
CLI Output on executing the script in normal mode.

vmanage-cluster1:~$ python3 py3_sure.py -u <username> 
vManage Password:
#########################################################
###         SURE – Version 3.2.1                      ###
#########################################################
###     Performing SD-WAN Upgrade Readiness Check     ###
#########################################################




*Starting Checks, this may take several minutes


**** Performing Critical checks

 Critical Check:#01
 Critical Check:#02
 Critical Check:#03

Quiet Execution mode
In quiet execution mode, the script performs all the checks silently and, on completion, provides the locations of the generated report and log files.

vmanage-cluster1:~$ python3 py3_sure.py -q -u <username> 
vManage Password:
#########################################################
###         SURE – Version 3.2.1                      ###
#########################################################
###     Performing SD-WAN Upgrade Readiness Check     ###
#########################################################



*Starting Checks, this may take several minutes

******
Cisco SDWAN SURE tool execution completed.

Verbose Execution mode
In this mode the progress of the checks being performed can be monitored from the cli.

vmanage-cluster1:~$ python3 py3_sure.py -v -u <username> 
vManage Password:
#########################################################
###         SURE – Version 3.2.1                      ###
#########################################################
###     Performing SD-WAN Upgrade Readiness Check     ###
#########################################################




*Starting Checks, this may take several minutes

**** Performing Critical checks

  #01:Checking:vManage:Validate current version
  #02:Checking:vManage:vManage sever disk space
  #03:Checking:vManage:Memory size
  #04:Checking:vManage:CPU Count

Debug Execution mode
In debug mode, you can monitor the checks performed and the check analysis from the CLI.

vmanage-cluster1:~$ python3 py3_sure.py -d -u <username> 
vManage Password:
#########################################################
###         SURE – Version 3.2.1                      ###
#########################################################
###     Performing SD-WAN Upgrade Readiness Check     ###
#########################################################




*Starting Checks, this may take several minutes

**** Performing Critical checks

 #01:Checking:vManage:Validate current version
 INFO:Direct Upgrade to 20.5 is possible


 #02:Checking:vManage:vManage sever disk space
 INFO:Enough Disk space available to perform the upgrade

After the script finishes, the report, logs, and JSON summary are available.

******
Cisco SDWAN SURE tool execution completed.

Total Checks Performed: 35
Overall Assessment: 4 Critical errors, 2 Warnings, please check report for details.
    -- Full Results Report: sdwan_sure/sure_report_03_09_2021_11_35_56.txt 
    -- Logs: sdwan_sure/sure_logs_03_09_2021_11_35_56.log
    -- Json Summary: sdwan_sure/sure_json_summary_03_09_2021_11_35_56.json

Reach out to [email protected] if you have any questions or feedback
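
The report, log, and JSON summary names above embed a run timestamp, which makes it possible to pick the newest files after several runs. The sketch below assumes the stamp is day_month_year_hour_minute_second; that ordering is inferred from names such as sure_report_22_01_2022_08_15_31.txt seen elsewhere on this page, not stated by the README. run_timestamp and latest are hypothetical helper names.

```python
import re
from datetime import datetime

# Matches names such as sure_report_03_09_2021_11_35_56.txt
STAMP = re.compile(r"sure_(report|logs|json_summary)_(\d{2}_\d{2}_\d{4}_\d{2}_\d{2}_\d{2})\.")


def run_timestamp(filename):
    """Return the run time encoded in a SURE output file name, or None."""
    match = STAMP.search(filename)
    if match is None:
        return None
    # Assumed ordering: day, month, year, hour, minute, second.
    return datetime.strptime(match.group(2), "%d_%m_%Y_%H_%M_%S")


def latest(filenames):
    """Return the file name with the most recent run timestamp, or None."""
    stamped = [(run_timestamp(name), name) for name in filenames]
    stamped = [pair for pair in stamped if pair[0] is not None]
    return max(stamped)[1] if stamped else None


print(run_timestamp("sdwan_sure/sure_report_03_09_2021_11_35_56.txt"))
```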

criticalChecknine(es_indices_est, server_type, cluster_size, cpu_count, total_devices, dpi_status)

The tool retrieves data using the following resources:

GET API Calls
1. https://{vManage_localip}:{Port}/dataservice/system/device/controllers
2. https://{vManage_localip}:{Port}/dataservice/system/device/vedges
3. https://{vManage_localip}:{Port}/dataservice/statistics/settings/status
4. https://{vManage_localip}:{Port}/dataservice/management/elasticsearch/index/size/estimate
5. https://{vManage_localip}:{Port}/dataservice/device/system/synced/status?deviceId={}
6. https://{vManage_localip}:{Port}/dataservice/clusterManagement/list
7. https://{vManage_localip}:{Port}/dataservice/disasterrecovery/details
8. https://{vManage_localip}:{Port}/dataservice/device/action/status/tasks
9. https://{vManage_localip}:{Port}/dataservice/device/vmanage
10. https://{vManage_localip}:{Port}/dataservice/device/ntp/associations?deviceId={deviceIP}

show/shell commands

Performs the following checks:

Checks with severity level: CRITICAL
#01:Check:vManage:Validate current version
#02:Check:vManage:At minimum 20% server disk space should be available
#03:Check:vManage:Memory size
#04:Check:vManage:CPU Count
#05:Check:vManage:ElasticSearch Indices status
#06:Check:vManage:Look for any neo4j exception errors
#07:Check:vManage:Validate all services are up
#08:Check:vManage:Elasticsearch Indices version
#09:Check:vManage:Evaluate incoming DPI data size
#10:Check:vManage:NTP status across network
#11:Check:vManage:Validate Neo4j Store version
#12:Check:vManage:Validate ConfigDB Size is less than 5GB
#13:Check:vManage:Validate UUID from server configs file
#14:Check:vManage:Validate server configs file on vManage
#15:Check:vManage:Validate UUID at /etc/viptela/uuid
#16:Check:Controllers:Validate vSmart/vBond CPU count for scale
#17:Check:Controllers:Verify if stale entry of vManage+vSmart UUID present on any one cEdge

Checks with severity level: WARNING
#1:Check:vManage:Network Card type
#2:Check:vManage:Backup status
#3:Check:vManage:Evaluate Neo4j performance
#4:Check:vManage:Confirm there are no pending tasks
#5:Check:vManage:Validate there are no empty password users
#6:Check:Controllers:Controller versions
#7:Check:Controllers:Confirm Certificate Expiration Dates
#8:Check:Controllers:vEdge list sync
#9:Check:Controllers: Confirm control connections

Checks with severity level: INFORMATIONAL
#1:Check:vManage:Disk controller type
#2:Check:Controllers:Validate there is at minimum vBond, vSmart present
#3:Check:Controllers:Validate all controllers are reachable

Cluster Checks with severity level: CRITICAL
#1:Check:Cluster:Version consistency
#2:Check:Cluster:Cluster health
#3:Check:Cluster:Cluster ConfigDB topology
#4:Check:Cluster:Messaging server
#5:Check:Cluster:DR replication status
#6:Check:Cluster:Intercluster communication

sure's Issues

Script fails ungracefully if configuration-db, statistics-db, or application-server are down

Another issue I noticed is that if configuration-db, statistics-db, or application-server are down, the script fails ungracefully and does not even reach this section of the AURA script. This calls into question whether this check is really valuable and/or whether it should be performed first. Just wanted to share my findings here.

Example:

INFO:Executing the script in Normal execution mode
INFO:Generating a JSessionID
INFO:Generating CSRF Token
INFO:****Collecting Preliminary Data

ERROR:the JSON object must be str, not 'NoneType'
Traceback (most recent call last):
  File "py3_sure.py", line 1922, in <module>
    controllers = json.loads(getRequestpy3(version_tuple, vmanage_lo_ip, jsessionid , 'system/device/controllers', args.vmanage_port, tokenid))
  File "/usr/lib/python3.5/json/__init__.py", line 312, in loads
    s.__class__.__name__))
TypeError: the JSON object must be str, not 'NoneType'
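
The traceback shows json.loads receiving None when the request returns nothing. A minimal sketch of a guard follows; load_json_or_none is a hypothetical helper, not a function from the script, and the message text is illustrative.

```python
import json


def load_json_or_none(raw):
    """Parse a response body; return None when it is missing or not valid JSON."""
    if raw is None:
        return None
    try:
        return json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


controllers = load_json_or_none(None)
if controllers is None:
    print("Could not collect controller data; check that the vManage services are up.")
```

With a guard like this, the caller can report a readable error instead of stopping on a TypeError.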

Tabulate report file instead of sequential data

JIRA ticket: https://jira-eng-sjc4.cisco.com/jira/browse/SDWANAURA-54

Change the definition names from check numbers to check specific.

Error in Check 26, 'system-ip'

INFO:#26:Check:Cluster:Cluster health
ERROR:'system-ip'
Traceback (most recent call last):
  File "sure.py", line 4329, in <module>
    services_down, check_result, check_analysis, check_action = criticalCheckthirteen(cluster_health_data)
  File "sure.py", line 932, in criticalCheckthirteen
    (device['configJson'].pop('system-ip'))

KeyError: 'version' when collecting ControllersInfo - Preliminary Data

Hi,
I have an important customer who is in the preparation phase for software version upgrades and router OS conversions from Viptela OS to IOS-XE. He has an on-prem environment, and I recommended that he run the SURE tool so that we can better diagnose the network, but unfortunately it produced an error.

Here is the script output:

ITFEBSBSC1VM01:~$ python sure.py -u admin
vManage Password:
#########################################################

AURA SDWAN (SURE) - Version 2.0.0

#########################################################

Performing SD-WAN Audit & Upgrade Readiness

#########################################################

ERROR: Error Collecting Preliminary Data.
Please check error details in log file: sdwan_sure/sure_logs_09_02_2023_12_44_00.log.
If needed, please reach out to tool support at: [email protected], with your report and log file.

And here is the log:

INFO:Executing the script in Normal execution mode
INFO:Generating a JSessionID
INFO:Generating CSRF Token
INFO:****Collecting Preliminary Data

ERROR:'version'
Traceback (most recent call last):
  File "sure.py", line 2990, in <module>
    controllers_info = controllersInfo(controllers)
  File "sure.py", line 355, in controllersInfo
    controllers_info[count] = [(device['deviceType']),(device['deviceIP']),(device['version']) ,(device['reachability']),(device['globalState']),(device['timeRemainingForExpiration']), (device['state_vedgeList'])]
KeyError: 'version'

Validate current version - Misleading upgrade path info

The tool reports a direct upgrade from 19.2.4 to 20.6.x as possible. As per the link below, it should be a step upgrade to 20.6.x.

cid:2576524502*[email protected]

Controller unreachable results in error for NTP status across network.

CHECK10_ERROR.txt

Error in check 11 IndexError: list index out of range

INFO:#11:Check:Controllers:Validate vSmart/vBond CPU count for scale
ERROR:list index out of range
Traceback (most recent call last):
  File "sure.py", line 3880, in <module>
    total_cpu_count = int(output['data'][0]['total_cpu_count'])
IndexError: list index out of range
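
The failing line indexes output['data'][0] without checking that the list has any entries. A hedged sketch of a defensive read is shown below; total_cpu_count as a function name is hypothetical, and the response shape is assumed from the traceback only.

```python
def total_cpu_count(output):
    """Return the CPU count from the first data row, or None when it is absent."""
    rows = output.get("data") or []
    if not rows:
        return None  # empty or missing 'data', e.g. the device did not answer
    value = rows[0].get("total_cpu_count")
    return int(value) if value is not None else None


print(total_cpu_count({"data": []}))
print(total_cpu_count({"data": [{"total_cpu_count": "8"}]}))
```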

Keyerror exception on 'timeRemainingForExpiration'

INFO:****Collecting Preliminary Data

ERROR:'timeRemainingForExpiration'
Traceback (most recent call last):
  File "sure.py", line 2499, in <module>
    controllers_info = controllersInfo(controllers)
  File "sure.py", line 327, in controllersInfo
    controllers_info[(device['host-name'])] = [(device['deviceType']),(device['deviceIP']),(device['version']) ,(device['reachability']),(device['globalState']),(device['timeRemainingForExpiration']), (device['state_vedgeList']) ]
KeyError: 'timeRemainingForExpiration'

Tool fails with ERROR:'version'

During initial data capture, the tool fails with:

ERROR:'version'
Traceback (most recent call last):
  File "sure.py", line 9362, in <module>
    controllers_info = controllersInfo(controllers)
  File "sure.py", line 327, in controllersInfo
    controllers_info[(device['host-name'])] = [(device['deviceType']),(device['deviceIP']),(device['version']) ,(device['reachability']),(device['globalState']),(device['timeRemainingForExpiration']), (device['state_vedgeList']) ]
KeyError: 'version'

The trigger is a controller in a transition state without full information, possibly in a partial install state.
The fix will have two parts:

Properly handle this condition and gracefully skip over those entries.
Add a check reporting how many such entries were found, with their IP addresses, so the user can check on them.
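
The two-part fix described above could look roughly like the sketch below. It is not the script's implementation: controllers_info_safe is a hypothetical name, and the {"data": [...]} response shape is an assumption. The key names are the ones visible in the traceback.

```python
REQUIRED_KEYS = ("deviceType", "deviceIP", "version", "reachability",
                 "globalState", "timeRemainingForExpiration", "state_vedgeList")


def controllers_info_safe(controllers):
    """Collect controller details, skipping entries with missing fields."""
    info, skipped = {}, []
    for device in controllers.get("data", []):
        missing = [key for key in ("host-name",) + REQUIRED_KEYS if key not in device]
        if missing:
            # Keep the IP (when present) so the user can inspect the device.
            skipped.append((device.get("deviceIP", "unknown"), missing))
            continue
        info[device["host-name"]] = [device[key] for key in REQUIRED_KEYS]
    return info, skipped


sample = {"data": [{"host-name": "vsmart1", "deviceType": "vsmart", "deviceIP": "10.0.0.1",
                    "version": "20.3.4", "reachability": "reachable", "globalState": "normal",
                    "timeRemainingForExpiration": 1000, "state_vedgeList": "Sync"},
                   {"host-name": "vbond1", "deviceType": "vbond", "deviceIP": "10.0.0.2"}]}
info, skipped = controllers_info_safe(sample)
print(len(info), "usable entries;", len(skipped), "skipped:", skipped)
```

The skipped list carries both the count and the IP addresses, which is what the second part of the fix asks to report.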

py3 Library Errors on 20.5

The ‘json’ library is not available in python2 on 20.5.x vManages and the tool recommends to try python3 instead. If you do this, the ‘Queue’ import fails. This can be fixed by replacing ‘import Queue’ with ‘import queue as Queue’.
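
A common way to make that import work on both interpreters is a try/except fallback, sketched here. The queue contents are illustrative only.

```python
try:
    import queue as Queue  # Python 3 module name, aliased to the Python 2 name
except ImportError:
    import Queue  # Python 2 module name

work = Queue.Queue()
work.put("check-01")
print(work.get())
```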

Change SUCCESS to SUCCESSFUL throughout the script, for uniformity with NMS team

Tabulate the preliminary data

It would be good to put the results in a table format, like cpu, memory, interface , control etc.., against recommended and measured value.

Add support for DB size check

Implement check based on output from command in CSCvx69668
Report critical error if DB is >=5GB

Password with + - characters leads to error generating Jsession ID

Text Discrepancy in Critical Check Nineteen

Exception on check4, check_result reference

INFO:#4:Check:vManage:CPU Count
ERROR:local variable 'check_result' referenced before assignment
Traceback (most recent call last):
  File "./sure.py", line 8528, in <module>
    check_result, check_analysis, check_action = criticalCheckfour(cpu_count, vedge_count, dpi_status, server_type)
  File "./sure.py", line 568, in criticalCheckfour
    return check_result, check_analysis, check_action
UnboundLocalError: local variable 'check_result' referenced before assignment
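
An UnboundLocalError like this usually means some branch never assigns the result variables. One general remedy is to assign defaults before any branching, as in the sketch below. evaluate_cpu and its required parameter are hypothetical stand-ins; the real sizing logic of criticalCheckfour is not shown on this page and is not reproduced here.

```python
def evaluate_cpu(cpu_count, required):
    """Return (check_result, check_analysis, check_action) on every code path."""
    # Defaults are bound up front, so no branch can leave them undefined.
    check_result = 'Failed'
    check_analysis = 'Unable to evaluate the CPU count'
    check_action = 'Review the data collected for this check'
    if cpu_count is not None and required is not None:
        if cpu_count >= required:
            check_result = 'SUCCESSFUL'
            check_analysis = 'CPU count is sufficient'
            check_action = None
        else:
            check_analysis = 'CPU count {} is below the required {}'.format(cpu_count, required)
            check_action = 'Increase the CPU count'
    return check_result, check_analysis, check_action


print(evaluate_cpu(None, 16))
print(evaluate_cpu(32, 16))
```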

Incorrect handling of vSmart count during check 21

Report if Neo4j store has not been upgraded

For 20.3, we should see v0.A.9. If the returned store version is older:

2021-08-21 18:22:21.909+0000 INFO [o.n.i.d.DiagnosticsManager] NeoStore v0.A.8

It may point to a previous upgrade failure.

This scenario will cause 20.6 upgrade failures:
org.neo4j.kernel.impl.storemigration.StoreUpgrader$UnexpectedUpgradingStoreVersionException: Not possible to upgrade a store with version 'v0.A.8' to current store version SF4.0.0 (Neo4j 4.1.7).

This needs manual recovery
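
A sketch of how the store version could be read from a log line like the one above. The regular expression is based only on that single sample line, the helper names are hypothetical, and the comparison is a simple equality test against the expected value rather than a true "older than" ordering.

```python
import re

# Based on the sample line: "... INFO [o.n.i.d.DiagnosticsManager] NeoStore v0.A.8"
STORE_LINE = re.compile(r"NeoStore (v[0-9A-Za-z.]+)")


def store_version(log_text):
    """Return the last NeoStore version reported in the log text, or None."""
    found = STORE_LINE.findall(log_text)
    return found[-1] if found else None


def store_is_current(log_text, expected="v0.A.9"):
    """True when the reported store version equals the expected one."""
    return store_version(log_text) == expected


line = "2021-08-21 18:22:21.909+0000 INFO [o.n.i.d.DiagnosticsManager] NeoStore v0.A.8"
print(store_version(line), store_is_current(line))
```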

Total Check count, and completed check counts do not match

In a tool run, if a check is bypassed or not performed (for example, because it is not applicable), the total count does not reflect it. The tool output shows "32 checks", but the run results show 30 passed, 0 failed.

The counts should match, and we should add a "bypassed" or "not applicable" check count.
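
One way to keep the totals consistent is to record a status for every check, including skipped ones, and derive all counts from that single list. The sketch below is illustrative; summarize and the 'Not applicable' label are assumptions, while 'SUCCESSFUL' and 'Failed' are the result strings used elsewhere in the script.

```python
from collections import Counter


def summarize(statuses):
    """Derive every count from one list so the total always adds up."""
    counts = Counter(statuses)
    return {
        "total": len(statuses),
        "passed": counts["SUCCESSFUL"],
        "failed": counts["Failed"],
        "not_applicable": counts["Not applicable"],
    }


summary = summarize(["SUCCESSFUL"] * 30 + ["Not applicable"] * 2)
print(summary)
```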

Preliminary Data: elasticSearch_data IndexError: list index out of range

CJF-AURA-SDWAN-error.txt

CSRF generation failure

The script authenticates successfully but fails token generation with a JSON error.
When tested manually, it works.

INFO:Generating a JSessionID
INFO:Generating CSRF Token
ERROR:No JSON object could be decoded
Traceback (most recent call last):
  File "./sure.py", line 2471, in <module>
    tokenid = CSRFToken(vmanage_lo_ip,jsessionid,args.vmanage_port)
  File "./sure.py", line 224, in CSRFToken
    tokenid = json.loads(tokenid)
  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

py2 script error for 20.5.x versions

The documentation in the GitHub README says to use py2_sure.py for vManages running version 20.5.x, but line 3316 prevents all checks on the 20.5.x versions.

New check: Verify stats-db indices version is 7.x

Enhance the script for version above 20.6

Error in check 5 TypeError: the JSON object must be str, not 'NoneType'

INFO:#5:Check:vManage:ElasticSearch Indices status
ERROR:the JSON object must be str, not 'NoneType'
Traceback (most recent call last):
  File "sure.py", line 3696, in <module>
    es_indexes_one = json.loads(getRequestpy3(version_tuple,vmanage_lo_ip, jsessionid, 'management/elasticsearch/index/info', args.vmanage_port, tokenid))
  File "/usr/lib/python3.5/json/__init__.py", line 312, in loads
    s.__class__.__name__))
TypeError: the JSON object must be str, not 'NoneType'

Add support for PDF export when executed in container

Enhancement request: when the tool is executed in container, for example in vDoctor, add support for dynamic importing of PDF writer library, and generate report in PDF instead of text

Validate NMS all services are up: Reporting Successful even though it fails

I am documenting on TechZone how to handle some of the failures we notice from the AURA script. When trying to recreate some of the failed states I noticed an issue with the services check, criticalCheckseven.

I killed the service "data collection agent" in my lab but kept it enabled to trigger the failed state.

vmanage_20_6_4# request nms all status
NMS service proxy
Enabled: true
Status: running PID:8149 for 944358s
NMS service proxy rate limit
Enabled: true
Status: running PID:9810 for 944358s
NMS application server
Enabled: true
Status: running PID:1530 for 64879s
NMS configuration database
Enabled: true
Status: running PID:26175 for 65364s
NMS coordination server
Enabled: true
Status: running PID:3585 for 65058s
NMS messaging server
Enabled: true
Status: running PID:10765 for 944395s
NMS statistics database
Enabled: true
Status: running PID:24440 for 65440s
NMS data collection agent
Enabled: true
Status: not running <<<<<<<<Induced failure state here.
NMS CloudAgent v2
Enabled: true
Status: running PID:10415 for 944396s
NMS cloud agent
Enabled: true
Status: running PID:7627 for 944354s
NMS SDAVC server
Enabled: false
Status: not running
NMS SDAVC proxy
Enabled: true
Status: running PID:7630 for 944354s

But the AURA report shows that this check is SUCCESSFUL.

            INFO:#07:Check:vManage:Validate all services are up
            INFO:#07: Check result:   SUCCESSFUL                                                                                                    <<<<<<False SUCCESS
            INFO:#07: Check Analysis: All enabled services are running
            INFO:#07: Status of all the services:
            NMS service proxy
                                            Enabled: true
                                            Status: running PID:8149 for 879846s
            NMS service proxy rate limit
                                            Enabled: true
                                            Status: running PID:9810 for 879846s
            NMS application server
                                            Enabled: true
                                            Status:  running PID:1530 for 368s
            NMS configuration database
                                            Enabled: true
                                            Status: running PID:26175 for 852s
            NMS coordination server
                                            Enabled: true
                                            Status: running PID:3585 for 546s
            NMS messaging server
                                            Enabled: true
                                            Status: running PID:10765 for 879883s
            NMS statistics database
                                            Enabled: true
                                            Status: running PID:24440 for 928s
            NMS data collection agent
                                            Enabled: true
                                            Status:  not running                                                                                         <<<<<<<<<<<<<<<<<<Should be considered failed for this check
            NMS CloudAgent v2
                                            Enabled: true
                                            Status: running PID:10415 for 879884s
            NMS cloud agent
                                            Enabled: true
                                            Status:  running PID:7627 for 879842s
            NMS SDAVC server
                                            Enabled: false
                                            Status: not running
            NMS SDAVC proxy
                                            Enabled: true
                                            Status:  running PID:7630 for 879842s

I reproduced this in my lab and identified the issue: in the function criticalCheckseven, check_result = 'Failed' is overwritten by the next successful service check. Although we build a list of failed services, we don't use it.

def criticalCheckseven(nms_status1):
    nms_status = nms_status1.split('NMS')
    nms_failed = []
    for nms in nms_status:
        print(nms)
        if 'true' in nms and 'not running' in nms:
            nms_failed.append(nms.split('\t')[0].strip())
            check_result = 'Failed'
            check_analysis = 'Enabled service/s not running'
            check_action = 'It is advisable to investigate why a service is being reported as failed. Please  restart the process or contact TAC for further help'
        else:
            check_result = 'SUCCESSFUL'
            check_analysis = 'All enabled services are running'
            check_action = None
    return nms_status1, nms_failed, check_result, check_analysis, check_action

When I pulled this function to test in my lab, I could see the following results. We have the correct service failed in the list of nms_failed but since we overwrite the check_result with SUCCESSFUL on the next service check, we report SUCCESSFUL on the script.

            nms_failed - ['data collection agent']
            check_result - SUCCESSFUL
            check_analysis - All enabled services are running
            check_action - None

In the below function where we call criticalCheckseven, we only use check_result to determine success or failure. It may be better to check the length of our nms_failed list.

#Check:vManage:Validate all services are up
check_count += 1
check_count_zfill = zfill_converter(check_count)
if args.quiet == False and args.debug == False and args.verbose == False:
    print(' Critical Check:#{}'.format(check_count_zfill))
if args.debug == True or args.verbose == True:
    print(' #{}:Checking:vManage:Validate all services are up'.format(check_count_zfill))
check_name = '#{}:Check:vManage:Validate all services are up'.format(check_count_zfill)
pre_check(log_file_logger, check_name)
try:
    nms_data, nms_failed, check_result, check_analysis, check_action = criticalCheckseven()
    if check_result == 'Failed':
        critical_checks[check_name] = [check_analysis, check_action]
        check_error_logger(log_file_logger, check_result, check_analysis, check_count_zfill)
        log_file_logger.error('#{}: List of services that are enabled but not running:\n{}'.format(check_count_zfill, nms_failed))
        log_file_logger.error('#{}: Status of all services  :\n{}\n'.format(check_count_zfill, nms_data))
        report_data.append([str(check_count),check_name.split(':')[-1],check_result,check_analysis,str(check_action)])
        if args.debug == True:
            print('\033[1;31m ERROR: {} \033[0;0m \n\n'.format(check_analysis))

    else:
        check_info_logger(log_file_logger, check_result, check_analysis, check_count_zfill)
        log_file_logger.info('#{}: Status of all the services:\n{}\n'.format(check_count_zfill, nms_data))
        report_data.append([str(check_count),check_name.split(':')[-1],check_result,check_analysis,str(check_action)])
        if args.debug == True:
            print(' INFO:{}\n\n'.format(check_analysis))
    json_final_result['json_data_pdf']['description']['vManage'].append({
        'analysis type': '{}'.format(check_name.split(':')[-1]),
        'log type': '{}'.format(result_log['Critical'][check_result]),
        'result': '{}'.format(check_analysis),
        'action': '{}'.format(check_action),
        'status': '{}'.format(check_result),
        'document': ''})
except Exception as e:
    print('\033[1;31m ERROR: Error performing {}. \n Please check error details in log file: {}.\n If needed, please reach out to tool support at: [email protected], with your report and log file. \033[0;0m  \n\n'.format(check_name, log_file_path))
    log_file_logger.exception('{}\n'.format(e))
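
One possible way to address the overwrite described in this issue is to collect the failed services first and decide the result once, after the loop. The sketch below is a suggestion, not the maintainers' fix: critical_check_seven is a hypothetical name, and taking the first line of each block as the service name assumes the "request nms all status" layout shown above.

```python
def critical_check_seven(nms_status_text):
    """Report Failed if any enabled NMS service is not running."""
    nms_failed = []
    for block in nms_status_text.split('NMS'):
        if 'true' in block and 'not running' in block:
            # The first line of each block is assumed to be the service name.
            nms_failed.append(block.strip().splitlines()[0].strip())
    # Decide once, after every service has been inspected.
    if nms_failed:
        check_result = 'Failed'
        check_analysis = 'Enabled service/s not running'
        check_action = 'Investigate why the service is not running, restart the process, or contact TAC'
    else:
        check_result = 'SUCCESSFUL'
        check_analysis = 'All enabled services are running'
        check_action = None
    return nms_status_text, nms_failed, check_result, check_analysis, check_action


sample = ("NMS application server\n\tEnabled: true\n\tStatus: running PID:1530 for 64879s\n"
          "NMS data collection agent\n\tEnabled: true\n\tStatus: not running\n"
          "NMS SDAVC server\n\tEnabled: false\n\tStatus: not running\n"
          "NMS cloud agent\n\tEnabled: true\n\tStatus: running PID:7627 for 944354s\n")
_, failed, result, analysis, action = critical_check_seven(sample)
print(failed, result)
```

Because the result is derived from the nms_failed list, a later healthy service can no longer reset the outcome, and a disabled service that is not running is still ignored.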

Add wget way to download the script

Could we also add another way to copy the file?

For example:

wget https://raw.githubusercontent.com/CiscoDevNet/sure/main/python3/py3_sure.py

wget https://raw.githubusercontent.com/CiscoDevNet/sure/main/python2/py2_sure.py

checkUtilization failing when wildfly, neo4j and elasticache are not in the top 5 processes.

Preliminary data not getting printed in the report file

Add warning in case DB slicing is required.

In neo4j we need to check the configdb size and raise a warning in case slicing of the DB is required.

Moving the checks critical and failing the testcase

Following up with Manikandan to get more clarity on this issue.

Error in check 30, isAlive missing

INFO:#30:Check:Cluster:Intercluster communication
ERROR:'function' object has no attribute 'isAlive'
Traceback (most recent call last):
  File "sure.py", line 4436, in <module>
    if criticalCheckseventeenpy3.isAlive():
AttributeError: 'function' object has no attribute 'isAlive'
INFO:Logging out of the Session
INFO:Successfully closed the connection
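
The error indicates that the liveness call was made on the function itself rather than on a thread object. A minimal sketch of the usual pattern follows: keep a reference to the threading.Thread and call is_alive() on it (the isAlive spelling was removed from the standard library in Python 3.9). intercluster_check is a hypothetical stand-in for the real check.

```python
import threading
import time


def intercluster_check():
    time.sleep(0.1)  # stand-in for the real work


worker = threading.Thread(target=intercluster_check)
worker.start()
# Query the Thread object, not the target function.
print(worker.is_alive())
worker.join(timeout=5)
print(worker.is_alive())
```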

#01:CPU Speed - Permit lower speed for Azure Infrastructure

Problem Description:
• The customer is planning to upgrade the infrastructure to version 20.9.x.
• As recommended by the BCS team (Cisco Business Critical Services team), the customer requested that the AURA script be executed.
• The AURA script reported "CPU Clock configured is 2.6 instead of 2.8".
• CloudOps upgraded the customer's infrastructure from 16 CPUs to 32 CPUs, but the clock speed remained the same.
• The customer wants to confirm whether they can move forward with a CPU clock of 2.6, or whether CloudOps must move the infrastructure to a VM type with a clock of 2.8.

DSV hosted on Azure.
That makes sense. Usually we can find out the type of hypervisor it is hosted on from the kernel logs.

In kern.log, the boot-up messages will show Microsoft.
Alternatively, get a Linux command to identify Azure infrastructure. (Get source command)

error in check 9 :the JSON object must be str, not 'NoneType'

ERROR:the JSON object must be str, not 'NoneType'
Traceback (most recent call last):
  File "sure.py", line 3821, in <module>
    es_indices_est = json.loads(getRequestpy3(version_tuple,vmanage_lo_ip, jsessionid, 'management/elasticsearch/index/size/estimate', args.vmanage_port, tokenid))
  File "/usr/lib/python3.5/json/__init__.py", line 312, in loads
    s.__class__.__name__))

validateServerConfigsFile Failing

might have to update the script because it is not using the data sent by

def criticalChecktwentyone(version):
    success, analysis, action = validateServerConfigsFile()
    if not success:
        check_result = 'Failed'
        check_analysis = 'Failed to validate the server_configs.json.'
        check_action = '{}'.format(analysis)
    else:
        check_result = 'SUCCESSFUL'
        check_analysis = 'Validated the server_configs.json.'
        check_action = None
        log_file_logger.info('Validated the server_configs.json.')

    return check_result, check_analysis, check_action

Exception if state_vedgeList is empty for MT with no edge devices

mt-vmanage1:~/sdwan_sure$ cat sure_report_22_01_2022_08_15_31.txt
Cisco SDWAN AURA v1.0.8 Report

Cisco SDWAN AURA command line tool performed a total of 32 checks at different levels of the SDWAN overlay.

Reach out to [email protected] if you have any questions or feedback

Summary of the Results:

mt-vmanage1:~/sdwan_sure$ cat sure_logs_22_01_2022_08_15_31.log
INFO:Executing the script in Normal execution mode
INFO:Generating a JSessionID
INFO:Generating CSRF Token
INFO:****Collecting Preliminary Data

ERROR:'state_vedgeList'
Traceback (most recent call last):
File "./sure.py", line 2499, in <module>
controllers_info = controllersInfo(controllers)
File "./sure.py", line 327, in controllersInfo
controllers_info[(device['host-name'])] = [(device['deviceType']),(device['deviceIP']),(device['version']) ,(device['reachability']),(device['globalState']),(device['timeRemainingForExpiration']), (device['state_vedgeList']) ]
KeyError: 'state_vedgeList'
mt-vmanage1:~/sdwan_sure$
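The `KeyError` comes from indexing `device['state_vedgeList']` on a device record that lacks that key. A minimal sketch of a tolerant version, assuming a missing field can be stored as `None` and handled by later checks (the function and constant names here are illustrative):

```python
# Field names copied from the traceback above.
CONTROLLER_FIELDS = (
    'deviceType', 'deviceIP', 'version', 'reachability',
    'globalState', 'timeRemainingForExpiration', 'state_vedgeList',
)


def build_controllers_info(devices):
    """Map host-name -> list of field values, using None for absent keys."""
    info = {}
    for device in devices:
        # dict.get returns None instead of raising KeyError.
        info[device['host-name']] = [device.get(name) for name in CONTROLLER_FIELDS]
    return info


if __name__ == '__main__':
    sample = [{'host-name': 'mt-vmanage1', 'deviceType': 'vmanage'}]
    print(build_controllers_info(sample))
```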

Add json result support

To allow GUI integration with vDoctor, add support for a JSON export format.
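A minimal sketch of what such an export could look like, assuming each check already yields a result, analysis, and action as in the existing TXT report. The key names (`tool_version`, `checks`, and so on) are hypothetical and would need to match whatever vDoctor expects:

```python
import json


def report_to_json(tool_version, checks):
    """Serialize check results to a JSON string.

    `checks` is a list of dicts such as
    {'id': 22, 'name': 'Disk controller type', 'result': 'SUCCESSFUL',
     'analysis': '...', 'action': None}.
    """
    payload = {'tool_version': tool_version, 'checks': checks}
    return json.dumps(payload, indent=2, sort_keys=True)


if __name__ == '__main__':
    print(report_to_json('1.0.8', [
        {'id': 22, 'name': 'Disk controller type', 'result': 'SUCCESSFUL',
         'analysis': 'n/a', 'action': None},
    ]))
```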

False positive on NEO4j output file

The tool generates an alert if WARN or ERROR messages are detected. This produces unnecessary alerts; WARN-level messages should be ignored.
Also, the check does not validate how old a message is. It should ignore anything older than 14 days.
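The two rules above (skip WARN, skip anything older than 14 days) could be sketched as a filter over log lines. This assumes each line starts with a `YYYY-MM-DD HH:MM:SS` timestamp followed by the level, which should be verified against the actual Neo4j output file; `recent_errors` is a hypothetical helper:

```python
from datetime import datetime, timedelta


def recent_errors(lines, now, max_age_days=14):
    """Keep only ERROR lines whose timestamp is within max_age_days of `now`."""
    cutoff = now - timedelta(days=max_age_days)
    kept = []
    for line in lines:
        parts = line.split(None, 3)  # date, time, level, message
        if len(parts) < 3 or parts[2] != 'ERROR':
            continue  # WARN, INFO, and unparseable lines are ignored
        try:
            # Drop fractional seconds and timezone: keep HH:MM:SS only.
            stamp = datetime.strptime(parts[0] + ' ' + parts[1][:8],
                                      '%Y-%m-%d %H:%M:%S')
        except ValueError:
            continue
        if stamp >= cutoff:
            kept.append(line)
    return kept


if __name__ == '__main__':
    sample = ['2022-01-20 08:15:31.123+0000 ERROR example message']
    print(recent_errors(sample, datetime(2022, 1, 22, 8, 0, 0)))
```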

Split the script into Python2 and Python3 versions

To shorten the script, it needs to be split into two versions: Python2 and Python3.

Exception error if script is executed in multi-tenant server

The vedge count is never initialized because JSON parsing fails on the null returned by the API call.

Need to sanitize user input & perform error handling

Hi all,

I was struggling a little to get this script to work. I consistently had the same error in the logs:

INFO:Executing the script in Verbose execution mode
INFO:Generating a JSessionID
ERROR:list index out of range
Traceback (most recent call last):
  File "sure.py", line 8354, in <module>
    jsessionid = generateSessionID(vmanage_lo_ip, args.username, password, args.vmanage_port)
  File "sure.py", line 211, in generateSessionID
    jsessionid = (login[3].split('=')[1][0:-1])
IndexError: list index out of range

The problem turned out to be the password, because the user input was not sanitized. I had several special characters in the password, including a quote character.
Proper error handling would have dealt with this issue in a better way. More importantly, sanitizing user input would prevent the script from breaking in the first place.
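Both suggestions can be sketched briefly. This assumes the script passes the password through a shell command and then picks the session ID out of the split login output, as the `login[3].split('=')` line suggests; `quote_for_shell` and `extract_jsessionid` are hypothetical helpers, not functions from sure.py. On Python 2, `pipes.quote` plays the role of `shlex.quote`.

```python
import shlex


def quote_for_shell(value):
    """Quote a value so the shell passes it through unchanged."""
    return shlex.quote(value)


def extract_jsessionid(login_fields):
    """Find the JSESSIONID token by name instead of by fixed list position."""
    for field in login_fields:
        if field.startswith('JSESSIONID='):
            return field.split('=', 1)[1].rstrip(';')
    # A clear message instead of "IndexError: list index out of range".
    raise ValueError('Login did not return a JSESSIONID; '
                     'check the username and password.')


if __name__ == '__main__':
    print(quote_for_shell("pa'ss$$word"))
    print(extract_jsessionid(['Set-Cookie:', 'JSESSIONID=abc123;']))
```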

Split the results into the total number of checks with WARNING, with ERROR, with INFO, etc.

This gives a holistic view of whether the script found any major errors.

Check Results:
Total Checks Passed: 26
Total Checks Failed: 4
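A possible shape for the requested breakdown, as a sketch only. It assumes every check can be tagged with a level; the level names `Critical`, `Warning`, and `Info` are illustrative, while the outcome strings `SUCCESSFUL` and `Failed` follow those used by the script's checks:

```python
from collections import Counter


def summarize_results(results):
    """Count outcomes overall and failed checks per level.

    `results` is a list of (level, outcome) pairs,
    e.g. ('Critical', 'Failed').
    """
    outcome_totals = Counter(outcome for _, outcome in results)
    failed_by_level = Counter(
        level for level, outcome in results if outcome == 'Failed')
    return outcome_totals, failed_by_level


if __name__ == '__main__':
    totals, failed = summarize_results([
        ('Critical', 'SUCCESSFUL'), ('Critical', 'Failed'),
        ('Warning', 'Failed'), ('Info', 'SUCCESSFUL'),
    ])
    print('Total Checks Passed:', totals['SUCCESSFUL'])
    print('Total Checks Failed:', totals['Failed'])
    for level in ('Critical', 'Warning', 'Info'):
        print('  {} failed: {}'.format(level, failed[level]))
```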

Installation instructions need updating

The installation instructions mention installation via Git, but Git is not available via vShell. The installation instructions should also state which directory to install to.

Monitor CPU for wildfly and statsdb

We should also monitor memory and CPU per NMS service, as we already do for neo4j; this includes wildfly and statsdb.

Error in check 22: UnboundLocalError: local variable 'check_result' referenced before assignment

INFO:#22:Check:vManage:Disk controller type
ERROR:local variable 'check_result' referenced before assignment
Traceback (most recent call last):
File "sure.py", line 4208, in <module>
check_result, check_analysis, check_action = infoCheckone(server_type, disk_controller)
File "sure.py", line 1387, in infoCheckone
return check_result, check_analysis, check_action
UnboundLocalError: local variable 'check_result' referenced before assignment
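This error typically means that none of the `if`/`elif` branches in the check matched, so `check_result` was never assigned before the `return`. A minimal sketch of the usual fix, which is to set defaults before branching. The function name and the branch conditions below are placeholders for illustration, not the actual rules in `infoCheckone`:

```python
def disk_controller_check(server_type, disk_controller):
    """Return (result, analysis, action) with safe defaults for unknown input."""
    # Defaults guarantee the names exist even when no branch below matches.
    check_result = 'Failed'
    check_analysis = 'Unable to determine the disk controller type.'
    check_action = None

    # Placeholder rule, for illustration only.
    if server_type == 'on-prem' and 'ide' in disk_controller.lower():
        check_analysis = 'Disk controller type is IDE.'
        check_action = 'Review the disk controller type.'
    elif server_type == 'on-prem':
        check_result = 'SUCCESSFUL'
        check_analysis = 'Disk controller type is not IDE.'

    return check_result, check_analysis, check_action


if __name__ == '__main__':
    print(disk_controller_check('unknown-platform', ''))
```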

Move Execution mode conditions within the check to shorten the script

Reduces the lines of code from 15k to 6k.

Authentication error if password contains special characters

The tool may report a JSESSION error when the password contains the sequence $$ or a + character.
Logs show:

Traceback (most recent call last):
File "sure.py", line 2460, in <module>
jsessionid = generateSessionID(vmanage_lo_ip, args.username, password, args.vmanage_port)
File "sure.py", line 210, in generateSessionID
jsessionid = (login[3].split('=')[1][0:-1])
IndexError: list index out of range
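Both characters have a plausible explanation: inside a double-quoted shell string `$$` expands to the shell's process ID, and in a form-encoded request body a literal `+` is decoded as a space. A sketch of building the login body with percent-encoding so the password survives intact. The field names `j_username` and `j_password` are an assumption about the login form and should be checked against the script:

```python
from urllib.parse import urlencode


def build_login_body(username, password):
    """Form-encode credentials so characters like + and $ are sent literally."""
    # A list of pairs keeps the field order stable on older Python 3 versions.
    return urlencode([('j_username', username), ('j_password', password)])


if __name__ == '__main__':
    print(build_login_body('admin', 'a+b$$'))
```

Sending this body through an HTTP library instead of a shell command would also sidestep the `$$` expansion.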

Return the roundtrip delay for intercluster comm.

For intercluster communication, we may need to report the round-trip delay.
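One unprivileged way to approximate the delay is to time a TCP connection to a port the peer node already listens on. This is a sketch under that assumption: `tcp_connect_rtt_ms` is a hypothetical helper, the port to probe is left to the caller, and a TCP handshake time is only an approximation of the network round trip.

```python
import socket
import time


def tcp_connect_rtt_ms(host, port, timeout_s=2.0):
    """Return the time in milliseconds taken to open a TCP connection."""
    start = time.monotonic()
    sock = socket.create_connection((host, port), timeout=timeout_s)
    elapsed = time.monotonic() - start
    sock.close()
    return elapsed * 1000.0


if __name__ == '__main__':
    # Demonstrate against a throwaway local listener.
    server = socket.socket()
    server.bind(('127.0.0.1', 0))
    server.listen(1)
    print(tcp_connect_rtt_ms('127.0.0.1', server.getsockname()[1]))
    server.close()
```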