helenguohx / logbert Goto Github PK

log anomaly detection via BERT

License: MIT License

Python 88.33% Shell 0.55% Jupyter Notebook 11.12%

logbert's Introduction

LogBERT: Log Anomaly Detection via BERT

ARXIV

This repository provides the implementation of Logbert for log anomaly detection. The process includes downloading raw data online, parsing logs into structured data, creating log sequences and finally modeling.

Configuration

Ubuntu 20.04
NVIDIA driver 460.73.01
CUDA 11.2
Python 3.8
PyTorch 1.9.0

Installation

This code requires the packages listed in requirements.txt. An virtual environment is recommended to run this code

On macOS and Linux:

python3 -m pip install --user virtualenv
python3 -m venv env
source env/bin/activate
pip install -r ./environment/requirements.txt
deactivate

Reference: https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/

An alternative is to create a conda environment:

    conda create -f ./environment/environment.yml
    conda activate logbert

Reference: https://docs.conda.io/en/latest/miniconda.html

Experiment

Logbert and other baseline models are implemented on HDFS, BGL, and thunderbird datasets

HDFS example

cd HDFS

sh init.sh

# process data
python data_process.py

#run logbert
python logbert.py vocab
python logbert.py train
python logbert.py predict

#run deeplog
python deeplog.py vocab
# set options["vocab_size"] = <vocab output> above
python deeplog.py train
python deeplog.py predict 

#run loganomaly
python loganomaly.py vocab
# set options["vocab_size"] = <vocab output> above
python loganomaly.py train
python loganomaly.py predict

#run baselines

baselines.ipynb

Folders created during execution

~/.dataset //Stores original datasets after downloading
project/output //Stores intermediate files and final results during execution

logbert's People

Contributors

Stargazers

Watchers

Forkers

dolcelatte changnaman paulinko liujie40 zaihanlit luisfredgs forallx94 ajaysi tkim135 andrew2019github trabenx mrleaper jad837 rmadamson jadsalloum chinahappyking jesama leima0324 zcq2333 gswwan123-del banotelli siennaaaa ljiljanadolamic y1cx just-in-chang abhilashuk stjohnb shilpamav harshits158 reshmiphilipver zk1056309462 crashingrocket mahmoodalmansooei benpaobamingliang1 98zy98 sayantan-ch pwnsmith tinghanjxl djeffkanda sushant-97 jplasser tpelc daanh99 jamontol winniexueyao superzerot eunsooko-bot amitabhaghosh username-yao syj716 csingh59 nichong255 charnsingh59 thedotproduct harshcae pragatibs jss-on tangpan360 zky001 hameln-zhao akomarla jackkrasmus-vorrath guodunwang mohitgurnani muralidharb yogesh-7523 shawn-jung mastersatish haksoat hbgongen waghi njuelephant2021 sahebjade masubi tanmayarya29 dmankins pillipop rafsalas19 kahsay wr0om opendpc jiawozhong copser tewe2721 rajibdas-123 githarb toduluz voodoo1231 cuiwenhao123 andrewli315 w19z

logbert's Issues

How to generate vocab file?

HI,

To perform main_run.py, it needs to vocab file and I found vocab.py to generate it.
But vocab.py requests corpus. What corpus should be used for generating vocab file? log files? or normal text files?
Note that I want to use your logBert for my custom log files.

Regards,
ChangNam

HDFS deeplog Train stage is fast but test stage is slow

HDFS Train stage is fast ( about 30 minites ),but test stage is slow ( about 10 hours ) , did anyone meet the same problem ?

Why was the WordVocab generated using only the training set data？

If the words in the test set are not recorded in the VOCAB, then during testing, they will all be unk_index?

您好，您文章里的很多baseline都故意拉低不知道是为何

Seems cannot be applied to detection scenarios where normal logs are mixed with abnormal logs

You're testing for either a pure normal log sequence or a pure abnormal log sequence, so what if the two are mixed?

How can I compute the anomalous score?

I want to deploy LogBERT but there is no related code.

Below is an explanation of it on the paper.

After training, we can deploy LogBERT for anomalous log sequence detection.
The idea of applying LogBERT for log anomaly detection is that since LogBERT
is trained on normal log sequences, it can achieve high prediction accuracy on
predicting the masked log keys if a testing log sequence is normal. Hence, we
can derive the anomalous score of a log sequence based on the prediction results
on the MASK tokens.

Could you guide me on how to compute the anomalous score?

您好,打扰一下

想询问一下您的论文中HDFS数据集得到的最佳结果训练了多少轮。

cannot reproduce results

Hi, I tried to run your latest version code and here is the result I got on HDFS dataset:

best threshold: 0, best threshold ratio: 0.2
TP: 2999, TN: 546622, FP: 6746, FN: 7648
Precision: 30.77%, Recall: 28.17%, F1-measure: 29.41%
elapsed_time: 919.4318611621857

It is far away from the reported score in the paper. Can you provide a version that can at least be close to the reported result?

sh run_logbert_bgl.sh doesn't work

Error:

Save options parameters
Traceback (most recent call last):
File "../main_logbert.py", line 102, in
main()
File "../main_logbert.py", line 89, in main
with open(options["train_vocab"], "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/amanjanagowd/.output/bgl_2k/train'

请教问题

你好，

在对测试数据进行预测的时候，我如何才能直接输出预测的日志？

感谢

Come up with data augmentation

I added the code snippet below for this at split_train_test().
To get more flexibility, I think 2 can change as a variable.

    if data_aug:
        print('Started Data augmentation...')
        window_df_aug = window.generate_sequence(df[["timestamp", "Label", "EventId", "deltaT"]],
                                            window_size=float(window_size) * 60 * 2,
                                            step_size=float(step_size) * 60 * 2
                                            )
        window_df = pd.concat([window_df, window_df_aug]

Test Size not properly set

`python logbert.py train` produces this error

Output:

device cpu
features logkey:True time: False

mask ratio 0.65
arguments Namespace(mode='train')
Save options parameters
Loading vocab output/hdfs/vocab.pkl
vocab Size:  17

Loading Train Dataset
before filtering short session
train size  1918
valid size  213
========================================
100%|██████████| 2131/2131 [00:00<00:00, 564307.21it/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[<ipython-input-4-4393a2a735e9>](https://localhost:8080/#) in <module>
----> 1 LogBert("train")

4 frames
[<ipython-input-2-698dc9901810>](https://localhost:8080/#) in LogBert(cmd_mode)
     87 
     88   if args.mode == 'train':
---> 89       Trainer(options).train()
     90 
     91   elif args.mode == 'predict':

[/content/bert_pytorch/train_log.py](https://localhost:8080/#) in train(self)
     60 
     61         print("\nLoading Train Dataset")
---> 62         logkey_train, logkey_valid, time_train, time_valid = generate_train_valid(self.output_path + "train", window_size=self.window_size,
     63                                      adaptive_window=self.adaptive_window,
     64                                      valid_size=self.valid_ratio,

[/content/bert_pytorch/dataset/sample.py](https://localhost:8080/#) in generate_train_valid(data_path, window_size, adaptive_window, sample_ratio, valid_size, output_path, scale, scale_path, seq_len, min_len)
     91     time_seq_pairs = np.array(time_seq_pairs)
     92 
---> 93     logkey_trainset, logkey_validset, time_trainset, time_validset = train_test_split(logkey_seq_pairs,
     94                                                                                       time_seq_pairs,
     95                                                                                       test_size=test_size,

[/usr/local/lib/python3.8/dist-packages/sklearn/model_selection/_split.py](https://localhost:8080/#) in train_test_split(test_size, train_size, random_state, shuffle, stratify, *arrays)
   2422 
   2423     n_samples = _num_samples(arrays[0])
-> 2424     n_train, n_test = _validate_shuffle_split(
   2425         n_samples, test_size, train_size, default_test_size=0.25
   2426     )

[/usr/local/lib/python3.8/dist-packages/sklearn/model_selection/_split.py](https://localhost:8080/#) in _validate_shuffle_split(n_samples, test_size, train_size, default_test_size)
   2045         and (test_size <= 0 or test_size >= 1)
   2046     ):
-> 2047         raise ValueError(
   2048             "test_size={0} should be either positive and smaller"
   2049             " than the number of samples {1} or a float in the "

ValueError: test_size=213 should be either positive and smaller than the number of samples 0 or a float in the (0, 1) range

and after investigating a bit into the packages, there seems to a problem with assigning the test_size in sample.py in bert_pytorch/dataset/...

test size parameter needs to be given as float value between 0 and 1, but this assigns the no.of samples in test size itself as argument to train_test_split()

part of code that causes this prob (marked with ## symbol nearby)

def generate_train_valid(data_path, window_size=20, adaptive_window=True,
                         sample_ratio=1, valid_size=0.1, output_path=None,
                         scale=None, scale_path=None, seq_len=None, min_len=0):
    with open(data_path, 'r') as f:
        data_iter = f.readlines()

    num_session = int(len(data_iter) * sample_ratio)
    # only even number of samples, or drop_last=True in DataLoader API
    # coz in parallel computing in CUDA, odd number of samples reports issue when merging the result
    # num_session += num_session % 2

    test_size = int(min(num_session, len(data_iter)) * valid_size)
    # only even number of samples
    # test_size += test_size % 2

    print("before filtering short session")
    print("train size ", int(num_session - test_size))
    print("valid size ", int(test_size)) ## notice here it is no.of samples, an int value obviously greater than 1
    print("="*40)

    logkey_seq_pairs = []
    time_seq_pairs = []
    session = 0
    for line in tqdm(data_iter):
        if session >= num_session:
            break
        session += 1

        logkeys, times = fixed_window(line, window_size, adaptive_window, seq_len, min_len)
        logkey_seq_pairs += logkeys
        time_seq_pairs += times

    logkey_seq_pairs = np.array(logkey_seq_pairs)
    time_seq_pairs = np.array(time_seq_pairs)

    logkey_trainset, logkey_validset, time_trainset, time_validset = train_test_split(logkey_seq_pairs,
                                                                                      time_seq_pairs,
                                                                                      test_size=test_size, ## here it is passed as argument as it is with changing to be in range(0,1)
                                                                                      random_state=1234)

I think this should solve the problem (i don't know the exact work of the package, but I think this this could be a minor fix and just wanted to make sure its correct...):

def generate_train_valid(data_path, window_size=20, adaptive_window=True,
                         sample_ratio=1, valid_size=0.1, output_path=None,
                         scale=None, scale_path=None, seq_len=None, min_len=0):
    with open(data_path, 'r') as f:
        data_iter = f.readlines()
  
    num_session = int(len(data_iter) * sample_ratio)
    # only even number of samples, or drop_last=True in DataLoader API
    # coz in parallel computing in CUDA, odd number of samples reports issue when merging the result
    # num_session += num_session % 2

    test_size = int(min(num_session, len(data_iter)) * valid_size)
    # only even number of samples
    # test_size += test_size % 2

    valid_size = round(test_size/num_session,3)
    # update split size

    print("before filtering short session")
    print("train size ", int(num_session - test_size))
    print("valid size ", int(test_size))
    print("="*40)

    logkey_seq_pairs = []
    time_seq_pairs = []
    session = 0
    for line in tqdm(data_iter):
        if session >= num_session:
            break
        session += 1

        logkeys, times = fixed_window(line, window_size, adaptive_window, seq_len, min_len)
        logkey_seq_pairs += logkeys
        time_seq_pairs += times

    logkey_seq_pairs = np.array(logkey_seq_pairs)
    time_seq_pairs = np.array(time_seq_pairs)

    logkey_trainset, logkey_validset, time_trainset, time_validset = train_test_split(logkey_seq_pairs,
                                                                                      time_seq_pairs,
                                                                                      test_size=valid_size,
                                                                                      random_state=1234)

关于log key中的<*>如何参与Bert的MLM训练

您好，logbert这篇论文让我获益匪浅。与LAnoBERT进行了一些对比，关于log key中的<>参与预训练任务我有以下疑惑：
1）在数据预处理阶段，train数据只包含日志Sequence中的log key ID（对应代码中的EventID），请问bert预训练过程中如何从vocab中关联token呢？
2）您论文中的input应该是log key，请问log key中的<>是否参与训练呢？若参与训练，请问随机mask（如果mask到<*>）之后，如何做loss呢？若不参与训练，请问在mask阶段应该如何处理呢？
期待您的答疑解惑，如若上述理解有偏差，请您见谅。

HDFS/python logbert train

Hello,when I run to this line of code：
logkey_seq_pairs = np.array(logkey_seq_pairs)
An error has occurred：
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (4855, 1) + inhomogeneous part.

How can I solve it?

Can we have a template function to be able to apply the model to our data

os.mknod error

When I run python deeplog.py train, I got an error of PermissionError: [Errno 1] Operation not permitted. I don't have superuser privilege for my laptop. Is there an alternative command I can use to replace os.mknod?

Here is the log trace:

Features logkey:True time: False
Device: cpu
arguments Namespace(mode='train')
Loading train dataset


scale_path: ../output/hdfs/deeplog/scale.pkl
Traceback (most recent call last):
  File "deeplog.py", line 117, in <module>
    train()
  File "deeplog.py", line 89, in train
    trainer = Trainer(Model, options)
  File "../logdeep/tools/train.py", line 68, in __init__
    os.mknod(scale_path)
PermissionError: [Errno 1] Operation not permitted

Results are far from the values reported in the paper

My results are far from the values reported in the paper. For example, after running the code on the BGL dataset I've gotten:

Precision: 88.93%, Recall: 79.86%, F1-measure: 84.15%

Running on HDFS, I've gotten:

Precision: 4.67%, Recall: 79.67%, F1-measure: 8.83%

What happening? Is there anything I missed?

preprocess() in Drain.py has issue

Hi,
The regular expression is set in process_bgl.sh as below.
REGEX1='(0x)[0-9a-fA-F]+'
REGEX2='\d+.\d+.\d+.\d+'
REGEX3='(/[-\w]+)+'
REGEX4='\d+

And they are passed to preprocess() in Drain.py with option.
--regex="$REGEX1 $REGEX2 $REGEX3 $REGEX4" \

regex value is pass to self.rex in Drain.py and self.rex list has only 1 size when I checked it.

def preprocess(self, line):
    for currentRex in self.rex:
        line = re.sub(currentRex, '<*>', line)
    return line

When I tried to test with preprocess() above, I got the result as 0below.
TP: 3, TN: 46, FP: 3, FN: 3
Precision: 50.00%, Recall: 50.00%, F1-measure: 50.00%
And below is the partial log in the structured log file and I can see the log that did not parse.

And I modified the preprocess() as below.
def preprocess(self, line):
rex_list = self.rex[0].split(' ')
for currentRex in rex_list:
line = re.sub(currentRex, '<*>', line)
return line

The test result is better more than before.
TP: 6, TN: 44, FP: 1, FN: 0
Precision: 85.71%, Recall: 100.00%, F1-measure: 92.31%

And log parsing is better, I think.
ciod: Message code 0 is not 51 or 4294967295,e872bbe9,ciod: Message code <> is not <> or <*>

What is your intention for preprocess()?
My modification is correct?

Question about logbert in logs Apache?

How would I go about using logbert for Apache logs?
192.168.0.14 - - [15/Sep/2021:07:28:39 -0400] "GET /media/plg_system_popup/js/jquery.js HTTP/1.1" 200 293755 "https://192.168.0.52/" "Mozilla /5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0"

No License File

I would like to use LogBERT for a project. Unfortunately, there seems to be no license mentioned, so I'm not sure how to go about it.

Can you please add a license file?

E.g., perhaps MIT like your dependency logalizer?

Linux log files prediction

I have downloaded data of linux from loghub directory and now I am bit confused how to run this code for linux data.
Can someone tell me details

Scripts not working

cd scripts

#download 2000 hdfs samples for testing and debugging
sh download_hdfs_2k.sh

#download hdfs dataset
sh download_hdfs.sh

#run logbert on HDFS
sh run_logbert_hdfs.sh

Getting issues in #run logbert on HDFS script

Traceback (most recent call last):
File "../main_logbert.py", line 4, in
from logbert.bert_pytorch import Predictor, Trainer
File "/home/amanjanagowd/testbed/LOGBERT/logbert/logbert/init.py", line 1, in
from .bert_pytorch import Trainer, Predictor
File "/home/amanjanagowd/testbed/LOGBERT/logbert/logbert/bert_pytorch/init.py", line 1, in
from .model import BERT
File "/home/amanjanagowd/testbed/LOGBERT/logbert/logbert/bert_pytorch/model/init.py", line 1, in
from .bert import BERT
File "/home/amanjanagowd/testbed/LOGBERT/logbert/logbert/bert_pytorch/model/bert.py", line 1, in
import nn as nn
ImportError: No module named nn

Can't find Requirement.txt

Can you point me towards where Requirement.txt is located?

LAnoBERT : System Log Anomaly Detection based on BERT Masked Language Model

LAnoBERT : System Log Anomaly Detection based on BERT Masked Language Model, almost the same method, why we gain different scores on the same dataset

关于logdeep中的semantics特征

作者你好，我在logdeep的代码实验中有看到关于semantics的特征构造相关代码（注释掉了没被使用），其中有个event2semantic_vec.json文件，请问这个文件我有办法获取吗，请问您是把event embedding成了一维向量吗？

About BGL parsing result

Hi, I'm also working on BGL dataset. Your new paper about outlier detection is interesting. I just read you it and find that you also use Drain to do log parsing.

Could you tell me how you deal with the template problem?
Raw BGL dataset after Drain log parsing (with re expression used in Drain demo) will have 1000+ templates. But the ground truth is around 400 as your paper mentioned.
Maybe you use some specific re expression before parsing?

Thanks a lot
Wenrui

您好，抱歉打扰~

您好，抱歉打扰，想问您下，我按照logbert里的hdfs跑的baseline，有些问题，您的baseline数据是什么格式的呀~方便分享下嘛 , [email protected]

test_size=68 should be either positive and smaller than the number of samples 34 or a float in the (0, 1) range

I run python logbert.py train
have error

logkey_trainset, logkey_validset, time_trainset, time_validset = train_test_split(logkey_seq_pairs,
......
ValueError: test_size=68 should be either positive and smaller than the number of samples 34 or a float in the (0, 1) range

The experimental data of the paper cannot be reproduced

hi, guo
I have tried many times. The following results are always the same, which is far from the results in the paper. Is there any difference between the results in the paper and the code?

Can you add a wechat private chat?

dataset: hdfs
git branch: main
==================== logbert ====================
best threshold: 0, best threshold ratio: 0.0
TP: 7602, TN: 549880, FP: 3488, FN: 3045
Precision: 68.55%, Recall: 71.40%, F1-measure: 69.95%

F1 measure is good for logbert BGL but very bad for logbert HDFS

Greetings.
F1 of logbert BGL is good (85.8%) but very bad for logbert HDFS (only 13.96%, far from 82.32% mentioned in paper). Is there anything I missed?

看结果很多像是代码都没跑通，如果不是真的跑完的，请不要占坑位！还有很多机构在做bert for log之类的工作

FileNotFoundError error when trying to execute using docker

The script run_logbert_hdfs.sh looks for

train_vocab: /root/.output/hdfs/train
vocab_path: /root/.output/hdfs/vocab.pkl
model_path: /root/.output/hdfs/logbert_mask_ratio_0.65_num_candidate_6/best_model.pth

These are not found.

root@7a564a5a1a6e:/workspace/logbert/scripts# sh run_logbert_hdfs.sh
tee: /root/.logs/hdfs_logbert_mask_ratio_0.65_num_candidate_6.log: No such file or directory
Save options parameters
Traceback (most recent call last):
File "../main_logbert.py", line 102, in
main()
File "../main_logbert.py", line 89, in main
with open(options["train_vocab"], "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/root/.output/hdfs/train'

Why did Baselines performance decline so much from the performance shown in their papers?

Congratulations on completing such a remarkable job. Your work has provided us with a lot of information.
With learning about your work, I encountered some problems like this. I ran the code you provided, and I read your paper. Deeplog and LogAnomaly perform much worse than they do in their papers. Is a new evaluation method adopted? And I noticed that in your implementation it uses the Drain method to extract the templates, whereas in Deeplog it uses Spell. Also, when running the data you use with DeepLog and the data it generates with Spell, f1-Score drops by 10%. However, this performance is still better than your baseline implementation.
I would like to ask you to explain the reason for this situation. Thank you.

RuntimeError: Function AddBackward0 returned an invalid gradient at index 1 - expected type torch.cuda.FloatTensor but got torch.FloatTensor

Hi, @HelenGuohx Could you please have a look at this problem?

○ → sh run_logbert_bgl_2k.sh
tee: bgl/logbert/bgl_2k_0.5_6.log: 没有那个文件或目录
Parsing file: /home/wm/.dataset/bgl/BGL_2k.log
Total size after encoding is 2000 2000
Parsing done. [Time taken: 0:00:00.151386]
Loading ../output/bgl/BGL_2k.log_structured.csv
there are 1880 instances (sliding windows) in this dataset

training size 684
test normal size 1028
test abnormal size 168
Save options parameters
Loading vocab ../output/bgl/vocab.pkl
vocab Size:  73

Loading Train Dataset
before filtering short session
train size  616
valid size  68
========================================
========================================
Num of train seqs 616
Num of valid seqs 68
========================================

Loading valid Dataset
Creating Dataloader
Building BERT model
Creating BERT Trainer
Total Parameters: 2147402
Training Start


start calculate center

100%|██████████| 19/19 [00:00<00:00, 126.17it/s]
100%|██████████| 2/2 [00:00<00:00, 31.87it/s]
Traceback (most recent call last):
  File "../main_run.py", line 175, in <module>
    main()
  File "../main_run.py", line 163, in main
    run_logbert(options)
  File "/home/wm/logbert/logbert/bert.py", line 15, in run_logbert
    Trainer(options).train()
  File "/home/wm/logbert/logbert/bert_pytorch/train_log.py", line 115, in train
    self.start_iteration(surfix_log="log")
  File "/home/wm/logbert/logbert/bert_pytorch/train_log.py", line 128, in start_iteration
    _, train_dist = self.trainer.train(epoch)
  File "/home/wm/logbert/logbert/bert_pytorch/trainer/pretrain.py", line 101, in train
    return self.iteration(epoch, self.train_data)
  File "/home/wm/logbert/logbert/bert_pytorch/trainer/pretrain.py", line 172, in iteration
    loss.backward()
  File "/home/wm/miniconda3/lib/python3.8/site-packages/torch/tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/wm/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 97, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Function AddBackward0 returned an invalid gradient at index 1 - expected type torch.cuda.FloatTensor but got torch.FloatTensor

请教BGL训练问题

在BGL数据集上训练的时候出现以下错误：
ValueError: test_size=68 should be either positive and smaller than the number of samples 34 or a float in the (0, 1) range

请问该如何解决？

cannot find hdfs_sequence.csv

when I excuse data_process.py, I got error.
FileNotFoundError: [Errno 2] No such file or directory: '../output/hdfs/hdfs_sequence.csv'.

looking forward to your reply!

Spelling error on requirements.txt

scikt-learn==1.1.0 should be scikit-learn==1.1.0

best_center.pt not found

I trained model but when predict it ask this file which is not available after training torch.load(self.model_dir + "best_center.pt")
Where to find this file which contains distance and radius?

indexing particular sequence in parsed structured log file

Is there any way to index (mark) anomaly log sequence back into parsed file usually named log_structured from drain?
I tried to use following workflow for LineId from structured file along sequences to keep track of each logkey location in parsed structured file:

generate_test
fixed_window
LogDataset

Inside LogDateset > random_item I could not pertain the sequence of LineId and sequence of logKey because of random function.
could you please suggest another way to mark anomalies back into parsed structured file to visualize actual logline contents!

Pip install for logbert

A few folks mentioned that pip install logbert is the way to install logbert. I was not successful. Is there a different name, or is not posible?

LogParser - drain parser is old version

this is the latest drain parser source code: https://github.com/logpai/logparser/blob/master/logparser/Drain/Drain.py
and there seems to a problem with the get_parameter_list() in line 339, which was given as 'template_regex = re.sub(r' +', r'\s+', template_regex)' which is been causes the result's parameters to be empty list, also the original source code is not yet committed with the correct sequence code, this should do it.
'''
def get_parameter_list(self, row):
template_regex = re.sub(r"<.{1,5}>", "<>", row["EventTemplate"])
if "<>" not in template_regex: return []
template_regex = re.sub(r'([^A-Za-z0-9])', r'\\1', template_regex)
template_regex = re.sub(r'\ +', r'\s+', template_regex) # changed
template_regex = "^" + template_regex.replace("<*>", "(.*?)") + "$"
parameter_list = re.findall(template_regex, row["Content"])
parameter_list = parameter_list[0] if parameter_list else ()
parameter_list = list(parameter_list) if isinstance(parameter_list, tuple) else [parameter_list]
return parameter_list
'''