
webex-message-space-archiver's Introduction

Webex Space Archiver

NEW: Important Release v30! (check out new features in the release notes)

Features

Start

Configure

Release notes

Troubleshooting

Feedback & Support

Archive Cisco Webex space messages to a single HTML file. NOTE: This code was written for a customer as an example. I specifically wanted 1 (one) .py file that did everything. It's not beautiful code, but it works :-) Feedback? Please go here and let me know what you think!

VIDEO

How to use & Demo

SCREENSHOT

Example HTML file of an archived Webex space:

                    

REQUIREMENTS

  • A (free) Webex account
  • Python 3.9 or higher (not tested with 3.6)
  • Python 'requests' library
  • Be a member of the Webex message space you want to archive
  • Mac: SSL fix (see the Troubleshooting section)

FEATURES

  • Archives all messages in a space
  • Find space ID with built in search function
  • Batch archiving with multiple config files & commandline parameters NEW
  • Deal with threaded messages
  • Support for automatic and manual DST configuration ('summertime') NEW
  • Download images, files or both (with msg file date)
  • All files are organized: \spacenamefolder with subfolders for \files, \images, \avatars
  • Export space data to JSON and/or TXT file
  • Restrict messages by number of messages, number of days, from-date or from-to date
  • Display: messages grouped per month, with navigation at the top
  • Display: show full user names
  • Display: show (linked or downloaded) user avatars
  • Display: attached file-names + size
  • Display: "@mentions" in a different color
  • Display: quoted or formatted text
  • Display: external users in different color (users with other domain)
  • Display: images in popup when clicked
  • Support for blurring email addresses and names NEW
  • Print: just like it appears on the screen

It doesn't:

  • Clean your dishes
  • Download whiteboards (unless you post a snapshot)
  • Download/display files shared in external Enterprise Content Management systems (Onedrive/Sharepoint)
  • Display reactions to messages (not accessible via API)
  • Mow your neighbour's lawn (I've tried)
  • Render cards

NOTE:

  • The message TIME is stored in the UTC timezone. The timezone on your device defines how this UTC time/date is displayed. A message sent at 12:43 CEST is stored as 10:43 UTC. When you change your timezone to PDT (UTC-7) it will be displayed as 03:43.
  • When printing the generated HTML file in Firefox: File, Print, check "print background colors and images", then print or save to PDF
  • To store your Webex token in an environment variable:
    • Windows: set WEBEX_ARCHIVE_TOKEN=YOUR_TOKEN_HERE
    • Mac: export WEBEX_ARCHIVE_TOKEN='YOUR_TOKEN_HERE'
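
A minimal sketch of how a script could pick up the token stored this way, falling back to the configuration file when the variable is not set. The section name "Archive Settings" and the key name "mytoken" are assumptions made for this illustration; check your own .ini file for the real names.

```python
import configparser
import os


def load_token(config_path="webexspacearchive-config.ini", env=os.environ):
    """Return the Webex token, preferring the environment variable."""
    token = env.get("WEBEX_ARCHIVE_TOKEN", "").strip()
    if token:
        return token
    # Fall back to the .ini file. Section and key names are hypothetical.
    config = configparser.ConfigParser()
    config.read(config_path)  # a missing file is silently ignored
    value = config.get("Archive Settings", "mytoken", fallback="")
    return value.strip().strip('"')
```

An empty return value means no token was found in either place, which is a good moment to print a hint about WEBEX_ARCHIVE_TOKEN.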

START

  1. Meet the requirements

  2. Run the script (python webex-space-archive.py) to create the configuration file "webexspacearchive-config.ini" (if it does not exist)

  3. In the webexspacearchive-config.ini file, save your developer token or (👍better!) create an environment variable called "WEBEX_ARCHIVE_TOKEN" with your token

  4. Run the script: python webex-space-archive.py

parameter              description                                                      example
(nothing)              use standard configuration .ini file
CONFIG_FILE            use non-standard configuration .ini file                         testspace.ini
SEARCH_STRING          search for space name to get the space ID                        ciscolive
SPACE_ID               use this SPACE_ID with standard configuration .ini file          Y2lzY29zcGFyazovL3VzL0lfS05FVy95b3Vfd291bGRfdHJ5X2hhaGE
CONFIG_FILE SPACE_ID   use non-standard configuration .ini file and provided SPACE_ID   a combination of examples above
SPACE_ID CONFIG_FILE   use non-standard configuration .ini file and provided SPACE_ID
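
How the script tells these parameters apart is not described here. The sketch below shows one plausible approach, purely as an illustration: anything ending in .ini is treated as a config file, anything starting with the base64 encoding of "ciscospark:/" (which the example space ID above begins with) is treated as a space ID, and everything else is a search string. The function name and the returned keys are hypothetical.

```python
# base64 of "ciscospark:/", the prefix of the example space ID above.
SPACE_ID_PREFIX = "Y2lzY29zcGFyazov"


def classify_arguments(args):
    """Sort command-line arguments into config file, space ID and search string."""
    result = {"config_file": None, "space_id": None, "search": None}
    for arg in args:
        if arg.lower().endswith(".ini"):
            result["config_file"] = arg
        elif arg.startswith(SPACE_ID_PREFIX):
            result["space_id"] = arg
        else:
            result["search"] = arg
    return result


# Typical use: classify_arguments(sys.argv[1:]) after "import sys".
```

Because each argument is classified on its own, the order "CONFIG_FILE SPACE_ID" or "SPACE_ID CONFIG_FILE" makes no difference in this sketch.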

UPGRADE? Replace the .py file and keep the configuration file (.ini). To get changes in the .ini file, run the script once without .ini file and it will create one for you with the latest remarks and features.

CONFIGURE

Edit the following settings in the configuration (.ini) file:


Personal Token: you can find this on developer.webex.com: log in (top right of the page) and then scroll down to "Your Personal Access Token". NOTE: see the 'NOTE' section above for how to store your token in an environment variable instead.

mytoken = "YOUR_TOKEN_HERE"

NOTE: This token is valid for 12 hours! Then you have to get a new Personal Access Token.


Space ID: To find this, first save your developer token in the .ini file. Then run the script with a search argument as a parameter. It will list all spaces and space IDs that match your search argument.

Alternatively: go to Webex Developer List rooms, make sure you're logged in, set the 'max' parameter to '900' and click Run. If you don't see the Run button, make sure 'test mode' is turned on (top of page, under "Documentation").

TIP: to get the space ID of a space that you are in, go to help / copy space details in the client. Then in Webex talk to the bot "[email protected]" and paste the space details. In return you get the space ID to be used here.
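
For reference, the same lookup can be sketched directly against the Webex List Rooms endpoint. This is not the script's own code: it uses only the standard library (instead of 'requests') so it runs without extra packages, it ignores pagination, and the function names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

ROOMS_URL = "https://webexapis.com/v1/rooms"


def fetch_rooms(token, max_rooms=900):
    """Fetch one page of spaces the token's owner is a member of."""
    query = urllib.parse.urlencode({"max": max_rooms})
    request = urllib.request.Request(
        f"{ROOMS_URL}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("items", [])


def find_spaces(rooms, search):
    """Return (title, id) pairs whose title contains the search text."""
    needle = search.lower()
    return [
        (room.get("title", ""), room.get("id", ""))
        for room in rooms
        if needle in room.get("title", "").lower()
    ]
```

Calling find_spaces(fetch_rooms(token), "ciscolive") would list every matching space together with the ID to use below.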

myspaceid = "YOUR_SPACE_ID_HERE"


Downloadfiles: do you want to download images or images & files? Think about it. Downloading images and files can significantly increase the archive time and consume disk space. Downloaded images or files are stored in the subfolder. Options:

downloadfiles = info

  • "no" : (default) no downloads, only show the text "file attachment"
  • "info" : no downloads, only show the filename and size
  • "images" : download images only
  • "files" : download files and images

UserAvatar: Do you want to show the user avatar or an icon? With "link", avatars are not downloaded but linked: the script gets the user avatar URL and uses that in the HTML file, so the images are not stored on your hard drive and an internet connection is needed to display them.

useravatar = link

  • "no" : (default) show user initials
  • "link" : link to avatar images (needs internet connection!)
  • "download" : download avatar images

Max Messages: Restrict the number of messages that are archived. Some spaces contain 100,000 messages and you may not want to archive all of them. To archive the last 5000 messages:

maxtotalmessages = 5000

  • (empty) : (default) last 1,000 messages
  • 4000 : (example) download the last 4,000 messages
  • 60d : (example) download messages from the last 60 days. 120d = 120 days
  • 22052021- : (example) download messages after May 22nd 2021 (ddmmyyyy-) *
  • 22052021-18082021 : (example) download messages between May 22nd and August 18th 2021 (ddmmyyyy-ddmmyyyy) *

  * = date format configurable in the .py code, variable 'maxmsg_format'
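
The accepted values can be summarised with a small parser. This is a sketch of the rules listed above, not the script's actual implementation; the function name and the returned tuple layout are assumptions, and the date format is fixed to ddmmyyyy here.

```python
from datetime import datetime, timedelta, timezone

DATE_FORMAT = "%d%m%Y"  # ddmmyyyy, as in the examples above


def parse_max_messages(value, now=None):
    """Return (max_count, start, end); unused parts are None."""
    now = now or datetime.now(timezone.utc)
    value = value.strip()
    if not value:
        return 1000, None, None  # documented default: last 1,000 messages
    if value.isdigit():
        return int(value), None, None
    if value.endswith("d") and value[:-1].isdigit():
        return None, now - timedelta(days=int(value[:-1])), None
    if "-" in value:
        start_text, end_text = value.split("-", 1)
        start = datetime.strptime(start_text, DATE_FORMAT).replace(tzinfo=timezone.utc)
        end = None
        if end_text:
            end = datetime.strptime(end_text, DATE_FORMAT).replace(tzinfo=timezone.utc)
        return None, start, end
    raise ValueError(f"unsupported maxtotalmessages value: {value!r}")
```

Note that an end date parsed this way points at midnight at the start of that day; whether the script includes the whole end day is not stated above.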

OutputFilename: Enter the file name of the output HTML file. If EMPTY the filename will be the same as the Archived Space name (recommended).

outputfilename = yourfilename.html


Sorting: sort order of the archived messages.

sortoldnew = yes

  • "yes" : (default) newest message at the bottom (like in the Webex client)
  • "no" : newest message at the top

OutputJSON: Besides the .html file, how would you like to store your messages?

outputjson = no

  • "no" : (default) only generate .html file
  • "yes/both" : output message data as .json and .txt file
  • "json" : output message data as .json file
  • "txt" : output message data as .txt file

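DST: Configure when daylight saving time ('summertime') starts and stops, so message timestamps are displayed correctly for summer and winter. Leave both settings empty to use the DST data from your (local) system. Both EU and US examples are shown in the .ini file.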

dst_start = L,7,3 (last Sunday of March)

dst_stop = L,7,10 (last Sunday of October)

  • empty : (default) use DST data from your (local) system
  • parameter 1 : Week number in a month. 1-4 (1st, 2nd, 3rd, 4th) or L (last)
  • parameter 2 : Weekday number. 1-7 (1=Monday, 7=Sunday)
  • parameter 3 : Month. 1-12 (1=January, 12=December)
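
To make the three parameters concrete, here is a small sketch that turns a rule such as "L,7,3" into a calendar date for a given year. It illustrates the rule format only; the function name is hypothetical and the script may compute this differently.

```python
import calendar
from datetime import date


def dst_rule_to_date(rule, year):
    """Resolve 'week,weekday,month' (e.g. 'L,7,3') to a date in the given year."""
    week, weekday, month = [part.strip() for part in rule.split(",")]
    weekday_index = int(weekday) - 1  # rule: 1=Monday; Python: 0=Monday
    month = int(month)
    days_in_month = calendar.monthrange(year, month)[1]
    # Every date in the month that falls on the requested weekday.
    matches = [
        date(year, month, day)
        for day in range(1, days_in_month + 1)
        if date(year, month, day).weekday() == weekday_index
    ]
    if week.upper() == "L":
        return matches[-1]  # last occurrence, even in months with five
    return matches[int(week) - 1]
```

For example, dst_rule_to_date("L,7,3", 2019) resolves to March 31st 2019, the fifth Sunday of that month, which is why "L" is safer than "4" for EU-style rules.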

Blurring: Blur names and email addresses in html file

blurring = yes

  • empty : (default) no blurring
  • "yes" : Note that it is a VISUAL blur. Data can still be copy/pasted

Troubleshooting

Most errors should be handled by the script.

  • SSL issue on a Mac: the default SSL setup is outdated and unsupported. Check out the readme.rtf in your Python application folder. That folder also contains an "Install certificates.command" file, which should do the work for you.

Release Notes

For old release notes click here

Enhancements in release v30 - March 19th 2023

Overall: increased output quality and precision. Support for DST, privacy blurring, bulk processing

Important Enhancements - all based on user requests

  • NEW: DST dates: support for MANUAL DST date configuration (in .ini file)
  • NEW: DST dates: support for AUTOMATIC (local) DST recognition, displaying message timestamps correctly for summer and winter.
  • NEW: DST dates: added ability to use "L" for "Last" (Sunday) (in situations with 5 Sundays in a month, like March 2019)
  • NEW: Blurring: option to (OPTICALLY!) blur email addresses & @mentions in msg content and statistics. In text file output, email addresses are replaced with "@._". NOTE: JSON output remains unblurred. NOTE: if you print the HTML file to PDF, the blurred data turns into images, so the original names cannot be recovered.
  • NEW: Batch processing: call script with spaceId or spaceId+.ini file

IMPROVEMENTS

  • CONFIG: On "token problem" - mention environment variable "WEBEX_ARCHIVE_TOKEN"
  • CONFIG: ini config: 'file download' section allow for the word "image" and "file" (besides "imageS" and "fileS")
  • VISUAL: added collapse/expand-all button for months in the header
  • VISUAL: added 'outputjson' and 'dst' settings to the HTML header
  • VISUAL: HTML header "between 70 and 150 days" --> "between 70-150 days"
  • VISUAL: Textfile output: extra line-break after
  • VISUAL: DST HTML header: TZ field shows the current TZ name, not both DST and non-DST name

FIXED

  • VISUAL: move @mention css to .atmention class (so it can be included in blurs)
  • VISUAL: TEXT output: no space between the email address and the message text
  • VISUAL: HTML header shows "Generate HTML code for 12828 messages" = total count BEFORE limiting by date/days.
  • TEXT output: if msg only has an attachment and NO text or html the script crashes
  • TEXT output: if msg has no text field, no return was added in the txt output
  • TEXT output: now using the "text" field as the basis, not the converted "html" field
  • TEXT output: DST date/time written to TXT file was UTC --> now local timezone OR with DST if configured
  • CONFIG: change dst setting explanation "position" to "week-in-month"
  • CONFIG: text generated in ini file was lowercased. Fixed.
  • FIX: msg with html field but without text showed error on screen --> moved to the end, printing html text
  • FIX: A single thread reply outside the original date-scope should not generate a TOC entry. (causing slightly less accurate msg count)
  • FIX: missing_parent_msg["text"] has html tags which don't render in the 'text' field. Changed to 'html'
  • FIX: Restrict messages between 2 dates failed. Fixed date check.
  • FIX: DST in Australia: dst_start date is in the FALL and dst_stop in spring - now working correctly
  • FIX: DST Utcoffset calculation was wrong for negative UTC offset timezones. fixed.

NOTE

  • CARDS: cards and buttons won't be visible in the html output

Info

Submit here, open an issue or if you know my email address: send a message on Webex (not via email!).

webex-message-space-archiver's People

Contributors

djf3 avatar


webex-message-space-archiver's Issues

Final message not shown as threaded

If the final message in a space is part of a thread, it is not shown in the final HTML as threaded. I confirmed the JSON has the appropriate parentId.

I've debugged this to the point of calculating threaded_message in the main message loop. A "list index out of range" is being thrown and so threaded_message=False. I suspect that behavior is reasonable for either nextitem or previousitem, but not the other.
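
One way to avoid the exception at the list boundaries, sketched under assumptions: messages are dicts sorted oldest to newest, replies carry a "parentId", and a message counts as threaded when it is a reply or when the next message replies to it. The helper names are hypothetical and the script's real thread logic may differ.

```python
def neighbour(messages, index):
    """Return messages[index], or None when index is outside the list."""
    if 0 <= index < len(messages):
        return messages[index]
    return None


def is_threaded(messages, index):
    """True if the message is a reply, or the next message replies to it."""
    current = messages[index]
    if current.get("parentId"):
        return True  # a reply is part of a thread regardless of neighbours
    following = neighbour(messages, index + 1)
    return following is not None and following.get("parentId") == current.get("id")
```

Checking the bounds explicitly means a missing neighbour only affects the neighbour test, so the final message in a space keeps its threaded status.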

Issue - subfolder creation fails when team name ends in trailing space

Didn't realize one of our Teams spaces was named with a trailing space. The archiver trips on this, per below:

Traceback (most recent call last):
  File "E:\temp\teams\webex-teams-space-archive.py", line 1030, in <module>
    os.makedirs(myAttachmentFolder + "/files/")
225, in makedirs
    mkdir(name, mode)
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'Sample Space /files/'

Obviously the space name shouldn't have the trailing space, and I've renamed it, but FYI.
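
A possible workaround on the script side is to clean the space name before using it as a folder name. The sketch below is an illustration only (the function names are made up): it replaces characters Windows does not allow in file names and strips trailing spaces and dots, which Windows also rejects.

```python
import os
import re


def safe_folder_name(space_name):
    """Make a space name usable as a folder name on Windows, macOS and Linux."""
    cleaned = re.sub(r'[<>:"/\\|?*]', "_", space_name)
    cleaned = cleaned.strip().rstrip(" .")
    return cleaned or "unnamed_space"


def create_archive_folders(base_dir, space_name):
    """Create the space folder with its files, images and avatars subfolders."""
    root = os.path.join(base_dir, safe_folder_name(space_name))
    for sub in ("files", "images", "avatars"):
        os.makedirs(os.path.join(root, sub), exist_ok=True)
    return root
```

With this, a space called "Sample Space " ends up in a folder named "Sample Space".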

HTML inconsistently drops old threaded messages without parent

I asked for 21 messages. The 21 newest messages are in the JSON output correctly. Seventeen are processed normally, including the entirety of their threads. The oldest four, however, are the end of a thread. Message 22 is the start of the thread, but was not requested. (If I configure it to 22, it all works great.)

The four messages are dropped. That might be reasonable. I recognize the challenge here, where you don't have the thread-start message. But output logging might be good (there is none), and the image shows both 17 and 21.

image

"Traceback (most recent call last): File "webex-space-archive.py", line 1232, in <module> current_step = mycounter/progress_steps ZeroDivisionError: division by zero"

I'm just testing the waters with Cisco Webex, and everything about it makes me happy except for their strong lack of archiving ability.

I've configured my profile ID (mytoken) as well as the conversation I'm looking to archive as a test (myspaceid) and no matter what I do, it ends with this command:

Traceback (most recent call last):
  File "webex-space-archive.py", line 1232, in <module>
    current_step = mycounter/progress_steps
ZeroDivisionError: division by zero

I'm not sure how to move forward from here. And with this script not being exactly commonplace, there aren't a lot of troubleshooting tips out there.

Any help with this would be greatly appreciated.

Suggestion - Date Range

First off... just want to say thanks for creating an amazing tool. Very useful and easy to use!

I'm wondering if it would be possible to add the ability to pull only messages between X and Y dates, rather than just the max days or messages. This would be helpful if you're looking for messages from only a certain year or time period.

Thanks!

HTML/TXT missing messages present in JSON

I have a recent archive that is missing messages in the HTML. I can see them in the JSON download, however.

This particular conversation is often complex and has two or three different threads going at a time. The missing messages are definitely in that category. Other messages from the same period, but in different threads, are output in the HTML. Interestingly, the thread I know is missing messages has some early messages, then misses some, and then does get the last message in the thread. My hunch would be that this is related to multiple active threads in some way.

The text of the missing images is in the JSON file. The images were not downloaded.

I did try with a significantly longer window of time, and it did not help.

I don't feel comfortable attaching the full set of files here. What minimal redacted information would be useful? The JSON file with the text strings replaced?

ZeroDivisionError when maxtotalmessages < 21

Given a config where maxtotalmessages has a value less than 21, the script fails with:

Traceback (most recent call last):
  File "webex-space-archive.py", line 1232, in <module>
    current_step = mycounter/progress_steps
ZeroDivisionError: division by zero
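
The traceback divides by progress_steps, which evidently becomes zero for small message counts. The sketch below shows one way to guard against that. It assumes (this is a guess, the actual calculation is not shown in the report) that the step size comes from integer-dividing the message count by a fixed number of progress updates; both function names are hypothetical.

```python
def progress_step_size(total_messages, updates=20):
    """Messages per progress update; never less than 1."""
    return max(1, total_messages // updates)


def should_report_progress(counter, total_messages, updates=20):
    """True when the counter lands on a progress-update boundary."""
    return counter % progress_step_size(total_messages, updates) == 0
```

With the max(1, ...) guard, a space with 5 messages simply reports progress after every message instead of crashing.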

Soft returns / newlines are not shown (visually) in HTML output

Example message from webex:
image

In the JSON output:

"text": "Here is something I just came across:\n\nLog4j version 2.15.0 has been released to address this flaw, but The Record reports that its fix merely changes a setting from \"false\" to \"true\" by default. Users who change the setting back to \"false\" remain vulnerable to attack. Luckily this means that servers running earlier versions of Log4j can mitigate the attack by changing that setting.\n\nASF says that \"this behavior can be mitigated by setting system property 'log4j2.formatMsgNoLookups' to 'true' or by removing the JndiLookup class from the classpath (example: zip -q -d log4j-core-*.jar org/apache/logging/log4j/core/lookup/JndiLookup.class)\" in earlier versions of Log4j if users can't upgrade to the 2.15.0 release."

HTML:

<div class="css_messagetext">Here is something I just came across:

Log4j version 2.15.0 has been released to address this flaw, but The Record reports that its fix merely changes a setting from "false" to "true" by default. Users who change the setting back to "false" remain vulnerable to attack. Luckily this means that servers running earlier versions of Log4j can mitigate the attack by changing that setting.

ASF says that "this behavior can be mitigated by setting system property 'log4j2.formatMsgNoLookups' to 'true' or by removing the JndiLookup class from the classpath (example: zip -q -d log4j-core-*.jar org/apache/logging/log4j/core/lookup/JndiLookup.class)" in earlier versions of Log4j if users can't upgrade to the 2.15.0 release.</div>

Rendered HTML:
image
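
Browsers collapse plain newlines inside a div, which explains the rendering above. One possible fix, sketched here as an illustration rather than as the script's code, is to escape the text and turn each newline into a line break before writing it into the HTML. The function name is hypothetical.

```python
import html


def text_to_html(text):
    """Escape message text and make newlines visible in HTML."""
    escaped = html.escape(text, quote=False)
    return escaped.replace("\r\n", "\n").replace("\n", "<br>\n")
```

An alternative that needs no text changes is the CSS rule "white-space: pre-wrap" on the css_messagetext class, which tells the browser to keep the newlines as they are.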

Messages are shown using current time offset instead of the one at the time of the message

<sarcasm>Because time zone issues are every developer's favorite problem!</sarcasm>
I will not complain if you just close this!

The US just experienced its fall time change where we fell back an hour. Webex itself and the archiver have different behavior for the messages before the time change happened.

Webex shows the time we thought it was when exchanging messages.
image

The archived HTML page shows the time one hour earlier, as if the fall back affects earlier times.
image
