Comments (10)

SjoerdV commented on September 2, 2024 2

So your question is about what's in the content of the files. This is something this script does not manage.

That said you could:

  • check whether the problem is already present in the Word files generated by the OneNote Publish routine, by commenting out the Remove-Item statement (line 203)
  • play around with the Docx2Md conversion that 'pandoc.exe' handles (script line 132); maybe you could do better than the GFM output I used, but there are a lot of options

from convertonenote2markdown.

johnkyle4 commented on September 2, 2024 1

I played around with the various Pandoc output format options and found that gfm and commonmark don't fix the bullet problem, but the others do (except I didn't try markdown_phpextra.)

The markdown option writes bullets as - and also doesn't write any raw HTML (like <table><tr><td>wtf</td></tr></table>) which is what I'm after.

None of them removed the blank lines between list items that the OP mentioned, but I can live with that.

https://pandoc.org/MANUAL.html#options

-t FORMAT, -w FORMAT, --to=FORMAT, --write=FORMAT
Specify output format. FORMAT can be: [list trimmed to markdown options]
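
For anyone who wants to compare the writers side by side, here is a minimal Python sketch (not part of the script) that builds one pandoc command line per Markdown-flavoured output format. The function name `build_pandoc_command`, the output naming scheme, and the example paths are hypothetical; the only pandoc options used are `-f`, `-t`, `-o` and `--wrap=none`.

```python
from pathlib import Path

# Markdown-flavoured writers that pandoc accepts for -t/--to.
MARKDOWN_FORMATS = [
    "markdown",
    "markdown_strict",
    "markdown_mmd",
    "markdown_phpextra",
    "gfm",
    "commonmark",
]


def build_pandoc_command(docx_path, out_format, out_dir):
    """Return the argument list for converting one docx file with one writer.

    The output file is named <stem>.<format>.md (a hypothetical naming
    scheme) so that results from different writers do not overwrite
    each other.
    """
    docx = Path(docx_path)
    out_file = Path(out_dir) / f"{docx.stem}.{out_format}.md"
    return [
        "pandoc",
        "-f", "docx",
        "-t", out_format,
        "--wrap=none",
        "-o", str(out_file),
        str(docx),
    ]


if __name__ == "__main__":
    # Only prints the commands; pass each list to subprocess.run() to execute.
    for fmt in MARKDOWN_FORMATS:
        print(" ".join(build_pandoc_command("page.docx", fmt, "out")))
```

Running each command on the same exported docx file and diffing the results is a quick way to see which writer handles bullets and tables the way you want.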

SjoerdV commented on September 2, 2024 1

Hi @nixsee. Great you figured that all out, and thanks for sharing as this is now a valuable resource on this repository. I know the pandoc output is not absolutely fantastic but it works and it adheres to the GFM markup. I would just leave the exported files and update them only when you need something 'neat' again. At least the notes are now plaintext, searchable with no vendor lock-in, so you can do anything with them at any time.

Of course you will not be sharing the raw markdown output with anyone; you will always convert it to PDF, which you can do with the nice VS Code 'manuth.markdown-converter' extension, and you will probably be using the additional markdown syntax provided in 'jebbs.markdown-extended' to really make nice documents. I would focus my efforts on getting good at Markdown and making beautiful content. Start with the notebook setup I mentioned in the Recommendations section of the README.

For instance, the 'Admonition' extended syntax from Markdown Extended provides really cool and professional-looking sections, which are great for exporting to PDF. Or, if you intend to host your Markdown files (with GitHub Pages using a Jekyll server), get into learning that syntax as well. Lots to do!

nixsee commented on September 2, 2024 1

Thanks very much! This has been a godsend and I'm just being picky ;)

As it turns out, I just discovered that I can do a global find/replace for the double spacing by pressing ctrl+enter in the find/replace text boxes (I had tried shift and alt, but not ctrl!).

Not perfect, but combining this with changing to the pandoc "markdown" converter, as suggested by @johnkyle4, I'm 99% of the way there! Now I just have to sort through my mountains of notes and turn them into something useful, which means I'm actually 1% of the way there...

Thanks again.

nixsee commented on September 2, 2024 1

Better yet, using "^\h*\R" as a regex in Notepad++ (though I'm sure you can do something similar in VS Code) will clear all blank lines, and it can be done at the folder level. It also gets rid of the lines between paragraphs, but it's good enough for me.
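
The same blank-line cleanup can be sketched in Python for anyone who prefers a script over an editor. This is only an approximation of the Notepad++ pattern: Python's `re` module has no `\h` or `\R`, so the sketch matches spaces and tabs followed by a line break instead. The function names and the folder argument are hypothetical.

```python
import re
from pathlib import Path

# Rough equivalent of Notepad++'s ^\h*\R: a line holding only spaces/tabs,
# together with its line break.
BLANK_LINE = re.compile(r"(?m)^[ \t]*\r?\n")


def strip_blank_lines(text):
    """Remove every blank (or whitespace-only) line from the text."""
    return BLANK_LINE.sub("", text)


def strip_blank_lines_in_folder(folder):
    """Rewrite every .md file under the folder in place; return the count.

    Like the regex above, this also removes the blank lines that separate
    paragraphs, so keep a backup of the exported files.
    """
    changed = 0
    for md_file in Path(folder).rglob("*.md"):
        original = md_file.read_text(encoding="utf-8")
        cleaned = strip_blank_lines(original)
        if cleaned != original:
            md_file.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed
```

Calling `strip_blank_lines_in_folder` on the export directory mirrors the folder-level find/replace; the UTF-8 encoding is an assumption about how the files were written.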

The pandoc "markdown" converter adds the "\" escape in front of many symbols, making it hard to search for things and annoying to look at, but it renders properly and, as @johnkyle4 said, it doesn't use any HTML code for tables etc. Good enough.

johnkyle4 commented on September 2, 2024

In my testing of this issue (per Sjoerd's suggestions) the Word docs come out perfectly. Unordered lists have bullets, ordered lists have numbers. So it's time to play around with pandoc!

Thank you @SjoerdV for your answers and guidance.

nixsee commented on September 2, 2024

Thanks for the responses and tinkering! Same experience for me: no issues with Word, so it's a markdown conversion issue. I'll fiddle around with those options and see if I can get something that works for me. Maybe it's even possible to modify the Pandoc options, or talk to Pandoc about getting something modified.

One unrelated suggestion - might be worth making it explicit in the instructions (both in github readme as well as even in the script prompts) that OneNote needs to be opened as administrator. I was going crazy not being able to run the modified script until I realized it was an admin thing. I've now set my OneNote to auto-run as Admin, because I am bound to forget again.

nixsee commented on September 2, 2024

The single vs double spacing seems to be related to "compact lists", described here

I assume that each bullet point in the intermediate docx file is treated like a paragraph, since it would have a paragraph mark (^p) at the end of each line, thus it automatically creates a loose, double-spaced, list.

I came across this post that describes how to get single spacing when going from docx to md, but I'm not smart enough to get it to work. Perhaps one of you is able to?

My issues are:

  1. The pandoc syntax in that post differs greatly from our script, so not sure exactly how to integrate it
  2. I'm not sure where to save the lua file (but ended up using a full filepath reference to find it), but it doesn't seem to work anyway.

But since I am not getting errors anymore, perhaps:
3. Maybe the lua code doesn't even work? I've also found this script that was built off the original, but it doesn't work for me either.

Here's the line I've used to at least not generate errors:

pandoc.exe -f docx -t markdown -L C:\Users...\plaintext.lua -i $fullexportpath -o "$($fullexportpathwithoutextension).md" --wrap=none --atx-headers --extract-media="$($fullexportdirpath)"

I had even less success with the Haskell script, which seems to need you to install and use Haskell, which I can't figure out.
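
If the Lua filter route does not work out, the loose lists can also be tightened after conversion. The following is a minimal Python sketch under simple assumptions, not the approach from the linked post: it drops a blank line only when the lines on both sides of it are list items, so blank lines between ordinary paragraphs are kept. The name `tighten_lists` and the list-item pattern are illustrative; multi-paragraph list items and horizontal rules written as `* * *` are not handled specially.

```python
import re

# A line that starts a bullet ("-", "*", "+") or numbered ("1." / "1)") item.
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+\S")


def tighten_lists(markdown_text):
    """Remove blank lines that sit between two consecutive list items."""
    lines = markdown_text.split("\n")
    result = []
    for index, line in enumerate(lines):
        if line.strip() == "":
            # Last line kept so far, and the next non-blank line still to come.
            previous = result[-1] if result else ""
            following = next(
                (later for later in lines[index + 1:] if later.strip()), ""
            )
            if LIST_ITEM.match(previous) and LIST_ITEM.match(following):
                continue  # blank line inside a list: drop it
        result.append(line)
    return "\n".join(result)
```

For example, `tighten_lists("- one\n\n- two\n")` returns `"- one\n- two\n"`, while the blank line between a paragraph and the list that follows it is left alone.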

SjoerdV commented on September 2, 2024

Hi @nixsee, about the 'running as admin' thing: that really depends on your individual security settings, as I was able to do everything in 'normal' mode when both UAC and the PowerShell execution policy are tweaked. So thanks for your suggestion, but I'll leave the documentation as it is right now, as it's more of a Windows configuration thing not related to the script.

SjoerdV commented on September 2, 2024

Cool stuff with the regex indeed! VS Code has a very good find & replace function including regex, so you're good!
I will go ahead and close this if it's alright by you. Lots of luck with your 'text cleaning'.
