
mediumexporter's Introduction


Medium Exporter

Export your stories published on medium.com to markdown.

Usage

./index.js {url}
    -O, --output - write to the specified output directory
    -I, --info - show information about the Medium post
    --hugo - enable gohugo.io shortcodes
    --frontmatter - enable frontmatter
    --jekyll - format content and images for use in Jekyll blogs

CLI example

If no output directory is specified, images and content are downloaded into /content

./index.js https://medium.com/@PatrickHeneise/malaysia-16be98ab673e

programmatic example

get individual posts

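const mediumexporter = require('./index')

// `link` is the URL of the Medium post to export
async function example(link) {
  // return the call so the caller can wait for the export to finish
  return mediumexporter.getPost(link, {
    output: "content/posts",
    hugo: true,
    frontmatter: true
  })
}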

get feeds (default page size is 10)

const exporter = require('./index')
exporter.getFeed('https://medium.com/feed/@xdamman', { output: 'content' })

mediumexporter's People

Contributors

asameshimae, badrihippo, christianwilkie, daviddarnes, patrickheneise, werdnanoslen, xdamman


mediumexporter's Issues

Parsing of whole feed (not just first 10 posts)

Do you have any idea whether it's possible to parse the whole feed, not just the first 10 posts? @bobby-brennan, maybe you know?

I looked for a <link rel="next"> element in the Medium RSS feed, but there doesn't seem to be one. I also tried adding ?page=2 or ?limit=100, but Medium doesn't seem to support either.

Headings have no space after '#'

For regular text, the converter omits the required space after # and before the heading content.

Example output:

###This is an h3

(Great utility by the way)
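
Until the converter itself is fixed, one possible workaround is to post-process the exported Markdown. The sketch below is not part of mediumexporter: `fixHeadingSpacing` is a hypothetical helper that inserts the missing space after a run of one to six `#` characters at the start of a line. It knows nothing about code blocks, so a line such as `#!/bin/sh` inside exported code would be changed as well.

```javascript
// Hypothetical post-processing helper (not part of mediumexporter).
// Inserts a space between a leading run of 1-6 '#' characters and the
// heading text when that space is missing.
function fixHeadingSpacing(markdown) {
  // ^ with the m flag anchors at every line start; the lookahead makes
  // sure the next character is neither another '#' nor whitespace.
  return markdown.replace(/^(#{1,6})(?=[^#\s])/gm, '$1 ')
}

console.log(fixHeadingSpacing('###This is an h3'))
```

Lines that already have the space are left untouched, so the helper can be run over a whole exported file more than once.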

File naming issue on Windows - invalid characters in downloaded filenames

I tried running the CLI to download one of my medium posts via the following command:
node ./index.js https://medium.com/@ChristianWilkie/setting-up-a-free-ghost-blog-on-google-cloud-platform-and-cloudflare-c9bc79861a0e

I got an error:

PS C:\Users\chris\Desktop\medium\mediumexporter> node ./index.js https://medium.com/@ChristianWilkie/setting-up-a-free-ghost-blog-on-google-cloud-platform-and-cloudflare-c9bc79861a0e
image https://cdn-images-1.medium.com/max/2000/1*4zonAsqpNEPQwT8XGNIKQQ.png 1*4zonAsqpNEPQwT8XGNIKQQ.png
something went wrong
{ Error: ENOENT: no such file or directory, open 'content\setting-up-a-free-ghost-blog-on-google-cloud-platform-and-cloudflare\images/1*4zonAsqpNEPQwT8XGNIKQQ.png'
    at Object.openSync (fs.js:438:3)
    at Object.writeFileSync (fs.js:1189:35)
    at Object.downloadImages (C:\Users\chris\Desktop\medium\mediumexporter\lib\utils.js:20:8)
    at process._tickCallback (internal/process/next_tick.js:68:7)
  errno: -4058,
  syscall: 'open',
  code: 'ENOENT',
  path:
   'content\\setting-up-a-free-ghost-blog-on-google-cloud-platform-and-cloudflare\\images/1*4zonAsqpNEPQwT8XGNIKQQ.png' }

It seems related to running it on Windows and the '*' in the filename of one of the images: https://cdn-images-1.medium.com/max/800/1*4zonAsqpNEPQwT8XGNIKQQ.png

On Windows, the * character isn't allowed in filenames. I noticed that when I try to save the image in Chrome, it replaces the * with _

ex:
1*4zonAsqpNEPQwT8XGNIKQQ.png
->
1_4zonAsqpNEPQwT8XGNIKQQ.png

Here's a list of forbidden filename characters on Windows: https://docs.microsoft.com/en-us/windows/desktop/msi/filename

I'm not really sure of the best solution; maybe you could use something like this: https://github.com/parshap/node-sanitize-filename

When I tried hacking it in, it seemed to work:

"C:\Program Files\nodejs\node.exe" C:\Users\chris\Desktop\medium\mediumexporter\index.js https://medium.com/@ChristianWilkie/setting-up-a-free-ghost-blog-on-google-cloud-platform-and-cloudflare-c9bc79861a0e
image https://cdn-images-1.medium.com/max/2000/1*4zonAsqpNEPQwT8XGNIKQQ.png 14zonAsqpNEPQwT8XGNIKQQ.png
image https://cdn-images-1.medium.com/max/2000/1*6QfOFjGYVjI5JzaNh6Sf7Q.png 16QfOFjGYVjI5JzaNh6Sf7Q.png
image https://cdn-images-1.medium.com/max/2000/1*FftUAGHFGHTZLjDEXUpbGg.png 1FftUAGHFGHTZLjDEXUpbGg.png
image https://cdn-images-1.medium.com/max/2000/1*3oWKRejAS3HC3K_vu6j9uQ.png 13oWKRejAS3HC3K_vu6j9uQ.png
image https://cdn-images-1.medium.com/max/2000/1*yY558iqcpryitUVP2-oq-Q.png 1yY558iqcpryitUVP2-oq-Q.png
image https://cdn-images-1.medium.com/max/2360/1*-5FNUDRUkrGmfc896lVXzg.png 1-5FNUDRUkrGmfc896lVXzg.png
image https://cdn-images-1.medium.com/max/2000/1*xhHwY1Q6q27XLbxC1Vaowg.png 1xhHwY1Q6q27XLbxC1Vaowg.png
image https://cdn-images-1.medium.com/max/2236/1*F7IKa0s5r8zu47v7c08E2A.png 1F7IKa0s5r8zu47v7c08E2A.png
image https://cdn-images-1.medium.com/max/2104/1*VKNsP8qFpms0mx3YlcH3EQ.png 1VKNsP8qFpms0mx3YlcH3EQ.png
image https://cdn-images-1.medium.com/max/2082/1*gaGGKaWZx5_zmvac5w9UNg.png 1gaGGKaWZx5_zmvac5w9UNg.png
image https://cdn-images-1.medium.com/max/2000/1*AfNNmjD4lDP1ZNbZQVFnyw.png 1AfNNmjD4lDP1ZNbZQVFnyw.png
image https://cdn-images-1.medium.com/max/2000/1*26nHtpOvLVJo0PuCQLYaTQ.png 126nHtpOvLVJo0PuCQLYaTQ.png
image https://cdn-images-1.medium.com/max/2000/1*XRzEgDPazTjabVTjobPKNg.png 1XRzEgDPazTjabVTjobPKNg.png

I can add a PR with the module added in case it's helpful/for your review. I don't really write much nodejs at all so I dunno if what I wrote is terrible/buggy but it seemed to work when I tried it.
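
For illustration, here is a dependency-free sketch of the same idea. It is not the code from the hack described above, and `sanitizeFilename` is a hypothetical name. It replaces the characters Windows rejects in file names (`< > : " / \ | ? *` and ASCII control characters) with `_`, which mirrors what Chrome does when saving the image. Reserved device names such as `CON` and trailing dots or spaces are not handled.

```javascript
// Characters Windows rejects in file names, plus ASCII control characters.
const FORBIDDEN_CHARS = /[<>:"\/\\|?*\x00-\x1f]/g

// Hypothetical helper: make an image name safe to write on Windows.
function sanitizeFilename(name, replacement = '_') {
  return name.replace(FORBIDDEN_CHARS, replacement)
}

console.log(sanitizeFilename('1*4zonAsqpNEPQwT8XGNIKQQ.png'))
```

Whatever replacement is chosen, the same sanitized name has to be used both when writing the file and when emitting the image link in the Markdown, otherwise the links will point at files that do not exist.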

Thanks for your time! :)

No longer works on some medium URLs

It seems Medium is prefixing some responses with HTML even when JSON is requested. Below is the error I am getting in my console. The URL I am testing it on is: https://medium.com/samsung-internet-dev/making-an-ar-game-with-aframe-529e03ae90cb

SyntaxError: Unexpected token < in JSON at position 1539
    at JSON.parse (<anonymous>)
    at Request._callback (/rbd/pnpm-volume/3fedbde6-784a-4417-9672-afa42ea3708b/node_modules/.registry.npmjs.org/medium-to-md/1.1.3/node_modules/medium-to-md/utils.js:13:27)
    at Request.self.callback (/rbd/pnpm-volume/3fedbde6-784a-4417-9672-afa42ea3708b/node_modules/.registry.npmjs.org/request/2.88.2/node_modules/request/request.js:185:22)
    at Request.emit (events.js:180:13)
    at Request.<anonymous> (/rbd/pnpm-volume/3fedbde6-784a-4417-9672-afa42ea3708b/node_modules/.registry.npmjs.org/request/2.88.2/node_modules/request/request.js:1154:10)
    at Request.emit (events.js:180:13)
    at IncomingMessage.<anonymous> (/rbd/pnpm-volume/3fedbde6-784a-4417-9672-afa42ea3708b/node_modules/.registry.npmjs.org/request/2.88.2/node_modules/request/request.js:1076:12)
    at Object.onceWrapper (events.js:272:13)
    at IncomingMessage.emit (events.js:185:15)
    at endReadableNT (_stream_readable.js:1106:12)
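
The trace shows JSON.parse being called directly on the response body. A more defensive parse would at least turn this into a readable error. The sketch below is an assumption-laden illustration, not the library's code: `parseMediumJson` is a hypothetical helper, and it assumes that any non-JSON prefix (Medium's JSON responses have been seen to start with a guard string such as `])}while(1);</x>`) ends before the first `{`. It cannot recover a body that has HTML mixed into the JSON itself.

```javascript
// Hypothetical helper: parse a response body that may carry a non-JSON
// prefix, and fail with a descriptive message instead of a bare SyntaxError.
function parseMediumJson(body) {
  const start = typeof body === 'string' ? body.indexOf('{') : -1
  if (start === -1) {
    throw new Error('Response contains no JSON object (empty body or HTML page?)')
  }
  try {
    // Skip everything before the first '{' (assumed to be a guard prefix).
    return JSON.parse(body.slice(start))
  } catch (err) {
    const preview = body.slice(0, 80)
    throw new Error('Response is not valid JSON: ' + err.message + ' | starts with: ' + preview)
  }
}

console.log(parseMediumJson('])}while(1);</x>{"payload":{"value":1}}'))
```

An empty body is reported as "no JSON object" rather than "Unexpected end of JSON input", which makes it easier to tell a blocked or redirected request from a genuine parsing bug.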

Does not work on Ubuntu 20.04

Following this blog post, I ran the command below. It did not download the post.

mediumexporter https://medium.com/p/export-your-medium-posts-to-markdown-b5ccc8cb0050 > medium_post.md

SyntaxError: Unexpected end of JSON input

I'm getting this error


SyntaxError: Unexpected end of JSON input
    at Object.parse (native)
    at Request._callback (/usr/local/lib/node_modules/mediumexporter/utils.js:11:25)
    at Request.self.callback (/usr/local/lib/node_modules/mediumexporter/node_modules/request/request.js:186:22)
    at emitTwo (events.js:106:13)
    at Request.emit (events.js:191:7)
    at Request.<anonymous> (/usr/local/lib/node_modules/mediumexporter/node_modules/request/request.js:1081:10)
    at emitOne (events.js:96:13)
    at Request.emit (events.js:188:7)
    at IncomingMessage.<anonymous> (/usr/local/lib/node_modules/mediumexporter/node_modules/request/request.js:1001:12)
    at IncomingMessage.g (events.js:286:16)

Correctly convert code inline blocks

Great work! Loving this tool so far. Thanks for the hard work.

I'm using this to move my Medium technical articles to my Gatsby blog, but I see that some inline code is not being converted correctly.

For instance this:

[Screenshot of the paragraph as rendered on Medium]

Gets converted to:

Easy, huh? This is obviously framework agnostic as the element’s animation will trigger once it’s inserted into the DOM or it’s display property goes. 

I'd expect the text ...element's animation will... to be presented as ...element's `animation` will...

Extracted image paths don't appear to be correct?

Attempted using this example post.

The sample Markdown output from mediumexporter for the header image at the top of the post:

![Some various map tiles of the [MPG Ranch](http://mpgranch.com/) area, in Montana’s Bitterroot Valley.](https://medium2.global.ssl.fastly.net/max/2560/1*sNRRIAtOi6FLSNRC9HSynw.png)

Going to that URL I get this error: "Fastly error: unknown domain. Please check that this domain has been added to a service."

When viewing the image source in dev tools on the post in Medium the actual image path appears to be:

https://cdn-images-1.medium.com/max/1000/1*sNRRIAtOi6FLSNRC9HSynw.png

Every other image path in the post is also incorrect in the Markdown output.
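
As a stopgap, the exported Markdown can be patched after the fact. The sketch below only swaps the host that fails for the one observed in dev tools; `fixImageHost` is a hypothetical helper, and it is an untested assumption that the working host serves the same size segment (`max/2560` here, while the page itself used `max/1000`).

```javascript
const BROKEN_HOST = 'https://medium2.global.ssl.fastly.net/'
const WORKING_HOST = 'https://cdn-images-1.medium.com/'

// Hypothetical helper: rewrite every image URL on the broken host.
function fixImageHost(markdown) {
  // split/join replaces all occurrences without regex escaping concerns
  return markdown.split(BROKEN_HOST).join(WORKING_HOST)
}

console.log(fixImageHost('![x](https://medium2.global.ssl.fastly.net/max/2560/1*sNRRIAtOi6FLSNRC9HSynw.png)'))
```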

support Blogdown

This is a great project -- thank you for building it! Saved me a bunch of time! I'm using Blogdown for my site, and there was some manual effort that could be automated. This issue is some notes that someone (maybe me, if I have time) could use to make it easier to copy Medium posts to Blogdown format.

If --blogdown is provided on the command line:

  • instead of writing to stdout, save the result to a file with the pattern YYYY-MM-DD-<slug>.Rmd, where the slug does not include the hex string at the end of the Medium slug
  • add a "this post was first published" block at the top of the document
  • create a Blogdown header

Here's an example of a header. Note that the title, date, tags, slug, and URL for the "first published" block can all easily be generated automatically.

---
title: On How and When to Teach Layers of Abstraction in Programming
author: ''
date: '2017-10-05'
categories:
  - professional
tags:
  - programming
  - R
  - teaching
  - computer science
slug: on-how-and-when-to-teach-layers-of-abstraction-in-programming
---

_[This post was originally published on Medium](https://medium.com/@HarlanH/on-how-and-when-to-teach-layers-of-abstraction-in-programming-d220c4b5e5b9)_
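
The file name and header described above could be generated roughly as follows. This is a sketch of the proposal, not existing mediumexporter behavior: all three function names are hypothetical, and the rule that the trailing Medium id is a hyphen followed by at least eight hex characters is an assumption based on the example slug. Titles containing a colon would need YAML quoting, which is left out here.

```javascript
// Drop the trailing hex id that Medium appends to a slug,
// e.g. "-d220c4b5e5b9" (assumed: at least 8 hex characters).
function stripMediumId(mediumSlug) {
  return mediumSlug.replace(/-[0-9a-f]{8,}$/, '')
}

// Build the YYYY-MM-DD-<slug>.Rmd file name from an ISO date string.
function blogdownFilename(date, mediumSlug) {
  return date + '-' + stripMediumId(mediumSlug) + '.Rmd'
}

// Build a minimal Blogdown header from already-extracted post metadata.
function blogdownHeader({ title, date, tags, slug }) {
  const tagLines = tags.map((tag) => '  - ' + tag)
  return [
    '---',
    'title: ' + title,
    "author: ''",
    "date: '" + date + "'",
    'tags:',
    ...tagLines,
    'slug: ' + slug,
    '---'
  ].join('\n')
}

const mediumSlug = 'on-how-and-when-to-teach-layers-of-abstraction-in-programming-d220c4b5e5b9'
console.log(blogdownFilename('2017-10-05', mediumSlug))
```

A slug whose last word happens to consist only of hex characters (for example one ending in "-deadbeef") would be truncated by mistake, so a real implementation would probably take the id from the post metadata instead of guessing it from the slug.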

bug: numbered lists not outputted correctly

see https://repl.it/@almenon/mediumexporter

run node index.js https://medium.com/@almenon214/adding-telemetry-to-your-vscode-extension-f3d52d2e573c followed by cat content/adding-telemetry-to-your-vscode-extension/index.md

Expected result:

1. Despite having hundreds of downloads, the actual user count is much much lower. 5 people have used it so far with one person using it twice… not great statistics. Should pick up once I market AREPL at pycon.

2. The range of users is quite geographically diverse. You don’t just get people in California or America; there’s people from canada, italy, portugal, all sorts of places. I guess that is to be expected with internet marketing — people can see your extension from countries across the world.

Actual result:

1. Despite having hundreds of downloads, the actual user count is much much lower. 5 people have used it so far with one person using it twice… not great statistics. Should pick up once I market AREPL at pycon.

1. The range of users is quite geographically diverse. You don’t just get people in California or America; there’s people from canada, italy, portugal, all sorts of places. I guess that is to be expected with internet marketing — people can see your extension from countries across the world.

It should be 1. then 2., not 1. 1.
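
One way to get the expected numbering is to keep a counter across consecutive list items and reset it when ordinary text intervenes. The sketch below does this as a post-processing pass over the generated Markdown; `renumberOrderedItems` is a hypothetical helper, not mediumexporter code, and it treats any line starting with digits, a dot, and a space as a list item.

```javascript
// Hypothetical post-processing helper: renumber "1. 1. 1." into "1. 2. 3.".
function renumberOrderedItems(markdown) {
  let counter = 0
  return markdown
    .split('\n')
    .map((line) => {
      if (/^\d+\. /.test(line)) {
        counter += 1
        return line.replace(/^\d+\./, counter + '.')
      }
      // Blank lines keep the list going; any other text ends it.
      if (line.trim() !== '') counter = 0
      return line
    })
    .join('\n')
}

console.log(renumberOrderedItems('1. first\n\n1. second\n\nA paragraph.\n\n1. new list'))
```

Doing the same thing inside the converter, by counting list-item paragraphs as they are emitted, would avoid misreading a normal paragraph that happens to start with a number and a dot.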

Unknown markup type 10

My export resulted in a lot of errors like the following:

Unknown markup type 10 { type: 10, start: 317, end: 336 }
Unknown markup type 10 { type: 10, start: 199, end: 211 }
Unknown markup type 10 { type: 10, start: 118, end: 129 }
Unknown markup type 10 { type: 10, start: 89, end: 101 }
Unknown markup type 10 { type: 10, start: 111, end: 119 }
Unknown markup type 10 { type: 10, start: 165, end: 187 }
Unknown markup type 10 { type: 10, start: 256, end: 262 }
Unknown markup type 10 { type: 10, start: 261, end: 278 }
Unknown markup type 10 { type: 10, start: 176, end: 201 }

What does this mean?
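
A plausible reading is that the converter met a markup type it has no case for and left the text between `start` and `end` unformatted. The sketch below assumes, without confirmation from the source, that type 10 marks inline code; `applyCodeMarkups` and `CODE_MARKUP` are hypothetical names, and overlapping markups are not handled.

```javascript
// Assumption: markup type 10 denotes inline code.
const CODE_MARKUP = 10

// Hypothetical helper: wrap each code-markup range in backticks.
function applyCodeMarkups(text, markups) {
  const codeSpans = markups
    .filter((m) => m.type === CODE_MARKUP)
    // Work from the end of the text so earlier offsets stay valid.
    .sort((a, b) => b.start - a.start)

  let result = text
  for (const { start, end } of codeSpans) {
    result = result.slice(0, start) + '`' + result.slice(start, end) + '`' + result.slice(end)
  }
  return result
}

const text = 'the display property'
const start = text.indexOf('display')
console.log(applyCodeMarkups(text, [{ type: 10, start, end: start + 'display'.length }]))
```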

Why not release v1.0.0 on npm?

I added this repo as a submodule in my project as I needed to use the programmatic API and needed to fix #20. Why not release this version, at least as @next?

info param makes the process exit without any message

I tried to use the info parameter when using the programmatic API for getPost:

async function importer(link, dir, name) {
  return mediumexporter.getPost(link, {
    output: path.resolve(dir, name),
    hugo: true,
    frontmatter: true,
    info: true
  });
}

But these lines make the whole process exit without telling us why:
https://github.com/xdamman/mediumexporter/blob/master/lib/get-post.js#L52-L54

It took me some time to understand this was the issue.
IMO, this should be mentioned in README.

Error while exporting old medium stories

Hey, I'm getting these strange error messages when exporting some old Medium stories. Here's the stack trace:

/usr/local/lib/node_modules/mediumexporter/index.js:24
  var s = json.payload.value;
                      ^

TypeError: Cannot read property 'value' of undefined
    at /usr/local/lib/node_modules/mediumexporter/index.js:24:23
    at Request._callback (/usr/local/lib/node_modules/mediumexporter/utils.js:12:16)
    at Request.self.callback (/usr/local/lib/node_modules/mediumexporter/node_modules/request/request.js:198:22)
    at emitTwo (events.js:87:13)
    at Request.emit (events.js:172:7)
    at Request.<anonymous> (/usr/local/lib/node_modules/mediumexporter/node_modules/request/request.js:1035:10)
    at emitOne (events.js:82:20)
    at Request.emit (events.js:169:7)
    at IncomingMessage.<anonymous>

Do you know what the problem might be? If it's a bug, with a few pointers I could help you fix it (I'm a JS programmer myself).

Here are some stories where I get the problem:

https://medium.com/keep-learning-keep-growing/how-to-enjoy-reading-books-2d10d13905c7
https://blog.zenze.co/how-to-become-fearless-ffa74f881c4f
https://blog.zenze.co/are-you-unnecessarily-busy-b271cadb7b4e

Thanks!
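
The trace points at `json.payload.value` being read when `json.payload` is undefined. Why these URLs return a response without a payload is not clear from the report, but a guard would at least replace the TypeError with an actionable message. A minimal sketch, with `extractStory` as a hypothetical helper name:

```javascript
// Hypothetical guard around the `json.payload.value` access from the trace.
function extractStory(json, url) {
  if (!json || !json.payload || !json.payload.value) {
    throw new Error('No story payload in the response for ' + url)
  }
  return json.payload.value
}

try {
  extractStory({ success: false }, 'https://blog.zenze.co/how-to-become-fearless-ffa74f881c4f')
} catch (err) {
  console.log(err.message)
}
```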

wrong parsing | missing first image

I had to export hundreds of Medium posts for a project and noticed a recurring problem: the first image of a post is not captured by mediumexporter.

to reproduce:

$ mediumexporter https://medium.com/@AreYouSyrious/ays-news-digest-08-02-17-uk-reneges-on-the-dubs-agreement-destroys-hopes-of-refugee-children-97cc0ca96fe0 > output.md

markdown output:

# AYS NEWS DIGEST 08.02.17 — UK reneges on the Dubs Agreement, destroys hopes of refugee children

People camp outdoors in Paris. Photo Credit: Calais Action

### Feature
...

The algorithm misses the first image, but I'm not sure how to fix the parser.
