Comments (18)
<3 thanks @beeman and @vivekatro
from unfurl.
Thanks @jacktuck for fixing it quickly!
from unfurl.
Should be fixed thanks to @jacktuck !
from unfurl.
https://unfurl.now.sh/?url=https://www.rakuten.co.jp breaks: I get a timeout error, but it used to work with 1.1.6.
from unfurl.
@vivekatro Good find! I had noticed this before on a similar domain. Would you like to make a pull request :) ?
from unfurl.
A pretty simple way to fix this: use encodeURIComponent() when collecting strings and decodeURIComponent() when displaying them back. This should preserve Japanese characters.
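A minimal sketch of the round trip suggested above, using only the built-in encodeURIComponent() and decodeURIComponent(). The sample string is made up for illustration. Note that this only protects strings as they travel through URLs; it does not change how a server's response body is decoded.

```javascript
// Illustrative string containing Japanese characters (not taken from the issue).
const original = "楽天市場 (Rakuten)";

// Percent-encode when collecting the string, e.g. before placing it in a query string.
const encoded = encodeURIComponent(original);

// Decode when displaying it back.
const decoded = decodeURIComponent(encoded);

console.log(encoded);              // ASCII-only, percent-encoded form
console.log(decoded === original); // true: the round trip is lossless
```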
from unfurl.
@vivekatro I think the issue is over at micro-unfurl; see my PR beeman/micro-unfurl#3.
from unfurl.
Issue fixed by @jacktuck and deployed :)
from unfurl.
@vivekatro Turns out this was also a bug in unfurl.js: we didn't handle servers responding with multibyte encodings. I noticed this when http://qq.com returned its charset as GB2312.
Other encodings are:
Japanese: Shift_JIS, Windows-31j, Windows932, EUC-JP
Chinese: GB2312, GBK, GB18030, Windows936, EUC-CN
Korean: KS_C_5601, Windows949, EUC-KR
Taiwan/Hong Kong: Big5, Big5-HKSCS, Windows950
The PR has been merged, and this should now be fixed in unfurl.js #31.
There's also an open PR, beeman/micro-unfurl#4, which bumps micro-unfurl to use this release.
There is also a tagged release on npm, 1.1.7, which you can install with npm install unfurl.js@1.1.7 or npm install unfurl.js@beta.
Leaving this open for now until I do a release under the latest tag.
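For anyone curious what handling the encodings listed above involves, here is a rough sketch and not necessarily how unfurl.js does it. The helper names charsetFromContentType and decodeBody are hypothetical. It assumes the charset is declared in the Content-Type header and that the runtime's TextDecoder knows the label (support for legacy encodings such as GB2312 or Shift_JIS depends on how Node was built).

```javascript
// Hypothetical helper: pull the charset label out of a Content-Type header value.
function charsetFromContentType(contentType) {
  const match = /charset\s*=\s*"?([^\s;"]+)/i.exec(contentType || "");
  return match ? match[1].toLowerCase() : null;
}

// Hypothetical helper: decode raw bytes with the declared charset,
// falling back to UTF-8 when the label is missing or unsupported.
function decodeBody(bytes, contentType) {
  const label = charsetFromContentType(contentType) || "utf-8";
  try {
    return new TextDecoder(label).decode(bytes);
  } catch (err) {
    // TextDecoder throws a RangeError for labels it does not recognise.
    if (!(err instanceof RangeError)) throw err;
    return new TextDecoder("utf-8").decode(bytes);
  }
}

console.log(charsetFromContentType("text/html; charset=GB2312")); // "gb2312"
console.log(charsetFromContentType("text/html"));                 // null
```

A body would then be decoded with something like decodeBody(new Uint8Array(buffer), response.headers.get("content-type")), where buffer and response are whatever your HTTP client returns.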
from unfurl.
I need to fix the benchmarks at some point too and see how this fix impacts performance.
from unfurl.
It seems that in the new version the structure of the response has changed. With 1.1.6, for https://www.yahoo.co.jp I get the following response:
{
  "other" : {
    "description" : "日本最大級のポータルサイト。検索、オークション、ニュース、天気、スポーツ、メール、ショッピングなど多数のサービスを展開。あなたの生活をより豊かにする「課題解決エンジン」を目指していきます。",
    "robots" : "noodp",
    "googleSiteVerification" : "fsLMOiigp5fIpCDMEVodQnQC7jIY1K3UXW5QkQcBmVs",
    "alternate" : "https://m.yahoo.co.jp/",
    "canonical" : "https://www.yahoo.co.jp/",
    "fbAppId" : "472870002762883",
    "title" : "Yahoo! JAPAN\n",
    "stylesheet" : "//s.yimg.jp/images/top/sp2/clr/180312/1.css"
  },
  "ogp" : {
    "ogTitle" : "Yahoo! JAPAN",
    "ogType" : "article",
    "ogUrl" : "https://www.yahoo.co.jp/",
    "ogImage" : [ {
      "url" : "https://s.yimg.jp/images/top/ogp/fb_y_1500px.png"
    } ],
    "ogDescription" : "日本最大級のポータルサイト。検索、オークション、ニュース、天気、スポーツ、メール、ショッピングなど多数のサービスを展開。あなたの生活をより豊かにする「課題解決エンジン」を目指していきます。",
    "ogSiteName" : "Yahoo! JAPAN"
  },
  "twitter" : {
    "twitterCard" : "summary_large_image",
    "twitterSite" : "@Yahoo_JAPAN_PR",
    "twitterTitle" : "Yahoo! JAPAN",
    "twitterDescription" : "日本最大級のポータルサイト。検索、オークション、ニュース、天気、スポーツ、メール、ショッピングなど多数のサービスを展開。あなたの生活をより豊かにする「課題解決エンジン」を目指していきます。",
    "twitterImage" : [ {
      "url" : "https://s.yimg.jp/images/top/ogp/tw_y_1400px.png"
    } ]
  }
}
But with the latest version, the twitter and ogp sections are missing.
from unfurl.
@vivekatro Thanks for this, I'll take a look this evening.
from unfurl.
@vivekatro The default timeout is 2000ms and this site takes 8000ms for me. You can pass your own timeout, though. For example, you can set a timeout of 20 seconds like so: unfurl(url, { timeout: 20 * 1000 }).
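To illustrate why a 2000ms limit fails against a server that needs 8000ms, here is a small self-contained sketch of a timeout wrapper. Both withTimeout and slowResponse are hypothetical names for illustration, not part of unfurl.js, and the delays are shortened so the demo runs quickly. Unlike a real HTTP client timeout, this sketch only stops waiting; it does not abort the underlying work.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Stand-in for a server that takes `delayMs` to respond.
const slowResponse = (delayMs) =>
  new Promise((resolve) => setTimeout(() => resolve("ok"), delayMs));

// The limit is shorter than the response time: this rejects.
withTimeout(slowResponse(80), 20).catch((err) => console.log(err.message));

// The limit is generous enough: this resolves.
withTimeout(slowResponse(80), 500).then((value) => console.log(value));
```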
from unfurl.
Mm, actually 1.1.6 used request rather than node-fetch, so it had a higher timeout. I think I'll either make a change to match the old timeout or just increase the default to something sensible. Good find.
From request: timeout - integer containing the number of milliseconds to wait for a server to send response headers (and start the response body) before aborting the request. Note that if the underlying TCP connection cannot be established, the OS-wide TCP connection timeout will overrule the timeout option (the default in Linux can be anywhere from 20-120 seconds).
from unfurl.
The timeout should default to the OS limit again as of version 1.1.8-beta.2.
from unfurl.
Were you able to look into the yahoo.co.jp unfurl missing the ogp and twitter sections in the new version?
from unfurl.
@vivekatro Not yet. I need to write a test demonstrating what's missing, to make sure it doesn't regress again.
from unfurl.
@vivekatro Good spot on yahoo.co.jp. Just found the issue: the User-Agent was not being set properly in the new release. It just so happens that Yahoo will not send you meta tags unless you are a bot, which is why setting the User-Agent to Facebook's (the default user agent) works.
I've fixed it in the newest prerelease of [email protected] (see README first). Let me know if you'd like me to patch 1.6.x too; it's trivial to do so. :)
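As a rough illustration of the fix described above, the sketch below builds request options that always carry a User-Agent header. The helper name buildFetchOptions is hypothetical, and the User-Agent string is an assumption (a commonly cited value for Facebook's crawler), not something confirmed from the unfurl.js source.

```javascript
// Assumption: a commonly cited Facebook crawler User-Agent string.
const FACEBOOK_UA =
  "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)";

// Hypothetical helper: build fetch-style options that always include a User-Agent.
// The User-Agent is spread last so extra headers cannot silently drop it.
function buildFetchOptions(userAgent = FACEBOOK_UA, extraHeaders = {}) {
  return { headers: { ...extraHeaders, "User-Agent": userAgent } };
}

console.log(buildFetchOptions().headers["User-Agent"] === FACEBOOK_UA); // true
```

The result can be passed as the second argument to a fetch-compatible client, for example fetch(url, buildFetchOptions()).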
from unfurl.