Comments (2)
https://trackchanges.postlight.com/were-bullish-on-amp-abfc6e1f10a1#.cab8vkict was also a good read. Curious if we can find a few good pages with AMP support for testing/scraping.
from page-metadata-parser.
I think this is a related use case. This one is a stretch, but mildly interesting.
Looking at https://vimeo.com/180763356, we have the following meta tags (with uninteresting <meta> tags removed):
$ meta-scraper -u "https://vimeo.com/180763356"
...
<link rel="apple-touch-icon-precomposed" href="https://i.vimeocdn.com/favicon/main-touch_180">
<link rel="canonical" href="/180763356">
<link rel="logo" type="image/svg" href="https://f.vimeocdn.com/logo.svg">
...
<link rel="shortcut icon" href="https://f.vimeocdn.com/images_v6/favicon.ico" data-play="https://i.vimeocdn.com/favicon/play_32" data-pause="https://i.vimeocdn.com/favicon/pause_32">
...
<meta charset="utf-8">
<meta name="description" content="There's no telling how many guns we have in America—and when one gets used in a crime, no way for the cops to connect it to its owner. The only place…">
<meta name="msapplication-TileColor" content="#00adef">
<meta name="msapplication-TileImage" content="https://i.vimeocdn.com/favicon/main-touch_144">
...
<meta name="twitter:card" content="player">
<meta name="twitter:description" content="There's no telling how many guns we have in America—and when one gets used in a crime, no way for the cops to connect it to its owner. The only place…">
<meta name="twitter:image" content="https://i.vimeocdn.com/video/589150572_1280x720.jpg">
<meta name="twitter:player" content="https://player.vimeo.com/video/180763356">
<meta name="twitter:player:height" content="720">
<meta name="twitter:player:width" content="1280">
<meta name="twitter:site" content="@vimeo">
<meta name="twitter:title" content="The Tracers - An inside look at the Real-Life Database of America's Firearms.">
...
<meta property="og:description" content="There's no telling how many guns we have in America—and when one gets used in a crime, no way for the cops to connect it to its owner. The only place…">
<meta property="og:image" content="https://i.vimeocdn.com/video/589150572_1280x720.jpg">
<meta property="og:image:height" content="720">
<meta property="og:image:secure_url" content="https://i.vimeocdn.com/video/589150572_1280x720.jpg">
<meta property="og:image:type" content="image/jpg">
<meta property="og:image:width" content="1280">
<meta property="og:site_name" content="Vimeo">
<meta property="og:title" content="The Tracers - An inside look at the Real-Life Database of America's Firearms.">
<meta property="og:type" content="video">
<meta property="og:updated_time" content="2016-09-01T19:56:39-04:00">
<meta property="og:url" content="https://vimeo.com/180763356">
<meta property="og:video:height" content="720">
<meta property="og:video:height" content="720">
<meta property="og:video:secure_url" content="https://player.vimeo.com/video/180763356?autoplay=1">
<meta property="og:video:secure_url" content="https://vimeo.com/moogaloop.swf?clip_id=180763356&autoplay=1">
<meta property="og:video:type" content="application/x-shockwave-flash">
<meta property="og:video:type" content="text/html">
<meta property="og:video:url" content="https://player.vimeo.com/video/180763356?autoplay=1">
<meta property="og:video:url" content="https://vimeo.com/moogaloop.swf?clip_id=180763356&autoplay=1">
<meta property="og:video:width" content="1280">
<meta property="og:video:width" content="1280">
<meta property="video:tag" content="ATF">
<meta property="video:tag" content="Firearms">
<meta property="video:tag" content="GQ">
<meta property="video:tag" content="Guns">
<meta property="video:tag" content="National Tracing Center">
<meta property="video:tag" content="The Tracers">
<title>The Tracers - An inside look at the Real-Life Database of America's Firearms. on Vimeo</title>
Plus this <script type="application/ld+json"> tag:
<script type="application/ld+json">
[
  {
    "url": "https://vimeo.com/180763356",
    "thumbnailUrl": "https://i.vimeocdn.com/video/589150572_1280x720.webp",
    "embedUrl": "https://player.vimeo.com/video/180763356",
    "name": "The Tracers - An inside look at the Real-Life Database of America's Firearms.",
    "description": "There's no telling how many guns we have in America—and when one gets used in a crime, no way for the cops to connect it to its owner. The only place…",
    "height": 1080,
    "width": 1920,
    "playerType": "HTML5 Flash",
    "videoQuality": "HD",
    "duration": "PT00H07M26S",
    "uploadDate": "2016-08-30T12:52:25-04:00",
    "thumbnail": {
      "@type": "ImageObject",
      "url": "https://i.vimeocdn.com/video/589150572_1280x720.webp",
      "width": 1280,
      "height": 720
    },
    "author": {
      "@type": "Person",
      "name": "Steven Brahms",
      "url": "https://vimeo.com/stevenbrahms"
    },
    "potentialAction": {
      "@type": "ViewAction",
      "target": "vimeo://app.vimeo.com/videos/180763356"
    },
    "interactionCount": 13055,
    "keywords": "[GQ,The Tracers,National Tracing Center,Firearms,ATF,Guns]",
    "@type": "VideoObject",
    "@context": "http://schema.org"
  },
  {
    "itemListElement": [
      {
        "@type": "ListItem",
        "position": 1,
        "item": {
          "@id": "https://vimeo.com/stevenbrahms",
          "name": "Steven Brahms"
        }
      },
      {
        "@type": "ListItem",
        "position": 2,
        "item": {
          "@id": "https://vimeo.com/stevenbrahms/videos",
          "name": "Videos"
        }
      }
    ],
    "@type": "BreadcrumbList",
    "@context": "http://schema.org"
  }
]
</script>
So our current parser cannot get any keywords unless we can parse the AMP <script type="application/ld+json"> block, or start parsing+merging multiple <meta property="video:tag" content=".."> tags (in addition to article:tag and book:tag tags, per http://ogp.me/).
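To make the two options concrete, here is a rough sketch of what collecting keywords from both sources might look like. It is written in standard-library Python purely for illustration; it is not page-metadata-parser's API or rule format, and the names `KeywordCollector`, `jsonld_keywords`, and `extract_keywords` are made up for this example. It assumes the JSON-LD `keywords` value can be either a real array or, as on the Vimeo page, a single bracketed, comma-separated string.

```python
import json
from html.parser import HTMLParser

# Open Graph properties that may repeat, one tag per <meta> element.
# video:tag is what Vimeo emits; article:tag and book:tag come from ogp.me.
TAG_PROPERTIES = {"video:tag", "article:tag", "book:tag"}


class KeywordCollector(HTMLParser):
    """Hypothetical helper: gathers *:tag metas and raw JSON-LD script bodies."""

    def __init__(self):
        super().__init__()
        self.meta_tags = []      # content of each matching <meta property="...">
        self.jsonld_blocks = []  # raw text of each application/ld+json script
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") in TAG_PROPERTIES:
            content = (attrs.get("content") or "").strip()
            if content:
                self.meta_tags.append(content)
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_jsonld = True
            self.jsonld_blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the open block.
        if self._in_jsonld:
            self.jsonld_blocks[-1] += data


def jsonld_keywords(raw):
    """Return the keywords found in one JSON-LD block, or [] if it is unusable."""
    try:
        data = json.loads(raw)
    except ValueError:
        return []
    items = data if isinstance(data, list) else [data]
    found = []
    for item in items:
        if not isinstance(item, dict):
            continue
        keywords = item.get("keywords")
        if isinstance(keywords, str):
            # Assumption: a string like "[GQ,The Tracers,...]" is a flattened list.
            keywords = keywords.strip().strip("[]").split(",")
        if isinstance(keywords, list):
            found.extend(str(k).strip() for k in keywords if str(k).strip())
    return found


def extract_keywords(html):
    """Merge meta-tag keywords with JSON-LD keywords, keeping first-seen order."""
    collector = KeywordCollector()
    collector.feed(html)
    collector.close()
    merged = list(collector.meta_tags)
    for raw in collector.jsonld_blocks:
        merged.extend(jsonld_keywords(raw))
    return list(dict.fromkeys(merged))  # de-duplicate, order preserved


if __name__ == "__main__":
    sample = """
    <meta property="video:tag" content="ATF">
    <meta property="video:tag" content="Firearms">
    <script type="application/ld+json">
    [{"keywords": "[GQ,The Tracers,ATF]", "@type": "VideoObject"}]
    </script>
    """
    print(extract_keywords(sample))
```

Merging both sources and de-duplicating means a page only needs one of them to yield keywords, and a page like the Vimeo one (which repeats the same tags in both places) does not produce doubles. Splitting the bracketed string on commas would mangle any keyword that itself contains a comma, so that part is a guess about Vimeo's format, not a general rule.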