
wikipedia's People

Contributors

0xflotus, bigmistqke, bumbummen99, dopecodez, friendofdog, github-actions[bot], greeshmareji, gtibrett, yg-i, zoetrope69


wikipedia's Issues

Error using page() to get infobox()

Do you have any thoughts on what "Invalid attempt to destructure non-iterable instance" is referring to in this context?

/PROJECTS/research/node_modules/wikipedia/dist/page.js:256
                throw new errors_1.infoboxError(error);
                      ^

infoboxError: infoboxError: TypeError: Invalid attempt to destructure non-iterable instance
    at Page.infobox (/PROJECTS/research/node_modules/wikipedia/dist/page.js:256:23)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  code: undefined
}

Node.js v18.16.0

The code:

const wiki = require('wikipedia')
let page, infobox
async function getPage(input) {
  try {
    page = await wiki.page(input)
    infobox = await page.infobox()
    console.log(infobox)
  } catch (error) {
    console.log(error)
  }
  return infobox
}

getPage('John M. Vining')

How do I pass page url instead of page text?

There are no clear instructions on page input parameters; there is just a single example with the text input 'Batman'.
How do I pass a page URL as the input instead?

As an example: use page('Oliver Ellsworth') and then retrieve the infobox().
Then do the same with page('1st United States Congress').
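The library takes page titles, not URLs, so one workaround is to strip the title out of the URL before calling the library. This is a sketch, not part of the package's documented API; `titleFromUrl` is a hypothetical helper:

```javascript
// Sketch: derive a page title from a standard /wiki/<title> Wikipedia URL.
// titleFromUrl is a hypothetical helper, not part of the wikipedia package.
function titleFromUrl(url) {
  const path = new URL(url).pathname;        // "/wiki/Oliver_Ellsworth"
  const raw = path.replace(/^\/wiki\//, ''); // "Oliver_Ellsworth"
  return decodeURIComponent(raw).replace(/_/g, ' ');
}

// With the library, usage would then look like:
//   const page = await wiki.page(titleFromUrl('https://en.wikipedia.org/wiki/Oliver_Ellsworth'));
//   const infobox = await page.infobox();
console.log(titleFromUrl('https://en.wikipedia.org/wiki/1st_United_States_Congress'));
// → 1st United States Congress
```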

Implement mobile html

There are a lot of new REST APIs for wikipedia present in the REST API docs.

We'll look through them one by one and implement them. Anyone who wants to pick up any other REST API should ideally open a new issue.

The REST API has a /page/mobile-html/{title} endpoint which provides mobile-friendly HTML.

Implementation for this should follow the summary or related method flow. Remember to write unit tests for all possible scenarios in your new functions, and try to use types as far as possible.

I'll be happy to help anyone who wants to pick this up.

Implement mobile sections

There are a lot of new REST APIs for wikipedia present in the REST API docs.

We'll look through them one by one and implement them. Anyone who wants to pick up any other REST API should ideally open a new issue.

The REST API has a /page/mobile-sections/{title} endpoint which provides the page as mobile-friendly HTML sections.

Implementation for this can follow #17. Remember to write unit tests for all possible scenarios in your new functions, and try to use types as far as possible.

Implement pdf api

There are a lot of new REST APIs for wikipedia present in the REST API docs.

We'll look through them one by one and implement them. Anyone who wants to pick up any other REST API should ideally open a new issue.

The REST API has a /page/pdf/{title} endpoint which provides the page in PDF format.

The API returns a file for direct download, so my initial thought is that we'll have to stream the data to actually get the file to the user.

Any discussion on this is welcome.

When I attempt to make multiple requests at once (parallel or sequential) I get a lot of pageErrors, even on valid items

When I do these one at a time they all resolve with a page; when I do more than ten at a time they start to throw pageErrors. I made a little code sample that illustrates the issue:

const wiki = require('wikipedia');

const subjects = [ "University of Washington", "USC Gould School of Law", "Watergate", "Supreme Court", "Justice Clarence Thomas", "Harlan Crow", "resignation", "impeachment", "public trust", "code of ethics", "University of Washington", "USC Gould School of Law", "Watergate", "Supreme Court", "Justice Clarence Thomas", "Harlan Crow", "resignation", "impeachment", "public trust", "code of ethics" ];

async function GetWikiSummary(subject) {
    let result = {};

    try {
        result.subject = subject;
        const page = await wiki.page(subject);
        result.canonicalurl = page.canonicalurl;
    } catch (error) {
        result.error = error;
    }

    return result;
}

async function getWikiSummaries(subjects) {
    const results = [];
  
    for (const subject of subjects) {
      try {
        const summary = await GetWikiSummary(subject);
        results.push(summary);
      } catch (error) {
        results.push({ subject });
      }
    }
  
    return results;
}

console.log("Starting");

(async () => {
    const converted = await getWikiSummaries(subjects);
    //const converted = await GetWikiSummary('impeachment');
    console.log(JSON.stringify(converted, null, 2));
})();

console.log("Done");

The list is deliberately made of duplicated items to show that the first 10 resolve while the last 10 (even though they are the same) throw page errors. If I have a list of 20 items, what is the recommended way to get all 20?

the result of the above code looks like this:

Done
[
  {
    "subject": "University of Washington",
    "canonicalurl": "https://en.wikipedia.org/wiki/University_of_Washington"
  },
  {
    "subject": "USC Gould School of Law",
    "canonicalurl": "https://en.wikipedia.org/wiki/USC_Gould_School_of_Law"
  },
  {
    "subject": "Watergate",
    "canonicalurl": "https://en.wikipedia.org/wiki/Watergate_scandal"
  },
  {
    "subject": "Supreme Court",
    "canonicalurl": "https://en.wikipedia.org/wiki/Supreme_court"
  },
  {
    "subject": "Justice Clarence Thomas",
    "canonicalurl": "https://en.wikipedia.org/wiki/Clarence_Thomas"
  },
  {
    "subject": "Harlan Crow",
    "canonicalurl": "https://en.wikipedia.org/wiki/Harlan_Crow"
  },
  {
    "subject": "resignation",
    "canonicalurl": "https://en.wikipedia.org/wiki/Resignation"
  },
  {
    "subject": "impeachment",
    "canonicalurl": "https://en.wikipedia.org/wiki/Impeachment"
  },
  {
    "subject": "public trust",
    "canonicalurl": "https://en.wikipedia.org/wiki/Public_trust"
  },
  {
    "subject": "code of ethics",
    "canonicalurl": "https://en.wikipedia.org/wiki/Ethical_code"
  },
  {
    "subject": "University of Washington",
    "error": {
      "name": "pageError"
    }
  },
  {
    "subject": "USC Gould School of Law",
    "error": {
      "name": "pageError"
    }
  },
  {
    "subject": "Watergate",
    "error": {
      "name": "pageError"
    }
  },
  {
    "subject": "Supreme Court",
    "error": {
      "name": "pageError"
    }
  },
  {
    "subject": "Justice Clarence Thomas",
    "error": {
      "name": "pageError"
    }
  },
  {
    "subject": "Harlan Crow",
    "error": {
      "name": "pageError"
    }
  },
  {
    "subject": "resignation",
    "error": {
      "name": "pageError"
    }
  },
  {
    "subject": "impeachment",
    "error": {
      "name": "pageError"
    }
  },
  {
    "subject": "public trust",
    "canonicalurl": "https://en.wikipedia.org/wiki/Public_trust"
  },
  {
    "subject": "code of ethics",
    "error": {
      "name": "pageError"
    }
  }
]

Thanks for the help!
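If the root cause is hitting the API too quickly, one workaround (a sketch, not an official recommendation from the library) is to throttle the list into small batches with a pause between them:

```javascript
// Sketch: run an async fn over items in batches of `size`, pausing between
// batches. The batch size and pause length are guesses, tune to taste.
async function inBatches(items, fn, size = 5, pauseMs = 1000) {
  const results = [];
  for (let i = 0; i < items.length; i += size) {
    const batch = items.slice(i, i + size).map(fn);
    results.push(...(await Promise.all(batch)));
    if (i + size < items.length) {
      // Back off before starting the next batch.
      await new Promise((r) => setTimeout(r, pauseMs));
    }
  }
  return results;
}

// Usage with the snippet above:
//   const converted = await inBatches(subjects, GetWikiSummary, 5, 1000);
```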

Implement media list REST API

There are a lot of new REST APIs for wikipedia present in the REST API docs.

We'll look through them one by one and implement them. Anyone who wants to pick up any other REST API should ideally open a new issue.

The REST API has a /page/media-list/{title} endpoint which lists the media files used in the page. This is something I would love to have in wikipedia.

Implementation for this should follow the summary or related method flow. Remember to write unit tests for all possible scenarios in your new functions, and try to use types as far as possible.

I'll be happy to help anyone who wants to pick this up.

geoSearchError: wikiError: TypeError: url_1.URLSearchParams is not a constructor

I'm currently working on a Vue application that has the following method

    async getLocations() {
      this.pages = []
      try {
        const geoResult = await wiki.geoSearch(2.088, 4.023, {
          radius: 5000,
          limit: 20,
        })
        console.log(geoResult[0]) // the closest page to given coordinates
      } catch (error) {
        console.log(error)
      }
    }

Unfortunately it returns this exception:

geoSearchError: wikiError: TypeError: url_1.URLSearchParams is not a constructor
    at AsyncFunction.wiki.geoSearch (webpack-internal:///./node_modules/wikipedia/dist/index.js:469:15)    
wiki.geoSearch = async (latitude, longitude, geoOptions) => {
    try {
        const geoSearchParams = {
            'list': 'geosearch',
            'gsradius': (geoOptions === null || geoOptions === void 0 ? void 0 : geoOptions.radius) || 1000,
            'gscoord': `${latitude}|${longitude}`,
            'gslimit': (geoOptions === null || geoOptions === void 0 ? void 0 : geoOptions.limit) || 10,
            'gsprop': 'type'
        };
        const results = await request_1.default(geoSearchParams);
        const searchPages = results.query.geosearch;
        return searchPages;
    }
    catch (error) {
        throw new errors_1.geoSearchError(error);
    }
};
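One possible explanation, assuming a webpack 5 build (which stopped auto-polyfilling Node core modules): the library's compiled code imports URLSearchParams from Node's 'url' module, which doesn't exist in the browser bundle. A sketch of a workaround is to point webpack at the 'url' polyfill package (which must be installed separately):

```javascript
// webpack.config.js (sketch, assumes webpack 5 and the 'url' npm package)
module.exports = {
  resolve: {
    fallback: {
      // Provide a browser polyfill for Node's 'url' core module.
      url: require.resolve('url/'),
    },
  },
};
```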

Error with proxy

How can I use it with a proxy? I get this error:

searchError: wikiError: FetchError: request to https://en.wikipedia.org/w/api.php?list=search&srprop=&srlimit=3&srsearch=Who%20is%20Harry%20Potter?&srinfo=suggestion&format=json&redirects=&action=query&origin=*& failed, reason: connect ECONNREFUSED 185.15.58.224:443
    at AsyncFunction.wiki.search (D:\developpement\Nodejs\wikipedia\node_modules\wikipedia\dist\index.js:55:15)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async D:\developpement\Nodejs\wikipedia\index.js:5:31 {
  code: undefined
}

Incorrect data parsed from Infobox

https://en.wikipedia.org/wiki/All_Around_the_World_(Lisa_Stansfield_song)

const page = await wiki.page(pageTitle);
return page.infobox({ redirect: false });

returns

{
  name: 'All Around the World',
  cover: 'Lisa Stansfield - All Around the World.jpg',
  border: true,
  caption: 'Artwork for releases outside North America',
  type: 'Singles',
  artist: '2003',
  album: 'Affection (Lisa Stansfield album)',
  bSide: '"Wake Up Baby" (7"),"The Way You Want It" (12")',
  released: '16 October 1989',
  recorded: '1989',
  length: 'Duration',
  label: 'Arista Records',
  writer: [ 'Lisa Stansfield', 'Ian Devaney', 'Andy Morris' ],
  producer: [ 'Ian Devaney', 'Andy Morris' ],
  prevTitle: '8-3-1',
  prevYear: '2001',
  nextTitle: 'Too Hot (Kool & the Gang song)',
  nextYear: 'External music video',
  misc: 'Extra chronology',
  title: 'All Around the World (Norty Cotto Mixes)',
  year: '2003'
}

artist: '2003' is off; the values appear shifted relative to their keys.

CORS error fetching summary in Firefox

I created a vue app where I want to show info to a specific location.
This is my code for fetching the summary:
const summary = await wiki.summary(pageName);

In chrome everything works perfectly, but in Firefox I'm getting this error:

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://de.wikipedia.org/api/rest_v1/page/summary/Stuttgart. (Reason: header ‘user-agent’ is not allowed according to header ‘Access-Control-Allow-Headers’ from CORS preflight response).

Failing coverage check on forked Pull Requests

As seen on #17, #12, #11 and any other forked PRs to master, the coverage check fails.

The check fails because CC_TEST_REPORTER_ID, which is used to upload test reports to Code Climate, is not available to forks. The discussion at https://github.community/t/make-secrets-available-to-builds-of-forks/16166 is inconclusive, meaning we have to find our own solution or remove the check completely.

Possible solutions include:

  1. Find a way to make CC_TEST_REPORTER_ID available to forks, following the link above.
  2. Remove the check from PRs. This will involve playing around with the main.yaml GitHub Actions file to get it just right.
  3. (Non-ideal) Make CC_TEST_REPORTER_ID public. This is something we really shouldn't do, as people reusing parts of the code might end up using this secret.

Other languages not fully working

Getting results in other languages has problems. Grabbing a page works, but things like summaries and On This Day are not working. It seems like the wrong REST URL is being built.
For example, for the Swedish site the summary request should go to sv.wikipedia.org/api/rest_v1/page/summary/Stockholm,
but it tries to use sv.wikipedia.org/v1/page/page/summary/Stockholm instead.

const wiki = require('wikipedia');
 
(async () => {
    try {
        const changedLang = await wiki.setLang('sv');
        const page = await wiki.page('Stockholm'); // Works
        const summary = await wiki.summary('Stockholm'); // Fails
        console.log(page, summary);
    } catch (error) {
        console.log(error);
    }
})();

Implement generate citation data

There are a lot of new REST APIs for wikipedia present in the REST API docs.

We'll look through them one by one and implement them. Anyone who wants to pick up any other REST API should ideally open a new issue.

The REST API has a /data/citation/{format}/{query} endpoint which provides citation data for a given URL.

Implementation for this can follow #17. Remember to write unit tests for all possible scenarios in your new functions, and try to use types as far as possible.

Implement random page API

There are a lot of new REST APIs for wikipedia present in the REST API docs.

We'll look through them one by one and implement them. Anyone who wants to pick up any other REST API should ideally open a new issue.

The REST API has a /page/random/{format} endpoint which gives a random page in the given format. This is something I would love to have in wikipedia. More details are in the REST API docs.

Implementation for this should follow the summary or related method flow. Remember to write unit tests for all possible scenarios in your new functions, and try to use types as far as possible.

I'll be happy to help anyone who wants to pick this up.

Dropping support for node 10, 12

We are planning to drop support for node versions:

10.x.x
12.x.x

Our minimum supported version will be node 14.x.x.

It would be great to hear from the community if dropping these versions would cause any issues.

Clarity on browser support

The README claims that the package can be used in browsers, but I couldn't find any documentation on it.

Is this actually feasible, and if so, how?

Using `instanceof` to detect a `pageError`

I want to detect when a page doesn't exist, so I'm catching exceptions and trying to work out whether the exception is a pageError.

So I'm trying to use:

      if (wikiError instanceof pageError) {

It works if I import the class using:

import { pageError } from "wikipedia/dist/errors";

But if I use the barrelled main export from the d.ts, like so:

import { pageError } from "wikipedia";

It fails with

Right-hand side of 'instanceof' is not an object

Obviously I don't want to rely on digging into the dist folder; any ideas?
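One workaround that avoids importing from dist, assuming the error's `name` property is set to 'pageError' (as the stringified errors elsewhere in this thread suggest), is to match on the name instead of the class:

```javascript
// Sketch: detect a pageError by its `name` property rather than instanceof,
// so no import from wikipedia/dist is needed. The 'pageError' name is an
// assumption based on the serialized errors seen in this issue tracker.
function isPageError(err) {
  return err instanceof Error && err.name === 'pageError';
}

// try { await wiki.page(title); } catch (e) { if (isPageError(e)) { /* ... */ } }
```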

Move from node-fetch to got,ky or axios

Issue

Updating node-fetch is a headache because the module uses module formats that are not supported by modern TypeScript compilers and Jest, both of which are very widely used. Additionally, node-fetch is a few megabytes larger, as shown in https://www.npmjs.com/package/got#comparison, and is less actively maintained.

Solution

Analyze the other major HTTP modules like got, ky, or axios and transition wikipedia to one of these libraries instead of node-fetch.
My initial preference is got because of its small size and active maintenance, but other suggestions are welcome.

Implement featured content api

There are a lot of new REST APIs for wikipedia present in the REST API docs.

We'll look through them one by one and implement them. Anyone who wants to pick up any other REST API should ideally open a new issue.

The REST API has a /feed/featured/{year}/{mm}/{dd} endpoint which provides featured content for that particular day. Implementation for this can follow #8.

Implementation for this should follow the summary or related method flow. Remember to write unit tests for all possible scenarios in your new functions, and try to use types as far as possible.

Implement events on this day API

There are a lot of new REST APIs for wikipedia present in the REST API docs.

We'll look through them one by one and implement them. Anyone who wants to pick up any other REST API should ideally open a new issue.

The REST API has a /feed/onthisday/{type}/{mm}/{dd} endpoint which provides events that historically happened on the given day and month. We should support month and date as strings and also support the different event types.

Implementation for this should follow the summary or related method flow. Remember to write unit tests for all possible scenarios in your new functions, and try to use types as far as possible.

I'll be happy to help anyone who wants to pick this up.
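To make the requirements concrete, here is a sketch of the URL the proposed method would hit; zero-padding covers the "month and date in string format" point. This is illustrative only, not the implemented API:

```javascript
// Sketch: build the onthisday feed URL. `type` would be one of the event
// categories the endpoint supports (e.g. 'events', 'births', 'deaths').
function onThisDayUrl(type, month, day) {
  const mm = String(month).padStart(2, '0'); // accept 1 or '1' -> '01'
  const dd = String(day).padStart(2, '0');
  return `https://en.wikipedia.org/api/rest_v1/feed/onthisday/${type}/${mm}/${dd}`;
}

console.log(onThisDayUrl('events', 1, 5));
// → https://en.wikipedia.org/api/rest_v1/feed/onthisday/events/01/05
```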
