wikipedia – Tarik Billa

Wikipedia API for geolocations

September 17, 2023 by Tarik

Updated answer: Wikilocation has been retired, and there is now an official Wikipedia API for this:

    ?action=query&list=geosearch&gsradius=<radius-in-meters>&gscoord=<lat>|<lon>
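A minimal Python sketch of how the geosearch parameters above might be assembled into a request. The helper names (`build_geosearch_url`, `fetch_json`, `nearby_titles`), the `gslimit` value, and the User-Agent string are illustrative assumptions, not part of the original answer.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_geosearch_url(lat, lon, radius_m=1000, limit=10):
    """Build a list=geosearch query URL for a point and a radius in meters."""
    params = {
        "action": "query",
        "list": "geosearch",
        "gscoord": f"{lat}|{lon}",  # latitude and longitude separated by a pipe
        "gsradius": radius_m,
        "gslimit": limit,           # assumption: cap the number of hits
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Fetch a URL and decode the JSON body (the User-Agent is a placeholder)."""
    req = urllib.request.Request(url, headers={"User-Agent": "geosearch-example/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def nearby_titles(lat, lon, radius_m=1000):
    """Return the titles of articles located near the given coordinates."""
    data = fetch_json(build_geosearch_url(lat, lon, radius_m))
    return [hit["title"] for hit in data["query"]["geosearch"]]
```

Calling `nearby_titles(48.5, 2.25, 500)` would perform one network request; `build_geosearch_url` alone can be used to inspect the URL first.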

How to obtain a list of titles of all Wikipedia articles

August 23, 2023 by Tarik

The allpages API module allows you to do just that. Its limit (when you set aplimit=max) is 500, so to query all 4.5M articles, you would need about 9000 requests. But a dump is a better choice, because there are many different dumps, including all-titles-in-ns0 which, as its name suggests, contains exactly what you want … Read more
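For cases where the API route is still preferred over a dump, the paging could look roughly like the sketch below. It assumes the standard `continue` mechanism of the MediaWiki API; the function names and the User-Agent string are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_batch(params):
    """Send one API request and return the decoded JSON response."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"User-Agent": "allpages-example/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_titles(fetch=fetch_batch, namespace=0):
    """Yield article titles batch by batch, following 'continue' tokens."""
    params = {
        "action": "query",
        "list": "allpages",
        "aplimit": "max",          # 500 per request, as noted above
        "apnamespace": namespace,  # 0 = articles
        "format": "json",
    }
    while True:
        data = fetch(params)
        for page in data["query"]["allpages"]:
            yield page["title"]
        if "continue" not in data:
            break  # no more batches
        # Merge the continuation values into the next request.
        params = {**params, **data["continue"]}
```

Because `iter_titles` is a generator, `itertools.islice(iter_titles(), 1000)` would stop after two requests instead of walking the whole wiki.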

Searching Wikipedia using the API

August 22, 2023 by Tarik

I don’t think you can do both in one query.

1. To get the first result, use the Opensearch API.

https://en.wikipedia.org/w/api.php?action=opensearch&search=zyz&limit=1&namespace=0&format=jsonfm

    https://en.wikipedia.org/w/api.php
      ?action=opensearch
      &search=zyz      # Search query
      &limit=1         # Return only the first result
      &namespace=0     # Search only articles, ignoring Talk, Mediawiki, etc.
      &format=json     # ‘jsonfm’ prints the JSON in HTML for debugging.

This will … Read more
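As a rough Python sketch of the Opensearch step: the code below builds the same query and picks the first hit out of a decoded response. It assumes the usual Opensearch response shape, a four-element array of query, titles, descriptions, and URLs; the helper names are hypothetical.

```python
import urllib.parse

API_URL = "https://en.wikipedia.org/w/api.php"


def build_opensearch_url(search, limit=1, namespace=0):
    """Build an action=opensearch URL that returns plain JSON."""
    params = {
        "action": "opensearch",
        "search": search,
        "limit": limit,          # 1 = only the first result
        "namespace": namespace,  # 0 = articles only
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def first_result(response):
    """Return (title, url) for the first hit, or None if nothing matched.

    Assumes `response` is the decoded JSON array
    [query, [titles], [descriptions], [urls]].
    """
    _query, titles, _descriptions, urls = response
    if not titles:
        return None
    return titles[0], urls[0]
```

The URL from `build_opensearch_url("zyz")` can be fetched with any HTTP client, and the decoded JSON passed to `first_result`.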

How to get plain text out of Wikipedia

August 12, 2023 by Tarik

Here are a few different possible approaches; use whichever works for you. All my code examples below use requests for HTTP requests to the API; you can install requests with pip install requests if you have pip. They also all use the MediaWiki API, and two use the query endpoint; follow those links if you … Read more

Fetch a Wikipedia article with Python

August 4, 2023 by Tarik

You need to use urllib2, which supersedes urllib in the Python 2 standard library, in order to change the user agent. Straight from the examples:

    import urllib2

    # Build an opener so that custom headers can be attached to every request.
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    infile = opener.open('http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes')
    page = infile.read()
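urllib2 exists only in Python 2; in Python 3 the same opener API lives in urllib.request. The following is a sketch of an equivalent under that assumption. The function names and the default User-Agent string are placeholders, not taken from the answer above.

```python
import urllib.parse
import urllib.request


def make_opener(user_agent="example-fetcher/0.1"):
    """Create an opener that sends a custom User-Agent header."""
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def article_url(title):
    """Build the printable-view URL for an article title."""
    query = urllib.parse.urlencode({"title": title, "printable": "yes"})
    return "https://en.wikipedia.org/w/index.php?" + query


def fetch_article(title):
    """Download the page HTML as text (performs a network request)."""
    with make_opener().open(article_url(title)) as infile:
        return infile.read().decode("utf-8")
```

`fetch_article("Albert_Einstein")` would then return the page HTML as a string rather than the raw bytes that `infile.read()` yields.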

Extract the first paragraph from a Wikipedia article (Python)

June 10, 2023 by Tarik

I wrote a Python library that aims to make this very easy. Check it out on GitHub. To install it, run

    $ pip install wikipedia

Then, to get the first paragraph of an article, just use the wikipedia.summary function:

    >>> import wikipedia
    >>> print(wikipedia.summary("Albert Einstein", sentences=2))

prints

    Albert Einstein (/ˈælbərt ˈaɪnstaɪn/; German: [ˈalbɐt ˈaɪnʃtaɪn] … Read more

What is a Wikipedia pageid? How to change it into a real page URL?

January 23, 2023 by Tarik

You can just use a URL like this:

    http://en.wikipedia.org/?curid=18630637

This is the shortest form; others are also possible:

    http://en.wikipedia.org/wiki?curid=18630637
    http://en.wikipedia.org/wiki/Translation?curid=18630637
    http://en.wikipedia.org/w/index.php?curid=18630637

Note that MediaWiki ignores the page title if you specify a curid, so even http://en.wikipedia.org/wiki/FooBar?curid=18630637 leads to the same page.
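In code, turning a pageid into a URL is a one-line string format. A small sketch follows; the function name and the `lang` parameter are illustrative, and it uses https where the examples above use http.

```python
def url_from_pageid(pageid, lang="en"):
    """Return the short ?curid= URL for a Wikipedia page ID."""
    # int() rejects anything that is not a plain integer ID.
    return f"https://{lang}.wikipedia.org/?curid={int(pageid)}"


print(url_from_pageid(18630637))
```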

Is there a Wikipedia API?

December 16, 2022 by Tarik

MediaWiki’s API is running on Wikipedia (docs). You can also use the Special:Export feature to dump data and parse it yourself. More information.

Is there a Wikipedia API just for retrieving the content summary?

October 25, 2022 by Tarik

There’s a way to get the entire “introduction section” without any HTML parsing! Similar to AnthonyS’s answer with an additional explaintext parameter, you can get the introduction section text in plain text.

Query

Getting Stack Overflow’s introduction in plain text:

Using the page title:

    https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&titles=Stack%20Overflow

Or use pageids:

    https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&pageids=21721040

JSON Response (warnings stripped)

    { “query”: … Read more
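The query above could be driven from Python roughly as follows. This is a sketch: it assumes the default response layout, in which `query.pages` is an object keyed by page ID and each page carries an `extract` field, and the helper names are invented for illustration.

```python
import urllib.parse

API_URL = "https://en.wikipedia.org/w/api.php"


def build_intro_url(title=None, pageid=None):
    """Build a prop=extracts URL for the plain-text introduction."""
    params = {
        "format": "json",
        "action": "query",
        "prop": "extracts",
        "exintro": 1,      # only the section before the first heading
        "explaintext": 1,  # plain text instead of HTML
        "redirects": 1,
    }
    if pageid is not None:
        params["pageids"] = pageid
    else:
        params["titles"] = title
    return API_URL + "?" + urllib.parse.urlencode(params)


def intro_from_response(data):
    """Return the extract of the first page in a decoded response, or None."""
    pages = data["query"]["pages"]
    first_page = next(iter(pages.values()))
    return first_page.get("extract")
```

Fetching `build_intro_url(title="Stack Overflow")` with any HTTP client and passing the decoded JSON to `intro_from_response` yields the introduction text; a missing page has no `extract` field, so the function returns `None`.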