How can I get Wikipedia content using Wikipedia’s API?

Question

See the relevant section of the MediaWiki API documentation, specifically the part about getting the contents of a page.

Use the API sandbox (Special:ApiSandbox) to test the API call.

These are the key parameters:

prop=revisions&rvprop=content&rvsection=0

rvsection=0 returns only the lead section (the text before the first heading).

For example:

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&rvsection=0&titles=pizza
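A minimal Python sketch of the same query, using only the standard library. It assumes JSON output with format=json&formatversion=2 and adds rvslots=main (none of these are in the URL above; newer MediaWiki versions expect rvslots alongside rvprop=content). The User-Agent string is a placeholder: Wikimedia asks API clients to send a descriptive one.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Placeholder value: Wikimedia asks API clients to send a descriptive User-Agent.
USER_AGENT = "ExampleWikiClient/0.1 (you@example.com)"


def build_lead_wikitext_url(title: str) -> str:
    """Build an action=query URL asking for the wikitext of the lead section."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvsection": "0",        # lead section only
        "rvslots": "main",       # assumption: explicit slot, as newer MediaWiki expects
        "titles": title,
        "format": "json",
        "formatversion": "2",    # assumption: pages come back as a list
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_wikitext(data: dict) -> str:
    """Return the wikitext from a formatversion=2 query response."""
    page = data["query"]["pages"][0]
    if page.get("missing"):
        raise LookupError(f"No page titled {page.get('title')!r}")
    revision = page["revisions"][0]
    # With rvslots=main the text sits under slots/main; without it, directly on the revision.
    if "slots" in revision:
        return revision["slots"]["main"]["content"]
    return revision["content"]


def fetch_lead_wikitext(title: str) -> str:
    """Perform the request (needs network access) and return the lead wikitext."""
    request = urllib.request.Request(
        build_lead_wikitext_url(title), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_wikitext(json.load(response))
```

Calling fetch_lead_wikitext("pizza") makes the live request; the URL builder and the response parser can be checked offline.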

To get the HTML instead, you can similarly use action=parse:

http://en.wikipedia.org/w/api.php?action=parse&section=0&prop=text&page=pizza
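The action=parse call can be sketched the same way. Again this assumes format=json&formatversion=2, under which the rendered HTML comes back as a plain string in parse.text; the parser also accepts the older {"*": html} wrapping, and treats an "error" object in the response as a failure. The User-Agent is a placeholder.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
USER_AGENT = "ExampleWikiClient/0.1 (you@example.com)"  # placeholder


def build_parse_url(page: str, section: int = 0) -> str:
    """Build an action=parse URL that returns the rendered HTML of one section."""
    params = {
        "action": "parse",
        "page": page,
        "section": str(section),
        "prop": "text",
        "format": "json",
        "formatversion": "2",  # assumption: text comes back as a plain string
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_html(data: dict) -> str:
    """Return the HTML string from an action=parse response."""
    if "error" in data:
        raise RuntimeError(data["error"].get("info", "API error"))
    text = data["parse"]["text"]
    # formatversion=2 gives a string; the older format wraps it as {"*": html}.
    return text if isinstance(text, str) else text["*"]


def fetch_section_html(page: str, section: int = 0) -> str:
    """Perform the request (needs network access) and return the section's HTML."""
    request = urllib.request.Request(
        build_parse_url(page, section), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_html(json.load(response))
```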

Note that you’ll have to strip out any templates or infoboxes.
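For the wikitext from the first query, one naive way to drop templates (infoboxes are templates too) is to remove every balanced {{...}} span. The helper below is hypothetical, not part of the API: it is a simple brace counter that ignores edge cases such as <nowiki>, comments, or unbalanced markup, and it leaves wikitables ({| ... |}) alone.

```python
def strip_templates(wikitext: str) -> str:
    """Remove every balanced {{...}} span, including nested templates.

    Hypothetical helper: a naive brace counter, not a full wikitext parser.
    """
    out = []
    depth = 0
    i = 0
    while i < len(wikitext):
        if wikitext.startswith("{{", i):
            depth += 1  # entering a (possibly nested) template
            i += 2
        elif wikitext.startswith("}}", i) and depth > 0:
            depth -= 1  # leaving the innermost open template
            i += 2
        else:
            if depth == 0:
                out.append(wikitext[i])  # keep text outside any template
            i += 1
    return "".join(out)
```

For example, strip_templates("{{Infobox food|name={{lang|it|Pizza}}}}'''Pizza''' is a dish.") returns "'''Pizza''' is a dish."; wikilinks and bold/italic markup are left in place.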

Edit: If you want to extract plain text (without wikilinks, etc.), you can use the TextExtracts API. Use its parameters to adjust the output.

https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exlimit=1&titles=pizza&explaintext=1&exsectionformat=plain
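A sketch of the TextExtracts request in Python, with the same parameters as the URL above plus format=json&formatversion=2 (an assumption). The parser accepts both the list of pages returned by formatversion=2 and the older dict keyed by page ID. The User-Agent is a placeholder.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
USER_AGENT = "ExampleWikiClient/0.1 (you@example.com)"  # placeholder


def build_extract_url(title: str) -> str:
    """Build a TextExtracts (prop=extracts) URL requesting plain text."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exlimit": "1",
        "titles": title,
        "explaintext": "1",
        "exsectionformat": "plain",
        "format": "json",
        "formatversion": "2",  # assumption: pages come back as a list
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_plaintext(data: dict) -> str:
    """Return the extract text, accepting either response format version."""
    pages = data["query"]["pages"]
    # formatversion=2 returns a list; the older format returns a dict keyed by page ID.
    page = pages[0] if isinstance(pages, list) else next(iter(pages.values()))
    return page.get("extract", "")


def fetch_plaintext(title: str) -> str:
    """Perform the request (needs network access) and return the plain-text extract."""
    request = urllib.request.Request(
        build_extract_url(title), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_plaintext(json.load(response))
```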
