How to Download all Episodes of a Given Podcast in Python

Whether you want to back up a local copy of your favorite show or are a luddite who refuses to use podcast software, there are times when you might want to download the entire available catalog of a given podcast.

Downloading all episodes of a podcast typically involves some arduous right-clicking and scrolling. Why should it be so difficult? It doesn't have to be: using the Python code in this blog post, you can download all the episodes of Data Skeptic.

This script should be pretty easy to adjust to download the podcast of your choosing too.

import requests
import xmltodict
import os
fname = 'feed.rss'
url = 'http://dataskeptic.com/feed.rss'
destination_directory = './podcast-audio-files'

if not(os.path.isdir(destination_directory)):
    os.mkdir(destination_directory)

The next code block downloads the feed for the Data Skeptic podcast. I first check to see if we've already downloaded the file. If so, there is no need to eat up extra bandwidth doing it again. I added this test because I expect many people may copy and paste this code, and I don't want the feed downloaded lots of extra times as people muddle their way through a project, frequently re-running this section. The drawback is that if you want to refresh the feed (for example, to pick up new episodes), you'll need to delete the feed.rss file.

Podcasts are distributed via RSS feeds, which are formatted as XML. While XML is still a very popular standard, I find it more convenient to work with a JSON-like structure for this project, so after I get the file, I use xmltodict to convert it to a dictionary.

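if not os.path.isfile(fname):
    print('fetching')  # printed only when the feed is not cached yet
    r = requests.get(url)
    r.raise_for_status()  # stop here rather than cache an error page
    # Save the raw bytes so the XML keeps its original encoding.
    with open(fname, 'wb') as f:
        f.write(r.content)

# Read in binary mode so the parser can honor the encoding declared in the XML.
with open(fname, 'rb') as fd:
    xml = xmltodict.parse(fd.read())

episodes = xml['rss']['channel']['item']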
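
The delete-to-refresh drawback described above can be softened with an age check on the cached file. The following is only a sketch: the helper name feed_is_stale and the one-day default are my own assumptions, not part of the original script.

```python
import os
import time


def feed_is_stale(path, max_age_seconds=24 * 60 * 60):
    """Return True if `path` is missing or older than `max_age_seconds`.

    Hypothetical helper; the one-day default is an arbitrary choice.
    """
    if not os.path.isfile(path):
        return True
    # Age of the cached copy, based on its last-modified time.
    age = time.time() - os.path.getmtime(path)
    return age > max_age_seconds
```

Swapping the os.path.isfile test on the feed for feed_is_stale(fname) would re-download the feed at most once per day while still sparing the server during repeated runs.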

As a double check, let's see how many episodes are in the feed.

print(len(episodes))
133
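
If you would rather not install xmltodict, the same episode URLs can be read with the standard library. This is an alternative sketch, not the approach used in this post; the function name enclosure_urls and the sample feed are made up for illustration. It also sidesteps a quirk of xmltodict, which returns a single dictionary rather than a list when a feed contains just one item.

```python
import xml.etree.ElementTree as ET


def enclosure_urls(rss_text):
    """Return the enclosure URL of every <item> in an RSS document."""
    root = ET.fromstring(rss_text)
    urls = []
    for item in root.iter('item'):
        enclosure = item.find('enclosure')
        # Some items (show notes, announcements) may have no audio attached.
        if enclosure is not None and enclosure.get('url'):
            urls.append(enclosure.get('url'))
    return urls


# A tiny made-up feed with the same shape as a real podcast feed.
sample = """<rss version="2.0"><channel>
<item><title>Episode 1</title>
<enclosure url="http://example.com/audio/ep1.mp3?source=feed" type="audio/mpeg"/></item>
<item><title>Episode 2</title>
<enclosure url="http://example.com/audio/ep2.mp3" type="audio/mpeg"/></item>
</channel></rss>"""

print(len(enclosure_urls(sample)))  # → 2
```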

Lastly, let's loop through every episode and download it to our destination_directory.

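for episode in episodes:
    # The audio file's address is stored as an attribute of the <enclosure> tag.
    audio_url = episode['enclosure']['@url']
    # Drop any query string so the file name ends with the audio extension.
    i = audio_url.find('?')
    if i != -1:
        audio_url = audio_url[0:i]
    # Use everything after the last slash as the local file name.
    i = audio_url.rfind('/')
    audio_fname = os.path.join(destination_directory, audio_url[i+1:])
    # Skip episodes that are already on disk.
    if not os.path.isfile(audio_fname):
        r = requests.get(audio_url)
        r.raise_for_status()  # don't save an error page as an audio file
        with open(audio_fname, 'wb') as f:
            f.write(r.content)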
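
The find and rfind slicing in the loop works for the Data Skeptic feed, but other feeds may use percent-encoded names or URLs that end in a slash. A slightly more defensive way to derive the file name might look like the sketch below; filename_from_url and its default value are hypothetical names of my own, and the example URLs are invented.

```python
import posixpath
from urllib.parse import urlparse, unquote


def filename_from_url(url, default='episode.mp3'):
    """Derive a local file name from the path portion of `url`."""
    # urlparse separates the path from any query string or fragment.
    path = urlparse(url).path
    # Decode %20-style escapes first, then keep only the final path segment.
    name = posixpath.basename(unquote(path))
    # Fall back to a placeholder when the URL ends in a slash.
    return name or default


print(filename_from_url('http://example.com/audio/ep1.mp3?source=feed'))  # → ep1.mp3
```

Note that the fallback name is the same for every URL, so a feed with several such URLs would need something more distinctive, such as the episode title.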

That's all there is to it!