"You just won't believe how vastly, hugely, mind-bogglingly big it is."

2014-04-11 Son of Bashpodder

I like listening to podcasts, and as someone who likes using composable, pipe-able, Unix-style applications, the world of podcatching software is pretty frustrating. The best solution for Unix weenies that I've found to date is BashPodder, which is a small, hackable bash script that uses XSLT to pull the enclosure URLs out of your podcast feeds and wget to download them. This is an improvement over web-based services and hulking iTunes-style GUI applications, but it still uses XSLT, which is difficult for most human beings, myself included, to understand and hack.
The problem isn't Bashpodder's fault, really, since XSLT probably is the easiest way to interact with XML in the way Bashpodder needs to. The bigger problem is that podcasts use XML at all. XML is not simple, and therefore should be eschewed by anything that bills itself as "Really Simple Syndication." More importantly for Unix weenies (who deal with complicated, over- or under-specified technologies all the time), RSS and Atom aren't well-suited to piping around a shell and slicing with sed, awk, and the other traditional Unix text processing tools.
Fortunately, other people with more patience have created libraries for the major scripting languages that abstract away most of these complexities and provide relatively simple interfaces for working with RSS and Atom feeds. The RSS module in Ruby's standard library is one of the best of these I've seen and really makes it easy to turn the M.C. Escher-esque XML soup of a podcast feed into something that is more palatable to a shell user. In fact, when combined with the Liquid templating gem, the whole project is pretty trivial.
Enter tabcast, a short Ruby program that lets you slurp down a podcast URL and turn it into a log-style, line-per-item format that is easy to use in shell scripts and interactively on the command line.
By default, tabcast will simply output a tab-delimited list of each item's Unix timestamp, title (with whitespace converted to underscores), and the URL of the item's enclosure.
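To make the default format concrete, here is a small sketch of splitting one such line into its three fields in a shell script. The sample line (timestamp, title, and example.com URL) is made up for illustration and is not real tabcast output, and `show_fields` is just a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical sample line in the default format described above:
# <unix timestamp> TAB <title_with_underscores> TAB <enclosure URL>
sample="$(printf '1397174400\tEpisode_One\thttp://example.com/ep1.mp3')"

# Split each tab-delimited line on stdin into labelled fields.
show_fields() {
    tab="$(printf '\t')"
    while IFS="$tab" read -r utime title url; do
        printf 'time=%s title=%s url=%s\n' "$utime" "$title" "$url"
    done
}

printf '%s\n' "$sample" | show_fields
```

Because titles have their whitespace replaced with underscores, each line always splits cleanly into three fields, whether you split on tabs as here or on any whitespace as awk does by default.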
This means that it's now easy to work with these on the command line. So, if we want to download the latest 5 episodes of the Kingdom of Loathing podcast, we could simply:
$ for url in $(tabcast | sort -rn | head -n 5 | awk '{print $3}'); do wget "$url"; done
...or if we wanted to download all the episodes into folders named after the year an episode was released:
$ tabcast | while read -r episode; do url="$(echo $episode | awk '{print $3}')"; year=$(date --date=@$(echo $episode | awk '{print $1}') +%Y); mkdir -p "$year"; cd "$year"; wget "$url"; cd ..; done
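For anything longer than a one-liner, the year lookup can be pulled out into a small function. This is only a sketch: `year_and_url` is a hypothetical helper name, and it assumes GNU date (for the `--date=@SECONDS` form) and the default three-field tab-delimited output.

```shell
#!/bin/sh
# Print "<year> <url>" for each tabcast-style line on stdin.
# Assumes GNU date; -u pins the year to UTC so results don't
# depend on the local timezone.
year_and_url() {
    tab="$(printf '\t')"
    while IFS="$tab" read -r utime title url; do
        year="$(date -u --date="@$utime" +%Y)"
        printf '%s %s\n' "$year" "$url"
    done
}

# Usage sketch (not run here): download each episode into its year's folder.
# tabcast | year_and_url | while read -r year url; do
#     mkdir -p "$year" && wget -P "$year" "$url"
# done
```

Using `wget -P` to set the download directory avoids the `cd` in and out of each folder, so a failed `mkdir` can't leave the loop stranded in the wrong place.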
You can create custom formats as well, using Liquid templates as I mentioned earlier. For example, if you would prefer a pipe-separated list with urlencoded titles, you would simply use:
$ tabcast --format "{{utime}}|{{title | urlencode}}|{{enclosure_url}}\n"
The idea is to leverage all the awesome work somebody's already done in exposing a sane RSS interface to Ruby and use it to expose a sane RSS interface to command-line users. I wish this weren't necessary and there were a simpler and more friendly media syndication format in widespread use, but that's a rant for another day. In the meantime, I will just continue to use tabcast and muddle through.
tabcast hasn't, for me, replaced Bashpodder. Instead, it's taken the place of all the messy XSLT transformations in my already much-modified copy of the Bashpodder script.