Sunday, June 18, 2017

Reflections on Kindle books, DRM, YouTube videos and music streaming services

Today, I came across an old article I wrote in 2013, more as a note to myself than anything else. As I read it again, I find myself in agreement with my older self. So here it is for anyone that may find it useful ...

Why DRM will never be good enough, and why it makes no sense for services like YouTube or music stores that already let users stream content not to allow direct downloads.

Whenever I watch a video on YouTube that I would like to save for later, and do not find a download button right there, I cannot help but wonder why it isn't there! I can always come back to the website and play the video as many times as I like. When that is the case, doesn't it make more sense to let me download it on the first occasion, saving bandwidth and resources on both sides and reducing the CO2 footprint? Ah, but there are concerns about intellectual property rights. And then there is advertising revenue. And a whole lot of other reasons that complicate all our lives. The same goes for music streaming services.

Now let's turn our attention to ebooks, specifically Amazon Kindle books. I purchase an ebook, and I have to read it through the Kindle app. I have always been skeptical of leaving my books, and my ability to read them, at the beck and call of a third party. How can I be sure that tomorrow I won't lose my Amazon account, or that Amazon will not go bonkers and start deleting books from my account, denying me access to all my important books and to the even more valuable notes and comments I make on them? So I started looking for ways to save my books so that I can access them independently of any third party. Sure enough, there are ways to deDRM the ebooks. But over time Amazon changes the encryption algorithms, the keys used for encryption and so on. Every time a key or an algorithm is broken, a new one takes its place. The deDRM process has to keep evolving, so I can't trust it to always work. Is there any other way to accomplish the task?

Just as a publisher wants to protect its intellectual property, every note and comment I make on my texts is my own intellectual property, and more often than not the comments are more valuable to me than the texts themselves. So I must find a way to protect my intellectual property, not just for the few years for which Amazon claims I rent the Kindle book, but forever. What do I do then?

Well, suppose I have a hard-copy textbook and I write comments in it. To preserve them, I can make a photocopy. There is an electronic equivalent of this, and it is one we can always trust to work, irrespective of what Amazon, the government or anybody else wants.

Open your ebook in the Kindle app in full screen mode. Go to the first page and take a screenshot (Print Screen). Move to the next page and take another screenshot. Continue this way and you have the entire book photocopied. Now automate the next-page and screenshot steps, and you can have the whole book as a PDF! Isn't this simple? Does it not work?

Let's see.
I used an Ubuntu system. Log in to the Amazon cloud reader. Open a book you own. Go to the first page. Set the font size appropriately and switch to full screen mode.

Now Alt+Tab and open a terminal and run the following.

sleep 10; while true ; do gnome-screenshot ; xdotool key Down; sleep 1 ; done

Then Alt+Tab back to the cloud reader in full screen mode. The initial sleep gives you 10 seconds to do this before the first screenshot is taken.

The script simulates whatever a genuine reader does. It takes a screenshot of the page, then simulates an arrow key press (Down in the command above), which takes you to the next page. It waits a second, takes the next screenshot, moves on again, and so on. We may also choose to simulate mouse clicks in place of key presses.
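
As a rough sketch of the mouse-click variant: xdotool can move the pointer and click in a single call. The helper name next_page_click and the default coordinates are my own assumptions, not something the reader prescribes; the point has to lie on whatever area of your screen turns the page.

```shell
# Sketch: turn the page with a mouse click instead of a key press.
# next_page_click is a hypothetical helper; the default coordinates are
# assumptions and must point at the reader's "next page" area on your screen.
next_page_click() {
    x=${1:-1800}
    y=${2:-540}
    # mousemove positions the pointer, click 1 presses the left button
    xdotool mousemove "$x" "$y" click 1
}
```

To use it, replace `xdotool key Down` in the loop with `next_page_click 1800 540`. Running `xdotool getmouselocation` while hovering over the right spot prints the coordinates to plug in.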

Ideally the loop should run exactly as many times as there are pages in the book. But because the page count changes with the font size, I let it run forever. Once the last page is reached, it simply gets captured again and again. At that point, Alt+Tab to the terminal and press Ctrl+C.
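
The loop could also stop by itself. Here is a minimal sketch, assuming that two captures of an unchanged screen come out as byte-identical files (if they do not on your system, for instance because of embedded timestamps, you would need to compare pixel data instead). The names take_shot, next_page and capture_book and the page cap of 2000 are my own illustrative choices.

```shell
# take_shot and next_page are hypothetical helpers wrapping the same two
# commands used in the one-liner.
take_shot() { gnome-screenshot -f "$1"; }   # -f saves the capture to the given file
next_page() { xdotool key Down; }

# capture_book OUTDIR [MAX_PAGES] [DELAY_SECONDS]
# Captures numbered pages until two consecutive captures are identical.
capture_book() {
    outdir=$1
    max_pages=${2:-2000}    # safety cap, an arbitrary assumption
    delay=${3:-1}
    prev=""
    i=1
    mkdir -p "$outdir" || return 1
    while [ "$i" -le "$max_pages" ]; do
        # zero-padded names keep the files in page order when sorted
        cur=$(printf '%s/page-%05d.png' "$outdir" "$i")
        take_shot "$cur" || return 1
        # An identical file means the page did not change: the book has ended.
        if [ -n "$prev" ] && cmp -s "$prev" "$cur"; then
            rm -f "$cur"
            break
        fi
        prev=$cur
        next_page
        sleep "$delay"
        i=$((i + 1))
    done
}
```

Something like `sleep 10; capture_book ~/Pictures/book` would then replace the endless loop, with no redundant images to clean up at the end.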

Now go to ~/Pictures (where gnome-screenshot saves its captures) and remove the redundant images of the last page at the end of the list. (Also make sure this directory is empty before the process starts.)

cd ~/Pictures

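# batch convert the png screenshots to one pdf each (foo.png -> foo.png.pdf)
# restricting find to *.png keeps it from also picking up "." and the new pdfs
find . -maxdepth 1 -name '*.png' -exec convert {} {}.pdf \;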

# check that these two numbers match
ls *.png | wc -l
ls *.pdf | wc -l

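# combine the per-page pdfs into one file
# (*.png.pdf rather than *.pdf, so an existing outputfilename.pdf is not fed back in on a rerun)
gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=outputfilename.pdf -dBATCH *.png.pdf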
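
The count check and the merge can be tied together so that a half-finished conversion is never merged. This is only a sketch: counts_match and merge_pages are hypothetical helper names, and it assumes the per-page files are named foo.png.pdf, as produced by the convert step.

```shell
# counts_match DIR: succeed only if DIR holds at least one .png and exactly
# as many .png.pdf files as .png files.
counts_match() {
    dir=${1:-.}
    n_png=$(find "$dir" -maxdepth 1 -type f -name '*.png' | wc -l)
    n_pdf=$(find "$dir" -maxdepth 1 -type f -name '*.png.pdf' | wc -l)
    echo "png: $n_png, pdf: $n_pdf"
    [ "$n_png" -gt 0 ] && [ "$n_png" -eq "$n_pdf" ]
}

# merge_pages DIR [OUTPUT]: merge the per-page pdfs, but only if the counts agree.
merge_pages() {
    dir=${1:-.}
    out=${2:-outputfilename.pdf}
    counts_match "$dir" || { echo "conversion incomplete, not merging" >&2; return 1; }
    # The shell expands the glob in sorted order, which is page order as long
    # as the screenshot file names sort chronologically.
    gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="$out" "$dir"/*.png.pdf
}
```

Calling `merge_pages ~/Pictures` then prints the two counts and only runs gs when they agree.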

The resulting outputfilename.pdf is the PDF we wanted. If you like, you may run it through OCR; I used PDF-XChange Viewer on Ubuntu.
Now we have everything we need.

Now suppose Amazon knows somebody is doing this. What can they do? They may change the way the display moves from one page to the next, and how the user interacts with it. But because this involves end-user interaction, they can never make it very complicated. And so the script, which can always be modified to simulate whatever the user has to do, will never get very complicated either.

Suppose they somehow manage to disable screen capture itself, although I do not quite see how they could do that on anything other than a Kindle. Then I'll run the whole app in a virtual machine and capture the screen from outside. Suppose they somehow overcome this too. As a last resort I can set up a camera in front of my computer and let that do the screen capture for me. The point is that it can always be done. Period.

This is not a tutorial on how to copy DRM-protected ebooks. It is about why it makes no sense to cripple usability for the sake of centralized control, intellectual property protection and the like, when that goal can never be achieved anyway. A sane user will always find a way out.

As for YouTube, can I not video-capture my screen? Better to allow downloading directly, as the videos themselves are free to watch on the web! The same is not the case with ebooks, which publishers and authors would like to sell. For that, DRM alone does nothing more than attempt security through obscurity. Can you ever prevent a hard copy of a text from being photocopied? It's like trying to secure communication between two parties when one of them is inherently compromised.

PS: Sure enough, the automation script can be improved, and made more user friendly, but you get the idea!