Tagged with command line gazing

building a CLI flow part 1

Building a CLI program flow to archive a single webpage or article.

The problem to be solved:

I want to be able to download, from the command line, a single webpage into a self-contained .html file, with the option to convert it to (or save alongside) other formats such as PDF, EPUB, or even Markdown (md) if possible. There are two main use cases I want to offer: 1) get the entire webpage, strip out the ads/useless shit, and save the new version locally, and 2) for journal/website articles, blog entries, or tutorials, simply save the main content and discard every other element from the source.

The initial idea for solution:

A bash script that takes in a URL and, through a series of menus, outputs the desired file(s) into a standard directory.

Why not use currently available methods:

I’ve come across quite a few programs and scripts that seem like they would be perfect for what I want to do, but unfortunately I’m using Termux on Android and most of them do not work, for one reason or another, in this environment.

CLIs that I have tried with no luck to get working on Termux:

The semi-focused detailing of my initial mind thoughts:

So, I feel like my grasp of scripting and pipelines is decent enough to cobble together a functioning bash script, using programs and sources at my disposal, that should take care of what I want to do: namely, produce a locally saved, single-file .html with the option to convert it to PDF, md, or EPUB if desired. In my mind, there will be several working pieces of this script that could and should be broken down into smaller modules/functions.

I’m designing and planning on a menu-driven interface so creating menu-building and menu-displaying functions will be two individual tasks. Directory checking, creation, and manipulation should be another set of tasks. Error handling, logging and graceful failures are another set. And finally, the actual downloading and parsing into files will be a set of functions.
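As a rough sketch of how those sets of tasks might be split up, here is a minimal skeleton. Everything in it is an assumption made for illustration: the function names (`log`, `die`, `ensure_dir`, `build_menu`, `pick_from_menu`) and the default `$HOME/archive` directory are hypothetical, not something I have settled on.

```bash
# Skeleton only: all names and the default directory below are hypothetical.

ARCHIVE_DIR="${ARCHIVE_DIR:-$HOME/archive}"   # assumed "standard directory"

# Logging and graceful failure: messages go to stderr.
log() { printf '[%s] %s\n' "$(date +%H:%M:%S)" "$*" >&2; }
die() { log "ERROR: $*"; exit 1; }

# Directory checking and creation.
ensure_dir() {
    [ -d "$1" ] || mkdir -p "$1" || die "cannot create $1"
}

# Menu building: print one numbered line per argument.
build_menu() {
    local i=1 item
    for item in "$@"; do
        printf '%d) %s\n' "$i" "$item"
        i=$((i + 1))
    done
}

# Menu displaying: show the menu on stderr, read a number from stdin,
# and print the chosen item on stdout. Returns 1 on invalid input.
pick_from_menu() {
    local choice
    build_menu "$@" >&2
    printf 'Choice: ' >&2
    read -r choice || return 1
    case "$choice" in
        ''|*[!0-9]*) return 1 ;;          # empty or not a plain number
    esac
    [ "$choice" -ge 1 ] && [ "$choice" -le "$#" ] || return 1
    shift $((choice - 1))                  # move the chosen item to $1
    printf '%s\n' "$1"
}
```

A caller could then run `ensure_dir "$ARCHIVE_DIR"` followed by `fmt=$(pick_from_menu html pdf epub md) || die "invalid choice"`. Keeping the menu on stderr and the answer on stdout is what lets the result be captured with `$(...)`.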

I’m aware that bash scripting might not be the optimal route to go with this idea, due to processing times, data handling and manipulation, and the fact that other languages come with the advantage of libraries and support modules that cover just about every detail and “feature” I could possibly come up against. However, it will serve as a basis for laying out the flow and main ideas of what I want my script to do as well as help prototype and get a working product to use as proof of concept.

From the get-go, there are several options that I can and will be utilizing to achieve my end goal. I’m hoping that offering several methods will not over-complicate the road ahead. Writing code that can be easily extended, with new options added without much fuss, is something that I am working towards. The thought that I should focus on one of each (one method for obtaining the webpage, one method for processing the downloaded data, one method for saving, and one method for file conversion) is there, and it has validity. However, I am blockheaded when it comes to ambitions, and throwing caution to the wind is the name of the game right now.

Options in consideration for scraping/downloading/getting source:
  • wget
  • cURL
  • httpie
  • ferret
  • w3m
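
One way to support several of these without hard-coding a single choice is to check what is actually installed and fall back in order. The sketch below is only that, a sketch: `first_available` and `fetch_page` are hypothetical names, the fallback order is arbitrary, and httpie and ferret are left out because I have not yet looked into how they are invoked.

```bash
# Hypothetical helper: print the first candidate command found on PATH.
first_available() {
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: save URL ($1) to file ($2) using whichever tool exists.
fetch_page() {
    local url="$1" out="$2" tool
    tool=$(first_available wget curl w3m) || {
        echo 'no downloader found' >&2
        return 1
    }
    case "$tool" in
        wget) wget -q -O "$out" "$url" ;;          # quiet, write to $out
        curl) curl -fsSL -o "$out" "$url" ;;       # fail on HTTP errors, follow redirects
        w3m)  w3m -dump_source "$url" > "$out" ;;  # raw source, not rendered text
    esac
}
```

Calling `fetch_page "$url" page.html` would then work the same way regardless of which of the three is installed, which keeps the rest of the script from caring about the downloader.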

This is as far as I have gotten, planning- and layout-wise. The next part of this session of brain-vomiting will cover the logic trains, process flow, and possible ways of laying out the script.

TODO:

  • Research the methods of using ferret to scrape websites and obtain the contents.
  • Look into the flags and options of wget, cURL, httpie, and w3m.
  • Sketch out the menus and brainstorm ways for it to work.
  • For processing… HTML2JSON? HTML2MD? HTML2?????
  • File conversion should be pandoc, but of course, the perfect tool doesn’t work under Termux.
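
For reference, if pandoc were available, the conversion step from the last item could look roughly like the sketch below. `output_name` and `convert_page` are hypothetical names, and the sketch only covers md and EPUB, relying on pandoc inferring the output format from the file extension.

```bash
# Hypothetical helper: swap a trailing .html for the target extension.
output_name() {
    printf '%s.%s\n' "${1%.html}" "$2"
}

# Hypothetical helper: convert saved HTML ($1) to the format in $2 (md or epub).
convert_page() {
    local in="$1" fmt="$2" out
    command -v pandoc >/dev/null 2>&1 || {
        echo 'pandoc is not installed' >&2
        return 1
    }
    case "$fmt" in
        md|epub) ;;
        *) echo "unsupported format: $fmt" >&2; return 1 ;;
    esac
    out=$(output_name "$in" "$fmt")
    pandoc -f html "$in" -o "$out"   # output format inferred from the extension
}
```

For example, `convert_page article.html md` would write `article.md` next to the original file.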

new skill acquired - bash-hacking

I finally understand a couple of ubiquitous things in nerd culture, computer nerd culture in particular: the sharing and discussion of dotfiles and bash scripts, which have become a topic of great interest to me. Granted, my experience has been with a modified Linux environment running in an Android app, but it still relies exclusively on operating via a command line. There are ways to run an X11 GUI, but everything starts and ends at the CLI.

Cultivating, sharing, and storing dotfiles for reuse carried no weight for me until semi-recently. I first started actively using and relying on Linux as my one and only OS in the winter of 2018. My boyfriend at the time was like me, a tweaker conspiracy nut, and had a serious “privacy in the digital age” fascination bordering on obsession. By this time, my PC had gone down, and Windows 10 on my laptop was only frustrating me. Jay helped me wipe my hard drive and install Linux Mint on it. I used it lightly, meaning I didn’t delve too deeply into customizing or exploring options for it, and just stuck to the GUI and the programs that came with it. Later on, for various reasons, I wanted to have a running Linux distro on my phone and tried various methods I came across online to get a working distro as a boot option for my old ZTE. Rooting didn’t work for me. In fact, I seemingly bricked my phone and created a panic like I’ve never given myself before. Once the phone was back in working order (bootloader locked), I came across an article detailing the virtues of running Termux as a Linux environment inside Android. The only “downside” was that it was command-line driven.

I’ve always been nerdy and geeky, but my main interests have always been SciFi books and movies and music. I dabble in coding, have built my own PC multiple times, and know more about computers than the average monkey, but I would never claim to be fully knowledgeable in computer science and related fields. But I am geeky enough to know how to research things (thanks, B.A. in History!) and I enjoy solving new puzzles and systems. So the command line presented a new problem to attack and solve. Termux came with bash as the default shell and minimal instructions on how to proceed from there. I dabbled in bash for a minute until my searches turned me onto ZSH. ZSH has some nice handy features, but they’re not enabled from the get-go and require some configuration and modification of its rc files. This in turn led me to searching through GitHub and Google for other people’s configurations and use of these files.

Then, the ultimate tragedy struck. I had to switch phones and, because of a cracked screen, couldn’t access the dotfiles on my old phone. I had to start over from square one. The importance of version control and backing up my dotfiles finally struck home. Thankfully, I sorta remembered the sources I used prior to this, and got most of the functionality I was used to up and running. It also gave me the chance to stop using the OMZ framework and start taking a more DIY approach to managing my ZSH shell and the functions I use on the regular. I’ve learned the order in which ZSH loads its files, and I use it to keep things, to the best of my ability, separate and where they need to be.

The other area of growth has been in shell scripts. One of the processes I found myself doing over and over again was copying the URL of a file or site that I wished to download/mirror, going into Termux, and pasting it into a wget command. One day, I noticed that Termux was on the menu of apps that could be shared to, and out of curiosity I passed it the URL of a raw file from GitHub. A popup came up stating that I needed a script at the path ~/bin/termux-url-opener in order to deal with info passed to Termux. Thus began my research into building shell scripts and dealing with variables passed to a script. My first script is highly personalized to my needs and uses a case statement to determine what kind of URL is being passed to it. Once the URL is identified, the matching means of downloading is used. If the URL is YouTube, youtube-dl is used. If the URL is GitHub, wget is used. The script is clunky and very amateur, but it serves its purpose and has lit the fire of “what else can I script” in my brain. I have a growing list of various processes and functions that I would like to automate.
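
To give an idea of the shape of it, here is a stripped-down sketch of that kind of ~/bin/termux-url-opener. It is not my actual script: the function names (`tool_for_url`, `open_url`) and the exact URL patterns are made up for illustration, and it assumes the shared URL arrives as the first argument.

```bash
# Sketch of a ~/bin/termux-url-opener; names and patterns are illustrative.

# Decide which downloader fits the URL; fail if nothing matches.
tool_for_url() {
    case "$1" in
        *youtube.com/*|*youtu.be/*)             printf 'youtube-dl\n' ;;
        *github.com/*|*githubusercontent.com/*) printf 'wget\n' ;;
        *)                                      return 1 ;;
    esac
}

# Run the chosen downloader on the URL.
open_url() {
    local url="$1" tool
    tool=$(tool_for_url "$url") || {
        echo "no handler for: $url" >&2
        return 1
    }
    case "$tool" in
        youtube-dl) youtube-dl "$url" ;;
        wget)       wget "$url" ;;
    esac
}

# Assumption: the shared URL is passed as $1.
if [ "$#" -gt 0 ]; then
    open_url "$1"
fi
```

Splitting the "which tool" decision from the "run it" step means a new site only needs one more pattern line in `tool_for_url` and one more branch in `open_url`.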

I am becoming more of a Computer Nerd every single day. I like this growth.
