Single-Day Project 1: Tracking Changes on Webpages

In life, many of our endeavors take several days, weeks, or even months to come to fruition. The waiting period can be a test of patience, and at times it can drain our motivation and leave us bored. But what if we could reframe this waiting period as an opportunity rather than an obstacle? After all, boredom is an essential element of the creative process, and it's during these moments of idleness that our most imaginative ideas tend to emerge. That's why, from time to time, I challenge myself to complete a small project within a single day, harnessing the creative inspiration that boredom can provide and relishing the sense of accomplishment it brings, which propels me forward with my other tasks.

The project I'm about to delve into is the enhancement of my webpage change tracking script.

In the past, I developed a simple Python script that checks a list of websites. It visits each site in turn, generates a hash from its HTML content, and compares that hash with the previously stored value. When a change is detected, I receive a notification via Discord, which I chose for its seamless integration with the code. (You can find the old Python script here.)
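The core of that old script can be sketched in a few lines. This is a minimal illustration rather than the repository's exact code, and the helper names and webhook usage are my own assumptions:

```python
# A minimal sketch of the original approach; helper names and the Discord
# webhook usage are hypothetical, not the repository's exact code.
import hashlib
import json
import urllib.request
from typing import Optional, Tuple


def page_hash(html: bytes) -> str:
    """Hash the raw HTML so two snapshots can be compared cheaply."""
    return hashlib.sha256(html).hexdigest()


def check_page(url: str, previous_hash: Optional[str]) -> Tuple[bool, str]:
    """Fetch a page and report whether its content changed since the last check."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        current = page_hash(resp.read())
    # A missing previous hash means the page is new, not changed.
    changed = previous_hash is not None and current != previous_hash
    return changed, current


def notify_discord(webhook_url: str, message: str) -> None:
    """Post a plain text message to a Discord webhook."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps({"content": message}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=30)
```

Hashing the whole response means even a one-character change triggers a notification, which is exactly the behavior I wanted at this stage.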

Over time, my needs have evolved. I now wish to monitor more than 20 webpages. The quantity of webpages isn't an issue for the script, but it has become a concern for me due to the volume of notifications. To address this, I've decided to categorize the webpages. For lower-priority categories, I've set up separate text channels in Discord, allowing me to mute some of them and check them at my convenience. I've also implemented a feature to save HTML files at each change, providing a historical record.

In my pursuit of continuous improvement, I saw an opportunity to sharpen my SQL and JavaScript skills, which had grown a bit rusty from lack of use. To that end, I've incorporated MySQL and Node-RED into the project, making it more modular. As I find more free time, I plan to further refine and expand its functionality.

If you're interested in exploring the code and project details, you can access them on my GitHub repository.

Database

Let's begin by establishing a straightforward database (db). To receive notifications from different channels for each tracked webpage, I've chosen to organize them into categories. By assigning each category to a specific channel, I can achieve my notification goals. Looking ahead, I might even want to receive notifications in multiple ways. For instance, for websites of great importance, I might want to receive not only a Discord message but also an email or SMS alert. Therefore, each webpage can be associated with multiple categories. Here's the database structure I've opted for:
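The structure can be illustrated with a few CREATE TABLE statements. I'm using in-memory SQLite here purely to keep the example self-contained (the project itself runs MySQL), and the table and column names are my shorthand, not necessarily the repository's exact schema:

```python
# The schema described above, sketched with in-memory SQLite so the example
# is self-contained (the project uses MySQL; names here are my guesses).
import sqlite3

SCHEMA = """
CREATE TABLE websites (
    id        INTEGER PRIMARY KEY,
    url       TEXT NOT NULL UNIQUE,
    last_hash TEXT                      -- NULL until the first check
);
CREATE TABLE categories (
    id                 INTEGER PRIMARY KEY,
    name               TEXT NOT NULL,
    discord_channel_id TEXT NOT NULL    -- each category notifies its own channel
);
-- Join table: one webpage can belong to several categories.
CREATE TABLE website_categories (
    website_id  INTEGER NOT NULL REFERENCES websites(id),
    category_id INTEGER NOT NULL REFERENCES categories(id),
    PRIMARY KEY (website_id, category_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO websites (id, url) VALUES (1, 'https://example.com')")
conn.execute("INSERT INTO categories VALUES (1, 'high-priority', '111'), (2, 'jobs', '222')")
conn.executemany("INSERT INTO website_categories VALUES (?, ?)", [(1, 1), (1, 2)])

# When site 1 changes, collect every channel that should be notified.
channels = [row[0] for row in conn.execute(
    "SELECT c.discord_channel_id FROM categories c"
    " JOIN website_categories wc ON wc.category_id = c.id"
    " WHERE wc.website_id = ? ORDER BY c.id", (1,))]
```

The join table is what lets one important website fan out to several notification channels later on.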

Code

There's not much to elaborate on the operational aspects of this project, but let's delve into the technical details. The magic happens in the following sequence:

  1. Data Retrieval: The list of webpages is dynamically pulled from the "websites" table in the database. This list is then stored in the flow's context for further processing.
  2. Iterative Processing: The script then meticulously loops through each webpage one by one. For every webpage in the list, it goes through a series of operations.
  3. Hash Generation: The first step involves creating a hash from the response retrieved from the webpage. This hash represents the current state of the webpage's content.
  4. Comparison: The generated hash is then compared to the previous hash, if it exists. If there's no previous hash (indicating this is a new webpage addition), the script stores this new hash.
  5. Change Detection: If a change in the hash is detected, indicating that the webpage's content has been altered, the current response is saved to a designated folder. This step is essential for maintaining a historical record of webpage changes.
  6. Category Collection: To provide context and organization, the script fetches matching categories from the database for the changed webpage.
  7. Notification Handling: Now, it's time for the chosen method of notification to come into play. Currently, the notification action is set to Discord. The notification message is rather simple, merely indicating that a change has been detected. However, this is just the beginning. In future iterations of this project, the message could include details about the specific changes detected by comparing the current and previous responses from the webpage.
  8. Throttling: To stay efficient and avoid overwhelming the monitored servers, the script follows a scheduled routine: the full list is checked every 4 hours, and individual websites are fetched no faster than one every 5 seconds. This careful pacing strikes a balance between timely updates and minimizing load.

In this way, the script effectively tracks changes on the monitored webpages, providing you with a structured and efficient approach to keeping tabs on the information that matters most to you.
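As a concrete direction for the richer notifications mentioned in step 7, the saved snapshots could be diffed with Python's standard library. The helper below is a hypothetical sketch, not something in the repository yet:

```python
# A possible way to enrich future notifications: diff the archived snapshots
# instead of merely reporting "changed". (Hypothetical helper, not in the repo.)
import difflib


def summarize_change(old_html: str, new_html: str, context: int = 1) -> str:
    """Return a short unified diff of two snapshots, suitable for a Discord message."""
    diff = difflib.unified_diff(
        old_html.splitlines(), new_html.splitlines(),
        fromfile="previous", tofile="current", lineterm="", n=context,
    )
    return "\n".join(diff)
```

For noisy pages, the diff could be truncated or filtered before sending, since Discord messages are limited to 2000 characters.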


I didn't have time to build a GUI. For now I'm using phpMyAdmin to edit the database. Maybe I can do that on another day!

For further insights into the script and its implementation, you can refer to my GitHub repository.