automation_scripts.py: A Blog Post in 150 Lines of Code
Four Python scripts I actually use. Bulk renamer, downloads folder organizer, duplicate finder, and a website change detector.

I hate doing repetitive tasks on my computer. Like, genuinely hate it. If I have to do the same thing more than twice, my brain starts looking for the exit. I'll spend forty minutes writing a script to automate a task that would have taken five minutes to just do manually. Is that rational? No. Does it feel better? Every single time.
Over the past couple of years I've accumulated this little folder of Python scripts (nothing polished, nothing production-grade), just stuff I wrote because something was annoying me. I figured I'd share a few of them here, along with the stories of what pushed me to write them in the first place. Maybe you'll find one useful. Maybe you'll look at my code and wince. Either way.
The Bulk File Renamer (Born From Disaster)
This one has a story. Back in 2024 I was doing some freelance work for a small accounting firm. They sent me a folder with about 200 invoices, all named stuff like INV_2024_ClientName_Draft.pdf. They wanted them renamed to ClientName_INV_2024_Final.pdf. Simple enough, right?
I opened a Python REPL, wrote a quick re.sub call, ran it on the whole folder without testing first. Turns out my regex was greedier than I thought. It ate parts of the client names. Half the files ended up named things like _INV_2024_Final.pdf with the client name just... gone. I spent the next hour manually cross-referencing file sizes and creation dates with a backup to figure out which file was which.
So when I sat down to write a proper bulk renamer, the very first thing I did was make dry_run=True the default. You have to deliberately opt in to actually changing anything.
import re
from pathlib import Path


def bulk_regex_rename(directory: str, pattern: str, replacement: str, dry_run: bool = True):
    """
    Renames files in a directory using regex.
    dry_run=True by default because I learned the hard way.
    """
    target = Path(directory)
    renamed_count = 0

    if dry_run:
        print("=== DRY RUN (no files will be changed) ===\n")

    for filepath in sorted(target.iterdir()):
        if filepath.is_file():
            new_name = re.sub(pattern, replacement, filepath.name)
            if new_name != filepath.name:
                renamed_count += 1
                print(f"  {filepath.name}")
                print(f"    -> {new_name}\n")
                if not dry_run:
                    filepath.rename(target / new_name)

    print(f"--- {renamed_count} file(s) {'would be ' if dry_run else ''}renamed ---")
The way I use it: I run it once with dry_run=True, visually scan the output, then if everything looks sane I flip it to False and run again. Takes two seconds but it has saved me from myself more times than I can count.
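For the invoice job from earlier, the two passes look roughly like this. The folder name and regex are my reconstruction of that job for illustration, not lifted from the real script:

# Illustrative only: folder name and pattern reconstructed from the invoice story above.
# Pass 1: dry run (the default). Just prints what would happen.
bulk_regex_rename(
    "invoices",
    pattern=r"^INV_2024_(.+)_Draft\.pdf$",
    replacement=r"\1_INV_2024_Final.pdf",
)

# Pass 2: same call with dry_run=False, once the preview looks right.
bulk_regex_rename(
    "invoices",
    pattern=r"^INV_2024_(.+)_Draft\.pdf$",
    replacement=r"\1_INV_2024_Final.pdf",
    dry_run=False,
)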
One thing I should mention: this doesn't handle name collisions. If two files would end up with the same name after the rename, the second one just overwrites the first. I keep meaning to add a check for that. Haven't gotten around to it. For my use cases there's never been a collision, so it stays at the bottom of the to-do list. You might want to add a guard if you're going to use this on anything important.
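If you want that guard without touching the renamer itself, a pre-flight check does the job: compute every target name first and report any that more than one file maps to. The helper below is hypothetical (it's not in my scripts folder), but it's the shape I'd use:

import re
from collections import Counter
from pathlib import Path


def check_rename_collisions(directory: str, pattern: str, replacement: str):
    """Hypothetical pre-flight check: returns target names that more than one
    file would be renamed to, so you can fix the pattern before running for real."""
    target = Path(directory)
    new_names = [
        re.sub(pattern, replacement, p.name)
        for p in sorted(target.iterdir())
        if p.is_file()
    ]
    return [name for name, count in Counter(new_names).items() if count > 1]

If it returns anything, don't flip dry_run off yet.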
I also sort the directory listing before iterating, which the original version didn't do. The reason is just predictability: when I'm scanning the dry run output, I want to see the files in order so I can spot patterns faster. Without the sort, iterdir() returns files in whatever order the filesystem feels like, which on Windows is usually alphabetical anyway but on Linux it's... not.
The Downloads Folder Organizer
This one needs less of a story because the story is universal: my Downloads folder is a war zone. I'll have screenshots next to tax forms next to random .json files I downloaded from some API three weeks ago. It gets to the point where I can't find anything, so I just download the same file again, which makes the problem worse.
The script is simple. It looks at file extensions and moves things into subfolders.
import shutil
from pathlib import Path


def organize_downloads(directory: str):
    """
    Sorts files into subfolders by type.
    Run this on a schedule if you're as messy as I am.
    """
    target = Path(directory)
    extensions = {
        'Images': ['.jpg', '.jpeg', '.png', '.gif', '.webp', '.svg', '.bmp'],
        'Docs': ['.pdf', '.docx', '.doc', '.csv', '.xlsx', '.pptx', '.txt'],
        'Archives': ['.zip', '.tar', '.gz', '.rar', '.7z'],
        'Code': ['.py', '.js', '.ts', '.json', '.html', '.css', '.md'],
        'Videos': ['.mp4', '.mkv', '.avi', '.mov'],
        'Audio': ['.mp3', '.wav', '.flac', '.aac'],
        'Installers': ['.exe', '.msi', '.dmg', '.deb'],
    }

    # Build a reverse map: extension -> folder name
    ext_map = {}
    for folder, exts in extensions.items():
        for ext in exts:
            ext_map[ext] = folder

    moved = 0
    skipped = 0
    for item in target.iterdir():
        if item.is_file():
            folder_name = ext_map.get(item.suffix.lower())
            if folder_name:
                dest_folder = target / folder_name
                dest_folder.mkdir(exist_ok=True)
                dest_path = dest_folder / item.name
                if dest_path.exists():
                    # Don't overwrite, just skip
                    print(f"  [SKIP] {item.name} (already exists in {folder_name}/)")
                    skipped += 1
                    continue
                shutil.move(str(item), str(dest_path))
                print(f"  [MOVED] {item.name} -> {folder_name}/")
                moved += 1

    print(f"\nDone. Moved {moved} files, skipped {skipped}.")
I expanded the extension list from the original version because I kept finding files that didn't get sorted. The original had like four categories. Now there are seven, and I still occasionally find a .heic or .webm that slips through. Apple's image formats are a whole thing I don't want to deal with.
The skip-if-exists check is there because I once ran this twice in a row. The second run tried to move files that were already in their destination folders and it threw errors everywhere. Not a great experience at 11 PM when you're already annoyed at your messy Downloads folder.
I have this hooked up to Windows Task Scheduler on my machine. It runs every Sunday at midnight. I wake up Monday morning and my Downloads folder is clean. It's one of those tiny quality-of-life things that adds up.
One limitation: it doesn't handle nested folders at all. If there's a folder sitting in Downloads, it just ignores it. I've thought about adding recursive handling but honestly, folders in my Downloads directory are usually project folders I put there deliberately, so leaving them alone is the right behavior for me.
The Duplicate File Finder (Or: How I Reclaimed 12 GB)
I wrote this one after my laptop started complaining about disk space. I had a feeling I had a lot of duplicate files โ copies of the same photos in different folders, documents I'd saved in three places "just in case," that kind of thing.
The approach is straightforward: hash the contents of every file and look for collisions. If two files have the same hash, they're (almost certainly) identical regardless of what they're named.
import hashlib
from pathlib import Path


def find_duplicates(directory: str, min_size_kb: int = 10):
    """
    Finds duplicate files by hashing contents.
    Only checks files above min_size_kb to avoid flagging
    tiny config files that are legitimately the same.
    """
    hashes = {}
    duplicates = []
    min_bytes = min_size_kb * 1024

    print(f"Scanning {directory} for duplicates...\n")
    all_files = [
        f for f in Path(directory).rglob('*')
        if f.is_file() and f.stat().st_size > min_bytes
    ]
    print(f"Found {len(all_files)} files above {min_size_kb} KB\n")

    for i, filepath in enumerate(all_files):
        # Read and hash only the first 2 MB: good enough for finding dupes,
        # fast enough for large file collections
        try:
            with filepath.open('rb') as f:
                chunk = f.read(2097152)
            file_hash = hashlib.sha256(chunk).hexdigest()
        except OSError:
            # Unreadable file (permissions, broken symlink, etc.) -- skip it
            continue

        if file_hash in hashes:
            original = hashes[file_hash]
            size_mb = filepath.stat().st_size / 1024 / 1024
            duplicates.append({
                'file': filepath,
                'original': original,
                'size_mb': size_mb
            })
            print(f"  DUPE: {filepath.name} ({size_mb:.1f} MB)")
            print(f"        matches: {original.name}")
            print()
        else:
            hashes[file_hash] = filepath

        # Progress indicator for big scans
        if (i + 1) % 500 == 0:
            print(f"  ...checked {i + 1}/{len(all_files)} files")

    total_wasted = sum(d['size_mb'] for d in duplicates)
    print(f"--- Found {len(duplicates)} duplicates ---")
    print(f"--- Wasted space: {total_wasted:.1f} MB ---")
    return duplicates
A few things to note about this script.
First, I only hash the first 2 MB of each file. This is a tradeoff. Two files could theoretically have the same first 2 MB but differ after that. In practice, for the kind of files I'm dealing with (photos, documents, downloads), this has never been wrong. If you're worried about it, you could hash the full file, but be prepared for the script to take a very long time on large directories. I ran the full-hash version on my photo library once and it took over twenty minutes. The partial-hash version does the same folder in about forty seconds.
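If you'd rather be paranoid about it, the change is small: hash the whole file in fixed-size chunks instead of only the first 2 MB. A sketch (the function name is mine, not part of the script above):

import hashlib
from pathlib import Path


def full_file_hash(filepath: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash an entire file in 1 MB chunks so big files don't blow up memory."""
    digest = hashlib.sha256()
    with filepath.open('rb') as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

Swap that in for the partial read inside find_duplicates and you're trading the forty-second scan for the twenty-minute one.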
Second, the min_size_kb parameter. I added this after the first run flagged hundreds of "duplicates" that were all tiny .gitignore or .editorconfig files that happen to be identical across different projects. That's not wasted space, that's just how config files work. Setting the minimum to 10 KB filters out that noise.
Third (and this is important): the script only finds duplicates. It doesn't delete anything. I tried adding an auto-delete feature once and nearly lost a bunch of photos because the script kept the copy in a random temp folder and deleted the one in my organized photo library. Now I just print the results and decide manually what to remove. Less exciting but way safer.
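Since find_duplicates returns the list, the "decide manually" part usually just means sorting it by size and reading it. The snippet below is how I eyeball the results (the path is a placeholder):

# Ad-hoc review snippet -- the path is a placeholder, point it wherever you scanned.
dupes = find_duplicates("C:/Users/anurag/Pictures")

# Biggest offenders first; delete by hand once you're sure which copy to keep.
for d in sorted(dupes, key=lambda d: d['size_mb'], reverse=True)[:20]:
    print(f"{d['size_mb']:8.1f} MB  {d['file']}")
    print(f"             keeps: {d['original']}")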
The first time I ran this on my home folder, it found about 12 GB of duplicates. Most of them were photos I'd copied into multiple folders while trying to organize them. The irony of having a messy duplicate situation caused by trying to be organized is not lost on me.
The Website Change Detector
This is the newest of the four and probably the one I'm most fond of, even though it's also the simplest. I wrote it because I was trying to get tickets to a concert last year. The venue's website said "tickets on sale soon" with no date. I found myself refreshing the page four or five times a day like some kind of maniac. So I wrote a script to do the refreshing for me.
import hashlib
import requests
from pathlib import Path
from datetime import datetime


def check_for_changes(url: str, state_dir: str = ".web_watch"):
    """
    Checks if a webpage has changed since the last check.
    Stores state files in a directory so you can watch multiple URLs.
    """
    state_path = Path(state_dir)
    state_path.mkdir(exist_ok=True)

    # Use URL hash as filename so we can track multiple sites
    url_hash = hashlib.md5(url.encode()).hexdigest()[:12]
    state_file = state_path / f"{url_hash}.txt"

    try:
        response = requests.get(url, timeout=10, headers={
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
        })
        response.raise_for_status()
        current_hash = hashlib.md5(response.text.encode('utf-8')).hexdigest()
    except requests.exceptions.RequestException as e:
        print(f"[ERROR] Couldn't reach {url}")
        print(f"        {e}")
        return None

    now = datetime.now().strftime("%Y-%m-%d %H:%M")

    if state_file.exists():
        old_hash = state_file.read_text().strip()
        if old_hash != current_hash:
            print(f"[CHANGED] {url}")
            print(f"          Detected at {now}")
            state_file.write_text(current_hash)
            return True
        else:
            print(f"[NO CHANGE] {url} (checked {now})")
            return False
    else:
        print(f"[FIRST RUN] {url}")
        print(f"            Saved initial snapshot at {now}")
        state_file.write_text(current_hash)
        return None
The original version stored state in a single file, which meant you could only watch one URL at a time. This version creates a directory and stores each URL's hash in a separate file (named by the URL's own hash, so no weird characters in filenames). Now I can watch a bunch of pages at once.
I also added a proper User-Agent header because some sites started blocking requests with Python's default user agent string. The concert venue was one of them, actually. My script kept saying "ERROR: couldn't reach URL" and I thought the site was down. Nope; they were just rejecting requests that looked like bots. Which, fair enough, my script literally is a bot.
Here's how I actually use this: I have a separate little runner script that just calls check_for_changes() on a list of URLs, and I have that running on a cron job every thirty minutes. When something changes, it prints to a log file. I keep meaning to wire it up to send me a Slack notification or a text message or something, but reading a log file has been good enough so far. One of these days I'll add a webhook. One of these days.
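For what it's worth, the runner is barely a script. A sketch of it, with placeholder URLs and a made-up module name (mine isn't actually called web_watch), plus the kind of cron line that drives it:

# watch_runner.py -- what the scheduler calls every thirty minutes.
# Example cron entry on Linux (assumption -- adjust paths to your setup):
#   */30 * * * * python3 /home/anurag/scripts/watch_runner.py >> /home/anurag/scripts/watch.log 2>&1
from web_watch import check_for_changes  # hypothetical module name

# Placeholder URLs -- swap in the pages you actually care about.
WATCHED_URLS = [
    "https://example.com/tickets",
    "https://example.com/news",
]

if __name__ == "__main__":
    for url in WATCHED_URLS:
        check_for_changes(url)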
The big limitation of this approach is that it triggers on any change to the page HTML. A lot of modern websites have dynamic content (timestamps, ad placements, session tokens embedded in the markup) that changes on every page load even when the actual content hasn't changed. I've had false positives from news sites where the only thing that changed was the "trending now" sidebar. You can work around this by parsing the HTML and only hashing a specific section, but that's site-specific work that I only bother doing for pages I'm watching long-term.
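When I do bother, the site-specific version is a few lines of BeautifulSoup: pull out just the element you care about and hash that instead of the whole page. A rough sketch, assuming the interesting content lives in a <main> tag (the selector is the part you'd tweak per site):

import hashlib

import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4


def hash_page_section(url: str, css_selector: str = "main"):
    """Hash only the element matched by css_selector, so sidebars, timestamps,
    and session tokens elsewhere in the page can't trigger false positives."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    section = soup.select_one(css_selector)
    if section is None:
        return None  # selector didn't match anything; treat as "can't tell"
    return hashlib.md5(section.get_text(" ", strip=True).encode("utf-8")).hexdigest()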
For the concert tickets, though? Worked perfectly. The page went from "on sale soon" to having an actual purchase link, my script caught it, and I got my tickets that same afternoon. Sometimes the dumb simple approach is the right one.
How I Actually Run These
None of these scripts are meant to be run as standalone programs. I keep them all in a folder called ~/scripts/ and import them into whatever I'm doing. Sometimes that's a Jupyter notebook, sometimes it's a quick one-off script, sometimes I just open a Python REPL and type:
from organize import organize_downloads
organize_downloads("C:/Users/anurag/Downloads")
For the ones that run on a schedule (the downloads organizer and the web watcher), I use Windows Task Scheduler. On Linux I'd use cron. The setup is a little annoying the first time but once it's running you forget about it, which is the whole point.
I also want to mention something about error handling in these scripts. You'll notice it's pretty minimal. There are no retries, no fancy logging, no configuration files. That's on purpose. These are personal utility scripts. If one fails, I see the error in my terminal and fix it. If I were distributing these as a tool for other people, I'd add all that stuff. But for something that runs on my machine for my purposes, the extra code just gets in the way of understanding what the script does.
That said, I do wrap everything in try/except when there's a network call or file operation that could fail in a way that would lose data. The file renamer silently failing on one file is fine. The web watcher crashing and not saving state is fine. The organizer overwriting a file I needed is not fine โ that's why the skip-on-collision check is there.
Scripts I Haven't Written Yet
I have this running list on my phone of automation ideas that I keep adding to and never actually working through. Here are a few:
A clipboard history manager that logs everything I copy to a searchable text file. I use Windows clipboard history but it only goes back so far and you can't search it properly. I've started writing this one twice and both times got bogged down in figuring out how to poll the clipboard without burning CPU.
An email attachment saver that connects to my Gmail, downloads all attachments from the past month, and organizes them by sender. I get a lot of documents via email and then forget to save them somewhere sensible. The Gmail API authentication setup is the part that keeps stopping me โ OAuth flows are never as simple as the documentation makes them sound.
A screenshot-to-text thing that watches a folder for new screenshots and runs OCR on them automatically. I take a lot of screenshots of error messages and code snippets and it'd be nice to be able to search through them by content. I know Tesseract exists. I just haven't sat down and wired it up.
Maybe by the time I write another post like this, one of those will actually exist. No promises though. My track record with "I'll build that soon" is not great.
If you end up using any of these scripts and improving on them, I'd genuinely like to hear about it. Half the fun of writing little tools like this is seeing what other people do with them. The other half is never having to manually rename 200 files ever again.
Written by
Anurag Sinha
Developer who writes about the stuff I actually use day-to-day. If I got something wrong, let me know.