Automation is one of the most powerful features of Linux systems. By combining Python's versatility with cron's scheduling capabilities, you can automate virtually any repetitive task on your system. This guide will show you how to create robust automated workflows that run reliably in the background.
Why Python and Cron?
Python provides:
- Easy-to-read syntax for writing automation scripts
- Extensive libraries for system operations, web requests, file handling, and more
- Cross-platform compatibility
- Excellent error handling capabilities
Cron provides:
- Reliable task scheduling at specified intervals
- Runs tasks in the background without user interaction
- Built into most Linux distributions
- Minimal resource overhead
Together, they form a powerful automation toolkit for system administrators, developers, and power users.
Understanding Cron
Cron Basics
Cron is a time-based job scheduler in Unix-like operating systems. It reads a configuration file called a "crontab" (cron table) that contains commands and their execution schedules.
Crontab Syntax
A crontab entry has five time fields followed by the command to execute:
* * * * * command_to_execute
│ │ │ │ │
│ │ │ │ └─── Day of week (0-7, Sunday is 0 or 7)
│ │ │ └───── Month (1-12)
│ │ └─────── Day of month (1-31)
│ └───────── Hour (0-23)
└─────────── Minute (0-59)
Common Cron Patterns
# Every minute
* * * * *
# Every hour at minute 0
0 * * * *
# Every day at 2:30 AM
30 2 * * *
# Every Monday at 9:00 AM
0 9 * * 1
# First day of every month at midnight
0 0 1 * *
# Every 15 minutes
*/15 * * * *
# Every 6 hours
0 */6 * * *
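If you want to preview when a schedule will actually fire, the third-party croniter package (an extra dependency, installed with pip install croniter) can expand an expression into its upcoming run times. A minimal sketch:

#!/usr/bin/env python3
# Sketch: preview the next few run times of a cron expression.
# Assumes the third-party croniter package: pip install croniter
from datetime import datetime
from croniter import croniter

schedule = croniter("*/15 * * * *", datetime.now())
for _ in range(3):
    print(schedule.get_next(datetime))  # prints the next three run times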
Setting Up Your First Automated Task
Step 1: Create a Python Script
Let's start with a simple disk space monitoring script:
#!/usr/bin/env python3
import shutil
import smtplib
from email.mime.text import MIMEText
from datetime import datetime

def check_disk_space(path="/", threshold=80):
    """Check if disk usage exceeds threshold percentage"""
    usage = shutil.disk_usage(path)
    percent_used = (usage.used / usage.total) * 100
    return percent_used > threshold, percent_used

def send_alert(message):
    """Send email alert (configure with your SMTP settings)"""
    # This is a placeholder - configure with your email settings
    print(f"ALERT: {message}")
    # Uncomment and configure for actual email alerts:
    # msg = MIMEText(message)
    # msg['Subject'] = 'Disk Space Alert'
    # msg['From'] = 'your_email@example.com'
    # msg['To'] = 'recipient@example.com'
    # with smtplib.SMTP('smtp.example.com', 587) as server:
    #     server.starttls()
    #     server.login('your_email@example.com', 'your_password')
    #     server.send_message(msg)

def main():
    is_critical, usage = check_disk_space()
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    if is_critical:
        message = f"[{timestamp}] WARNING: Disk usage is at {usage:.2f}%"
        send_alert(message)
        # Log to file (writing to /var/log requires root; point this at a
        # user-writable path if the script runs unprivileged)
        with open('/var/log/disk_monitor.log', 'a') as f:
            f.write(message + '\n')
    else:
        print(f"[{timestamp}] Disk usage OK: {usage:.2f}%")

if __name__ == "__main__":
    main()
Step 2: Make the Script Executable
chmod +x /home/username/scripts/disk_monitor.py
Step 3: Test the Script
Run it manually first to ensure it works:
python3 /home/username/scripts/disk_monitor.py
Step 4: Add to Crontab
Open your crontab for editing:
crontab -e
Add an entry to run every hour:
0 * * * * /usr/bin/python3 /home/username/scripts/disk_monitor.py >> /home/username/logs/disk_monitor.log 2>&1
The >> file.log 2>&1 redirect appends both standard output and standard error to the log file.
Real-World Automation Examples
1. Automated Backup Script
#!/usr/bin/env python3
import os
import tarfile
import time
from datetime import datetime

def create_backup(source_dir, backup_dir):
    """Create compressed backup of specified directory"""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    backup_name = f"backup_{timestamp}.tar.gz"
    backup_path = os.path.join(backup_dir, backup_name)
    # Create backup directory if it doesn't exist
    os.makedirs(backup_dir, exist_ok=True)
    # Create compressed archive
    with tarfile.open(backup_path, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))
    print(f"Backup created: {backup_path}")
    # Clean up old backups (keep last 7 days)
    cleanup_old_backups(backup_dir, days=7)

def cleanup_old_backups(backup_dir, days=7):
    """Remove backups older than specified days"""
    now = time.time()
    cutoff = now - (days * 86400)
    for filename in os.listdir(backup_dir):
        filepath = os.path.join(backup_dir, filename)
        if os.path.isfile(filepath):
            if os.path.getmtime(filepath) < cutoff:
                os.remove(filepath)
                print(f"Removed old backup: {filename}")

if __name__ == "__main__":
    create_backup("/home/username/important_data", "/home/username/backups")
Crontab entry (daily at 2 AM):
0 2 * * * /usr/bin/python3 /home/username/scripts/backup.py
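Before relying on the job, it's worth spot-checking that an archive actually contains what you expect. A minimal sketch (the archive filename below is a placeholder; substitute one of your real backups):

#!/usr/bin/env python3
# Sketch: list the contents of a backup archive to verify it is readable.
import tarfile

# Placeholder path - use an archive produced by the backup script
archive = "/home/username/backups/backup_20240101_020000.tar.gz"
with tarfile.open(archive, "r:gz") as tar:
    for member in tar.getmembers():
        print(member.name, member.size)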
2. System Health Monitor
#!/usr/bin/env python3
import json
import os
from datetime import datetime

import psutil  # third-party: pip install psutil

def collect_system_metrics():
    """Gather system performance metrics"""
    metrics = {
        'timestamp': datetime.now().isoformat(),
        'cpu_percent': psutil.cpu_percent(interval=1),
        'memory_percent': psutil.virtual_memory().percent,
        'disk_usage': psutil.disk_usage('/').percent,
        'network_sent': psutil.net_io_counters().bytes_sent,
        'network_recv': psutil.net_io_counters().bytes_recv,
    }
    return metrics

def save_metrics(metrics, filepath='/var/log/system_metrics.json'):
    """Append metrics to JSON log file"""
    try:
        with open(filepath, 'a') as f:
            json.dump(metrics, f)
            f.write('\n')
    except PermissionError:
        # Fall back to the user's home directory if /var/log is not writable
        filepath = os.path.expanduser('~/system_metrics.json')
        with open(filepath, 'a') as f:
            json.dump(metrics, f)
            f.write('\n')

if __name__ == "__main__":
    metrics = collect_system_metrics()
    save_metrics(metrics)
    # Alert if critical thresholds exceeded
    if metrics['cpu_percent'] > 90:
        print(f"WARNING: High CPU usage: {metrics['cpu_percent']}%")
    if metrics['memory_percent'] > 90:
        print(f"WARNING: High memory usage: {metrics['memory_percent']}%")
Crontab entry (every 5 minutes):
*/5 * * * * /usr/bin/python3 /home/username/scripts/health_monitor.py
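Since each run appends one JSON object per line (newline-delimited JSON), the history is easy to analyze later. A minimal sketch that averages CPU usage across all samples, assuming the default log path from the script above:

#!/usr/bin/env python3
# Sketch: read the newline-delimited JSON metrics log and average CPU usage.
import json

cpu_samples = []
with open('/var/log/system_metrics.json') as f:
    for line in f:
        record = json.loads(line)
        cpu_samples.append(record['cpu_percent'])

if cpu_samples:
    average = sum(cpu_samples) / len(cpu_samples)
    print(f"Average CPU over {len(cpu_samples)} samples: {average:.1f}%")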
3. Web Scraper for Price Monitoring
#!/usr/bin/env python3
import json
from datetime import datetime

import requests  # third-party: pip install requests
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

def check_price(url, css_selector):
    """Scrape price from a website"""
    try:
        headers = {'User-Agent': 'Mozilla/5.0'}
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()  # treat HTTP error pages as failures
        soup = BeautifulSoup(response.content, 'html.parser')
        price_element = soup.select_one(css_selector)
        if price_element:
            price_text = price_element.text.strip()
            # Extract numeric value
            price = float(''.join(filter(lambda x: x.isdigit() or x == '.', price_text)))
            return price
        return None
    except Exception as e:
        print(f"Error fetching price: {e}")
        return None

def log_price(product_name, price, threshold=None):
    """Log price and send alert if below threshold"""
    timestamp = datetime.now().isoformat()
    data = {
        'timestamp': timestamp,
        'product': product_name,
        'price': price
    }
    # Log to file
    with open('price_history.json', 'a') as f:
        json.dump(data, f)
        f.write('\n')
    # Check threshold
    if threshold and price and price < threshold:
        print(f"PRICE ALERT: {product_name} is now ${price} (below ${threshold})")
        # Send notification (implement your preferred method)

if __name__ == "__main__":
    # Example: Monitor a product price
    url = "https://example.com/product"
    price = check_price(url, ".price-class")
    log_price("Example Product", price, threshold=50.0)
Crontab entry (every 6 hours):
0 */6 * * * cd /home/username/scripts && /usr/bin/python3 price_monitor.py
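A natural extension is to alert only when the price actually drops between runs by reading the last entry back from the history file. A sketch building on log_price() above (the current price is a placeholder; in practice it would come from check_price()):

# Sketch: compare against the most recent logged price before alerting.
# Assumes the newline-delimited price_history.json written by log_price().
import json

def last_logged_price(path='price_history.json'):
    """Return the most recently logged price, or None if no history exists."""
    try:
        with open(path) as f:
            lines = f.read().splitlines()
        return json.loads(lines[-1])['price'] if lines else None
    except FileNotFoundError:
        return None

previous = last_logged_price()
current = 42.0  # placeholder - use check_price() in practice
if previous is not None and current is not None and current < previous:
    print(f"Price dropped from ${previous} to ${current}")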
4. Log Rotation and Cleanup
#!/usr/bin/env python3
import gzip
import os
import shutil
from datetime import datetime, timedelta

def rotate_logs(log_dir, max_age_days=30):
    """Compress old logs and remove very old ones"""
    cutoff_date = datetime.now() - timedelta(days=max_age_days)
    compress_date = datetime.now() - timedelta(days=7)
    for filename in os.listdir(log_dir):
        filepath = os.path.join(log_dir, filename)
        if not os.path.isfile(filepath):
            continue
        file_mtime = datetime.fromtimestamp(os.path.getmtime(filepath))
        # Delete very old files
        if file_mtime < cutoff_date:
            os.remove(filepath)
            print(f"Deleted old log: {filename}")
        # Compress week-old logs
        elif file_mtime < compress_date and not filename.endswith('.gz'):
            compressed_path = filepath + '.gz'
            with open(filepath, 'rb') as f_in:
                with gzip.open(compressed_path, 'wb') as f_out:
                    shutil.copyfileobj(f_in, f_out)
            os.remove(filepath)
            print(f"Compressed log: {filename}")

if __name__ == "__main__":
    rotate_logs('/var/log/myapp')
Crontab entry (daily at midnight):
0 0 * * * /usr/bin/python3 /home/username/scripts/log_rotation.py
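Because rotate_logs() deletes files, it's worth a dry run against a scratch directory before pointing it at real logs. A sketch, assuming it runs in the same module as rotate_logs() above:

# Sketch: exercise rotate_logs() safely against a temporary directory.
import os
import time
import tempfile

test_dir = tempfile.mkdtemp(prefix="logtest_")
old_file = os.path.join(test_dir, "app.log")
with open(old_file, "w") as f:
    f.write("dummy log line\n")

# Backdate the file by 40 days so it falls past the 30-day cutoff
forty_days_ago = time.time() - 40 * 86400
os.utime(old_file, (forty_days_ago, forty_days_ago))

rotate_logs(test_dir)  # should delete app.log and report it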
Best Practices
1. Use Absolute Paths
Always use absolute paths in your scripts and crontab entries:
# Good
with open('/home/username/data/output.txt', 'w') as f:
    f.write(data)

# Bad (relative path may not work in cron)
with open('output.txt', 'w') as f:
    f.write(data)
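An alternative to hard-coding your home directory is to derive paths from the script's own location, which stays correct no matter which working directory cron starts in:

# Sketch: build paths relative to the script file instead of the
# current working directory (which under cron is usually $HOME).
from pathlib import Path

SCRIPT_DIR = Path(__file__).resolve().parent
output_file = SCRIPT_DIR / "data" / "output.txt"
output_file.parent.mkdir(parents=True, exist_ok=True)
output_file.write_text("data")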
2. Set Up Proper Logging
Create comprehensive logs for debugging:
import logging

logging.basicConfig(
    filename='/home/username/logs/script.log',
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)

logging.info("Script started")
logging.error("An error occurred")
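For jobs that run frequently, the standard library's RotatingFileHandler can also cap the log's size so it never grows unbounded:

import logging
from logging.handlers import RotatingFileHandler

# Keep at most 3 backups of roughly 1 MB each
handler = RotatingFileHandler(
    '/home/username/logs/script.log', maxBytes=1_000_000, backupCount=3
)
handler.setFormatter(
    logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
)
logger = logging.getLogger(__name__)
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("Script started")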
3. Handle Errors Gracefully
import logging
import sys

def main():
    try:
        # Your automation logic
        perform_task()
    except Exception as e:
        logging.error(f"Script failed: {e}")
        send_error_notification(str(e))
        sys.exit(1)
4. Use Virtual Environments
For scripts with dependencies:
# In crontab
0 * * * * /home/username/venv/bin/python /home/username/scripts/script.py
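Creating the environment and installing dependencies is a one-time setup (the package names here are examples):

python3 -m venv /home/username/venv
/home/username/venv/bin/pip install requests beautifulsoup4 psutil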
5. Add Shebang and Make Executable
#!/usr/bin/env python3
# Your script here
chmod +x script.py
6. Test Before Scheduling
Always test your scripts manually before adding them to cron:
# Test with a minimal environment, similar to cron's
env -i /usr/bin/python3 /path/to/script.py
7. Set Environment Variables
Cron has a minimal environment. Add necessary variables in your crontab:
SHELL=/bin/bash
PATH=/usr/local/bin:/usr/bin:/bin
PYTHONPATH=/home/username/lib
0 * * * * /usr/bin/python3 /home/username/scripts/script.py
Managing Crontab
Useful Crontab Commands
# Edit crontab
crontab -e
# List current crontab
crontab -l
# Remove all cron jobs (careful: no confirmation prompt)
crontab -r
# Edit another user's crontab (requires root)
sudo crontab -e -u username
System-Wide Cron Jobs
Place scripts in these directories for predefined intervals:
/etc/cron.daily/ # Daily
/etc/cron.hourly/ # Hourly
/etc/cron.weekly/ # Weekly
/etc/cron.monthly/ # Monthly
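Note that scripts placed here must be executable and, on Debian-based systems, run-parts skips filenames containing dots, so drop the .py extension:

sudo cp disk_monitor.py /etc/cron.daily/disk_monitor
sudo chmod +x /etc/cron.daily/disk_monitor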
Debugging Cron Jobs
Check Cron Service Status
sudo systemctl status cron # Debian/Ubuntu
sudo systemctl status crond # RHEL/CentOS
View Cron Logs
grep CRON /var/log/syslog # Debian/Ubuntu
tail -f /var/log/cron # RHEL/CentOS
Common Issues and Solutions
Script doesn't run:
- Check file permissions
- Verify the shebang line
- Ensure the executable flag is set
Script runs but fails:
- Check script logs
- Verify all paths are absolute
- Test with a minimal environment: env -i /usr/bin/python3 script.py
No output in logs:
- Add explicit redirects: >> /path/to/log 2>&1
- Check log file permissions
- Verify log directory exists
Advanced Techniques
Using Flock for Single Instance
Prevent multiple instances from running simultaneously:
import fcntl
import sys

def acquire_lock(lockfile):
    """Ensure only one instance runs"""
    fp = open(lockfile, 'w')
    try:
        fcntl.lockf(fp, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError:
        print("Another instance is running")
        sys.exit(1)
    return fp

# Usage: keep the returned file object alive for the life of the script,
# since the lock is released when the file is closed
lock = acquire_lock('/tmp/myscript.lock')
# Your script logic here
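Alternatively, the flock command from util-linux can wrap the crontab entry itself, with no changes to the Python script; -n makes it exit immediately if the lock is already held:

*/5 * * * * /usr/bin/flock -n /tmp/myscript.lock /usr/bin/python3 /home/username/scripts/script.py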
Email Notifications on Failure
def send_failure_notification(error_message):
    """Send email when script fails"""
    import smtplib
    from email.mime.text import MIMEText

    msg = MIMEText(f"Script failed with error:\n\n{error_message}")
    msg['Subject'] = 'Automation Script Failed'
    msg['From'] = 'automation@yourdomain.com'
    msg['To'] = 'admin@yourdomain.com'
    # Configure with your SMTP settings
    # with smtplib.SMTP('localhost') as server:
    #     server.send_message(msg)
Using Systemd Timers (Modern Alternative)
For more complex scheduling needs, consider systemd timers:
Create a service file (/etc/systemd/system/mybackup.service):
[Unit]
Description=Automated Backup Script
[Service]
Type=oneshot
ExecStart=/usr/bin/python3 /home/username/scripts/backup.py
User=username
Create a timer file (/etc/systemd/system/mybackup.timer):
[Unit]
Description=Run backup daily
[Timer]
OnCalendar=daily
Persistent=true
[Install]
WantedBy=timers.target
Enable and start:
sudo systemctl enable mybackup.timer
sudo systemctl start mybackup.timer
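Then confirm the timer is scheduled:

systemctl list-timers mybackup.timer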
Conclusion
Combining Python's powerful scripting capabilities with cron's reliable scheduling creates a robust automation framework for Linux systems. Whether you're monitoring system health, backing up data, scraping websites, or managing logs, this combination provides the flexibility and reliability needed for production environments.
Start small with simple scripts, test thoroughly, implement proper logging, and gradually build more sophisticated automations as you become comfortable with the workflow. Your automated Linux system will save you countless hours while running smoothly in the background.
Happy automating!