WebAlerts: Notifications for Web

WebAlerts is a Python package that notifies you of new website posts matching your regex patterns.

Quickstart

The following is an example WebAlerts app:

from webalerts import App
app = App(config={
    'patterns': ['Samsung SSD (128|256)G'],
    'sites': {
        'clien': {
            'class': 'webalerts.sites.clien.Clien',
            'board_ids': ['sold'],
            'username': 'clienuser',
            'password': 'letmein',
        },
    },
    'notifications': {
        'email': {
            'class': 'webalerts.notifications.email.EmailNotification',
            'to_addrs': ['me@example.com'],
        },
    },
})
app.run()

It will check Clien every 5 minutes and send emails to me@example.com on posts about Samsung SSD 128G or 256G on the Sell board.
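
To see which post titles the pattern above would pick up, the following sketch applies it with Python's standard re module. It assumes patterns are applied with search semantics (a match anywhere in the text) and are case-sensitive; the documentation does not state either, and the sample titles are made up for illustration.

```python
import re

# The pattern from the quickstart configuration.
pattern = re.compile('Samsung SSD (128|256)G')

# Made-up post titles, for illustration only.
titles = [
    'Samsung SSD 128G for sale',
    'Samsung SSD 256G, unopened',
    'Samsung SSD 512G',
    'samsung ssd 128g',
]

# Assumption: a pattern matches if it is found anywhere in the text.
matched = [title for title in titles if pattern.search(title)]
print(matched)
```

Under these assumptions only the first two titles match: 512G is not one of the alternatives, and the lowercase title fails because the pattern is case-sensitive as written.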

You can load the configuration from a YAML file. For example, the configuration above can be written as the following YAML file:

patterns: [Samsung SSD (128|256)G]
sites:
  clien:
    class: webalerts.sites.clien.Clien
    board_ids: [sold]
    password: letmein
    username: clienuser
notifications:
  email:
    class: webalerts.notifications.email.EmailNotification
    to_addrs: [me@example.com]

Then the program can be simplified as follows:

from webalerts import App
app = App()
app.from_yaml('config.yaml')
app.run()

Configuration options

In essence, a WebAlerts app is just a set of configuration values. Below are the options you can use. Options without a default value are required.

patterns (default: None)
List of regex patterns that match the posts you want to be notified about. A notification is sent if any of the patterns matches the title or the content of a post. If patterns is None, every post matches.
check_interval (default: 5)
Number of minutes between each loop where websites are checked for new posts. It should be an integer. The minimum value is 1.
notify_interval (default: 5)
Number of minutes between each loop where notifications are sent. It should be an integer. The minimum value is 1.
sites

Website settings, keyed by name. Each website accepts a different set of config options, although options like username and password are common to most sites. See webalerts.sites for more details. Below are general options for all sites:

class
Site class object or its fully qualified name.
notifications (default: all notifications)
Notification names that will be used for this site.
patterns (default: global patterns value)
List of regex patterns that match the posts you want to be notified about. See the global patterns option for more.
check_interval (default: global check_interval value)
Number of minutes between each loop where websites are checked for new posts. See the global check_interval option for more.
notifications

Notification settings, keyed by name. See webalerts.notifications for more details. Below are general options for all notifications:

class
Notification class object or its fully qualified name.
notify_interval (default: global notify_interval value)
Number of minutes between each loop where notifications are sent. See the global notify_interval option for more.
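
The fallback from per-site and per-notification options to the global values can be sketched as follows. The configuration uses only the options documented above; the helper effective_option and the GLOBAL_DEFAULTS table are hypothetical, written for illustration, and are not part of WebAlerts. The per-site notifications value is assumed to be a list of notification names.

```python
config = {
    'patterns': ['Samsung SSD (128|256)G'],
    'check_interval': 5,
    'sites': {
        'clien': {
            'class': 'webalerts.sites.clien.Clien',
            'board_ids': ['sold'],
            'username': 'clienuser',
            'password': 'letmein',
            'check_interval': 1,           # overrides the global value
            'notifications': ['console'],  # assumed form: list of names
        },
    },
    'notifications': {
        'email': {
            'class': 'webalerts.notifications.email.EmailNotification',
            'to_addrs': ['me@example.com'],
            'notify_interval': 30,         # overrides the global value
        },
        'console': {
            'class': 'webalerts.notifications.console.ConsoleNotification',
        },
    },
}

# Documented defaults of the global options.
GLOBAL_DEFAULTS = {'patterns': None, 'check_interval': 5, 'notify_interval': 5}


def effective_option(config, section, name, option):
    """Hypothetical helper: item value, else global value, else default."""
    item = config[section][name]
    if option in item:
        return item[option]
    return config.get(option, GLOBAL_DEFAULTS[option])


print(effective_option(config, 'sites', 'clien', 'check_interval'))
print(effective_option(config, 'notifications', 'console', 'notify_interval'))
```

Here the clien site is checked every minute, while the console notification falls back to the documented notify_interval default because neither it nor the global configuration sets one.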

Logging

To enable debug logging with timestamps to standard output, put the following code before running the app:

import logging
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s [%(name)s] [%(levelname)s] %(message)s')

For more details about configuring logging, see logging and logging.config.

Loggers are named after their module and class name, e.g. the App class has a logger named webalerts.app.App.
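
Because logger names are hierarchical, this naming scheme makes it possible to turn on debug output for WebAlerts alone. The sketch below uses only the standard logging module; the logger name some.other.lib is a made-up stand-in for any third-party library.

```python
import logging

logging.basicConfig(
    format='%(asctime)s [%(name)s] [%(levelname)s] %(message)s')

# Keep everything else quiet...
logging.getLogger().setLevel(logging.WARNING)
# ...but let every logger under the 'webalerts' namespace emit debug output.
logging.getLogger('webalerts').setLevel(logging.DEBUG)

# Loggers without their own level inherit it from the nearest ancestor.
app_logger = logging.getLogger('webalerts.app.App')
other_logger = logging.getLogger('some.other.lib')  # made-up name
print(app_logger.getEffectiveLevel() == logging.DEBUG)
print(other_logger.getEffectiveLevel() == logging.WARNING)
```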

API reference

class webalerts.App(config=None)

Returns a WebAlerts app object. It is initialized with configuration values in a dict object. For the list of config values, see Configuration options.

from_yaml(name)

Loads config values from a YAML file.

run()

Starts the main loop of the app that periodically collects new posts and feeds those posts to notification services.

class webalerts.Post

Represents a generic post, a user’s content published to a website. String parameters containing non-ASCII characters must be unicode.

Parameters:
  • url – URL of a post (required).
  • title – Title of a post (required).
  • content – Text content of a post (required).
  • content_html – HTML content of a post.
  • author – Name of the original poster of a post.
  • author_id – ID that uniquely identifies the original poster.
  • published – datetime object containing the date and time when a post was published.
content_html_safe

A sanitized version of content_html. Notifications should use this value instead of content_html.

webalerts.sites

This package contains site-specific implementations that define how to interact with each site.

A site class should be implemented in the following way:
  • The constructor accepts a config dictionary. Typical configuration values include username and password for sites that require authentication.
  • Implements get_new_posts(). It should return a list of the Post objects that are new since the last call, sorted by published time in ascending order, or an empty list if it is the first call. It must handle exceptions expected in normal use and raise only instances of SiteException or ConfigurationError if necessary.
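
The contract above can be sketched as a minimal site class. Everything here is illustrative: ExampleSite and the 'fetch' config key are hypothetical, and Post and SiteException are local stand-ins so that the sketch runs without the package installed. A real site class would use webalerts.Post and webalerts.exceptions.SiteException instead.

```python
from collections import namedtuple

# Local stand-ins for webalerts.Post and webalerts.exceptions.SiteException.
Post = namedtuple('Post', ['url', 'title', 'content', 'published'])


class SiteException(Exception):
    pass


class ExampleSite(object):
    """Hypothetical site class following the contract described above."""

    def __init__(self, config):
        self.config = config
        # 'fetch' is a made-up option: a callable returning the posts
        # currently visible on the site.
        self._fetch = config['fetch']
        self._seen_urls = None  # None until get_new_posts() has run once

    def get_new_posts(self):
        try:
            posts = list(self._fetch())
        except IOError as e:
            # Expected failures surface only as SiteException.
            raise SiteException(str(e))
        current = set(post.url for post in posts)
        if self._seen_urls is None:
            # First call: remember what is there, report nothing.
            self._seen_urls = current
            return []
        new_posts = [p for p in posts if p.url not in self._seen_urls]
        self._seen_urls |= current
        return sorted(new_posts, key=lambda post: post.published)
```

Tracking seen URLs is only one way to decide which posts are new; a real implementation might compare post IDs or timestamps instead.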
class webalerts.sites.clien.Clien(config)

Implementation for Clien. It accepts the following configuration options: username, password, board_ids.

username and password are your Clien username and password. Some boards in Clien require login to view posts.

board_ids is a list of identifiers of boards to watch. The identifier of a board can be found in its URL after bo_table=. Boards with special layouts, such as Photos, are not currently supported.
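
If you would rather extract the identifier programmatically than read it off the address bar, a small helper along these lines should work. board_id_from_url is a hypothetical name, not part of WebAlerts, and the URL below is a placeholder on example.com rather than a real Clien address.

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs      # Python 2


def board_id_from_url(url):
    """Return the value of the bo_table query parameter, or None."""
    query = parse_qs(urlparse(url).query)
    values = query.get('bo_table')
    return values[0] if values else None


# Placeholder URL, for illustration only.
board_id = board_id_from_url('http://www.example.com/board.php?bo_table=sold')
print(board_id)
```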

As Clien does not provide a public API, this class works by parsing the HTML markup of pages returned by the web server. It may stop working at any time, as the site owner has not explicitly permitted scripted access to the site and the markup is subject to change.

class webalerts.sites.navercafe.NaverCafe(config)

Note: not implemented yet.

Implementation for Naver Cafe. It accepts the following configuration options: username, password, cafe_ids.

username and password are your Naver username and password. They are needed to view posts in private cafes that restrict access from anonymous users or non-member users.

cafe_ids is a list of identifiers of cafes to watch. The identifier of a cafe can be found in its URL after http://cafe.naver.com/. Restricting the scope to individual boards in a cafe rather than the entire cafe is not supported for now, but it may be implemented in a future release.

class webalerts.sites.cgv.CGV(config)

Implementation for CGV Cinemas. Get notified and get the best seat! Don’t forget to set patterns to None in the configuration for this site.

Configuration options:

username
CGV username.
password
CGV password.
data

List of tuples (movie_name, movie_format, theater_name, time_range, date_range, seat_range). You will be notified if any of the specified seats is found in the specified date and time range for the specified movie, format, and theater. All strings containing non-ASCII characters must be unicode. Movie, format, and theater names should be exact matches. It is recommended to look them up on the official site.

time_range is a tuple of length 2, consisting of strings for times in HH:MM format. Hour can be larger than 23, so '25:00' is a valid time string. If you do not want to restrict times, set it to None.

date_range is a list of strings, each either a date in yyyymmdd format or a short weekday name such as 'mon' or 'fri'. 'weekdays' and 'weekends' are shortcuts for ['mon', 'tue', 'wed', 'thu', 'fri'] and ['sat', 'sun'] respectively. 'today' is the current date in Korea Standard Time (UTC+9).

seat_range is a list of seat names such as ‘A1’ and ‘F18’.

The following is an example tuple: ('Gravity', 'IMAX3D', 'Wangsimni', ('18:00', '24:30'), ['1130', 'fri', 'weekends'], ['F16', 'F17', 'G16', 'G17'])
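
To make the time and date conventions concrete, the sketch below shows one way such values could be interpreted. The helper names are hypothetical and this is not the code CGV uses internally. It assumes that both ends of time_range are inclusive, which the documentation does not specify.

```python
def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes past midnight; hours may exceed 23."""
    hours, minutes = hhmm.split(':')
    return int(hours) * 60 + int(minutes)


def in_time_range(start_time, time_range):
    """Check a showtime against time_range (None means unrestricted)."""
    if time_range is None:
        return True
    earliest, latest = time_range
    # Assumption: both bounds are inclusive.
    return to_minutes(earliest) <= to_minutes(start_time) <= to_minutes(latest)


# The two documented shortcuts.
SHORTCUTS = {
    'weekdays': ['mon', 'tue', 'wed', 'thu', 'fri'],
    'weekends': ['sat', 'sun'],
}


def expand_date_range(date_range):
    """Replace 'weekdays' and 'weekends' with the weekday names they stand for."""
    expanded = []
    for item in date_range:
        expanded.extend(SHORTCUTS.get(item, [item]))
    return expanded


print(to_minutes('25:00'))
print(in_time_range('24:15', ('18:00', '24:30')))
print(expand_date_range(['1130', 'fri', 'weekends']))
```

Treating '25:00' as 1500 minutes past midnight keeps late-night showings in order after the evening ones without any date arithmetic.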

webalerts.notifications

This package contains notification implementations.

A notification class should be implemented in the following way:
  • The constructor accepts a config dictionary.
  • Implements notify(). It should take a list of posts and do what it is supposed to do, such as sending emails to users. It must handle all exceptions expected in normal use and raise only instances of NotificationException or ConfigurationError if necessary.
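
As a sketch of this contract, the following notification writes one line per post to a stream. StreamNotification and its 'stream' option are hypothetical, and Post and NotificationException are local stand-ins so that the example runs without the package installed. A real class would raise webalerts.exceptions.NotificationException.

```python
import sys
from collections import namedtuple

# Local stand-ins for webalerts.Post and
# webalerts.exceptions.NotificationException.
Post = namedtuple('Post', ['url', 'title'])


class NotificationException(Exception):
    pass


class StreamNotification(object):
    """Hypothetical notification that writes matched posts to a stream."""

    def __init__(self, config):
        # 'stream' is a made-up option; it defaults to standard output.
        self.stream = config.get('stream', sys.stdout)

    def notify(self, posts):
        try:
            for post in posts:
                self.stream.write('%s <%s>\n' % (post.title, post.url))
        except (IOError, ValueError) as e:
            # Writing to a closed stream raises ValueError; report both
            # kinds of failure as NotificationException.
            raise NotificationException(str(e))


notification = StreamNotification({})
notification.notify([Post(url='http://example.com/1', title='Hello')])
```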
class webalerts.notifications.email.EmailNotification(config)

Sends emails to users on matched posts. Although you can specify any SMTP server to use to send emails, it is recommended not to use your own mail server as many email services refuse to receive emails from unknown sources. If you want to send emails using Gmail, set host to 'smtp.gmail.com', port to 587, secure to True, and username and password to your account.

Configuration options:

to_addrs
List of email addresses to which notifications are sent.
from_addr (default: 'WebAlerts <webalerts@localhost>')
“From” address of notification emails.
host (default: None)
Optional host parameter used to create a smtplib.SMTP instance.
port (default: None)
Optional port parameter used to create a smtplib.SMTP instance.
secure (default: False)
Whether the SMTP connection should be secure or not.
username (default: None)
SMTP username.
password (default: None)
SMTP password.
style (default: see the source)
CSS styles to be placed in <head>.
template (default: see the source)
HTML template for each post.
layout (default: see the source)
HTML template for the whole email.
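
Putting the Gmail settings described above together with the options in this list, a notification entry could look like the following. The account name and password are placeholders; the other values come from the description of this class.

```python
email_config = {
    'class': 'webalerts.notifications.email.EmailNotification',
    'to_addrs': ['me@example.com'],
    'host': 'smtp.gmail.com',
    'port': 587,
    'secure': True,
    'username': 'your.account@gmail.com',  # placeholder
    'password': 'your-password',           # placeholder
}

# The entry goes under the 'notifications' key of the app configuration.
config = {'notifications': {'email': email_config}}
```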
class webalerts.notifications.console.ConsoleNotification(config)

Prints the titles and URLs of the matched posts to standard output. It is intended for debugging.

There are no configuration values for this notification.

webalerts.exceptions

exception webalerts.exceptions.ConfigurationError

Bases: webalerts.exceptions.WebAlertsException

Raised when there is an error in the configuration.

exception webalerts.exceptions.LoginError

Bases: webalerts.exceptions.SiteException

Raised when website authentication fails.

exception webalerts.exceptions.NotificationException

Bases: webalerts.exceptions.WebAlertsException

Raised when a notification-related error occurs, e.g. it fails to send emails.

exception webalerts.exceptions.ParseError

Bases: webalerts.exceptions.SiteException

Raised when it fails to parse the returned HTML.

exception webalerts.exceptions.SiteException

Bases: webalerts.exceptions.WebAlertsException

Raised when a website-related error occurs, e.g. it fails to log in or there is a network problem.

exception webalerts.exceptions.WebAlertsException

Bases: exceptions.Exception

Root class of all exceptions defined in webalerts.
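
The hierarchy above determines which except clause catches which error. The sketch below mirrors the documented base classes with local stand-in definitions so that it runs without the package installed. In real code you would import these names from webalerts.exceptions instead. The classify helper is hypothetical.

```python
# Local mirror of the documented hierarchy, for illustration only.
class WebAlertsException(Exception):
    pass


class SiteException(WebAlertsException):
    pass


class LoginError(SiteException):
    pass


class ParseError(SiteException):
    pass


class NotificationException(WebAlertsException):
    pass


class ConfigurationError(WebAlertsException):
    pass


def classify(exc):
    """Hypothetical helper showing which handler each exception reaches."""
    try:
        raise exc
    except SiteException:
        return 'site'          # also catches LoginError and ParseError
    except NotificationException:
        return 'notification'
    except WebAlertsException:
        return 'other'         # e.g. ConfigurationError


print(classify(LoginError('bad password')))
print(classify(ConfigurationError('missing option')))
```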
