DIY Comments - Step 2

11 September, 2024. 1,225 words.

The continuing adventures of adding homegrown DIY comments to my static blog.

Last we left off, we’d implemented an AWS Lambda function to receive HTTP POST requests from a comment form. Now we need to save the comments and check for spam.

Checking For Spam

We receive our comments from a plain HTML form, so we’re going to get spam from bots. It’s incredibly easy to send spam to an HTML comment form. My blog isn’t that popular, but I still got my first spam comment within six hours of adding the form. Luckily, I don’t get that many.

Here are some of the ways I’m checking for spam in the Lambda form processing function:

A honeypot form field (surprisingly effective)
Verifying that form fields have what they’re supposed to (also effective)
Verifying referrer
Akismet API call
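
The first three checks in that list are cheap enough to run before any network call. Here is a minimal sketch of how they might look. It is not the actual code from my Lambda function: the honeypot field name (website), the length limits, and the ALLOWED_HOST value are all assumptions made up for illustration.

```javascript
// Assumption: the blog's own hostname. Replace with the real one.
const ALLOWED_HOST = 'example.com';

// Returns a short reason string if the submission looks like spam, or null if it passes.
function looksLikeSpam(fields, referrer) {
    // Honeypot: a field hidden from humans, so any value means a bot filled it in.
    // "website" is a hypothetical field name.
    if (fields.website) return 'honeypot';

    // Required fields must be non-empty and of a sane length (limits are arbitrary).
    const name = String(fields.name ?? '').trim();
    const email = String(fields.email ?? '').trim();
    const comment = String(fields.comment ?? '').trim();
    if (!name || name.length > 100) return 'bad name';
    if (!email || email.length > 200) return 'bad email';
    if (!comment || comment.length > 5000) return 'bad comment';

    // Referrer must parse as a URL and point at our own host.
    try {
        if (new URL(referrer).hostname !== ALLOWED_HOST) return 'bad referrer';
    } catch {
        return 'bad referrer';
    }
    return null;
}
```

Returning a reason string rather than a boolean makes it easier to log which check caught a given submission.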

Akismet is surprisingly easy to use even without an active WordPress site. Here are the key components for adding it to the Lambda form processing function:

const akismetClient = akismet.client({
    key: process.env.AKISMET_KEY,
    blog: process.env.AKISMET_SITE,
});

const userIp = event.context.sourceIp;
const userAgent = event.context['user-agent'];
const referrer = event.headers['Referer'];

const akismetData = {
    user_ip: userIp,
    user_agent: userAgent,
    referrer: referrer || '',
    comment_type: 'comment',
    comment_author: name,
    comment_author_email: email,
    comment_content: comment
};

const isSpam = await akismetClient.checkSpam(akismetData);

Honestly, though, some basic validation of the form fields and a honeypot field screens out 99% of the spam I’ve seen so far, before having to call Akismet.

One thing I’m not doing (yet?) is requiring client-side Javascript to post the form. I did that in the past, and it eliminated 100% of spam. I’m led to believe spam bots are more sophisticated now, so that’s not as effective anymore. There are also accessibility concerns with requiring Javascript, and there are actually people in the world who voluntarily disable Javascript on web sites. So I’m trying to enable basic support for no-Javascript comment posting. (That comes with its own set of problems which I’ll leave as an exercise for the reader. I’m looking at you, 302 status code response.)
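
For the no-Javascript case, the browser navigates to whatever the form handler returns, so the usual approach is to answer the POST with a redirect back to the post. The sketch below shows one way that response might be built. It assumes an API Gateway Lambda proxy integration (where the function returns statusCode, headers, and body directly), and SITE_ORIGIN and redirectToPage are hypothetical names, not part of my actual function.

```javascript
// Assumption: the blog's public origin. Replace with the real one.
const SITE_ORIGIN = 'https://example.com';

// Builds a 302 response that sends the browser back to the page the comment was posted on.
function redirectToPage(page) {
    let location = SITE_ORIGIN + '/';
    try {
        const target = new URL(page, SITE_ORIGIN);
        // Only redirect within our own site, so the form can't be used as an open redirect.
        if (target.origin === SITE_ORIGIN) location = target.href;
    } catch {
        // Unparseable page value: fall back to the home page.
    }
    return {
        statusCode: 302,
        headers: { Location: location },
        body: '',
    };
}
```

With a non-proxy integration, the status code and Location header have to be mapped in the API Gateway configuration instead, which is where most of the fiddliness comes from.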

Protecting against script kiddies and their form posts has been fairly straightforward for over a decade, but if someone targets the API directly, that’s a slightly different story.

How I’m calling the API is a completely open secret: anyone with basic web development skills can see it by viewing the source code of my site (same as with any site). So anyone in the world, including me for testing, can start writing curl commands to send comments to my blog. Protecting against that is another topic.

In the last week, I’ve seen a variety of attacks on both the FORM and the API directly. Not as many as, say, the Reddit front page gets, but enough that you can’t ignore them. If anyone else is building comments, you’ve got to build in protection from the very beginning.

Incidentally, this is one reason I’m very sympathetic to people disabling comments on blogs, and why I’ve been slow to work on this. Comments are almost always a wide open gate for bad actors to drive a truck full of threats into your site. If you run a managed blog, you don’t have to worry about that as much, but somebody, somewhere does, on your behalf. It’s easy to implement comments, but it’s less easy to implement secure comments.

Saving Comments

I tend to use AWS for all my cloud computing needs because 1) it’s one of the big three industry standards (Google and Azure being the others, not counting Alibaba which is the fourth one we don’t talk about), and 2) I interact with AWS at work all the time so I’m most familiar with it.

So I save my comments to an AWS DynamoDB table. DynamoDB is a NoSQL key-value and document database that’s pretty cheap to operate. Saving is implemented with some additions to the form handler Lambda function. The key bits are:

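import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, PutCommand } from '@aws-sdk/lib-dynamodb';
import { v4 as uuidv4 } from 'uuid';

const client = new DynamoDBClient({ region: 'us-east-1' });
const ddbDocClient = DynamoDBDocumentClient.from(client);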

const params = {
    TableName: 'my_table_name',
    Item: {
        id: uuidv4(),
        date: commentDate,
        author: name,
        author_id: `email:${email}`,
        origin: origin,
        page: page,
        content: comment,
        source: 'form',
    }
};

const data = await ddbDocClient.send(new PutCommand(params));

The author_id is a bit of a thorny issue. It’s intended to be a unique ID for each author, and, if we had a real user management system, it would include some sort of authorization token.

We don’t have user accounts yet, so for now, it’s just an email address. So initially, as with pretty much any blog that allows anonymous comments, it’ll be super easy for anyone to impersonate anyone else. There’s a minimal amount of validation, but to make a long story short, there’s a reason sites moved to more complicated comment logins.

(Technically, you can type any string whatsoever into the email box, so using a randomly-generated password-like string would be better, so long as you can remember it. I’m never going to send anyone an actual email.)

Getting Comments

Now we need another API function to read comments from the Dynamo table. This will be used by the client-side Javascript I’ll have to add to the web site.

I set up another API Gateway and another AWS Lambda function for this. This part was very straightforward. Here are the basics:

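import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, ScanCommand } from '@aws-sdk/lib-dynamodb';

const client = new DynamoDBClient({ region: 'us-east-1' });
const ddbDocClient = DynamoDBDocumentClient.from(client);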

export const handler = async (event) => {
    const page = event.page;

    const params = {
      TableName: 'my_comment_table',
      ProjectionExpression: '#dt, #au, #co',
      ExpressionAttributeNames: {
        "#dt": "date", // reserved word
        "#au": "author",
        "#co": "content", // reserved word
      },
      FilterExpression: 'page = :id',
      ExpressionAttributeValues: {
        ':id': page,
      },
    };
  
    const data = await ddbDocClient.send(new ScanCommand(params));

    // sort by date
    const items = data.Items;
    items.sort((a, b) => new Date(a.date) - new Date(b.date));

    return {
        statusCode: 200,
        body: items,
        headers: {
            'Content-Type': 'application/json',
        },
    };
}

This can be expanded to include “comments” like webmentions or Mastodon replies, but that’s a future problem.
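
One caveat with the Scan approach: a single Scan call reads at most 1 MB of table data, and the FilterExpression is applied after that read, so once the table grows, one call may not return every comment for a page. When there is more to read, the response includes a LastEvaluatedKey that is passed back as ExclusiveStartKey on the next call. Here is a minimal sketch of that loop. The scanAllPages helper and its sendScan parameter are hypothetical; sendScan stands in for something like (params) => ddbDocClient.send(new ScanCommand(params)).

```javascript
// Repeats a Scan until DynamoDB stops returning a LastEvaluatedKey.
// sendScan is a hypothetical wrapper: async (params) => scan response.
async function scanAllPages(sendScan, params) {
    const items = [];
    let startKey;
    do {
        // Only add ExclusiveStartKey once we have one from a previous response.
        const request = startKey ? { ...params, ExclusiveStartKey: startKey } : params;
        const data = await sendScan(request);
        items.push(...(data.Items ?? []));
        startKey = data.LastEvaluatedKey;
    } while (startKey);
    return items;
}
```

For a small blog, the single-call version is probably fine for a long time; this just avoids comments silently going missing later.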

The API Gateway was very easy to set up for this as we just need to pass a page parameter to the Lambda function in a very conventional way. (The page being the blog post that we want to fetch comments for. I’m just using the path component of the URL without the domain name. That might change in the future but for now it lets me test locally.)
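
To illustrate why using only the path component helps with local testing, here is a tiny sketch. The pageKey function name and the example URLs are made up for illustration.

```javascript
// Derives the page parameter from a full URL by keeping only the path component.
function pageKey(url) {
    return new URL(url).pathname;
}

// The same post yields the same key on the live site and on a local dev server.
const live = pageKey('https://example.com/posts/diy-comments-2/');
const local = pageKey('http://localhost:1313/posts/diy-comments-2/');
console.log(live === local); // true
```

In the browser, window.location.pathname gives the same value without any parsing.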

ChatGPT Makes This Easy

I should mention here that I’ve used ChatGPT 4 a lot for getting started on this.

I know it’s not cool to mention that AI is game-changingly useful for menial software development tasks, but it is. Being able to type in something like “Write an AWS Lambda function in ES6 Javascript to process a POST request from an HTTP FORM for blog comments” and get a mostly-working code sample as output is a huge time-saver.

That’s it for now. Future steps include:

Client-side Javascript to fetch and display recent comments from the API
Importing my blog comment history into local data files inside my Hugo content directory
Build-time Hugo templates to query local comment data archives to render older comments as static HTML
Notifying me when comments are received
Administrative tools so I can manage comments
A way to submit comments using some kind of third-party OAuth like Google if you want to
The list goes on and on
