Nisanth Chunduru

Generate captchas with Go lang

2024-04-20T00:00:00Z

While exploring the AI/ML landscape, I stumbled upon the fantastic Dive into Deep Learning book. Chapters 8 and 9 of the book introduce Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). To better understand these neural networks, I decided to create a CNN or a CRNN (Convolutional Recurrent Neural Network) that can solve captchas using the popular TensorFlow Keras library. Unfortunately, neural networks often have to be trained on a large number of samples to achieve acceptable accuracy. As my primary goal was to improve my understanding of these networks, I wanted to create a large training dataset in as little time as possible. Collecting thousands of captchas from websites and solving them myself would have taken an unreasonable amount of time. To get around this problem, I decided to write a script that generates captchas instead (before you ask, I did later discover Hugging Face Datasets and other dataset sources).

I started searching for an existing captcha generator and found a wonderful one, https://github.com/dchest/captcha, written in Go, that generates relatively readable captchas. As dchest-captcha’s WriteImage function generates one captcha at a time, I wrote the script below to generate many captchas at once:

package main

import (
    "fmt"
    "os"
    "path"
    "path/filepath"
    "strconv"
    "strings"

    "github.com/dchest/captcha"
)

func main() {
    if len(os.Args) < 3 {
        fmt.Println("Usage: go run generate_captchas.go <count> <destination_directory>")
        os.Exit(1)
    }

    count, err := strconv.Atoi(os.Args[1])
    if err != nil {
        fmt.Println("Please provide the number of captchas you'd like to generate as the first argument")
        os.Exit(1)
    }

    destinationDirectory := filepath.Clean(os.Args[2])

    generateCaptchas(count, destinationDirectory)
}

func generateCaptchas(count int, destinationDirectory string) {
    if err := os.MkdirAll(destinationDirectory, 0755); err != nil {
        panic(err)
    }

    startIndex := findStartIndex(destinationDirectory)

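    // Numbering resumes after the highest index already present, so a rerun
    // tops the directory up to <count> files instead of overwriting them.
    for i := startIndex; i < count; i++ {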
        captchaDigits := captcha.RandomDigits(6)
        captchaDummyId := "dummyId"
        captchaWidth := 120
        captchaHeight := 80
        captchaImage := captcha.NewImage(captchaDummyId, captchaDigits, captchaWidth, captchaHeight)
        captchaText := make([]byte, len(captchaDigits))
        for j, digit := range captchaDigits {
            captchaText[j] = digit + '0'
        }

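        // File names follow the pattern <index>_<digits>.png, e.g. 1_483920.png
        captchaFileName := fmt.Sprintf("%d_%s.png", i+1, string(captchaText))
        captchaFilePath := path.Join(destinationDirectory, captchaFileName)
        file, err := os.Create(captchaFilePath)
        if err != nil {
            panic(err)
        }
        // Close the file inside the loop. A defer here would keep every file
        // open until generateCaptchas returns and could exhaust file descriptors.
        _, err = captchaImage.WriteTo(file)
        closeErr := file.Close()
        if err != nil {
            panic(err)
        }
        if closeErr != nil {
            panic(closeErr)
        }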
    }
}

func findStartIndex(destinationDirectory string) int {
    files, err := os.ReadDir(destinationDirectory)
    if err != nil {
        panic(err)
    }

    maxIndex := 0
    for _, file := range files {
        name := file.Name()
        if strings.HasSuffix(name, ".png") {
            parts := strings.Split(name, "_")
            if len(parts) > 0 {
                index, err := strconv.Atoi(parts[0])
                if err == nil && index > maxIndex {
                    maxIndex = index
                }
            }
        }
    }
    return maxIndex
}
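
The script adds '0' to each byte because captcha.RandomDigits returns raw digit values in the range 0-9 rather than ASCII characters. The sketch below isolates that conversion using only the standard library; digitsToText is a hypothetical helper name for illustration and is not part of the script or of dchest-captcha.

```go
package main

import "fmt"

// digitsToText converts raw digit values (0-9) into their ASCII characters.
// Hypothetical helper: it mirrors the inner loop of generateCaptchas.
func digitsToText(digits []byte) string {
	text := make([]byte, len(digits))
	for i, digit := range digits {
		// '0' is byte 48, so 4 + '0' yields the character '4'.
		text[i] = digit + '0'
	}
	return string(text)
}

func main() {
	// Assumed sample values standing in for the output of captcha.RandomDigits(6).
	raw := []byte{4, 8, 3, 9, 2, 0}
	fmt.Println(digitsToText(raw)) // prints 483920
}
```

Without the conversion, string(raw) would contain unprintable control characters, which would make for unusable file names.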

To use the script, pass the number of captchas you’d like it to generate, followed by the directory where the captchas should be saved:

go run generate_captchas.go 1000 data/captchas/train
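
Because each file is named <index>_<digits>.png, the solution of a captcha can be recovered from its file name when the dataset is loaded for training. Here is a minimal sketch of that step, assuming the naming scheme used by the script above; labelFromFileName is a hypothetical helper and the sample path is made up for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// labelFromFileName extracts the digits from a name such as "12_483920.png".
// Hypothetical helper: it assumes the <index>_<digits>.png naming scheme.
func labelFromFileName(name string) (string, error) {
	base := strings.TrimSuffix(filepath.Base(name), ".png")
	index, label, found := strings.Cut(base, "_")
	if !found || index == "" || label == "" {
		return "", fmt.Errorf("unexpected captcha file name: %q", name)
	}
	// Reject anything that is not purely digits, such as a wrong extension.
	for _, r := range label {
		if r < '0' || r > '9' {
			return "", fmt.Errorf("non-digit label in %q", name)
		}
	}
	return label, nil
}

func main() {
	label, err := labelFromFileName("data/captchas/train/12_483920.png")
	if err != nil {
		panic(err)
	}
	fmt.Println(label) // prints 483920
}
```

Keeping the label in the file name avoids a separate labels file, at the cost of having to parse names when loading the data.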

dchest-captcha generates fairly readable captchas most of the time (but not always). On the off chance that you’re unhappy with the quality of the captchas it generates, clone its git repo and tweak its image generation code to reduce wave distortion, reduce digit skew, etc.: https://github.com/dchest/captcha/blob/master/image.go
