AutoOCR

Repository

I built AutoOCR to remove a recurring friction point in my workflow: extracting text from images or non-selectable UI elements. It pairs the Tesseract OCR engine with a low-footprint Rust background service. A global hotkey triggers the service, which reads the image currently on the clipboard, runs OCR on it, and writes the extracted text back to the clipboard in about a second.

Multi-threaded Background Service

To keep the application responsive without affecting system performance, I implemented a multi-threaded architecture. A dedicated background thread uses the device_query crate to poll the keyboard state for the Shift + Alt + O trigger (the code checks the left Shift and left Alt keys specifically). By sleeping for 50ms between polls with thread::sleep, I kept CPU usage below 0.1% while keeping hotkey detection snappy. A one-second cooldown after each trigger stops a single held press from firing repeatedly.

use std::{thread, time::Duration};

use arboard::Clipboard;
use device_query::{DeviceQuery, DeviceState, Keycode};

thread::spawn(move || {
    let device_state = DeviceState::new();
    let mut clipboard = Clipboard::new().unwrap();

    // Define path to tessdata relative to the EXE
    let mut tessdata_path = std::env::current_exe().unwrap();
    tessdata_path.pop();
    tessdata_path.push("tessdata");

    loop {
        // Snapshot of every key that is currently held down
        let keys = device_state.get_keys();

        // Trigger: (left) Shift + (left) Alt + O
        if keys.contains(&Keycode::LShift)
            && keys.contains(&Keycode::LAlt)
            && keys.contains(&Keycode::O)
        {
            // Only proceed when the clipboard currently holds an image
            if let Ok(image) = clipboard.get_image() {
                if let Some(text) = perform_ocr(&image, &tessdata_path) {
                    let cleaned = text.trim().to_string();
                    if !cleaned.is_empty() {
                        let _ = clipboard.set_text(cleaned);
                        // notify is the app's own notification helper
                        notify("AutoOCR", "Text copied to clipboard!");
                    }
                }
            }
            // Cooldown to prevent multiple triggers
            thread::sleep(Duration::from_millis(1000));
        }
        // Poll interval: keeps the loop from spinning the CPU
        thread::sleep(Duration::from_millis(50));
    }
});
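The fixed one-second cooldown above is simple, but it still re-triggers if the keys are held longer than a second. One alternative, which is not what AutoOCR does, is to fire only on the transition from "not pressed" to "pressed". The sketch below shows that idea with the standard library only; HotkeyEdge and update are hypothetical names, and the boolean passed in stands for the combined keys.contains(...) check from the loop.

```rust
/// Remembers whether the hotkey was held on the previous poll so that one
/// physical press produces exactly one trigger. (Hypothetical helper.)
#[derive(Debug, Default)]
pub struct HotkeyEdge {
    was_down: bool,
}

impl HotkeyEdge {
    /// Call once per poll with the current "all hotkey keys are held" state.
    /// Returns true only on the transition from released to pressed.
    pub fn update(&mut self, is_down: bool) -> bool {
        let fired = is_down && !self.was_down;
        self.was_down = is_down;
        fired
    }
}
```

In the polling loop, the OCR branch would then run only when update returns true, and the 50ms sleep would remain the only delay.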

Tesseract OCR Pipeline

The core OCR logic uses the leptess crate, a high-level wrapper around the Tesseract engine and Leptonica. The pipeline takes the raw RGBA data that arboard reads from the clipboard, drops the alpha channel to get RGB, and encodes the result as an in-memory PNG that Leptonica can decode. I preallocate the RGB buffer with Vec::with_capacity to avoid reallocations during the conversion, and I load several language packs (eng+deu+hin+pol+rus) so that text in Latin, Cyrillic, and Devanagari scripts can be recognized.

use std::path::Path;

use arboard::ImageData;
use image::{DynamicImage, RgbImage};
use leptess::LepTess;

fn perform_ocr(img: &ImageData, path: &Path) -> Option<String> {
    // Bail out early if the tessdata folder is missing
    if !path.exists() { return None; }

    // Initialize Tesseract with multiple language packs
    let mut lt = LepTess::new(Some(path.to_str()?), "eng+deu+hin+pol+rus").ok()?;

    // Fast RGBA to RGB conversion: copy R, G, B and skip the alpha byte
    let mut rgb_data = Vec::with_capacity(img.width * img.height * 3);
    for chunk in img.bytes.chunks_exact(4) {
        rgb_data.push(chunk[0]); // R
        rgb_data.push(chunk[1]); // G
        rgb_data.push(chunk[2]); // B
    }

    // Encode the RGB pixels as an in-memory PNG for Leptonica to decode
    let img_buffer = RgbImage::from_raw(img.width as u32, img.height as u32, rgb_data)?;
    let mut buffer = std::io::Cursor::new(Vec::new());
    DynamicImage::ImageRgb8(img_buffer)
        .write_to(&mut buffer, image::ImageFormat::Png)
        .ok()?;

    lt.set_image_from_mem(buffer.get_ref()).ok()?;
    lt.get_utf8_text().ok()
}
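The RGBA to RGB step can be pulled out and checked on its own. The following is a minimal sketch using only the standard library; rgba_to_rgb is a hypothetical helper, not a function from AutoOCR or arboard, and the length check is an added safeguard that the original loop does not perform.

```rust
/// Drops the alpha channel from tightly packed RGBA pixels.
/// Returns None when the buffer length is not width * height * 4,
/// or when that product overflows usize. (Hypothetical helper.)
pub fn rgba_to_rgb(width: usize, height: usize, rgba: &[u8]) -> Option<Vec<u8>> {
    let pixels = width.checked_mul(height)?;
    if rgba.len() != pixels.checked_mul(4)? {
        return None;
    }

    // Reserve the exact output size up front to avoid reallocations.
    let mut rgb = Vec::with_capacity(pixels * 3);
    for px in rgba.chunks_exact(4) {
        // Copy R, G, B and skip the fourth (alpha) byte.
        rgb.extend_from_slice(&px[..3]);
    }
    Some(rgb)
}
```

For a 2x1 image with bytes [1, 2, 3, 255, 4, 5, 6, 0], this returns Some(vec![1, 2, 3, 4, 5, 6]). A mismatched buffer yields None rather than a silently truncated image.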

System Tray Integration

Instead of a traditional window, AutoOCR resides in the system tray to stay out of the user's way. I used the tray-icon crate to build a minimalist menu that allows users to quit the application or check the current version. This approach ensures the utility is always available without cluttering the taskbar, adhering to modern Windows utility design patterns.

use tray_icon::{
    menu::{Menu, MenuItem},
    TrayIconBuilder,
};

// Setup Tray Menu for background operation
let tray_menu = Menu::new();
let quit_item = MenuItem::new("Quit AutoOCR", true, None);
let quit_id = quit_item.id();
tray_menu.append(&quit_item).unwrap();

// load_icon is the app's own helper that returns the tray icon
let icon = load_icon();
// Keep this binding alive: dropping the TrayIcon removes it from the tray
let _tray_icon = TrayIconBuilder::new()
    .with_menu(Box::new(tray_menu))
    .with_tooltip("AutoOCR - Shift+Alt+O")
    .with_icon(icon)
    .build()
    .unwrap();
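A Quit item raises the question of how the polling thread stops. The source does not show this part, so the following is only a sketch of one common approach: a shared stop flag that the quit handler sets and the worker checks on every poll. spawn_poller and its parameters are hypothetical names, and only the standard library is used.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Spawns a worker that calls `on_tick` every `interval` until `stop` is set.
/// (Hypothetical helper; `on_tick` stands in for the hotkey check.)
pub fn spawn_poller<F>(stop: Arc<AtomicBool>, interval: Duration, mut on_tick: F) -> JoinHandle<()>
where
    F: FnMut() + Send + 'static,
{
    thread::spawn(move || {
        // The flag is checked once per poll, so shutdown takes at most
        // one tick plus one interval.
        while !stop.load(Ordering::Relaxed) {
            on_tick();
            thread::sleep(interval);
        }
    })
}
```

The handler for the Quit menu event would call stop.store(true, Ordering::Relaxed) and then join the returned handle before exiting, so the worker never gets cut off in the middle of a clipboard write.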

Technical Impact

By automating the "Screenshot -> Tesseract -> Clipboard" workflow, AutoOCR reduces a 30-second manual process to a 1-second background task. The use of Rust ensures memory safety and high performance, making it a reliable companion for daily productivity. The project also taught me about Windows FFI, clipboard serialization, and the complexities of distributing pre-compiled Tesseract binaries alongside a Rust executable.
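Because the language packs ship next to the executable, it can help to check which ones are present before building the language string. The sketch below is an assumption about how that could be done, not code from AutoOCR: available_languages is a hypothetical helper, and it relies only on Tesseract's convention of naming language data files <lang>.traineddata.

```rust
use std::path::Path;

/// Builds a Tesseract language string such as "eng+rus" from the wanted
/// languages whose `<lang>.traineddata` file exists in `tessdata_dir`.
/// Returns None if none of them is present. (Hypothetical helper.)
pub fn available_languages(tessdata_dir: &Path, wanted: &[&str]) -> Option<String> {
    let found: Vec<&str> = wanted
        .iter()
        .copied()
        .filter(|lang| tessdata_dir.join(format!("{}.traineddata", lang)).is_file())
        .collect();

    if found.is_empty() {
        None
    } else {
        // Tesseract accepts several languages joined with '+'.
        Some(found.join("+"))
    }
}
```

The result could be passed to LepTess::new in place of the hard-coded "eng+deu+hin+pol+rus", so the language string only names packs that were actually shipped.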

Try It Out

Download

To try AutoOCR yourself, click the download button and install it from GitHub.