# Voter Image Import Feature - OCR-based Extraction

## Overview
This feature extracts voter information from images (screenshots of PDF electoral rolls) using OCR (Optical Character Recognition). Unlike the PDF import feature, it does **NOT** save any data to the database; it only extracts the data and returns it as a JSON response.

## Prerequisites

### System Requirements
- **Tesseract OCR** must be installed on your system
- PHP 8.4+ with GD or Imagick extension

### Install Tesseract OCR

#### macOS (using Homebrew)
```bash
brew install tesseract
```

#### Ubuntu/Debian
```bash
sudo apt-get install tesseract-ocr
```

#### Windows
Download and install from: https://github.com/UB-Mannheim/tesseract/wiki

### Verify Installation
```bash
tesseract --version
```
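
The same check can be run from PHP, for example in a health check or before the first upload. This is only a sketch: `tesseractVersion()` is a hypothetical helper, not part of the feature's code, and it assumes `exec()` is enabled and the binary is named `tesseract` on the PATH.

```php
<?php
declare(strict_types=1);

/**
 * Return the Tesseract version string, or null if the binary cannot be run.
 * $binary is an assumption: adjust it if Tesseract is not on the PATH.
 */
function tesseractVersion(string $binary = 'tesseract'): ?string
{
    $output = [];
    $exitCode = 1;

    // 2>&1 because some Tesseract builds print the version to stderr.
    exec(escapeshellarg($binary) . ' --version 2>&1', $output, $exitCode);

    if ($exitCode !== 0 || $output === []) {
        return null;
    }

    // The first line typically looks like "tesseract 5.3.0" or "tesseract v5.3.0...".
    if (preg_match('/^tesseract\s+v?(\S+)/i', $output[0], $m) === 1) {
        return $m[1];
    }

    return null;
}

$version = tesseractVersion();
echo $version === null
    ? "Tesseract OCR not found\n"
    : "Tesseract {$version} is available\n";
```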

## API Endpoint

### POST `/api/image-import/upload`

**Purpose:** Extract voter data from an image without saving it to the database

**Request:**
- **Method:** POST
- **Content-Type:** multipart/form-data
- **Body:**
  - `image`: Image file (jpg, jpeg, png, gif, bmp, webp, tiff)
    - **Max Size:** 10MB

**Supported Image Formats:**
- JPG/JPEG
- PNG
- GIF
- BMP
- WEBP
- TIFF
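
A client can apply the same limits before uploading to avoid a wasted round trip. The sketch below is a client-side pre-check only, not the server's validation: `checkImageBeforeUpload()` is a hypothetical helper, it looks at the file extension rather than the actual content, and it assumes "10MB" means 10 × 1024 × 1024 bytes.

```php
<?php
declare(strict_types=1);

// Values taken from the request description above.
// Assumption: 10MB means 10 * 1024 * 1024 bytes.
const ALLOWED_EXTENSIONS = ['jpg', 'jpeg', 'png', 'gif', 'bmp', 'webp', 'tiff'];
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;

/**
 * Hypothetical helper: returns a list of problems,
 * or an empty array if the file looks uploadable.
 */
function checkImageBeforeUpload(string $path): array
{
    if (!is_file($path)) {
        return ["File not found: {$path}"];
    }

    $problems = [];

    // Extension check only; the server may inspect the real file content.
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if (!in_array($extension, ALLOWED_EXTENSIONS, true)) {
        $problems[] = 'Unsupported extension: ' . ($extension === '' ? '(none)' : $extension);
    }

    $size = filesize($path);
    if ($size === false || $size > MAX_UPLOAD_BYTES) {
        $problems[] = 'File is larger than 10MB or its size could not be read';
    }

    return $problems;
}

foreach (checkImageBeforeUpload('/path/to/voter_screenshot.jpg') as $problem) {
    echo $problem, PHP_EOL;
}
```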

**Response:**
```json
{
  "success": true,
  "message": "Image processed successfully",
  "data": {
    "voters": [
      {
        "voter_id": "ABC1234567",
        "name": "John Doe",
        "age": 35,
        "gender": "M",
        "year_of_birth": 1990,
        "relation_name": "Father Name",
        "serial_number": "1",
        "source_line": 15
      }
    ],
    "booth_info": {
      "part_number": "2",
      "booth_number": "2",
      "booth_address": null
    },
    "metadata": {
      "text_length": 3245,
      "processing_time": "2.5s",
      "image_dimensions": {
        "width": 1920,
        "height": 1080,
        "type": "image/jpeg"
      },
      "extraction_method": "tesseract-ocr"
    }
  }
}
```

## Usage Examples

### Using Postman

1. **Create New Request**
   - Method: `POST`
   - URL: `http://your-domain/api/image-import/upload`

2. **Set Body**
   - Select `form-data`
   - Add key: `image`
   - Type: `File`
   - Select your image file

3. **Send Request**

### Using cURL

```bash
curl -X POST http://your-domain/api/image-import/upload \
  -F "image=@/path/to/voter_screenshot.jpg"
```

### Using PHP

```php
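$curl = curl_init();

// Third argument is the filename reported to the server; keep the real extension.
$file = new CURLFile('/path/to/voter_screenshot.jpg', 'image/jpeg', 'voter_screenshot.jpg');

curl_setopt_array($curl, [
    CURLOPT_URL => 'http://your-domain/api/image-import/upload',
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => ['image' => $file],
]);

$response = curl_exec($curl);
if ($response === false) {
    // Transport-level failure (DNS, connection refused, timeout, ...)
    exit('Request failed: ' . curl_error($curl) . PHP_EOL);
}
// curl_close() is omitted: it has been a no-op since PHP 8.0.

$result = json_decode($response, true);
if (!is_array($result) || empty($result['success'])) {
    exit('Image import failed: ' . ($result['message'] ?? 'unexpected response') . PHP_EOL);
}

print_r($result['data']['voters']);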

## How It Works

### OCR Extraction Process

1. **Image Upload**
   - Validates image format and size
   - Accepts multiple image formats

2. **Text Extraction**
   - Uses Tesseract OCR to extract text from image
   - Supports English language electoral rolls
   - Handles image-based PDF screenshots

3. **Data Parsing**
   - Extracts Part/Booth numbers
   - Identifies voter records using multiple patterns
   - Parses voter ID (EPIC), name, age, gender
   - Extracts relation information (Father/Husband/Wife)

4. **Response Generation**
   - Returns structured JSON data
   - Includes extraction metadata
   - No database operations performed
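
The text extraction step can be sketched as a direct call to the Tesseract CLI. This is an illustration under assumptions, not the actual code in `VoterImageImportService`: the function names are hypothetical, and the `2>/dev/null` redirect assumes a Unix-like shell. `stdout` as the output base and `-l eng` are standard Tesseract command-line arguments.

```php
<?php
declare(strict_types=1);

/**
 * Build the shell command that asks Tesseract to print recognized text to stdout.
 * Hypothetical helper; the language defaults to English ("eng").
 */
function buildTesseractCommand(string $imagePath, string $language = 'eng'): string
{
    return sprintf(
        'tesseract %s stdout -l %s',
        escapeshellarg($imagePath),
        escapeshellarg($language)
    );
}

/**
 * Run OCR on an image and return the extracted text.
 * Throws if Tesseract fails or produces no text.
 */
function extractText(string $imagePath): string
{
    $output = [];
    $exitCode = 1;

    // Stderr is discarded so warnings do not get mixed into the text
    // (Unix-style redirect; adjust on Windows).
    exec(buildTesseractCommand($imagePath) . ' 2>/dev/null', $output, $exitCode);

    $text = trim(implode("\n", $output));
    if ($exitCode !== 0 || $text === '') {
        throw new RuntimeException('No text could be extracted from the image');
    }

    return $text;
}
```

The returned text is what the data parsing step would then scan line by line.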

### Voter Extraction Patterns

The service recognizes multiple electoral roll formats:

#### Pattern 1: Serial EPIC Name Age Gender
```
1 ABC1234567 John Doe 35 M
```

#### Pattern 2: EPIC Name Relation Age Gender
```
ABC1234567 John Doe S/o: Father Name 35 Male
```

#### Pattern 3: Simple EPIC and Name
```
ABC1234567 John Doe
```
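
One way to match these three layouts is a list of regular expressions tried from most to least specific. The sketch below is illustrative and may differ from the service's real patterns. It assumes an EPIC is three letters followed by seven digits (as in `ABC1234567`), that relation prefixes look like `S/o`, `D/o`, `W/o` or `H/o`, and that gender is normalized to its first letter; `parseVoterLine()` is a hypothetical name.

```php
<?php
declare(strict_types=1);

// Assumption: an EPIC is three letters followed by seven digits, e.g. ABC1234567.
const EPIC = '[A-Z]{3}\d{7}';
const GENDER = 'Male|Female|M|F';

// Tried in order, most specific first. Relation prefixes are assumptions.
const PATTERNS = [
    // Pattern 1: Serial EPIC Name Age Gender
    '~^(?<serial_number>\d+) (?<voter_id>' . EPIC . ') (?<name>.+?) (?<age>\d{1,3}) (?<gender>' . GENDER . ')$~i',
    // Pattern 2: EPIC Name Relation Age Gender
    '~^(?<voter_id>' . EPIC . ') (?<name>.+?) [SDWH]/o:? (?<relation_name>.+?) (?<age>\d{1,3}) (?<gender>' . GENDER . ')$~i',
    // Pattern 3: Simple EPIC and Name
    '~^(?<voter_id>' . EPIC . ') (?<name>.+)$~i',
];

/**
 * Parse one line of OCR text; returns null when no pattern matches.
 */
function parseVoterLine(string $line): ?array
{
    // Collapse runs of whitespace, which OCR output often contains.
    $line = trim((string) preg_replace('/\s+/', ' ', $line));

    foreach (PATTERNS as $pattern) {
        if (preg_match($pattern, $line, $m) !== 1) {
            continue;
        }

        return [
            'voter_id'      => strtoupper($m['voter_id']),
            'name'          => $m['name'],
            'age'           => isset($m['age']) ? (int) $m['age'] : null,
            'gender'        => isset($m['gender']) ? strtoupper($m['gender'][0]) : null,
            'relation_name' => $m['relation_name'] ?? null,
            'serial_number' => $m['serial_number'] ?? null,
        ];
    }

    return null;
}

print_r(parseVoterLine('ABC1234567 John Doe S/o: Father Name 35 Male'));
```

Ordering matters here: Pattern 3 would also match a Pattern 2 line and swallow the relation, age and gender into the name, so it is tried last.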

## Key Features

### ✅ Advantages
- **No Database Impact**: Doesn't modify any data
- **Flexible Input**: Accepts common image formats (JPG, PNG, GIF, BMP, WEBP, TIFF)
- **Quick Extraction**: Returns data in the same request, with no queue
- **Detailed Metadata**: Processing stats included
- **Multiple Patterns**: Supports various electoral roll formats

### ⚠️ Limitations
- **OCR Accuracy**: Depends on image quality
- **Format Dependency**: Works best with standard electoral roll layouts
- **No Persistence**: Data not saved (by design)
- **Processing Time**: OCR takes a few seconds

## Troubleshooting

### Error: "Tesseract OCR not found"
**Solution:** Install Tesseract OCR using the instructions under "Install Tesseract OCR" above, then confirm it is on the PATH with `tesseract --version`

### Error: "No text could be extracted from the image"
**Possible Causes:**
- Image quality too low
- Image doesn't contain text
- Tesseract not properly installed

**Solutions:**
- Use higher resolution images
- Ensure image contains readable text
- Verify Tesseract installation: `tesseract --version`

### Low Extraction Accuracy
**Tips to Improve:**
- Use higher resolution screenshots
- Ensure good contrast in the image
- Capture text clearly without blur
- Use PNG format for better clarity
- Avoid heavily compressed JPEG images

### Performance Issues
**Optimization:**
- Resize large images before upload
- Use optimal image formats (PNG for text)
- Compress images losslessly so text edges stay sharp
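
Resizing before upload could look like the sketch below, which uses the GD extension listed in the prerequisites. Both function names are hypothetical, and the 2000-pixel limit is an arbitrary illustrative value rather than a tuned recommendation; shrinking too far will hurt OCR accuracy.

```php
<?php
declare(strict_types=1);

/**
 * Scale (width, height) down so the longer side is at most $maxSide,
 * preserving the aspect ratio. Sizes already within the limit are unchanged.
 */
function fitWithin(int $width, int $height, int $maxSide): array
{
    $longest = max($width, $height);
    if ($longest <= $maxSide) {
        return [$width, $height];
    }

    $scale = $maxSide / $longest;

    return [
        max(1, (int) round($width * $scale)),
        max(1, (int) round($height * $scale)),
    ];
}

/**
 * Resize an image with GD and save it as PNG (lossless).
 * Requires the GD extension; $maxSide = 2000 is an illustrative default.
 */
function resizeForUpload(string $source, string $target, int $maxSide = 2000): void
{
    $data = file_get_contents($source);
    $image = $data === false ? false : imagecreatefromstring($data);
    if ($image === false) {
        throw new RuntimeException("Could not read image: {$source}");
    }

    [$width, $height] = fitWithin(imagesx($image), imagesy($image), $maxSide);

    $resized = imagescale($image, $width, $height);
    if ($resized === false || !imagepng($resized, $target)) {
        throw new RuntimeException("Could not write image: {$target}");
    }
}
```

For example, `resizeForUpload('/path/to/large.jpg', '/path/to/upload.png')` would write a PNG whose longer side is at most 2000 pixels.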

## Differences from PDF Import

| Feature | PDF Import | Image Import |
|---------|-----------|--------------|
| **Data Persistence** | ✅ Saves to database | ❌ Returns JSON only |
| **Input Format** | PDF files only | Multiple image formats |
| **Processing** | Queue-based | Synchronous (within the request) |
| **Booth Management** | Creates/updates booths | Only extracts info |
| **Street Management** | Creates/updates streets | N/A |
| **Use Case** | Bulk import | Quick data extraction |

## Best Practices

1. **Image Quality**
   - Use screenshots with at least 150 DPI
   - Ensure text is clearly readable
   - Avoid images with heavy compression

2. **File Size**
   - Keep images under 5MB for faster processing
   - Higher resolution doesn't always mean better OCR

3. **Format Selection**
   - PNG: Best for text clarity
   - JPEG: Good balance of size and quality
   - Avoid: BMP (large file size)

4. **Error Handling**
   - Always check the `success` field in response
   - Parse `voters` array carefully (may be empty)
   - Use `metadata` for debugging extraction issues
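
The error-handling points above can be sketched as a small wrapper around the raw response body. Field names follow the sample response in this document; the shape of a failure response is not documented here, so the code only assumes that `success` is not `true` and that a `message` string may be present. `votersFromResponse()` is a hypothetical helper.

```php
<?php
declare(strict_types=1);

/**
 * Return the voters array from a raw response body, or throw with a reason.
 * An empty array is a valid result: OCR may simply find no voter records.
 */
function votersFromResponse(string $body): array
{
    try {
        $result = json_decode($body, true, 512, JSON_THROW_ON_ERROR);
    } catch (JsonException $e) {
        throw new RuntimeException('Response is not valid JSON: ' . $e->getMessage());
    }

    if (!is_array($result) || ($result['success'] ?? false) !== true) {
        // Assumption: failure responses may carry a human-readable "message".
        $message = is_array($result) ? ($result['message'] ?? null) : null;
        throw new RuntimeException(is_string($message) ? $message : 'Image import failed');
    }

    $voters = $result['data']['voters'] ?? [];

    return is_array($voters) ? $voters : [];
}

$body = '{"success":true,"message":"Image processed successfully","data":{"voters":[]}}';

try {
    $voters = votersFromResponse($body);
    if ($voters === []) {
        echo "No voter records recognized; inspect metadata for details\n";
    }
} catch (RuntimeException $e) {
    echo 'Import failed: ', $e->getMessage(), PHP_EOL;
}
```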

## Files

### Controller
`app/Http/Controllers/VoterImageImportController.php`
- Handles image upload and validation
- Coordinates with service layer
- Returns JSON response

### Service
`app/Services/VoterImageImportService.php`
- Performs OCR extraction using Tesseract
- Parses extracted text for voter data
- Supports multiple electoral roll patterns
- Returns structured data

### Route
`routes/api.php`
```php
Route::prefix('image-import')->group(function () {
    Route::post('upload', [VoterImageImportController::class, 'uploadImage']);
});
```

## Testing

### Sample Test Case
```bash
# Upload a sample electoral roll screenshot
curl -X POST http://localhost:8000/api/image-import/upload \
  -F "image=@/path/to/voter_list_screenshot.jpg" \
  | jq '.'
```

### Expected Behavior
- Returns 200 status code on success
- `voters` array contains extracted voter records
- `booth_info` contains part/booth number if found
- `metadata` shows processing details

## Support

For issues or questions:
1. Check Tesseract installation
2. Verify image quality and format
3. Review Laravel logs: `storage/logs/laravel.log`
4. Check OCR extraction logs

## Version History

### v1.0.0 (Current)
- Initial release
- Basic OCR extraction
- Multiple voter pattern support
- No database persistence
- Metadata reporting
