Overview#
file-hiding-and-detection implements the EMBEDDED_DATA_V1 steganographic scheme, which allows hiding any file inside a JPEG image using AES-256-GCM encryption and JPEG APP2 segment injection. The project includes both embedding and detection tools.
Key Features#
- Non-destructive embedding: The carrier's image data is never altered; the encrypted payload is injected as APP2 segments ahead of it, so the original carrier can be restored
- Strong encryption: AES-256-GCM with authentication ensures data integrity
- Flexible detection: Multi-layer detection strategy works with or without the password
- Complete toolset: Embed, detect, extract, decrypt, and restore operations
Architecture#
Embedding Process#
The system injects encrypted data as APP2 segments immediately after the JPEG SOI marker:
output_image = SOI (FF D8)
+ APP2 chunk 1
+ APP2 chunk 2 (if payload > 32,731 bytes)
+ original_image[2:]
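The layout above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual code: the function name `inject_after_soi` is hypothetical, and the APP2 segments are assumed to be fully built (marker and length field included) before they are passed in.

```python
from typing import Iterable

SOI = b"\xff\xd8"  # JPEG Start Of Image marker


def inject_after_soi(carrier: bytes, segments: Iterable[bytes]) -> bytes:
    """Return a new JPEG with `segments` placed right after the SOI marker.

    Hypothetical helper: `segments` must already be complete APP2 segments.
    """
    if carrier[:2] != SOI:
        raise ValueError("carrier does not start with a JPEG SOI marker")
    # SOI + APP2 chunk 1 + APP2 chunk 2 + ... + original_image[2:]
    return SOI + b"".join(segments) + carrier[2:]
```

Because everything after the carrier's own SOI is copied verbatim, the original image bytes remain recoverable from the output.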
APP2 Chunk Structure#
Each chunk contains:
- APP2 marker (FF E2)
- Length field (uint16 BE)
- Chunk index and total chunks
- Magic string: EMBEDDED_DATA_V1
- Username (first chunk only)
- Encrypted data slice
Maximum payload per chunk: 32,768 bytes
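As a rough sketch of how such chunks might be built, the example below splits an encrypted blob into APP2 segments. The field widths are assumptions that this document does not specify: a uint16 BE chunk index, a uint16 BE chunk total, and a one-byte length prefix before the UTF-8 username. The function name `build_chunks` is likewise hypothetical.

```python
import struct
from typing import List

MAGIC = b"EMBEDDED_DATA_V1"  # 16-byte magic string of the scheme
APP2 = b"\xff\xe2"           # APP2 marker
MAX_DATA = 32_768            # maximum payload per chunk, as stated above


def build_chunks(blob: bytes, username: str = "", max_data: int = MAX_DATA) -> List[bytes]:
    """Split `blob` into APP2 segments using an ASSUMED field layout."""
    slices = [blob[i:i + max_data] for i in range(0, len(blob), max_data)] or [b""]
    total = len(slices)
    segments = []
    for index, data in enumerate(slices):
        # Assumed header: chunk index and total as uint16 BE, then the magic string.
        body = struct.pack(">HH", index, total) + MAGIC
        if index == 0:
            name = username.encode("utf-8")
            # Assumed encoding: 1-byte length prefix (so at most 255 bytes of username).
            body += struct.pack("B", len(name)) + name
        body += data
        # The JPEG length field counts its own 2 bytes plus the body and must fit in uint16.
        length = len(body) + 2
        if length > 0xFFFF:
            raise ValueError("segment too large for a uint16 length field")
        segments.append(APP2 + struct.pack(">H", length) + body)
    return segments
```

Keeping the data slice well below the 65,535-byte ceiling of the length field leaves room for the header fields in every chunk.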
Encryption Scheme#
key = SHA-256(password)
nonce = os.urandom(12)
ciphertext, tag = AES-256-GCM(key, nonce, plaintext)
blob = nonce || ciphertext || tag
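The pseudocode above maps fairly directly onto pycryptodome, the library listed under Requirements. The sketch below is one possible rendering, not the project's code: the function names are hypothetical, the password is assumed to be UTF-8 encoded before hashing, and the tag is assumed to be 16 bytes (pycryptodome's default for GCM).

```python
import hashlib
import os
from typing import Tuple

NONCE_LEN = 12  # matches os.urandom(12) in the scheme
TAG_LEN = 16    # ASSUMPTION: pycryptodome's default GCM tag length


def derive_key(password: str) -> bytes:
    """key = SHA-256(password); the 32-byte digest is the AES-256 key."""
    return hashlib.sha256(password.encode("utf-8")).digest()


def split_blob(blob: bytes) -> Tuple[bytes, bytes, bytes]:
    """Undo blob = nonce || ciphertext || tag."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("blob too short to hold a nonce and a tag")
    return blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]


def encrypt_blob(password: str, plaintext: bytes) -> bytes:
    # Imported here so the helpers above stay usable without pycryptodome.
    from Crypto.Cipher import AES

    nonce = os.urandom(NONCE_LEN)
    cipher = AES.new(derive_key(password), AES.MODE_GCM, nonce=nonce)
    ciphertext, tag = cipher.encrypt_and_digest(plaintext)
    return nonce + ciphertext + tag


def decrypt_blob(password: str, blob: bytes) -> bytes:
    from Crypto.Cipher import AES

    nonce, ciphertext, tag = split_blob(blob)
    cipher = AES.new(derive_key(password), AES.MODE_GCM, nonce=nonce)
    # Raises ValueError if the password is wrong or the data was modified.
    return cipher.decrypt_and_verify(ciphertext, tag)
```

Since GCM ciphertext has the same length as the plaintext, a blob built this way is always 28 bytes longer than the hidden file.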
Detection Strategy#
Layer 1: Generic APP2 Anomaly Detection#
Identifies suspicious patterns without password:
- APP2 appearing before APP0/APP1
- Non-ICC_PROFILE APP2 content
- Multiple consecutive custom APP2 segments
- Unusually large APP2 payloads (>512 bytes)
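The four heuristics above can be approximated with a small segment walker. The sketch below is illustrative only and is not taken from detect.py: the function names and finding strings are hypothetical, and the parser is simplified (it stops at the first SOS or EOI marker and does not handle fill bytes or standalone markers).

```python
import struct
from typing import Iterator, List, Tuple


def iter_segments(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (marker, payload) for each header segment before SOS/EOI."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):  # SOS (scan data follows) or EOI
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:  # malformed length field
            break
        yield marker, data[pos + 4:pos + 2 + length]
        pos += 2 + length


def app2_anomalies(data: bytes, large: int = 512) -> List[str]:
    """Return the Layer 1 findings that apply to `data`, without duplicates."""
    findings: List[str] = []
    seen_app0_or_app1 = False
    custom_run = 0  # consecutive non-ICC APP2 segments seen so far
    for marker, payload in iter_segments(data):
        if marker in (0xE0, 0xE1):
            seen_app0_or_app1 = True
        if marker != 0xE2:
            custom_run = 0
            continue
        if not seen_app0_or_app1:
            findings.append("APP2 before APP0/APP1")
        if payload.startswith(b"ICC_PROFILE\x00"):
            custom_run = 0
        else:
            findings.append("non-ICC_PROFILE APP2 content")
            custom_run += 1
            if custom_run >= 2:
                findings.append("multiple consecutive custom APP2 segments")
        if len(payload) > large:
            findings.append("large APP2 payload")
    return list(dict.fromkeys(findings))  # de-duplicate, keep first-seen order
```

None of these checks needs the password or knowledge of the magic string, which is why they still fire if an embedder changes its signature.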
Layer 2: EMBEDDED_DATA_V1 Fingerprint#
Detects the 16-byte magic string and reports:
- Stored username
- Chunk count and completeness
- Encrypted blob size
- Estimated plaintext size
- Original carrier offset
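A fingerprint pass along these lines might look like the sketch below. It is not the project's parser and rests on several assumptions: each chunk body is taken to start with a uint16 BE index and a uint16 BE total, followed by the magic string; the first chunk is taken to carry a one-byte length prefix and a UTF-8 username; the GCM tag is assumed to be 16 bytes; and "carrier offset" is interpreted as the position where the original carrier's bytes resume. The function name `fingerprint` is hypothetical.

```python
import struct
from typing import Dict, Optional

MAGIC = b"EMBEDDED_DATA_V1"
NONCE_LEN, TAG_LEN = 12, 16  # TAG_LEN is an assumption (common GCM default)


def fingerprint(data: bytes) -> Optional[Dict[str, object]]:
    """Report EMBEDDED_DATA_V1 metadata, or None if no chunk is found."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos, total, username = 2, 0, None
    chunks: Dict[int, bytes] = {}
    # Embedded chunks sit immediately after SOI, so stop at the first other segment.
    while pos + 4 <= len(data) and data[pos:pos + 2] == b"\xff\xe2":
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        body = data[pos + 4:pos + 2 + length]
        if body[4:20] != MAGIC:  # assumed position of the magic string
            break
        index, total = struct.unpack(">HH", body[:4])
        rest = body[20:]
        if index == 0 and rest:
            name_len = rest[0]  # assumed 1-byte username length prefix
            username = rest[1:1 + name_len].decode("utf-8", "replace")
            rest = rest[1 + name_len:]
        chunks[index] = rest
        pos += 2 + length
    if not chunks:
        return None
    blob_size = sum(len(c) for c in chunks.values())
    return {
        "username": username,
        "chunks_found": len(chunks),
        "chunks_expected": total,
        "complete": set(chunks) == set(range(total)),
        "blob_size": blob_size,
        # GCM adds no padding: plaintext = blob - nonce - tag.
        "estimated_plaintext_size": max(blob_size - NONCE_LEN - TAG_LEN, 0),
        "carrier_offset": pos,
    }
```

The plaintext estimate follows from the blob layout alone: for example, a 40-byte blob implies a 12-byte hidden file under these assumptions.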
Usage Examples#
Embed a file#
python embed.py carrier.jpg secret.pdf -p mypassword
python embed.py carrier.jpg secret.txt -p mypassword -u alice -o hidden.jpg
Detect steganographic content#
python detect.py image.jpg # basic detection
python detect.py image.jpg --verbose # full segment table
python detect.py ./images/ # batch scan
Extract and decrypt#
python detect.py image.jpg --extract # extract encrypted blob
python detect.py image.jpg --decrypt -p mypassword
python detect.py image.jpg --restore # restore clean carrier
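Conceptually, restoring the carrier only requires dropping the injected segments and keeping everything else. The sketch below shows one way to do that; it is not the code behind `--restore`. The function name `restore_carrier` is hypothetical, and a segment is treated as embedded if the magic string appears anywhere in its body, which sidesteps any assumption about the exact header layout.

```python
import struct

MAGIC = b"EMBEDDED_DATA_V1"
SOI = b"\xff\xd8"


def restore_carrier(stego: bytes) -> bytes:
    """Strip leading EMBEDDED_DATA_V1 APP2 segments and return the clean JPEG."""
    if stego[:2] != SOI:
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    # Injected chunks sit directly after SOI; skip them one by one.
    while pos + 4 <= len(stego) and stego[pos:pos + 2] == b"\xff\xe2":
        (length,) = struct.unpack(">H", stego[pos + 2:pos + 4])
        if MAGIC not in stego[pos + 4:pos + 2 + length]:
            break  # a genuine APP2 segment (e.g. an ICC profile): keep it
        pos += 2 + length
    return SOI + stego[pos:]
```

A file without embedded chunks passes through unchanged, so the function is safe to run over a mixed batch of images.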
Technical Details#
Files:
- jpeg_parser.py: Low-level JPEG segment parser
- embed.py: File hiding implementation
- detect.py: Detection and extraction tool
Requirements:
- Python 3.8+
- pycryptodome >= 3.9.0
Limitations:
- JPEG carriers only
- Original filename not preserved
- Single SHA-256 pass for key derivation
- Signature-based detection (Layer 2 can be evaded by changing magic string)
Security Considerations#
The GCM authentication tag ensures that a wrong password or corrupted data is detected: tag verification fails, so no bogus plaintext is returned. However, the detection layer reveals metadata (username, blob size) without requiring the password, which may be a concern in some threat models.
For production use, consider replacing the single-pass SHA-256 key derivation with a salted, deliberately slow KDF such as PBKDF2 or Argon2 for better resistance to brute-force attacks.
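As an illustration of the PBKDF2 option, the sketch below uses only the Python standard library. It is a suggestion rather than part of the current scheme: the function name, the 16-byte salt, and the iteration count are illustrative choices, and adopting it would mean storing the salt alongside the nonce, which changes the blob format.

```python
import hashlib
import os


def derive_key_pbkdf2(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte AES-256 key with PBKDF2-HMAC-SHA256.

    The iteration count is an illustrative default; tune it to your hardware.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# A fresh random salt per embedded file; it is not secret, but it must be
# saved with the ciphertext so the same key can be derived for decryption.
salt = os.urandom(16)
```

The per-file salt defeats precomputed password tables, and the iteration count multiplies the cost of every guess an attacker makes.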
Repository#
Source code and detailed documentation: https://github.com/Ra-226/file-hiding-and-detection
