JPEG Steganography with EMBEDDED_DATA_V1

Overview

file-hiding-and-detection implements the EMBEDDED_DATA_V1 steganographic scheme, which hides an arbitrary file inside a JPEG image by encrypting it with AES-256-GCM and injecting the ciphertext into the JPEG as APP2 segments. The project provides tools for both embedding and detection.

Key Features

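Non-destructive embedding: The carrier's original bytes are kept intact; the encrypted payload is inserted as extra APP2 segments right after the SOI marker, so the clean carrier can be restored exactly
Authenticated encryption: AES-256-GCM keeps the payload confidential, and its authentication tag detects tampering, corruption, or a wrong password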
Flexible detection: Multi-layer detection strategy works with or without password
Complete toolset: Embed, detect, extract, decrypt, and restore operations

Architecture

Embedding Process

The system injects encrypted data as APP2 segments immediately after the JPEG SOI marker:

output_image = SOI (FF D8)
             + APP2 chunk 1
             + APP2 chunk 2 (if payload > 32,731 bytes)
             + original_image[2:]
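
A minimal Python sketch of this layout follows. It shows only the generic JPEG mechanics (the APP2 marker plus a big-endian length field) and treats each chunk body as opaque bytes; the names app2_segment and inject_after_soi are hypothetical and are not taken from embed.py.

```python
import struct
from typing import List

SOI = b"\xFF\xD8"
APP2 = b"\xFF\xE2"


def app2_segment(body: bytes) -> bytes:
    """Wrap body in an APP2 segment: marker + uint16 BE length + body.

    The JPEG length field counts its own two bytes plus the body, but not
    the two marker bytes, so one segment can carry at most 65,533 bytes.
    """
    if len(body) > 0xFFFF - 2:
        raise ValueError("body too large for a single APP2 segment")
    return APP2 + struct.pack(">H", len(body) + 2) + body


def inject_after_soi(original: bytes, bodies: List[bytes]) -> bytes:
    """Build SOI + APP2 chunk 1 + ... + original[2:] (hypothetical helper)."""
    if original[:2] != SOI:
        raise ValueError("not a JPEG: missing SOI marker")
    return SOI + b"".join(app2_segment(b) for b in bodies) + original[2:]
```

Because the original file is appended unchanged after its own SOI, dropping the injected segments is enough to get the carrier back.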

APP2 Chunk Structure

Each chunk contains:

APP2 marker (FF E2)
Length field (uint16 BE; counts its own 2 bytes plus the segment body, not the marker)
Chunk index and total chunks
Magic string: EMBEDDED_DATA_V1
Username (first chunk only)
Encrypted data slice

Maximum payload per chunk: 32,768 bytes
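
The sketch below shows one way to slice an encrypted blob into chunk bodies carrying the fields listed above. The field widths (1-byte index, 1-byte total, 1-byte username length) and the reading of 32,768 bytes as the cap on the whole chunk body, header included, are assumptions for illustration; the exact header layout used by embed.py is not specified here, so the first-chunk capacity in this sketch need not match the 32,731-byte figure above.

```python
import struct
from typing import List

MAGIC = b"EMBEDDED_DATA_V1"  # 16-byte magic string from the scheme
MAX_BODY = 32_768            # assumption: cap on a whole chunk body


def chunk_header(index: int, total: int, username: bytes) -> bytes:
    """Hypothetical header: index, total, magic, and (first chunk only)
    a 1-byte username length followed by the username."""
    header = struct.pack(">BB", index, total) + MAGIC
    if index == 0:
        header += struct.pack(">B", len(username)) + username
    return header


def split_into_chunks(blob: bytes, username: bytes) -> List[bytes]:
    """Slice an encrypted blob into chunk bodies no larger than MAX_BODY."""
    # The first chunk has less room for data because it also holds the username.
    first_cap = MAX_BODY - len(chunk_header(0, 1, username))
    rest_cap = MAX_BODY - len(chunk_header(1, 1, b""))
    slices = [blob[:first_cap]]
    for pos in range(first_cap, len(blob), rest_cap):
        slices.append(blob[pos:pos + rest_cap])
    if len(slices) > 255:
        raise ValueError("too many chunks for a 1-byte total field")
    return [chunk_header(i, len(slices), username) + s
            for i, s in enumerate(slices)]
```

Each returned body would then be wrapped in its own APP2 segment and placed directly after SOI, as in the layout above.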

Encryption Scheme

key        = SHA-256(password)
nonce      = os.urandom(12)
ciphertext, tag = AES-256-GCM(key, nonce, plaintext)
blob       = nonce || ciphertext || tag

Detection Strategy

Layer 1: Generic APP2 Anomaly Detection

Identifies suspicious patterns without the password:

APP2 appearing before APP0/APP1
Non-ICC_PROFILE APP2 content
Multiple consecutive custom APP2 segments
Unusually large APP2 payloads (>512 bytes)
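
A rough Python sketch of these four checks walks the header segments from SOI up to SOS. The function names are hypothetical. Real ICC profiles routinely exceed 512 bytes, so this sketch applies the size check only to non-ICC APP2 segments; that is a choice made here, not necessarily what detect.py does.

```python
import struct
from typing import List, Tuple

ICC_SIG = b"ICC_PROFILE\x00"  # standard prefix of ICC profile APP2 segments
LARGE_APP2 = 512              # size threshold from the list above


def header_segments(jpeg: bytes) -> List[Tuple[int, int, bytes]]:
    """Return (offset, marker, body) for each segment between SOI and SOS."""
    if jpeg[:2] != b"\xFF\xD8":
        raise ValueError("missing SOI marker")
    segments: List[Tuple[int, int, bytes]] = []
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == 0xFF:          # fill byte before a marker
            pos += 1
            continue
        if marker in (0xD9, 0xDA):  # EOI or SOS: end of the header section
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        segments.append((pos, marker, jpeg[pos + 4:pos + 2 + length]))
        pos += 2 + length
    return segments


def app2_anomalies(jpeg: bytes) -> List[str]:
    segs = header_segments(jpeg)
    first_app01 = next(
        (i for i, (_, m, _) in enumerate(segs) if m in (0xE0, 0xE1)), None)
    findings: List[str] = []
    run = 0  # current run of consecutive non-ICC ("custom") APP2 segments
    for i, (off, marker, body) in enumerate(segs):
        custom = marker == 0xE2 and not body.startswith(ICC_SIG)
        run = run + 1 if custom else 0
        if marker != 0xE2:
            continue
        if first_app01 is not None and i < first_app01:
            findings.append(f"APP2 at offset {off} precedes APP0/APP1")
        if custom:
            findings.append(f"non-ICC_PROFILE APP2 at offset {off}")
            if len(body) > LARGE_APP2:
                findings.append(f"custom APP2 at offset {off} carries {len(body)} bytes")
        if run == 2:  # report each run once, at its second segment
            findings.append(f"consecutive custom APP2 segments (second at offset {off})")
    return findings
```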

Layer 2: EMBEDDED_DATA_V1 Fingerprint

Detects the 16-byte magic string EMBEDDED_DATA_V1 and reports, without needing the password:

Stored username
Chunk count and completeness
Encrypted blob size
Estimated plaintext size
Original carrier offset
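
Because the chunks are injected contiguously right after SOI, the chunk count and the original carrier offset can be recovered by walking APP2 segments until one lacks the magic string. The sketch below assumes the magic appears somewhere in every chunk body (its exact position is not specified here) and that the plaintext is not compressed or framed before encryption; under the nonce || ciphertext || tag layout the size estimate is then the blob size minus 28 bytes. The function names are hypothetical. The same offset is what a restore step needs, since the output file is SOI + chunks + original_image[2:].

```python
import struct
from typing import Tuple

MAGIC = b"EMBEDDED_DATA_V1"
NONCE_LEN, TAG_LEN = 12, 16


def scan_injected_chunks(jpeg: bytes) -> Tuple[int, int]:
    """Return (chunk_count, carrier_offset) for APP2 chunks right after SOI.

    carrier_offset is where original_image[2:] resumes in the file.
    """
    if jpeg[:2] != b"\xFF\xD8":
        raise ValueError("missing SOI marker")
    pos, count = 2, 0
    while pos + 4 <= len(jpeg) and jpeg[pos:pos + 2] == b"\xFF\xE2":
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if MAGIC not in jpeg[pos + 4:pos + 2 + length]:
            break  # an ordinary APP2 (e.g. an ICC profile) belongs to the carrier
        count += 1
        pos += 2 + length
    return count, pos


def estimated_plaintext_size(blob_size: int) -> int:
    # GCM adds no padding: blob = 12-byte nonce + ciphertext + 16-byte tag.
    return blob_size - NONCE_LEN - TAG_LEN


def restore_carrier(jpeg: bytes) -> bytes:
    # Inverse of SOI + chunks + original[2:]: keep SOI, drop the chunks.
    _, offset = scan_injected_chunks(jpeg)
    return jpeg[:2] + jpeg[offset:]
```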

Usage Examples

Embed a file

python embed.py carrier.jpg secret.pdf -p mypassword
python embed.py carrier.jpg secret.txt -p mypassword -u alice -o hidden.jpg

Detect steganographic content

python detect.py image.jpg                    # basic detection
python detect.py image.jpg --verbose          # full segment table
python detect.py ./images/                    # batch scan

Extract and decrypt

python detect.py image.jpg --extract          # extract encrypted blob
python detect.py image.jpg --decrypt -p mypassword
python detect.py image.jpg --restore          # restore clean carrier

Technical Details

Files:

jpeg_parser.py: Low-level JPEG segment parser
embed.py: File hiding implementation
detect.py: Detection and extraction tool

Requirements:

Python 3.8+
pycryptodome >= 3.9.0

Limitations:

JPEG carriers only
Original filename not preserved
Single unsalted SHA-256 pass for key derivation
Signature-based detection (Layer 2 can be evaded by changing the magic string)

Security Considerations

The GCM authentication tag ensures that a wrong password or corrupted data is detected during decryption, before any plaintext is returned. However, the detection layer reveals metadata (username, blob size) without requiring the password, which may be a concern in some threat models.

For production use, consider replacing the single unsalted SHA-256 key derivation with a salted, deliberately slow key-derivation function such as PBKDF2 or Argon2, which makes brute-force and dictionary attacks far more expensive.
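
As an illustration of the PBKDF2 option, Python's standard-library hashlib.pbkdf2_hmac can derive the 32-byte AES-256 key from the password and a random per-file salt. The salt is not secret but must be stored next to the blob (for example in a hypothetical salt || nonce || ciphertext || tag layout), which would change the EMBEDDED_DATA_V1 format. The helper names and the iteration count are illustrative assumptions.

```python
import hashlib
import os
from typing import Tuple

SALT_LEN = 16
# Illustrative work factor; OWASP's password-storage guidance currently
# suggests 600,000 iterations for PBKDF2-HMAC-SHA256.
ITERATIONS = 600_000


def derive_key(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Salted, deliberately slow derivation of a 32-byte (AES-256) key."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32)


def new_key(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, key) for a fresh embedding; store the salt with the blob."""
    salt = os.urandom(SALT_LEN)
    return salt, derive_key(password, salt)
```

On extraction, the tool would read the stored salt and call derive_key(password, salt) again to get the same key.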

Repository

Source code and detailed documentation: https://github.com/Ra-226/file-hiding-and-detection
