
JPEG Steganography with EMBEDDED_DATA_V1

Ra-226
Author

Overview

file-hiding-and-detection implements the EMBEDDED_DATA_V1 steganographic scheme, which allows hiding any file inside a JPEG image using AES-256-GCM encryption and JPEG APP2 segment injection. The project includes both embedding and detection tools.

Key Features

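  • Non-destructive embedding: The carrier's original bytes are left untouched; the encrypted payload is inserted as extra APP2 segments, so the clean carrier can be restored exactly
  • Strong encryption: AES-256-GCM authenticated encryption protects both the confidentiality and the integrity of the payload
  • Flexible detection: A two-layer detection strategy that works without the password; the password is needed only for decryption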
  • Complete toolset: Embed, detect, extract, decrypt, and restore operations

Architecture

Embedding Process

The system injects encrypted data as APP2 segments immediately after the JPEG SOI marker:

output_image = SOI (FF D8)
             + APP2 chunk 1
             + APP2 chunk 2 (if payload > 32,731 bytes)
             + original_image[2:]
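The layout above can be sketched in a few lines of Python. This is a minimal illustration, not the project's embed.py: the helper names `app2_segment` and `inject_after_soi` are hypothetical, and each body is assumed to already contain the chunk header and data slice.

```python
import struct
from typing import List

SOI = b"\xff\xd8"
APP2 = b"\xff\xe2"

def app2_segment(body: bytes) -> bytes:
    """Wrap a body in an APP2 marker and a big-endian uint16 length field."""
    # The length field counts its own 2 bytes, so one body holds at most 65,533 bytes.
    if len(body) > 0xFFFF - 2:
        raise ValueError("body too large for a single APP2 segment")
    return APP2 + struct.pack(">H", len(body) + 2) + body

def inject_after_soi(jpeg: bytes, bodies: List[bytes]) -> bytes:
    """Build SOI + APP2 segments + original_image[2:] (hypothetical helper)."""
    if jpeg[:2] != SOI:
        raise ValueError("carrier does not start with an SOI marker")
    segments = b"".join(app2_segment(body) for body in bodies)
    return SOI + segments + jpeg[2:]
```

Because everything after the carrier's SOI is copied verbatim, stripping the inserted segments gives back the original file byte for byte.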

APP2 Chunk Structure

Each chunk contains:

  • APP2 marker (FF E2)
  • Length field (uint16 BE)
  • Chunk index and total chunks
  • Magic string: EMBEDDED_DATA_V1
  • Username (first chunk only)
  • Encrypted data slice

Maximum payload per chunk: 32,768 bytes
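The post lists the chunk fields but not their byte widths. The sketch below assumes one byte each for the index and the total, then the 16-byte magic, then (first chunk only) a one-byte username length followed by the UTF-8 username, and it treats the 32,768-byte figure as a cap on each segment body. `build_chunks` is a hypothetical name, and under these assumptions the split point will not necessarily match the 32,731-byte figure in the layout above.

```python
from typing import List

MAGIC = b"EMBEDDED_DATA_V1"   # 16-byte magic string from the post
MAX_BODY = 32_768             # assumption: cap on each APP2 segment body

def build_chunks(blob: bytes, username: str) -> List[bytes]:
    """Split an encrypted blob into APP2 bodies using an assumed header layout."""
    user = username.encode("utf-8")
    if len(user) > 255:
        raise ValueError("username too long for a 1-byte length prefix")
    base = 2 + len(MAGIC)                   # assumed: index (1) + total (1) + magic (16)
    first_cap = MAX_BODY - base - 1 - len(user)
    rest_cap = MAX_BODY - base
    slices = [blob[:first_cap]]
    pos = first_cap
    while pos < len(blob):
        slices.append(blob[pos:pos + rest_cap])
        pos += rest_cap
    total = len(slices)
    if total > 255:
        raise ValueError("payload needs more than 255 chunks")
    bodies = []
    for index, data in enumerate(slices):
        header = bytes([index, total]) + MAGIC
        if index == 0:
            # Assumed: only the first chunk carries the username.
            header += bytes([len(user)]) + user
        bodies.append(header + data)
    return bodies
```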

Encryption Scheme

key        = SHA-256(password)
nonce      = os.urandom(12)
ciphertext, tag = AES-256-GCM(key, nonce, plaintext)
blob       = nonce || ciphertext || tag

Detection Strategy

Layer 1: Generic APP2 Anomaly Detection

Flags suspicious patterns without needing the password:

  • APP2 appearing before APP0/APP1
  • Non-ICC_PROFILE APP2 content
  • Multiple consecutive custom APP2 segments
  • Unusually large APP2 payloads (>512 bytes)
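The four heuristics can be sketched on top of a small segment walker. This is an illustration, not the project's jpeg_parser.py: `read_segments` and `app2_anomalies` are hypothetical names, and applying the 512-byte threshold only to non-ICC APP2 segments is an assumption (ICC profiles are routinely larger than that).

```python
import struct
from typing import List, Tuple

APP0, APP1, APP2, SOS, EOI = 0xE0, 0xE1, 0xE2, 0xDA, 0xD9
ICC_TAG = b"ICC_PROFILE\x00"
LARGE_APP2 = 512  # threshold from the post

def read_segments(jpeg: bytes) -> List[Tuple[int, bytes]]:
    """Return (marker, body) pairs from SOI up to and including SOS."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    pos, segments = 2, []
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == 0xFF:  # fill byte: skip it and re-read the marker
            pos += 1
            continue
        if marker == EOI:
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        segments.append((marker, jpeg[pos + 4:pos + 2 + length]))
        if marker == SOS:  # entropy-coded data follows; stop walking
            break
        pos += 2 + length
    return segments

def app2_anomalies(segments: List[Tuple[int, bytes]]) -> List[str]:
    """Apply the Layer 1 heuristics; no password needed."""
    findings = []
    markers = [m for m, _ in segments]
    app2 = [i for i, m in enumerate(markers) if m == APP2]
    app01 = [i for i, m in enumerate(markers) if m in (APP0, APP1)]
    if app2 and app01 and app2[0] < app01[0]:
        findings.append("APP2 appears before APP0/APP1")
    custom = [i for i in app2 if not segments[i][1].startswith(ICC_TAG)]
    if custom:
        findings.append(f"{len(custom)} non-ICC_PROFILE APP2 segment(s)")
    if any(b == a + 1 for a, b in zip(custom, custom[1:])):
        findings.append("consecutive custom APP2 segments")
    # Assumption: the size threshold applies to non-ICC APP2 bodies only.
    if any(len(segments[i][1]) > LARGE_APP2 for i in custom):
        findings.append(f"custom APP2 body larger than {LARGE_APP2} bytes")
    return findings
```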

Layer 2: EMBEDDED_DATA_V1 Fingerprint

Detects the 16-byte magic string and reports:

  • Stored username
  • Chunk count and completeness
  • Encrypted blob size
  • Estimated plaintext size
  • Original carrier offset
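Layer 2 could be sketched by walking the APP2 segments that follow SOI and checking each for the magic. The header layout here (one-byte index and total, the 16-byte magic, then a one-byte username length and the username in the first chunk) is an assumption, as is reading "original carrier offset" as the position where original_image[2:] resumes. The plaintext estimate subtracts the 12-byte nonce and 16-byte tag from the encryption scheme. `fingerprint` is a hypothetical name.

```python
import struct
from typing import Optional

MAGIC = b"EMBEDDED_DATA_V1"
GCM_OVERHEAD = 12 + 16  # nonce + tag

def fingerprint(jpeg: bytes) -> Optional[dict]:
    """Summarize EMBEDDED_DATA_V1 chunks placed directly after SOI (assumed layout)."""
    if jpeg[:2] != b"\xff\xd8":
        return None
    pos, chunks, username, total = 2, {}, None, 0
    while pos + 4 <= len(jpeg) and jpeg[pos:pos + 2] == b"\xff\xe2":
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        body = jpeg[pos + 4:pos + 2 + length]
        if body[2:18] != MAGIC:
            break  # ordinary APP2 such as ICC_PROFILE; not part of the payload
        index, total = body[0], body[1]
        data = body[18:]
        if index == 0 and data:
            # Assumed: the first chunk carries a 1-byte username length + username.
            ulen = data[0]
            username = data[1:1 + ulen].decode("utf-8", "replace")
            data = data[1 + ulen:]
        chunks[index] = data
        pos += 2 + length
    if not chunks:
        return None
    blob_size = sum(len(d) for d in chunks.values())
    return {
        "username": username,
        "chunks_found": len(chunks),
        "chunks_expected": total,
        "complete": sorted(chunks) == list(range(total)),
        "blob_size": blob_size,
        "estimated_plaintext_size": max(blob_size - GCM_OVERHEAD, 0),
        "carrier_offset": pos,  # where original_image[2:] resumes
    }
```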

Usage Examples

Embed a file

python embed.py carrier.jpg secret.pdf -p mypassword
python embed.py carrier.jpg secret.txt -p mypassword -u alice -o hidden.jpg

Detect steganographic content

python detect.py image.jpg                    # basic detection
python detect.py image.jpg --verbose          # full segment table
python detect.py ./images/                    # batch scan

Extract and decrypt

python detect.py image.jpg --extract          # extract encrypted blob
python detect.py image.jpg --decrypt -p mypassword
python detect.py image.jpg --restore          # restore clean carrier
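The post describes --restore only as restoring the clean carrier. Given the embedding layout (SOI, then the injected APP2 chunks, then original_image[2:]), one plausible approach is to drop every leading APP2 segment that carries the magic string. This sketch assumes the magic sits right after one-byte index and total fields; `restore_carrier` is a hypothetical name, not detect.py's implementation.

```python
import struct

MAGIC = b"EMBEDDED_DATA_V1"

def restore_carrier(jpeg: bytes) -> bytes:
    """Drop leading APP2 segments that carry the magic; keep everything else."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    pos = 2
    # Assumed layout: FF E2, 2-byte length, 1-byte index, 1-byte total, then magic.
    while jpeg[pos:pos + 2] == b"\xff\xe2" and jpeg[pos + 6:pos + 22] == MAGIC:
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        pos += 2 + length
    return b"\xff\xd8" + jpeg[pos:]
```

Ordinary APP2 segments such as ICC profiles do not match the magic and are left in place.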

Technical Details

Files:

  • jpeg_parser.py: Low-level JPEG segment parser
  • embed.py: File hiding implementation
  • detect.py: Detection and extraction tool

Requirements:

  • Python 3.8+
  • pycryptodome >= 3.9.0

Limitations:

  • JPEG carriers only
  • Original filename not preserved
  • Key derivation is a single, unsalted SHA-256 pass
  • Detection is signature-based: Layer 2 can be evaded by changing the magic string

Security Considerations

The GCM authentication tag ensures that a wrong password or corrupted data is detected during decryption, before any plaintext is returned. However, the detection layer reveals metadata (username, blob size) without requiring the password, which may be a concern in some threat models.

For production use, consider replacing the simple SHA-256 key derivation with a salted, deliberately slow KDF such as PBKDF2 or Argon2 for better resistance against brute-force attacks.
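A minimal sketch of the PBKDF2 option using only Python's standard library follows. The 16-byte salt, the 600,000-iteration default, and the function names are assumptions. The salt would also have to be stored in the blob (for example ahead of the nonce), which changes the EMBEDDED_DATA_V1 format. Argon2 would require a third-party package.

```python
import hashlib
import os
from typing import Tuple

SALT_LEN = 16            # assumption
ITERATIONS = 600_000     # assumption; tune to your hardware

def derive_key(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """PBKDF2-HMAC-SHA256 producing a 32-byte AES-256 key."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=32)

def new_key(password: str, iterations: int = ITERATIONS) -> Tuple[bytes, bytes]:
    """Generate a fresh random salt; the salt must be stored with the blob."""
    salt = os.urandom(SALT_LEN)
    return salt, derive_key(password, salt, iterations)
```

Unlike a single SHA-256 pass, each password guess now costs the attacker the full iteration count, and the per-file salt prevents precomputed tables from being reused across images.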

Repository

Source code and detailed documentation: https://github.com/Ra-226/file-hiding-and-detection