Create your HTML / ZIP / PNG polyglot file in JavaScript

Introduction

  • Web developer for 20+ years
  • Author of open-source projects:
    • zip.js: library for reading and writing zip files in JavaScript
    • SingleFile: extension and command-line tool for saving a web page as a single HTML file
      • binary content is stored in Base64-encoded data URIs, e.g. data:;base64,aGVsbG8=
      • the output file is approx. 33% larger than the sum of the sizes of all the resources
  • Can SingleFile and zip.js be combined to produce more compact, easier-to-handle files?
  • Can we go further?
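
The overhead mentioned above comes from Base64 itself: every group of 3 bytes is written as 4 characters. A minimal sketch to make this visible; the helper names base64Length and toDataURI are illustrative and not part of SingleFile.

```javascript
// Base64 maps each group of 3 input bytes to 4 output characters (with padding).
function base64Length(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

// Builds a data URI without MIME type, like the example in the slide.
// btoa() expects a "binary string" (one character per byte).
function toDataURI(binaryString) {
  return "data:;base64," + btoa(binaryString);
}

console.log(toDataURI("hello")); // data:;base64,aGVsbG8=
console.log(base64Length(3000) / 3000); // ratio between encoded and raw size
```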

Test project

  • Simple HTML page
  • Inclusion of external resources:
    • images: image.png, background.png
    • CSS stylesheets: style.css, properties.css
    • JavaScript script: script.js
  • Stylesheets and scripts encoded in UTF-8
  • Files stored in the project/ folder

Test project

Project structure

project/index.html
project/script.js
project/style.css
project/properties.css
project/image.png
project/background.png
							
						


ZIP Format

  • Created in 1989 by Phil Katz at PKWARE (publisher of PKZIP)
  • Supports:
    • compression in DEFLATE format since version 2.0 (1993)
    • AES encryption since version 5.2 (2003)
  • Version 2.0+ supported on most operating systems
  • Examples of formats based on the ZIP format:
    • LibreOffice/MS Office Documents (.ODT, .DOCX, ...)
    • Java Archives (.JAR)
    • Android packages (.APK) and iOS archives (.IPA)
    • Web Extensions (.CRX and .XPI)

ZIP Format

  • File entries followed by the Central directory
  • Suitable for streamed reading and writing, with some limitations when reading
  • Some metadata is stored in duplicate in the local headers and the central directory: file name, last modification date, data size, etc.
  • Central directory is required and authoritative
  • Central directory contains the relative offsets of each entry in the ZIP file
  • Metadata from local headers can be used to repair a ZIP file
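
Because the central directory is authoritative, a reader starts from the end of the file. The sketch below locates the end of central directory record by scanning backwards for its signature, then reads the number of entries and the offset of the central directory. It is a simplified illustration (no Zip64, no multi-disk archives), and the function name findCentralDirectory is hypothetical; zip.js does this internally.

```javascript
const EOCD_SIGNATURE = 0x06054b50; // bytes 50 4B 05 06 read as a little-endian integer
const EOCD_MIN_LENGTH = 22; // size of the record without any comment
const MAX_COMMENT_LENGTH = 0xffff; // the comment length is stored on 2 bytes

function findCentralDirectory(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const lastCandidate = bytes.length - EOCD_MIN_LENGTH;
  const firstCandidate = Math.max(0, lastCandidate - MAX_COMMENT_LENGTH);
  // Scan backwards: the record sits at the very end, possibly followed by a comment.
  for (let offset = lastCandidate; offset >= firstCandidate; offset--) {
    if (view.getUint32(offset, true) === EOCD_SIGNATURE) {
      return {
        endOfCentralDirectoryOffset: offset,
        entriesCount: view.getUint16(offset + 10, true),
        centralDirectoryOffset: view.getUint32(offset + 16, true)
      };
    }
  }
  return null; // no record found: not a ZIP file, or a truncated one
}
```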

ZIP Format

Source: https://en.wikipedia.org/wiki/ZIP_(file_format)

ZIP Format & JavaScript

Example with zip.js

Example of ZIP file creation

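import { ZipWriter, Uint8ArrayWriter } from "@zip.js/zip.js"

// readDirectory(), readFileStream() and inputFolder are project helpers defined elsewhere
const zipDataWriter = new Uint8ArrayWriter()
const zipWriter = new ZipWriter(zipDataWriter)

// add each file of the input folder as an entry of the ZIP file
for await (const { name } of readDirectory(inputFolder)) {
	const readableStream = await readFileStream(name)
	await zipWriter.add(name, readableStream)
}

// closing the writer appends the central directory
await zipWriter.close()
const zipData = await zipDataWriter.getData() // Uint8Array
console.log("zip file data:", zipData)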

Integration in the project

Creation of the ZIP file

index.html
index.js

lib/utils-zip.js
							
						


Example with zip.js

Example of reading a ZIP file

import { ZipReader, BlobReader, BlobWriter } from "@zip.js/zip.js"

// zipBlob is a Blob containing the ZIP file, obtained elsewhere
const zipReader = new ZipReader(new BlobReader(zipBlob))
const entries = await zipReader.getEntries()

for (const entry of entries) {
	// directory entries have no data to extract
	if (entry.directory) continue
	const data = await entry.getData(new BlobWriter())
	console.log("file:", entry.filename, "blob:", data)
}

await zipReader.close()

Integration in the project

Reading the ZIP file

index.html
index.js
							
						


ZIP Format (cont.)

  • Extensible format:
    • up to 64 KB of arbitrary data at the end of the ZIP file (comment field)
    • the first entry may start at an offset greater than zero
  • Content can therefore be added before and after the ZIP data while keeping the file valid
  • Self-extracting HTML page structure:
    1. HTML content up to an opening tag <!--
    2. ZIP file content
    3. Closing tag --> and end of HTML content
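
The three-part structure above can be sketched as a simple concatenation of byte arrays. This is an illustration only: buildSelfExtractingPage and concatBytes are hypothetical names, and the offsets stored in the ZIP data are assumed to already take the length of the HTML prefix into account.

```javascript
const encoder = new TextEncoder();

// Concatenates several Uint8Array instances into a new one.
function concatBytes(...parts) {
  const result = new Uint8Array(parts.reduce((total, part) => total + part.length, 0));
  let offset = 0;
  for (const part of parts) {
    result.set(part, offset);
    offset += part.length;
  }
  return result;
}

// 1. HTML up to "<!--", 2. raw ZIP data, 3. "-->" and the end of the HTML.
function buildSelfExtractingPage(htmlStart, zipData, htmlEnd) {
  return concatBytes(
    encoder.encode(htmlStart + "<!--"),
    zipData,
    encoder.encode("-->" + htmlEnd)
  );
}
```

A browser parses the result as HTML and skips the comment, while a ZIP tool finds the central directory from the end of the file, provided the trailing HTML fits in the 64 KB comment.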

ZIP Format (cont.)

Self-extracting HTML file template

<!DOCTYPE html>
<html>
	<head>
		<meta charset="utf-8">
		<title>Please wait...</title>
		<script><!-- Content of assets/zip.min.js --></script>
	</head>
	<body>
		<p>Please wait...</p>
		<script><!-- Content of assets/main.js --></script>
  </body>
</html>

ZIP Format (cont.)

Self-extracting HTML file template

<!DOCTYPE html>
<html>
	<head>
		<meta charset="utf-8">
		<title>Please wait...</title>
		<script><!-- Content of assets/zip.min.js --></script>
	</head>
	<body>
		<p>Please wait...</p>
		<!-- ZIP data     -->
		<script><!-- Content of assets/main.js --></script>
  </body>
</html>

Polyglot HTML/ZIP file

1/4 - Extracting and displaying ZIP file entries

index.html
index.js

lib/html-template.js
lib/utils.js
lib/utils-zip.js

assets/main.js
							
						


Polyglot HTML/ZIP file

2/4 - Displaying the index.html page


Polyglot HTML/ZIP file

3/4 - Displaying the full page


Inactive JavaScript code ⚠️

Polyglot HTML/ZIP file

4/4 - Scripts support


Reading the page from the file system ⚠️

Reading the page from the file system

  • Bypassing the call to await fetch("")
  • Read ZIP file from DOM via Node#textContent
  • Data corruption caused by character encoding:
    • 1-byte character encoding (e.g. windows-1252) versus multiple bytes (e.g. UTF-8)
    • substitution of U+FFFD replacement characters for characters whose code is 0 or invalid (depending on encoding)
    • replacement by a line feed \n of the characters:
      • carriage returns \r
      • carriage returns followed immediately by a line feed \r\n
    • replacement of certain characters (depending on encoding) whose code is greater than 127, i.e. beyond the ASCII 7-bit table
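
The first cause of corruption can be reproduced outside the browser. The sketch below decodes a few ZIP-like bytes as UTF-8 and encodes the text again: bytes that are not valid UTF-8 come back as U+FFFD and the original values are lost. The function name decodeAndReencode is illustrative, and the sketch only covers the decoding step, not the newline and NUL handling done by the HTML parser.

```javascript
// Decodes bytes as UTF-8 text, then encodes the text back to bytes.
function decodeAndReencode(bytes) {
  const text = new TextDecoder("utf-8").decode(bytes);
  const roundTrip = new TextEncoder().encode(text);
  return { text, roundTrip };
}

// "PK\x03\x04" followed by two bytes that are invalid on their own in UTF-8.
const zipLikeBytes = Uint8Array.of(0x50, 0x4b, 0x03, 0x04, 0xff, 0x80);
const { text, roundTrip } = decodeAndReencode(zipLikeBytes);

console.log([...text].map(char => char.codePointAt(0).toString(16)));
console.log("original length:", zipLikeBytes.length, "round-trip length:", roundTrip.length);
```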

Reading the ZIP file from the DOM

Hexadecimal display of ZIP data read as text

[Figure: hexadecimal dumps of the same ZIP data read as text, decoded as UTF-8 and as windows-1252]

Reading the ZIP file from the DOM

Comparison of the impact of different 1-byte encodings

  • Web pages containing 256 bytes of binary content covering every possible byte value
  • Test with all supported encodings
  • Reading of binary data as text in JavaScript
  • Determining and displaying the number of characters:
    • equal to the replacement character U+FFFD (first column)
    • different from those expected (second column)
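
A rough equivalent of this measurement can be written with TextDecoder instead of real web pages. The sketch below is an approximation under that assumption: it only measures the decoding step, and the set of supported encoding labels depends on the runtime. The function name measureDecodingDamage is hypothetical.

```javascript
// Decodes the 256 possible byte values with the given encoding and counts:
// - replaced: characters equal to the replacement character U+FFFD
// - unexpected: other characters whose code point differs from the byte value
function measureDecodingDamage(encodingLabel) {
  const bytes = Uint8Array.from({ length: 256 }, (_, value) => value);
  const text = new TextDecoder(encodingLabel).decode(bytes);
  let replaced = 0;
  let unexpected = 0;
  [...text].forEach((char, index) => {
    if (char === "\uFFFD") {
      replaced++;
    } else if (char.codePointAt(0) !== index) {
      unexpected++;
    }
  });
  return { replaced, unexpected };
}

console.log("utf-8:", measureDecodingDamage("utf-8"));
```

Calling it with a 1-byte encoding label such as "windows-1252" gives the two columns described above, assuming the runtime supports that label.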

Reading the ZIP file from the DOM

Changes in the HTML template

  • Replacement of the UTF-8 encoding with windows-1252
  • Computation of consolidation data used to restore the binary content
  • Array of 2 arrays containing the indexes of all:
    • carriage returns \r
    • carriage returns followed immediately by a line feed \r\n
  • Insertion of the consolidation data, serialized as JSON, in a <script> tag
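
One possible way to compute and apply such consolidation data is sketched below. It assumes the indexes refer to positions in the normalized content (after \r and \r\n have each become a single \n) and works on byte arrays for simplicity. The three function names are illustrative, not those of the project.

```javascript
// Returns [crIndexes, crlfIndexes]: positions, in the normalized content,
// of the line feeds that replace a lone \r or a \r\n pair.
function computeConsolidationData(bytes) {
  const crIndexes = [];
  const crlfIndexes = [];
  let outIndex = 0;
  for (let i = 0; i < bytes.length; i++, outIndex++) {
    if (bytes[i] === 0x0d) {
      if (bytes[i + 1] === 0x0a) {
        crlfIndexes.push(outIndex);
        i++; // \r\n collapses into a single character
      } else {
        crIndexes.push(outIndex);
      }
    }
  }
  return [crIndexes, crlfIndexes];
}

// Simulates the newline normalization applied by the HTML parser.
function normalizeNewlines(bytes) {
  const out = [];
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0x0d) {
      out.push(0x0a);
      if (bytes[i + 1] === 0x0a) i++;
    } else {
      out.push(bytes[i]);
    }
  }
  return Uint8Array.from(out);
}

// Restores the original bytes from the normalized content.
function restoreNewlines(normalized, [crIndexes, crlfIndexes]) {
  const cr = new Set(crIndexes);
  const crlf = new Set(crlfIndexes);
  const out = [];
  normalized.forEach((byte, index) => {
    if (cr.has(index)) out.push(0x0d);
    else if (crlf.has(index)) out.push(0x0d, 0x0a);
    else out.push(byte);
  });
  return Uint8Array.from(out);
}
```

The array returned by computeConsolidationData can be passed to JSON.stringify() before being inserted in the page.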

Reading the ZIP file from the DOM

Self-extracting HTML file template (before)

<!DOCTYPE html>
<html>
	<head>
		<meta charset="utf-8">
		<title>Please wait...</title>
		<script><!-- Content of assets/zip.min.js --></script>
	</head>
	<body>
		<p>Please wait...</p><!--
		  ZIP data 
		-->
		<script><!-- Content of assets/main.js --></script>
  </body>
</html>

Reading the ZIP file from the DOM

Self-extracting HTML file template (after)

<!DOCTYPE html>
<html>
	<head>
		<meta charset="windows-1252">
		<title>Please wait...</title>
		<script><!-- Content of assets/zip.min.js --></script>
	</head>
	<body>
		<p>Please wait...</p><!--
		  ZIP data 
		-->
		<script><!-- Content of assets/main.js --></script>
  </body>
</html>
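
With the page declared as windows-1252, every character read from the DOM corresponds to exactly one byte, so the text can be mapped back to binary data. The sketch below assumes the windows-1252 table of the WHATWG Encoding standard for bytes 0x80 to 0x9F, and assumes that U+FFFD stands for a NUL byte replaced by the HTML parser. The function name textToBytes is hypothetical; newline restoration is a separate step.

```javascript
// Characters produced by windows-1252 for bytes 0x80 to 0x9F, in order.
const WINDOWS_1252_HIGH =
  "\u20AC\u0081\u201A\u0192\u201E\u2026\u2020\u2021" +
  "\u02C6\u2030\u0160\u2039\u0152\u008D\u017D\u008F" +
  "\u0090\u2018\u2019\u201C\u201D\u2022\u2013\u2014" +
  "\u02DC\u2122\u0161\u203A\u0153\u009D\u017E\u0178";

const CHAR_TO_BYTE = new Map(
  [...WINDOWS_1252_HIGH].map((char, index) => [char, 0x80 + index])
);
// Assumption: NUL bytes were replaced by U+FFFD when the page was parsed.
CHAR_TO_BYTE.set("\uFFFD", 0x00);

// Converts text read from the DOM back to bytes, one byte per character.
function textToBytes(text) {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    // Characters outside the table are expected to be in the range 0x00-0xFF.
    bytes[i] = CHAR_TO_BYTE.get(text[i]) ?? text.charCodeAt(i);
  }
  return bytes;
}
```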

Reading the ZIP file from the DOM

Self-extracting HTML file template (after)

<!DOCTYPE html>
<html>
	<head>
		<meta charset="windows-1252">
		<title>Please wait...</title>
		<script><!-- Content of assets/zip.min.js --></script>
	</head>
	<body>
		<p>Please wait...</p><!--
		  ZIP data 
		-->
		<script type="text/json"> Consolidation data     </script>
		<script><!-- Content of assets/main.js --></script>
  </body>
</html>

Reading the ZIP file from the DOM

Support for consolidation data in the HTML template


Reading the ZIP file from the DOM

Adding consolidation data to the HTML page


Reading the ZIP file from the DOM

Bypassing the call to await fetch("")


Page resources are decoded in windows-1252 ⚠️

Reading the ZIP file from the DOM

Correction of MIME type issues in external resources
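
One plausible way to address this, sketched under the assumption that extracted resources are exposed to the page as Blob objects: declare the charset explicitly in the MIME type, so that UTF-8 stylesheets and scripts are not decoded with the windows-1252 encoding of the host page. The MIME_TYPES table and the function name createTextResourceBlob are illustrative.

```javascript
// Hypothetical mapping from file extension to MIME type for text resources.
const MIME_TYPES = {
  ".css": "text/css",
  ".js": "text/javascript",
  ".html": "text/html"
};

// Wraps the bytes of an extracted file in a Blob whose type states the charset.
function createTextResourceBlob(filename, bytes) {
  const extension = filename.slice(filename.lastIndexOf("."));
  const mimeType = MIME_TYPES[extension];
  const type = mimeType ? mimeType + ";charset=utf-8" : "";
  return new Blob([bytes], { type });
}

const blob = createTextResourceBlob("style.css", new TextEncoder().encode("body { color: red }"));
console.log(blob.type); // text/css;charset=utf-8
```

In the browser, such a blob can then be turned into a URL with URL.createObjectURL(blob).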


PNG Format

  • Created in 1996 by a public working group
  • Standard, royalty-free image format
  • Lossless compression (DEFLATE)
  • File composed of a signature followed by a sequence of chunks
  • Chunk structure:
    1. Chunk data length (4 bytes)
    2. Chunk type (4 bytes): IHDR, IDAT, IEND, tEXt ...
    3. Chunk data (variable length)
    4. Cyclic redundancy check (CRC-32, 4 bytes) computed over the chunk type and the chunk data
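
The chunk layout above can be sketched as follows. The CRC-32 is the standard one (polynomial 0xEDB88320) and covers the type and the data, not the length. The function names crc32 and createChunk are illustrative.

```javascript
// Lookup table for the standard CRC-32.
const CRC_TABLE = new Uint32Array(256).map((_, n) => {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  }
  return c >>> 0;
});

function crc32(bytes) {
  let crc = 0xffffffff;
  for (const byte of bytes) {
    crc = CRC_TABLE[(crc ^ byte) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Builds a chunk: length (4 bytes), type (4 bytes), data, CRC-32 (4 bytes).
function createChunk(type, data = new Uint8Array(0)) {
  const chunk = new Uint8Array(12 + data.length);
  const view = new DataView(chunk.buffer);
  view.setUint32(0, data.length); // big-endian, counts the data only
  chunk.set(new TextEncoder().encode(type), 4);
  chunk.set(data, 8);
  view.setUint32(8 + data.length, crc32(chunk.subarray(4, 8 + data.length)));
  return chunk;
}

const toHex = bytes => [...bytes].map(byte => byte.toString(16).padStart(2, "0")).join(" ");
console.log(toHex(createChunk("IEND"))); // 00 00 00 00 49 45 4e 44 ae 42 60 82
```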

PNG Format

  • Minimal PNG file structure:
    1. PNG signature (8 bytes)
      89 50 4E 47  0D 0A 1A 0A
    2. IHDR chunk for the header (25 bytes, including 13 bytes of data)
      00 00 00 0D  49 48 44 52  ...
    3. IDAT chunk(s) for the data
    4. IEND chunk for the trailer (12 bytes)
      00 00 00 00  49 45 4E 44  AE 42 60 82
  • tEXt chunk to store text or binary data
    xx xx xx xx  74 45 58 74  ...
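
To check this structure on a real file, the chunk sequence can be walked with a few lines of code. A minimal sketch, which does not verify the CRCs; the function name listChunks is hypothetical.

```javascript
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Returns the type, offset and data length of each chunk, up to IEND.
function listChunks(png) {
  if (!PNG_SIGNATURE.every((byte, index) => png[index] === byte)) {
    throw new Error("Not a PNG file");
  }
  const view = new DataView(png.buffer, png.byteOffset, png.byteLength);
  const chunks = [];
  let offset = PNG_SIGNATURE.length;
  while (offset + 12 <= png.length) {
    const dataLength = view.getUint32(offset); // big-endian
    const type = String.fromCharCode(...png.subarray(offset + 4, offset + 8));
    chunks.push({ type, offset, dataLength });
    offset += 12 + dataLength; // length + type + data + CRC
    if (type === "IEND") break;
  }
  return chunks;
}
```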

PNG Format

Minimal PNG file structure


Data type            Data in hexadecimal                    Mandatory  Length (bytes)
PNG signature        89 50 4E 47 0D 0A 1A 0A                yes        8
Header chunk IHDR    00 00 00 0D 49 48 44 52 ...            yes        25
...
Data chunk IDAT      xx xx xx xx 49 44 41 54 ...            yes        12 + n
...
Trailer chunk IEND   00 00 00 00 49 45 4E 44 AE 42 60 82    yes        12

Polyglot HTML/ZIP/PNG file

  • Encapsulation of the HTML/ZIP file in a PNG file
  • PNG polyglot file structure:
    1. PNG signature
    2. PNG header chunk
    3. PNG text chunk: HTML content up to the opening tag <!--
    4. PNG data chunk(s)
    5. PNG text chunk: closing tag --> and end of HTML content
    6. PNG trailer chunk
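
The structure above can be obtained by splicing two extra chunks into an existing PNG file. The sketch below relies on two facts of the format: the signature and the IHDR chunk always occupy the first 33 bytes, and the IEND chunk occupies the last 12. It assumes the input has no data after IEND and that the two chunks passed in are already complete (length, type, data, CRC). The function name wrapImageData is hypothetical.

```javascript
const HEADER_END = 8 + 25; // PNG signature + IHDR chunk (4 + 4 + 13 + 4 bytes)
const TRAILER_LENGTH = 12; // IEND chunk (4 + 4 + 0 + 4 bytes)

// Inserts one chunk right after IHDR and another one right before IEND.
function wrapImageData(png, chunkBeforeData, chunkAfterData) {
  const parts = [
    png.subarray(0, HEADER_END),
    chunkBeforeData,
    png.subarray(HEADER_END, png.length - TRAILER_LENGTH),
    chunkAfterData,
    png.subarray(png.length - TRAILER_LENGTH)
  ];
  const result = new Uint8Array(parts.reduce((total, part) => total + part.length, 0));
  let offset = 0;
  for (const part of parts) {
    result.set(part, offset);
    offset += part.length;
  }
  return result;
}
```

In the polyglot file, the first chunk would carry the HTML content up to the opening tag <!-- and the second one the closing tag --> followed by the end of the HTML content.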

Polyglot HTML/ZIP/PNG file

PNG file structure (before)


Data type            Data in hexadecimal                    Mandatory  Length (bytes)
PNG signature        89 50 4E 47 0D 0A 1A 0A                yes        8
Header chunk IHDR    00 00 00 0D 49 48 44 52 ...            yes        25
...
Data chunk IDAT      xx xx xx xx 49 44 41 54 ...            yes        12 + n
...
Trailer chunk IEND   00 00 00 00 49 45 4E 44 AE 42 60 82    yes        12

Polyglot HTML/ZIP/PNG file

PNG file structure (after)


Data type            Data in hexadecimal                    Mandatory  Length (bytes)
PNG signature        89 50 4E 47 0D 0A 1A 0A                yes        8
Header chunk IHDR    00 00 00 0D 49 48 44 52 ...            yes        25
Text chunk tEXt      xx xx xx xx 74 45 58 74 ...            no         12 + n
...
Data chunk IDAT      xx xx xx xx 49 44 41 54 ...            yes        12 + n
...
Text chunk tEXt      xx xx xx xx 74 45 58 74 ...            no         12 + n
Trailer chunk IEND   00 00 00 00 49 45 4E 44 AE 42 60 82    yes        12

Polyglot HTML/ZIP/PNG file

Self-extracting HTML file template (before)

<!DOCTYPE html>
<html>
	<head>
		<meta charset="windows-1252">
		...
	</head>
	<body>
		<p>Please wait...</p><!--      ZIP data 
		--><script type="text/json">
		  Consolidation data
		</script>
		<script><!-- Content of assets/main.js --></script>
  </body>
</html>

Polyglot HTML/ZIP/PNG file

Self-extracting HTML file template (after)

<!DOCTYPE html>
<html>
	<head>
		<meta charset="windows-1252">
		...
	</head>
	<body>
		<p>Please wait...</p><!-- IDAT chunk(s)     --><!--
		  ZIP data
		--><script type="text/json">
		  Consolidation data
		</script>
		<script><!-- Content of assets/main.js --></script>
  </body>
</html>

Polyglot HTML/ZIP/PNG file

Self-extracting HTML file template (after)

PNG signature + IHDR chunk (33 bytes)
<!DOCTYPE html>
<html>
	<head>
		<meta charset="windows-1252">
		...
	</head>
	<body>
		<p>Please wait...</p><!-- IDAT chunk(s)     --><!--
		  ZIP data 
		--><script type="text/json">
		  Consolidation data
		</script>
		<script><!-- Content of assets/main.js --></script>
  </body>
</html>
IEND chunk (12 bytes)

Polyglot HTML/ZIP/PNG file

Encapsulation of the HTML/ZIP file into a PNG file

index.html
index.js

lib/html-template.js
lib/utils.js
lib/utils-zip.js
lib/utils-png.js

assets/main.js
							
						


Text nodes induced by the PNG format are visible ⚠️

Polyglot HTML/ZIP/PNG file

Removal of text nodes induced by PNG format


The page is rendered in quirks mode ⚠️

Polyglot HTML/ZIP/PNG file

Correction of HTML page rendering mode and script loading


The main image is stored twice in the file 🤔

Polyglot HTML/ZIP/PNG file

Reuse of the image in the HTML page

index.html
index.js

lib/html-template.js
lib/utils.js
lib/utils-zip.js
lib/utils-png.js

assets/main.js

project/index.html
							
						


Conclusion

  • Limitations of the final implementation:
    • Manual resolution of dependencies
    • Data placed after the ZIP file must not exceed 64 KB (comment size limit)
    • Presence of --> in the ZIP or PNG binary data, which would close the HTML comment prematurely
    • Use of String#replaceAll() to replace paths in text files instead of relying on parsing
    • Lack of <meta> tag containing the Content Security Policy (CSP)
    • No support for frames
    • ...
  • Alternative formats: MHTML, Web Bundle, WARC/WACZ, MAFF ...
  • Is it dangerous? 🤷 (GIFAR)
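
The limitation about --> in binary data can at least be detected before the file is written. A minimal sketch, with a hypothetical function name; it also looks for --!>, which closes an HTML comment as well.

```javascript
// Byte sequences that terminate an HTML comment.
const COMMENT_TERMINATORS = ["-->", "--!>"].map(
  text => [...text].map(char => char.charCodeAt(0))
);

// Returns the index of the first comment terminator found in bytes, or -1.
function findCommentTerminator(bytes) {
  for (let i = 0; i < bytes.length; i++) {
    for (const pattern of COMMENT_TERMINATORS) {
      if (pattern.every((byte, j) => bytes[i + j] === byte)) {
        return i;
      }
    }
  }
  return -1;
}

console.log(findCommentTerminator(Uint8Array.of(0x00, 0x2d, 0x2d, 0x3e))); // 1
```

When a terminator is found, one option is to generate the data again with different settings (for example another compression level) until the sequence no longer appears.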

Thank you!

Questions? 🤔, Feedback? 📝