Infosec Blog

Have you ever surfed the internet and seen a “Download as PDF” button?
Over the past few years, many sites have added the option to export your personal data to an accessible format such as PDF or Word.
As a penetration tester, I have tested many large web applications that include this conversion feature, and I started to wonder: what happens behind the scenes, and does this process broaden the attack surface?

After some quick research, I found that the process is dangerous from a security perspective: without appropriate filtering, it can expose your application to many vulnerabilities.
In this article, I explain the conversion process and the potential attacks.

1. The Conversion Process

When a website converts data to PDF, in most cases the following happens:

The web application gets the client’s data from a database or directly from the client.
It puts the data inside an HTML template.*
It sends the resulting HTML to an external library.
The external library takes the HTML, does its magic, and returns a PDF file.
The client downloads the PDF file.

*In some cases, the web application downloads the whole HTML, including the personal data, directly from the website itself with HTTP (e.g., from the profile page of the user)
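
To make the risky step concrete, here is a minimal sketch of steps 2 to 4 in Python. The template, the function names and the file:///etc/hosts URL are all my own assumptions for illustration, and html_to_pdf is only a placeholder for whichever external library a site actually uses.

```python
from string import Template

# Hypothetical HTML template used by the export feature (not taken from a real site).
PROFILE_TEMPLATE = Template(
    "<html><body><h1>Profile</h1><p>Name: $name</p></body></html>"
)


def build_export_html(name: str) -> str:
    """Step 2: place the client's data inside the HTML template.

    The value is inserted as-is, with no encoding or filtering.
    """
    return PROFILE_TEMPLATE.substitute(name=name)


def html_to_pdf(html: str) -> bytes:
    """Placeholder for steps 3-4: the external HTML-to-PDF library.

    Hypothetical stand-in; a real converter would parse every tag it receives.
    """
    raise NotImplementedError("stand-in for an external conversion library")


# A benign value and a value carrying a tag both land in the HTML verbatim.
benign_html = build_export_html("Alice")
tagged_html = build_export_html('<iframe src="file:///etc/hosts"></iframe>')
```

In this sketch the converter would receive the iframe tag as real markup, because nothing between the input and the template treats it as plain text.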

Legitimate export process

The most interesting part is the conversion from the custom HTML to the PDF file by the external library.
I discovered that there are many players in the HTML to PDF market.

2. The attack vector

The conversion process takes an HTML page, parses all the elements inside it, and converts each one to a new PDF element.
The common external libraries are full of features and support many HTML tags. Some of them even support CSS and JavaScript.
With this in mind, consider the following scenario: what happens if an attacker succeeds in injecting a malicious HTML tag into the conversion process?
If the web application does not encode or filter the user’s input, the server is exposed to a wide range of vulnerabilities.

2.1. Arbitrary file download

One of the most common vulnerabilities on the web is the ability to download an arbitrary file from a server. This is a critical security breach, because it lets an attacker download sensitive data from the server, e.g., log files that contain users’ data, configuration files that contain connection strings and encryption keys, users’ private files, etc.
If we can inject an HTML tag into the conversion process, some libraries will let us download almost any file from the web server. For this attack vector, we can use these tags:

iframe / frame
object
fonts (CSS)

The attacker downloads the server's hosts file

Example from the real world:

1. The HTTP Request

Malicious HTTP request

2. The PDF Response

The response with the hosts file inside

2.2. Internal network exposure (SSRF)

Sometimes during a penetration test, after finding a few vulnerabilities, I reach a dead end. In many cases, what keeps me from making significant progress is the inability to obtain information about the server and the internal network.
“Export Injection” gives us, in all the libraries, a way to obtain a lot of information about the server. Some techniques that occurred to me:

Internal port scanning: the delay of the web server's response reveals whether a port is open or closed. For example, if we send a malicious img tag:

<img src="http://127.0.0.1:445"/> - Delay of 2.3 seconds (the port is open)
<img src="http://127.0.0.1:666"/> - Delay of 4.8 seconds (the port is closed)
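
The timing comparison above can be sketched as a small helper. This is only an illustration: the function names and the threshold are my own assumptions, and the "shorter delay means open" rule simply mirrors the two measurements above, so it may differ from library to library.

```python
from typing import Dict


def build_port_probe(host: str, port: int) -> str:
    """Build the img tag that makes the converter request host:port."""
    return f'<img src="http://{host}:{port}"/>'


def classify_by_delay(delays: Dict[int, float], threshold: float) -> Dict[int, str]:
    """Label each port from the measured export delay in seconds.

    Assumption: a delay below the threshold means the port answered (open),
    as in the example measurements. The threshold must be calibrated per target.
    """
    return {
        port: "open" if delay < threshold else "closed"
        for port, delay in delays.items()
    }


probe = build_port_probe("127.0.0.1", 445)
labels = classify_by_delay({445: 2.3, 666: 4.8}, threshold=3.5)
```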

Internal resource access: we can use the object, iframe and frame tags to access internal HTTP interfaces and view the responses. For example:

Injection of:
<object data="http://127.0.0.1:8443"/>

Internal management interface

Discover the real IP address of the website: we can make the site perform an HTTP request to any server on the internet, including our own. I used the “iplogger” site to log the IP address of the attacked website:
- <img src="https://iplogger.com/113A.gif"/>
The server has performed an HTTP request to our server
With this technique, we can expose the real IP of the web server and then perform an effective port scan.

2.3. Effective Denial of Service (DoS)

The vulnerability exposes the site to a potential DoS attack. The external libraries support parsing complex data (images, fonts and more). An attacker could abuse this mechanism and make the server work hard by sending one of the following tags:

<img src="http://download.thinkbroadband.com/1GB.zip"/>
Causes the web application to download a heavy file.
<iframe src="http://example.com/RedirectionLoop.aspx"/>
Causes the web application to enter a long HTTP redirection loop.

The way to perform a DoS attack varies from library to library.

3. How to protect yourself?

Preventing the vulnerability is quite easy.
As a rule, you should never pass user input to an external library without thinking it through. Always ask: “What would an attacker do?”
In this specific case, you should encode the input before passing it to the external conversion library.
HTML encoding should prevent the potential vulnerabilities in most cases.
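
As a minimal sketch of that advice, this is what HTML encoding looks like with Python's standard html.escape. The function name and the sample payload are my own assumptions; your stack will have its own encoding helper.

```python
import html


def encode_for_template(value: str) -> str:
    """HTML-encode user input before it is placed in the export template."""
    return html.escape(value, quote=True)


# Hypothetical injected value, similar to the tags discussed in this article.
payload = '<iframe src="file:///etc/hosts"></iframe>'
encoded = encode_for_template(payload)
```

After encoding, the converter sees the payload as plain text to print, not as a tag to load.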

Vulnerable Libraries:

4. Conclusions:

I hope that my quick research will raise awareness of this vulnerability. The attack surface is broad, and I mentioned only the basic vectors. I hope this article opens the door to future research on the conversion process.

SMALI

In this document I describe the principles of the smali language.

Structure of a smali file:

instance fields - variables defined for each instance of the class
static fields - variables shared by all instances of the class
direct methods - static, private and constructor methods
virtual methods - all other kinds of methods

Registers:

In dex, virtual registers (similar to assembly) are used to store values and pass them to instructions and methods. They are 32 bits wide and can hold any type of value (64-bit types, long and double, occupy two adjacent registers).

The number of registers a method uses is declared at the start of the method.

There are two kinds of registers:

Local registers (locals):
Registers that the method uses and that are defined within it, similar to local variables declared inside a function. They are used for the function's internal computations and are valid only within its scope.
They come first in the register order.

Parameter registers (arguments):
Parameters passed to a function are received and stored in registers. These are the last registers, after the local ones. In non-static functions, the first parameter register (p0) holds the class instance.

For example, consider a non-static function that takes two parameters, an int and a String, and uses 3 registers internally.

In smali, the function will have 6 registers (v0-v5):

v0	Used internally by the method
v1	Used internally by the method
v2	Used internally by the method
v3	Holds the class instance
v4	Holds the int parameter
v5	Holds the String parameter

At the start of each method, the number of registers is declared. There are two ways to do this:

The '.registers' directive: declares the total number of registers in the method (locals + parameters).

The '.locals' directive: declares the number of local registers in the method (the parameter registers are added on top of them).

In addition, parameter registers can be referenced with the letter p followed by the parameter number. For example, in the method from the previous example, the registers can also be referred to as follows:

v0	
v1	
v2	
v3	p0
v4	p1
v5	p2

This notation is naturally preferable, because local registers can be added later without breaking the function.
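
The layout described above can be sketched as a small Python helper. This is only an illustration: the function name and the row format are my own assumptions. It follows the rules stated here (locals first, parameters last, p0 holding the instance in non-static methods), plus the fact that long (J) and double (D) occupy two registers.

```python
from typing import List, Optional, Tuple

WIDE_TYPES = {"J", "D"}  # long and double occupy two registers each

Row = Tuple[str, Optional[str], str]  # (v-name, p-name or None, role)


def register_layout(local_count: int, param_types: List[str], is_static: bool) -> List[Row]:
    """Return one row per register: locals first, then parameter registers."""
    rows: List[Row] = [(f"v{i}", None, "local") for i in range(local_count)]
    # Non-static methods receive the class instance as the first parameter.
    roles = [] if is_static else ["this"]
    for descriptor in param_types:
        roles.append(descriptor)
        if descriptor in WIDE_TYPES:
            roles.append(descriptor + " (second half)")
    for p_index, role in enumerate(roles):
        rows.append((f"v{local_count + p_index}", f"p{p_index}", role))
    return rows


# The example from the text: non-static, (int, String), 3 locals.
layout = register_layout(3, ["I", "Ljava/lang/String;"], is_static=False)
```

Raising local_count shifts every v-name of the parameters but leaves the p-names unchanged, which is why the p notation survives patches.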

Additional notes:

All dex instructions can address the first 16 registers (v0-v15), but many of them cannot address registers beyond those. So when patching existing code, prefer reusing existing registers over raising the register count above 16.

Types:

There are two kinds of types in smali:

Reference types: objects and arrays
Primitive types: everything else

Primitive types:
V	void - can only be used for return types
Z	boolean
B	byte
S	short
C	char
I	int
J	long (64 bits)
F	float
D	double (64 bits)

For dex instructions, a boolean is handled the same way as an int, with 0 representing false and 1 representing true.

Objects:

Objects are represented as follows:

Lpackage/name/ObjectName;

The letter L marks the start of an object type. It is followed by the package path and class name, and a semicolon at the end.

Arrays:

Arrays are represented in the following format: first the character '[', repeated once per array dimension, and then the element type.
Examples:

[[I - a two-dimensional array of int (int[][])

[Ljava/lang/String; - a one-dimensional array of strings (String[])
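
The type notation above can be sketched as a small Python converter from Java-style type names to smali descriptors. The function name is my own, and the sketch assumes class names are given fully qualified (java.lang.String, not String).

```python
# Primitive type letters, as listed in the table above.
PRIMITIVES = {
    "void": "V", "boolean": "Z", "byte": "B", "short": "S", "char": "C",
    "int": "I", "long": "J", "float": "F", "double": "D",
}


def to_descriptor(java_type: str) -> str:
    """Convert a Java type name such as 'int[][]' to its smali descriptor."""
    dimensions = 0
    while java_type.endswith("[]"):
        java_type = java_type[:-2]
        dimensions += 1
    if java_type in PRIMITIVES:
        base = PRIMITIVES[java_type]
    else:
        # Reference type: L + package path with slashes + semicolon.
        base = "L" + java_type.replace(".", "/") + ";"
    return "[" * dimensions + base
```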

Numbers:

Numbers are written in hexadecimal, and floating-point values are encoded according to the IEEE 754 standard.
It is worth using a dedicated converter to compute them.
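
Such a converter is easy to sketch with Python's standard struct module. The function names are my own. The sketch converts between a float or double and the hexadecimal form of its IEEE 754 bit pattern.

```python
import struct


def float_to_hex(value: float) -> str:
    """Return the 32-bit IEEE 754 bit pattern of a float as a hex literal."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return f"0x{bits:08x}"


def double_to_hex(value: float) -> str:
    """Return the 64-bit IEEE 754 bit pattern of a double as a hex literal."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    return f"0x{bits:016x}"


def hex_to_float(literal: str) -> float:
    """Decode a 32-bit hex literal back into a float."""
    (value,) = struct.unpack(">f", struct.pack(">I", int(literal, 16)))
    return value
```

For example, float_to_hex(1.0) gives 0x3f800000, and hex_to_float("0x3f800000") gives back 1.0.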

Methods:

Method description:

Methods are described in full, in the format:

Lpackage/name/ObjectName;->MethodName(III)Z

This method belongs to the class ObjectName, takes 3 parameters of type int, and returns a boolean.
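
To show how such a description breaks down, here is a minimal Python sketch that splits a method reference into its parts. The function names are my own, and the sketch assumes a well-formed reference and does no validation.

```python
from typing import List, Tuple


def split_types(signature: str) -> List[str]:
    """Split a run of type descriptors such as 'I[Ljava/lang/String;' into a list."""
    types = []
    i = 0
    while i < len(signature):
        start = i
        while signature[i] == "[":  # array dimensions
            i += 1
        if signature[i] == "L":  # object type runs up to the semicolon
            i = signature.index(";", i)
        i += 1
        types.append(signature[start:i])
    return types


def parse_method(reference: str) -> Tuple[str, str, List[str], str]:
    """Return (owner class, method name, parameter types, return type)."""
    owner, rest = reference.split("->", 1)
    name, rest = rest.split("(", 1)
    params, return_type = rest.split(")", 1)
    return owner, name, split_types(params), return_type


parsed = parse_method("Lpackage/name/ObjectName;->MethodName(III)Z")
```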

Method definition:

A method is defined in the following format:

.method + modifiers (public/final/static/etc.) + signature (name and descriptor)

Next comes the number of registers in the method (using .locals or .registers).
Next come the parameters the method receives (one line per parameter: '.parameter' in older versions, '.param' in newer versions, followed by the parameter name, if any).
The method ends with '.end method'.

Debugging directives:

A method can contain additional directives intended for debugging, such as .line or .prologue.

Example:

Definition of a static method that takes two parameters: an int and a boolean.
This method uses 2 local registers (v0, v1) and 2 parameter registers (v2, v3, or p0, p1).
Note: because this is a static method, no register holds the class instance. v0 is an ordinary local register, and p0 holds the first declared parameter.

Fields:

Fields of an object are referenced in the following format:

Lpackage/name/ObjectName;->FieldName:Ljava/lang/String;

This is a reference to the field named FieldName, of type String, in the class ObjectName.

Indentation:

In smali code, indentation is 4 spaces. Lines are separated by a blank line (\r\n\r\n after each instruction). The language is not permissive, so the indentation rules must be followed strictly.

Instructions:

smali instructions correspond to Dalvik opcodes and are effectively the heart of the language. They resemble assembly and carry out the logic of the code. Registers are passed to them, and they perform various operations on those registers.

There are 2 ways to pass registers to instructions:

As parameters:
Usually when calling an instruction that invokes a function.
Written in the format {v0, v1, ...}

Not as parameters:
In all other instructions, without braces.
Written in the format v0, v1

Instruction	Description	Example
invoke-static {parameters}, methodtocall	Calls a static method	invoke-static {}, Ljava/util/concurrent/Executors;->newSingleThreadExecutor()Ljava/util/concurrent/ExecutorService; Calls the function newSingleThreadExecutor of the class Executors, which returns an object of type ExecutorService.
invoke-virtual {parameters}, methodtocall	Calls a virtual method	invoke-virtual {p0, v2, v3}, Lkeva/katzinActivity;->moreKeva(ILjava/lang/String;)Z Calls the function moreKeva on an object of type katzinActivity, which returns a boolean. p0 holds the katzinActivity instance that the calling method received as a parameter, v2 holds an int (I), and v3 holds a String object.
move-result-object vx	Stores the object reference returned by the previous invocation in register vx	
move-result vx	Stores the value returned by the previous invocation in register vx	
if-eqz vx, target	If the value of vx equals 0, jump to target. The target is the label of the location to jump to. In dex bytecode the label is translated to the corresponding offset.	Also used to test boolean expressions. Typically what if (flag) compiles to: the jump skips the body when flag is false.
if-nez vx, target	If the value of vx is not 0, jump to target.	Typically what if (!flag) compiles to.
const/4 vx, lit4	Stores a 4-bit value in vx	const/4 v1, 0x2 Stores the number 2 in register v1.
const-string vx, string	Stores a string in register vx. At the smali level an actual string is shown. In practice it is a reference to a string index in the string table of the dex file.	const-string v1, "ACCESS_TOKEN"