Data Encoding Basics

URL encoding

The following page provides a character encoding chart that helps explain which characters are “safe” and which characters should be encoded in URLs.

  • http://perishablepress.com/stop-using-unsafe-characters-in-urls/

Some commonly encoded characters are:

Character   Purpose in URL                     Encoding
'#'         Separates anchors                  %23
'?'         Separates the query string         %3F
'&'         Separates query elements           %26
'%'         Indicates an encoded character     %25
'/'         Separates domain and directories   %2F
'+'         Indicates a space                  %2B
' '         Not recommended                    %20 or +
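
As a quick check of the table above, here is a minimal sketch using the standard encodeURIComponent function and URLSearchParams class. The variable names and the sample query value are illustrative assumptions, not part of the original notes.

```javascript
// Percent-encode each character listed in the table.
const reserved = ["#", "?", "&", "%", "/", "+", " "];

const encoded = Object.fromEntries(
  reserved.map((ch) => [ch, encodeURIComponent(ch)])
);
console.log(encoded);

// Form-style query strings write a space as "+",
// so a literal "+" has to be sent as %2B.
const query = new URLSearchParams({ q: "a b+c" }).toString();
console.log(query); // q=a+b%2Bc
```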

HTML encoding

Documents transmitted via HTTP can specify their character encoding through the charset parameter of the Content-Type HTTP header:

Content-Type:text/html;charset=utf-8

Define character encoding using HTTP

# PHP > Uses the header() function to send a raw HTTP header:
header('Content-Type:text/html;charset=utf-8');

# ASP.Net > Uses the response object:
<%Response.charset="utf-8"%>

# JSP > Uses the page directive:
<%@ page contentType="text/html; charset=UTF-8"%>

# Using the META directive:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

# With HTML5, it is also possible to write:
<meta charset="utf-8">

HTML4 and HTML5 specifications about special characters

http://www.w3.org/TR/1998/REC-html40-19980424/charset.html#h-5.3

http://www.w3.org/TR/html5/single-page.html#character-references

  • Named character references:

→ http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Character_entity_references_in_HTML

Character references must start with a U+0026 AMPERSAND character (&):

Character Reference    Rule                                  Encoded character
Named entity           & + named character reference + ;     &lt;
Numeric decimal        & + # + decimal number + ;            &#60;
Numeric hexadecimal    & + #x + hexadecimal number + ;       &#x3c; / &#X3C;

Some Variations:

Character Reference    Variation                         Encoded character
Numeric decimal        No terminator (;)                 &#60
                       One or more zeroes before code    &#060 / &#0000060
Numeric hexadecimal    No terminator (;)                 &#x3c
                       One or more zeroes before code    &#x03c / &#x00003c
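
The rules and variations above can be sketched as small helper functions. This is only an illustration: the function names are hypothetical, and the helpers always emit the terminating semicolon.

```javascript
// Numeric decimal reference: & + # + decimal code + ;
function decimalRef(ch) {
  return `&#${ch.codePointAt(0)};`;
}

// Numeric hexadecimal reference: & + #x + hexadecimal code + ;
function hexRef(ch) {
  return `&#x${ch.codePointAt(0).toString(16)};`;
}

// Variation: pad the decimal code with leading zeroes up to `width` digits.
function paddedDecimalRef(ch, width) {
  return `&#${String(ch.codePointAt(0)).padStart(width, "0")};`;
}

console.log(decimalRef("<"));          // &#60;
console.log(hexRef("<"));              // &#x3c;
console.log(paddedDecimalRef("<", 7)); // &#0000060;
```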

Base (36|64) encoding

Base 36 Encoding Scheme

Base36 - It's the most compact, case-insensitive, alphanumeric numeral system using ASCII characters. The scheme's alphabet contains all digits [0-9] and Latin letters [A-Z].

It's used in many real-world scenarios:

  • Reddit uses it for identifying both posts and comments.
  • Some URL shortening services, like TinyURL, use Base36 integers as compact, alphanumeric identifiers.

→ http://tinyurl/ljasd

PHP

PHP uses the base_convert() function to convert numbers:

OHPE in Base 10 is <?=base_convert("OHPE",36,10);?>

JavaScript

JavaScript uses two functions:

(1142690).toString(36) // encode
1142690..toString(36)  // encode (alternative syntax)
parseInt("ohpe",36)    // decode

Base64 Encoding Scheme

Base64 is one of the most widespread binary-to-text encoding schemes to date. It was designed to allow binary data to be represented as ASCII string text.

  • The alphabet of the Base64 encoding scheme is composed of digits [0-9] and Latin letters, both upper and lower case [a-zA-Z], for a total of 62 values. The plus (+) and slash (/) characters complete the set of 64.
  • Moreover: http://en.wikipedia.org/wiki/Base64#Implementations_and_history

The algorithm divides the message into groups of 6 bits and then converts each group to the respective ASCII character, following the conversion table.

That's why there are 64 allowed characters (2 raised to the 6th power = 64).

  • If the last 6-bit group of a 4-character block is missing (the input length is not a multiple of 3 bytes), it is encoded as =
  • If the last two groups are missing, they are encoded as ==
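
The grouping and padding steps can be sketched by hand as follows. This is a didactic sketch, not a replacement for the built-in functions; the name base64Encode is hypothetical, and the input is assumed to be encoded as UTF-8 bytes.

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function base64Encode(text) {
  // Turn the input into a string of bits, 8 bits per byte.
  const bytes = new TextEncoder().encode(text);
  let bits = "";
  for (const b of bytes) bits += b.toString(2).padStart(8, "0");

  // Pad with zero bits so the length is a multiple of 6.
  while (bits.length % 6 !== 0) bits += "0";

  // Map each 6-bit group to its character in the alphabet.
  let out = "";
  for (let i = 0; i < bits.length; i += 6) {
    out += ALPHABET[parseInt(bits.slice(i, i + 6), 2)];
  }

  // Output comes in blocks of 4 characters; missing groups become "=".
  while (out.length % 4 !== 0) out += "=";
  return out;
}

console.log(base64Encode("encode this string")); // ZW5jb2RlIHRoaXMgc3RyaW5n
console.log(base64Encode("ab"));                 // YWI=
console.log(base64Encode("a"));                  // YQ==
```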

PHP

PHP uses the base64_encode and base64_decode functions, based on the MIME Base64 implementation:

<?=base64_encode('encode this string')?> //encode
<?=base64_decode('ZW5jb2RlIHRoaXMgc3RyaW5n')?> //decode

JavaScript

Many browsers can handle Base64 natively through the btoa and atob functions:

window.btoa('encode this string'); //encode
window.atob('ZW5jb2RlIHRoaXMgc3RyaW5n'); //decode

Moreover: https://developer.mozilla.org/en-US/docs/Web/API/Window.btoa

Unicode encoding

Unicode (aka ISO/IEC 10646 Universal Character Set). The way it is handled can expose web applications to security attacks, such as filter bypasses.

  • http://www.joelonsoftware.com/articles/Unicode.html

UTF = Unicode Transformation Format:

- UTF-8
- UTF-16
- UTF-32

Homoglyph | Visual Spoofing

In typography, a homoglyph is one of two or more characters, or glyphs, with shapes that either appear identical or cannot be differentiated by quick visual inspection. -Wikipedia

Homograph - a word that looks the same as another word
Homoglyph - a look-alike character used to create homographs

Example:
Visual Sp'oo'fing = U+006F (Latin small letter o)
U+03BF (Greek small letter omicron)
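
A minimal sketch of why the two strings differ even though they look alike. The helper names codePoints and nonAscii are hypothetical, and the non-ASCII check is only a crude heuristic.

```javascript
const latin = "spoofing";           // both "o" are U+006F
const mixed = "sp\u03BF\u03BFfing"; // both "o" replaced by U+03BF (Greek omicron)

// The strings render almost identically but are not equal.
console.log(latin === mixed); // false

// List the code points of a string in U+XXXX notation.
function codePoints(s) {
  return [...s].map(
    (c) => "U+" + c.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Crude check: return the characters outside the ASCII range.
function nonAscii(s) {
  return [...s].filter((c) => c.codePointAt(0) > 0x7f);
}

console.log(codePoints(mixed));
console.log(nonAscii(mixed).length); // 2
```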

Moreover, Confusables: http://unicode.org/cldr/utility/confusables.jsp

Homoglyph Attack Generator: http://www.irongeek.com/homoglyph-attack-generator.php

Article about Homoglyph and Punycode attacks: http://www.irongeek.com/i.php?page=security/out-of-character-use-of-punycode-and-homoglyph-attacks-to-obfuscate-urls-for-phishing

Homoglyphs can be used to bypass anti cross-site scripting and SQL injection filters:

  • Creative usernames and Spotify account hijacking: http://labs.spotify.com/2013/06/18/creative-usernames/

There are other ways in which characters and strings can be transformed by software processes, such as normalization, canonicalization, best fit mapping, etc

→ Moreover: http://websec.github.io/unicode-security-guide/
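
As one illustration of normalization, the sketch below uses the standard String.prototype.normalize method. The payload and the naive filter check are hypothetical; the point is only that a filter applied before normalization can miss characters that later become ASCII.

```javascript
// U+FF1C and U+FF1E are the fullwidth forms of "<" and ">".
const payload = "\uFF1Cscript\uFF1E";

// A naive check (hypothetical) that only looks for the ASCII "<".
const looksSafe = !payload.includes("<");
console.log(looksSafe); // true

// NFKC compatibility normalization maps fullwidth forms to ASCII.
const normalized = payload.normalize("NFKC");
console.log(normalized); // <script>
```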

Extra Resources

- http://unicode.org/cldr/utility/
- http://codepoints.net/
- http://txtn.us/
- http://www.panix.com/~eli/unicode/convert.cgi

Multiple (De|En) Codings

  • It's common to abuse multiple encodings to bypass security measures
    URL-Encoding > URL
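
A minimal sketch of double URL encoding, assuming a hypothetical filter that decodes its input only once while a later component decodes it a second time.

```javascript
const payload = "<script>";
const once = encodeURIComponent(payload); // %3Cscript%3E
const twice = encodeURIComponent(once);   // %253Cscript%253E

// Hypothetical filter: decodes once, then looks for "<".
function naiveFilterAllows(input) {
  return !decodeURIComponent(input).includes("<");
}

console.log(naiveFilterAllows(once));  // false, the payload is caught
console.log(naiveFilterAllows(twice)); // true, the payload slips through

// A second decoding step downstream restores the original payload.
const restored = decodeURIComponent(decodeURIComponent(twice));
console.log(restored); // <script>
```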
    

Filtering Basics

A common and often recommended best practice to protect web applications against malicious attacks is the use of specific input filtering and output encoding controls.

These kinds of controls may range from naive blacklists to sophisticated and highly restrictive whitelists. What about the real world? We are somewhere in the middle!

  • Controls can be implemented at different layers of a web application. They can take the form of libraries and APIs developed, in the best case, by internal specialists or external organizations, like ESAPI by OWASP.
  • https://www.owasp.org/index.php/Category:OWASP_Enterprise_Security_API
  • Security Controls are also inside most common browsers.

Generally, these solutions fall into the IDS and IPS world, but for web applications the most common choice is the Web Application Firewall (WAF).

A free and open source solution: http://www.modsecurity.org/

Regular Expressions (RE or RegEx)

  • Regular expressions are the standard way to define filter rules. Because they are extremely powerful, mastering them is fundamental to understanding how to bypass filters.
  • A regular expression is a special sequence of characters used for describing a search pattern.

→ regular expression = regex

→ pattern matched = match

Two main types

  • DFA = http://en.wikipedia.org/wiki/Deterministic_finite_automaton
  • NFA = http://en.wikipedia.org/wiki/Nondeterministic_finite_automaton

Engine   Program
DFA      awk, egrep, MySQL, Procmail
NFA      .NET languages, Java, Perl, PHP, Python, Ruby, PCRE library, vi, grep, less, more

Comparison of regular expression engines: http://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines

Regular Expression Flavor Comparison: http://www.regular-expressions.info/refflavors.html

Non-printing characters:

They are used to evade bad filters and obfuscate the payload.

Match Unicode Code Point:

Regular expression flavors that work with Unicode use specific meta-sequences to match code points.
The sequence is \ucode-point, where code-point is the hexadecimal number of the character to match. 
There are regex flavors like PCRE that do not support the former notation, 
but use an alternative sequence \x{code-point} in its place.

example:

\u2603   = the snowman character in .NET, Java, Javascript and Python
\x{2603} = the snowman character in Apache and PHP (PCRE library)
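
A short sketch of code point matching in JavaScript. JavaScript does not accept the \x{code-point} form; for code points written with braces it uses \u{code-point} together with the u flag. The variable names are illustrative.

```javascript
const snowman = "\u2603";

// \uXXXX form: four hexadecimal digits.
const fourDigits = /\u2603/;
console.log(fourDigits.test(snowman)); // true

// \u{...} form: needs the u flag, and also reaches beyond U+FFFF.
const braces = /\u{2603}/u;
console.log(braces.test(snowman)); // true

const astral = /\u{1F600}/u;
console.log(astral.test("\u{1F600}")); // true
```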

Meta-sequence Quality:

\p{quality-id} = matches code points that have the specified quality
\P{quality-id} = matches code points that do not have the specified quality

Match Unicode Category:

# To match the string with all the case variations (lower, upper and title), this regex does the job:
[\p{Ll}\p{Lu}\p{Lt}]

# As a shorthand, some regex flavors implement this solution:
\p{L&}
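
In JavaScript, Unicode property escapes require the u flag, and the \p{L&} shorthand is not available there. A minimal sketch of the three-category class, with an illustrative variable name:

```javascript
// Matches a lowercase, uppercase, or titlecase letter.
const cased = /[\p{Ll}\p{Lu}\p{Lt}]/u;

console.log(cased.test("a"));      // true  (Ll)
console.log(cased.test("A"));      // true  (Lu)
console.log(cased.test("\u01C5")); // true  (Lt, a titlecase digraph)
console.log(cased.test("1"));      // false (a digit is not a letter)

// \P{...} negates the property: any character that is not a lowercase letter.
console.log(/\P{Ll}/u.test("A")); // true
```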

Web Application Firewall - WAF

ByPass WAFs

|-| = instead of using this
|→| = the best choice is

Cross-Site Scripting:

- alert('xss') 
- alert(1)
→ prompt('xss') 
→ prompt(8)
→ confirm('xss')
→ confirm(8)
→ alert(/xss/.source)
→ window[/alert/.source](8)

- alert(document.cookie) 
→ with(document)alert(cookie) 
→ alert(document['cookie'])
→ alert(document[/cookie/.source])
→ alert(document[/coo/.source+/kie/.source])

- <img src=x onerror=alert(1);>
→ <svg/onload=alert(1)>
→ <video src=x onerror=alert(1);>
→ <audio src=x onerror=alert(1);>

- javascript:alert(document.cookie)
→ data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=
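
To see why the alternatives above can slip past a keyword filter, here is a minimal sketch. The blacklist regex is a hypothetical example of a naive rule, not one taken from a real WAF; the data: URI is the one listed above, decoded with the standard atob function.

```javascript
// Hypothetical naive rule: block "alert(" and the javascript: scheme.
const blacklist = /alert\s*\(|javascript:/i;

const uri = "data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=";

console.log(blacklist.test("javascript:alert(document.cookie)")); // true
console.log(blacklist.test("prompt(8)"));                         // false
console.log(blacklist.test(uri));                                 // false

// The Base64 part hides the actual markup from the rule.
const decoded = atob(uri.slice(uri.indexOf(",") + 1));
console.log(decoded); // <script>alert('XSS')</script>
```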

Blind SQL Injection

- 'or 1=1           '
- 'or 6=6           '
→ 'or 0x47=0x47     '
- or char(32)=''
→ or 6 is not null

SQL Injection

- UNION SELECT
→ UNION ALL SELECT

Directory Traversal

- /etc/passwd
→ /too/../etc/far/../passwd
→ /etc//passwd
→ /etc/ignore/../passwd
→ /etc/passwd........
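
The first three variants work because a literal string match happens before the path is normalized. A minimal sketch, assuming a hypothetical blacklist check and a simplified normalizer for absolute POSIX-style paths (the trailing-dots variant is not covered here):

```javascript
// Hypothetical filter: blocks only the literal string.
const blacklistHit = (p) => p.includes("/etc/passwd");

// Simplified normalizer: drops empty and "." segments, resolves "..".
function normalizePath(p) {
  const out = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

const variants = [
  "/too/../etc/far/../passwd",
  "/etc//passwd",
  "/etc/ignore/../passwd",
];

for (const v of variants) {
  // Each variant evades the literal match but resolves to the same file.
  console.log(v, blacklistHit(v), normalizePath(v));
}
```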

Web Shell

- c99.php
- r57.php
- shell.aspx
- cmd.jsp
- CmdAsp.asp
→ augh.php

Detection and Fingerprinting

  • Citrix Netscaler uses some different cookies in the HTTP responses like ns_af or citrix_ns_id or NSC_
  • F5 BIG-IP ASM (Application Security Manager) uses cookies starting with TS, followed by a string that matches the following regex:
^TS[a-zA-Z0-9]{3,6}
  • Barracuda uses two cookies: barra_counter_session and BNI__BARRACUDA_LB_COOKIE
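
The cookie names above can be turned into a rough fingerprinting check. This is a sketch under assumptions: the guessWaf function and the sample cookie names are hypothetical, the F5 character class is assumed to be [a-zA-Z0-9], and a name match is only a hint that can produce false positives.

```javascript
// Cookie-name patterns taken from the list above.
const signatures = [
  { waf: "Citrix Netscaler", pattern: /^(ns_af|citrix_ns_id|NSC_)/ },
  { waf: "F5 BIG-IP ASM", pattern: /^TS[a-zA-Z0-9]{3,6}/ },
  {
    waf: "Barracuda",
    pattern: /^(barra_counter_session|BNI__BARRACUDA_LB_COOKIE)/,
  },
];

// Return the products whose pattern matches at least one cookie name.
function guessWaf(cookieNames) {
  return signatures
    .filter((sig) => cookieNames.some((name) => sig.pattern.test(name)))
    .map((sig) => sig.waf);
}

console.log(guessWaf(["TSab12cd", "PHPSESSID"])); // [ 'F5 BIG-IP ASM' ]
console.log(guessWaf(["NSC_example"]));           // [ 'Citrix Netscaler' ]
console.log(guessWaf(["PHPSESSID"]));             // []
```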

Header Rewrite

Some WAFs rewrite the HTTP headers. Usually these modify the Server Header to deceive the attackers.

HTTP Response Code

Some WAFs modify the HTTP response codes if the request is hostile; for example:

- mod_security       > 406 Not Acceptable
- AQTRONIX WebKnight > 999 No Hacking

HTTP Response Body

It's also possible to detect a WAF from the response body.

Example:

# mod_security
<body>
…Mod_Security…
</body>

# AQTRONIX WebKnight
<body>
AQTRONIX WebKnight
</body>

# dotDefender
dotDefender Blocked your Request

Close Connection

Some WAFs drop the connection when they detect a malicious request:

  • mod_security

Detect WAF

wafw00f is a tool written in Python that can detect up to 20 different WAF products.

  • wafw00f - https://code.google.com/p/waffit/

The techniques used to detect a WAF are similar to those we have seen previously:

  1. Cookies
  2. Server Cloaking
  3. Response Codes
  4. Drop Action
  5. Pre-Built-in Rules
  • Nmap contains a script that tries to detect the presence of a web application firewall, its type and version. http://nmap.org/nsedoc/scripts/http-waf-fingerprint.html

→ nmap --script=http-waf-fingerprint -p 80 <target>

  • imperva-detect = https://code.google.com/p/imperva-detect/

Client-Side Filters

Browsers are the primary means used to address client-side attacks.

Browser Add-ons

NoScript Security Suite is a whitelist-based security tool that basically disables all the executable web content (Javascript, Java, Flash, Silverlight, …) and lets the user choose which sites are trusted, thus allowing the use of these technologies.

→ https://addons.mozilla.org/en-US/firefox/addon/noscript/

  • NoScript's anti-XSS features (http://noscript.net/features#xss) are an effective browser-based solution to prevent targeted malicious web attacks.

XSS Filter (IE)

The filter rules can be found in the c:\windows\system32\mshtml.dll library. Ways to inspect them:

# Hex editors like WinHex, or Notepad++ with the TextFX plugin
# IDAPro
# MS-DOS commands

findstr /C:"sc{r}" \WINDOWS\SYSTEM32\mshtml.dll | find "{"
> savepath // you can save the output to a file for more readable results

Neutering in Action

Basically, once a malicious injection is detected, the XSS Filter modifies the evil part of the payload by putting the '#' character in place of the neuter character defined in the rules.

evil > ev{i}l > ev#l
<svg/onload=alert(1)> = <svg/#nload=alert(1)>
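
The neutering step can be simulated with a few lines of code. This is a simplified sketch, not the filter's actual implementation: real rules are regular expressions, whereas the hypothetical neuter function below treats a rule as a literal string with the neuter character marked by braces. The rule {o}nload is an assumption chosen to reproduce the example above.

```javascript
// rule: literal text with the neuter character in braces, e.g. "ev{i}l".
function neuter(input, rule) {
  const idx = rule.indexOf("{");              // position of the neuter character
  const literal = rule.replace(/[{}]/g, "");  // "ev{i}l" -> "evil"
  const replacement =
    literal.slice(0, idx) + "#" + literal.slice(idx + 1);
  // Replace every occurrence of the matched text with its neutered form.
  return input.split(literal).join(replacement);
}

console.log(neuter("evil", "ev{i}l"));                      // ev#l
console.log(neuter("<svg/onload=alert(1)>", "{o}nload"));   // <svg/#nload=alert(1)>
```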

Web sites that choose to opt out of this protection, or to change its behavior, can use the HTTP response header:

X-XSS-Protection: 0
X-XSS-Protection: 1; mode=block //instead of sanitizing the page, the browser renders a simple #

Other browsers, like Safari, use the same scheme:
  • http://www.adambarth.com/papers/2010/bates-barth-jackson.pdf
  • enabled by default in browsers such as Chrome, Opera and Safari

The filter analyzes both inbound requests and outbound responses. If it finds executable code in the parsed HTML data of the response, it stops the script and generates a console alert similar to the following: The XSS Auditor refused to execute a script in …

However, there are many bypasses as well.