Thursday, October 20, 2016

Effective Code Review

The following checklist is meant to be a general purpose guideline for code review. Code review is an opportunity to learn and teach others about best coding practices and patterns, besides of course, hunting for bugs. Take this opportunity to teach new members about coding conventions, design principles and corner cases in your code base.


Overview

  • Use a checklist
  • The code should fit in the overall architecture of the code base (a design discussion should precede the implementation, so this is a sanity check)
  • SOLID principles of software engineering are respected
  • Design paradigms followed by the team are respected (object-oriented, functional, ECS, data-oriented, etc)
  • Design patterns used are appropriate for the use case
  • Promote code reuse; DRY principle
  • Function arguments are tested for invalid input in range and type
  • Unit tests have been written (or test cases have been outlined) for passing, failing and exceptional test cases
  • If there is an older pattern and a newer pattern of coding, make sure the newer pattern is followed
  • Consider Big(O) efficiency of algorithms (avoid micro-optimization). Performance will vary among platforms (desktop, mobile, VR), the algorithm runtime should take that into account.
  • Naming conventions and formatting are respected
  • Anything that can be easily modified later or by a tool should be given the least priority during review
  • Familiarize yourself with the OWASP Top Ten security vulnerabilities

Detailed Checks

[ARRAY]
  • Null or empty check
  • Minimum length of input array is checked
  • Integer overflow for array elements (int.Min, int.Max, etc); eg, is there an operation (add/multiply) inside the function that can cause int overflow. (2 * 1.5GB = 2 * 1,610,612,736 bytes, will return a negative number in C#/Java if Int32 is used to store the bytes result)
  • Logical condition is satisfied at both ends of an array (depending on how some vars are initialized to a[0] or how a loop is exited, a logic may not get executed for endpoint elements of the array)
  • Buffer overflow, index out of bounds
"String"
  • Null, empty or whitespace
  • Single char string
  • Unicode chars (surrogate pairs)
  • Single and double quotes inside string
  • Escape chars "\n", "\t" inside string
  • If method contains Regex expression then check for Regex DoS vulnerabilities (eg: ^(\d+)+$); add a timeout in Regex constructor as a safety measure. aka ReDoS attacks
  • If using an XML parser (reachable from a service API) then make sure XML DoS vulnerabilities are taken into account (aka XML Bomb, Billion Laughs), xml entity expansion causing exponential memory usage.
(Number, DateTime)
  • Make sure culture format is taken into account for numbers and datetimes
  • DateTime should normally be UTC
  • Some time periods are invalid based on the date, even if the hr:mm:ss looks good. For eg: "April 2, 2016, 2:30 A.M." is an invalid time if daylight saving time is on for certain TimeZones, even if it 'looks' good.
  • DateTime.Now values should not be used to calculate time difference for anything. They are prone to errors (DST for eg). Always use either UtcNow or a StopWatch.
  • Always check if any calculation may cause an int to overflow the min/max values of the Int32 or Int64 range. (Lua numbers are double by default)
{Class}
  • Make sure every class overrides GetHashCode and Equals methods. (The default implementations for Hash functions may be vulnerable to DoS attacks on server-side code.
    Default GetHashCode has other issues for both classes and structs, so it's better to avoid the default implementation altogether.)
  • Classes and functions should try to live up to the SOLID principles of OOD.
    At least live up to the Single Responsibility Principle, as in, a class should have a single responsibility, and a function should have a single responsibility. This improves modularity and testability.
  • Function names should tell us exactly what it does. A function should not do more work than its name advertises. We should not have to read the code to understand what the function does.
  • Methods in a class must be cohesive.
  • Program to interfaces rather than implementation (abstract classes in C++, interfaces in C#/Java). It helps in keeping classes decoupled and flexible.
  • Prefer composition over inheritance. We use the Entity-Component-System design pattern in our Lua code-base. Familiarize yourself with it and help others learn about it.
Function()
  • Add comments with the correct syntax for auto-generation of documentation from comments. At least public functions should be commented. Eg: ADoc for Lua can generate docs from comments.
  • Watch out for type safety in dynamic languages (Lua, JavaScript). Function arguments may need to be validated for the expected type.
  • Avoid coupling functions and classes with global mutable types. In other words, minimize usage of global lists, variables and objects.
  • Prefer not returning null from a function
  • Lua supports tail call optimization for recursive function calls, take advantage of that whenever you can. It will prevent stack overflow. Neither Java nor C# has tail call optimization support.
  • Make sure you understand what #table means in Lua. It returns the length of the shortest array part of the table (if there are gaps in the table). Eg: print(#{1, 2, nil, 3, nil}) prints 2.
#Framework/Library
  • Make sure frameworks and libraries you are using don't have any known vulnerabilities.
  • Watch out for framework or library related gotchas: Java's SimpleDateFormat.parse method is not thread-safe, which is unexpected. Java's Random method is not cryptographically secure (use SecureRandom instead).
  • Turn FxCop on with custom rules for .Net static check. Note: FxCop checks the compiled code, not the source code.

~ Multi-Threading ~

  • Prefer Isolation first, Immutability second, and Synchronization last for thread safety
  • Prefer newer concurrency models such as Actor model (message passing, Java Akka, Akka.Net, TPL DataFlow) and Software Transactional Memory (JVSTM, Akka STM, STMNET, Clojure)
  • .Net Concurrent collections are only thread-safe per method call (of the collection), they don't make YOUR code thread-safe for multiple method calls in a collection.
  • Avoid locking on lock(this), lock(typeof(myClass)), lock(valueType), lock(public object), lock("string literal")
  • If you are desperate to test concurrency related code in .Net, check out Microsoft CHESS, a tool for finding and deterministically reproducing concurrency bugs (eg deadlocks). For Java use IBM's ConTest.

Security
  • Understand the flow of data in your application. Check with your team's security advocate to see if there's a Threat Model or Data Flow Diagram for the portion of code you are reviewing.
  • Data that cross privilege boundary should be checked for authorization and possibly encrypted
  • Familiarize yourself with the OWASP Top 10 vulnerabilities (such as XSS, CSRF, SQL injections, etc)
  • Install OWASP SafeNuGet and Dependency Check into your .Net and Java projects to automatically check for 3rd party package vulnerabilities
  • Any kind of user input must be sanitized to prevent injection attacks
  • Buffer overflow, index bounds, integer overflow, stack overflow Improve code readability and testability. This will reduce defects in code which are the major source of security vulnerabilities.
  • Make sure passwords, encryption keys, tokens, etc don't get checked into VCS (git). 
  • Use OWASP Regex Parser to check for Regex vulnerabilities in your code
  • Don't use String to store passwords or CC numbers. C# has SecureString, in Java use a char[].
  • Pay attention to deserialization of untrusted data, using Json.Net for eg. (possibility of arbitrary code execution)
  • Do not log session tickets, access tokens, usernames, PII, connection strings, keys, etc.



References:
Proven Practices for Peer Review - IBM
OWASP Code Review Guide
SafeNuGet - OWASP
The Unfortunate Reality of Insecure Libraries
Code Complete - Steve McConnell
Refactoring - Martin Fowler
Lua Sandboxing
Regex Vulnerabilities Cheatsheet
Black Hat - JSON deserialization attacks
Json.NET TypeNameHandling Vulnerability

Sunday, October 2, 2016

Semantic Knowledge Base with Natural Language Chatbot

The following video is an internal Tech Talk I gave based on a hackathon project I did at Autodesk in September 2016. The talk describes the project and its long-term goal.