C++ style guide
Background
C++ is the main development language used by KlayGE. As every C++ programmer knows, the language has many powerful features, but this power brings with it complexity, which in turn can make code more bug-prone and harder to read and maintain.
The goal of this guide is to manage this complexity by describing in detail the dos and don'ts of writing C++ code. These rules exist to keep the code base manageable while still allowing coders to use C++ language features productively.
Style, also known as readability, is what we call the conventions that govern our C++ code. The term Style is a bit of a misnomer, since these conventions cover far more than just source file formatting.
One way in which we keep the code base manageable is by enforcing consistency. It is very important that any programmer be able to look at another's code and quickly understand it. Maintaining a uniform style and following conventions means that we can more easily use "pattern-matching" to infer what various symbols are and what invariants are true about them. Creating common, required idioms and patterns makes code much easier to understand. In some cases there might be good arguments for changing certain style rules, but we nonetheless keep things as they are in order to preserve consistency.
Another issue this guide addresses is that of C++ feature bloat. C++ is a huge language with many advanced features. In some cases we constrain, or even ban, use of certain features. We do this to keep code simple and to avoid the various common errors and problems that these features can cause. This guide lists these features and explains why their use is restricted.
Note that this guide is not a C++ tutorial: we assume that the reader is familiar with the language.
Header Files
In general, every .cpp file should have an associated .hpp file. There are some common exceptions, such as unit tests and small .cpp files containing just a main() function.
Correct use of header files can make a huge difference to the readability, size and performance of your code.
The following rules will guide you through the various pitfalls of using header files.
The #define Guard
All header files should have both "#define guards" and "#pragma once" to prevent multiple inclusion and speed up compiling. The format of the symbol name should be _<FILE>_HPP. For example, the file foo.hpp should have the following guard:
#ifndef _FOO_HPP
#define _FOO_HPP
#pragma once
...
#endif // _FOO_HPP
Header File Dependencies
Don't use an #include when a forward declaration would suffice.
When you include a header file you introduce a dependency that will cause your code to be recompiled whenever the header file changes. If your header file includes other header files, any change to those files will cause any code that includes your header to be recompiled. Therefore, we prefer to minimize includes, particularly includes of header files in other header files.
You can significantly reduce the number of header files you need to include in your own header files by using forward declarations. For example, if your header file uses the File class in ways that do not require access to the definition of the File class, your header file can just forward declare class File; instead of having to #include "base/file.hpp". In KlayGE, all classes and structs have forward declarations in "KlayGE/PreDeclare.hpp". You can #include it at the top of your .hpp.
How can we use a class Foo in a header file without access to its definition?
- We can declare data members of type Foo* or Foo&.
- We can declare (but not define) functions with arguments, and/or return values, of type Foo. (One exception is if an argument Foo or const Foo& has a non-explicit, one-argument constructor, in which case we need the full definition to support automatic type conversion.)
- We can declare static data members of type Foo. This is because static data members are defined outside the class definition.
On the other hand, you must include the header file for Foo if your class subclasses Foo or has a data member of type Foo.
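As a minimal sketch of these rules (Widget and Foo are hypothetical classes, and the three files are collapsed into one listing for brevity), a header can hold a Foo* member and declare functions that take or return Foo with only a forward declaration. The full definition is needed only where Foo's members are actually used:

```cpp
// --- widget.hpp (sketch): a forward declaration of Foo is enough here. ---
class Foo;

class Widget {
public:
  explicit Widget(Foo* foo);
  int Value() const;
  Foo Clone() const;  // Declared, not defined: an incomplete return type is OK.

private:
  Foo* foo_;  // Pointer member: no definition of Foo required.
};

// --- foo.hpp (sketch): the full definition of Foo. ---
class Foo {
public:
  explicit Foo(int value) : value_(value) {}
  int Get() const { return value_; }

private:
  int value_;
};

// --- widget.cpp (sketch): includes foo.hpp because it uses Foo's members. ---
Widget::Widget(Foo* foo) : foo_(foo) {}
int Widget::Value() const { return foo_->Get(); }
Foo Widget::Clone() const { return *foo_; }
```

With this layout, a change to foo.hpp forces widget.cpp to recompile, but not the files that only include widget.hpp.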
Sometimes it makes sense to have pointer (or better, scoped_ptr) members instead of object members. However, this complicates code readability and imposes a performance penalty, so avoid doing this transformation if the only purpose is to minimize includes in header files.
Of course, .cpp files typically do require the definitions of the classes they use, and usually have to include several header files.
Note: If you use a symbol Foo in your source file, you should bring in a definition for Foo yourself, either via an #include or via a forward declaration. Do not depend on the symbol being brought in transitively via headers not directly included. One exception is if Foo is used in myfile.cpp, it's ok to #include (or forward-declare) Foo in myfile.hpp, instead of myfile.cpp.
Inline Functions
Define functions inline only when they are small, say, 10 lines or less.
Definition:
You can declare functions in a way that allows the compiler to expand them inline rather than calling them through the usual function call mechanism.
Pros:
Inlining a function can generate more efficient object code, as long as the inlined function is small. Feel free to inline accessors and mutators, and other short, performance-critical functions.
Cons:
Overuse of inlining can actually make programs slower. Depending on a function's size, inlining it can cause the code size to increase or decrease. Inlining a very small accessor function will usually decrease code size while inlining a very large function can dramatically increase code size. On modern processors smaller code usually runs faster due to better use of the instruction cache.
Decision:
A decent rule of thumb is to not inline a function if it is more than 10 lines long. Beware of destructors, which are often longer than they appear because of implicit member- and base-destructor calls!
Another useful rule of thumb: it's typically not cost effective to inline functions with loops or switch statements (unless, in the common case, the loop or switch statement is never executed).
It is important to know that functions are not always inlined even if they are declared as such; for example, virtual and recursive functions are not normally inlined. Usually recursive functions should not be inline. The main reason for making a virtual function inline is to place its definition in the class, either for convenience or to document its behavior, e.g., for accessors and mutators.
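A short sketch of this rule of thumb, using a hypothetical Counter class: the one-line accessor and mutator are defined in the class (and are therefore implicitly inline), while the function containing a loop is only declared in the class and defined out of line, as it would be in the .cpp file.

```cpp
class Counter {
public:
  Counter() : count_(0) {}

  // Short accessor and mutator: good candidates for inlining.
  int count() const { return count_; }
  void set_count(int count) { count_ = count; }

  // Contains a loop, so it is not defined inline.
  int SumTo(int n) const;

private:
  int count_;
};

// Would normally live in the .cpp file.
int Counter::SumTo(int n) const {
  int sum = count_;
  for (int i = 1; i <= n; ++ i) {
    sum += i;
  }
  return sum;
}
```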
Function Parameter Ordering
When defining a function, parameter order is: inputs, then outputs.
Parameters to C/C++ functions are either input to the function, output from the function, or both. Input parameters are usually values or const references, while output and input/output parameters will be non-const pointers. When ordering function parameters, put all input-only parameters before any output parameters. In particular, do not add new parameters to the end of the function just because they are new; place new input-only parameters before the output parameters.
This is not a hard-and-fast rule. Parameters that are both input and output (often classes/structs) muddy the waters, and, as always, consistency with related functions may require you to bend the rule.
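For illustration, a hypothetical helper (FindMax is not part of KlayGE) that follows this ordering: the inputs come first, and the single output is a non-const pointer at the end.

```cpp
#include <cstddef>

// Inputs: values, count. Output: max_out.
// Returns false and leaves *max_out untouched when there is no input.
bool FindMax(const int* values, std::size_t count, int* max_out) {
  if (0 == count) {
    return false;
  }
  int best = values[0];
  for (std::size_t i = 1; i < count; ++ i) {
    if (values[i] > best) {
      best = values[i];
    }
  }
  *max_out = best;
  return true;
}
```

If a new input were needed later, for example a starting index, it would go before max_out rather than at the end of the parameter list.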
Names and Order of Includes
Use standard order for readability and to avoid hidden dependencies: KlayGE's .hpp, C library, C++ library, other libraries' .hpp, your project's .hpp.
All of KlayGE's public header files are in the "Include/KlayGE" directory, which is in the include path. Include them without the UNIX directory shortcuts . (the current directory) or .. (the parent directory). For example, KlayGE/foo.hpp should be included as
#include <KlayGE/foo.hpp>
In dir/foo.cpp or dir/foo_test.cpp, whose main purpose is to implement or test the stuff in dir2/foo2.hpp, order your includes as follows:
- KlayGE/KlayGE.hpp
- dir2/foo2.hpp (preferred location; see details below).
- C system files.
- C++ system files.
- Other libraries' .hpp files.
- Your project's .hpp files.
With the preferred ordering, if dir2/foo2.hpp omits any necessary includes, the build of dir/foo.cpp or dir/foo_test.cpp will break. Thus, this rule ensures that build breaks show up first for the people working on these files, not for innocent people in other packages.
dir/foo.cpp and dir2/foo2.hpp are often in the same directory (e.g. base/basictypes_test.cpp and base/basictypes.hpp), but can be in different directories too.
For example, the includes in KlayGE/src/foo.cpp might look like this:
#include <KlayGE/KlayGE.hpp>
#include <KlayGE/Math.hpp>
#include <KlayGE/ResLoader.hpp>
#include <vector>
#include <boost/shared_ptr.hpp>
#include <KlayGE/foo.hpp>
Scoping
Namespaces
Unnamed namespaces in .cpp files are encouraged. With named namespaces, choose the name based on the project, and possibly its path. Do not use a using-directive.
Definition:
Namespaces subdivide the global scope into distinct, named scopes, and so are useful for preventing name collisions in the global scope.
Pros:
Namespaces provide a (hierarchical) axis of naming, in addition to the (also hierarchical) name axis provided by classes.
For example, if two different projects have a class Foo in the global scope, these symbols may collide at compile time or at runtime. If each project places their code in a namespace, project1::Foo and project2::Foo are now distinct symbols that do not collide.
Cons:
Namespaces can be confusing, because they provide an additional (hierarchical) axis of naming, in addition to the (also hierarchical) name axis provided by classes.
Use of unnamed namespaces in header files can easily cause violations of the C++ One Definition Rule (ODR).
Decision:
Use namespaces according to the policy described below. Terminate namespaces with comments as shown in the given examples.
Unnamed Namespaces
- Unnamed namespaces are allowed and even encouraged in .cpp files, to avoid runtime naming conflicts:
namespace { // This is in a .cpp file.
// The content of a namespace is not indented
enum { Unused, EOF, Error }; // Commonly used tokens.
bool AtEof() { return EOF == pos_; } // Uses our namespace's EOF.
} // namespace
However, file-scope declarations that are associated with a particular class may be declared in that class as types, static data members or static member functions rather than as members of an unnamed namespace.
- Do not use unnamed namespaces in .hpp files.
Named Namespaces
Named namespaces should be used as follows:
- Namespaces wrap the entire source file after includes, definitions/declarations, and forward declarations of classes from other namespaces:
// In the .hpp file
namespace mynamespace {
// All declarations are within the namespace scope.
// Notice the lack of indentation.
class MyClass {
public:
  ...
  void Foo();
};
} // namespace mynamespace
// In the .cpp file
namespace mynamespace {
// Definition of functions is within scope of the namespace.
void MyClass::Foo() {
  ...
}
} // namespace mynamespace
The typical .cpp file might have more complex detail, including the need to reference classes in other namespaces.
#include "a.hpp"
#define someflag "dummy flag"
class C; // Forward declaration of class C in the global namespace.
namespace a { class A; } // Forward declaration of a::A.
namespace b {
...code for b... // Code goes against the left margin.
} // namespace b
- Do not declare anything in namespace std, not even forward declarations of standard library classes. Declaring entities in namespace std is undefined behavior, i.e., not portable. To declare entities from the standard library, include the appropriate header file.
- You may not use a using-directive to make all names from a namespace available.
// Forbidden -- This pollutes the namespace.
using namespace foo;
- You may use a using-declaration anywhere in a .cpp file, and in functions, methods or classes in .hpp files.
// OK in .cpp files.
// Must be in a function, method or class in .hpp files.
using ::foo::bar;
- Namespace aliases are allowed anywhere in a .cpp file, anywhere inside the named namespace that wraps an entire .hpp file, and in functions and methods.
// Shorten access to some commonly used names in .cpp files.
namespace fbz = ::foo::bar::baz;
// Shorten access to some commonly used names (in a .hpp file).
namespace librarian {
// The following alias is available to all files including
// this header (in namespace librarian):
// alias names should therefore be chosen consistently
// within a project.
namespace pd_s = ::pipeline_diagnostics::sidetable;
inline void my_inline_function() {
  // namespace alias local to a function (or method).
  namespace fbz = ::foo::bar::baz;
  ...
}
} // namespace librarian
Note that an alias in a .hpp file is visible to everyone #including that file, so public headers (those available outside a project) and headers transitively #included by them, should avoid defining aliases, as part of the general goal of keeping public APIs as small as possible.
Nested Classes
Although you may use public nested classes when they are part of an interface, consider a namespace to keep declarations out of the global scope.
Definition:
A class can define another class within it; this is also called a member class.
class Foo {
private:
  // Bar is a member class, nested within Foo.
  class Bar {
    ...
  };
};
Pros:
This is useful when the nested (or member) class is only used by the enclosing class; making it a member puts it in the enclosing class scope rather than polluting the outer scope with the class name. Nested classes can be forward declared within the enclosing class and then defined in the .cpp file to avoid including the nested class definition in the enclosing class declaration, since the nested class definition is usually only relevant to the implementation.
Cons:
Nested classes can be forward-declared only within the definition of the enclosing class. Thus, any header file manipulating a Foo::Bar* pointer will have to include the full class declaration for Foo.
Decision:
Do not make nested classes public unless they are actually part of the interface, e.g., a class that holds a set of options for some method.
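A brief sketch of this decision, with hypothetical names throughout (Compressor, Options and State are illustrative only): Options is public because callers must construct one to use the class, while State is an implementation detail and stays private.

```cpp
class Compressor {
public:
  // Part of the interface: callers fill this in and pass it to the constructor.
  struct Options {
    Options() : level(6), checksum(true) {}
    int level;
    bool checksum;
  };

  explicit Compressor(const Options& options) : options_(options) {}

  int level() const { return options_.level; }
  int bytes_written() const { return state_.bytes_written; }

private:
  // Not part of the interface, so it is not exposed.
  struct State {
    State() : bytes_written(0) {}
    int bytes_written;
  };

  Options options_;
  State state_;
};
```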
Nonmember, Static Member, and Global Functions
Prefer nonmember functions within a namespace or static member functions to global functions; use completely global functions rarely.
Pros:
Nonmember and static member functions can be useful in some situations. Putting nonmember functions in a namespace avoids polluting the global namespace.
Cons:
Nonmember and static member functions may make more sense as members of a new class, especially if they access external resources or have significant dependencies.
Decision:
Sometimes it is useful, or even necessary, to define a function not bound to a class instance. Such a function can be either a static member or a nonmember function. Nonmember functions should not depend on external variables, and should nearly always exist in a namespace. Rather than creating classes only to group static member functions which do not share static data, use namespaces instead.
Functions defined in the same compilation unit as production classes may introduce unnecessary coupling and link-time dependencies when directly called from other compilation units; static member functions are particularly susceptible to this. Consider extracting a new class, or placing the functions in a namespace possibly in a separate library.
If you must define a nonmember function and it is only needed in its .cpp file, use an unnamed namespace or static linkage (e.g., static int Foo() {...}) to limit its scope.
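The following sketch shows both recommendations in one .cpp file. The namespace mathutil and all function names are hypothetical. The helpers used only in this file sit in an unnamed namespace, and the function meant for other files is a nonmember in a named namespace rather than a static member of a class that exists only for grouping.

```cpp
namespace {
// File-local helpers: not visible to other compilation units.
int SmallerOf(int a, int b) { return a < b ? a : b; }
int LargerOf(int a, int b) { return a > b ? a : b; }
} // namespace

namespace mathutil {
// Depends only on its arguments, not on any external variables.
int Clamp(int value, int low, int high) {
  return LargerOf(low, SmallerOf(value, high));
}
} // namespace mathutil
```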
Local Variables
Place a function's variables in the narrowest scope possible, and initialize variables in the declaration.
C++ allows you to declare variables anywhere in a function. We encourage you to declare them in as local a scope as possible, and as close to the first use as possible. This makes it easier for the reader to find the declaration and see what type the variable is and what it was initialized to. In particular, initialization should be used instead of declaration and assignment, e.g.
int i;
i = f(); // Bad -- initialization separate from declaration.
int j = g(); // Good -- declaration has initialization.
Note that VC and GCC implement for (int i = 0; i < 10; ++ i) correctly (the scope of i is limited to the for loop), so you can reuse i in another for loop in the same scope. They also correctly scope declarations in if and while statements, e.g.
while (const char* p = strchr(str, '/')) {
  str = p + 1;
}
There is one caveat: if the variable is an object, its constructor is invoked every time it enters scope and is created, and its destructor is invoked every time it goes out of scope.
// Inefficient implementation:
for (int i = 0; i < 1000000; ++ i) {
  Foo f; // My ctor and dtor get called 1000000 times each.
  f.DoSomething(i);
}
It may be more efficient to declare such a variable used in a loop outside that loop:
Foo f; // My ctor and dtor get called once each.
for (int i = 0; i < 1000000; ++ i) {
  f.DoSomething(i);
}
Static and Global Variables
Static or global variables of class type are forbidden: they cause hard-to-find bugs due to indeterminate order of construction and destruction. On some platforms (e.g., Android), a global variable with a constructor frequently causes crashes.
Objects with static storage duration, including global variables, static variables, static class member variables, and function static variables, must be Plain Old Data (POD): only ints, chars, floats, or pointers, or arrays/structs of POD.
The order in which class constructors and initializers for static variables are called is only partially specified in C++ and can even change from build to build, which can cause bugs that are difficult to find. Therefore in addition to banning globals of class type, we do not allow static POD variables to be initialized with the result of a function, unless that function (such as getenv(), or getpid()) does not itself depend on any other globals.
Likewise, the order in which destructors are called is defined to be the reverse of the order in which the constructors were called. Since constructor order is indeterminate, so is destructor order. For example, at program-end time a static variable might have been destroyed, but code still running -- perhaps in another thread -- tries to access it and fails. Or the destructor for a static 'string' variable might be run prior to the destructor for another variable that contains a reference to that string.
As a result we only allow static variables to contain POD data. This rule completely disallows vector (use C arrays instead), or string (use const char []).
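As a small sketch of what this rule permits (all names here are hypothetical), a static string becomes a const char array and a static vector becomes a C array whose length is computed at compile time:

```cpp
#include <cstddef>

// POD replacements for a static std::string and a static std::vector<int>.
static const char kDefaultName[] = "klayge";
static const int kPrimes[] = {2, 3, 5, 7};
static const std::size_t kNumPrimes = sizeof(kPrimes) / sizeof(kPrimes[0]);

const char* DefaultName() { return kDefaultName; }

int SumPrimes() {
  int sum = 0;
  for (std::size_t i = 0; i < kNumPrimes; ++ i) {
    sum += kPrimes[i];
  }
  return sum;
}
```

None of these objects has a constructor or destructor, so the order in which statics are set up or torn down cannot affect them.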
If you need a static or global variable of a class type, consider initializing a pointer (which will never be freed), from either your main() function or from pthread_once(). Note that this must be a raw pointer, not a "smart" pointer, since the smart pointer's destructor will have the order-of-destructor issue that we are trying to avoid.
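One possible shape for this pattern, assuming initialization happens from main() before any other thread starts (g_app_name, InitGlobals and AppName are hypothetical names):

```cpp
#include <string>

namespace {
// A raw pointer is POD: no constructor runs before main() and no
// destructor runs at exit.
std::string* g_app_name = nullptr;
} // namespace

// Call from main() before other threads can read the value.
// Calling it again is harmless.
void InitGlobals() {
  if (nullptr == g_app_name) {
    g_app_name = new std::string("demo"); // Intentionally never deleted.
  }
}

// Only valid after InitGlobals() has run.
const std::string& AppName() { return *g_app_name; }
```

The string is deliberately leaked, so code that is still running during program shutdown can keep reading it safely.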