Inexpensive protection of your source code
Snob
Simple Name Obfuscator
Tutorial
Version 1.0

MacroExpressions
http://www.macroexpressions.com
1. Installing and uninstalling Snob
4. Configuring Snob for the Bark project
4.1 Protecting files from obfuscation. Introducing APIfiles.snob
4.2 Informing Snob of a programming language: dotext.snob
4.2.2 Reusing a configuration: include= statement.
4.2.3 Using pre-packaged configurations: use= statement.
5. Running Snob and inspecting results
5.2 Snob obfuscation map: projmap.snob
6. Syntax of a language definition file
6.1 Regular expressions in dotext.snob
6.2 Statements of language-specific configuration files
6.3 Telling Snob what to obfuscate: name= statement
6.4 Telling Snob what to remove: comment= statement
6.5 Telling Snob what not to confuse with names: keyword= statement
6.6 Telling Snob where not to look for names: ignore= statement
6.6.1 Example: Adding pragma handling to dotc.snob
6.7 Telling Snob where not to look for names in any language: reserved= statements
6.8 Introducing: string= statements
7. Adding reserved words to Snob configuration
7.1 A method of preserving third-party names automatically
7.2 Preserving literal words in all languages: configuration file reserved.snob
8.1.1 Files that make it to the target directory tree
8.1.2 Files that Snob will not attempt to obfuscate
8.1.3 Names that Snob will not attempt to obfuscate
8.1.4 Snob configuration files
This tutorial introduces the use and configuration of Snob, or Simple Name OBfuscator. Its objective is to replace meaningful names in your project with meaningless ones in an irreversible way and to remove comments. We introduce a toy project in C and demonstrate how to work with and configure Snob.
Snob by itself is independent of the programming language(s) used in your project. Instead, it relies on language-specific configuration files.
Accordingly, we will first show how to use Snob when the language configuration files are already available, at least in their most basic version. Then we'll explore the insides of the language configuration files.
Snob is a standalone executable, and a rather small one by today's standards. Just copy the Snob executable file, snob.exe, to some directory, such as C:\Snob (as we will assume from now on), and that completes the installation. If you have basic configuration files, copy them to the same directory. If you like, you may also add C:\Snob to your PATH environment variable.
To uninstall Snob, simply remove the Snob directory, C:\Snob or whatever name you gave it.
Snob, we said, obfuscates names and removes comments. Snob, we said, is a tool independent of your project's programming language(s). Therefore, to do its job, Snob must be told:
· what a name is, and
· what a comment is
Given only those definitions, Snob is ready to do its job, but you will almost certainly not like the results. The reason is that Snob will replace not only your own identifiers, but also anything that looks like a name to it. However, your project's programming language may have elements lexically indistinguishable from names (such as keywords) or other language constructs (like C pragmas) that should not be touched at all. Therefore, to do its job properly, Snob must be told:
These configuration items taken together define a programming language to Snob.
Turning attention to your project, we observe that there may be
Files that you don't want obfuscated are, for instance, your API (application programmer's interface) files that you deliver to your customer, or any accompanying application examples.
Other names not to be obfuscated include, for instance, the API names of a third-party library you are using in your project, or any language extensions provided by your compiler.
To summarize, Snob configuration for your project consists of
In this tutorial, we introduce a toy project, Bark, in the C programming language and follow the steps needed to obfuscate it using Snob.
Consider a toy project in the C language that we want to obfuscate because we deliver it in the source code format.
It contains a single function as its public interface, bark(), which takes a pointer to a character string and prints it to the standard output, prefixed with "BARK: " and followed by three exclamation points and a newline. E.g., given "Hello, world!" it would print "BARK: Hello, world!!!!\n".
We implemented this function in two .c files and two headers: one (bark.h) for the public interface and one (barkpriv.h) as the internal, private header.
Here is our implementation:
bark.c:
/* Here is an implementation of Bark */
#include <stdio.h>
#include "bark.h"
#include "barkpriv.h"
void bark(const char *str)
{
printf(BARK_PREFIX); //print prefix
bark_internal(str); //print the rest
}
barkpriv.c
/* Here is an implementation of Bark's internals */
#include <stdio.h>
#include "barkpriv.h"
unsigned int interval = 17u; //just for kicks
void bark_internal(const char *str)
{
printf("%s", str); //need "%s" in case str contains formatting
printf(BARK_SUFFIX); //print suffix
}
barkpriv.h
/* This is the private header of the terrific Bark package */
#define BARK_PREFIX "BARK: "
#define BARK_SUFFIX "!!!\n"
extern void bark_internal(const char *);
And, finally,
bark.h
/* This is a public API of the terrific Bark package */
extern void bark(const char *);
In addition, we want to provide a self-test which also serves as an example for an application note:
barktest.c
/* This is a self-test and an example of the bark code */
/* Bark will output "BARK: ", then your string
and then three exclamation points and a newline.
*/
#include <stdio.h>
#include <string.h>
#include "bark.h"
int main()
{
    char buf[200];
    printf(">"); //prompt
    while(NULL!=fgets(buf, sizeof(buf), stdin)) {
        size_t len = strlen(buf);
        if(len > 0) {
            if(buf[len-1] != '\n') {
                //We didn't get the whole string; try again
                printf(
                    " (ERROR) String too long. Try again\n\n");
            }
            else {
                buf[--len] = 0; //truncate the newline
            }
        }
        if(len == 0) {
            printf("Bye\n");
            break;
        }
        bark(buf);
        fflush(stdin); //start clean (a compiler extension; not standard C)
        printf(">"); //print prompt
    }
    return 0;
}
This is the project that is supplied in the Bark\Code directory in the distribution. You can actually build it and play with the barking output.
If you have not done so already, unzip all .snob files in the distribution to C:\Snob, where snob.exe resides as well. The .snob files are basic-level configuration files for the C language. And yes, for now we'll assume that the most basic configuration for your programming language exists.
First of all, the files bark.h and barktest.c represent our API. We don't want to obfuscate them at all. The way to inform Snob about it is to list them in your project's APIfiles.snob file. So, let's create APIfiles.snob in Bark\Code:
APIfiles.snob
bark.h
barktest.c
Any subdirectory in the project directory tree may have a file named APIfiles.snob. Its syntax is as follows: each non-empty line is a file specification of file(s) considered your API. The filespec can contain wildcards (* and ?); in this case all matching files are considered API.
Snob treats the API filespecs as follows:
· If the filespec does not contain any directory information, not even .\, it is treated as a real API file in the same directory as APIfiles.snob itself.
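To make the wildcard rule concrete, here is a minimal sketch in C of how a filespec such as *.h could be compared against a file name. The function name wildcard_match is our own invention for illustration and is not part of Snob. We assume case-sensitive matching and a filespec without a directory part; the manual remains the authority on how Snob itself matches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of Snob.
   Returns true if `name` matches `pattern`, where '*' matches any run of
   characters (including none) and '?' matches exactly one character. */
bool wildcard_match(const char *pattern, const char *name)
{
    assert(pattern != NULL && name != NULL);
    const char *star = NULL;   /* position of the last '*' seen in pattern */
    const char *retry = NULL;  /* where in name that '*' started matching */

    while (*name != '\0') {
        if (*pattern == '*') {
            star = pattern++;          /* remember the star, try matching nothing */
            retry = name;
        } else if (*pattern == '?' || *pattern == *name) {
            pattern++;                 /* one character consumed on both sides */
            name++;
        } else if (star != NULL) {
            pattern = star + 1;        /* backtrack: let the star eat one more char */
            name = ++retry;
        } else {
            return false;
        }
    }
    while (*pattern == '*') {
        pattern++;                     /* trailing stars match the empty string */
    }
    return *pattern == '\0';
}
```

With this sketch, wildcard_match("*.h", "barkpriv.h") is true while wildcard_match("*.h", "bark.c") is false.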
Now we need to produce the extension-specific configuration files.
To decide how to obfuscate a file filename.ext, Snob searches for a configuration file with a fixed name dotext.snob (so, for .cpp files Snob will search for dotcpp.snob).
Snob's rule of search is as follows: first look in the directory where the file, filename.ext, is located and then go up the directory tree all the way to the root of the project directory (such as Bark\Code above). The first configuration file found takes effect. (It may be said that subdirectories that do not have the configuration inherit it from their parent directory, and those which do have it override the inherited configuration, if any.) If none is found, Snob finally looks for it in its own directory (C:\Snob).
If the configuration is not found, the file, filename.ext, will not be processed into the target directory tree. For instance, we do not provide dotsnob.snob, so Snob configuration files, which have the extension .snob, are skipped.
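The naming convention and the upward walk can be sketched in C as follows. Both helper names (config_name_for and step_up) are hypothetical and are not taken from Snob. The sketch assumes a bare file name with an extension, backslash-separated paths, and a directory that lies at or below the project root. It leaves out the actual file-existence test and the final fallback to Snob's own directory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "dot<ext>.snob" for a bare file name.
   Returns 0 on success, -1 if there is no extension or the buffer is too small. */
int config_name_for(const char *filename, char *out, size_t out_size)
{
    assert(filename != NULL && out != NULL);
    const char *dot = strrchr(filename, '.');
    if (dot == NULL || dot[1] == '\0') {
        return -1;                       /* no extension to build a name from */
    }
    int n = snprintf(out, out_size, "dot%s.snob", dot + 1);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Hypothetical helper: moves `dir` one level up, in place, unless it
   already equals `root`. Returns 1 if it moved up, 0 if the walk is over. */
int step_up(char *dir, const char *root)
{
    assert(dir != NULL && root != NULL);
    if (strcmp(dir, root) == 0) {
        return 0;                        /* reached the project root */
    }
    char *sep = strrchr(dir, '\\');
    if (sep == NULL) {
        return 0;                        /* nothing left to strip */
    }
    *sep = '\0';                         /* drop the last path component */
    return 1;
}
```

For barkpriv.c the first helper yields dotc.snob. Starting from a made-up directory Code\sub\deep, repeated calls to step_up visit Code\sub and then Code, where the walk stops.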
First of all, since the project is in the C programming language, we need to create a configuration for it. C files have, by convention, the extension .c, so, by Snob convention, the name of the configuration file for it is dotc.snob.
C has something of an oddity in that it has header files which, being perfectly good C files, have, by convention, a different extension, .h. So, to create a configuration for C, we need a second configuration file, doth.snob.
The good news is that however we decide to configure dotc.snob, we should configure doth.snob the same way, simply because the header files have the same syntax. (Things get more complicated if we throw C++ into the language mix: .h files may have C or C++ syntax or both. This matter is discussed in the manual and we skip it in the tutorial.) Instead of copying dotc.snob to doth.snob and thus creating ourselves a maintenance headache, we create the following:
doth.snob
 Same configuration as in dotc.snob
include=dotc.snob
The first line is a comment; any line that does not start with a keyword is a comment, and it is safe to start a comment with a blank, as we just did.
The second line contains the include= keyword; it instructs Snob to read configuration from another file as if it were textually included. The filename to include is dotc.snob, the one that we are going to create next.
Snob has rules (covered in the manual) on where to search for the file to include:
· If no path information is specified, it searches first in the directory where the current file (doth.snob) itself is located and all the way up to the top level of the project directory. If still not found, the file is searched for in the directory where the Snob executable, snob.exe, is located.
· If the path part is present and resolves to a relative path, it is considered relative to the directory where the current file is located.
· If the path is not relative, Snob looks for the file exactly where specified.
· If Snob cannot find the file, it reports an error and exits.
In our case, since no path information is specified, it searches first in the directory where doth.snob itself is located and all the way up to the top level of the project directory.
Since our project directory tree is quite simple (it contains just the project root directory, Code), we put our doth.snob right there. If we had a few subdirectories, each would inherit the configuration from the parent directory, if present. However, any directory may have its own doth.snob which would override the inherited configuration.
We are done with doth.snob; let's concentrate on the .c configuration file, dotc.snob. We'll put it in the same directory, Code.
We want to make use of a basic language configuration; we start with this:
dotc.snob
use=C99base.snob
The use= statement is similar to the include= statement we've seen in doth.snob; the only difference is that Snob looks for the specified file only in its own directory (C:\Snob).
So, we included the base C configuration pre-packaged in C99base.snob. In the Snob directory, there is another similarly named file, C90base.snob, which is also a C configuration file but corresponds to the previous revision of the C standard. That revision didn't allow the //-comments we use in the Bark project, so we need the new standard. (Your compiler may be C90 and allow //-comments as a language extension. Snob knows none of this.)
The Bark project uses a few C standard library calls and macros; their names must not be obfuscated. (The same, by the way, applies to the standard typedefs, struct, union and enum tags and members.) As for their definitions, the Snob directory contains two files to choose from: Crsvnormal.snob and CParanoia.snob. The first file contains commonly used reserved words; the second one reserves anything claimed in the standard, however mildly; for instance, it reserves names beginning with an underscore (_). Usually, you'll be OK with the first file, which we now add to our dotc.snob:
use=Crsvnormal.snob
We are ready to obfuscate our Bark project.
Snob is a command-line utility that takes two arguments: your project directory and the name of the directory which will contain the obfuscated version of your project, e.g.,
snob MyProject MyObfuscatedProject
The target directory must not exist yet and its location must be writeable.
Snob will create the target directory and clone the directory tree of the project directory (with no files in the target tree yet). Then it will look at each file in the project tree, such as filename.ext, and examine its extension, .ext in this case.
The configuration files (doth.snob, dotc.snob and APIfiles.snob) are supplied in the Bark\Step1 directory. If you didn't work along, simply copy those files to Bark\Code. Now change to the Bark directory and issue the following command:
C:\Snob\snob Code Obf1
Here is the Snob output in all its glory:
Looking for configuration files under "Code"
Entering directory "Code"
Searching configuration for the extension .snob
No configuration file dotsnob.snob found
Searching configuration for the extension .c
Found configuration dotc.snob in Code
Searching configuration for the extension .h
Found configuration doth.snob in Code
Leaving directory "Code"
Done looking for configuration files
Looking for API specs under "Code"
Looking for configuration files under "Code"
Entering directory "Code"
Marking bark.h copy-only
Marking barktest.c copy-only
Leaving directory "Code"
Done looking for API specs
Processing project "Code" to "Obf1"
Entering directory "Code"
APIfiles.snob -- skipping
bark.c --> Obf1\bark.c (process)
bark.h --> Obf1\bark.h (copy)
barkpriv.c --> Obf1\barkpriv.c (process)
barkpriv.h --> Obf1\barkpriv.h (process)
barktest.c --> Obf1\barktest.c (copy)
dotc.snob -- skipping
doth.snob -- skipping
Leaving directory "Code"
Writing obfuscation map to Obf1\projmap.snob
End processing the project
Finished
We can see that Snob created the directory structure under Obf1 identical to that of Code. Let's take a look at the non-API files it created:
bark.c
#include <stdio.h>
#include "bark.h"
#include "barkpriv.h"
void bark(const char *C0000000B)
{
printf(C0000000C);
C0000000D(C0000000B);
}
barkpriv.c
#include <stdio.h>
#include "barkpriv.h"
unsigned int C0000000E = 17u;
void C0000000D(const char *C0000000B)
{
printf("%s", C0000000B);
printf(C0000000F);
}
barkpriv.h
#define C0000000C "BARK: "
#define C0000000F "!!!\n"
extern void C0000000D(const char *);
It seems quite clear that it is not for human eyes.
It may be interesting to look at the obfuscation map which Snob saves in the file projmap.snob in the target directory:
projmap.snob
#Snob Substitution Table for the project "Code"
C0000000B : str
C0000000C : BARK_PREFIX
C0000000D : bark_internal
C0000000E : interval
C0000000F : BARK_SUFFIX
The first line is the title; following it is a list of pairs: a name that Snob invented and the original name that it replaced.
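If you ever want to read the map back, for example to look up an invented name, a line of it can be split with a few lines of C. This is only a sketch: the function name parse_map_line and the 63-character limit are our own assumptions, and the line format is inferred from the single example above rather than from a specification.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Snob.
   Splits one map line of the form "<invented> : <original>".
   Returns 1 and fills both buffers for a pair line; returns 0 for the
   title line (starting with '#') or anything that is not a pair. */
int parse_map_line(const char *line, char invented[64], char original[64])
{
    assert(line != NULL && invented != NULL && original != NULL);
    if (line[0] == '#' || strchr(line, ':') == NULL) {
        return 0;                         /* title line or no separator */
    }
    /* %63s reads one whitespace-delimited word, leaving room for the NUL */
    return sscanf(line, "%63s : %63s", invented, original) == 2;
}
```

Given the line "C0000000D : bark_internal", the sketch stores C0000000D as the invented name and bark_internal as the original.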
To this point, we carefully avoided a question of the format of the files Crsvnormal.snob and C99base.snob that we used. Since those files were only good for inclusion in dotc.snob, we are actually going to talk about the syntax of dotext.snob files in general.
Let's begin with a motivational example. In C (and C++) there is a rather interesting, although rarely used, feature: the stringize operator. Consider a macro definition
#define MYSTRING(x) #x
If you then write
char *p = MYSTRING(Because I can);
the macro expands to a double-quoted string after pre-processing, but Snob has no way of knowing that. So, Snob will treat the words Because, I and can as names and replace them with invented names, which is not what we want.
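The effect of the stringize operator is easy to check with a few lines of C. In the sketch below, only the MYSTRING macro comes from the example above; the function name stringized_text is ours and serves purely as an illustration.

```c
#include <assert.h>
#include <string.h>

#define MYSTRING(x) #x

/* Returns the string literal that the preprocessor builds from the
   macro argument; the words are never seen by the compiler as names. */
const char *stringized_text(void)
{
    const char *p = MYSTRING(Because I can);
    /* After preprocessing, the line above reads: p = "Because I can"; */
    assert(strcmp(p, "Because I can") == 0);
    return p;
}
```

If the three words inside the parentheses were replaced by invented names, the program would still compile, but the text of the resulting string would silently change. That is why this span must be protected.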
Clearly, Snob must be told to treat text in parentheses preceded by the word MYSTRING as a string, so as to not look for names there.
Unfortunately, this cannot be done in a common pre-packaged language configuration file like C99base.snob. That's because the name of the macro was invented within your project, and cannot be known in advance. So, we need to do this ourselves.
To reiterate, we want to designate as a string anything that fits the following definition: text in parentheses preceded by the word MYSTRING. This is a rather complex idea, and Snob needs a rather expressive means to describe such ideas.
In language-specific configuration files, Snob uses regular expressions to express complex configuration rules.
Regular expressions are, in a way, text search patterns. By telling Snob that a string is such and such regular expression, we mean (and Snob understands) that a segment of text matching a search by the regular expression is considered a string.
(There are several flavors of regular expressions; Snob uses PCRE, Perl-compatible regular expressions. The PCRE library, which is open source software, is written by Philip Hazel, and copyright the University of Cambridge, England. See
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/
A reference of Snob regular expressions, which is an adaptation of PCRE reference, is provided in the distribution and also online on the Snob pages of MacroExpressions website.)
A suitable regular expression for our verbal definition of a string is
\bMYSTRING\(.*?\)
(In regular expressions, parentheses are used for grouping, just like in math expressions. To give them their literal meaning, they must be escaped with a backslash, as we have in the expression above. The dot means "match any character" and the *? means "repeated any number of times, ungreedy", so that the first, rather than the very last, right parenthesis will end the search. The starting \b requires that the match begin on a word boundary.)
It's worth noting that the word MYSTRING itself will not be obfuscated because it is a part of the search pattern. If we want to obfuscate MYSTRING, we need to exclude it from the search pattern. To do so, we can say that a string is any parenthesized text if it follows the word MYSTRING. The following regular expression corresponds to this improved definition:
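To see which span of text the pattern \bMYSTRING\(.*?\) selects, here is a hand-written scan in C that mimics it. This is not how Snob works internally (Snob uses the PCRE library). It is only a sketch under our own assumptions: the function name find_mystring_span is hypothetical, and "any character" is taken to exclude the newline, which is the usual PCRE default.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Word characters in the regular-expression sense: letters, digits, '_'. */
static int is_word_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical helper: finds the first span that \bMYSTRING\(.*?\) would
   match. Returns a pointer to its start and stores its length in *len,
   or returns NULL if there is no match. */
const char *find_mystring_span(const char *text, size_t *len)
{
    static const char word[] = "MYSTRING(";
    const size_t word_len = sizeof word - 1;
    assert(text != NULL && len != NULL);

    for (const char *p = strstr(text, word); p != NULL; p = strstr(p + 1, word)) {
        /* \b : the character before 'M' must not be a word character */
        if (p != text && is_word_char(p[-1])) {
            continue;
        }
        /* .*? followed by \) : stop at the first ')' on the same line */
        const char *close = p + word_len;
        while (*close != '\0' && *close != '\n' && *close != ')') {
            close++;
        }
        if (*close != ')') {
            continue;
        }
        *len = (size_t)(close - p) + 1;
        return p;
    }
    return NULL;
}
```

Applied to the line char *p = MYSTRING(Because I can); the scan selects MYSTRING(Because I can), closing parenthesis included, and it finds nothing in NOTMYSTRING(x) because the word boundary is missing.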
(?<=\bMYSTRING)\(.*?\)
So, we have a perfectly good regular expression. Now we need to tell Snob, in our dotc.snob file, that it defines a string. This is done via the string= statement, which will be covered in due course.
It is time to discuss general syntax of language-specific configuration files. Recall that for extension .ext the name of the configuration file is dotext.snob and that the configuration file acts on the directory it is located in and down the directory tree until overridden in some subdirectory by a file with the same name.
A dotext.snob file consists of statements and comments. A comment is anything that is not a statement. Since all statements start at the first position of a line, it is safe to start comment lines with a blank space, as we have been doing all along in the examples.
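The statement-versus-comment rule can be sketched in C as a simple prefix test. The function name is_statement is hypothetical, and the keyword list is only what this tutorial's section titles mention (name=, comment=, keyword=, ignore=, reserved=, string=, include=, use=); it may well be incomplete, so treat the manual as the reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Snob.
   Returns true if the line starts, at its very first position, with one
   of the statement keywords; any other line counts as a comment. */
bool is_statement(const char *line)
{
    /* Assumed list, collected from the tutorial's section titles. */
    static const char *const keywords[] = {
        "name=", "comment=", "keyword=", "ignore=",
        "reserved=", "string=", "include=", "use="
    };
    assert(line != NULL);
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strncmp(line, keywords[i], strlen(keywords[i])) == 0) {
            return true;
        }
    }
    return false;
}
```

Under these assumptions, include=dotc.snob is a statement, while a line beginning with a blank, such as the first line of our doth.snob, is a comment.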
The following statements are recognized by Snob:
Here <regexp> is a Perl-style (or, more precisely, PCRE-style) regular expression and <filespec> is a filename with optional path component. Notice that there are no spaces around the = sign.
We already encountered