Inexpensive protection of your source code

Snob
Simple Name Obfuscator

Tutorial
Version 1.0

MacroExpressions
http://www.macroexpressions.com


Table of Contents

0.    Structure of Snob tutorial
1.    Installing and uninstalling Snob
2.    What exactly Snob does
3.    The test project
4.    Configuring Snob for the Bark project
4.1   Protecting files from obfuscation. Introducing APIfiles.snob
4.2   Informing Snob of a programming language: dotext.snob
4.2.1       Comments in dotext.snob
4.2.2       Reusing a configuration: include= statement
4.2.3       Using pre-packaged configurations: use= statement
5.    Running Snob and inspecting results
5.1   How to run Snob
5.2   Snob obfuscation map: projmap.snob
6.    Syntax of a language definition file
6.1   Regular expressions in dotext.snob
6.2   Statements of language-specific configuration files
6.3   Telling Snob what to obfuscate: name= statement
6.4   Telling Snob what to remove: comment= statement
6.5   Telling Snob what not to confuse with names: keyword= statement
6.6   Telling Snob where not to look for names: ignore= statement
6.6.1       Example: Adding pragma handling to dotc.snob
6.7   Telling Snob where not to look for names in any language: reserved= statements
6.8   Introducing: string= statements
7.    Adding reserved words to Snob configuration
7.1   A method of preserving third-party names automatically
7.2   Preserving literal words in all languages: configuration file reserved.snob
8.    Summary
8.1.1       Files that make it to the target directory tree
8.1.2       Files that Snob will not attempt to obfuscate
8.1.3       Names that Snob will not attempt to obfuscate
8.1.4       Snob configuration files
9.    Conclusion

 

0.    Structure of Snob tutorial

This tutorial introduces the use and configuration of Snob, the Simple Name OBfuscator. Snob's objective is to irreversibly replace the meaningful names in your project with meaningless ones and to remove comments. We introduce a toy project in C and demonstrate how to configure Snob and work with it.

 

Snob by itself is independent of the programming language(s) used in your project. Instead, it relies on language-specific configuration files.

 

Correspondingly, we will first show how to use Snob when the language configuration files are already available, at least in their most basic version. Then we will explore the insides of the language configuration files.

1.    Installing and uninstalling Snob

Snob is a standalone executable, and a rather small one by today's standards. Just copy the Snob executable file, snob.exe, to some directory, such as C:\Snob (as we will assume from now on), and that completes the installation. If you have basic configuration files, copy them to the same directory. If you like, you may also add C:\Snob to your PATH environment variable.

 

To uninstall Snob, simply remove the Snob directory, C:\Snob or whatever name you gave it.

2.    What exactly Snob does

Snob, we said, obfuscates names and removes comments. Snob, we said, is a tool independent of your project’s programming language(s). Therefore, to do its job, Snob must be told:

·         what a name is, and

·         what a comment is

 

Given only those definitions, Snob is ready to do its job, but you will almost certainly not like the results. The reason is that Snob will replace not only your own identifiers, but also anything that looks like a name to it. However, your project's programming language may have elements lexically indistinguishable from names (such as keywords) or other language constructs (like C pragmas) that should not be touched at all. Therefore, to do its job properly, Snob must also be told what the keywords are and which other constructs must be left untouched.

 

These configuration items taken together define a programming language to Snob.

 

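Turning attention to your project, we observe that there may be files that you don't want obfuscated and, in the remaining files, names that must not be obfuscated.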

 

Files that you don’t want obfuscated are, for instance, your API (application programmer’s interface) files that you deliver to your customer, or any accompanying application examples.

Other names not to be obfuscated include, for instance, the API names of a third-party library you are using in your project, or any language extensions provided by your compiler.

 

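To summarize, Snob configuration for your project consists of the definitions of the programming languages used, the list of files that must not be obfuscated, and the list of names that must not be obfuscated.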

 

In this tutorial, we introduce a toy project, Bark, in the C programming language and follow the steps needed to obfuscate it using Snob.

 

3.    The test project

Consider a toy project in the C language that we want to obfuscate because we deliver it in the source code format.

 

It contains a single function as its public interface, bark(), which takes a pointer to a character string and prints it to the standard output, but prefixed with “BARK: ” and ending with three exclamation points. E.g., given “Hello, world!” it would print “BARK: Hello, world!!!!\n”.

 

We implemented this function in two .c files and two headers: one (bark.h) for the public interface and one (barkpriv.h) as the internal, private header.

 

Here is our implementation:

 

bark.c:

/* Here is an implementation of Bark */

#include <stdio.h>
#include "bark.h"
#include "barkpriv.h"

void bark(const char *str)
{
    printf(BARK_PREFIX); //print prefix
    bark_internal(str);  //print the rest
}

 

 

barkpriv.c

/* Here is an implementation of Bark's internals */

#include <stdio.h>
#include "barkpriv.h"

unsigned int interval = 17u; //just for kicks

void bark_internal(const char *str)
{
    printf("%s", str); //need "%s" in case str contains formatting
    printf(BARK_SUFFIX); //print suffix
}

 

barkpriv.h

/* This is the private header of the terrific Bark package */

#define BARK_PREFIX "BARK: "
#define BARK_SUFFIX "!!!\n"

extern void bark_internal(const char *);

 

And, finally,

bark.h

/* This is a public API of the terrific Bark package */

extern void bark(const char *);

 

In addition, we want to provide a self-test that also serves as an illustration for an application note:

 

barktest.c

/* This is a self-test and an example of the bark code */
/* Bark will output "BARK: ",  then your string
   and then three exclamation points and a newline.
*/
#include <stdio.h>
#include <string.h>
#include "bark.h"

int main()
{
    char buf[200];
    printf(">"); //prompt
    while(NULL!=fgets(buf, sizeof(buf), stdin)) {
        size_t len = strlen(buf);
        if(len > 0) {
            if(buf[len-1] != '\n') {
                //We didn't get the whole string; try again
                printf(
                   "      (ERROR) String too long. Try again\n\n");
            }
            else {
                buf[--len] = 0; //truncate the newline
            }
        }
        if(len == 0) {
            printf("Bye\n");
            break;
        }
        bark(buf);
        fflush(stdin); //start clean; (fflush on an input stream is a compiler extension, not standard C)
        printf(">"); //print prompt
    }
    return 0;
}

 

This is the project that is supplied in the Bark\Code directory of the distribution. You can actually build it and play with the barking output.

4.    Configuring Snob for the Bark project

If you have not done so already, unzip all .snob files in the distribution to C:\Snob where snob.exe is as well. The .snob files are basic-level configuration files for the C language. And yes, for now we’ll assume that the most basic configuration for your programming language exists.

 

4.1   Protecting files from obfuscation. Introducing APIfiles.snob

First of all, the files bark.h and barktest.c represent our API. We don’t want to obfuscate them at all. The way to inform Snob about it is to list them in your project’s APIfiles.snob file. So, let’s create APIfiles.snob in Bark\Code:

 

APIfiles.snob

bark.h

barktest.c

 

 

Any subdirectory in the project directory tree may have a file named APIfiles.snob. Its syntax is as follows: each non-empty line is a file specification of file(s) considered your API. The filespec can contain wildcards (‘*’ and ‘?’); in this case all matching files are considered API.

 

Snob treats the API filespecs as follows:

·       If the filespec does not contain any directory information, not even ‘.\’, it is treated as a true API file, located in the same directory as APIfiles.snob itself, which Snob copies without processing:

  • If the filespec contains wildcards, Snob registers every match as API but does not complain if it finds none.
  • If the filespec does not contain wildcards, i.e., it is just a filename, Snob exits with an error message if the file does not exist and Snob knows the configuration for its extension.
  • If Snob does not know the configuration for the extension, it ignores the filespec match.

·       If the filespec contains any directory information whatsoever, Snob learns the names from every match whose extension has a known configuration and marks them as preserved (i.e., not subject to obfuscation). The net effect is that matching files with known extensions that are found in the project directory tree are stripped of their comments, but all names in them are preserved.
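The rules above can be sketched as a small classifier. This is only an illustration of the rules as the tutorial states them, not Snob's actual code: the function name, the result labels and the choice to treat both ‘\’ and ‘/’ as directory separators are assumptions made for this sketch.

```python
from typing import NamedTuple, Optional


class ApiSpec(NamedTuple):
    treatment: str                    # hypothetical label for what happens to matches
    missing_is_error: Optional[bool]  # None: the tutorial does not say


def classify_api_filespec(spec: str) -> ApiSpec:
    """Classify one non-empty line of APIfiles.snob (illustrative only)."""
    # Assumption: any path separator, even in ".\", counts as directory information.
    has_directory = "\\" in spec or "/" in spec
    has_wildcard = "*" in spec or "?" in spec
    if has_directory:
        # Names are learned and preserved; comments are still stripped.
        return ApiSpec("preserve-names", None)
    # A plain filename that is missing is an error (when Snob knows the
    # configuration for its extension); a wildcard with no match is not.
    return ApiSpec("copy-only", not has_wildcard)


print(classify_api_filespec("bark.h"))
print(classify_api_filespec("*.h"))
print(classify_api_filespec(".\\bark.h"))
```

With the Bark project's APIfiles.snob, both bark.h and barktest.c fall into the first case: plain filenames without directory information.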

 

 

4.2   Informing Snob of a programming language: dotext.snob

Now we need to produce the extension-specific configuration files.

 

To decide how to obfuscate a file filename.ext, Snob searches for a configuration file with a fixed name dotext.snob (so, for .cpp files Snob will search for dotcpp.snob).

 

 

Snob’s rule of search is as follows: first look in the directory where the file, filename.ext, is located and then go up the directory tree all the way to the root of the project directory (such as Bark\Code above). The first configuration file found takes effect. (It may be said that subdirectories that do not have the configuration inherit it from their parent directory, and those which do have it override the inherited configuration, if any.) If none is found, Snob finally looks for it in its own directory (C:\Snob).

 

If the configuration is not found, the file, filename.ext, will not be processed into the target directory tree. For instance, we do not provide dotsnob.snob, so Snob configuration files, which have the extension .snob, are skipped.
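The search rule can be sketched as follows. This is a minimal model of the rule as described in this section, not Snob's implementation; the function names are invented for the sketch, and returning None for a file without an extension is an assumption, since the tutorial does not cover that case.

```python
from pathlib import Path
from typing import Optional


def config_name(filename: str) -> Optional[str]:
    """Map filename.ext to dotext.snob, e.g. main.cpp -> dotcpp.snob."""
    ext = Path(filename).suffix          # ".cpp", or "" if there is no extension
    if not ext:
        return None                      # assumption: nothing to look for
    return "dot" + ext[1:] + ".snob"


def find_config(source_file: Path, project_root: Path,
                snob_dir: Path) -> Optional[Path]:
    """Return the configuration file that would take effect, or None."""
    name = config_name(source_file.name)
    if name is None:
        return None
    root = project_root.resolve()
    directory = source_file.resolve().parent
    while True:
        candidate = directory / name
        if candidate.is_file():
            return candidate             # the first one found wins
        if directory == root or directory == directory.parent:
            break                        # reached the project root (or the drive root)
        directory = directory.parent
    fallback = snob_dir / name           # finally, Snob's own directory
    return fallback if fallback.is_file() else None


print(config_name("bark.c"))         # dotc.snob
print(config_name("APIfiles.snob"))  # dotsnob.snob
```

A None result corresponds to the case described above: the file is not processed into the target directory tree.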

 

 

 

First of all, since the project is in the C programming language, we need to create a configuration for it. C files have, by convention, the extension .c, so, by Snob convention, the name of the configuration file for them is dotc.snob.

 

C has something of an oddity in that it has “header files” which, being perfectly good C files, have, by convention, a different extension, .h. So, to create a configuration for C, we need a second configuration file, doth.snob.

 

4.2.1       Comments in dotext.snob

4.2.2       Reusing a configuration: include= statement.

The good news is that however we decide to configure dotc.snob, we should configure doth.snob the same way, simply because the header files have the same syntax. (Things get more complicated if we throw C++ into the language mix: .h files may have C syntax, C++ syntax, or both. This matter is discussed in the manual, and we skip it in the tutorial.) Instead of copying dotc.snob to doth.snob and thus creating a maintenance headache for ourselves, we create the following:

doth.snob
 Same configuration as in dotc.snob
include=dotc.snob

 

The first line is a comment. Any line that does not start with a keyword is a comment, so it is safe to start a comment with a blank, as we just did.

 

The second line contains include= keyword; it instructs Snob to read configuration from another file as if it were textually included. The filename to include is dotc.snob, the one that we are going to create next.

 

 

Snob has rules (covered in the manual) on where to search for the file to include:

·         If no path information is specified, it searches first in the directory where the current file (doth.snob) itself is located and all the way up to the top level of the project directory. If still not found, the file is searched in the directory where the Snob executable, snob.exe, is located.

·         If the path part is present and resolves to a relative path, it is considered relative to the directory where the current file is located.

·         If the path is not relative, Snob looks for the file exactly where specified.

·         If Snob cannot find the file, it reports an error and exits.

 

 

In our case, since no path information is specified, it searches first in the directory where doth.snob itself is located and all the way up to the top level of the project directory.

 

Since our project directory tree is quite simple (it contains just the project root directory, Code), we put our doth.snob right there. If we had a few subdirectories, each would inherit the configuration from its parent directory, if present. However, any directory may have its own doth.snob, which would override the inherited configuration.

 

We are done with doth.snob; let’s concentrate on the .c configuration file, dotc.snob. We’ll put it in the same directory Code.

 

4.2.3       Using pre-packaged configurations: use= statement.

We want to make use of a basic language configuration; we start with this:
dotc.snob

use=C99base.snob

 

The use= statement is similar to the include= statement we’ve seen in doth.snob; the only difference is that Snob looks for the specified file only in its own directory (C:\Snob).

So, we included the base C configuration pre-packaged in C99base.snob. In the Snob directory, there is another similarly named file, C90base.snob, which is also a C configuration file but corresponding to the previous revision of the C standard. That revision didn’t allow the //-comments we use in the Bark project, so we need the new standard. (Your compiler may be C90 and allow //-comments as a language extension. Snob knows none of this.)

 

The Bark project uses a few C standard library calls and macros; their names must not be obfuscated. (The same, by the way, applies to the standard typedefs and to struct, union and enum tags and members.) To define these names, the Snob directory contains two files to choose from: Crsvnormal.snob and CParanoia.snob. The first file contains commonly used reserved words; the second one reserves anything claimed in the standard, however mildly. For instance, it reserves names beginning with an underscore (_). Usually, you'll be OK with the first file, which we now add to our dotc.snob:

 

use=Crsvnormal.snob

 

We are ready to obfuscate our Bark project.

 

5.    Running Snob and inspecting results

5.1   How to run Snob

Snob is a command-line utility that takes two arguments: your project directory and the name of the directory which will contain the obfuscated version of your project, e.g.,
snob MyProject MyObfuscatedProject

The target directory must not exist yet and its location must be writeable.

 

Snob will create the target directory and clone the directory tree of the project directory (with no files in the target tree yet). Then it will look at each file in the project tree, such as filename.ext, and examine its extension, .ext in this case.

 

The configuration files (doth.snob, dotc.snob and APIfiles.snob) are supplied in the Bark\Step1 directory. If you didn't work along, simply copy those files to Bark\Code. Now change to the Bark directory and issue the following command:

C:\Snob\snob Code Obf1

 

Here is the Snob output in all its glory:

 

Looking for configuration files under "Code"
Entering directory "Code"
Searching configuration for the extension .snob
 No configuration file dotsnob.snob found
Searching configuration for the extension .c
 Found configuration dotc.snob in Code
Searching configuration for the extension .h
 Found configuration doth.snob in Code
Leaving directory "Code"
Done looking for configuration files
Looking for API specs under "Code"
Looking for configuration files under "Code"
Entering directory "Code"
 Marking bark.h copy-only
 Marking barktest.c copy-only
Leaving directory "Code"
Done looking for API specs
Processing project "Code" to "Obf1"
Entering directory "Code"
    APIfiles.snob -- skipping
    bark.c --> Obf1\bark.c (process)
    bark.h --> Obf1\bark.h (copy)
    barkpriv.c --> Obf1\barkpriv.c (process)
    barkpriv.h --> Obf1\barkpriv.h (process)
    barktest.c --> Obf1\barktest.c (copy)
    dotc.snob -- skipping
    doth.snob -- skipping
Leaving directory "Code"
Writing obfuscation map to Obf1\projmap.snob
End processing the project
Finished

 

We can see that Snob created the directory structure under Obf1 identical to that of Code. Let’s take a look at non-API files it created:

 

bark.c


#include <stdio.h>
#include "bark.h"
#include "barkpriv.h"

void bark(const char *C0000000B)
{
    printf(C0000000C);
    C0000000D(C0000000B);
}

barkpriv.c


#include <stdio.h>
#include "barkpriv.h"

unsigned int C0000000E = 17u;

void C0000000D(const char *C0000000B)
{
    printf("%s", C0000000B);
    printf(C0000000F);
}

barkpriv.h


#define C0000000C "BARK: "
#define C0000000F "!!!\n"

extern void C0000000D(const char *);

 

It seems quite clear that it is not for human eyes.

 

5.2   Snob obfuscation map: projmap.snob

It may be interesting to look at the obfuscation map which Snob saves in the file projmap.snob in the target directory:

 

projmap.snob

#Snob Substitution Table for the project "Code"
C0000000B : str
C0000000C : BARK_PREFIX
C0000000D : bark_internal
C0000000E : interval
C0000000F : BARK_SUFFIX

 

The first line is the title. It is followed by a list of pairs: each pair gives a name that Snob invented and the original name that it replaced.
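If you want to look up original names programmatically (for instance, to decode a compiler message about the obfuscated sources), the map is easy to read back. The sketch below assumes only what the listing above shows: a title line starting with ‘#’ and pairs separated by " : ". The function name is made up for the illustration, and the real format may have details not visible in this small example.

```python
def parse_projmap(text: str) -> dict:
    """Return {invented_name: original_name} from projmap.snob text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and the title line
        invented, sep, original = line.partition(" : ")
        if sep:                           # ignore lines that are not pairs
            mapping[invented] = original
    return mapping


SAMPLE = """#Snob Substitution Table for the project "Code"
C0000000B : str
C0000000C : BARK_PREFIX
C0000000D : bark_internal
C0000000E : interval
C0000000F : BARK_SUFFIX
"""

names = parse_projmap(SAMPLE)
print(names["C0000000D"])   # bark_internal
```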

6.    Syntax of a language definition file

Up to this point, we have carefully avoided the question of the format of the files Crsvnormal.snob and C99base.snob that we used. Since those files are only good for inclusion in dotc.snob, we are actually going to talk about the syntax of dotext.snob files in general.

 

6.1   Regular expressions in dotext.snob

Let’s begin with a motivational example. In C (and C++) there is a rather interesting feature, although rarely used: the stringize operator. Consider a macro definition

#define MYSTRING(x) #x

 

If you then write

char *p = MYSTRING(Because I can);

the macro expands to a double-quoted string after pre-processing, but Snob has no way of knowing that. So, Snob will treat the words “Because,” “I,” “can” as names and replace them with invented names, which is not what we want.

 

Clearly, Snob must be told to treat text in parentheses preceded by the word MYSTRING as a string, so as to not look for names there.

 

Unfortunately, this cannot be done in the common pre-packaged language configuration file like C99base.snob. That’s because the name of the macro was invented within your project, and cannot be known in advance. So, we need to do this ourselves.

 

To reiterate, we want to designate as a string anything that fits the following definition: “text in parentheses preceded by the word MYSTRING.” This is a rather complex idea, and Snob needs a suitably expressive notation for such ideas.

 

In language-specific configuration files, Snob uses regular expressions to express complex configuration rules.

 

Regular expressions are, in a way, text search patterns. By telling Snob that a string is such and such regular expression, we mean (and Snob understands) that a segment of text matching a search by the regular expression is considered a string.

 

 (There are several flavors of regular expressions; Snob uses PCRE, Perl-compatible regular expressions. The PCRE library, which is open source software, is written by Philip Hazel, and copyright the University of Cambridge, England. See

     ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/

A reference of Snob regular expressions, which is an adaptation of PCRE reference, is provided in the distribution and also online on the Snob pages of MacroExpressions website.)

 

A suitable regular expression for our verbal definition of a string is
\bMYSTRING\(.*?\)

 

(In regular expressions, parentheses are used for grouping, just like in math expressions. To give them their literal meaning, they must be “escaped” with a backslash, as we have done in the expression above. The dot means “match any character” and the “*?” means “repeated any number of times, ungreedy,” so that the first right parenthesis, rather than the very last one, ends the match. The starting \b requires that the match begin on a word boundary.)

 

It’s worth noting that the word MYSTRING itself will not be obfuscated because it is a part of the search pattern. If we want to obfuscate MYSTRING, we need to exclude it from the search pattern. To do so, we can say that a string is any parenthesized text if it follows the word MYSTRING. The following regular expression corresponds to this improved definition:

 

(?<=\bMYSTRING)\(.*?\)

 

So, we have a perfectly good regular expression. Now, we need to tell Snob, in our dotc.snob file, that it defines a string. This is done via the string= statement, which will be covered in due course.
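To see what the two expressions from this section actually match, you can try them outside Snob. The sketch below uses Python's re module, which is not PCRE but handles the constructs used here (\b, the ungreedy *? and a fixed-width lookbehind) in the same way for these inputs. The helper name first_match is invented for the illustration.

```python
import re

# The two patterns from this section, unchanged.
WHOLE_CALL = re.compile(r"\bMYSTRING\(.*?\)")
ARGUMENT_ONLY = re.compile(r"(?<=\bMYSTRING)\(.*?\)")


def first_match(pattern, text):
    """Return the text of the first match, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None


line = "char *p = MYSTRING(Because I can);"
print(first_match(WHOLE_CALL, line))      # MYSTRING(Because I can)
print(first_match(ARGUMENT_ONLY, line))   # (Because I can)

# \b keeps the pattern from firing inside a longer identifier.
print(first_match(WHOLE_CALL, "NOTMYSTRING(x)"))   # None
```

One consequence of the ungreedy match is worth knowing: the match ends at the first right parenthesis, so for an argument that itself contains parentheses, such as MYSTRING(f(x)), only MYSTRING(f(x) is covered and the last parenthesis is left out.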

 

6.2   Statements of language-specific configuration files

It is time to discuss the general syntax of language-specific configuration files. Recall that for the extension .ext the name of the configuration file is dotext.snob, and that the configuration file acts on the directory it is located in and on everything below it in the directory tree, until it is overridden in some subdirectory by a file with the same name.

 

A dotext.snob file consists of statements and comments. A comment is anything that is not a statement. Since all statements start in the first position of the line, it is safe to start comment lines with a blank space, as we have been doing all along in the examples.
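The statement-versus-comment rule can be modeled in a few lines. This is a sketch under stated assumptions, not Snob's parser: the keyword list is collected from the section titles of this tutorial and may be incomplete, and the function name is hypothetical.

```python
# Assumption: statement keywords as named in this tutorial's section titles.
STATEMENT_KEYWORDS = ("include", "use", "name", "comment",
                      "keyword", "ignore", "reserved", "string")


def parse_config(lines):
    """Return (keyword, value) pairs; every other line is treated as a comment."""
    statements = []
    for line in lines:
        keyword, sep, value = line.rstrip("\n").partition("=")
        # A statement starts in the first position with "<keyword>=".
        # A leading blank makes the keyword test fail, so the line is a comment.
        if sep and keyword in STATEMENT_KEYWORDS:
            statements.append((keyword, value))
    return statements


doth = [" Same configuration as in dotc.snob", "include=dotc.snob"]
print(parse_config(doth))   # [('include', 'dotc.snob')]
```

Splitting at the first ‘=’ only is deliberate: a regular expression on the right-hand side may itself contain ‘=’, as in the lookbehind (?<=...).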

 

The following statements are recognized by Snob:

 

Here <regexp> is a Perl-style (or, more precisely, PCRE-style) regular expression and <filespec> is a filename with optional path component. Notice that there are no spaces around the = sign.

 

We already encountered