How to write YARA rules to improve your security and malware detection

Use an empty template to start

YARA rules are written in text files and follow a very basic, yet powerful, syntax.

A YARA rule usually contains three parts (only the condition part is mandatory):

  • The meta part: This part contains general or specific information that is not used for matching but helps the user understand what the rule is about.

  • The strings part: This part contains all the strings that need to be searched for in files.

  • The condition part: This part defines the condition for matching. It can be just matching one or several strings, but it can also be more complex as we will see later in this article.

From my experience, it is strongly advised to create an empty template that you always use as the starting point for a new rule. This way, you just need to fill in a few fields and add the desired conditions.

rule samplerule
{
    meta:
        author="Cedric Pernet"
        reference="any useful reference"
    strings:
    condition:
}

Using this template, you can quickly edit the metadata and the rule name (in our example it is named samplerule). The metadata can be just anything the user wants to put there. As for me, I always use a version number, a date, a reference which could be a malware hash, or a blog report that mentions what I want to detect, and an author field.
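As an illustration, here is a hypothetical filled-in copy of the template. The version, date and placeholder string are invented for the example, not taken from the author's own rules. It also shows that YARA meta values can be strings, integers or booleans:

```yara
// Hypothetical filled-in copy of the template; every value below is a placeholder.
rule samplerule_filled
{
    meta:
        author = "Cedric Pernet"
        reference = "any useful reference"
        version = 1             // meta values can be integers,
        date = "YYYY-MM-DD"     // strings (placeholder date),
        draft = true            // or booleans
    strings:
        $placeholder = "replace me"   // hypothetical string to search for
    condition:
        $placeholder
}
```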

Now that the metadata is written, let's start writing out the first rule.

A first rule

YARA rules are a combination of string elements and conditions. The strings can be text strings, hexadecimal strings or regular expressions.
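For a quick side-by-side view of the three kinds of strings, here is a sketch. The rule name and patterns are illustrative assumptions loosely based on the netcat usage text used later in this article:

```yara
// Hypothetical rule showing the three kinds of strings YARA accepts.
rule string_types_demo
{
    strings:
        $text  = "nc -l -p port"          // text string
        $hex   = { 6E 63 20 2D 6C }       // hexadecimal string: the bytes of "nc -l"
        $regex = /nc -l -p [0-9]{1,5}/    // regular expression: "nc -l -p" plus a port number
    condition:
        any of them
}
```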

The conditions are boolean expressions, just like in other programming languages. The best-known operators are and, or and not. Relational, arithmetic and bitwise operators can also be used.
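To illustrate how these operator families combine in a single condition, here is a minimal sketch. The strings "alpha" and "beta" and the even-file-size check are arbitrary placeholders chosen only to exercise each kind of operator:

```yara
// Hypothetical rule: each condition line uses a different operator family.
rule operator_demo
{
    strings:
        $a = "alpha"    // placeholder strings, for illustration only
        $b = "beta"
    condition:
        ($a or $b)                  // boolean operators
        and #a >= #b + 2            // relational and arithmetic operators on match counts
        and (filesize & 1) == 0     // bitwise operator: file size is an even number of bytes
}
```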

Here is a first rule:

rule netcat_detection
{
    meta:
        author="Cedric Pernet"
        reference="netcat is a tool freely available online"
    strings:
        $str1="gethostpoop fuxored" // this is very specific to the netcat tool
        $str2="nc -l -p port [options]"
    condition:
        $str1 or $str2
}

So let us explain this rule titled netcat_detection.

After our usual metadata, the strings section contains two variables, $str1 and $str2, which of course could be named any way we like. Also, to illustrate how to add comments, the first variable has a comment at the end of its line.
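YARA accepts both C-style comment forms. The rule below is a small illustrative sketch, not part of the original rule set:

```yara
/*
   Block comments use C-style delimiters and may span several lines.
*/
rule comment_demo
{
    strings:
        $str1 = "gethostpoop fuxored"  // single-line comment, as in netcat_detection
    condition:
        $str1 /* a block comment can also sit inside a line */
}
```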

The condition part contains the following condition: It must match either str1 or str2.

This could have been written more compactly:

any of ($str*)

This is useful when we have many different variables and want to match on any of them.
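For example, the netcat rule could be rewritten with the wildcard form. This is a sketch; the rule name netcat_detection_compact is made up so it does not clash with the original:

```yara
rule netcat_detection_compact
{
    meta:
        author = "Cedric Pernet"
    strings:
        $str1 = "gethostpoop fuxored"
        $str2 = "nc -l -p port [options]"
    condition:
        any of ($str*)   // here equivalent to "$str1 or $str2", and to "any of them"
}
```

Since this rule defines only $str1 and $str2, any of ($str*) behaves exactly like the original condition. The wildcard pays off once a rule has many similarly named strings.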

Running the first rule

Let's now run our rule, which we saved as a file named rule1.yar. We want to run it against a folder containing several different files, two of them being the 32- and 64-bit versions of the netcat software (Figure A). Our test system runs Ubuntu Linux, but that does not matter, as YARA can easily be installed on Linux, macOS or Windows.

Figure A

Running a YARA rule on a folder to detect a particular software.

As expected, YARA runs and returns the names of all files matching the rule.

Of course, one can put as many YARA rules as wanted in a single file, which is more convenient than maintaining a lot of separate rule files.
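As a sketch of what such a file might look like, here are two hypothetical netcat rules placed side by side. A third rule references them by name, which YARA allows as long as the referenced rules are defined earlier in the file:

```yara
// Hypothetical multi-rule file; rule names are illustrative only.
rule netcat_usage_string
{
    strings:
        $usage = "nc -l -p port [options]"
    condition:
        $usage
}

rule netcat_debug_string
{
    strings:
        $debug = "gethostpoop fuxored"
    condition:
        $debug
}

// A rule can use the result of rules defined earlier in the same file.
rule netcat_both_strings
{
    condition:
        netcat_usage_string and netcat_debug_string
}
```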

Running YARA with the -s option shows the exact strings that matched in those files (Figure B):

Figure B

Running YARA with -s option to show matching strings.

On a side note, finding tools like netcat somewhere in your corporate network might indeed be worth investigating. That basic tool should not be found on the average user's computer, since it allows computers to connect and exchange data on specific ports and might be used by attackers. Of course, it might also be used by IT staff or red team members, hence the need to investigate why it was found on a machine on the corporate network.

More complex strings

Matching a basic string can be enough for finding files on systems. Yet strings might be encoded differently on different systems, or might have been slightly modified by attackers. One simple modification, for example, is to randomly mix upper and lower case in strings. Luckily, YARA can handle this easily.

In the following YARA strings part, a string will match no matter what case it uses:

$str1="thisisit" nocase

The string $str1 will now match regardless of case: "ThisIsIt", "THISISIT", "thisisit", "ThIsIsiT", etc.

If strings are encoded using two bytes per character, the "wide" modifier can be used, and can of course be combined with another one:

$str1="thisisit" nocase wide

To search for strings in both their ASCII and wide forms, the "ascii" modifier can be used in conjunction with "wide":

$str1="thisisit" ascii wide
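Putting the modifiers together, here is an illustrative rule. The string names are arbitrary. For a plain ASCII string like this one, "wide" searches for each character followed by a zero byte:

```yara
// Hypothetical rule comparing string modifiers on the same text.
rule modifier_demo
{
    strings:
        $plain   = "thisisit"                    // exact ASCII bytes only (the default)
        $anycase = "thisisit" nocase             // ThisIsIt, THISISIT, ...
        $utf16   = "thisisit" wide               // t\x00h\x00i\x00s\x00... two bytes per character
        $both    = "thisisit" ascii wide nocase  // ASCII or wide form, any case
    condition:
        any of them
}
```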

Hexadecimal strings

Hexadecimal strings can be used easily:

$str1={ 75 72 65 6C 6E 20 }
$str2={ 75 72 65 6C ?? 20 }
$str3={ 75 72 [2-4] 65 6C }

Here are three different hexadecimal strings. The first one searches for an exact sequence of bytes. The second one uses a wildcard written as two ? characters, which matches any byte value at that position.

The third string searches for the first two bytes, then a jump of two to four arbitrary bytes, then the last two bytes. This is very handy when some sequences vary between files but show a predictable number of random bytes between two known ones.
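Wrapping the three strings in a complete rule makes their meaning easier to see. The comments decode the bytes as ASCII; the rule name is hypothetical:

```yara
rule hex_demo
{
    strings:
        $str1 = { 75 72 65 6C 6E 20 }   // exactly the bytes "ureln "
        $str2 = { 75 72 65 6C ?? 20 }   // "urel", then any single byte, then a space
        $str3 = { 75 72 [2-4] 65 6C }   // "ur", then 2 to 4 arbitrary bytes, then "el"
    condition:
        any of them
}
```

For instance, the text "urabel" would satisfy $str3: "ur", then the two bytes "ab", then "el".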

Regular expressions

Regular expressions, just like in any programming language, are very useful to detect particular content that can be written in different ways. In YARA, they are defined by using a string that starts and ends with the slash (/) character.
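Two small, made-up patterns show the syntax before the real-world example. Modifiers such as nocase also apply to regular expressions:

```yara
rule regex_demo
{
    strings:
        $md5 = /md5: [0-9a-fA-F]{32}/          // "md5: " followed by 32 hexadecimal digits
        $ver = /version [0-9]+\.[0-9]+/ nocase  // e.g. "Version 2.1" or "VERSION 10.0"
    condition:
        any of them
}
```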

Let's take an example that makes sense.

In a malware binary, the developer left debug information, in particular the famous PDB string.

It reads:


Now the idea is to create a rule that matches not only this malware but all its versions, in case the version number changes. We also decided not to hardcode the "D" drive in the rule, since the developer could also build the project from another drive.

We came up with the following regular expression (Figure C):

Figure C

A rule to match all versions of a malware, based on its PDB string, and the results.

For demonstration purposes, we built a file named newmalwareversion.exe which contains three different PDB strings, each with a different version number. Our rule matches them all.

Please note that the \ characters in our strings have been doubled, because \ is a special character that needs to be escaped, as in the C language.
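Because the actual PDB string is not reproduced in this text, the following sketch uses an invented path, \Projects\SampleMalware\v<N.N>\Release\SampleMalware.pdb. It is not the rule from Figure C. It only shows the same two ideas: any drive letter instead of D:, and any version number, with backslashes doubled:

```yara
// Hypothetical example: the path layout below is invented for illustration.
rule hypothetical_pdb_any_version
{
    strings:
        // [A-Za-z]:       any drive letter, so D: is not hardcoded
        // \\              a literal backslash (escaped)
        // [0-9]+\.[0-9]+  any version number such as 1.0 or 12.3
        $pdb = /[A-Za-z]:\\Projects\\SampleMalware\\v[0-9]+\.[0-9]+\\Release\\SampleMalware\.pdb/
    condition:
        $pdb
}
```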

More complex conditions

Conditions can be smarter than just matching a single or several strings. You can use conditions to count strings, to specify an offset at which you want to find a string, to match a file size or even use loops.

Here are a few examples which I commented for explanation:

2 of ($str*)
// matches if at least two of the strings whose identifiers start with $str are found

($str1 or $str2) and ($text1 or $text2)
// example of Boolean operators

#a == 4 and #b > 6
// string a must be found exactly four times and string b strictly more than six times

$str at 100
// string str must be located exactly at offset 100 in the file

$str in (500..filesize)
// string str must be located between offset 500 and the end of the file

filesize > 500KB
// only files larger than 500KB will be considered
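These pieces can be combined in a single rule, including the loops mentioned above. The sketch below uses placeholder strings and thresholds chosen only for illustration. The for loop walks over every occurrence of $a, and @a[i] is the offset of the i-th match:

```yara
// Hypothetical rule combining counts, offsets, file size and a loop.
rule condition_demo
{
    strings:
        $str1 = "alpha"   // placeholder strings for illustration
        $str2 = "beta"
        $a    = "gamma"
    condition:
        filesize < 500KB                             // skip large files
        and 2 of ($str*)                             // both $str1 and $str2 present
        and #a == 4                                  // "gamma" found exactly four times
        and $str1 in (0..100)                        // "alpha" starts within offsets 0 to 100
        and for all i in (1..#a) : ( @a[i] > 100 )   // every "gamma" lies beyond offset 100
}
```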


This article shows only the most basic capabilities of YARA. We could not document everything, of course, since it is really a kind of programming language, and the possibilities it offers for matching files are nearly endless. The more comfortable analysts get with YARA, the more they will get a feel for it and improve their skills at writing efficient rules.

Source: TechRepublic

The Tech Platform


