What is YARA (Yet Another Recursive/Ridiculous Acronym), and What are YARA Rules?

YARA (Yet Another Recursive/Ridiculous Acronym)

YARA is a tool aimed at (but not limited to) helping malware researchers identify and classify malware samples. With YARA you can create descriptions of malware families (or whatever you want to describe) based on textual or binary patterns. Each description, or rule, consists of a set of strings and a Boolean expression that determines its logic.


YARA Rules

YARA rules resemble a small programming language. Each rule defines a number of variables that contain patterns found in a malware sample, plus a condition over those variables. If the condition is met, which depending on the rule may require some or all of the patterns to be present, the rule matches and the file is identified as that piece of malware.


When analyzing a piece of malware, researchers identify unique patterns and strings that allow them to attribute the sample to a threat group and malware family. A YARA rule built from several samples of the same malware family can then identify further samples, possibly all associated with the same campaign or threat actor.


When investigating a new sample, an analyst may create a YARA rule for it. This rule can then be used to search their own private malware database or online repositories such as VirusTotal for similar samples.


If the malware analyst works for an organization that deploys an IPS or another YARA-supported platform that is used for malware protection, then YARA rules can be used as an incident response tool to detect malicious binaries within the organization.


Use Cases

YARA has proven extremely popular within the infosec community because it supports a number of use cases:

  • Identify and classify malware

  • Find new samples based on family-specific patterns

  • Incident responders can deploy YARA rules to identify samples and compromised devices

  • Proactive deployment of custom YARA rules can increase an organization’s defenses


YARA Elements

To build a useful custom YARA rule, you need to know the elements a rule can contain.

1. Metadata

Metadata doesn’t affect what the YARA rule searches for. Instead, it provides useful information about the rule itself.

  • Author – Name, email address, Twitter handle.

  • Date – Date rule was created.

  • Version – The version number of the YARA rule for tracking amendments.

  • Reference – A link to an article or a download of the sample, providing relevant information on the malware sample the rule is designed to detect.

  • Description – A brief overview of the rule’s purpose and malware it aims to detect.

  • Hash – A list of sample hashes that were used to create the YARA rule.


2. Strings

It is common to find unique and interesting strings within a malware sample, and these are ideal for building a YARA rule. To use a string within a rule, declare it as a variable in the strings section.

  • $a = "string from malware sample"

In addition to declaring a string, we can also append modifiers after the declared string to fine-tune the search.

  • $a = "malwarestring" fullword – This modifier matches only when the string is delimited by non-alphanumeric characters. For example ‘www.malwarestring.com’ would return a match, but ‘www.abcmalwarestring.com’ would not.

  • $a = "malwarestring" wide – This matches strings encoded with two bytes per character, where each character is followed by a null byte, for example ‘m.a.l.w.a.r.e.s.t.r.i.n.g.’ (each ‘.’ standing for a null byte).

  • $a = "malwarestring" wide ascii – This allows the rule to match both the wide and the plain ASCII form of the string.

  • $a = "MalwareString" nocase – The rule will match the string regardless of case.
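What these modifiers mean at the byte level can be sketched in Python. This is only an illustration of the matching semantics using the standard library, not how YARA implements them. The sample buffer and the helper name fullword_match are made up for the example.

```python
import re

needle = "malwarestring"

# 'wide' looks for the string with two bytes per character (UTF-16LE),
# which for ASCII text means each character is followed by a null byte.
wide_form = needle.encode("utf-16-le")
ascii_form = needle.encode("ascii")

# Hypothetical sample buffer: one wide occurrence and one mixed-case ASCII one.
sample = b"\x00\x01" + wide_form + b"\x00\x00http://www.MalwareString.com/"

has_wide = wide_form in sample
has_ascii = ascii_form in sample  # case-sensitive search, so no hit here
has_nocase = re.search(re.escape(ascii_form), sample, re.IGNORECASE) is not None


def fullword_match(word: bytes, data: bytes) -> bool:
    """Emulate 'fullword': the match must be delimited by non-alphanumeric bytes."""
    pattern = rb"(?<![A-Za-z0-9])" + re.escape(word) + rb"(?![A-Za-z0-9])"
    return re.search(pattern, data) is not None


print(wide_form[:6])                                            # b'm\x00a\x00l\x00'
print(has_wide, has_ascii, has_nocase)                          # True False True
print(fullword_match(ascii_form, b"www.malwarestring.com"))     # True
print(fullword_match(ascii_form, b"www.abcmalwarestring.com"))  # False
```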

In the image below I have used HxD, a hex editor, in which we can see some strings.



I have highlighted the ASCII string ‘\photo.png’ and the corresponding hexadecimal representation is also highlighted. Using this information you can declare a hex string within a YARA rule.

  • $a = { 5C 70 68 6F 74 6F 2E 70 6E 67 } – Note the use of curly brackets instead of quotation marks.

  • $a = { 5C 70 68 6F ?? ?F 2E 70 6E 67 } – Question marks can be used as wildcards if you have detected a slight variation of a hex pattern within multiple samples. ‘??’ matches any byte, and ‘?F’ matches any byte whose second hex digit is F.

  • $a = { 5C [2-10] 6F 74 6F 2E 70 6E 67 } – In this example, the string starts with the value ‘5C’, followed by 2 to 10 arbitrary bytes before the matching pattern begins again.

  • $a = { 5C (01 02 | 03 04) 6F 2E 70 6E 67 } – In this example, the hex values in this location can be either ‘01 02’ or ‘03 04’.
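To see how wildcards, jumps and alternatives behave, the hex-string syntax above can be translated into an ordinary byte regular expression. The converter below is a minimal sketch that covers only the constructs shown in this article. The function name hex_pattern_to_regex and the test buffer are assumptions for illustration, and YARA itself does not work this way internally.

```python
import re


def hex_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a small subset of YARA hex-string syntax into a bytes regex."""
    for symbol in "(|)":
        pattern = pattern.replace(symbol, f" {symbol} ")
    parts = []
    for token in pattern.upper().split():
        jump = re.fullmatch(r"\[(\d+)-(\d+)\]", token)
        if token == "(":
            parts.append(b"(?:")
        elif token in ("|", ")"):
            parts.append(token.encode())
        elif jump:
            # Jump [n-m]: between n and m arbitrary bytes.
            parts.append(b".{%d,%d}" % (int(jump[1]), int(jump[2])))
        elif token == "??":
            parts.append(b".")
        elif "?" in token:
            # Nibble wildcard such as ?F or 5?: list every byte value that fits.
            hi, lo = token[0], token[1]
            values = [
                v for v in range(256)
                if hi in ("?", f"{v >> 4:X}") and lo in ("?", f"{v & 0xF:X}")
            ]
            parts.append(b"[" + b"".join(b"\\x%02x" % v for v in values) + b"]")
        else:
            parts.append(b"\\x%02x" % int(token, 16))
    # DOTALL so that '.' also matches the newline byte 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


data = b"C:\\Users\\Public\\photo.png"  # made-up buffer containing '\photo.png'
for hex_string in (
    "5C 70 68 6F 74 6F 2E 70 6E 67",
    "5C 70 68 6F ?? ?F 2E 70 6E 67",
    "5C [2-10] 6F 74 6F 2E 70 6E 67",
):
    print(hex_string, "->", bool(hex_pattern_to_regex(hex_string).search(data)))
```

All three patterns match this buffer. Changing a byte in data is a quick way to see which variations each pattern still tolerates.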

Some strings and unique identifiers that are great for YARA rules:

  • Mutexes – Often unique to a malware family. Malware uses them to check whether a device has already been compromised by testing for the presence of the mutex.

  • Rare and unusual user agents – Identified when malware communicates with its C2 infrastructure.

  • Registry keys – Often created by malware as a persistence mechanism.

  • PDB paths – PDB stands for Program Database, a file that contains debugging information about a binary. It is very unlikely you will have the PDB for a piece of malware, but the PDB path can often be found in the sample and used in a YARA rule, e.g. c:\users\user\desktop\vc++ 6\6.2.20\scrollerctrl_demo\scrollertest\release\scrollertest.pdb.

  • Encrypted config strings – Malware will often encrypt its config which contains useful IOCs such as IP addresses and domains. If you have the reverse engineering skills to identify this encrypted data then it can be used within a YARA rule.


3. Conditions

The strings section defines the search criteria that will be used for a YARA rule, while the conditions section defines the criteria for the rule to trigger a successful match. There are multiple conditions that can be used, which I will outline.

  • uint16(0) == 0x5A4D – Checking the header of a file is a great condition to include in your YARA rules. This condition stipulates that the file must start with the bytes 4D 5A (‘MZ’), which are always located at the start of a Windows executable. The value is written reversed in the rule because uint16 reads the bytes as a little-endian integer.

  • (uint32(0) == 0x464c457f) or (uint32(0) == 0xfeedfacf) or (uint32(0) == 0xcffaedfe) or (uint32(0) == 0xfeedface) or (uint32(0) == 0xcefaedfe) – Used to identify ELF (Linux) and Mach-O (macOS) binaries by checking the file header. The first value is the ELF magic number and the remaining four are Mach-O magic numbers.

  • (#a == 6) – The number of occurrences of string $a is equal to 6.

  • (#a > 6) – The number of occurrences of string $a is greater than 6.
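The byte-order point in the header checks can be verified with a few lines of Python. This sketch mimics what uint16(0) and uint32(0) read, using the standard struct module. The helper names and the two header byte strings are illustrative and not taken from a real sample.

```python
import struct


def uint16_at(data: bytes, offset: int) -> int:
    """Read an unsigned 16-bit little-endian integer, like YARA's uint16()."""
    return struct.unpack_from("<H", data, offset)[0]


def uint32_at(data: bytes, offset: int) -> int:
    """Read an unsigned 32-bit little-endian integer, like YARA's uint32()."""
    return struct.unpack_from("<I", data, offset)[0]


pe_header = b"MZ\x90\x00"  # bytes 4D 5A 90 00, as found at the start of many PE files
elf_header = b"\x7fELF"    # bytes 7F 45 4C 46, the ELF magic number

print(hex(uint16_at(pe_header, 0)))   # 0x5a4d
print(hex(uint32_at(elf_header, 0)))  # 0x464c457f
```

The bytes appear in the file as 4D 5A, but read as a little-endian integer they become 0x5A4D, which is why the rule compares against the reversed value.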

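There are a few different ways to specify the file size condition. The filesize value is expressed in bytes, and the KB and MB suffixes multiply a number by 1024 and 1,048,576 respectively.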

  • (filesize>512)

  • (filesize<5000000)

  • (filesize<5MB)

Once the strings have been declared within a rule, you can specify how many of them must match for the rule to return a successful match.

  • 2 of ($a,$b,$c)

  • 3 of them

  • 4 of ($a*)

  • all of them

  • any of them

  • $a and not $b

Where possible, use 2-3 groups of conditions to reduce false positives and create a reliable rule.
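To show how metadata, strings and conditions fit together in one rule, here is a small Python sketch that assembles rule text from the three sections. The helper names (quote, build_rule), the rule name and all metadata values are placeholders invented for this example. The strings and condition reuse patterns from this article. The resulting text is a plain string that could be saved to a file and given to YARA, but no YARA library is called here.

```python
def quote(value: str) -> str:
    """Quote a Python string as a YARA text string (backslash and quote escaped)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_rule(name: str, meta: dict, strings: dict, condition: str) -> str:
    """Assemble YARA rule source text from its three sections."""
    lines = [f"rule {name}", "{", "    meta:"]
    for key, value in meta.items():
        if isinstance(value, bool):
            rendered = str(value).lower()  # YARA booleans are true / false
        elif isinstance(value, str):
            rendered = quote(value)
        else:
            rendered = str(value)          # integers are written bare
        lines.append(f"        {key} = {rendered}")
    lines.append("    strings:")
    for identifier, definition in strings.items():
        lines.append(f"        {identifier} = {definition}")
    lines += ["    condition:", f"        {condition}", "}"]
    return "\n".join(lines) + "\n"


rule_text = build_rule(
    name="Example_Family_Sample",         # hypothetical rule name
    meta={
        "author": "analyst@example.com",  # placeholder metadata values
        "description": "Illustrative rule built from the patterns in this article",
        "version": 1,
    },
    strings={
        "$a": quote("malwarestring") + " wide ascii nocase",
        "$b": "{ 5C 70 68 6F 74 6F 2E 70 6E 67 }",
        "$c": quote("c:\\users\\user\\desktop\\") + " nocase",
    },
    # Three groups of conditions: file type, file size, and string matches.
    condition="uint16(0) == 0x5A4D and filesize < 5MB and 2 of ($a, $b, $c)",
)
print(rule_text)
```

The condition combines a header check, a size limit and a string count, following the advice to use 2-3 groups of conditions.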

4. Imports

Imports are a great way to add further conditions to your YARA rules. In this article I will cover some examples of how to use the PE import.


PE Library:

Adding the statement ‘import "pe"’ to the start of a YARA rule file allows you to use the PE functionality of YARA, which is useful if you cannot identify any unique strings.


Exports are great additions to a YARA rule. Exports are functions that the malware author has created, so be sure to make use of their unique names. In the image below I have identified some exports used by a DLL that was dropped by a piece of Formbook malware.



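  • pe.exports("Botanist") and pe.exports("Chechako") and pe.exports("Originator") and pe.exports("Repressions") – pe.exports checks for one export name per call, so several names are combined with ‘and’.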

In the image below I have identified an interesting DLL that is used for HTTP connectivity, winhttp.dll:



We can also see that this library imports a number of interesting APIs that could be included within a rule.



  • pe.imports("winhttp.dll", "WinHttpConnect")

  • pe.machine == pe.MACHINE_AMD64 – Used for checking machine type.

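An imphash is a hash of the malware’s import table, that is, the imported libraries and function names in the order they appear, which we identified in the previous image using PEStudio. The same import table will often be used across a malware family, so using it in a YARA rule should detect similar samples.

  • pe.imphash() == "0e18f33408be6e4cb217f0266066c51c" – The hash is written in lowercase because pe.imphash() returns a lowercase hex string and the comparison is case-sensitive.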
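To make the idea of an import hash more concrete, the sketch below computes an imphash-style value from a list of (library, function) pairs. It follows the commonly described scheme: lowercase the names, strip the library extension, join the ‘library.function’ entries with commas in table order, and take the MD5. It skips PE parsing and the resolution of imports by ordinal, so treat it as an approximation. The function name and the import list are made up for illustration.

```python
import hashlib


def imphash_from_imports(imports) -> str:
    """Compute an imphash-style MD5 from (library, function) pairs in table order."""
    entries = []
    for library, function in imports:
        lib = library.lower()
        for ext in (".dll", ".ocx", ".sys"):
            if lib.endswith(ext):
                lib = lib[: -len(ext)]
                break
        entries.append(f"{lib}.{function.lower()}")
    return hashlib.md5(",".join(entries).encode()).hexdigest()


# Hypothetical import list, in the order it would appear in the import table.
example_imports = [
    ("WINHTTP.dll", "WinHttpConnect"),
    ("WINHTTP.dll", "WinHttpOpen"),
    ("KERNEL32.dll", "CreateMutexA"),
]

digest = imphash_from_imports(example_imports)
print(digest)                # 32 lowercase hex characters
print(digest == imphash_from_imports(list(reversed(example_imports))))  # False
```

Because the order of the entries feeds into the hash, two binaries that import the same functions in a different order produce different values.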

For a file’s timestamp to be used in a YARA rule, it must be converted to a Unix epoch timestamp. In the image below I have identified when the malware was compiled.



Using the syntax ‘//’ allows comments to be made within the rule, so below I am able to add a comment that records the date the epoch timestamp represents.

  • pe.timestamp == 1616850469 // Sat Mar 27 13:07:49 2021 UTC
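Converting between a readable date and epoch seconds is easy to get wrong by hand, so a short Python sketch can help when writing the comment. It assumes the compile time is expressed in UTC. The helper names are made up, and the date passed to to_epoch is only an example.

```python
from datetime import datetime, timezone


def to_epoch(year, month, day, hour, minute, second) -> int:
    """Convert a UTC date and time to Unix epoch seconds."""
    moment = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return int(moment.timestamp())


def from_epoch(seconds: int) -> str:
    """Render epoch seconds as a UTC date string suitable for a rule comment."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%a %b %d %H:%M:%S %Y UTC")


print(to_epoch(2020, 12, 8, 17, 58, 56))  # 1607450336
print(from_epoch(1616850469))             # Sat Mar 27 13:07:49 2021 UTC
```

Running from_epoch on the value used in a rule is a quick check that the number and its comment agree.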

The version section of PEStudio shows that this sample of Lokibot has some unique version identifiers. Using the pe.version_info dictionary we can specify which version property to check, such as the ‘CompanyName’ field.



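  • pe.version_info["CompanyName"] contains "AmAZon.cOm"

  • pe.locale(0x0804) // Chinese (Simplified, PRC) – Languages identified can be used by specifying the Microsoft locale identifier. pe.language() takes only the lowest 8 bits of that identifier, so the equivalent check there would be pe.language(0x04).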

In the image below I have identified a number of sections in the malware that aren’t commonly found in other Windows executables I have analyzed. Using this information I can reference sections by their name and index.


Note that sections are zero-indexed, so the first section is ‘0’, the second is ‘1’, and so on. In the example below I have used the section named ‘BSS’, which has index 2 and is therefore the third section.



  • pe.sections[2].name == "BSS"





The Tech Platform
