Malware Analysis ! 2day
Note: This article is an English translation of an earlier article. If any part is hard to follow, I will improve the translation.
Principle: Static program analysis is a code analysis technique that examines program code, without running it, through lexical analysis, parsing, control-flow analysis, data-flow analysis and similar techniques, in order to verify whether the code meets requirements for standards compliance, security, reliability, maintainability and so on. The technique can also be used to check whether a piece of software is a virus, generally from the following aspects:
String Checking
Without running the program, tools are used to extract the strings it contains and look for suspicious information that helps decide whether it is a virus. The principle and analysis are as follows:
Definition: A string (String) is a sequence of characters consisting of digits, letters, underscores, etc. It is mainly used in programming, concept descriptions, function explanations, etc.
Additional knowledge: a string is stored much like an array of characters, so each individual element can be extracted.
Commonly used for: output information, URL addresses, file names, path information, etc.
As computers can only work with the digits 0 and 1, encoding techniques are used so that the strings we enter can be stored and processed.
Definition: Encoding is the process of converting information from one form or format to another; the word is also used as shorthand for the code of a computer programming language. A predefined method is used to encode text, numbers or other objects into numbers, or to convert information or data into defined electrical pulse signals. Encoding is widely used in computers, television, remote control and communications. Decoding is the reverse process of encoding.
Common coding techniques:
- ASCII: A computer coding system based on the Latin alphabet, mainly used to display modern English and other Western European languages.
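As a small illustration of encoding and decoding, the sketch below uses Python's built-in `str.encode` and `bytes.decode`. The sample text is arbitrary, and UTF-16LE is shown only because it is the usual form of "Unicode" strings in Windows binaries.

```python
# Encode a piece of text into bytes and decode it back.
# The sample text is arbitrary and chosen only for illustration.
text = "cmd.exe"

ascii_bytes = text.encode("ascii")       # one byte per character
utf16_bytes = text.encode("utf-16-le")   # two bytes per character

print(list(ascii_bytes))   # the numeric code of every character
print(utf16_bytes)         # every character is followed by a zero byte

# Decoding is the reverse process and restores the original text.
assert ascii_bytes.decode("ascii") == text
assert utf16_bytes.decode("utf-16-le") == text
```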
To extract strings from a binary file, you can use the following tools:
Strings
Official website: https://docs.microsoft.com/zh-cn/sysinternals/downloads/strings
Function: Finds printable strings in object files or other binary files.
Drawback: It ignores context and formatting, so what it reports as a string may actually be a memory address, a sequence of CPU instructions, a piece of data, etc.
Limitations: By default it only reports runs of three or more consecutive printable characters, either ASCII (one byte per character) or Unicode (two bytes per character), ending in a null terminator.
How viruses exploit this: Computer viruses can use this search restriction to make Strings miss useful strings (e.g. by splitting every string into two-character pieces and stitching them back together at run time).
Tip: Change the file extension before searching, so that the sample is not run by accident.
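The search described above can be sketched in a few lines of Python. This is only an approximation of what a tool like Strings does, not its actual implementation: the function name `extract_strings` and the sample bytes are made up for illustration, and the minimum length of three follows the limitation mentioned above.

```python
import re

def extract_strings(data, min_len=3):
    """Return printable ASCII and UTF-16LE strings of at least min_len characters."""
    printable = rb"[\x20-\x7e]"
    # Runs of printable bytes (ASCII strings).
    ascii_re = re.compile(printable + rb"{%d,}" % min_len)
    # Runs of printable bytes each followed by a zero byte (UTF-16LE strings).
    wide_re = re.compile(rb"(?:" + printable + rb"\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return found

# Made-up sample: one ASCII string, one UTF-16LE string, some noise.
sample = (b"\x01\x02http://example.test\x00\xff"
          + "kernel32.dll".encode("utf-16-le")
          + b"\x00\x00ab\x00")
print(extract_strings(sample))

# A string split into two-character pieces is not reported at all.
print(extract_strings(b"cm\x00d.\x00ex\x00e\x00"))
```

The second call shows the evasion trick: none of the pieces reaches the minimum length, so nothing is found even though the program could rebuild `cmd.exe` at run time.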
Other tools that can also search strings:
- SDL Passolo: https://www.sdltrados.cn/cn/products/passolo/ Function: software localization tool that can, of course, also search strings. Drawback: large program size.
- Resource Hacker: https://www.sdltrados.cn/cn/products/passolo/
- Lingobit Localizer: http://www.lingobit.com/zh/index.html
- Resource Tuner: http://www.restuner.com/
- Restorator: https://www.bome.com/products/restorator
- Sisulizer: https://www.sisulizer.com/
- In general, be careful with any URL or IP address that turns up in a search: it is likely to belong to the virus. Protect yourself before visiting it for verification, to avoid web pages that carry embedded malicious code (drive-by downloads).
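Once strings have been extracted, URLs and IP addresses can be picked out automatically. The following is a minimal sketch: the two regular expressions are deliberately simple, and the function name `network_indicators` and the sample strings are assumptions made for illustration only.

```python
import re

# Simple patterns: http(s) URLs and dotted IPv4 addresses (0-255 per octet).
URL_RE = re.compile(r'https?://[^\s"<>]+', re.IGNORECASE)
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4_RE = re.compile(r"\b(?:" + OCTET + r"\.){3}" + OCTET + r"\b")

def network_indicators(strings):
    """Collect URLs and IPv4 addresses found in a list of extracted strings."""
    hits = []
    for s in strings:
        hits += URL_RE.findall(s)
        hits += IPV4_RE.findall(s)
    return hits

# Made-up strings of the kind a string search might return.
extracted = [
    "GetProcAddress",
    "http://example.test/a.exe",
    "connect 192.0.2.10:8080",
    "version 1.2.3",
]
print(network_indicators(extracted))
```

Anything this reports should only be looked at from an isolated environment, never opened directly in a normal browser.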
PE Checking
Most file-infecting viruses infect PE files, because this lets the virus code run whenever the infected PE file runs, and from there the virus can go on to infect other normal files and spread itself. So from an antivirus point of view, you should first determine whether a file has a PE structure and then decide which method to use to scan it. How do we determine whether a file is a PE file? Let's start with the concept of PE:
- PE concept: PE (Portable Executable) is the generic term for executable files under Windows; common examples are DLL, EXE, OCX and SYS files.
- Scope: Windows executable programs and dynamic link libraries.
- Contains: the information Windows needs to load the file from the hard disk into memory and execute it.
Whether a file is a PE file has nothing to do with its extension: PE files can have any extension. So how does Windows distinguish between executable and non-executable files? When we call LoadLibrary and pass a file name, how does the system determine that the file is a legitimate dynamic library? This is where the PE file structure comes in.
The PE structure
This specification describes the structure of executable (image) files and object files under the Windows family of operating systems. These files are referred to as Portable Executable (PE) and Common Object File Format (COFF) files, respectively.
https://learn.microsoft.com/en-us/windows/win32/debug/pe-format
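Based on the specification linked above, a file can be recognised as PE from its content alone, whatever its extension: it starts with the DOS magic `MZ`, and the 4-byte field at offset 0x3C (`e_lfanew`) holds the file offset of the signature `PE\0\0`. The sketch below checks only these two markers; the function name `is_pe` is made up for illustration, and a real scanner would validate much more.

```python
import struct

def is_pe(data):
    """Check the DOS 'MZ' magic and the 'PE' signature that e_lfanew points to."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: file offset of the PE signature, stored at offset 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"

# Minimal made-up header, just enough to pass the two checks.
dos_header = bytearray(0x40)
dos_header[:2] = b"MZ"
struct.pack_into("<I", dos_header, 0x3C, 0x40)
print(is_pe(bytes(dos_header) + b"PE\x00\x00"))   # has both markers
print(is_pe(b"%PDF-1.7 not an executable"))        # has neither
```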
Commonly used PE analysis tools
PE-bear: https://github.com/hasherezade/pe-bear
PEview(roguekillerpe): https://www.adlice.com/download/roguekillerpe/
PPEE: https://mzrst.com/
CFF Explorer: https://ntcore.com/?page_id=388
PE Explorer: http://www.heaventools.com/overview.htm
Detection and removal techniques:
Determining if a program entry point in a PE file is abnormal
Many viruses, after infecting a PE file, will usually add a portion of code to the PE file and then change the AddressOfEntryPoint in the PE header to locate the address to the code inserted by the virus so that whenever the file is run, the virus code will be the first to run.
In general, many viruses place the inserted code at the end of the PE file and finish it with a jump back to the real entry point of the original PE file. This lets the virus code run without the user noticing. Anti-virus software can therefore judge whether a file is suspected of being infected by checking whether the entry point of the PE file is abnormal: if the entry point does not point into the code region where it normally belongs (for example, it points into the last section of the file), the file is suspected of being infected by a virus. Of course, this judgement is not always accurate, but it can serve as one basis for a decision. The heuristic scan we mentioned last issue uses such features to help identify unknown viruses.
Some viruses have also come up with a number of ways to change the program flow without modifying the entry point in order to prevent such detection by anti-virus software. For example, changing the code of the original entry point program and then jumping to the virus body.
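The entry point check can be sketched as follows. The field offsets come from the PE format (AddressOfEntryPoint sits 16 bytes into the optional header, and each section header is 40 bytes long). The rule "an entry point in the last of several sections is suspicious" is an assumed heuristic chosen for illustration, not the rule of any particular anti-virus product, and the function name `entry_point_section` is made up.

```python
import struct

def entry_point_section(data):
    """Return (section_name, suspicious) for the entry point of a PE image."""
    pe = struct.unpack_from("<I", data, 0x3C)[0]             # e_lfanew
    if data[:2] != b"MZ" or data[pe:pe + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    num_sections = struct.unpack_from("<H", data, pe + 6)[0]
    opt_size = struct.unpack_from("<H", data, pe + 20)[0]    # SizeOfOptionalHeader
    entry_rva = struct.unpack_from("<I", data, pe + 24 + 16)[0]  # AddressOfEntryPoint
    table = pe + 24 + opt_size                               # first section header
    for i in range(num_sections):
        off = table + 40 * i
        name = data[off:off + 8].rstrip(b"\x00").decode("ascii", "replace")
        vsize, vaddr, rsize = struct.unpack_from("<III", data, off + 8)
        if vaddr <= entry_rva < vaddr + max(vsize, rsize):
            # Assumed heuristic: appended virus code usually sits in the
            # last section, so an entry point there deserves a closer look.
            is_last = num_sections > 1 and i == num_sections - 1
            return name, is_last
    return None, True   # entry point lies outside every section
```

To try it, read a file in binary mode and pass the bytes to the function. As the paragraph above notes, a virus that patches the original entry code instead of changing AddressOfEntryPoint will not be caught by this check.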
Extracting feature codes based on PE structure
One way to extract feature codes is to divide the file into equal parts and take a fixed length of content from each part as a feature code. The problem with this method is that many files contain similar content: with the PE structure we are discussing, for example, a large part of the beginning of many PE files is the same, so extracting features by dividing the file into equal parts is not ideal. We therefore considered using the PE structure itself, either by taking a certain amount of content from each section as feature codes or by using various key points as references and looking for feature codes near them. This largely avoids the drawbacks of the equal-division method and makes the feature codes of different viruses more distinct from each other. For example, to detect the CIH virus, features near the PE Header and near the entry point were examined.
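As a sketch of using a key point as the reference, the function below takes a fixed number of bytes starting at the entry point and returns them as a hex feature code. The name `entry_point_feature` and the default length of 32 bytes are assumptions for illustration. A real signature would normally combine several such locations.

```python
import struct

def entry_point_feature(data, length=32):
    """Return `length` bytes at the entry point of a PE image, as a hex string."""
    pe = struct.unpack_from("<I", data, 0x3C)[0]             # e_lfanew
    if data[:2] != b"MZ" or data[pe:pe + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    num_sections = struct.unpack_from("<H", data, pe + 6)[0]
    opt_size = struct.unpack_from("<H", data, pe + 20)[0]
    entry_rva = struct.unpack_from("<I", data, pe + 24 + 16)[0]
    table = pe + 24 + opt_size
    for i in range(num_sections):
        off = table + 40 * i
        # VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData
        vsize, vaddr, rsize, rptr = struct.unpack_from("<IIII", data, off + 8)
        if vaddr <= entry_rva < vaddr + max(vsize, rsize):
            start = rptr + (entry_rva - vaddr)   # RVA -> file offset
            return data[start:start + length].hex()
    raise ValueError("entry point is outside every section")
```

Two files that share the same bytes at the entry point yield the same feature code, while the identical headers at the start of most PE files no longer influence the result.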
Demo:
Identification of CIH virus
There are three characteristics:
The first is that if the byte immediately before the PE Header signature is non-zero, the file is likely to be infected; CIH itself uses this byte to tell whether a file is already infected.
However, this feature is not always reliable, as programs that are not infected with the CIH virus may also have a non-zero value in this area for various reasons, so two additional code features are added.
CIH changes the code entry point to point to itself. Based on this, we take features at an offset from the entry point: the sidt instruction and, after it, the two operations that install the file system hook. This makes detection more reliable.
Of course, all 3 features are concentrated in the head of the virus. To be more reliable and to avoid false positives within the family, we can also add some code from further back in the virus body.
Linking libraries and functions
Linked libraries and functions bring a lot of useful information to virus analysis, so why and how do computer viruses rely on them? With this in mind, read on:
Why viruses use them: through the import table in the PE structure, the virus has the link libraries and functions it needs for its malicious work loaded into memory, and then calls functions in those dynamic link libraries (the virus code is connected to the dynamic link libraries through linking) to prepare for its work.
- Introductory question: what is linking, and what linking methods are there? Linking solves the problem of combining our own code with libraries written by someone else.
- Static linking: the least common method of linking code libraries on Windows platforms, but more common in UNIX and Linux programs.
- What: The binary code of all required functions is included in the executable file when it is generated (at link time). The linker therefore needs to know which functions each object file taking part in the link requires, and which functions each object file provides, so that it can tell whether every required function can be linked correctly. If a function required by an object file is not found in any participating object file, the linker reports an error. Two important structures in an object file provide this information: the symbol table and the relocation table. When a library is statically linked to an executable, all the code in this library is copied to the executable.
- Advantage: no library dependencies at release time, i.e. no extra libraries have to be shipped and the application can run on its own.
- Disadvantages: the PE file header contains no information about the linked library. The executable is larger and takes up more memory, and if the static library is updated, every executable has to be re-linked to use the new version. Because they want to stay small, computer viruses do not normally use this linking method.
- Linking time: when the executable is generated (linking is done as part of the build)
- Dynamic linking: the most common method, and the one malicious code analysts should care about most. Dynamic linking information is written in the import table; when a library is dynamically linked, the host operating system searches for the required library when the program is loaded.
- Features: Instead of copying the library's executable code at build time, the executable records a series of symbols and parameters, and this information is passed to the operating system when the program is loaded or run. The operating system loads the required dynamic libraries into memory, and when the program reaches the relevant code it executes the shared library code already loaded in memory, so that linking is completed at run time.
- Advantage: multiple programs can share the same piece of code without the need to store multiple copies on disk.
- Disadvantage: because the libraries are loaded when the program is loaded or run, startup performance may suffer.
- Link time: when the program is loaded or run
- Run-time linking: When the application calls the LoadLibrary or LoadLibraryEx function, the system tries to locate the DLL using the load-time dynamic linking search order (see Load-time dynamic linking); if it is found, the system maps the DLL module into the process's virtual address space and increases the reference count. If the DLL named in the LoadLibrary or LoadLibraryEx call is already mapped into the virtual address space of the calling process, the function only returns a handle to the DLL and increases the DLL reference count.
Note: Two DLLs with the same file name and extension but in different directories are not considered to be the same DLL.
Note: Although run-time linking is not popular in legitimate programs, it is commonly used in malicious code, especially when the malicious code is packed or obfuscated. Packing or obfuscation destroys the import table of a computer virus, and without the import table Windows will not do the linking work for the virus, so the virus has to use run-time linking to load the required libraries and functions into memory while it runs.
- Features: links only when needed
- Advantage: executable programs using run-time linking only link to the library when a function is needed, rather than at program startup as in dynamic linking mode
- Disadvantage: the program has to call the relevant functions itself to do the linking
- Link time: when the function call is reached
- Link-based analysis: The PE file header lists all dynamic link libraries and functions required by the computer virus code. The names of these libraries and functions, together with information on commonly used dynamic link libraries, can be used to analyse what the virus does.
Commonly used analytical tools:
Dependency Walker: included in some versions of Visual Studio and other Microsoft development packages; it lists the dynamically linked functions of an executable file.
Common functions in viruses:
- LoadLibrary: dynamically loads the dynamic link library from the hard disk into the computer virus memory space
- GetProcAddress: finds the address of the corresponding function in the DLL
- URLDownloadToFile(): downloads a file from the Internet
Import functions
The PE file header also contains information about the specific functions used by the executable. As you can only see the name of a function in the import table, you can look up its parameters, purpose and usage in Microsoft's MSDN or, of course, with a search engine.
Export functions
Similar to import functions, the export functions of DLLs and EXEs are used to interact with other programs and code. Usually a DLL implements one or more functions and exports them so that other programs can import and use them. The PE file also contains information about which functions a file exports.
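Given a list of imported function names (for example, copied from Dependency Walker), the functions named above can be flagged automatically. This is a minimal sketch: the table contains only the three functions mentioned in this article, and the names `NOTABLE_IMPORTS` and `flag_imports` as well as the sample import list are made up for illustration.

```python
# Functions from this article that deserve attention when they are imported.
NOTABLE_IMPORTS = {
    "LoadLibrary": "loads a DLL into the process at run time",
    "GetProcAddress": "finds the address of a function in a loaded DLL",
    "URLDownloadToFile": "downloads a file from the Internet",
}

def flag_imports(imported_names):
    """Map each notable imported function to a short description."""
    flagged = {}
    for name in imported_names:
        for prefix, meaning in NOTABLE_IMPORTS.items():
            # Prefix match, so that variants such as LoadLibraryA,
            # LoadLibraryExW or URLDownloadToFileW are caught as well.
            if name.startswith(prefix):
                flagged[name] = meaning
    return flagged

# Made-up import list for illustration.
imports = ["CreateFileW", "LoadLibraryA", "GetProcAddress", "URLDownloadToFileW"]
for name, meaning in flag_imports(imports).items():
    print(f"{name}: {meaning}")
```

An import of this kind is not proof of malice, since legitimate programs use these functions too. It only tells the analyst where to look first.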
Auxiliary detection and removal
Anti-virus software, malware checking platforms and malware analysis platforms are commonly used to assist in detection and removal. They have the following advantages:
- Have a virus signature database: a database that contains the distinctive "looks" of known viruses; software is identified as a virus by matching these characteristic features. Mainly for known viruses. Virus countermeasure: the writers of computer viruses can easily modify their code to change these characteristics, often using techniques such as polymorphism (the semantics stay the same while the syntax is obfuscated, which also makes reverse analysis harder) to avoid detection by anti-virus software.
- Have heuristic rules: viruses whose characteristics are not in the signature database would otherwise go undetected, so rules summarised from experience with analysing known viruses are used to judge whether software is a virus. Mainly for unknown viruses. Virus countermeasure: developing new types of viruses whose characteristics and behaviour are not yet known to anti-virus software, and which therefore avoid detection.
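To illustrate signature matching and why changing the bytes of a virus defeats it, here is a toy sketch. The signature database, the signature name and the single-byte XOR encoding are all made up for illustration. Real signature engines and real polymorphic engines are far more elaborate.

```python
# Hypothetical signature database: name -> byte pattern.
SIGNATURES = {"Demo.Virus.A": bytes.fromhex("deadbeef4d5a9000")}

def scan(data, signatures=SIGNATURES):
    """Return the names of all signatures whose pattern occurs in data."""
    return [name for name, pattern in signatures.items() if pattern in data]

def xor_encode(data, key):
    """Toy 'mutation': XOR every byte with a single-byte key."""
    return bytes(b ^ key for b in data)

original = b"\x90\x90" + SIGNATURES["Demo.Virus.A"] + b"\xc3"
variant = xor_encode(original, 0x5A)

print(scan(original))   # the known pattern is found
print(scan(variant))    # same content after decoding, but no match

# The variant can still recover the original bytes at run time.
assert xor_encode(variant, 0x5A) == original
```

The variant carries exactly the same information as the original, yet none of its bytes match the stored pattern, which is why signature databases mainly help against known viruses.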
When conditions are restricted, for example there is no local antivirus software or network traffic is limited, you can calculate the hash value of the file and submit it to a website that checks files by hash value. The principle and a common query platform are as follows:
Principle: A hash function is an algorithm that calculates an identifier (the hash value) from the content of a file. Files with different content produce, in practice, different hash values, while metadata such as the file name or creation date does not affect the result. The hash value can be used to confirm that a file has not been corrupted or modified, and to look up existing analysis results on a query platform.
Calculation tools:
Hasher Pro: http://www.den4b.com/
HashOnClick: https://www.2brightsparks.com
Hash Generator Pro: http://insili.co.uk/
MD5 File Hasher Pro: http://www.digital-tronic.com/md5-file-hasher/
Advanced Hash Calculator: http://www.filesweb.com/
Query platform:
VirusTotal: https://www.virustotal.com/gui/home/search
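If none of the tools above is at hand, the hash value can also be computed with Python's standard `hashlib` module. The sketch below reads the file in chunks so that large samples do not have to fit in memory. The function name `file_hashes` and the chunk size are arbitrary choices for illustration.

```python
import hashlib

def file_hashes(path, chunk_size=65536):
    """Return the MD5 and SHA-256 hex digests of the file at `path`."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}
```

The resulting hex string can be pasted into the search box of a query platform. Only the hash leaves the machine, so the sample itself does not need to be uploaded.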
As virtue rises one foot, the devil rises ten
Viruses often use packing (shelling) and obfuscation techniques to avoid being analysed with static analysis techniques
- Purpose: to avoid detection by antivirus software and to make virus analysis more difficult
Obfuscation: Hiding information about computer virus programs
Commonly used tools:
DotFuscator: https://www.preemptive.com/
DashO Pro: https://www.preemptive.com/
ProGuard: https://www.guardsquare.com/en/proguard
Virbox Protector: https://shell.virbox.com/
Code Virtualizer: https://www.oreans.com/
Skater .NET obfuscator: http://www.rustemsoft.com/
Packing (shelling): Compressing the size of computer virus files and protecting the core code of the virus using encryption techniques
Commonly used tools:
UPXShell: http://upxshell.sourceforge.net/download.html
DRMsoft EncryptEXE: http://www.drmsoft.com/
VMProtect: https://vmpsoft.com/
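The effect of packing on static analysis can be shown with a toy example that compresses a payload with `zlib` and then XORs it with a single-byte key. This is a sketch of the idea only and has nothing to do with how the tools listed above work internally. The key, the payload and the function names `pack` and `unpack` are made up for illustration.

```python
import zlib

KEY = 0x5A  # arbitrary single-byte key, for illustration only

def pack(payload):
    """Compress the payload, then hide the result with a one-byte XOR."""
    compressed = zlib.compress(payload, 9)
    return bytes(b ^ KEY for b in compressed)

def unpack(packed):
    """Reverse of pack(): undo the XOR, then decompress."""
    return zlib.decompress(bytes(b ^ KEY for b in packed))

# Made-up payload containing strings an analyst would like to see.
payload = b"GetProcAddress http://example.test/stage2.bin " * 40
packed = pack(payload)

print(len(payload), "->", len(packed), "bytes")
print(b"http://" in payload, b"http://" in packed)

# Unpacking restores the original bytes exactly.
assert unpack(packed) == payload
```

The packed form is smaller and no longer contains the readable strings, so a string search on it finds nothing useful until the data has been unpacked again.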
Protection strategies against viruses: analysis is often assisted by unpacking and deobfuscation techniques
Unpacking: removing the shell (packer) from software; it can be done manually or automatically with tools
Common tools:
QuickUnpack: http://qunpack.ahteam.org/?p=458#more-458
frida-unpack: https://github.com/WeiEast/frida-unpack
de4dot: https://github.com/0xd4d/de4dot
drizzleDumper: https://github.com/DrizzleRisk/drizzleDumper
de4js: https://github.com/lelinhtinh/de4js
wxappUnpacker: https://github.com/gzh4213/wxappUnpacker
Android_unpacker: https://github.com/CheckPointSW/android_unpacker
unpacker: https://github.com/malwaremusings/unpacker
Deobfuscation: restoring code to a clean, highly readable state
Commonly used tools:
simplify: https://github.com/CalebFenton/simplify
de4dot: https://github.com/0xd4d/de4dot
flare-floss: https://github.com/fireeye/flare-floss
Tigress_protection: https://github.com/JonathanSalwan/Tigress_protection
VTIL-Core: https://github.com/vtil-project/VTIL-Core
dex-oracle: https://github.com/CalebFenton/dex-oracle
malware-jail: https://github.com/HynekPetrak/malware-jail
de4js: https://github.com/lelinhtinh/de4js
dnpatch: https://github.com/ioncodes/dnpatch
etacsufbo: https://github.com/ChiChou/etacsufbo
samsung-firmware-magic: https://github.com/chrivers/samsung-firmware-magic
JRemapper: https://github.com/Col-E/JRemapper