Microsoft and Intel project converts malware into images before analyzing it

By admin | May 16, 2020 | Microsoft, Privacy & Security

Microsoft and Intel have recently collaborated on a research project exploring a new approach to detecting and classifying malware.

Image: Microsoft

Called STAMINA (STAtic Malware-as-Image Network Analysis), the project relies on a new technique that converts malware samples into grayscale images and then scans those images for textural and structural patterns specific to malware.

HOW STAMINA ACTUALLY WORKS

The Intel-Microsoft research team said the entire process followed a few simple steps. The first consisted of taking an input file and converting its binary form into a stream of raw pixel data.
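As a rough illustration of that first step, the Python sketch below treats each byte of a file as one 8-bit grayscale pixel. The one-byte-per-pixel mapping is an assumption consistent with the grayscale images described here, not code published by Intel or Microsoft, and the function names are hypothetical.

```python
def bytes_to_pixel_stream(data: bytes) -> list[int]:
    """Map each byte of a binary to one 8-bit grayscale pixel value (0-255).

    Assumption: one byte per pixel. A byte already lies in 0..255, so it can
    be used directly as a grayscale intensity (0x00 = black, 0xFF = white).
    """
    return list(data)


def file_to_pixel_stream(path: str) -> list[int]:
    """Read a file in binary mode and return its 1D pixel stream."""
    with open(path, "rb") as f:
        return bytes_to_pixel_stream(f.read())


if __name__ == "__main__":
    # A tiny stand-in for a real binary: the "MZ" bytes that open a Windows
    # PE file, followed by a few arbitrary bytes.
    sample = b"MZ\x90\x00\x03\x00\xff"
    print(bytes_to_pixel_stream(sample))  # [77, 90, 144, 0, 3, 0, 255]
```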

Researchers then reshaped this one-dimensional (1D) pixel stream into a 2D image so that standard image-analysis algorithms could process it.

The width of the image was chosen based on the input file’s size, using the table below. The height varied from file to file and was obtained by dividing the length of the raw pixel stream by the chosen width.
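The reshaping step can be sketched as follows. The thresholds in WIDTH_TABLE are illustrative placeholders, not the values in the Intel/Microsoft table; padding the final partial row with zeros is an assumption, since the article does not say how a leftover remainder is handled; and choose_width and to_2d are hypothetical helper names.

```python
import math

# HYPOTHETICAL width table: (exclusive upper bound on file size in bytes, width).
# Illustrative placeholders only, not STAMINA's published values.
WIDTH_TABLE = [
    (10 * 1024, 32),
    (30 * 1024, 64),
    (60 * 1024, 128),
    (100 * 1024, 256),
    (200 * 1024, 384),
    (500 * 1024, 512),
    (1000 * 1024, 768),
]
MAX_WIDTH = 1024  # used for anything at or above the last threshold


def choose_width(file_size: int) -> int:
    """Pick the image width from the file size using the lookup table."""
    for upper_bound, width in WIDTH_TABLE:
        if file_size < upper_bound:
            return width
    return MAX_WIDTH


def to_2d(pixels: list[int], pad_value: int = 0) -> list[list[int]]:
    """Fold a 1D pixel stream into rows of a fixed width.

    Height = ceil(len(pixels) / width). The last row is padded with
    pad_value (an assumption, see the lead-in).
    """
    width = choose_width(len(pixels))
    height = math.ceil(len(pixels) / width)
    padded = pixels + [pad_value] * (height * width - len(pixels))
    return [padded[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    stream = list(range(100))          # 100 "pixels" fall in the smallest bucket
    image = to_2d(stream)
    print(len(image[0]), len(image))   # 32 4
```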

[Table: image width chosen for each range of input file sizes. Image: Intel, Microsoft]

The Intel and Microsoft team said that resizing the raw image did not “negatively impact the classification result.” Resizing was also necessary so that the system would not have to process images made up of billions of pixels, which would likely slow processing down considerably.
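A minimal sketch of the resizing step, using nearest-neighbour sampling in plain Python. The 224 x 224 target size and the resampling method are assumptions chosen for illustration; the article does not say which size or interpolation STAMINA used, and resize_nearest is a hypothetical name.

```python
TARGET_SIZE = 224  # HYPOTHETICAL target side length, for illustration only


def resize_nearest(image: list[list[int]], out_h: int, out_w: int) -> list[list[int]]:
    """Resize a 2D grayscale image with nearest-neighbour sampling.

    Each output pixel (r, c) copies the input pixel at
    (r * in_h // out_h, c * in_w // out_w), which always stays in bounds.
    """
    if not image or not image[0]:
        raise ValueError("image must be non-empty")
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


if __name__ == "__main__":
    # A hypothetical 1,000 x 512 image standing in for a larger binary.
    big = [[(r + c) % 256 for c in range(512)] for r in range(1000)]
    small = resize_nearest(big, TARGET_SIZE, TARGET_SIZE)
    print(len(big) * len(big[0]), "->", len(small) * len(small[0]))  # 512000 -> 50176
```

Nearest-neighbour sampling is simply the easiest option to show; an area-averaging filter would typically reduce aliasing when shrinking very tall images, at the cost of more code.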

MICROSOFT’S INVESTMENT IN MACHINE LEARNING

The research is part of Microsoft’s recent efforts to improve malware detection using machine learning techniques.

STAMINA uses deep learning, a subset of machine learning (ML), which is itself a branch of artificial intelligence (AI). Deep learning relies on multi-layer neural networks that learn useful features directly from raw, unstructured input data: in this case, the pixels of a malware binary rendered as an image.

Microsoft said that while STAMINA was accurate and fast on smaller files, it faltered on larger ones.

“For bigger size applications, STAMINA becomes less effective due to limitations in converting billions of pixels into JPEG images and then resizing them,” Microsoft said in a blog post last week.
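To put “billions of pixels” in perspective: with one byte per pixel, the pixel count equals the file size in bytes. The sketch below assumes a hypothetical maximum width of 1,024 pixels, computes image dimensions for a few file sizes, and checks them against one hard limit of the JPEG format, which records height and width in 16-bit fields (at most 65,535 pixels per side). The article does not say exactly which limitation Microsoft ran into; this is only one plausible constraint, and the helper names are hypothetical.

```python
import math

# 16-bit height/width fields in the JPEG frame header; some encoders
# (e.g. libjpeg) cap dimensions slightly lower.
JPEG_MAX_SIDE = 65_535
ASSUMED_WIDTH = 1024  # HYPOTHETICAL maximum image width


def image_shape(file_size_bytes: int, width: int = ASSUMED_WIDTH) -> tuple[int, int]:
    """Return (height, width) when each byte becomes one pixel."""
    return math.ceil(file_size_bytes / width), width


def fits_in_single_jpeg(file_size_bytes: int, width: int = ASSUMED_WIDTH) -> bool:
    """True if the unresized image stays within JPEG's per-side limit."""
    height, w = image_shape(file_size_bytes, width)
    return height <= JPEG_MAX_SIDE and w <= JPEG_MAX_SIDE


if __name__ == "__main__":
    for size_mb in (1, 64, 100, 2048):
        size = size_mb * 1024 * 1024
        h, w = image_shape(size)
        print(f"{size_mb:>5} MB -> {h} x {w} = {h * w:,} pixels, "
              f"single JPEG ok: {fits_in_single_jpeg(size)}")
```

Under these assumptions, a 64 MB file already needs 65,536 rows, one more than a single JPEG can hold, and a 2 GB file yields more than two billion pixels.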

This limitation may matter little in practice, however, since STAMINA could be applied to small files only, where it delivered excellent results.

In an interview with ZDNet earlier this month, Tanmay Ganacharya, Director for Security Research of Microsoft Threat Protection, said that Microsoft now relies heavily on machine learning to detect emerging threats, using a range of machine learning modules deployed on customer systems or on Microsoft servers.

According to Ganacharya, Microsoft now uses client-side machine learning model engines, cloud-side machine learning model engines, and machine learning modules that capture either sequences of behaviors or the content of the file itself.

Based on the reported results, STAMINA could very well become one of the ML modules Microsoft uses to spot malware.

Microsoft is better positioned than most companies to make this approach work, primarily because of the sheer volume of data it collects from hundreds of millions of Windows Defender installations.

“Anybody can build a model, but the labeled data and the quantity of it and the quality of it, really helps train the machine learning models appropriately and hence defines how effective they are going to be,” Ganacharya said.

“And we, at Microsoft, have that as an advantage because we do have sensors that are bringing us lots of interesting signals through email, through identity, through the endpoint, and being able to combine them.”
