Sam's Machine Learning Analysis, Data Structure-Architect©
Table of Contents:
• Introduction
• Our Machine Learning Detection Approach
• Our Machine Learning Detection Strategies
• Scanning Buffer(s)
• OS and Compiler-Interpreter Simulators
• Estimating the Total Number of Zeros-&-Ones Search Patterns
• Detection Zeros-&-Ones Data Structure
• Sample Analysis of Zeros-&-Ones Matrices
• SolarWinds Hacking Lessons - Hidden in Plain Sight
• Hackers' Code
• Programming Languages
• Scripts
• Libraries of Patterns
• Operation Matrices - The Spine of Our Machine Learning System
• Engines
• Dynamic Business Rules
• Bare-Metal Server Features

Introduction

We recommend that readers check our Machine Learning page. Not long ago, we posted a page on the SamEldin.com website on how to build business processes as Zeros-&-Ones to start intelligent automation processes for business. We called it Dynamic Business Rules:

http://sameldin.com/SamQualificationPages/MachineLearning.html
http://sameldin.com/DynamicBusinessRules/index.html

The main goal of this page is to analyze-design-architect our Machine Learning (ML) structure (software and hardware). Our ML would address Cybersecurity detection issues. Our focus in this page is the networks' side. The goal is securing networks from external and internal hacking.

Internet Protocol

There are a number of internet and/or network communication protocols which networks and their users would use to perform the requested services:
Scanning Issues:
1. There is an almost endless number of files which need to be scanned
2. The file types and sizes are numerous
3. Any file can have malicious code
4. Scanning speed and system performance must be addressed
5. The ever-changing hackers' tactics and tools
6. Lack of education of networks' users on how to help in the networks' detection and prevention
7. Most of the detection and prevention software lacks intelligence-automation
8. Dependencies on Cybersecurity vendors to perform detection and prevention
9. Hackers use Reverse Engineering, AI and ML to add to their arsenal of attacks
10. Lack of experience of Cybersecurity staff and processes
11. Lack of experience of Cybersecurity management and processes
12. Most detection testing is done using free, open-source and vendors' tools, which adds more risks
13. Management is not willing to take the risks of building their own security tools and software
14. Management would rather pass the security risks to security vendors
15. Detection is not a science, but a guessing game

Other issues:
1. Internal hacking
2. Human factor
3. Old networks, systems and old software which are not well maintained and are open gates for hackers
4. Rollback
5. Recovery
6. Backup
7. Audit trail

Our Machine Learning Detection Approach

Pros and Cons of Our Machine Learning Tools:

Pros: To make a long story short, Machine Learning (ML) would be running in the background of any software or system. ML would be the added intelligence and automation to these systems. ML would perform all the background support plus most of the tedious analysis and/or calculations. The background of the following systems would implement ML:
1. Cybersecurity Detection
2. Reverse Engineering
3. DevOps
4. DataOps
5. Integration
6. Customer Relationship Management (CRM)
7. Management
8. Big Data
9. Data Services - Data Storage - Data Exchange
10. Customer Answering Services
11. Email and Email Security
12. Software Analysis
13. Documentation
14. Training
15. Software Testing
16. Building and Migrating Data Centers
17. ..etc

Our approach to ML is to build Zeros-&-Ones of the target item(s), such as Cybersecurity detection. We would be looking at the hackers' code, their distinct features, hackers' habits, programming tools, thinking, ..etc, and from these we would be building the Zeros-&-Ones. We would be using the Zeros-&-Ones to develop the Bytes, Words and the Scanning Patterns.

Cons: We believe we are ahead of all the existing ML research and all the world's ML tools. Sadly, the ML experts are not willing to admit that their work is nothing more than a guessing game, where they are trying different algorithms. Then, if all fail, they would fix the data to fit the algorithms or their search patterns. Not to mention, they are using vendors' or someone else's tools. Their work is not a concrete science. We have to admit also that we lack resources and finance, but we do not lack experience, energy and innovation. Therefore, we are creating everything from scratch. We have no choice except to outsmart the hackers and the big Cybersecurity vendors.

Our View of Networks' Detection Issues:

Image #1

Looking at Image #1, all the OSI seven layers or TCP/IP five layers can be hacked. Our main concern in this page is the packet's data. All the internet inbound traffic (digital) is composed of streams of bytes, and any stream could be a possible carrier of hackers' code. Therefore, our Machine Learning tools have the responsibility of:
1. Finding the malicious bytes of code - fast
2. The speed of finding it is critical
3. The types
4. The count
5. Actions needed
6. The source
7. Tracking it
8. Audit trailing it
9. Warning all involved parties
10. Lessons learned

These categories would be used as our Zeros-&-Ones.
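The ten responsibility categories above can be sketched as machine-usable values. The following is a minimal illustration only; the enum name, the constant names and the bit-index idea are our assumptions, not a design stated on this page:

```java
// Hypothetical sketch: the ten detection responsibilities above as
// machine-usable categories. Each category's position doubles as a
// Zeros-&-Ones bit index in a detection record.
public enum DetectionCategory {
    FIND_MALICIOUS_BYTES,
    DETECTION_SPEED,
    TYPES,
    COUNT,
    ACTIONS_NEEDED,
    SOURCE,
    TRACKING,
    AUDIT_TRAIL,
    WARNINGS,
    LESSONS_LEARNED;

    // The bit this category would set in a detection record.
    public int bit() {
        return 1 << ordinal();
    }
}
```

Usage: a detection event touching tracking and audit trail could be encoded as `DetectionCategory.TRACKING.bit() | DetectionCategory.AUDIT_TRAIL.bit()`.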
Our Machine Learning Detection Strategies

Looking at Image #1, scanning the inbound byte streams is an overwhelming task for any scanning tool:
• The malicious code can be embedded in any stream and at any position in the stream
• Possible variations of code
• Tactics used
• The attacks can be relentless

Hacking Scenarios:
• ".exe" hackers' code in the middle of files (images, PDF, HTML, text, emails, ..etc)
• Self-extracting zipped code in the middle of files (images, PDF, HTML, text, emails, ..etc)
• Hidden code in DLLs
• Cross-site scripting (XSS) attacks

Our Strategies:
• Build speedy detection with the flexibility to tackle any variation
• Find hackers' code at the start of every gate
• Develop scanning processes which can be implemented in any sequence
• Tracking
• Audit trail
• Evaluate every attack
• Self-evaluating processes
• Build a history of processes (successful or not)
• Build a fast bare-metal structure for fast scanning
• Automation of all the detection processes

How can we build speedy detection with the flexibility to tackle any variation?

Our Machine Learning Detection Components:

Our architect-design has robust components for continuous scanning. In case our scanning encounters a difficult, time-consuming or new case, the scanning would be moved to a Dedicated Virtual Testing Server to handle the issues separately. Our crash rollback is nothing more than moving the production IP address to the Virtual Rollback Server.

Image #2

Looking at Image #2, we have three virtual servers or subsystems. The 2,000-foot view of our ML has the following major subsystems:
ML Components - Image #2:
1. Scanning Buffer(s)
2. OS and Compiler-Interpreter Simulators
3. Libraries of Patterns
4. Dynamic Business Rules
5. Sort Engines
6. Tracking Engines
7. Operation Matrices - Matrices Working Pool
8. Matrices Pool Management
9. Evaluation Engines
10. Decision-Maker Engines
11. Execution Engines
12. Reports-Statistics Engines
13. Storage Utilities
14. Storage - NAS
15. Virtual Testing Server
16. Virtual Rollback Server

How can we build an intelligent system? In a nutshell, the following steps or processes are what define "Human Intelligence", which is the ability to:
1. Learn from experience - (Zeros-&-Ones to develop the Bytes, Words and the Scanning Patterns)
2. Adapt to new situations - (Scanning and tracking)
3. Understand and handle abstract concepts - (Matrices for tracking and processing)
4. Use knowledge to manipulate one's environment - (Engines to run the system)

Our Machine Learning Components' Functionalities: To architect-design an intelligent system which dynamically learns as it runs, we need to architect-design independent components and communication lookup boards of matrices with values. Each component is architected-designed as follows:
Scanning Buffer(s)

The main objective of buffers is to speed up and balance processes and keep input streams from being blocked or slowed down. All the firewalls would be sending their output to our ML Scanning Buffer, where our ML starts processing. Buffers can also be dumped to a backup system.

OS and Compiler-Interpreter Simulators

The task at hand is neither trivial nor a joke; the following are actual values from actual events:
We estimate 2.3 terabytes per second to be equal to 1 billion packets per second. We need to remind our readers that our focus is the networks' side detection.

What are our ML goals?
• Scan every incoming byte
• Narrow our focus to executable statements run by the OS, Compiler-Interpreter, Shell script, ..etc
• Build an Intelligent Automated Virtual Integrate-able Cost-Effective System - see Image #2

Architecting-Designing Our OS-Compiler-Interpreter Simulators

What is our OS and Compiler-Interpreter Simulator? Our OS and Compiler-Interpreter Simulator is a software tool (we will develop) which mimics an OS and Compiler-Interpreter without the overhead of qualifying and running every statement. In short, it is an OS, a compiler or an interpreter which parses the executable statement to learn the names of the built-in functions or the names of the commands used. It would send the names of the built-in functions or the name of the command to a matrix, and no further action would be taken. Let us look at the following C programming statements and a few bytes from the content of a jpg image file as examples:

total_1 += value; // math expression - will not be flagged
pointerToMyArray += 3; // pointer arithmetic - will be flagged
if (remove(fileName) < 0) // deleting or removing files - will be flagged
"ÿØÿà JFIF x x ÿþ LEAD Technologies Inc. V1.01 ÿÛ" // bytes from the content of a jpg image file - will not be flagged

Therefore, the speed of our Simulators' execution would be as fast as the processors' and registers' speed. The matrices and scanned bytes would be loaded into cache and core memory for execution speed and no slow IO calls. Matrices would be copied to the Working Matrices Pool located in core memory for speed. The system engines would get their own copies, and the main Matrices would be saved to NAS for further analysis. Storage Utilities Engines would be performing the storing of the Matrices.

Note: We would team up with OS-Compiler vendors to build our OS and Compiler-Interpreter Simulators.
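The flagging behavior described in the examples above can be sketched in a few lines. This is a minimal illustration, not our actual Simulator; the class name `StatementFlagger`, the crude pointer-arithmetic check and the tiny flagged-function list are all hypothetical examples:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Minimal sketch: flag a statement when it names a risky built-in
// function (e.g. remove, system) or looks like pointer arithmetic.
public class StatementFlagger {

    // Hypothetical sample of flagged C built-in functions.
    private static final Set<String> FLAGGED_FUNCTIONS =
            new HashSet<>(Arrays.asList("remove", "system", "atexit", "exec"));

    public static boolean isFlagged(String statement) {
        // Pointer arithmetic: a crude textual check for "pointer... +=".
        // A real Simulator would use type information, not names.
        if (statement.matches(".*[Pp]ointer\\w*\\s*\\+=.*")) {
            return true;
        }
        // Built-in function calls: look for "name(" for each flagged name.
        for (String name : FLAGGED_FUNCTIONS) {
            if (statement.contains(name + "(")) {
                return true;
            }
        }
        return false;
    }
}
```

With this sketch, `isFlagged("total_1 += value;")` returns false, while `isFlagged("if (remove(fileName) < 0)")` and `isFlagged("pointerToMyArray += 3;")` return true, matching the flagging decisions listed above.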
We would develop the specs documents for our Simulators, and these documents would be provided at a later time. In case we do not find OS-Compiler vendors to work with us, we do have the background and experience to develop our own. The C language would be our development choice.

Image #3

Image #3 has a rough draft of how our OS and Compiler-Interpreter Simulators would scan the inbound network traffic. The firewalls would be sending their output to a buffer, and a Sort Engine would sort the inbound traffic according to the possible inbound stream data type (files, commands, images, ..etc). We will be building a number of Simulators based on the OS, Compiler-Interpreter, Shell script, ..etc. Based on the inbound data type, a different type of Simulator would be scanning the proper inbound type. The one and only job of the Simulator is to find code (malicious or not) and store it in the Simulator's output matrices. Simulators will be running on dedicated bare-metal servers.

The main objective of hackers' code is to be executed in the computer memory by the operating system. Therefore, our approach is:
1. Build OS and Compiler-Interpreter Simulators
2. Simulators should be run by dedicated bare-metal servers with a speedy structure
3. Simulator execution would run on the hardware level - see the bare-metal section
4. Run all the incoming streams as if each stream were a running program
5. The Simulator would be doing the OS's and/or compiler's job
6. The Simulator will track any real code and write it to tracking matrices for independent evaluation
7. The Simulator will ignore any non-executable text, images or other data
8. All the tracking matrices will be evaluated for speedy action by another bare-metal server

Rough Draft of Our OS and Compiler-Interpreter Simulators Specifications

The question is: How do we find an executable statement in the inbound stream of bytes fast, without slowing the system performance?
Our approach is: Every inbound byte is guilty until it is proven innocent.

Therefore, we need to build OS and Compiler-Interpreter Simulators which mimic an OS and Compiler-Interpreter without the overhead of qualifying and running every statement. For example, a jump - "goto" statement as in the following Code Segment #1:

startAgain:
while (connect(mySocket, (struct sockaddr *) &serverAddress, sizeof(serverAddress)) != 0) {
    sleep(60);
    goto startAgain;
}

Code Segment #1

We need to know the actual byte or text presentation as it is written in the following:
• Text
• Object code
• Byte code
• Assembly
• or any executable code which would run in the computer memory

Any of the actual representations is our Zeros-&-Ones criteria in scanning the inbound stream of bytes. Therefore, our Simulators would be looking for every possible presentation of the jump statement "goto". Our ML also needs to distinguish between legitimate and illegitimate code.

Note: Our task is not simple; for example, a Java or C "switch" and also a shell script "case" statement would have a number of jump statements within their structure. Therefore, we need to build a base data object (a Data Access Object in Java) which would be used to create the search patterns our ML would be scanning for. Now our "goto" statement has more than one pattern, based on the programming languages and commands. The next questions would be:
• The number of possible Zeros-&-Ones
• Variations of each Zeros-&-Ones
• ..etc

The Zeros-&-Ones is the start, but we would be building scanning patterns from Zeros-&-Ones, hackers' code, hackers' tendencies, approaches, ..etc. Our focus would be these patterns, plus we would be adding more patterns as we run our ML and learn more. Let us look at the types of Operating Systems (OS) and Compiler-Interpreters of programming languages.

Types of Operating Systems (OS): An OS is a set of programs, and each program is composed of statements.
There are a number of OS types based on the type of services they perform. The following is a set of OS types:
1. Batch OS
2. Distributed OS
3. Multitasking OS
4. Network OS
5. Real-Time OS
6. Mobile OS
7. Hypervisor
8. Routers - a router is in the Embedded system category; it is a computer system with a limited number of tasks
9. Browser - the browser is an executor of code, similar to an OS

Compiler-Interpreter of each of the following languages:
1. Linux, Unix, and Windows scripts
2. C and C++
3. Python
4. Java
5. JavaScript
6. Assembly Instructions
7. PHP

Estimating the Total Number of Zeros-&-Ones Search Patterns

At this point in the Analysis-Design-Architect, we can give a good estimate of all the possible Zeros-&-Ones Search Patterns. Let us examine the following facts.

Statements and Functions: The number of Zeros-&-Ones would be close to the number of any OS's, programming language's or script's built-in functions, macros and commands. For example, our Zeros-&-Ones for the C language would be what the C compiler uses:

abort, abs, acos, asctime, asin, assert ... tan, tanh, time, tmpfile, tmpnam, tolower, toupper, ungetc, va_arg, vprintf, vfprintf, and vsprintf

We estimate these to be a little over 140 function calls, and our prediction for the rest of the programming languages would be close. Our task is doable with far fewer search items than what we believe the security vendors' searching tools are using. What is the total number of the built-in functions and commands which all the listed OS and Compiler-Interpreters use?
Let us look at the following:
• The Linux Kernel uses over 100 commands
• The Unix Kernel uses over 100 commands
• The C Language has about 140 built-in functions

Assumptions: Our estimate for all the OS and Compiler-Interpreters would be:

20 OS and Compiler-Interpreters * 200 built-in functions and commands = 4,000

Let us assume that each of the built-in functions and commands has 10 possible variations (on average). Our estimate of the number of Zeros-&-Ones Search Patterns would be:

4,000 built-in functions and commands * 10 possible variations = 40,000 Search Patterns

The good news is that the Zeros-&-Ones Search Patterns are not in the millions, plus each type of OS and Compiler-Interpreter would work with only a small portion of the 40,000 Search Patterns total.

Detection Zeros-&-Ones Data Structure

All the OS, programming languages and commands have a number of built-in functions or calls, and any programming statement would include these built-in functions or commands. Our detection Zeros-&-Ones are these built-in functions and commands. Our ML scanning or search patterns are the variations of the possible uses of these built-in functions and commands.

Data Structure - Java Data Access Object (DAO): We need to present a rough analysis of the Zeros-&-Ones Data Structure; therefore, we are using a Java Data Access Object (DAO) as our building block for all possible Zeros-&-Ones. We are open to any recommendations and modifications to our Detection Java DAO. Let us look at the following C functions which can be used to damage any network:

int system(const char *string) - performs a system call; used to pass commands to be executed by the operating system
int atexit(void (*func)(void)) - sets a function to be called when the program exits
int remove(const char *filename) - erases a file

Looking at the C functions listed, we need to ask what our Java Data Access Object (DAO) design should be.
Plus we need to ask the following questions about the field types, names, values and their performance:

Does our DetectionZerosOnesDAO object cover all the needed data?
Will our ML be able to process the DetectionZerosOnesDAO object fast?
Are there redundancies?
Does it use an index for tracking?
Does it track with a timestamp?
Can a human make sense out of the field types, names, values and performance?

/**
 *
 * @author sameldin
 */
public class DetectionZerosOnesDAO {
    ... get and set methods
} // end of DetectionZerosOnesDAO

Our OS and Compiler-Interpreter Simulators would create a Matrix with the functions or commands found in the inbound byte stream. Each of the functions or commands must be evaluated by the Evaluation Engines. The Evaluation Engines will create a Matrix for the Decision-Maker Engines to pass-or-fail the function or command. Our Java DetectionZerosOnesDAO object is designed to track every possible piece of data about that function or command. It would have all the needed information for the Execution Engines and Analysis to perform their tasks. Let us look at the following function and the values which would be loaded into the DetectionZerosOnesDAO object.

int remove(const char *filename) - erase a file

public class DetectionZerosOnesDAO {
    ... get and set methods
} // end of DetectionZerosOnesDAO

ID Numbers: Using integer numbers (Long Integer) as IDs helps in speeding up the processes; plus, it gives a large range of numbers to choose from. Math expressions such as "div", "mod" and integer bit shifts can help in the selections. Therefore, all IDs used have an ID for the ID number itself. For example, for the detection ID, the highest digits are 99. We are only suggesting an ID system, but we are open to other ideas.

detection_ID = 99345678; // Detection ID number - starts with 99

Pros and Cons of Zeros-&-Ones: The architect-design must address the fact that we do not want the number of Zeros-&-Ones to get out of hand, where scanning would take forever to do the detection.

Cons: Let us look at the following two examples.

Example #1 - My Laptop: We installed new antivirus software, and the laptop's performance and internet access took a dive; every piece of software (local to my machine or web) took a few seconds if not minutes to start. We were tempted to remove such antivirus due to the time delayed and wasted.

Example #2 - PHP: We do not have any experience with PHP, but we got a copy of its built-in functions. Sadly, the number of these built-in functions is huge, and our PHP Zeros-&-Ones would not be practical. We are seeking help from PHP experts to give us the language's built-in functions, which we believe would be similar to C, C++, Java and Assembly. We would be building our Zeros-&-Ones from the PHP built-in functions.

Pros: The good news is that the Zeros-&-Ones Search Patterns are not in the millions, plus each type of OS and Compiler-Interpreter would work with only a small portion of the 40,000 Search Patterns total.

Sample Analysis of Zeros-&-Ones Matrices: We are looking for not only hackers' code, but also patterns, ways of thinking, ..etc. The following is our attempt to capture all possible Zeros-&-Ones.
At this point in the architect-design stage, we are running into the "Learning Curve"; with time and brainstorming with other experts, the goal is reachable and doable.
1. SolarWinds
2. Hackers' code
3. Programming languages
4. Scripts

SolarWinds Hacking Lessons - Hidden in Plain Sight

What is the SolarWinds Hack (Orion)? Briefly, SolarWinds is a major software company which provides system management tools for network and infrastructure monitoring, and other technical services, to hundreds of thousands of organizations around the world. Among the company's products is an IT performance monitoring system called Orion. In early 2020, hackers secretly broke into Texas-based SolarWinds' systems and added malicious code into the company's software system. More than 30,000 public and private organizations, including local, state and federal agencies, use the Orion network management system to manage their IT resources. As a result, the hack compromised the data, networks and systems of thousands when SolarWinds inadvertently delivered the backdoor malware as an update to the Orion software.

SolarWinds Hacking Zeros-&-Ones Breakdown: The following is more of a rough breakdown of our SolarWinds search, and we do need further studies.

Functions
1. After an initial dormant period of up to two weeks, it retrieved and executed commands, called "Jobs"
2. The hackers added their code in such a way that it has the same style as the existing code, so no one would notice a difference
3. Used the same names and structure
4. Hid in plain sight
5. List of function calls
6. Threads
7. Tree processes
8. Initialization

Hashing
9. Hash functions for data
10. Hash functions for method-name calling

IP Addresses and DNS
11. IP addresses
12. Hashing IP addresses and function names
13. IP addresses located in the victim's country
14. The attackers' choice of IP addresses was also optimized to evade detection
15. The attackers primarily used only IP addresses originating from the same country as the victim
16. They were leveraging Virtual Private Servers
17. The DNS response will return a CNAME record that points to a Command and Control (C2) domain
18. The malware masquerades its network traffic as the Orion Improvement Program (OIP) protocol
19. The C2 traffic to the malicious domains is designed to mimic normal SolarWinds API communications
20. After a dormant period of up to two weeks, the malware will attempt to resolve a subdomain of avsvmcloud-com

Date and Time, Stop and Sleep
21. Tracing calls
22. Date and timestamp
23. Sleep functions for days
24. Thread sleep functions

Search
25. Search functions

OS Run
26. OS calls
27. Executed files
28. Rebooted the machine
29. Stopped services from running
30. Disabled system services
31. Received instructions from outside sites
32. Interrupted or stopped services
33. Ran in memory only, which allowed the adversary to blend into the environment, avoid suspicion, and evade detection
34. Executed their payload and then restored the legitimate original file
35. They routinely removed their tools, including removing backdoors once legitimate remote access was achieved
36. They replaced a legitimate utility with their own
37. They similarly manipulated scheduled tasks by updating an existing legitimate task to execute their tools
38. Returned the scheduled task to its original configuration

Zipping-Unzipping
39. Zipping and unzipping function calls
40. Compression and decompression

File Transactions
41. Had the ability to transfer files
42. Temporary file replacement and temporary task modification
43. The attackers used a temporary file replacement technique to remotely execute utilities
44. The trojanized update file is a standard Windows Installer Patch file that includes compressed resources
45. Profiled the system
46. Associated with the update, including the trojanized SolarWinds.Orion.Core.BusinessLayer.dll component
47. Stored reconnaissance results within legitimate plugin configuration files, allowing it to blend in

Updates and Mimic
48. The backdoor used multiple block lists to identify forensic and anti-virus tools running as processes, services, and drivers
49. Once the update is installed, the malicious DLL will be loaded by the legitimate SolarWinds.BusinessLayerHost.exe or SolarWinds.BusinessLayerHostx64.exe (depending on system configuration)
50. They send new login names and passwords to their sites to gain access

How can a Cybersecurity Architect use such a case in architecting-designing any detection architecture? The SolarWinds case is a great learning lesson which our Zeros-&-Ones matrices can use to add more Zeros-&-Ones, plus a reinforcement of existing Zeros-&-Ones. Therefore, we need to do the following:
• Get, if we can, copies of the actual code
• Create similar code
• Test the code to see if we can learn more and add more cases

Hackers' Code

In this section, our attempt is to give a picture of how easily hackers can add their malicious code in plain sight, where it would pass the best of coding eyes. We are presenting two cases:
• Java code with a SQL call - adding SQL injection and a system call
• C function calls and tracing the execution (in-memory) stacks for the function calls

Reverse Engineering enables hackers to turn executable code into source code. Hackers can apply Reverse Engineering to vendors' code such as DLLs and get a copy of the source. They would be able to insert their malicious code and compile the source back into a DLL to be added to clients' libraries, as in the case of the SolarWinds hacking. They can also modify the DLL files' dates, permissions and security information to be exactly as the vendors' DLLs would be.

Case #1: Java code with SQL call - adding SQL injection and a system call

The following table has an example of Java code with proper syntax, including a SQL call.
We also present, for the sake of simplicity, what hackers had added to the code:

43. index++; try{Runtime.getRuntime().exec("...");} catch (Exception e){}
...
69. localPreparedStatement = localConnection.prepareStatement(qryString);

Line #43 is a simple increment, and on the same line a system call is made to the operating system, which can be easily missed. As for line #69, the hack appended SQL injection to the qryString; hackers can hard-code the qryString with SQL injection.
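How a Java Simulator might flag these two lines can be sketched with simple heuristics. The class name and the two checks below are illustrative assumptions on our part, not the actual Simulator design:

```java
// Minimal sketch of flagging the two hacks shown above:
// a hidden runtime system call, and SQL built from a variable
// (a SQL-injection risk) rather than a string literal.
public class JavaCodeLineChecker {

    // Flag a runtime system call hidden on an otherwise innocent line.
    public static boolean hidesSystemCall(String line) {
        return line.contains("Runtime.getRuntime().exec(");
    }

    // Crude heuristic: prepareStatement called on anything other than
    // a string literal may carry injected SQL.
    public static boolean riskySqlBuild(String line) {
        return line.contains("prepareStatement(")
                && !line.contains("prepareStatement(\"");
    }
}
```

With this sketch, line #43 above would be caught by `hidesSystemCall` and line #69 by `riskySqlBuild`, while a literal `prepareStatement("SELECT ...")` would pass.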
Code Found Matrix Specs: The goal of our OS and Compiler-Interpreter Simulators, and in this case the Java Simulator, is to be able to pick up any possible format of hacking code. Our Simulator would be able to scan any code in:
• Text
• Object code
• Byte code
• Assembly
• or any executable code which would run in the computer memory

Therefore, in the second table, the Code Found Matrix, our Java Simulator should be able to recognize both possible pieces of hackers' code and give a warning for the Evaluation Engine to do its job. The second table, the Code Found Matrix, is a rough draft of what our Simulator would be able to scan. The actual Code Found Matrix would be populated with indexes and values. In short, working with numbers such as indexes would speed up any evaluation.
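One possible shape of a Code Found Matrix row, using fast integer indexes as described above, can be sketched as follows. The field names and the representation codes are our assumptions; the page only states that the matrix holds indexes and values:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a Code Found Matrix: each row records a
// matched Zeros-&-Ones pattern as numbers, so the Evaluation Engine
// works on fast integers instead of strings.
public class CodeFoundMatrix {

    // Representation of the code found, per the list above.
    public static final int TEXT = 0;
    public static final int OBJECT_CODE = 1;
    public static final int BYTE_CODE = 2;
    public static final int ASSEMBLY = 3;

    public static class Row {
        final long detectionId;   // ID of the matched Zeros-&-Ones pattern
        final int representation; // TEXT, OBJECT_CODE, BYTE_CODE, ASSEMBLY
        final long streamOffset;  // where in the inbound stream it was found

        Row(long detectionId, int representation, long streamOffset) {
            this.detectionId = detectionId;
            this.representation = representation;
            this.streamOffset = streamOffset;
        }
    }

    private final List<Row> rows = new ArrayList<>();

    public void record(long detectionId, int representation, long streamOffset) {
        rows.add(new Row(detectionId, representation, streamOffset));
    }

    public int size() {
        return rows.size();
    }
}
```

A Simulator finding the text form of a flagged call at stream offset 1024 would call, for example, `record(99345678L, CodeFoundMatrix.TEXT, 1024L)`.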
Case #2: C Function Calls and Tracing the Execution (In-Memory) Stacks for the Function Calls

C as a programming language gives C programmers a lot of unstructured or untraditional ways to write code. For example, in C the parameters of a function can include:
1. A pointer to a function
2. An entire function passed in through the parameter
3. A string constant which holds an entire function
4. What else? We are sure that there could be other strange options

These options are a gateway for hackers to hide their malicious code. A simple-looking function can have hidden code which may not have the best intents.

#include <stdlib.h> /* for atexit */

// function definitions
void targetFunction() {...}
int function_1() {...}
void function_2() {...}
void function_3() {...}
void functionDefinedAsParameter() {...}

void startingFunction(void (*pointer2Function)(), void (*functionPassedAsParameter)())
{
    int intValue;
    intValue = function_1();
    function_2();
    function_3();
    pointer2Function = &targetFunction;  // the address of targetFunction
    pointer2Function();                  // call it through the pointer
    functionPassedAsParameter();         // call the function passed in the parameters
}

int main()
{
    atexit(function_2);                  // register function_2 to run at program exit
    startingFunction(&targetFunction, &functionDefinedAsParameter);
    return 0;
}

At run time, the function creation sequence on the execution stack would be:
1. main() - starts
2. atexit() - starts and ends (it only registers function_2 to run at program exit)
3. startingFunction() - starts
4. function_1() - starts and ends
5. function_2() - starts and ends
6. function_3() - starts and ends
7. targetFunction() - starts and ends (called through pointer2Function)
8. functionDefinedAsParameter() - starts and ends
9. main() - ends, and then function_2() runs one more time, as registered by atexit()

The folding or termination of each function is the reverse of the creation sequence. Some functions start, end and are totally removed from the execution stack, such as function_1, function_2 and function_3. function_2 will run twice: first within startingFunction(), and a second time at program exit, as registered by atexit().

Why are we presenting Case #2?
1. First, any text scan would get lost in such code
2. Second, a hacker with such knowledge can create endless scenarios of hidden parts of hacking
3. This would be very tough to catch
4. Only the compiler or debugger would be able to scan for all these hidden parts
5. This case is part of the Compiler Simulator specs that we would be addressing

Programming Languages

C and C++, Python, Java, JavaScript, Assembly, PHP or any programming language has a limited number of built-in functions or main functions. The total number of all the built-in functions is not that big, which can make our Zeros-&-Ones Matrices more practical. A lot of these languages have similar functions, where we would be creating Common Zeros-&-Ones Matrices and optimizing our Simulators.

Scripts

OS scripts, such as Linux, Unix, and Windows scripts, present a different challenge, due to the fact that system administrators would be creating scripts on the run to build their infrastructure systems. Most of these scripts are created by cut-and-paste from other scripts. These scripts may contain C code, other programming languages or calls to applications or Kernel utilities. Our Scripts Simulators can be a lot more challenging. Sadly, we are not experts on scripts; therefore, we would be teaming up with experts to build our Scripts Zeros-&-Ones.

Libraries of Patterns

Introduction: We had presented Zeros-&-Ones in the SolarWinds, Hackers' Code, Programming Languages and Scripts sections with the goal of finding some distinct features which can be used to build the detection Zeros-&-Ones for virus, worm or Trojan software. We need to have their distinct features and develop the Bits, Bytes, Words and Patterns. We would build dynamic libraries of these virus, worm and Trojan software which our Intelligent Machine Learning Tool would use. These Matrices can have any number of possible virus, worm and Trojan entries, but each possibility has a weight or a score of its accuracy.
The history of finding out which type of virus, worm and Trojan software is present would also aid in building the accuracy score or the weight, and the possibilities of it being used or not used. These Matrices would grow and become diverse to help build possibilities with high scores and build intelligence. These Matrices would teach our Intelligent Machine Learning Tool the new or possible occurrences of virus, worm and Trojan software. Matrix crossing would also help create new high-score possibilities.

Pattern Building Matrices - Data Structure: Tables or Matrices are a good tool for presenting possibilities and visions, or for viewing values, patterns, errors, choices, etc. We recommend the following features, plus other features based on the type of business and mapping:
1. We recommend two-dimensional Matrices or arrays.
2. They are easily built and used.
3. Linked Lists should be used to dynamically build these Matrices with a size limit.
4. No three-dimensional arrays; they are difficult to envision and more complex to work with.
5. The size and the score of these Matrices should be developed to make search and deduction simple.
6. These Matrices are used to map the values, time and other critical elements.
7. Cross-referencing these Matrices would help in figuring out values, patterns, tendencies, errors, etc.

These Matrices would be used to decide the bits and build the bytes, words and patterns.

Build Precision Scale

Abstract thinking is based on frequencies and statistics. Precisions are based on chances and frequencies. For example, if out of 100 men aged 40 years and older, 90 men would lose 50% of their hair, then a 50-year-old man has a 90% chance of losing 50% of his hair. Therefore, based on the business and the conditions, we would be able to create a score and weight for any value or state. There should be precision Matrices which can be cross-referenced to give relative accuracies.
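The precision-scale idea above, deriving a chance from observed frequencies and mapping it onto a small range, can be sketched in a few lines. The class and method names are illustrative assumptions, and the 0-9 mapping follows the field-value pointers given later on this page:

```java
// Minimal sketch of the precision scale: derive a chance from
// observed frequencies, then map it onto a compact 0-9 weight.
public class PrecisionScale {

    // frequency: how many out of `total` showed the feature.
    // Returns the chance as a whole percentage, e.g. 90 of 100 -> 90.
    public static int chancePercent(int frequency, int total) {
        if (total <= 0) {
            return 0; // no observations, no precision
        }
        return (100 * frequency) / total;
    }

    // Map a percentage onto the 0-9 range preferred over raw
    // percentages for human comprehension.
    public static int toWeight(int percent) {
        return Math.min(9, percent / 10);
    }
}
```

Using the hair example above: `chancePercent(90, 100)` gives 90, and `toWeight(90)` gives 9, the "90% chance" expressed on a 0-9 scale.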
Fine-Tune Patterns Using Dynamic Business Rules: Dynamic Business Rules help give guidance on decision-making (if-else conditions). These rules can be added at runtime without the need to change code or processes. Libraries of Patterns (Based on History and Lessons Learned) These libraries are guidance for critical thinking and fast decision-making. They can also grow to make the system more intelligent and efficient, on the principle of "been there and done that". Data Preparation This topic is too big to cover here, but we have a number of supporting documents which deal with data and networks. Pattern Discovery All the previous steps set the stage for our Intelligent Machine Learning Tool to do its job and build reports of the target results. Optimization Tracking, logs, audit trail, errors, exceptions, performance, issues, and other system elements should be tracked and analyzed to optimize the system and look for ways to make it more efficient and intelligent. Automation as a feature is ideal for system performance. Build Reports Reports must be customized to answer needs, requests, audience, etc. Operation Matrices - The Spine of Our Machine Learning System Our ML consists mainly of preparation processes-engines (Search Pattern Builder or Engine), scanning Simulators, and work engines (Sort, Tracking, Evaluation, Decision-Makers, Execution, Store-Backup, and Lessons Learned). These components produce and consume data stored in Matrices. Our Matrices are lookup boards of information. We would be creating Matrices from other Matrices. In the case of a delay in populating a given Matrix with the latest information, there would be default values based on previous experiences, statistics, and weight values. Revisit - How can we build an intelligent system? In a nutshell, the following steps or processes are what define "Human Intelligence", which is the ability to: 1. Learn from experience 2. Adapt to new situations 3.
Understand and handle abstract concepts 4. Use knowledge to manipulate one's environment Let us look at our architecture components for each "Human Intelligence" step or process. #1 - Learn from Experience: 1. Preparation Processes 2. Search Pattern Builder or Engines 3. Sort Engines 4. Tracking Engine 5. Evaluation Engines #2 - Adapt to New Situations: 1. Decision-Makers Engines #3 - Understand and Handle Abstract Concepts: 1. Decision-Makers Engines 2. Execution Engines #4 - Use Knowledge to Manipulate One's Environment: 1. Execution Engines 2. Store-Backup Utilities 3. Lessons Learned Engines 4. Reports Engines We would be developing Matrices for each of the Human Intelligence steps. Matrices Fields and Values: Our Matrices will be used by both humans and machines, therefore we need to find common field names and values. Tracing and debugging would be done mainly by humans. The key is not to slow the processing speed and not to confuse the administrators, analysts, staff, etc. The following are pointers for choosing field names and values: 1. Processing speed 2. Human comprehension 3. Ranges should be 0-9, not in the 1,000s 4. Less use of percentages and more use of meaningful words 5. Enumerations which humans can relate to - good, bad, danger, etc. 6. Default values 7. Statistical values humans can comprehend 8. Tedious calculations are done by machine 9. Accuracies and percentage of accuracy Field Possible Values: Based on the actions required, different Matrices would have different fields, names, and values. The following list is a start, and it would grow as we run and learn more: 1. Range 0-9 2. Enumeration of Range - Good, Bad, ... Normal, Al 3. Messages 4. IDs 5. Weight 6. Index 7. Flags 8. Matrices ID 9. IP addresses 10. Hash index 11. Contact information 12. Process indexes 13. Alarm indexes 14. Frequencies Building Matrices Templates: Templates are great tools in analysis, development, automation, testing, and training; we would be brainstorming template structure and development. Matrix List for the "Human Intelligence" steps or processes: #1 - Learn from Experience: 1. Zeros-&-Ones 2. Patterns 3. History of hackers and attacks, tendencies, sources of attack, and hackers' code 4. Search Patterns 5. Simulator output of the scanned code 6. Evaluating Simulator Output and Zeros-&-Ones #2 - Adapt to New Situations: 1. Tracking Source 2. Tracking Routing 3. Audit Trail 4. Cross References #3 - Understand and handle abstract concepts: 1. Decision-Makers 2. Execution Steps #4 - Use knowledge to manipulate one's environment: 1. Setting Alarms 2. Vendor contact information 3. Client contact information 4. Lessons Learned 5. Report indices Matrices Pool Management: Analysis, Evaluation, and Storage: To give an analogy of how important our Matrices are, we would state that our Matrices Pool is the spine of our intelligent system. The Matrices are the connections between all the processing, learning, tracking, analysis, updating, storage, audit trail, and miscellaneous tasks. Matrices Pool Management is critical to their performance. Matrices Pool Management would be performing and evaluating the following: 1. Assigning IDs 2. Performance 3. Storage 4. Analysis 5. Updating 6. Validation 7. Bottlenecks 8. Redundancies 9. Overkill number of Matrices Engines: What is an Engine? What is a Process? Based on an Information Technology background, an engine may have different meanings. Engine Definition: • An Engine is a running software component (application, class, OS call) which performs one task and only one task. • A Process is a running software component which uses one or more Engines. A Process may perform one or more tasks.
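To make the field pointers above concrete, here is a hedged sketch of one Matrix row layout; the field names and the 0-9 enumeration labels are our own illustrative choices, not a fixed schema:

```python
# Sketch of a common field layout readable by both human and machine: a 0-9
# range with human enumerations instead of raw percentages, plus machine-side
# fields (weight, flags, IDs). All names here are illustrative assumptions.

RANGE_LABELS = {  # enumeration a human can relate to
    0: "good", 3: "normal", 6: "bad", 9: "danger",
}

def label_for(score: int) -> str:
    """Map a 0-9 score to the nearest human-readable label at or below it."""
    if not 0 <= score <= 9:
        raise ValueError("scores are kept in the 0-9 range, not the 1,000s")
    return RANGE_LABELS[max(k for k in RANGE_LABELS if k <= score)]

row = {
    "matrix_id": "ZO-PATTERNS-01",   # Matrices ID (hypothetical)
    "score": 7,                      # Range 0-9
    "label": label_for(7),           # human-comprehensible enumeration
    "weight": 0.85,                  # machine-side accuracy weight
    "flags": ["alarm"],              # flags / alarm indexes
    "ip": "192.0.2.10",              # IP address field (documentation range)
}
print(row["label"])   # 'bad'
```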
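The Engine and Process definitions above can be sketched as follows; this is a simplified illustration with our own function names, showing why an Engine can be swapped without touching the rest of the system:

```python
# An Engine does exactly one task; a Process chains one or more Engines in a
# required sequence. Keeping Engines as interchangeable callables gives the
# loose coupling described in the text.

from typing import Callable, List

Engine = Callable[[bytes], bytes]   # one task: bytes in, bytes out

def sort_engine(data: bytes) -> bytes:
    """One task only: sort the byte values."""
    return bytes(sorted(data))

def dedup_engine(data: bytes) -> bytes:
    """One task only: drop repeated byte values, keeping first-seen order."""
    return bytes(dict.fromkeys(data))

class Process:
    """Runs Engines in sequence; swapping one Engine changes no other code."""
    def __init__(self, engines: List[Engine]):
        self.engines = engines

    def run(self, data: bytes) -> bytes:
        for engine in self.engines:
            data = engine(data)
        return data

scan_prep = Process([sort_engine, dedup_engine])
print(scan_prep.run(b"cabbac"))   # b'abc'
```

A tree of such Processes, each wrapping its own Engine list, gives the multi-task execution sequence the text describes.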
• Engines are used for building loosely coupled systems and transparency • Updating one Engine may not require updating any other code in the system • A tree of running Engines can be developed to perform multiple tasks in a required sequence • Engines give options and diversity Table of Engines: Possible: more than one Engine for the same functionality. At this point in the analysis, design, and architecting stages, we may need to modify a lot of items, including Engines. Therefore, there could be more than one Engine performing the same task, based on different use cases or scenarios. For example, Alert may require more than one type of alert.
Engines Execution Priorities: The Engine execution queues would be set with priorities; at this point in the analysis-design-architecture stage we would not be able to give an accurate answer. We do need to brainstorm such criteria. Storage: Our ML is a dynamic system which generates a lot of data, and a lot of data translates into Big Data and storage issues. First, we are getting away from databases and replacing them with a filing system. Therefore, we would be using text format and XML file format (XML is still text) to structure our storage. To speed up and reduce our data parsing, Matrices again play a big role in structuring our data and processing it. We recommend the reader look at our Database Replacement Using XML page: Database Replacement Using XML - "http://sameldin.com/OOCDProjectSite/VirtualDataServicesPage.html" Network Attached Storage (NAS): Advantages of using NAS: • Inexpensive hardware • Can be treated as an object or class properties • NAS can be an independent node with its own IP address • Programmable • Easy to install and use • Easy to move around • Easy to test • Reusable • Fast file transfers (speed depends on interface) • Plug and Play (no complicated setup) • Uses the native file system of the Operating System • Multiple users can access the drive at the same time • Files can be shared among users and devices • Remote access via Ethernet is possible • Web-enabled applications provide additional functionality independent of the computer • Additional storage can be added (depends on NAS function) • Can be used as a Database Visualizer Dynamic Business Rules How do you define business rules? From Wikipedia, the free encyclopedia: A business rule defines or constrains some aspect of business and always resolves to either true or false. Business rules are intended to assert business structure or to control or influence the behavior of the business. Note: Our Dynamic Business Rules are not a new creation, and there are a lot of applications performing similar tasks. We are presenting our Business Rule components, structure, and approaches. Our Definition of Dynamic Business Rules: To define Dynamic Business Rules, we need to ask the following question: How can anyone add the following to running software without stopping it or adding a single line of code? 1. One or more new products, or remove any number of products 2. Changes to the User Interface 3. New Exceptions 4. New Errors Lists 5. New Vocabularies 6. New Dictionaries 7. New Languages 8. New Messages 9. New Tokens - used by analysis 10. New Validations 11. New Business Rules 12. New Decisions 13. New Weights - analysis 14. New client qualifications Database tables can help to a certain extent, but our Business Rules build Java objects such as Exceptions, Validations, or new Decision-Makers. In a nutshell, our Dynamic Business Rules have the following components and data structure: 1. Rule Manager 2. Business Rules Adapter - loading the new rules 3. Business Rule Factory 4. Rules Services or Engines 5. Base Objects (Java Data Access Object - DAO) 6. Linked List of Objects which can dynamically change in size 7. Templates 8. Tokenizer 9. Matrices 10. Input text file - for example, a token list or dictionary We structure our software to be able to add new objects, new conditions, and a different sequence of the existing processes within the running program. Again, Matrices are the center of all actions and of what needs to be done. For example, to add a GUI interface for a new HTML page, a matrix would be reloaded by reading a template text file. The matrix's new addition would populate a new DAO and pass it to the HTML Factory, which builds the new HTML page and passes it to the clients' browsers.
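The runtime-loading idea behind our Dynamic Business Rules can be sketched as follows. The authors' implementation builds Java objects; this is our own Python illustration with hypothetical names (RuleEngine, the "field operator value action" line format), showing rules arriving as text and becoming live decision-makers without stopping the program:

```python
# Rules are plain text lines -- "field operator value action" -- loaded at
# runtime. Each line becomes a live Rule object; no code change is needed.

import operator

OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}

class Rule:
    def __init__(self, field, op, value, action):
        self.field, self.op = field, OPS[op]
        self.value, self.action = float(value), action

    def applies(self, record):
        return self.op(record.get(self.field, 0), self.value)

class RuleEngine:
    def __init__(self):
        self.rules = []

    def load(self, text):
        """Called at runtime whenever a new rules file appears."""
        for line in text.strip().splitlines():
            self.rules.append(Rule(*line.split()))

    def decide(self, record):
        """A business rule always resolves to true or false per record."""
        return [r.action for r in self.rules if r.applies(record)]

engine = RuleEngine()
engine.load("threat_score > 0.8 quarantine\npacket_rate > 64584 alarm")
print(engine.decide({"threat_score": 0.9, "packet_rate": 100}))   # ['quarantine']
```

Calling `load` again with a new text file adds new decisions to the running program, which is the behavior the questions above ask for.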
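The Storage section's text/XML filing approach can be sketched as follows; the element and attribute names are our own illustration, not the authors' schema:

```python
# A Matrix written out as a plain-text XML file and parsed back, with no
# database involved -- XML is still text, so it fits the filing-system model.

import xml.etree.ElementTree as ET

def matrix_to_xml(matrix_id, rows):
    """Serialize a list of pattern/weight rows to an XML string."""
    root = ET.Element("matrix", id=matrix_id)
    for row in rows:
        ET.SubElement(root, "row", pattern=row["pattern"], weight=str(row["weight"]))
    return ET.tostring(root, encoding="unicode")

def xml_to_matrix(text):
    """Parse the XML string back into the same row structure."""
    root = ET.fromstring(text)
    return [{"pattern": r.get("pattern"), "weight": float(r.get("weight"))}
            for r in root.findall("row")]

rows = [{"pattern": "01101110", "weight": 0.9}]
text = matrix_to_xml("ZO-01", rows)
print(xml_to_matrix(text) == rows)   # True -- round trip through plain text
```

Each XML file can then live on the NAS as an ordinary file, shared by multiple users and Engines at once.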
Bare-Metal Server Features Our ML's main goal is scanning for possible malicious code and removing it from the network. Scanning has two parts: first, the inbound bytes, and second, the content of the network. Scanning speed is very critical for our ML to be of any value. The focus of this section is our recommendation for what the bare-metal server(s) structure should be. The goal is to exclusively execute our ML software on the bare metal, without sharing the bare metal with other processes or programs except the Operating System. Image #4 Image #5 Images #4 and #5 are rough pictures of what we believe the Bare-Metal Structure would have as its internal components. Our goal is speed, and we are open to recommendations, corrections, or suggestions. Bare-Metal Server, Scanning Network's Inbound Traffic: Image #4 presents how the network would scan all the firewalls' traffic. Performance Speed: We chose a bare-metal server with 8 or more processors and all the core memory the server can have. Each processor would have its own cache, registers, and its own virtual server. Each processor would run independently. We hope that our recommendation is not dated; there could be more advanced bare-metal servers than what we are presenting. What factors would be considered in the bare-metal structure and its performance? CPU, cores, clock speed, registers, cache memory, core memory, bus, chip manufacturer support, software support, VM, labor, time, testing, and cost. Note: A physical or bare-metal server's hardware is quite different from that of other types of computers. A physical server would have multi-core processors, an IO Controller with multiple hard drives, Error Correction Code (ECC) memory, multiple power supplies, threading, parallel computing, redundancies, etc. The reason for all these additions is the fact that servers run 24x7, and data loss, damage, or slow performance would translate to losing business, customers, etc.
The goal of the ML bare-metal server is to handle the throughput of all the firewalls. Let us Do the Math: Max number of packets one firewall would handle = 64,584 packets per second. With high-performance software, a single modern server processes over 1 million HTTP requests per second. The average packet size is about 1,500 bytes. A 32-bit CPU can process 34,359,738,368 bits per second = 4,294,967,296 bytes per second. Max number of bytes one, 10, 100, or 1,000 firewalls would handle: One firewall = 64,584 X 1,500 = 96,876,000, about 100 million bytes per second 10 firewalls = about 1 billion bytes per second 100 firewalls = about 10 billion bytes per second 1,000 firewalls = about 100 billion bytes per second One CPU = about 4.3 billion bytes per second How many bytes per second would an 8-processor server process? In terms of hardware, what we are asking for is the following: 1. 8 or more processors 2. High or fast clock speed 3. 64-bit registers 4. Maximum number of registers 5. The biggest cache the machine can have 6. The biggest RAM the machine can have 7. 128-bit bus size Priority of Execution: Which part of our ML would be granted higher priority? It is too early to answer at this point. No Need for Error Correction Code (ECC) Memory: Our ML scanning would scan the inbound bytes; if they pass, the inbound bytes would be passed to the network to handle. If they fail, our ML would store them on the NAS. Therefore, Error Correction Code (ECC) memory does not have any value to our ML processes.
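The arithmetic above can be checked in a few lines; the firewall packet rate, average packet size, and the CPU bits-per-second figure are the page's own numbers:

```python
# Sanity check of the throughput math, using the figures stated above.

PACKETS_PER_SEC = 64_584           # one firewall's maximum packet rate
PACKET_BYTES = 1_500               # average packet size
CPU_BITS_PER_SEC = 34_359_738_368  # the stated 32-bit CPU figure

one_firewall = PACKETS_PER_SEC * PACKET_BYTES   # bytes per second per firewall
cpu_bytes = CPU_BITS_PER_SEC // 8               # bytes per second per CPU

print(one_firewall)           # 96876000  (~100 million B/s)
print(one_firewall * 1_000)   # 96876000000 (~100 billion B/s for 1,000 firewalls)
print(cpu_bytes)              # 4294967296 (~4.3 billion B/s)
# One such CPU covers many firewalls, but nowhere near 1,000 of them:
print(cpu_bytes // one_firewall)   # 44 firewalls per CPU, before any overhead
```

This is why the multi-processor bare-metal configuration matters: the per-CPU budget of roughly 4.3 billion bytes per second must be multiplied across processors to approach the 1,000-firewall load.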