Sam's Machine Learning Analysis, Data Structure-Architect©
Table of Contents:
• Introduction
• Our Machine Learning Detection Approach
• Our Machine Learning Detection Strategies
• Scanning Buffer(s)
• OS and Compiler-Interpreter Simulators
• Estimating the Total Number of Zeros-&-Ones Search Patterns
• Detection Zeros-&-Ones Data Structure
• Sample Analysis of Zeros-&-Ones Matrices
• SolarWinds Hacking Lessons - Hidden in Plain Sight
• Hackers' Code
• Programming Languages
• Scripts
• Libraries of Patterns
• Operation Matrices - The Spine of Our Machine Learning System
• Engines
• Dynamic Business Rules
• Bare-Metal Server Features

Introduction

We recommend that readers check our Machine Learning page. Not long ago, we posted a page on the SamEldin.com website on how to build business processes as Zeros-&-Ones to start intelligent automation processes for business. We called it Dynamic Business Rules:

http://sameldin.com/SamQualificationPages/MachineLearning.html
http://sameldin.com/DynamicBusinessRules/index.html

The main goal of this page is to analyze-design-architect our Machine Learning (ML) structure (software and hardware). Our ML would address Cybersecurity detection issues. Our focus in this page is the networks' side. The goal is securing networks from external and internal hacking.

Internet Protocol

There are a number of internet and/or network communication protocols which networks and their users would use to perform the requested services:
Scanning Issues:
1. There is an almost endless number of files which need to be scanned
2. The file types and sizes are numerous
3. Any file can have malicious code
4. Scanning speed and system performance must be addressed
5. The ever-changing hackers' tactics and tools
6. Lack of education of networks' users on how to help in the networks' detection and prevention
7. Most of the detection and prevention software lacks intelligence-automation
8. Dependencies on Cybersecurity vendors to perform detection and prevention
9. Hackers use Reverse Engineering, AI and ML to add to their arsenal of attacks
10. Lack of experience of Cybersecurity staff and processes
11. Lack of experience of Cybersecurity management and processes
12. Most detection testing is done using free, open-source and vendors' tools, which adds more risks
13. Management is not willing to take the risks of building their own security tools and software
14. Management would rather pass the security risks to security vendors
15. Detection is not a science, but a guessing game

Other issues:
1. Internal hacking
2. Human factor
3. Old networks, systems and old software which are not well maintained and are open gates for hackers
4. Rollback
5. Recovery
6. Backup
7. Audit trail

Our Machine Learning Detection Approach

Pros and Cons of Our Machine Learning Tools:

Pros: To make a long story short, Machine Learning (ML) would be running in the background of any software or system. ML would be the added intelligence and automation to these systems. ML would perform all the background support plus most of the tedious analysis and/or calculations. The background of the following systems would implement ML:
1. Cybersecurity Detection
2. Reverse Engineering
3. DevOps
4. DataOps
5. Integration
6. Customer Relationship Management (CRM)
7. Management
8. Big Data
9. Data Services - Data Storage - Data Exchange
10. Customer Answering Services
11. Email and Email Security
12. Software Analysis
13. Documentation
14. Training
15. Software Testing
16. Building and Migrating Data Centers
17. ..etc

Our approach to ML is to build Zeros-&-Ones of the target item(s), such as Cybersecurity detection. We would be looking at the hackers' code, their distinct features, hackers' habits, programming tools, thinking, ..etc, and from these we would be building the Zeros-&-Ones. We would be using the Zeros-&-Ones to develop the Bytes, Words and the Scanning Patterns.

Cons: We believe we are ahead of all the existing ML research and all the world's ML tools. Sadly, the ML experts are not willing to admit that their work is nothing more than a guessing game, where they are trying different algorithms. Then, if all fail, they would fix the data to fit the algorithms or their search patterns. Not to mention, they are using vendors' or someone else's tools. Their work is not a concrete science. We have to admit also that we lack resources and finance, but we do not lack experience, energy and innovation. Therefore, we are creating everything from scratch. We have no choice except to outsmart the hackers and the big Cybersecurity vendors.

Our View of Networks' Detection Issues:

Image #1

Looking at Image #1, all the OSI seven layers or TCP/IP five layers can be hacked. Our main concern in this page is the packet's data. All the internet inbound traffic (digital) is composed of streams of bytes, and any stream could be a possible carrier of hackers' code. Therefore, our Machine Learning tools have the responsibility of:
1. Finding the malicious bytes of code - fast
2. The speed of finding it is critical
3. The types
4. The count
5. Actions needed
6. The source
7. Tracking it
8. Audit trailing it
9. Warning all involved parties
10. Lessons learned

These categories would be used as our Zeros-&-Ones.
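The ten responsibility categories above can be sketched as machine-usable values. The following is a minimal illustration only; the enum name, the constant names and the bit-index idea are our assumptions, not a design stated on this page:

```java
// Hypothetical sketch: the ten detection responsibilities above as
// machine-usable categories. Each category's position doubles as a
// Zeros-&-Ones bit index in a detection record.
public enum DetectionCategory {
    FIND_MALICIOUS_BYTES,
    DETECTION_SPEED,
    TYPES,
    COUNT,
    ACTIONS_NEEDED,
    SOURCE,
    TRACKING,
    AUDIT_TRAIL,
    WARNINGS,
    LESSONS_LEARNED;

    // The bit this category would set in a detection record.
    public int bit() {
        return 1 << ordinal();
    }
}
```

Usage: a detection event touching tracking and audit trail could be encoded as `DetectionCategory.TRACKING.bit() | DetectionCategory.AUDIT_TRAIL.bit()`.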
Our Machine Learning Detection Strategies

Looking at Image #1, scanning the inbound byte streams is an overwhelming task for any scanning tool:
• The malicious code can be embedded in any stream and at any position in the stream
• Possible variations of code
• Tactics used
• The attacks can be relentless

Hacking Scenarios:
• ".exe" hackers' code in the middle of files (images, PDF, HTML, text, emails, ..etc)
• Self-extracting zipped code in the middle of files (images, PDF, HTML, text, emails, ..etc)
• Hidden code in DLLs
• Cross-site scripting (XSS) attacks

Our Strategies:
• Build speedy detection with the flexibility to tackle any variation
• Find hackers' code at the start of every gate
• Develop scanning processes which can be implemented in any sequence
• Tracking
• Audit trail
• Evaluate every attack
• Self-evaluating processes
• Build a history of processes (successful or not)
• Build a fast bare-metal structure for fast scanning
• Automation of all the detection processes

How can we build speedy detection with the flexibility to tackle any variation?

Our Machine Learning Detection Components:

Our architect-design has robust components for continuous scanning. In case our scanning encounters a difficult, time-consuming or new case, the scanning would be moved to a Dedicated Virtual Testing Server to handle the issues separately. Our crash rollback is nothing more than moving the production IP address to the Virtual Rollback Server.

Image #2

Looking at Image #2, we have three virtual servers or subsystems. The 2,000-foot view of our ML has the following major subsystems:
ML Components - Image #2:
1. Scanning Buffer(s)
2. OS and Compiler-Interpreter Simulators
3. Libraries of Patterns
4. Dynamic Business Rules
5. Sort Engines
6. Tracking Engines
7. Operation Matrices - Matrices Working Pool
8. Matrices Pool Management
9. Evaluation Engines
10. Decision-Maker Engines
11. Execution Engines
12. Reports-Statistics Engines
13. Storage Utilities
14. Storage - NAS
15. Virtual Testing Server
16. Virtual Rollback Server

How can we build an intelligent system? In a nutshell, the following steps or processes are what define "Human Intelligence", which is the ability to:
1. Learn from experience - (Zeros-&-Ones to develop the Bytes, Words and the Scanning Patterns)
2. Adapt to new situations - (Scanning and tracking)
3. Understand and handle abstract concepts - (Matrices for tracking and processing)
4. Use knowledge to manipulate one's environment - (Engines to run the system)

Our Machine Learning Components' Functionalities: To architect-design an intelligent system which dynamically learns as it runs, we need to architect-design independent components and communication lookup boards of matrices with values. Each component is architected-designed as follows:
Scanning Buffer(s)

The main objective of buffers is to speed up and balance processes and keep input streams from being blocked or slowed down. All the firewalls would be sending their output to our ML Scanning Buffer, where our ML starts processing. Buffers can also be dumped to a backup system.

OS and Compiler-Interpreter Simulators

The task at hand is neither trivial nor a joke; the following are actual values from actual events:
We estimate 2.3 terabytes per second to be equal to 1 billion packets per second. We need to remind our readers that our focus is the networks' side detection.

What are our ML goals?
• Scan every incoming byte
• Narrow our focus to executable statements run by the OS, Compiler-Interpreter, Shell script, ..etc
• Build an Intelligent Automated Virtual Integrate-able Cost-Effective System - see Image #2

Architecting-Designing Our OS-Compiler-Interpreter Simulators

What is our OS and Compiler-Interpreter Simulator? Our OS and Compiler-Interpreter Simulator is a software tool (we will develop) which mimics an OS and Compiler-Interpreter without the overhead of qualifying and running every statement. In short, it is an OS, a compiler or an interpreter which parses the executable statement to learn the names of the built-in functions or the names of the commands used. It would send the names of the built-in functions or the name of the command to a matrix, and no further action would be taken. Let us look at the following C programming statements and a few bytes from the content of a jpg image file as examples:

total_1 += value; // math expression - will not be flagged
pointerToMyArray += 3; // pointer arithmetic - will be flagged
if (remove(fileName) < 0) // deleting or removing files - will be flagged
"ÿØÿà JFIF x x ÿþ LEAD Technologies Inc. V1.01 ÿÛ" // bytes from the content of a jpg image file - will not be flagged

Therefore, the speed of our Simulators' execution would be as fast as the processors' and registers' speed. The matrices and scanned bytes would be loaded into cache and core memory for execution speed and no slow IO calls. Matrices would be copied to the Working Matrices Pool located in core memory for speed. The system engines would get their own copies, and the main Matrices would be saved to NAS for further analysis. Storage Utilities Engines would be performing the storing of the Matrices.

Note: We would team up with OS-Compiler vendors to build our OS and Compiler-Interpreter Simulators.
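The flagging behavior described in the examples above can be sketched in a few lines. This is a minimal illustration, not our actual Simulator; the class name `StatementFlagger`, the crude pointer-arithmetic check and the tiny flagged-function list are all hypothetical examples:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Minimal sketch: flag a statement when it names a risky built-in
// function (e.g. remove, system) or looks like pointer arithmetic.
public class StatementFlagger {

    // Hypothetical sample of flagged C built-in functions.
    private static final Set<String> FLAGGED_FUNCTIONS =
            new HashSet<>(Arrays.asList("remove", "system", "atexit", "exec"));

    public static boolean isFlagged(String statement) {
        // Pointer arithmetic: a crude textual check for "pointer... +=".
        // A real Simulator would use type information, not names.
        if (statement.matches(".*[Pp]ointer\\w*\\s*\\+=.*")) {
            return true;
        }
        // Built-in function calls: look for "name(" for each flagged name.
        for (String name : FLAGGED_FUNCTIONS) {
            if (statement.contains(name + "(")) {
                return true;
            }
        }
        return false;
    }
}
```

With this sketch, `isFlagged("total_1 += value;")` returns false, while `isFlagged("if (remove(fileName) < 0)")` and `isFlagged("pointerToMyArray += 3;")` return true, matching the flagging decisions listed above.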
We would develop the specs documents for our Simulators, and these documents would be provided at a later time. In case we do not find OS-Compiler vendors to work with us, we do have the background and experience to develop our own. The C language would be our development choice.

Image #3

Image #3 has a rough draft of how our OS and Compiler-Interpreter Simulators would scan the inbound network traffic. The firewalls would be sending their output to a buffer, and a Sort Engine would sort the inbound traffic according to the possible inbound stream data type (files, commands, images, ..etc). We will be building a number of Simulators based on the OS, Compiler-Interpreter, Shell script, ..etc. Based on the inbound data type, a different type of Simulator would be scanning the proper inbound type. The one and only job of the Simulator is to find code (malicious or not) and store it in the Simulator's output matrices. Simulators will be running on dedicated bare-metal servers.

The main objective of hackers' code is to be executed in the computer memory by the operating system. Therefore, our approach is:
1. Build OS and Compiler-Interpreter Simulators
2. Simulators should be run by dedicated bare-metal servers with a speedy structure
3. Simulator execution would run on the hardware level - see the bare-metal section
4. Run all the incoming streams as if each stream were a running program
5. The Simulator would be doing the OS's and/or compiler's job
6. The Simulator will track any real code and write it to tracking matrices for independent evaluation
7. The Simulator will ignore any non-executable text, images or other data
8. All the tracking matrices will be evaluated for speedy action by another bare-metal server

Rough Draft of Our OS and Compiler-Interpreter Simulators Specifications

The question is: How do we find an executable statement in the inbound stream of bytes fast, without slowing the system performance?
Our approach is: Every inbound byte is guilty until it is proven innocent.

Therefore, we need to build OS and Compiler-Interpreter Simulators which mimic an OS and Compiler-Interpreter without the overhead of qualifying and running every statement. For example, a jump - "goto" statement as in the following Code Segment #1:

startAgain:
while (connect(mySocket, (struct sockaddr *) &serverAddress, sizeof(serverAddress)) != 0) {
    sleep(60);
    goto startAgain;
}

Code Segment #1

We need to know the actual byte or text presentation as it is written in the following:
• Text
• Object code
• Byte code
• Assembly
• or any executable code which would run in the computer memory

Any of the actual representations is our Zeros-&-Ones criteria in scanning the inbound stream of bytes. Therefore, our Simulators would be looking for every possible presentation of the jump statement "goto". Our ML also needs to distinguish between legitimate and illegitimate code.

Note: Our task is not simple; for example, a Java or C "switch" and also a shell script "case" statement would have a number of jump statements within their structure. Therefore, we need to build a base data object (a Data Access Object in Java) which would be used to create the search patterns our ML would be scanning for. Now our "goto" statement has more than one pattern, based on the programming languages and commands. The next questions would be:
• The number of possible Zeros-&-Ones
• Variations of each Zeros-&-Ones
• ..etc

The Zeros-&-Ones is the start, but we would be building scanning patterns from Zeros-&-Ones, hackers' code, hackers' tendencies, approaches, ..etc. Our focus would be these patterns, plus we would be adding more patterns as we run our ML and learn more. Let us look at the types of Operating Systems (OS) and Compiler-Interpreters of programming languages.

Types of Operating Systems (OS): An OS is a set of programs, and each program is composed of statements.
There are a number of OS types based on the type of services they perform. The following is a set of OS types:
1. Batch OS
2. Distributed OS
3. Multitasking OS
4. Network OS
5. Real-Time OS
6. Mobile OS
7. Hypervisor
8. Routers - a router is in the Embedded system category; it is a computer system with a limited number of tasks
9. Browser - the browser is an executor of code, similar to an OS

Compiler-Interpreter of each of the following languages:
1. Linux, Unix, and Windows scripts
2. C and C++
3. Python
4. Java
5. JavaScript
6. Assembly Instructions
7. PHP

Estimating the Total Number of Zeros-&-Ones Search Patterns

At this point in the Analysis-Design-Architect, we can give a good estimate of all the possible Zeros-&-Ones Search Patterns. Let us examine the following facts.

Statements and Functions: The number of Zeros-&-Ones would be close to the number of any OS's, programming language's or script's built-in functions, macros and commands. For example, our Zeros-&-Ones for the C language would be what the C compiler uses:

abort, abs, acos, asctime, asin, assert ... tan, tanh, time, tmpfile, tmpnam, tolower, toupper, ungetc, va_arg, vprintf, vfprintf, and vsprintf

We estimate these to be a little over 140 function calls, and our prediction for the rest of the programming languages would be close. Our task is doable with far fewer search items than what we believe the security vendors' searching tools are using. What is the total number of the built-in functions and commands which all the listed OS and Compiler-Interpreters use?
Let us look at the following:
• The Linux Kernel uses over 100 commands
• The Unix Kernel uses over 100 commands
• The C Language has about 140 built-in functions

Assumptions: Our estimate for all the OS and Compiler-Interpreters would be:

20 OS and Compiler-Interpreters * 200 built-in functions and commands = 4,000

Let us assume that each of the built-in functions and commands has 10 possible variations (on average). Our estimate of the number of Zeros-&-Ones Search Patterns would be:

4,000 built-in functions and commands * 10 possible variations = 40,000 Search Patterns

The good news is that the Zeros-&-Ones Search Patterns are not in the millions, plus each type of OS and Compiler-Interpreter would work with only a small portion of the 40,000 Search Patterns total.

Detection Zeros-&-Ones Data Structure

All the OS, programming languages and commands have a number of built-in functions or calls, and any programming statement would include these built-in functions or commands. Our detection Zeros-&-Ones are these built-in functions and commands. Our ML scanning or search patterns are the variations of the possible uses of these built-in functions and commands.

Data Structure - Java Data Access Object (DAO): We need to present a rough analysis of the Zeros-&-Ones Data Structure; therefore, we are using a Java Data Access Object (DAO) as our building block for all possible Zeros-&-Ones. We are open to any recommendations and modifications to our Detection Java DAO. Let us look at the following C functions which can be used to damage any network:

int system(const char *string) - performs a system call; used to pass commands to be executed by the operating system
int atexit(void (*func)(void)) - sets a function to be called when the program exits
int remove(const char *filename) - erases a file

Looking at the C functions listed, we need to ask what our Java Data Access Object (DAO) design should be.
Plus we need to ask the following questions about the field types, names, values and their performance:

Does our DetectionZerosOnesDAO object cover all the needed data?
Will our ML be able to process the DetectionZerosOnesDAO object fast?
Are there redundancies?
Does it use an index for tracking?
Does it track with a timestamp?
Can a human make sense out of the field types, names, values and performance?

/**
 *
 * @author sameldin
 */
public class DetectionZerosOnesDAO {
    ... get and set methods
} // end of DetectionZerosOnesDAO

Our OS and Compiler-Interpreter Simulators would create a Matrix with the functions or commands found in the inbound byte stream. Each of the functions or commands must be evaluated by the Evaluation Engines. The Evaluation Engines will create a Matrix for the Decision-Maker Engines to pass-or-fail the function or command. Our Java DetectionZerosOnesDAO object is designed to track every possible piece of data about that function or command. It would have all the needed information for the Execution Engines and Analysis to perform their tasks. Let us look at the following function and the values which would be loaded into the DetectionZerosOnesDAO object.

int remove(const char *filename) - erase a file

public class DetectionZerosOnesDAO {
    ... get and set methods
} // end of DetectionZerosOnesDAO

ID Numbers: Using integer numbers (Long Integer) as IDs helps in speeding up the processes; plus, it gives a large range of numbers to choose from. Math expressions such as "div", "mod" and integer bit shifts can help in the selections. Therefore, all IDs used have an ID for the ID number itself. For example, for the detection ID, the highest digits are 99. We are only suggesting an ID system, but we are open to other ideas.

detection_ID = 99345678; // Detection ID number - starts with 99

Pros and Cons of Zeros-&-Ones: The architect-design must address the fact that we do not want the number of Zeros-&-Ones to get out of hand, where scanning would take forever to do the detection.

Cons: Let us look at the following two examples.

Example #1 - My Laptop: We installed new antivirus software, and the laptop's performance and internet access took a dive; every piece of software (local to my machine or web) took a few seconds if not minutes to start. We were tempted to remove such antivirus due to the time delayed and wasted.

Example #2 - PHP: We do not have any experience with PHP, but we got a copy of its built-in functions. Sadly, the number of these built-in functions is huge, and our PHP Zeros-&-Ones would not be practical. We are seeking help from PHP experts to give us the language's built-in functions, which we believe would be similar to C, C++, Java and Assembly. We would be building our Zeros-&-Ones from the PHP built-in functions.

Pros: The good news is that the Zeros-&-Ones Search Patterns are not in the millions, plus each type of OS and Compiler-Interpreter would work with only a small portion of the 40,000 Search Patterns total.

Sample Analysis of Zeros-&-Ones Matrices: We are looking for not only hackers' code, but also patterns, ways of thinking, ..etc. The following is our attempt to capture all possible Zeros-&-Ones.
At this point in the architect-design stage, we are running into the "Learning Curve"; with time and brainstorming with other experts, the goal is reachable and doable.
1. SolarWinds
2. Hackers' code
3. Programming languages
4. Scripts

SolarWinds Hacking Lessons - Hidden in Plain Sight

What is the SolarWinds Hack (Orion)? Briefly, SolarWinds is a major software company which provides system management tools for network and infrastructure monitoring, and other technical services, to hundreds of thousands of organizations around the world. Among the company's products is an IT performance monitoring system called Orion. In early 2020, hackers secretly broke into Texas-based SolarWinds' systems and added malicious code into the company's software system. More than 30,000 public and private organizations, including local, state and federal agencies, use the Orion network management system to manage their IT resources. As a result, the hack compromised the data, networks and systems of thousands when SolarWinds inadvertently delivered the backdoor malware as an update to the Orion software.

SolarWinds Hacking Zeros-&-Ones Breakdown: The following is more of a rough breakdown of our SolarWinds search, and we do need further studies.

Functions
1. After an initial dormant period of up to two weeks, it retrieved and executed commands, called "Jobs"
2. The hackers added their code in such a way that it has the same style as the existing code, so no one would notice a difference
3. Used the same names and structure
4. Hid in plain sight
5. List of function calls
6. Threads
7. Tree processes
8. Initialization

Hashing
9. Hash functions for data
10. Hash functions for method-name calling

IP Addresses and DNS
11. IP addresses
12. Hashing IP addresses and function names
13. IP addresses located in the victim's country
14. The attackers' choice of IP addresses was also optimized to evade detection
15. The attackers primarily used only IP addresses originating from the same country as the victim
16. They were leveraging Virtual Private Servers
17. The DNS response will return a CNAME record that points to a Command and Control (C2) domain
18. The malware masquerades its network traffic as the Orion Improvement Program (OIP) protocol
19. The C2 traffic to the malicious domains is designed to mimic normal SolarWinds API communications
20. After a dormant period of up to two weeks, the malware will attempt to resolve a subdomain of avsvmcloud-com

Date and Time, Stop and Sleep
21. Tracing calls
22. Date and timestamp
23. Sleep functions for days
24. Thread sleep functions

Search
25. Search functions

OS Run
26. OS calls
27. Executed files
28. Rebooted the machine
29. Stopped services from running
30. Disabled system services
31. Received instructions from outside sites
32. Interrupted or stopped services
33. Ran in memory only, which allowed the adversary to blend into the environment, avoid suspicion, and evade detection
34. Executed their payload and then restored the legitimate original file
35. They routinely removed their tools, including removing backdoors once legitimate remote access was achieved
36. They replaced a legitimate utility with their own
37. They similarly manipulated scheduled tasks by updating an existing legitimate task to execute their tools
38. Returned the scheduled task to its original configuration

Zipping-Unzipping
39. Zipping and unzipping function calls
40. Compression and decompression

File Transactions
41. Had the ability to transfer files
42. Temporary file replacement and temporary task modification
43. The attackers used a temporary file replacement technique to remotely execute utilities
44. The trojanized update file is a standard Windows Installer Patch file that includes compressed resources
45. Profiled the system
46. Associated with the update, including the trojanized SolarWinds.Orion.Core.BusinessLayer.dll component
47. Stored reconnaissance results within legitimate plugin configuration files, allowing it to blend in

Updates and Mimic
48. The backdoor used multiple block lists to identify forensic and anti-virus tools running as processes, services, and drivers
49. Once the update is installed, the malicious DLL will be loaded by the legitimate SolarWinds.BusinessLayerHost.exe or SolarWinds.BusinessLayerHostx64.exe (depending on system configuration)
50. They send new login names and passwords to their sites to gain access

How can a Cybersecurity Architect use such a case in architecting-designing any detection architecture? The SolarWinds case is a great learning lesson which our Zeros-&-Ones matrices can use to add more Zeros-&-Ones, plus a reinforcement of existing Zeros-&-Ones. Therefore, we need to do the following:
• Get, if we can, copies of the actual code
• Create similar code
• Test the code to see if we can learn more and add more cases

Hackers' Code

In this section, our attempt is to give a picture of how easily hackers can add their malicious code in plain sight, where it would pass the best of coding eyes. We are presenting two cases:
• Java code with a SQL call - adding SQL injection and a system call
• C function calls and tracing the execution (in-memory) stacks for the function calls

Reverse Engineering enables hackers to turn executable code into source code. Hackers can apply Reverse Engineering to vendors' code such as DLLs and get a copy of the source. They would be able to insert their malicious code and compile the source back into a DLL to be added to clients' libraries, as in the case of the SolarWinds hacking. They can also modify the DLL files' dates, permissions and security information to be exactly as the vendors' DLLs would be.

Case #1: Java code with SQL call - adding SQL injection and a system call

The following table has an example of Java code with proper syntax, including a SQL call.
We also present, for the sake of simplicity, what hackers had added to the code:

43. index++; try{Runtime.getRuntime().exec("...");} catch (Exception e){}
...
69. localPreparedStatement = localConnection.prepareStatement(qryString);

Line #43 is a simple increment, and on the same line a system call is made to the operating system, which can be easily missed. As for line #69, the hack appended SQL injection to the qryString; hackers can hard-code the qryString with SQL injection.
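How a Java Simulator might flag these two lines can be sketched with simple heuristics. The class name and the two checks below are illustrative assumptions on our part, not the actual Simulator design:

```java
// Minimal sketch of flagging the two hacks shown above:
// a hidden runtime system call, and SQL built from a variable
// (a SQL-injection risk) rather than a string literal.
public class JavaCodeLineChecker {

    // Flag a runtime system call hidden on an otherwise innocent line.
    public static boolean hidesSystemCall(String line) {
        return line.contains("Runtime.getRuntime().exec(");
    }

    // Crude heuristic: prepareStatement called on anything other than
    // a string literal may carry injected SQL.
    public static boolean riskySqlBuild(String line) {
        return line.contains("prepareStatement(")
                && !line.contains("prepareStatement(\"");
    }
}
```

With this sketch, line #43 above would be caught by `hidesSystemCall` and line #69 by `riskySqlBuild`, while a literal `prepareStatement("SELECT ...")` would pass.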
Code Found Matrix Specs: The goal of our OS and Compiler-Interpreter Simulators, and in this case the Java Simulator, is to be able to pick up any possible format of hacking code. Our Simulator would be able to scan any code in:
• Text
• Object code
• Byte code
• Assembly
• or any executable code which would run in the computer memory

Therefore, in the second table, the Code Found Matrix, our Java Simulator should be able to recognize both possible pieces of hackers' code and give a warning for the Evaluation Engine to do its job. The second table, the Code Found Matrix, is a rough draft of what our Simulator would be able to scan. The actual Code Found Matrix would be populated with indexes and values. In short, working with numbers such as indexes would speed up any evaluation.
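One possible shape of a Code Found Matrix row, using fast integer indexes as described above, can be sketched as follows. The field names and the representation codes are our assumptions; the page only states that the matrix holds indexes and values:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a Code Found Matrix: each row records a
// matched Zeros-&-Ones pattern as numbers, so the Evaluation Engine
// works on fast integers instead of strings.
public class CodeFoundMatrix {

    // Representation of the code found, per the list above.
    public static final int TEXT = 0;
    public static final int OBJECT_CODE = 1;
    public static final int BYTE_CODE = 2;
    public static final int ASSEMBLY = 3;

    public static class Row {
        final long detectionId;   // ID of the matched Zeros-&-Ones pattern
        final int representation; // TEXT, OBJECT_CODE, BYTE_CODE, ASSEMBLY
        final long streamOffset;  // where in the inbound stream it was found

        Row(long detectionId, int representation, long streamOffset) {
            this.detectionId = detectionId;
            this.representation = representation;
            this.streamOffset = streamOffset;
        }
    }

    private final List<Row> rows = new ArrayList<>();

    public void record(long detectionId, int representation, long streamOffset) {
        rows.add(new Row(detectionId, representation, streamOffset));
    }

    public int size() {
        return rows.size();
    }
}
```

A Simulator finding the text form of a flagged call at stream offset 1024 would call, for example, `record(99345678L, CodeFoundMatrix.TEXT, 1024L)`.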
Case #2: C Function Calls and Tracing the Execution (In-Memory) Stacks for the Function Calls

C as a programming language gives C programmers a lot of unstructured or untraditional ways to write code. For example, in C the parameters of a function can include:
1. A pointer to a function
2. An entire function passed in through the parameter
3. A string constant which holds an entire function
4. What else? We are sure that there could be other strange options

These options are a gateway for hackers to hide their malicious code. A simple-looking function can have hidden code which may not have the best intents.

#include <stdlib.h> /* for atexit */

// function definitions
void targetFunction() {...}
int function_1() {...}
void function_2() {...}
void function_3() {...}
void functionDefinedAsParameter() {...}

void startingFunction(void (*pointer2Function)(), void (*functionPassedAsParameter)())
{
    int intValue;
    intValue = function_1();
    function_2();
    function_3();
    pointer2Function = &targetFunction;  // the address of targetFunction
    pointer2Function();                  // call it through the pointer
    functionPassedAsParameter();         // call the function passed in the parameters
}

int main()
{
    atexit(function_2);                  // register function_2 to run at program exit
    startingFunction(&targetFunction, &functionDefinedAsParameter);
    return 0;
}

At run time, the function creation sequence on the execution stack would be:
1. main() - starts
2. atexit() - starts and ends (it only registers function_2 to run at program exit)
3. startingFunction() - starts
4. function_1() - starts and ends
5. function_2() - starts and ends
6. function_3() - starts and ends
7. targetFunction() - starts and ends (called through pointer2Function)
8. functionDefinedAsParameter() - starts and ends
9. main() - ends, and then function_2() runs one more time, as registered by atexit()

The folding or termination of each function is the reverse of the creation sequence. Some functions start, end and are totally removed from the execution stack, such as function_1, function_2 and function_3. function_2 will run twice: first within startingFunction(), and a second time at program exit, as registered by atexit().

Why are we presenting Case #2?
1. First, any text scan would get lost in such code
2. Second, a hacker with such knowledge can create endless scenarios of hidden parts of hacking
3. This would be very tough to catch
4. Only the compiler or debugger would be able to scan for all these hidden parts
5. This case is part of the Compiler Simulator specs that we would be addressing

Programming Languages

C and C++, Python, Java, JavaScript, Assembly, PHP or any programming language has a limited number of built-in functions or main functions. The total number of all the built-in functions is not that big, which can make our Zeros-&-Ones Matrices more practical. A lot of these languages have similar functions, where we would be creating Common Zeros-&-Ones Matrices and optimizing our Simulators.

Scripts

OS scripts, such as Linux, Unix, and Windows scripts, present a different challenge, due to the fact that system administrators would be creating scripts on the run to build their infrastructure systems. Most of these scripts are created by cut-and-paste from other scripts. These scripts may contain C code, other programming languages or calls to applications or Kernel utilities. Our Scripts Simulators can be a lot more challenging. Sadly, we are not experts on scripts; therefore, we would be teaming up with experts to build our Scripts Zeros-&-Ones.

Libraries of Patterns

Introduction: We had presented Zeros-&-Ones in the SolarWinds, Hackers' Code, Programming Languages and Scripts sections with the goal of finding some distinct features which can be used to build the detection Zeros-&-Ones for virus, worm or Trojan software. We need to have their distinct features and develop the Bits, Bytes, Words and Patterns. We would build dynamic libraries of these virus, worm and Trojan software which our Intelligent Machine Learning Tool would use. These Matrices can have any number of possible virus, worm and Trojan entries, but each possibility has a weight or a score of its accuracy.
The history of finding out which type of virus, worm and Trojan software is present would also aid in building the accuracy score or the weight, and the possibilities of it being used or not used. These Matrices would grow and become diverse to help build possibilities with high scores and build intelligence. These Matrices would teach our Intelligent Machine Learning Tool the new or possible occurrences of virus, worm and Trojan software. Matrix crossing would also help create new high-score possibilities.

Pattern Building Matrices - Data Structure: Tables or Matrices are a good tool for presenting possibilities and visions, or for viewing values, patterns, errors, choices, etc. We recommend the following features, plus other features based on the type of business and mapping:
1. We recommend two-dimensional Matrices or arrays.
2. They are easily built and used.
3. Linked Lists should be used to dynamically build these Matrices with a size limit.
4. No three-dimensional arrays; they are difficult to envision and more complex to work with.
5. The size and the score of these Matrices should be developed to make search and deduction simple.
6. These Matrices are used to map the values, time and other critical elements.
7. Cross-referencing these Matrices would help in figuring out values, patterns, tendencies, errors, etc.

These Matrices would be used to decide the bits and build the bytes, words and patterns.

Build Precision Scale

Abstract thinking is based on frequencies and statistics. Precisions are based on chances and frequencies. For example, if out of 100 men aged 40 years and older, 90 men would lose 50% of their hair, then a 50-year-old man has a 90% chance of losing 50% of his hair. Therefore, based on the business and the conditions, we would be able to create a score and weight for any value or state. There should be precision Matrices which can be cross-referenced to give relative accuracies.
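The precision-scale idea above, deriving a chance from observed frequencies and mapping it onto a small range, can be sketched in a few lines. The class and method names are illustrative assumptions, and the 0-9 mapping follows the field-value pointers given later on this page:

```java
// Minimal sketch of the precision scale: derive a chance from
// observed frequencies, then map it onto a compact 0-9 weight.
public class PrecisionScale {

    // frequency: how many out of `total` showed the feature.
    // Returns the chance as a whole percentage, e.g. 90 of 100 -> 90.
    public static int chancePercent(int frequency, int total) {
        if (total <= 0) {
            return 0; // no observations, no precision
        }
        return (100 * frequency) / total;
    }

    // Map a percentage onto the 0-9 range preferred over raw
    // percentages for human comprehension.
    public static int toWeight(int percent) {
        return Math.min(9, percent / 10);
    }
}
```

Using the hair example above: `chancePercent(90, 100)` gives 90, and `toWeight(90)` gives 9, the "90% chance" expressed on a 0-9 scale.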
Fine-Tune Patterns Using Dynamic Business Rules: Dynamic Business Rules help give guidance on decision-making (if-else conditions). These rules can be added at runtime without the need to change code or processes. Libraries of Patterns (Based on History and Lessons Learned) These libraries are guidance for critical thinking and fast decision-making. They can also grow to make the system more intelligent and efficient, on the principle of "been there and done that". Data Preparation This topic is too big to cover here, but we have a number of supporting documents which deal with data and networks. Pattern Discovery All the previous steps set the stage for our Intelligent Machine Learning Tool to do its job and build reports of the target results. Optimization Tracking, logs, audit trail, errors, exceptions, performance, issues, and other system elements should be tracked and analyzed to optimize the system and look for ways to make it more efficient and intelligent. Automation as a feature is ideal for system performance. Build Reports Reports must be customized to answer needs, requests, audience, etc. Operation Matrices - The Spine of Our Machine Learning System Our ML consists mainly of preparation processes-engines (Search Pattern Builder or Engine), scanning Simulators, and work engines (Sort, Tracking, Evaluation, Decision-Makers, Execution, Store-Backup, and Lessons Learned). These components produce and consume data stored in Matrices. Our Matrices are lookup boards of information. We would be creating Matrices from other Matrices. In the case of a delay in populating a given Matrix with the latest information, there would be default values based on previous experiences, statistics, and weight values. Revisit - How can we build an intelligent system? In a nutshell, the following steps or processes are what define "Human Intelligence", which is the ability to: 1. Learn from experience 2. Adapt to new situations 3.
Understand and handle abstract concepts 4. Use knowledge to manipulate one's environment Let us look at our architecture components for each "Human Intelligence" step or process. #1 - Learn from Experience: 1. Preparation Processes 2. Search Pattern Builder or Engines 3. Sort Engines 4. Tracking Engine 5. Evaluation Engines #2 - Adapt to New Situations: 1. Decision-Makers Engines #3 - Understand and Handle Abstract Concepts: 1. Decision-Makers Engines 2. Execution Engines #4 - Use Knowledge to Manipulate One's Environment: 1. Execution Engines 2. Store-Backup Utilities 3. Lessons Learned Engines 4. Reports Engines We would be developing Matrices for each of the Human Intelligence steps. Matrices Fields and Values: Our Matrices will be used by both humans and machines, therefore we need to find common field names and values. Tracing and debugging would be done mainly by humans. The key is not to slow the processing speed and not to confuse the administrators, analysts, staff, etc. The following are pointers for choosing field names and values: 1. Processing speed 2. Human comprehension 3. Ranges should be 0-9, not in the 1,000s 4. Less use of percentages and more use of meaningful words 5. Enumerations which humans can relate to - good, bad, danger, etc. 6. Default values 7. Statistical values humans can comprehend 8. Tedious calculations are done by machine 9. Accuracies and percentage of accuracy Field Possible Values: Based on the actions required, different Matrices would have different fields, names, and values. The following list is a start, and it would grow as we run and learn more: 1. Range 0-9 2. Enumeration of Range - Good, Bad, ... Normal, Al 3. Messages 4. IDs 5. Weight 6. Index 7. Flags 8. Matrices ID 9. IP addresses 10. Hash index 11. Contact information 12. Process indexes 13. Alarm indexes 14. Frequencies Building Matrices Templates: Templates are great tools in analysis, development, automation, testing, and training; we would be brainstorming template structure and development. Matrix List for the "Human Intelligence" steps or processes: #1 - Learn from Experience: 1. Zeros-&-Ones 2. Patterns 3. History of hackers and attacks, tendencies, sources of attack, and hackers' code 4. Search Patterns 5. Simulator output of the scanned code 6. Evaluating Simulator Output and Zeros-&-Ones #2 - Adapt to New Situations: 1. Tracking Source 2. Tracking Routing 3. Audit Trail 4. Cross References #3 - Understand and handle abstract concepts: 1. Decision-Makers 2. Execution Steps #4 - Use knowledge to manipulate one's environment: 1. Setting Alarms 2. Vendor contact information 3. Client contact information 4. Lessons Learned 5. Report indices Matrices Pool Management: Analysis, Evaluation, and Storage: To give an analogy of how important our Matrices are, we would state that our Matrices Pool is the spine of our intelligent system. The Matrices are the connections between all the processing, learning, tracking, analysis, updating, storage, audit trail, and miscellaneous tasks. Matrices Pool Management is critical to their performance. Matrices Pool Management would be performing and evaluating the following: 1. Assigning IDs 2. Performance 3. Storage 4. Analysis 5. Updating 6. Validation 7. Bottlenecks 8. Redundancies 9. Overkill number of Matrices Engines: What is an Engine? What is a Process? Based on an Information Technology background, an engine may have different meanings. Engine Definition: • An Engine is a running software component (application, class, OS call) which performs one task and only one task. • A Process is a running software component which uses one or more Engines. A Process may perform one or more tasks.
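To make the field pointers above concrete, here is a hedged sketch of one Matrix row layout; the field names and the 0-9 enumeration labels are our own illustrative choices, not a fixed schema:

```python
# Sketch of a common field layout readable by both human and machine: a 0-9
# range with human enumerations instead of raw percentages, plus machine-side
# fields (weight, flags, IDs). All names here are illustrative assumptions.

RANGE_LABELS = {  # enumeration a human can relate to
    0: "good", 3: "normal", 6: "bad", 9: "danger",
}

def label_for(score: int) -> str:
    """Map a 0-9 score to the nearest human-readable label at or below it."""
    if not 0 <= score <= 9:
        raise ValueError("scores are kept in the 0-9 range, not the 1,000s")
    return RANGE_LABELS[max(k for k in RANGE_LABELS if k <= score)]

row = {
    "matrix_id": "ZO-PATTERNS-01",   # Matrices ID (hypothetical)
    "score": 7,                      # Range 0-9
    "label": label_for(7),           # human-comprehensible enumeration
    "weight": 0.85,                  # machine-side accuracy weight
    "flags": ["alarm"],              # flags / alarm indexes
    "ip": "192.0.2.10",              # IP address field (documentation range)
}
print(row["label"])   # 'bad'
```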
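The Engine and Process definitions above can be sketched as follows; this is a simplified illustration with our own function names, showing why an Engine can be swapped without touching the rest of the system:

```python
# An Engine does exactly one task; a Process chains one or more Engines in a
# required sequence. Keeping Engines as interchangeable callables gives the
# loose coupling described in the text.

from typing import Callable, List

Engine = Callable[[bytes], bytes]   # one task: bytes in, bytes out

def sort_engine(data: bytes) -> bytes:
    """One task only: sort the byte values."""
    return bytes(sorted(data))

def dedup_engine(data: bytes) -> bytes:
    """One task only: drop repeated byte values, keeping first-seen order."""
    return bytes(dict.fromkeys(data))

class Process:
    """Runs Engines in sequence; swapping one Engine changes no other code."""
    def __init__(self, engines: List[Engine]):
        self.engines = engines

    def run(self, data: bytes) -> bytes:
        for engine in self.engines:
            data = engine(data)
        return data

scan_prep = Process([sort_engine, dedup_engine])
print(scan_prep.run(b"cabbac"))   # b'abc'
```

A tree of such Processes, each wrapping its own Engine list, gives the multi-task execution sequence the text describes.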
• Engines are used for building loosely coupled systems and transparency • Updating one Engine may not require updating any other code in the system • A tree of running Engines can be developed to perform multiple tasks in a required sequence • Engines give options and diversity Table of Engines: Possible: more than one Engine for the same functionality. At this point in the analysis, design, and architecting stages, we may need to modify a lot of items, including Engines. Therefore, there could be more than one Engine performing the same task, based on different use cases or scenarios. For example, Alert may require more than one type of alert.
Engines Execution Priorities: The Engine execution queues would be set with priorities; at this point in the analysis-design-architecture stage we would not be able to give an accurate answer. We do need to brainstorm such criteria. Storage: Our ML is a dynamic system which generates a lot of data, and a lot of data translates into Big Data and storage issues. First, we are getting away from databases and replacing them with a filing system. Therefore, we would be using text format and XML file format (XML is still text) to structure our storage. To speed up and reduce our data parsing, Matrices again play a big role in structuring our data and processing it. We recommend the reader look at our Database Replacement Using XML page: Database Replacement Using XML - "http://sameldin.com/OOCDProjectSite/VirtualDataServicesPage.html" Network Attached Storage (NAS): Advantages of using NAS: • Inexpensive hardware • Can be treated as an object or class properties • NAS can be an independent node with its own IP address • Programmable • Easy to install and use • Easy to move around • Easy to test • Reusable • Fast file transfers (speed depends on interface) • Plug and Play (no complicated setup) • Uses the native file system of the Operating System • Multiple users can access the drive at the same time • Files can be shared among users and devices • Remote access via Ethernet is possible • Web-enabled applications provide additional functionality independent of the computer • Additional storage can be added (depends on NAS function) • Can be used as a Database Visualizer Dynamic Business Rules How do you define business rules? From Wikipedia, the free encyclopedia: A business rule defines or constrains some aspect of business and always resolves to either true or false. Business rules are intended to assert business structure or to control or influence the behavior of the business. Note: Our Dynamic Business Rules are not a new creation, and there are a lot of applications performing similar tasks. We are presenting our Business Rule components, structure, and approaches. Our Definition of Dynamic Business Rules: To define Dynamic Business Rules, we need to ask the following question: How can anyone add the following to running software without stopping it or adding a single line of code? 1. One or more new products, or remove any number of products 2. Changes to the User Interface 3. New Exceptions 4. New Errors Lists 5. New Vocabularies 6. New Dictionaries 7. New Languages 8. New Messages 9. New Tokens - used by analysis 10. New Validations 11. New Business Rules 12. New Decisions 13. New Weights - analysis 14. New client qualifications Database tables can help to a certain extent, but our Business Rules build Java objects such as Exceptions, Validations, or new Decision-Makers. In a nutshell, our Dynamic Business Rules have the following components and data structure: 1. Rule Manager 2. Business Rules Adapter - loading the new rules 3. Business Rule Factory 4. Rules Services or Engines 5. Base Objects (Java Data Access Object - DAO) 6. Linked List of Objects which can dynamically change in size 7. Templates 8. Tokenizer 9. Matrices 10. Input text file - for example, a token list or dictionary We structure our software to be able to add new objects, new conditions, and a different sequence of the existing processes within the running program. Again, Matrices are the center of all actions and of what needs to be done. For example, to add a GUI interface for a new HTML page, a matrix would be reloaded by reading a template text file. The matrix's new addition would populate a new DAO and pass it to the HTML Factory, which builds the new HTML page and passes it to the clients' browsers.
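The runtime-loading idea behind our Dynamic Business Rules can be sketched as follows. The authors' implementation builds Java objects; this is our own Python illustration with hypothetical names (RuleEngine, the "field operator value action" line format), showing rules arriving as text and becoming live decision-makers without stopping the program:

```python
# Rules are plain text lines -- "field operator value action" -- loaded at
# runtime. Each line becomes a live Rule object; no code change is needed.

import operator

OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}

class Rule:
    def __init__(self, field, op, value, action):
        self.field, self.op = field, OPS[op]
        self.value, self.action = float(value), action

    def applies(self, record):
        return self.op(record.get(self.field, 0), self.value)

class RuleEngine:
    def __init__(self):
        self.rules = []

    def load(self, text):
        """Called at runtime whenever a new rules file appears."""
        for line in text.strip().splitlines():
            self.rules.append(Rule(*line.split()))

    def decide(self, record):
        """A business rule always resolves to true or false per record."""
        return [r.action for r in self.rules if r.applies(record)]

engine = RuleEngine()
engine.load("threat_score > 0.8 quarantine\npacket_rate > 64584 alarm")
print(engine.decide({"threat_score": 0.9, "packet_rate": 100}))   # ['quarantine']
```

Calling `load` again with a new text file adds new decisions to the running program, which is the behavior the questions above ask for.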
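The Storage section's text/XML filing approach can be sketched as follows; the element and attribute names are our own illustration, not the authors' schema:

```python
# A Matrix written out as a plain-text XML file and parsed back, with no
# database involved -- XML is still text, so it fits the filing-system model.

import xml.etree.ElementTree as ET

def matrix_to_xml(matrix_id, rows):
    """Serialize a list of pattern/weight rows to an XML string."""
    root = ET.Element("matrix", id=matrix_id)
    for row in rows:
        ET.SubElement(root, "row", pattern=row["pattern"], weight=str(row["weight"]))
    return ET.tostring(root, encoding="unicode")

def xml_to_matrix(text):
    """Parse the XML string back into the same row structure."""
    root = ET.fromstring(text)
    return [{"pattern": r.get("pattern"), "weight": float(r.get("weight"))}
            for r in root.findall("row")]

rows = [{"pattern": "01101110", "weight": 0.9}]
text = matrix_to_xml("ZO-01", rows)
print(xml_to_matrix(text) == rows)   # True -- round trip through plain text
```

Each XML file can then live on the NAS as an ordinary file, shared by multiple users and Engines at once.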
Bare-Metal Server Features Our ML's main goal is scanning for possible malicious code and removing it from the network. Scanning has two parts: first, the inbound bytes, and second, the content of the network. Scanning speed is very critical for our ML to be of any value. The focus of this section is our recommendation for what the bare-metal server(s) structure should be. The goal is to exclusively execute our ML software on the bare metal, without sharing the bare metal with other processes or programs except the Operating System. Image #4 Image #5 Images #4 and #5 are rough pictures of what we believe the Bare-Metal Structure would have as its internal components. Our goal is speed, and we are open to recommendations, corrections, or suggestions. Bare-Metal Server, Scanning Network's Inbound Traffic: Image #4 presents how the network would scan all the firewalls' traffic. Performance Speed: We chose a bare-metal server with 8 or more processors and all the core memory the server can have. Each processor would have its own cache, registers, and its own virtual server. Each processor would run independently. We hope that our recommendation is not dated; there could be more advanced bare-metal servers than what we are presenting. What factors would be considered in the bare-metal structure and its performance? CPU, cores, clock speed, registers, cache memory, core memory, bus, chip manufacturer support, software support, VM, labor, time, testing, and cost. Note: A physical or bare-metal server's hardware is quite different from that of other types of computers. A physical server would have multi-core processors, an IO Controller with multiple hard drives, Error Correction Code (ECC) memory, multiple power supplies, threading, parallel computing, redundancies, etc. The reason for all these additions is the fact that servers run 24x7, and data loss, damage, or slow performance would translate to losing business, customers, etc.
The goal of the ML bare-metal server is to handle the throughput of all the firewalls. Let us Do the Math: Max number of packets one firewall would handle = 64,584 packets per second. With high-performance software, a single modern server processes over 1 million HTTP requests per second. The average packet size is about 1,500 bytes. A 32-bit CPU can process 34,359,738,368 bits per second = 4,294,967,296 bytes per second. Max number of bytes one, 10, 100, or 1,000 firewalls would handle: One firewall = 64,584 X 1,500 = 96,876,000, about 100 million bytes per second 10 firewalls = about 1 billion bytes per second 100 firewalls = about 10 billion bytes per second 1,000 firewalls = about 100 billion bytes per second One CPU = about 4.3 billion bytes per second How many bytes per second would an 8-processor server process? In terms of hardware, what we are asking for is the following: 1. 8 or more processors 2. High or fast clock speed 3. 64-bit registers 4. Maximum number of registers 5. The biggest cache the machine can have 6. The biggest RAM the machine can have 7. 128-bit bus size Priority of Execution: Which part of our ML would be granted higher priority? It is too early to answer at this point. No Need for Error Correction Code (ECC) Memory: Our ML scanning would scan the inbound bytes; if they pass, the inbound bytes would be passed to the network to handle. If they fail, our ML would store them on the NAS. Therefore, Error Correction Code (ECC) memory does not have any value to our ML processes.
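The arithmetic above can be checked in a few lines; the firewall packet rate, average packet size, and the CPU bits-per-second figure are the page's own numbers:

```python
# Sanity check of the throughput math, using the figures stated above.

PACKETS_PER_SEC = 64_584           # one firewall's maximum packet rate
PACKET_BYTES = 1_500               # average packet size
CPU_BITS_PER_SEC = 34_359_738_368  # the stated 32-bit CPU figure

one_firewall = PACKETS_PER_SEC * PACKET_BYTES   # bytes per second per firewall
cpu_bytes = CPU_BITS_PER_SEC // 8               # bytes per second per CPU

print(one_firewall)           # 96876000  (~100 million B/s)
print(one_firewall * 1_000)   # 96876000000 (~100 billion B/s for 1,000 firewalls)
print(cpu_bytes)              # 4294967296 (~4.3 billion B/s)
# One such CPU covers many firewalls, but nowhere near 1,000 of them:
print(cpu_bytes // one_firewall)   # 44 firewalls per CPU, before any overhead
```

This is why the multi-processor bare-metal configuration matters: the per-CPU budget of roughly 4.3 billion bytes per second must be multiplied across processors to approach the 1,000-firewall load.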