At Alpacked, besides technical skills, we strive to develop the skills needed for effective management and HR operations.
We were impressed by Google's Site Reliability Engineering book, which describes the practice of incident reports. We decided to adopt it at Alpacked.
Now every mistake is treated as an incident: we analyze it as a case instead of blaming an employee. We try to determine the root cause of the incident, how the employee was involved, how they reacted to it, and what technical means were used to fix the issue. Once the analysis is complete, we proceed to the next step.
We were also fortunate to come across the books written by Jocko Willink and Leif Babin; The Dichotomy of Leadership is one of my favorites. One of its chapters discusses the importance of balancing clear standards with room for creativity. The main idea is that everybody should be able to solve issues independently; however, in unexpected emergency situations, they should rely on previously created Standard Operating Procedures (SOPs).
Adopting this idea at Alpacked resulted in the following process: once the incident analysis described above is done and all the contributing factors have been investigated, the case is turned into an SOP.
For example, one of our e-commerce clients was once hacked. We discovered the breach during a scheduled audit, and our first priority was to undo everything the attacker had changed in order to stop the customer's data from leaking. In the process, all the code and infrastructure changes made by the attacker were wiped out, which made investigating the incident impossible. As a result, we created the following security incident SOP:
1. Note the time when the breach was found or reported by the customer.
2. Notify your direct supervisor about the break-in and treat it as your highest priority until it is resolved.
3. Create an AMI of the EC2 instance for further investigation (if applicable).
4. Create a copy of the whole project for further investigation.
5. Create a copy of the database for further investigation.
6. Immediately remove the malicious code.
7. Identify the time interval during which the system was compromised.
8. Double-check the security logs and reports generated immediately before and after the break-in, in the following order:
   - Check SSH logins
   - Check the AIDE report
   - Check whether an EBS volume snapshot exists
   - Treat the web server logs as your main source of information
   - Check the security reports
9. Check that no JS has been injected into the checkout/add-card process.
   - If JS and DB change monitoring wasn't working, use the Wayback Machine to find the approximate date of the injection.
10. Identify the type of data that was compromised.
11. Structure your findings from steps 8-10 and present them to your direct supervisor.
12. Based on the findings, figure out how the attacker compromised the system.
13. If this is a known attack vector that Alpacked has already mitigated, find out why this customer was not yet protected.
14. If this is a known attack vector and the attacker was still able to break in, or if this is a previously unknown attack vector, create a document that describes it and propose a solution that will prevent it from happening in the future.
15. Fill in the incident report and share it with your direct supervisor.
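As an illustration of the log-review part of this SOP (identifying the compromise window and treating the web server logs as the main source of information), here is a minimal Python sketch that narrows an access log down to the suspected time interval and counts requests per client IP. It assumes the logs use the Apache/Nginx common or combined log format; the function names, the sample lines, and the log path mentioned in the comments are hypothetical and not part of Alpacked's actual tooling.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Matches the start of a common/combined log format line, e.g.
# 203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "POST /admin/login HTTP/1.1" 200 512 ...
# Assumption: the web server writes this format; adjust the pattern otherwise.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3})'
)
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def requests_in_window(lines, start, end):
    """Yield (timestamp, ip, request, status) for entries with start <= timestamp <= end.

    start and end must be timezone-aware datetimes: the log timestamps carry a
    UTC offset, and Python refuses to compare aware and naive datetimes.
    """
    for line in lines:
        match = LOG_RE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        ts = datetime.strptime(match.group("ts"), TS_FORMAT)
        if start <= ts <= end:
            yield ts, match.group("ip"), match.group("request"), int(match.group("status"))


def top_ips(entries, n=5):
    """Count requests per client IP, most active first."""
    return Counter(ip for _, ip, _, _ in entries).most_common(n)


if __name__ == "__main__":
    # Hypothetical sample lines; in practice you would iterate over a copy of the
    # access log, e.g. open("/var/log/nginx/access.log") (the path is an assumption).
    sample = [
        '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "POST /admin/login HTTP/1.1" 200 512 "-" "curl/8.0"',
        '198.51.100.7 - - [11/Oct/2023:09:00:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    ]
    window_start = datetime(2023, 10, 10, 0, 0, tzinfo=timezone.utc)
    window_end = datetime(2023, 10, 10, 23, 59, tzinfo=timezone.utc)
    hits = list(requests_in_window(sample, window_start, window_end))
    for ts, ip, request, status in hits:
        print(ts.isoformat(), ip, request, status)
    print(top_ips(hits))
```

Running the filter against a copy of the log, rather than the live file on the compromised server, keeps the original evidence untouched, which is the reason the SOP preserves the instance, project, and database before any cleanup.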
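The check for injected JS in the checkout/add-card process can be partially automated. The sketch below, using only the Python standard library, lists the external `<script>` sources in a page's HTML and reports any whose host is not on an allowlist. The allowlist, the function and class names, and the sample page are hypothetical; in practice you would feed it the HTML of the checkout page and list the third-party hosts you actually expect there (for example, your payment provider).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ScriptSrcCollector(HTMLParser):
    """Collects the src attribute of every <script> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this method.
        if tag == "script":
            src = dict(attrs).get("src")
            if src:  # inline scripts have no src and are not collected here
                self.sources.append(src)


def unexpected_scripts(html, allowed_hosts):
    """Return external script URLs whose host is not in allowed_hosts.

    Relative URLs such as /static/app.js have no host and are treated as
    first-party, so they are not reported.
    """
    collector = ScriptSrcCollector()
    collector.feed(html)
    flagged = []
    for src in collector.sources:
        host = urlparse(src).hostname
        if host and host not in allowed_hosts:
            flagged.append(src)
    return flagged


if __name__ == "__main__":
    # Hypothetical checkout page with one unexpected third-party script.
    page = (
        "<html><head>"
        '<script src="/static/app.js"></script>'
        '<script src="https://js.stripe.com/v3/"></script>'
        '<script src="//evil.example/skimmer.js"></script>'
        "</head><body>checkout</body></html>"
    )
    print(unexpected_scripts(page, allowed_hosts={"js.stripe.com"}))
```

This only sees scripts referenced in the served HTML; inline scripts and scripts added at runtime by other JavaScript are not covered, so it complements the manual review and the Wayback Machine comparison rather than replacing them.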