Bullet Proofing ECS Tasks

Sudheer Kumar
Jul 19, 2023
Photo by Erik Odiin on Unsplash

I have written before about the use cases and importance of load testing. It is an essential exercise for making sure that your application performs well under heavy load, and tools like JMeter are one way to run it.

Recently, during one of our load tests, we found that certain input data was crashing the underlying ECS task. Our deployment architecture uses ECS to run Node.js Docker images.

Problem Statement

Because of an unhandled error, the ECS tasks were being terminated abruptly. In ECS, spawning a replacement task takes around 30 seconds to 1 minute, and during that time the load on the remaining running tasks increases. So it is always desirable to keep the tasks in an ECS service from crashing.
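To see this failure mode in isolation, the following sketch (assuming only a Node.js runtime, no ECS) runs a small script in a separate Node.js process. The script throws from a timer callback with no handler registered; the script and its 'bad input' message are illustrative, not taken from our application.

```javascript
const { spawnSync } = require('child_process');

// Illustrative child script: an error is thrown from a timer callback, outside any
// try/catch, and no process-level handler is registered.
const childScript = `
  setTimeout(() => { throw new Error('bad input'); }, 0);
  setTimeout(() => console.log('still alive'), 100);
`;

// Run the script in a separate Node.js process so the crash does not affect this one.
const result = spawnSync(process.execPath, ['-e', childScript], { encoding: 'utf8' });

// Node.js exits with code 1 on an uncaught exception, so the second timer never runs.
console.log('exit code:', result.status);
console.log('child printed "still alive":', result.stdout.includes('still alive'));
```

Assuming the Node.js process is the container's essential process, an exit like this is what stops the ECS task and makes the service start a replacement.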

How to gracefully handle every bad response

However many test cases you write, there is still a chance that a bad input or response triggers an exception somewhere in the code. The best we can do is handle every possible exception gracefully within the Node.js process.

Global Error Handler Middleware:

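This is Express error-handling middleware: it catches errors thrown synchronously in any route handler in app.js, as well as errors passed to next(err). Express recognizes it as an error handler because it takes four arguments, so register it after all the routes are defined.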

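app.use((err, req, res, next) => {
  // If the response has already started, delegate to Express's default handler
  if (res.headersSent) {
    return next(err);
  }
  // Handle the error gracefully
  console.error('Unhandled Error:', err);
  res.status(500).json({ error: 'Internal Server Error. Please try again.' });
});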

For example, the error thrown in the following route is caught by this middleware:
app.get('/errorRoute', (req, res) => {
  throw new Error('This is an uncaught exception!');
});
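One caveat, assuming Express 4: the middleware only sees errors thrown synchronously or passed to next(err). An error thrown inside an async route handler becomes a rejected promise that Express 4 ignores, so it surfaces as an unhandled rejection and the request hangs (Express 5 forwards rejected promises from route handlers to next automatically). A common workaround is a small wrapper. asyncHandler below is a hypothetical helper, not part of Express, and the demonstration uses stub objects in place of req, res and next so it runs without Express installed.

```javascript
// Hypothetical helper (not an Express API): wraps a route handler so that both a
// synchronous throw and a rejected promise are forwarded to next(err), where the
// error-handling middleware can respond with a 500.
function asyncHandler(handler) {
  return (req, res, next) =>
    Promise.resolve()
      .then(() => handler(req, res, next))
      .catch(next);
}

// Stub next() that logs what it receives, standing in for Express's next.
function makeNext(label) {
  return (err) => console.log(`${label}: next() received "${err.message}"`);
}

const asyncFailing = asyncHandler(async () => {
  throw new Error('async failure');
});
const syncFailing = asyncHandler(() => {
  throw new Error('sync failure');
});

asyncFailing({}, {}, makeNext('async route'));
syncFailing({}, {}, makeNext('sync route'));
```

With Express, the wrapper would be applied per route, for example app.get('/asyncRoute', asyncHandler(async (req, res) => { ... })), where /asyncRoute is a hypothetical path.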

Unhandled Promise Rejection:

process.on('unhandledRejection', (reason, promise) => {
  console.error('Unhandled Promise Rejection:', reason);
});

The following code produces such an unhandled rejection:

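const fs = require('fs').promises;

// Rejects if the file cannot be read (for example, ENOENT when it does not exist)
async function readFile(filePath) {
  const fileContents = await fs.readFile(filePath, 'utf8');
  return fileContents;
}

async function main() {
  const fileContents = await readFile('nonexistent-file.txt');
  console.log(fileContents);
}

// No .catch() is attached, so the rejection from readFile goes unhandled
main();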

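Because readFile returns a promise that rejects when the file doesn't exist, the caller has to handle that rejection, either with try/catch around await or by attaching a .catch() handler. This example does neither, and main() is called without a .catch(), so the rejection becomes an unhandled rejection. Since Node.js 15, an unhandled rejection terminates the process by default unless an 'unhandledRejection' listener is registered.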
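A minimal sketch of handling that rejection where it occurs, using the same file name as the example above: readFileOrNull is a hypothetical variant of readFile that awaits inside try/catch and treats a missing file (error code ENOENT) as an expected case, while main() gets a .catch() so nothing escapes as an unhandled rejection.

```javascript
const fs = require('fs').promises;

// Hypothetical variant of readFile: a missing file is an expected case and yields null;
// any other failure (e.g. a permissions error) is rethrown for the caller to handle.
async function readFileOrNull(filePath) {
  try {
    return await fs.readFile(filePath, 'utf8');
  } catch (err) {
    if (err.code === 'ENOENT') {
      console.error(`File not found: ${filePath}`);
      return null;
    }
    throw err;
  }
}

async function main() {
  const fileContents = await readFileOrNull('nonexistent-file.txt');
  console.log(fileContents === null ? 'No file to read.' : fileContents);
}

// Attaching .catch() to the top-level promise means a rethrown error is handled
// here instead of becoming an unhandled rejection.
main().catch((err) => {
  console.error('main() failed:', err);
});
```

The process-level 'unhandledRejection' listener then remains a safety net for rejections that slip past code like this.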

Uncaught Exception:

Some errors never reach the global error-handling middleware because they are thrown outside the request/response cycle, but they can still be caught by a process.on('uncaughtException') listener.

process.on('uncaughtException', (err) => {
  console.error('Uncaught Exception:', err);
});

Here are a few examples:

  1. Syntax errors in module loading: If a module loaded during application startup contains a syntax error, require() throws outside the request/response cycle, so your global error handler middleware never sees it. A process.on('uncaughtException') listener can still catch the error, provided it is registered before the failing require() runs.
  2. Errors from child processes: If your application spawns child processes using the child_process module and a spawn fails (for example, because the command does not exist), the ChildProcess object emits an 'error' event. If no 'error' listener is attached, that error is thrown in the parent process outside any request, so the middleware cannot catch it, but process.on('uncaughtException') can. Errors thrown inside the child process itself stay in that process and never reach the parent's handlers.
  3. Errors thrown during application shutdown: Shutdown code, such as a SIGTERM handler or a server.close() callback, runs outside any request, so your global error handler middleware does not apply to it. In this case, process.on('uncaughtException') can still catch the error.
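To illustrate the second case, the sketch below runs a small script in a separate Node.js process so that its global listener affects nothing else. The script registers the 'uncaughtException' listener, then spawns a command that does not exist without attaching an 'error' listener; the command name is hypothetical and assumed to be absent from the PATH. Keep in mind that the Node.js documentation describes 'uncaughtException' as a last-resort mechanism and warns that the application may be in an undefined state afterwards, so this listener is a safety net rather than a substitute for handling errors where they occur.

```javascript
const { spawnSync } = require('child_process');

// Illustrative script: 'hypothetical-missing-command' is assumed not to exist,
// and no 'error' listener is attached to the ChildProcess returned by spawn().
const childScript = `
  process.on('uncaughtException', (err) => {
    console.error('Uncaught Exception:', err.code);
  });
  require('child_process').spawn('hypothetical-missing-command');
  setTimeout(() => console.log('still running'), 100);
`;

const result = spawnSync(process.execPath, ['-e', childScript], { encoding: 'utf8' });

// The listener handled the ENOENT error, so the timer ran and the process exited normally.
console.log('exit code:', result.status);
console.log('stdout:', result.stdout.trim());
console.log('stderr:', result.stderr.trim());
```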

So what happens to the client's request if an error ends up in one of these process-level handlers?

// Handle unhandled promise rejections
process.on('unhandledRejection', (reason, promise) => {
  console.error('Unhandled Promise Rejection:', reason);
});

// Handle uncaught exceptions
process.on('uncaughtException', (err) => {
  console.error('Uncaught Exception:', err);
});

No response is sent, so the request times out on the client side. Behind AWS API Gateway, for example, the client waits for the integration timeout (roughly 30 seconds by default) before receiving a timeout error.
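One way to avoid leaving the client waiting on the gateway's timeout is a per-request timer that sends an error response of its own; because the process-level handlers keep the process alive, the timer still fires after the error. requestTimeout below is a hypothetical Express-style middleware, 503 is one reasonable status choice, and the demonstration uses a stub response object so it runs without Express.

```javascript
const { EventEmitter } = require('events');

// Hypothetical Express-style middleware: if no response has been sent within `ms`
// milliseconds, reply with 503 instead of leaving the client waiting.
function requestTimeout(ms) {
  return (req, res, next) => {
    const timer = setTimeout(() => {
      if (!res.headersSent) {
        res.status(503).json({ error: 'Request timed out. Please try again.' });
      }
    }, ms);
    // Cancel the timer once the response completes or the connection closes.
    res.on('finish', () => clearTimeout(timer));
    res.on('close', () => clearTimeout(timer));
    next();
  };
}

// Stub response standing in for Express's res: status() and json() record the reply.
function makeStubResponse() {
  const res = new EventEmitter();
  res.headersSent = false;
  res.status = (code) => { res.statusCode = code; return res; };
  res.json = (body) => {
    res.body = body;
    res.headersSent = true;
    res.emit('finish');
    return res;
  };
  return res;
}

// Simulate a handler that never responds, e.g. because its error went to uncaughtException.
const hungResponse = makeStubResponse();
requestTimeout(50)({}, hungResponse, () => {});
setTimeout(() => console.log('status sent:', hungResponse.statusCode), 100);
```

With Express, the middleware would be registered before the routes, for example app.use(requestTimeout(5000)), where 5000 ms is an assumed value to tune against your own latency. A route that finishes after the timeout should check res.headersSent before replying, since Express refuses to send a second response.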

Conclusion

After adding these global error handlers, our process kept running even when the application code hit unhandled error conditions. Handling errors this way is essential for a stable and scalable system, whatever coding environment you have chosen.


Sudheer Kumar

Experienced Cloud Architect, Infrastructure Automation, Technical Mentor