📘 Use Case
Across several Oracle Integration Cloud (OIC) projects, we identified recurring issues that affected logging, error tracking, retry handling, and monitoring through tools like DataDog. Below are the key observations and the solutions we applied.
1. Suppressed Error Details
- Observation: OIC sends only a generic error message to DataDog or external logs, hiding the actual root cause.
- Solution: Capture the actual faultMessage in error handlers and send it to DataDog along with other details for better troubleshooting.
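In OIC this is configured in the fault handler itself, so the following Python is only a language-neutral sketch of what the outgoing log event could contain. The fault fields (`errorCode`, `reason`, `details`), the function name `build_error_log`, and the event layout are assumptions made for illustration, not a confirmed OIC or DataDog schema.

```python
import json


def build_error_log(integration, instance_id, fault):
    """Build a log event that carries the real fault detail, not just a generic summary."""
    return {
        "service": integration,
        "status": "error",
        # Fall back to a generic message only when the fault carries no reason.
        "message": fault.get("reason") or "Integration failed",
        "oic_instance_id": instance_id,
        "error_code": fault.get("errorCode"),
        "fault_details": fault.get("details"),
    }


# Hypothetical fault as it might be captured inside an error handler.
fault = {
    "errorCode": "502",
    "reason": "Bad Gateway",
    "details": "Connection reset while calling the target endpoint",
}
event = build_error_log("INVOICE_SYNC", "12345", fault)
print(json.dumps(event, indent=2))
```

The point of the sketch is that the detailed fault text travels with the event, so whoever reads the log does not have to open the OIC instance to find the root cause.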
2. No Retry for Temporary Errors
- Observation: Transient connectivity or network issues fail immediately without any retry attempt.
- Solution: Add retry logic using fault handlers or scopes for specific error types (like connection timeouts or 5xx errors).
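The retry pattern can be sketched outside OIC as a small loop with exponential backoff. This is a minimal illustration under assumptions: `TransientError`, `check_status`, and `invoke_with_retry` are hypothetical names, and the attempt count and delays are arbitrary defaults rather than recommended values.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as timeouts or HTTP 5xx."""


def check_status(status_code):
    """Raise TransientError for 5xx responses so the caller can retry them."""
    if 500 <= status_code <= 599:
        raise TransientError(f"server returned {status_code}")
    return status_code


def invoke_with_retry(call, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying only TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise  # Retries exhausted: surface the failure to the caller.
            sleep(base_delay * 2 ** (attempt - 1))


# Simulated endpoint that fails twice with 503 and then succeeds.
responses = iter([503, 503, 200])
delays = []  # Record the waits instead of actually sleeping.
result = invoke_with_retry(lambda: check_status(next(responses)), sleep=delays.append)
print(result, delays)
```

Only the transient error type is retried; any other exception propagates on the first attempt, which mirrors limiting the retry scope to specific fault types.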
3. Missing Correlation ID for Fusion Failures
- Observation: When a Fusion ESS job fails, the logs don’t include any identifier like the ESS Job ID or request ID, making it hard to trace.
- Solution: Extract and log the ESS request ID or other correlation IDs from Fusion and include them in your custom logs.
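As a rough sketch of that idea, the function below pulls a request ID out of a Fusion response and attaches it to the failure log. The response field names (`ReqstId`, `RequestStatus`) and the function name `ess_failure_log` are assumptions for illustration; the actual field names depend on which Fusion service the integration calls.

```python
def ess_failure_log(integration, instance_id, ess_response, id_field="ReqstId"):
    """Build a failure log that carries the ESS request ID for tracing."""
    request_id = ess_response.get(id_field)
    return {
        "service": integration,
        "status": "error",
        "message": f"ESS job failed (request ID: {request_id or 'unknown'})",
        "oic_instance_id": instance_id,
        "ess_request_id": request_id,
        "ess_status": ess_response.get("RequestStatus"),
    }


# Hypothetical response from a failed ESS job submission.
response = {"ReqstId": "987654", "RequestStatus": "ERROR"}
log = ess_failure_log("GL_IMPORT", "12345", response)
print(log["message"])
```

With both the OIC instance ID and the ESS request ID in one event, a failure can be traced from the monitoring tool to the exact job in Fusion.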
4. Payload Not Validated
- Observation: OIC flows sometimes try to process empty or null payloads, which leads to schema errors or misleading messages.
- Solution: Add condition checks early in the flow to verify if payloads contain data before proceeding to mappings or invokes.
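In OIC this check would typically be a condition placed near the start of the flow; the Python below only illustrates the logic. The helper name `has_data` and the sample field `invoices` are hypothetical.

```python
def has_data(payload, required=()):
    """Return True only if the payload is a non-empty dict and every required field holds a value."""
    if not isinstance(payload, dict) or not payload:
        return False
    empty_values = (None, "", [], {})
    return all(payload.get(field) not in empty_values for field in required)


# Decide early whether the flow should continue to mappings and invokes.
for candidate in (None, {}, {"invoices": []}, {"invoices": [{"id": 1}]}):
    action = "process" if has_data(candidate, required=("invoices",)) else "skip and log"
    print(candidate, "->", action)
```

Skipping early with a clear log entry avoids the schema errors and misleading messages that appear when an empty payload reaches a mapping.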
5. Only Errors Logged, Not Success
- Observation: DataDog or similar tools receive only error logs, and successful integrations are not tracked, affecting KPI reporting.
- Solution: Log success cases as well, including important business identifiers like invoice number, PO number, or employee ID for better tracking.
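A success event can be sketched the same way as an error event, with the business identifiers attached so that reports can be filtered by them. The function name `build_success_log` and the identifier keys are assumptions for illustration.

```python
def build_success_log(integration, instance_id, business_ids):
    """Build a success event that includes business identifiers for tracking."""
    return {
        "service": integration,
        "status": "info",
        "message": f"{integration} completed successfully",
        "oic_instance_id": instance_id,
        # Drop identifiers that were not available for this run.
        "business_ids": {k: v for k, v in business_ids.items() if v is not None},
    }


success = build_success_log(
    "INVOICE_SYNC",
    "12345",
    {"invoice_number": "INV-1001", "po_number": None},
)
print(success)
```

Once success and error events share the same shape, a success rate per integration can be computed directly in the monitoring tool.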
6. Integration Timeout Not Handled
7. Overuse of Hardcoded Values
8. No Archival or Logging of Request Payloads
9. Overloaded Error Handlers Catching Everything
10. Lack of Version Control or Documentation
11. Poor Use of Data Stitching (Unnecessary Variables)
12. Integration Not Idempotent
🎯 Outcome
Implementing these improvements helped us:
- Get full visibility into success and failure cases
- Reduce debugging time
- Improve monitoring accuracy in tools like DataDog
- Increase reliability of integrations with retry logic