Logging can be a verbose and noisy experience, especially if you aren’t familiar with the underlying codebase. It can also be the most powerful tool in your arsenal as a Jenkins Administrator. Most Jenkins users rely on nothing more than the pass-or-fail outcome of their project; what if you could empower those users to self-serve and shorten the cycle time to issue resolution?
Where do I find the logs?
As with any enterprise Java application, Jenkins produces application-level logging as well as logging at the JVM level. Most Jenkins Administrators are familiar with the System Log that is accessible from the user interface by visiting Manage Jenkins > System Log.
The System Log provides visibility into java.util.logging output related to Jenkins. In CloudBees Jenkins Distributions, the Support Core Plugin, which is installed by default, provides additional logging to ${JENKINS_HOME}/logs/jenkins.log, which makes sending log messages to a central log server easy. You can read more about default logging in the Jenkins Wiki.
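For example, on a typical Linux installation you can follow that file directly from a shell (the paths and service name below are assumptions — adjust them to match your installation):

```
# ${JENKINS_HOME} is commonly /var/lib/jenkins on Linux package installs.
tail -f "${JENKINS_HOME}/logs/jenkins.log"

# JVM-level output (stdout/stderr) usually lands wherever your service manager
# sends it, e.g. `journalctl -u jenkins` on systemd-based hosts.
```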
Additionally, once you employ our documented best practices, there will be additional logging endpoints around JVM activity, most notably:
-Xloggc:${PATH}, -XX:ErrorFile=${PATH}, and -XX:LogFile=${PATH}
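As a rough sketch of how those flags fit together on a JDK 8-era controller (this is not the complete set of flags from the best-practices guide, and the log locations are placeholders you would substitute for your own):

```
# Placeholder paths — point these at a disk location you actually monitor.
JAVA_OPTS="$JAVA_OPTS \
  -Xloggc:/var/log/jenkins/gc-%t.log \
  -XX:ErrorFile=/var/log/jenkins/hs_err_%p.log \
  -XX:+UnlockDiagnosticVMOptions -XX:+LogVMOutput \
  -XX:LogFile=/var/log/jenkins/jvm.log"
# On JDK 9 and later, GC logging moves to the unified -Xlog:gc* syntax.
```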
Reading a stack trace
When things go wrong in Jenkins, you will see a stack trace that looks similar to this:
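Here is an illustrative, made-up example (the plugin classes, messages, and line numbers are hypothetical):

```
java.io.IOException: Unable to read manifest for build artifact
	at com.example.plugins.artifact.ManifestReader.read(ManifestReader.java:52)
	at com.example.plugins.artifact.ArchiveStep.perform(ArchiveStep.java:31)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	...
Caused by: java.io.FileNotFoundException: /workspace/build/manifest.json (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	... 4 more
```

Read it top-down: the first line names the Exception and its message, each "at" frame walks back through the call chain (with the file and line number in parentheses), and any "Caused by:" section underneath points at the original, root-cause exception.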
“What is all this mess?” you might say. Well, in short, this is the Java application responding in kind, “Look here buddy, this is exactly where the problem is.” Usually, a stack trace is shown when an Exception is not handled correctly in code. This may be one of the built-in Exception types or a custom Exception created by a program or a library.
Learning how to read a stack trace is a powerful tool to add to your toolbelt as a Jenkins Administrator, as it can show you a wide range of data, including the exact line in the codebase where the problem originated.
Gather all the data
At times, as Jenkins Administrators, we can get caught up on the final error, which is often the result of other errors that happened earlier during the run. One common pitfall is being handed the error without the full stack trace that precedes it. For instance, let’s look at the following build error that an end user reported to the Jenkins Administrator when their build failed:
ERROR: Step ‘Archive the artifacts’ failed: no workspace for Development/My_Dev_Job #2
ERROR: Step ‘Publish JUnit test result report’ failed: no workspace for Development/My_Dev_Job #2
ERROR: Step ‘Record JaCoCo coverage report’ failed: no workspace for Development/My_Dev_Job #2
Sending e-mails to: administrator@business.com
Finished: FAILURE
Now, at first glance, we can see that the build is failing, and we see several ERROR messages. This leads us to believe that the workspace for this build is not available. In theory, this could lead us down several paths to diagnose the issue:
Storage
Networking
Host
Pipeline Code
So, how do we narrow down the issue? The first step is gathering the data in its entirety. In this case, as the Jenkins Administrator, I know that this error is coming from Build #2 of Development/My_Dev_Job, so I could gather the console logs there, which may provide some additional data.
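If it is easier than clicking through the UI, the full console output can also be pulled directly; the hostname and credentials below are placeholders for your own environment:

```
# Plain-text console output over HTTP:
curl -u myuser:my-api-token \
  "https://jenkins.example.com/job/Development/job/My_Dev_Job/2/consoleText"

# Or straight from disk on the controller:
cat "${JENKINS_HOME}/jobs/Development/jobs/My_Dev_Job/builds/2/log"
```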
BINGO! In doing so, I am able to see the full stack trace that prefaced the errors I was provided:
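(The excerpt below is only illustrative of what such a trace typically looks like; the exact classes, messages, and line numbers depend on your Jenkins and Remoting versions.)

```
FATAL: command execution failed
java.nio.channels.ClosedChannelException
	at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(...)
	...
Caused: hudson.remoting.ChannelClosedException: Channel "JNLP4-connect connection from
    ip-10-10-10-101.us-west-2.compute.internal/10.10.10.101:55555": channel is already closed
	at hudson.remoting.Channel.call(...)
	...
```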
Note that the error is FATAL, and the cause appears to be that the remoting channel has lost connection and has closed. From these console logs, I can gather the following data:
Time of the failure
Cause of the failure
Since I know this build issue appears to have occurred at 2020-01-22 20:06:58, I should check the Jenkins logs for anything that seems suspicious around this time.
Sure enough, in looking at the Jenkins logs, I see the following:
2020-01-22 20:06:58.217+0000 [id=124635] INFO hudson.slaves.ChannelPinger$1#onDead: Ping failed. Terminating the channel JNLP4-connect connection from ip-10-10-10-101.us-west-2.compute.internal/10.10.10.101:55555.
java.util.concurrent.TimeoutException: Ping started at 1579723378215 hasn't completed by 1579723618216
	at hudson.remoting.PingThread.ping(PingThread.java:134)
	at hudson.remoting.PingThread.run(PingThread.java:90)
Searching the CloudBees Support Knowledge Base, I found this article, which explains that this error is caused by the ping thread and that the recommended fix is to disable the ping thread. Keep in mind that I was only able to get to the root of the issue because I had all the data in hand. Can you imagine how much time I just saved, for both myself and my end users?
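For reference, disabling the ping thread typically means passing a system property to the JVM on the controller (and agents, where applicable). At the time of writing it looks like the sketch below, but defer to the linked article for current guidance:

```
# A negative interval disables the remoting ping thread.
-Dhudson.slaves.ChannelPinger.pingIntervalSeconds=-1
```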
Log like a pro
Come join Alex Taylor and me on Wednesday, January 29th for the latest in our Jenkins masterclass series, where we dive further into these topics and discuss real-world cases where logging can be the equalizer in the battle to tame your CI/CD pipeline. In this free, 45-minute, expert-led presentation, you will learn how and when to approach the Jenkins logs without in-depth knowledge of the codebase, where to go when you need additional logging, and how to read those logs so that you can gain understanding. You will walk away with a powerful skill set to use when there is no obvious root cause presented to you.