# Using Cache in Pipeline

In CI, pipelines are often used to perform tasks such as compiling and building. In modern languages, whether Java, Node.js, Python, or Go, dependencies must be downloaded before build tasks can run. This process often consumes a large amount of network bandwidth, slows down pipeline builds, and becomes a bottleneck in CI/CD, reducing production efficiency. The same applies to cache files generated by lint checks or by SonarQube code scans: if these processes start from scratch on every run, the caching mechanisms built into the tools themselves go unused.

The application workspace itself provides a cache mechanism based on the Kubernetes `hostPathVolume`, using a local path on the node to cache default paths such as `/root/.m2`, `/home/jenkins/go/pkg`, and `/root/.cache/pip`. However, in the multi-tenant scenario of DCE 5.0, many users want cache isolation to avoid interference and conflicts. This page introduces a cache mechanism based on the Jenkins plugin [Job Cacher](https://plugins.jenkins.io/jobcacher/). With Job Cacher, you can use AWS S3 or an S3-compatible storage system (such as MinIO) to achieve pipeline-level cache isolation.

## Preparation

1. Provide an S3 or S3-compatible storage backend. You can refer to [Create MinIO Instance - DaoCloud Enterprise](https://docs.daocloud.io/middleware/minio/user-guide/create.html) to create a MinIO instance on DCE 5.0, create a bucket, and prepare an `access key` and `secret`.

    ![Prepare S3](https://docs.daocloud.io/daocloud-docs-images/docs/amamba/images/job-cacher01.png)

2. In Jenkins, go to **Manage Jenkins** -> **Manage Plugins** and install the job-cacher plugin:

    ![Install Plugin](https://docs.daocloud.io/daocloud-docs-images/docs/amamba/images/job-cacher02.png)

3. If you want to use S3 storage, you also need to install the following plugins:

    ```yaml
    - groupId: org.jenkins-ci.plugins
      artifactId: aws-credentials
      source:
        version: 218.v1b_e9466ec5da_
    - groupId: org.jenkins-ci.plugins.aws-java-sdk
      artifactId: aws-java-sdk-minimal # dependency for aws-credentials
      source:
        version: 1.12.633-430.vf9a_e567a_244f
    - groupId: org.jenkins-ci.plugins
      artifactId: jackson2-api # dependency for other plugins
      source:
        version: 2.16.1-373.ve709c6871598
    ```

!!! note

    The Helm chart shipped with Amamba v0.3.2 and earlier corresponds to Jenkins 2.414. Testing has shown that Job Cacher 399.v12d4fa_dd3db_d on that Jenkins version cannot correctly recognize the S3 configuration. Make sure to use upgraded versions of both Jenkins and Job Cacher.

## Configuration

In the **Manage Jenkins** interface, configure the S3 parameters as follows:

![Configure Plugin](https://docs.daocloud.io/daocloud-docs-images/docs/amamba/images/job-cacher03.png)

Alternatively, you can modify the ConfigMap via CasC (Configuration as Code) to persist the configuration. An example of the corresponding YAML is as follows:

```yaml
unclassified:
  ...
  globalItemStorage:
    storage:
      nonAWSS3:
        bucketName: jenkins-cache
        credentialsId: dOOkOgwIDUEcAYxWd9cF
        endpoint: http://10.6.229.90:30404
        region: Auto
        signerVersion:
        parallelDownloads: true
        pathStyleAccess: false
```
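The `credentialsId` above refers to an AWS-style access key/secret key credential stored in Jenkins. If you manage credentials through CasC as well, that credential can be declared in the same file. The following is a minimal sketch, assuming the aws-credentials plugin's CasC support (the `aws` credential symbol); the id `minio-cache` and the key values are placeholders, not values from this guide:

```yaml
credentials:
  system:
    domainCredentials:
      - credentials:
          # AWS-style credential consumed by Job Cacher's S3 storage backend.
          # The id here is what you would reference in
          # globalItemStorage.storage.nonAWSS3.credentialsId.
          - aws:
              scope: GLOBAL
              id: minio-cache
              accessKey: "<MINIO_ACCESS_KEY>"
              secretKey: "<MINIO_SECRET_KEY>"
              description: "MinIO credentials for pipeline cache"
```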
## Usage

After completing the above configuration, you can use the `cache` step provided by Job Cacher in a Jenkinsfile. For example, in the following pipeline:

```groovy
pipeline {
    agent {
        node {
            label 'nodejs'
        }
    }
    stages {
        stage('clone') {
            steps {
                git(url: 'https://gitlab.daocloud.cn/ndx/engineering/application/amamba-test-resource.git', branch: 'main', credentialsId: 'git-amamba-test')
            }
        }
        stage('test') {
            steps {
                // Record the current commit hash; it serves as the cache validity key
                sh 'git rev-parse HEAD > .cache'
                cache(caches: [
                    arbitraryFileCache(
                        path: "pipeline-template/nodejs/node_modules",
                        includes: "**/*",
                        cacheValidityDecidingFile: ".cache",
                    )
                ]) {
                    sh 'cd pipeline-template/nodejs/ && npm install && npm run build && npm install jest jest-junit && npx jest --reporters=default --reporters=jest-junit'
                    junit 'pipeline-template/nodejs/junit.xml'
                }
            }
        }
    }
}
```

This pipeline defines two stages, clone and test. In the test stage, all files under `node_modules` are cached to avoid fetching npm packages on every run. The `.cache` file, which holds the current commit hash, is declared as the cache validity key: as soon as the current branch receives any update, the cache is invalidated and npm packages are fetched again.

After the pipeline completes, you can see that the second run takes significantly less time:

![Run Pipeline Log](https://docs.daocloud.io/daocloud-docs-images/docs/amamba/images/job-cacher04.png)

![Run Pipeline Result](https://docs.daocloud.io/daocloud-docs-images/docs/amamba/images/job-cacher05.png)

For more options, refer to the plugin documentation: [Job Cacher | Jenkins plugin](https://plugins.jenkins.io/jobcacher/)

## Other

- **About performance**: Job Cacher is implemented on top of `MasterToSlaveFileCallable`, so uploads and downloads are performed directly from the agent via remote calls, rather than going through agent -> controller -> S3.
- **About cache size**: Job Cacher supports several compression algorithms, including `ZIP`, `TARGZ`, `TARGZ_BEST_SPEED`, `TAR_ZSTD`, and `TAR`, with `TARGZ` as the default.
- **About cache cleanup**: Job Cacher supports setting `maxCacheSize` per pipeline; see the sketch after this list.
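As an illustration of the last two points, the sketch below combines a per-pipeline size limit with an explicit compression method. Parameter names follow the Job Cacher plugin documentation; the path, lockfile, and values are placeholders rather than settings from this guide:

```groovy
// Minimal sketch: cap this pipeline's cache at 512 MB and use zstd compression.
cache(maxCacheSize: 512, caches: [
    arbitraryFileCache(
        path: 'node_modules',
        cacheValidityDecidingFile: 'package-lock.json', // re-key the cache when the lockfile changes
        compressionMethod: 'TAR_ZSTD'                   // one of ZIP, TARGZ, TARGZ_BEST_SPEED, TAR_ZSTD, TAR
    )
]) {
    sh 'npm ci'
}
```

When the total cache size exceeds `maxCacheSize`, the plugin discards the cache instead of letting it grow without bound, which keeps stale dependencies from accumulating in the S3 bucket.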