Latest Submission Grade 100%
1.
Question 1
What are the three key components we should consider when serving an ML Model in a production environment? (Select all that apply)

1 / 1 point

> An interpreter

Correct
Right on track! An Interpreter encapsulates a pre-trained model in which operations are executed for inference.


An orchestrator


> Input Data

Correct
You’ve got it!  The model executed on-device makes predictions based on the input data.


> A model

Correct
Correct! Providing the algorithm and training the ML model is the first step towards putting it into production.

2.
Question 2
What happens after a while in operation to an offline-trained model dealing with new real-live data?

1 / 1 point

The model adapts to new patterns.


The model abruptly forgets all previously learned information.


> The model becomes stale.

Correct
Good job!  The model performance deteriorates to the point of the model not being any longer fit for purpose. This phenomenon is called model decay and should be carefully monitored.

3.
Question 3
In applications that are not user-facing, is throughput more critical than latency for customer satisfaction?

1 / 1 point

No, because users might complain that the app is too slow.


> Yes, in this case, we are concerned with maximizing throughput with the lowest CPU usage.

Correct
Correct! Latency is not a key concern for back-end services.

4.
Question 4
Nowadays, developers aim to minimize latency and maximize throughput in customer-facing applications. However, in doing so, infrastructure scales and costs increase. So, what strategies can developers implement to balance cost and customer satisfaction? (Select all that apply)

1 / 1 point

> GPU sharing

Correct
Nailed it! This strategy reduces the cost of GPU-accelerated computing.


> Multi-model serving

Correct
Yes! This approach scales back infrastructure.


> Optimizing inference models

Correct
Right on track! Optimization modifies a model to handle a higher load, reducing costs as a result.


Stress testing