# Paul Graham Essay Dataset

## CLI Usage

You can download `llamadatasets` directly using `llamaindex-cli`, which comes installed with the `llama-index` Python package:

```bash
llamaindex-cli download-llamadataset PaulGrahamEssayDataset --download-dir ./data
```

You can then inspect the files at `./data`. When you're ready to load the data into Python, you can use the snippet below:

```python
from llama_index import SimpleDirectoryReader
from llama_index.llama_dataset import LabelledRagDataset

rag_dataset = LabelledRagDataset.from_json("./data/rag_dataset.json")
documents = SimpleDirectoryReader(
    input_dir="./data/source_files"
).load_data()
```

## Code Usage

You can also download the dataset to a directory, say `./data`, directly in Python. From there, you can use the convenient `RagEvaluatorPack` llamapack to run your own LlamaIndex RAG pipeline with the `llamadataset`.

```python
from llama_index.llama_dataset import download_llama_dataset
from llama_index.llama_pack import download_llama_pack
from llama_index import VectorStoreIndex

# download and install dependencies for benchmark dataset
rag_dataset, documents = download_llama_dataset(
    "PaulGrahamEssayDataset", "./data"
)

# build basic RAG system
index = VectorStoreIndex.from_documents(documents=documents)
query_engine = index.as_query_engine()

# evaluate using the RagEvaluatorPack
RagEvaluatorPack = download_llama_pack(
    "RagEvaluatorPack", "./rag_evaluator_pack"
)
rag_evaluator_pack = RagEvaluatorPack(
    rag_dataset=rag_dataset, query_engine=query_engine
)

############################################################################
# NOTE: If you have a lower tier subscription for the OpenAI API, like
# Usage Tier 1, then you'll need to use a different batch_size and
# sleep_time_in_seconds. For Usage Tier 1, settings that seemed to work
# well were batch_size=5 and sleep_time_in_seconds=15 (as of December 2023).
############################################################################

# top-level `await` works in a notebook; in a script, run this inside an
# async function driven by asyncio.run(...)
benchmark_df = await rag_evaluator_pack.arun(
    batch_size=20,  # batches the number of openai api calls to make
    sleep_time_in_seconds=1,  # seconds to sleep before making an api call
)
```
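
The `batch_size` and `sleep_time_in_seconds` arguments to `arun` trade throughput against API rate limits. As a rough illustration of that general pattern, here is a minimal sketch. It is not the `RagEvaluatorPack` implementation: `fake_api_call` and `run_in_batches` are hypothetical names invented for this example, and the real pack may schedule its calls differently.

```python
import asyncio


async def fake_api_call(i: int) -> int:
    # Hypothetical stand-in for a single rate-limited API request.
    await asyncio.sleep(0)
    return i * 2


async def run_in_batches(items, batch_size: int, sleep_time_in_seconds: float):
    # Illustrative only: process items in groups of `batch_size`,
    # sleeping before each group so requests are spread out over time.
    results = []
    for start in range(0, len(items), batch_size):
        batch = items[start : start + batch_size]
        await asyncio.sleep(sleep_time_in_seconds)
        # Calls within one batch run concurrently; gather keeps input order.
        results.extend(await asyncio.gather(*(fake_api_call(i) for i in batch)))
    return results


# In a notebook, use `results = await run_in_batches(...)` instead.
results = asyncio.run(
    run_in_batches(list(range(12)), batch_size=5, sleep_time_in_seconds=0.01)
)
```

Under this sketch, a smaller `batch_size` with a longer `sleep_time_in_seconds` lowers the request rate at the cost of a longer total runtime, which matches the reason the note above suggests different settings for lower usage tiers.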