\cvsection{Experience}
{\small
\cvevent{Senior Data Scientist $\rightarrow$ Lead Data Scientist}{Dataside}{Oct 2023 -- Now}{São José dos Campos, Brazil (Remote)}
\begin{itemize}
  \item Initially joined as Data Scientist; promoted to Lead Data Scientist in mid-2024.
  \item Mentored junior data scientists in ML, generative AI, and MLOps; led client engagement and solution design for successful project acquisition.
  \item Built agentic RAG platforms for law and analytics with multi-format ingestion (PDF / Excel / Image / Text) and automated workflows (translation, summarization, legal analytics, NL database queries), reducing manual workload and costs. These RAG solutions combined hybrid search (embeddings + BM25) and metadata filtering with self-querying retrievers.
  \item As solution architect, designed and partially implemented an automated candidate resume evaluation system leveraging embeddings and reranking to match candidates to job descriptions. Integrated agentic workflows to auto-generate interview questions and dynamically rank candidates as evaluations progressed. The system continuously updated rankings and interview questions to address gaps identified in prior candidates, ensuring targeted assessments. Additionally, optimized an existing version for lower latency and reduced token usage.
  \item Developed advanced sales forecasting models with conformal prediction and quantile regression to minimize underestimation risk and control overestimation.
  \item Created hybrid classification systems (TF-IDF + LLM embeddings) for product categorization with calibrated probability rejection, improving accuracy and reducing revision time.
  \item Used the DSPy framework to build few-shot classifiers and RAG solutions from client databases, optimizing few-shot example selection for prompting via Bayesian search.
  \item Delivered agentic RAG solutions for structured data extraction from unstructured documents (publishers, law firms) using multimodal ingestion and table detection, achieving high accuracy. Leveraged hybrid search (embeddings + BM25 or TF-IDF), metadata filtering via self-querying retrievers, parent document retrieval, and structured outputs with Pydantic validation for LLM-based metadata enrichment.
  \item Built computer vision systems for health and food sectors: (1) liquid volume estimation from photos; (2) food tray detection/classification for automated consumption tracking.
  \item Additional: custom chatbots, knowledge extraction, clustering, and outlier detection systems.
  \item Tech stack:
  \begin{itemize}
    \item \textbf{Languages}: {\color{accent2}Python, Julia, JavaScript, C\#, SQL, Bash}
    \item \textbf{AI/ML}: {\color{accent2}PyTorch, Lightning, MLflow, Scikit-Learn, Optuna, PyCaret}
    \item \textbf{LLM/NLP}: {\color{accent2}Azure OpenAI, LangChain, RAG, HuggingFace, CrewAI, PydanticAI, LangGraph}
    \item \textbf{Data}: {\color{accent2}Databricks, Pandas, Polars, NumPy, Dask, PySpark, Pinecone, Weaviate, Chroma, PostgreSQL, Redis, DSPy}
    \item \textbf{MLOps}: {\color{accent2}Azure, Docker, CI/CD, GitHub Actions}
    \item \textbf{APIs}: {\color{accent2}FastAPI, Flask}
    \item \textbf{CV}: {\color{accent2}OpenCV, Open3D, Scikit-Image, Shapely}
  \end{itemize}
\end{itemize}
\cvevent{Large Language Models Consultant}{Vortigo}{Nov 2023 -- Jun 2024}{Porto Alegre, Brazil (Remote)}
\begin{itemize}
  \item As an LLM consultant, I collaborated with a Brazilian tech company on the design and implementation of assistant chatbots grounded in proprietary source code and spreadsheet knowledge bases, using OpenAI's paid API and pretrained large language models such as GPT-3.5 and GPT-4.
  \item Tech stack:
  \begin{itemize}
    \item \textbf{Languages \& Core}: {\color{accent2}Python}
    \item \textbf{AI \& ML}: {\color{accent2}PyTorch, PyTorch Lightning, Pandas, Scikit-Learn, NumPy, Jupyter Notebooks, AWS SageMaker}
    \item \textbf{LLM \& NLP}: {\color{accent2}OpenAI API, HuggingFace, LangChain, BERTopic}
  \end{itemize}
\end{itemize}
\cvevent{Sabbatical Period}{To focus on my generative art projects}{Apr 2023 -- Oct 2023}{Porto Alegre, Brazil}
\begin{itemize}
  \item I took a short sabbatical to focus on my generative art projects and to maintain and improve the Python packages I had built to support my artistic process, including \href{https://github.com/marcelopprates/prettymaps}{prettymaps}.
\end{itemize}
\divider
\cvevent{Machine Learning / Computer Vision Consultant}{ConstructIN}{Mar 2022 -- Apr 2023}{Porto Alegre, Brazil (Remote)}
\begin{itemize}
  \item As a senior ML \& Computer Vision consultant, I spearheaded the design and implementation of advanced computer vision solutions for automated construction site monitoring using 360-degree photography, leveraging state-of-the-art deep learning architectures.
  \item Led a team of 3 data scientists in developing and deploying 4 production-ready computer vision applications, including a comprehensive analytics dashboard that enabled real-time construction progress monitoring through custom semantic segmentation models. The solution significantly improved project tracking efficiency and decision-making capabilities for clients.
  \item Tech stack:
  \begin{enumerate}
    \item \textbf{Languages \& Core}: {\color{accent2}Python}
    \item \textbf{AI \& ML}: {\color{accent2}TensorFlow, Keras, PyTorch, PyTorch Lightning, Pandas, Scikit-Learn, SciPy, NumPy, Matplotlib}
    \item \textbf{Computer Vision}: {\color{accent2}Open3D, OpenCV, Scikit-Image}
    \item \textbf{MLOps \& Infrastructure}: {\color{accent2}AWS, Docker}
    \item \textbf{APIs \& Services}: {\color{accent2}Flask, Django}
  \end{enumerate}
\end{itemize}
\divider
\cvevent{Generative Art Teacher}{Responsive Cities}{Nov 2022}{Porto Alegre, Brazil}
\begin{itemize}
  \item Taught a course on the history and principles of generative art and on useful tools and libraries for creative coding.
\end{itemize}
\divider
\cvevent{Senior Data Scientist}{Condati}{Nov 2021 -- Dec 2023}{Menlo Park, California (Remote)}
\begin{itemize}
  \item As Senior Data Scientist, I led the redesign and optimization of ML solutions for digital marketing campaign bid strategies, achieving significant ROI improvements and meeting client KPIs through:
  \begin{itemize}
    \item Implementation of advanced forecasting models and automated bidding systems
    \item Development of robust monitoring and validation frameworks
    \item Design of novel optimization algorithms for real-time bid adjustments
    \item End-to-end MLOps pipeline implementation for model deployment and monitoring
  \end{itemize}
  \item Successfully diagnosed and resolved critical performance issues, bringing model effectiveness back to target levels within one year.
  \item Tech stack:
  \begin{enumerate}
    \item \textbf{Languages \& Core}: {\color{accent2}Python, Julia}
    \item \textbf{AI \& ML}: {\color{accent2}PyTorch, TensorFlow.jl, Torch.jl, MLJ.jl, Flux.jl, Pandas, SciPy, NumPy}
    \item \textbf{Data Engineering}: {\color{accent2}MySQL}
    \item \textbf{MLOps \& Infrastructure}: {\color{accent2}AWS SageMaker}
  \end{enumerate}
\end{itemize}
\divider
\cvevent{Senior AI Researcher \& Project Lead - ML for Health}{Samsung Research Brazil}{Mar 2020 -- Nov 2021}{Campinas, Brazil (Remote)}
\begin{itemize}
  \item Led the development of ML-powered health monitoring solutions for Samsung wearables, resulting in global implementation in the Galaxy Watch line.
  \item Designed robust data collection protocols and ML architectures for physiological signal analysis and health metric estimation.
  \item Developed memory-optimized, real-time health monitoring algorithms for resource-constrained wearable devices, ensuring high accuracy and efficiency.
  \item Deployed production models on Samsung Tizen OS, using custom Python-to-C transpilers and ONNX for efficient inference.
  \item Presented project outcomes directly to Samsung HQ, leading to worldwide adoption and impact.
  \item Led and mentored cross-functional teams of researchers and engineers, driving innovation in wearable health technology while meeting strict performance and resource constraints.
  \item Tech stack:
  \begin{enumerate}
    \item \textbf{Languages \& Core}: {\color{accent2}Python, Julia, C/C++}
    \item \textbf{AI \& ML}: {\color{accent2}TensorFlow, Keras, PyTorch, PyTorch Lightning, MLJ.jl, Flux.jl, Pandas, Scikit-Learn, SciPy, NumPy, Matplotlib}
    \item \textbf{MLOps \& Infrastructure}: {\color{accent2}AWS SageMaker}
    \item \textbf{Deployment}: {\color{accent2}C, ONNX, Custom Python-to-C transpilers, Samsung Tizen OS}
  \end{enumerate}
\end{itemize}
\divider
\cvevent{Mid-level Data Scientist}{Poatek IT Consulting}{Jun 2019 -- Mar 2020}{Porto Alegre, Brazil}
\begin{itemize}
  \item As a Data Scientist, I led multiple high-impact projects across different domains, delivering innovative solutions through:
  \begin{itemize}
    \item Development of exact and heuristic algorithms for complex vehicle routing optimization
    \item Implementation of computer vision and NLP pipelines for automated document processing and data extraction
    \item Design of advanced NLP solutions for Named Entity Recognition and sentiment analysis
    \item Creation of sophisticated credit risk modeling systems
    \item Development of geospatial data analysis and visualization frameworks
  \end{itemize}
  \item Successfully integrated various ML/DL technologies including CNNs, ensemble methods, and pre-trained language models to enhance solution performance.
  \item Tech stack:
  \begin{enumerate}
    \item \textbf{Languages \& Core}: {\color{accent2}Python, Julia, C/C++}
    \item \textbf{AI \& ML}: {\color{accent2}TensorFlow, Keras, PyTorch, PyTorch Lightning, Pandas, GeoPandas, Scikit-Learn, SciPy, NumPy, Matplotlib}
    \item \textbf{Computer Vision}: {\color{accent2}OpenCV, Scikit-Image}
    \item \textbf{MLOps \& Infrastructure}: {\color{accent2}Docker}
    \item \textbf{APIs \& Services}: {\color{accent2}Flask, Django}
    \item \textbf{Optimization}: {\color{accent2}JuMP, Google OR-Tools}
  \end{enumerate}
\end{itemize}
\divider}