Paulo Silva, Developer in Belo Horizonte - State of Minas Gerais, Brazil

Paulo Silva

Verified Expert in Engineering

Statistics Developer

Belo Horizonte - State of Minas Gerais, Brazil
Toptal Member Since
August 29, 2022

Paulo is a data scientist with four years of experience across multiple business domains. With Python as his main stack, he has worked with numerous machine learning algorithms, data analysis, visualization, hypothesis testing such as A/B tests, statistical analysis, and even data engineering. Paulo has an engineering background, so problem-solving comes naturally to him.






Preferred Environment

Python, Google Cloud Platform (GCP), Amazon Web Services (AWS), Jupyter Notebook, PyCharm, Visual Studio Code (VS Code)



Work Experience

Data Scientist

2023 - PRESENT
Oko Exchange Inc.
  • Used OpenAI's large language model (LLM) APIs (GPT-3.5 Turbo and GPT-4) to parse data from unstructured text into a structured format.
  • Leveraged Azure Document Intelligence (formerly Azure Form Recognizer) to extract text from files, and used LangChain and vector stores to process large text files with LLMs.
  • Served models with AWS Lambda and stored files in Amazon S3 (AWS S3).
Technologies: Python, Data Science, Machine Learning, Data Visualization, Data Engineering, Data Analysis, Azure ML Studio, Azure, Artificial Intelligence (AI), OpenAI GPT-4 API, OpenAI GPT-3 API, GPT, AWS Lambda, Amazon S3 (AWS S3), Natural Language Processing (NLP), OpenAI, ChatGPT, Large Language Models (LLMs), NumPy, Amazon SageMaker

Data Scientist

2023 - 2023
Quero Quitar
  • Created predictive models to guide debt-recovery agencies on which debtors to approach.
  • Built models to estimate the likelihood of reaching a debtor, making the company's approaches more effective.
  • Migrated from Pandas to Databricks to process large volumes of data.
Technologies: Databricks, Python, Data Analysis, Data Science, Big Data, NumPy, Streamlit

Data Engineer/Analyst for Qlik Sense

2022 - 2022
CBD Industries, LLC
  • Developed an ETL (extract, transform, load) architecture inside Qlik Sense.
  • Integrated multiple third-party APIs into Qlik Sense.
  • Utilized AWS services to scale the solution for a big data context.
Technologies: Data Science, Data Visualization, SQL, Databases, Qlik Sense, Amazon Web Services (AWS), AWS Lambda, Amazon S3 (AWS S3), APIs, API Integration, Data Analysis, Data Lakes, Data Warehousing, Statistical Analysis, Cloud, Statistical Data Analysis, Mathematical Analysis, Mathematics, Apache Spark

Data Scientist

2021 - 2022
  • Developed a dynamic pricing algorithm for a hotel chain.
  • Performed ad-hoc data analysis to help drive the business forward.
  • Helped data analysts with their research to find inconsistencies, give feedback and provide overall technical support.
Technologies: Python, Google Cloud Platform (GCP), Machine Learning, GitHub, Data Analysis, Analytics, Data Science, Data, Relational Databases, BigQuery, Databases, Data Visualization, Software Development, Algorithms, Git, Jupyter Notebook, PyCharm, Statistics, Pandas, SQL, ETL, ETL Tools, Data Reporting, Data Analytics, Big Data, Linear Regression, Clustering, Dashboards, Predictive Modeling, Predictive Analytics, Amazon Web Services (AWS), TensorFlow, Python 3, Hospitality, Google BigQuery, Statistical Analysis, Cloud, XGBoost, Statistical Data Analysis, Mathematical Analysis, Mathematics, Statistical Methods, Tableau, NumPy

Data Scientist

2021 - 2021
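  • Developed a dynamic pricing algorithm for a business that connects buyers and sellers of bottled gas.
  • Helped with experiments to roll out new features in a data-driven way.
  • Worked with the company's analytics department to spread a data-driven culture.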
Technologies: Python, R, Docker, Machine Learning, Data Analysis, Git, Tableau, Back-end, APIs, API Integration, Analytics, Business Intelligence (BI), Data Science, Data, Database Design, Relational Databases, BigQuery, Databases, Data Visualization, Software Development, Algorithms, GitHub, Jupyter Notebook, Statistics, Pandas, SQL, Pytest, ETL, ETL Tools, Data Engineering, Data Reporting, Data Analytics, Data Mining, Web Scraping, Big Data, Linear Regression, Clustering, Dashboards, Predictive Modeling, Predictive Analytics, TensorFlow, Python 3, Data Pipelines, Postman, REST APIs, Data Integration, Kubernetes, Swagger, Google BigQuery, Data Warehousing, Statistical Analysis, Cloud, XGBoost, Statistical Data Analysis, Mathematical Analysis, Mathematics, Statistical Methods, Azure ML Studio, Apache Airflow, NumPy, Amazon SageMaker

Data Scientist

2020 - 2021
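  • Helped the company identify employees who were misusing meal expenses while traveling or meeting clients.
  • Assisted the company in finding leaders who were charging clients incorrectly, causing financial losses.
  • Created a model to help the operations department understand whether they had enough computers for new employees, based on past hiring behavior.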
Technologies: Python, Google Cloud Platform (GCP), BigQuery, Google Data Studio, Machine Learning, Data Analysis, Git, Analytics, Business Intelligence (BI), Data Science, Data, Database Design, Relational Databases, Databases, Data Visualization, Software Development, Algorithms, GitHub, Jupyter Notebook, PyCharm, Statistics, Pandas, SQL, ETL, ETL Tools, Data Engineering, Data Reporting, Data Analytics, Web Scraping, Big Data, Linear Regression, Clustering, Dashboards, Predictive Modeling, Predictive Analytics, Python 3, Google BigQuery, Data Warehousing, Statistical Analysis, Cloud, XGBoost, Statistical Data Analysis, Mathematical Analysis, Mathematics, Statistical Methods, Tableau, Artificial Intelligence (AI), NumPy

Data Scientist

2019 - 2020
CRM Educacional
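  • Developed a lead scoring model to help private universities enroll more students.
  • Created a model to identify students at risk of dropping out of college and provided the steps needed to avoid it.
  • Improved the company's data pipeline, which had been built for small data and had become unfeasible.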
Technologies: Python, Azure DevOps, SQL Server 2016, Azure, Machine Learning, Microsoft Power BI, Back-end, APIs, API Integration, Analytics, Business Intelligence (BI), Data Science, Data, Database Design, Relational Databases, Databases, Data Visualization, Software Development, Algorithms, GitHub, Git, Jupyter Notebook, Statistics, C#, Pandas, SQL, ETL, ETL Tools, Data Engineering, Data Reporting, Data Analytics, Data Mining, Web Scraping, Big Data, Linear Regression, Clustering, Azure Data Factory, Dashboards, Predictive Modeling, Predictive Analytics, Python 3, Data Pipelines, Postman, REST APIs, Data Integration, Swagger, Data Analysis, Data Lakes, Data Warehousing, Statistical Analysis, Cloud, XGBoost, Statistical Data Analysis, Mathematical Analysis, Mathematics, Statistical Methods, Tableau, Apache Spark, Azure ML Studio, NumPy, Amazon SageMaker

Data Scientist

2019 - 2019
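  • Developed a model to predict whether a car had been stolen, based on tracker data and previously known user behavior.
  • Improved the company's data pipeline using Spark, since the previous pipeline was no longer feasible for the volume of data being processed.
  • Analyzed data to determine whether some previously developed models were working as expected.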
Technologies: Python, MongoDB, Redis, Machine Learning, Data Analysis, Spark, Git, Back-end, APIs, Data Science, Data, Databases, Data Visualization, Software Development, Algorithms, GitHub, Jupyter Notebook, PyCharm, Statistics, Pandas, SQL, ETL, ETL Tools, Data Engineering, Data Reporting, Data Analytics, Data Mining, Web Scraping, Big Data, Linear Regression, Clustering, Dashboards, Predictive Modeling, Predictive Analytics, Amazon Web Services (AWS), Python 3, Data Pipelines, Postman, REST APIs, Data Integration, Data Warehousing, Statistical Analysis, Cloud, XGBoost, Statistical Data Analysis, Mathematical Analysis, Mathematics, Statistical Methods, Apache Spark, NumPy

Data Scientist

2019 - 2019
  • Created a model to predict a cow's daily milk yield.
  • Helped the company find new marketing locations based on public data about milk producers.
  • Developed an IoT device to monitor the milk quality in a tank.
Technologies: Python, Machine Learning, Data Analysis, MongoDB, MySQL, JavaScript, Back-end, APIs, API Integration, Data Science, Data, Database Design, Relational Databases, Databases, Data Visualization, Software Development, Algorithms, GitHub, Git, Jupyter Notebook, Statistics, C#, Android, React Native, PostgreSQL, Pandas, SQL, Pytest, ETL, ETL Tools, Data Engineering, Data Reporting, Data Analytics, Data Mining, Web Scraping, Linear Regression, Clustering, Dremio, Dashboards, Predictive Modeling, Predictive Analytics, Amazon Web Services (AWS), Python 3, Data Pipelines, Postman, REST APIs, Data Integration, Swagger, Data Lakes, Data Warehousing, Statistical Analysis, Cloud, XGBoost, Statistical Data Analysis, Mathematical Analysis, Mathematics, Statistical Methods, Apache Spark, Artificial Intelligence (AI), Web Development, NumPy

College Dropout Prediction

Privately owned colleges in Brazil have a significant problem. Since they are not the top colleges in the country, as the federal universities are, the students who join them usually come from low-income families and tend to drop out frequently.

Students drop out mainly because they face financial hardships, live too far from the campus, can't manage to work and study simultaneously, or even struggle academically and think it's not worth the effort.

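Dropouts are a big problem for the colleges because they lose years of revenue from those students. In the long run, it therefore pays for a college to offer short-term incentives to retain students.

With that in mind, I developed a machine learning model to identify the risk of and reasons for dropping out. Finally, I provided insights into which incentives the college could offer to retain students.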

Car-theft Prediction Using Tracking Data

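Some insurance companies require their clients to allow a tracking device to be installed in their cars because tracking the car's location makes it easier to retrieve. Typically, it takes a while for a theft to be reported, and sometimes it is already too late because the thieves either remove the tracker or move to places the police avoid (usually slums) because they are too dangerous.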

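The project I worked on revolved around tracking the users' data, using one machine learning model to establish each user's typical behavior and then another machine learning model to predict whether the car had been stolen. The goal was to predict these events before users reported them, speeding up the retrieval of the car.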

For this project, I used Python as the programming language. For the data processing part, we used Apache Spark on the Databricks platform because there was a lot of data and processing on a single machine was too slow for the time-sensitive requirements. The historical data was stored in a MongoDB database, and we served the model through a Flask API.
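The two-stage idea described above (first learn a user's typical behavior, then judge new tracker readings against it) can be illustrated with a minimal sketch. It is not the project's code: simple per-feature statistics and a z-score threshold stand in for the two machine learning models, and the feature names (`km_from_home`, `hour`), the sample history, and the threshold of 3.0 are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Baseline:
    """Typical behavior of one user: feature name -> (mean, std deviation)."""
    stats: dict[str, tuple[float, float]]


def fit_baseline(history: list[dict[str, float]]) -> Baseline:
    """Stage 1 (stand-in): summarize a user's historical tracker readings."""
    stats = {}
    for name in history[0].keys():
        values = [row[name] for row in history]
        # Floor the deviation so a constant feature cannot divide by zero.
        stats[name] = (mean(values), max(pstdev(values), 1e-6))
    return Baseline(stats)


def anomaly_score(baseline: Baseline, reading: dict[str, float]) -> float:
    """Largest absolute z-score of the reading across all features."""
    return max(
        abs(reading[name] - mu) / sigma
        for name, (mu, sigma) in baseline.stats.items()
    )


def looks_stolen(baseline: Baseline, reading: dict[str, float],
                 threshold: float = 3.0) -> bool:
    """Stage 2 (stand-in): flag readings far from the user's baseline."""
    return anomaly_score(baseline, reading) > threshold


# Hypothetical history: distance from home (km) and hour of day per trip.
history = [
    {"km_from_home": 5.0, "hour": 8.0},
    {"km_from_home": 6.0, "hour": 9.0},
    {"km_from_home": 4.0, "hour": 8.0},
    {"km_from_home": 5.0, "hour": 9.0},
]
baseline = fit_baseline(history)
```

With this baseline, a reading close to the usual pattern is not flagged, while a trip far from home at an unusual hour is. In a real system, both stages would presumably be learned models trained on far richer tracker features.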

Dynamic Pricing to Sell Cooking Gas Bottles

In Brazil, there's a peculiar industry for selling cooking gas in bottles. For a long time, this industry has operated primarily offline. The client would call the vendors closest to them and ask for a delivery, or vendors would drive their trucks around the neighborhood offering their service.

However, once the gas runs out while a person is cooking, they want a new bottle delivered to their home as soon as possible because going without it could ruin their meal.

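With that in mind, the company's business connects vendors and clients through a mobile app. The problem was that these vendors were not used to fierce competition and were unhappy with us.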

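To calm the situation, we developed a dynamic pricing algorithm that uses machine learning to keep prices at a level that is sustainable for vendors while still being good for clients.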

For this project, I used Python for the programming part, Flask to serve my model, and Docker to containerize the model with the API.


Python, SQL, Python 3, C, R, JavaScript, C#


Pandas, REST APIs, XGBoost, NumPy, TensorFlow


BigQuery, Tableau, GitHub, PyCharm, Git, Postman, Amazon SageMaker, Microsoft Power BI, Pytest, Qlik Sense, Azure ML Studio, Apache Airflow


Data Science, ETL, Database Design, Azure DevOps, Business Intelligence (BI)


Jupyter Notebook, Google Cloud Platform (GCP), Visual Studio Code (VS Code), Amazon Web Services (AWS), Docker, Azure, Android, Kubernetes, AWS Lambda, Databricks


Data Pipelines, Databases, SQL Server 2016, MySQL, Redis, Relational Databases, Data Integration, MongoDB, PostgreSQL, Amazon S3 (AWS S3), Data Lakes


Machine Learning, Data Analysis, Data Visualization, Software Development, Statistics, Algorithms, API Integration, Analytics, Data, ETL Tools, Data Reporting, Data Analytics, Big Data, Linear Regression, Clustering, Dashboards, Predictive Modeling, Predictive Analytics, Statistical Analysis, Statistical Data Analysis, Mathematical Analysis, Mathematics, Statistical Methods, Artificial Intelligence (AI), OpenAI, ChatGPT, Back-end, APIs, Data Engineering, Data Mining, Signal Processing, Hospitality, Google BigQuery, Data Warehousing, Cloud, Web Development, Large Language Models (LLMs), Industrial IT, Google Data Studio, Natural Language Processing (NLP), Web Scraping, Azure Data Factory, Dremio, GPT, Generative Pre-trained Transformers (GPT), OpenAI GPT-4 API, OpenAI GPT-3 API


Flask, Apache Spark, Streamlit, Spark, React Native, Swagger

2012 - 2017

Bachelor's Degree in Control and Automation Engineering


2015 - 2016

Master's Degree in Control Engineering

Lund University - Lund, Skane, Sweden


Natural Language Processing Nanodegree

