Originally published in a different form on LinkedIn. Lightly edited for clarity and formatting.

Like many organizations, we at Kentico needed to answer some very basic but crucial questions:

  • Who’s using our AI tools?
  • Who has access to what?
  • What does our usage look like?
  • Are people actively using our tools?

There are tools on the market that can answer these and many more questions, but procurement takes time, and we needed something sooner. Instead of launching a week-long internal project, or manually going through each app’s dashboard to pull the data and analyze it myself, I decided to try building an app that does just that, using an AI agent.

The end state (so you know where this is going)

About 95% of the code was AI‑generated: a Node.js app with a live dashboard, a cron job firing every 6 hours to pull fresh data, integrations with GitHub and Cursor analytics, enrichment of our internal org lookup data before it flows into the reports, a bunch of small CLI commands the agent (or I) can trigger on demand, and a structure that makes adding new metrics basically just “describe the next one” and plug it in.

Is this something I’d feel comfortable putting live in production and shipping? Not without some extra polish, that’s for sure. On the other hand, it’s a great internal tool that solves a real problem without requiring a large investment. I have published a functional version of it on my GitHub.

Why not just script it by hand?

I know exactly what needs to be done; I have done multiple integrations with APIs and data analysis services before. I could code it myself, but using an agent to code to my strict specifications while I’m working on other things is much faster, and it allows me to focus on what we’re trying to do, not how we’re doing it.

Someone without much technical expertise can also follow the same steps and reach a similar state. Also keep in mind that, as this is an internal tool meant to solve one very specific problem for a specific timeframe, code quality and maintainability are far less of a concern.

My development phases

1. Creating the plan

Normally I would start with pen and paper: write down my requirements, roughly design the solution in my head, and tackle it logically. In this case, I started by speaking into a microphone, describing my requirements to the agent prompt in VS Code. It was basically:

I need a reporting system across Copilot + Cursor. I want it to fetch data, run analysis, and create reports. Suggest a file structure and technology stack

After a few turns, we settled on this file structure:

ai-metrics-report/
├── data/ # for storing all source data
├── documentation/ # for project documentation
├── logs/ # to store debug and operation logs
├── output/ # to store the generated data
├── src/ # the source of the application

Then I followed up with some housekeeping tasks: making it a git repository, adding a .gitignore file, creating a .env file to store my API keys, and adding a README.
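For reference, the .env wiring is nothing exotic. A minimal sketch, assuming the dotenv package (the variable names are the ones the services use later in this article):

```js
// config.js – minimal sketch of loading secrets from .env (assumes the dotenv package)
import 'dotenv/config';

// Variable names match the ones used later in this article; .env itself stays out of git via .gitignore
export const config = {
  ghToken: process.env.GH_TOKEN,
  ghOrg: process.env.GH_ORG,
  cursorApiKey: process.env.CURSOR_API_KEY,
};

// Warn early if a key is missing rather than failing halfway through a fetch
for (const [key, value] of Object.entries(config)) {
  if (!value) console.warn(`Missing environment variable for ${key}`);
}
```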

2. Adding my first data source (GitHub)

I decided to start with GitHub Copilot. Normally I would just Google it, but since I now have an agent, I went with this prompt: “I need to fetch copilot usage data from GitHub, please review their API documentation and suggest how I can achieve this”. The agent replied with a link to the API documentation, along with a summary of what it found. From there, with the agent’s help, I set up my API key, authorized it, and made sure our organization supports metrics.

Once everything was in place, it was time to “write” some code:

Create a service that fetches the data from the GitHub API (https://docs.github.com/en/rest/copilot/copilot-user-management?apiVersion=2022-11-28). The JSON returned should be stored under data/github/copilot-seat-assignments_{orgname}_{timestamp}.json. Use the .env file for the GH_TOKEN and GH_ORG values.

After a brief wait, the agent had created the script and prompted me to run it. I ran it, it hit an error, the agent amended the script, and I re-ran it. In the end, it successfully grabbed the data from the GitHub API and stored it on my machine. Afterwards, I repeated the same process for the metrics endpoint, which tells us more about how people use Copilot.
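To give a sense of scale, the service boils down to something like this. This is a rough sketch rather than the actual generated code (pagination, retries, and logging are omitted), using the seat-assignments endpoint from the linked documentation:

```js
// fetch-copilot-seats.js – simplified sketch of the Copilot seat-assignments fetch
import fs from 'node:fs/promises';
import path from 'node:path';
import 'dotenv/config';

const { GH_TOKEN, GH_ORG } = process.env;

export async function fetchCopilotSeats() {
  // List Copilot seat assignments for the organization
  const res = await fetch(`https://api.github.com/orgs/${GH_ORG}/copilot/billing/seats`, {
    headers: {
      Authorization: `Bearer ${GH_TOKEN}`,
      Accept: 'application/vnd.github+json',
      'X-GitHub-Api-Version': '2022-11-28',
    },
  });
  if (!res.ok) throw new Error(`GitHub API returned ${res.status}`);

  // Store the raw JSON with the org name and a filesystem-safe timestamp
  const data = await res.json();
  const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
  const file = path.join('data', 'github', `copilot-seat-assignments_${GH_ORG}_${timestamp}.json`);
  await fs.mkdir(path.dirname(file), { recursive: true });
  await fs.writeFile(file, JSON.stringify(data, null, 2));
  return file;
}
```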

3. Adding my second data source (Cursor)

Once you already have a concept in your application, it’s easier to duplicate it. Sure, the Cursor API requires a new token and has different routes and responses, but there are common elements both you and the agent can reference when adding more data sources.

Because of that, I looked into creating instruction files for the agent. I’m using Cursor, so I followed this reference (https://cursor.com/docs/context/rules), but all AI agents have a way to do this. Instructions and how to create them are a whole other topic, but think of them as rules an agent must follow every time you ask it to do something, without you having to spell them out each time.

In my case, the rule file looked something like this:

### Environment
We are on Windows.
### Data Handling
- Always use the most recent data files when generating reports
- Check for data freshness before analysis (warn if data > 7 days old)
- Preserve raw data files - never modify original fetched data
### Output Management
- Always timestamp generated reports with ISO format
- Save reports to output/reports/ directory
### API Best Practices
- Use environment variables for all API keys (GH_TOKEN, CURSOR_API_KEY)
- Implement rate limiting for GitHub API calls
- Cache API responses to avoid redundant calls
### Reporting Standards
- Always include summary statistics at the top of reports
- Use consistent date ranges (default: last 7 days)
- Include data source timestamps in all reports

Once you have them, you can also include the different scripts an agent can run to achieve a task. For example:

## Key Commands
- **One-shot report**: `import { runOneShotReport } from './scripts/one-shot-report.js'; await runOneShotReport(true);`
- **Data fetching**: `node bin/ai-metrics-report.js cursor fetch-all`

Now, with the housekeeping done, I was ready to tackle adding another data source. Cursor’s APIs are slightly different: how you access the data, what is available, and what is returned all differ. Normally I would have to understand the data, create new handlers, and so on, but in this case I asked the agent to:

Using the API reference for Cursor https://cursor.com/docs/account/teams/admin-api add new services that get the metrics and user data.

The agent created the relevant services, tested whether the integration was working, and in the end the retrieved JSON was stored under data/ as expected.

One thing I didn’t foresee was how granular and varied the data can be, so at this stage I asked the agent to:

Update all services so any data grabbed from GitHub gets stored under data/github, and any data from Cursor is stored under data/cursor. The JSON should be stored in a subdirectory according to the API route that fetched it, for example data/cursor/usage-events/

Now I had all my data, stored in sensible locations and in a sensible format, so I committed my changes to the repository. (You could also ask the agent to do this, but I used it as an opportunity to review the changes.)
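The convention from that prompt boils down to a small helper along these lines (hypothetical code, not the agent’s actual output):

```js
// save-raw.js – hypothetical helper illustrating the data/{source}/{route}/ convention
import fs from 'node:fs/promises';
import path from 'node:path';

// e.g. saveRaw('cursor', 'usage-events', payload) -> data/cursor/usage-events/usage-events_<timestamp>.json
export async function saveRaw(source, route, payload) {
  const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
  const dir = path.join('data', source, route);
  await fs.mkdir(dir, { recursive: true });
  const file = path.join(dir, `${route}_${timestamp}.json`);
  await fs.writeFile(file, JSON.stringify(payload, null, 2));
  return file;
}
```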

4. Adding more data and enriching it

One thing I realized at this stage was that I had the data from our services, but not the data about our organization. Depending on the tools you are using, this step will differ, but for me it meant going into our internal systems and grabbing the data I needed: a list of R&D staff with their job titles and emails.

I created a new directory under data/ simply called kentico and added a .txt file listing all the users. I then asked the agent to:

Use the rnd-people.txt file to generate an rnd-lookup-table.csv under data/kentico. The headings should be Name, Work Email, GitHub Username, Title, Has Cursor, Has Copilot.

Now I had my CSV, but it wasn’t enough. I can’t go into details because it involves protected data, but I used the agent to help me match people’s names with their GitHub usernames and update the lookup table with access data based on what we get from the APIs.

This was a one-time operation, so I focused less on reusability and more on what I needed the agent to do for me. Some of it was done by one-off scripts; some of it the agent could do by itself. In the end I had a file containing all the data I would need to cross-reference for reporting purposes.
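To give a flavour of that enrichment, cross-referencing the lookup table with the Copilot seat data looks roughly like this (a simplified sketch assuming GitHub’s seat listing shape and naive CSV handling; the real one-off scripts were messier):

```js
// enrich-lookup.js – simplified sketch of one enrichment step
import fs from 'node:fs/promises';

export async function markCopilotAccess(lookupCsvPath, seatsJsonPath) {
  const seats = JSON.parse(await fs.readFile(seatsJsonPath, 'utf8'));
  // GitHub's seat listing nests the user login under "assignee"
  const copilotLogins = new Set(
    (seats.seats ?? []).map((seat) => seat.assignee?.login?.toLowerCase()),
  );

  // Naive CSV handling for illustration only; a real script would use a proper CSV parser
  const [header, ...rows] = (await fs.readFile(lookupCsvPath, 'utf8')).trim().split('\n');
  const columns = header.split(',');
  const ghIndex = columns.indexOf('GitHub Username');
  const copilotIndex = columns.indexOf('Has Copilot');

  const updated = rows.map((row) => {
    const cells = row.split(',');
    cells[copilotIndex] = copilotLogins.has(cells[ghIndex]?.toLowerCase()) ? 'Yes' : 'No';
    return cells.join(',');
  });

  await fs.writeFile(lookupCsvPath, [header, ...updated].join('\n') + '\n');
}
```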

5. Reporting

Now that I had all the required data, I could ask the agent for specific one-off analyses: “How many users do we have on Copilot?”, “Who hasn’t been active in Cursor?” and so on. This works reasonably well, and the agent will generally create a temporary script to get the answer, but I needed something reusable. My next step was to create a “wishlist” of reports I would like to have:

  • weekly active users, per tool and combined
  • lines of code generated vs accepted across the whole toolset
  • usage in Cursor: overall, median, average, and top 10%
  • a list of users and the tools they have access to
  • all of the above filtered so they only count software engineers

For each of these reports I prompted the agent:

Create a script that generates a report under output/reports containing {the report content}, I should be able to run it via a command.
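To make the pattern concrete, a generated report script is little more than “load the latest data, aggregate, write Markdown”. Here is an illustrative sketch of the weekly-active-users report, assuming a simplified event shape of { email, date } that won’t match the real payloads exactly:

```js
// weekly-active-users.js – illustrative sketch, not the generated code
import fs from 'node:fs/promises';
import path from 'node:path';

export async function weeklyActiveUsersReport(events) {
  // Count distinct users with at least one event in the last 7 days
  const weekAgo = Date.now() - 7 * 24 * 60 * 60 * 1000;
  const active = new Set(
    events.filter((e) => new Date(e.date).getTime() >= weekAgo).map((e) => e.email),
  );

  // Timestamped Markdown under output/reports, per the rules file
  const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
  const report = [
    `# Weekly active users (${new Date().toISOString()})`,
    '',
    `- Active users in the last 7 days: ${active.size}`,
    '',
    ...[...active].sort().map((email) => `- ${email}`),
  ].join('\n');

  const file = path.join('output', 'reports', `weekly-active-users_${timestamp}.md`);
  await fs.mkdir(path.dirname(file), { recursive: true });
  await fs.writeFile(file, report);
  return file;
}
```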

Once that was all done, I asked the agent to document the various commands so either I or another agent can easily find and use them.

I wasn’t 100% happy with the report content, so I pointed the agent at each report, told it what I didn’t like, and requested changes until I was satisfied. This also allowed me to sanity-check the reported numbers; at times I would ask the agent to “prove to me that the number of active users is X”, or similar.

6. Dashboard

At some point this might get hosted internally rather than running exclusively on my machine, so I wanted to quickly add a dashboard that exposes the reports and some of the functionality.

This step was the simplest because I had no strict expectations for the front end. So I prompted the agent to:

"Use react to render a dashboard containing a quick overview of the statistics and also links to all the reports so a user could view them without downloading them."

The agent thought for a bit longer (I was using GPT-5), generated a few dashboard-related files, added the relevant libraries, and assisted me in running the application.

As with the previous steps, I reviewed the result and asked for tweaks and fixes.
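Under the hood, the dashboard mostly needs a couple of endpoints that list and serve the generated reports. A rough sketch of that server side, assuming Express (which may not be exactly what the agent picked; error handling trimmed):

```js
// server.js – rough sketch of the dashboard's server side (assumes Express)
import express from 'express';
import fs from 'node:fs/promises';
import path from 'node:path';

const app = express();
const reportsDir = path.join('output', 'reports');

// List generated reports so the dashboard can link to them without downloads
app.get('/api/reports', async (_req, res) => {
  const files = await fs.readdir(reportsDir);
  res.json(files.filter((name) => name.endsWith('.md')));
});

// Serve an individual report as plain text for in-browser viewing
app.get('/api/reports/:name', async (req, res) => {
  const file = path.join(reportsDir, path.basename(req.params.name)); // basename blocks path traversal
  res.type('text/plain').send(await fs.readFile(file, 'utf8'));
});

app.listen(3000, () => console.log('Dashboard API listening on http://localhost:3000'));
```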

7. Scheduling

At this stage I realized that I didn’t want to keep asking the agent to run the commands for me, so I needed something that runs on a schedule. The simplest way I saw was to introduce a cron mechanism (essentially, “every X hours, run this script”), and instead of multiple commands it should just run the whole process at once.

Both were quite easy to achieve: I asked the agent to introduce a “one-shot” script that fetches new data and regenerates every report, and then a cron job that runs that one-shot script every six hours.
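Assuming the node-cron package, the scheduling side is only a few lines; the one-shot import below matches the command documented earlier in the rules file:

```js
// scheduler.js – minimal sketch of the 6-hourly schedule (assumes node-cron)
import cron from 'node-cron';
import { runOneShotReport } from './scripts/one-shot-report.js';

// At minute 0 of every 6th hour: fetch fresh data and regenerate all reports in one pass
cron.schedule('0 */6 * * *', async () => {
  try {
    await runOneShotReport(true);
  } catch (error) {
    console.error('Scheduled run failed:', error);
  }
});
```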

8. Fine-tuning and the future

Now that I have something that effectively answers the important questions, a valuable database, and an application that can be extended, I’m able to leverage the solution for other tasks, add functionality as needed, and polish it further.

I have since added tests to check data validity, a configuration file to account for nicknames, and “public” endpoints that could be used with a more robust dashboard like Grafana; I have also improved the documentation, and the tool is now my go-to source for analyzing AI tooling-related data.

Key Takeaways

Throughout the project, I experimented with a few ways to prompt, but the most boring and specific ones worked best. I also started focusing more on outcomes rather than dictating how things should be built. Here are some prompt patterns that worked well:

Architecture:

I need a Node.js project that fetches from GitHub Copilot + Cursor APIs, stores dated JSON + CSV, enriches with a lookup table, produces Markdown + metrics endpoints. Suggest folders + core modules only.

Enhancement:

Enhance user-lookup merge so manual edits in user-lookup.csv are never overwritten even if API returns different casing or name.

Error handling:

Add cron scheduling (every 6h) to run full cycle. On failure of one fetch, continue the rest, collect errors, surface in dashboard status area.
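One way to satisfy the “continue the rest, collect errors” part is Promise.allSettled; this is an illustrative sketch, not the project’s actual code:

```js
// run-all-fetches.js – illustrative "continue on failure" pattern
export async function runAllFetches(fetchers) {
  // Run every fetcher; a rejection does not stop the others
  const results = await Promise.allSettled(fetchers.map((fetchFn) => fetchFn()));

  // Collect failures so the dashboard status area can surface them
  const errors = results
    .map((result, i) => ({ result, source: fetchers[i].name || `fetcher ${i}` }))
    .filter(({ result }) => result.status === 'rejected')
    .map(({ result, source }) => ({ source, message: result.reason?.message ?? String(result.reason) }));

  return { completed: results.length - errors.length, errors };
}
```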

Refactoring:

Refactor the commands so they all use the same base services to fetch, store and process the data found under src/common
Update all scripts and services so names are normalized when used. Keep them lowercase & strip any special characters.
Add a summary to the header of this report that shows the weekly active users by department. Use the data in rnd-lookup.csv to determine job titles.
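The name-normalization prompt, for instance, boils down to a helper like this (illustrative only):

```js
// normalize-name.js – lowercase and strip special characters so name variants match
export function normalizeName(name) {
  return name
    .normalize('NFD')                 // split accented characters into base letter + diacritic
    .replace(/[\u0300-\u036f]/g, '')  // drop the diacritics
    .toLowerCase()
    .replace(/[^a-z0-9 ]/g, '')       // keep only letters, digits, and spaces
    .trim();
}
```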

Things I explicitly asked for that saved time later

  • A dated directory structure – keeps things well organized for human consumption and creates a historical database that can be used later.
  • Environment variables configuration – NEVER hardcode secrets in code, even if it’s not an external tool.
  • Structured operation logs with timestamps – I can always ask the agent to analyze them if anything goes wrong.
  • Layer separation – By logically splitting the app in layers – API client / data analysis / dashboard / command orchestration – I’m able to use the agent to change things without having to take time to update, and potentially break, everything.
  • CSV + Markdown outputs – both formats are easily consumed by agents and humans alike, with maximum tool compatibility.
  • Tests to validate the data – With large data sets it’s easy to get lost and let errors slip. Understanding how the data is tested is much easier and it’s an additional confidence signal.
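As an example of that last point, a data-validity check can be as small as this, using Node’s built-in test runner (a sketch against the lookup table from earlier; the real tests differ):

```js
// lookup-table.test.js – sketch of a data-validity test (Node 18+ built-in test runner)
import { test } from 'node:test';
import assert from 'node:assert/strict';
import fs from 'node:fs/promises';

test('every row in the lookup table has a GitHub username', async () => {
  const [header, ...rows] = (await fs.readFile('data/kentico/rnd-lookup-table.csv', 'utf8'))
    .trim()
    .split('\n');
  const ghIndex = header.split(',').indexOf('GitHub Username');

  for (const row of rows) {
    assert.ok(row.split(',')[ghIndex]?.trim(), `Missing GitHub username in: ${row}`);
  }
});
```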

Common changes I requested

  • Verbosity – Agents love adding comments on trivial code and writing very lengthy documentation files. I often had to ask the agent to summarize or tighten the writing.
  • Language levels – It’s a simple reporting tool; I don’t need a university professor of linguistics to write it. I added a rule to generate simple English targeting an ESL (English as a second language) audience.
  • Duplication – Especially when using thinking models, they tended to create duplicate functionality or add things I hadn’t explicitly asked for, even when I focused on outcomes only. I often had to ask the agent to merge or extend files and remove unnecessary additions.

Reality check: where AI helped vs where it didn’t

Helped:

  • Speed up the implementation
  • Keep naming consistent across the application
  • Handle boring boilerplate tasks
  • Generate a first draft of commands
  • Save me time reading documentation

It was a letdown for:

  • Catching data validation / quality issues
  • Avoiding over-engineering a simple solution
  • Avoiding its default, very verbose writing style

Lessons

  1. Describe outcomes, not libraries (unless there’s a need). It will pick workable defaults.
  2. Ask for production concerns in the first prompt (logging, retries, config). Retrofits cost you time.
  3. Work in small increments. One feature per prompt. Avoid “and also” stacking.
  4. Review often. Delete fluff aggressively.
  5. Use a new agent chat window often, especially when switching to a different context.
  6. At the end of each task, ask the agent to write a summary of the prompts, actions, changes into a file you can reference later if needed.
  7. You can describe a problem to the agent and ask it to break it down for you into logical steps. Then you can refine that and use it to progress.
  8. The agent can assist with more than coding. For example, if you are unsure how to create a token in GitHub, ask the agent to guide you through the process step by step.

Closing Thoughts

This approach fits amazingly well for internal tools, automating administrative tasks, and small services with a clear purpose, and it enables less technical people to achieve real results in a matter of hours. It’s not something I would recommend for critical services, performance-intensive operations, or domains with strict rules.

No-code solutions and AI fully replacing developers are pure marketing. This was spec-driven code generation. It is an excellent way to try building with the assistance of an AI agent and to gain valuable experience with the tooling, experience you can then apply to other aspects of your work.

Next time you catch yourself spinning up yet another scratch repo for a metrics sidecar, try describing it first. Worst case: you throw away a generated draft. Best case: you ship in a day instead of a fortnight.



Thanks for reading!

Connect on LinkedIn, explore more posts, or visit my YouTube Channel.