Hallucinations, Ethics and Limitations of ChatGPT

Why This Chapter Matters

ChatGPT is extraordinarily capable, but it is also capable of being wrong in ways that look completely right. It writes confidently. It cites sources that do not exist. It produces legal or medical advice that sounds authoritative but may be dangerous. It can reflect the biases baked into its training data. And if you paste the wrong information into it, you may be putting sensitive data at risk.

None of this means ChatGPT is not useful — this entire tutorial series proves that it is. But using it without understanding its limitations is like driving a car without understanding that the brakes can fail on a wet road. This chapter is the safety briefing.

What Are Hallucinations?

In AI terminology, a hallucination is when a language model generates text that is factually incorrect but presented with the same fluent confidence as correct text. The model does not know it is wrong — it has no internal fact-checking mechanism.

Why Do Hallucinations Happen?

To understand this, remember what ChatGPT actually is: a next-token prediction engine trained on an enormous amount of text. It learned patterns of language — which words follow which, how arguments are structured, what a citation looks like — but it did not learn a ground truth database of facts.

When you ask "What is the population of Surat?", the model does not look up an authoritative source. It generates the most statistically plausible answer based on patterns in its training data. If the training data contained inconsistent or outdated population figures, the model has no way to tell which one is right. It may repeat an outdated figure or blend several into a number that appears in no source at all.

The problem is worsened by the fact that the model is trained to sound helpful and confident. Saying "I'm not sure, please verify" is statistically less common in training data than confidently stating an answer. So the model defaults to confidence even when it should hedge.
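The population example above can be sketched with a toy next-word predictor. This is only an illustration of frequency-based prediction, not how ChatGPT is actually built: the three-sentence corpus and the population figures in it are invented placeholders, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Invented placeholder corpus: these figures are NOT real population data.
CORPUS = [
    "the population of surat is 4.5 million",
    "the population of surat is 4.5 million",
    "the population of surat is 7 million",
]

def train_bigrams(sentences):
    """Count, for every word, which words follow it and how often."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def complete(follows, prompt, max_words=3):
    """Greedily append the most frequent next word. No fact lookup happens."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

model = train_bigrams(CORPUS)
answer = complete(model, "the population of surat is", max_words=2)
print(answer)  # the continuation seen most often wins, true or not
```

The predictor completes the sentence with whichever figure appeared most often in its corpus. Nothing in the code checks whether that figure is correct, which is the core of the hallucination problem.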

Types of Hallucinations

Type	Example
Fabricated citations	"According to a 2023 study by IIM Ahmedabad published in the Journal of Finance..." — the study does not exist
Wrong statistics	Citing an incorrect GDP figure or population number
False historical facts	Getting a date, name, or sequence of events wrong
Non-existent people	Inventing a professor or politician and attributing quotes to them
Made-up laws or regulations	Citing a section of the Companies Act 2013 that does not exist
Plausible-but-wrong code	Code that looks correct and even runs, but produces the wrong output for edge cases
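
To make the last row of the table concrete, here is a small sketch of plausible-but-wrong code. The leap-year scenario and both function names are chosen for illustration and do not come from an actual ChatGPT transcript.

```python
def is_leap_year_plausible(year: int) -> bool:
    """Looks right and passes casual checks, but ignores the century rule."""
    return year % 4 == 0

def is_leap_year(year: int) -> bool:
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# The two versions agree on everyday inputs and disagree only on edge cases.
for year in (2024, 2023, 1900, 2000):
    print(year, is_leap_year_plausible(year), is_leap_year(year))
```

Both functions run without errors and agree for 2024, 2023, and 2000. Only the edge case 1900 exposes the difference, which is why generated code needs tests on boundary inputs and not just a quick run.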

A Real-World Example

In 2023, two American lawyers submitted a legal brief that cited case law generated by ChatGPT. The cases did not exist. The court sanctioned both lawyers. This is the highest-stakes version of a hallucination — but smaller versions happen every day to people who do not verify.

The Knowledge Cutoff

ChatGPT's training data has a cutoff date. The model knows nothing about events after that date unless you tell it or it uses the web search tool.

As of GPT-4o in 2026: The knowledge cutoff is approximately early 2024. Events, prices, laws, regulations, and company structures that changed after that date are not known to the model.

What this means in practice:

Question: "What is the current interest rate on SBI home loans?"
ChatGPT answer: May quote a rate that was accurate in 2023 but has since changed.

Question: "Who is the current CEO of Infosys?"
ChatGPT answer: May be outdated if there has been a leadership change.

Question: "Is GST applicable on software subscriptions?"
ChatGPT answer: May not reflect the most recent CBIC clarifications.

Rule of thumb: For anything that changes over time — prices, interest rates, tax rules, government policies, sports results, company information — always verify with a current authoritative source.

Confidently Wrong Outputs

The most dangerous category of error is not the obvious mistake — it is the confident, detailed, plausible-sounding wrong answer.

If you ask ChatGPT "What is 2 + 2?" and it says "5", you will catch that immediately. But if you ask "What are the TDS deduction rates for Section 194C for FY 2025-26?" and it gives you a detailed, formatted table with slightly wrong rates — you may not catch it without specifically cross-referencing the Income Tax Act or the Income Tax department's website.

High-Risk Domains for Confident Wrong Outputs

Legal: Contract clauses, court procedures, rights under Indian law, SEBI regulations
Medical: Drug interactions, dosages, diagnostic criteria
Financial: Tax slabs, deduction limits, SEBI mutual fund rules, RBI guidelines
Technical: Specific API versions, library functions that have been deprecated, security configurations
Academic: Research findings, author attributions, publication dates

In these domains, always treat ChatGPT as a starting point for research, never as the final word.

Bias in Training Data

ChatGPT was trained on text from the internet, books, and other sources. That corpus reflects the biases of the people who wrote it — cultural, political, gender-based, socioeconomic, and geographic.

What This Looks Like in Practice

Geographic bias: Training data is skewed heavily toward English-language Western content. When ChatGPT answers questions about Indian culture, history, or business, it sometimes applies Western frameworks or gets details wrong that an Indian subject matter expert would not.

Gender bias: Historically, occupations like "engineer" or "doctor" produced more male-associated pronouns in early AI systems. While OpenAI has invested in mitigating this, subtle associations still exist.

Prominence bias: Well-documented, widely-discussed events are better represented than local or under-reported events. A protest in a tier-2 Indian city that was extensively covered in English-language national media may be known; one covered only in regional language press may not be.

Confirmation of majority views: The model tends to reflect the statistical majority of opinions in its training data, which can make minority viewpoints — even when important or correct — less prominent in its outputs.

What This Means for You

- Do not take ChatGPT's framing of cultural or political topics as neutral
- For India-specific regulatory, historical, or cultural questions, cross-reference Indian authoritative sources
- When the stakes are high, get a human expert involved

Privacy Risks: What NOT to Paste into ChatGPT

Every message you send to ChatGPT over the web interface is transmitted to OpenAI's servers. By default, OpenAI may use these conversations to improve future models (you can opt out in Settings).

Never paste into ChatGPT:

- Full names + Aadhaar numbers + PAN numbers of any person
- Customer personal data (names, phone numbers, addresses, purchase history)
- Employee salary details or HR records with identifying information
- Proprietary business data under NDA (client names, contract values, internal strategies)
- Source code from a codebase your employer owns, if confidentiality is required
- Unpublished financial results, earnings data before announcement
- Medical records (yours or a patient's)
- Login credentials, API keys, passwords

The Samsung Leak Case

In 2023, Samsung engineers accidentally leaked confidential chip design information by pasting proprietary source code into ChatGPT to debug it. Samsung subsequently banned ChatGPT use internally. This is a well-documented case of what can happen when employees use AI tools without a clear privacy policy.

Safer Alternatives

Anonymise before pasting: Replace "Ravi Sharma, Aadhaar 1234 5678 9012" with "Customer A, ID XXXX"
Use local models: Tools like Ollama let you run an AI model on your own laptop, so data never leaves your device
Enterprise subscriptions: ChatGPT Enterprise and Microsoft Copilot for Microsoft 365 both offer data agreements where your inputs are not used to train models
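
The "anonymise before pasting" step can be partly automated. The sketch below is a minimal example under stated assumptions: the regular expressions cover only common Aadhaar, PAN, and Indian mobile number formats, the `redact` function and its `aliases` parameter are hypothetical names, and personal names cannot be detected by pattern, so they must be supplied by hand.

```python
import re
from typing import Dict, Optional

# Simplified, assumed patterns: they catch common formats only.
MOBILE = re.compile(r"(?<!\d)(?:\+91[ -]?)?[6-9]\d{9}(?!\d)")
AADHAAR = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}\b")
PAN = re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b")

def redact(text: str, aliases: Optional[Dict[str, str]] = None) -> str:
    """Replace known names and recognisable identifiers with placeholders."""
    # Names are swapped using a caller-supplied mapping, e.g. real name -> "Customer A".
    for name, alias in (aliases or {}).items():
        text = text.replace(name, alias)
    # Phone numbers go first so a +91-prefixed number is not mistaken for an Aadhaar.
    text = MOBILE.sub("[PHONE]", text)
    text = AADHAAR.sub("[AADHAAR]", text)
    text = PAN.sub("[PAN]", text)
    return text

sample = "Ravi Sharma, Aadhaar 1234 5678 9012, PAN ABCDE1234F, phone +91 9876543210"
print(redact(sample, aliases={"Ravi Sharma": "Customer A"}))
```

Treat a script like this as a first pass only. Read the redacted text yourself before pasting it anywhere, because addresses, account numbers, and unusual formats will slip through simple patterns.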

Plagiarism and Copyright Concerns

When You Use ChatGPT's Output as Your Own

If you submit ChatGPT-written text as your own original work — in a client deliverable, a published article, or academic assignment — without disclosure, this raises plagiarism and dishonesty concerns. The output is not plagiarised from a specific human author (usually), but it is also not your own original intellectual work.

Many institutions and employers are developing clear policies on AI disclosure. Follow those policies explicitly.

Copyright of ChatGPT's Output

In most jurisdictions including India, copyright requires a human author. Text generated entirely by an AI model currently does not qualify for copyright protection. This means:

- You cannot copyright purely AI-generated content
- However, if you substantially edit and add to AI output, the original additions may be yours
- This area of law is evolving rapidly, so keep up with developments from the Copyright Office of India

What ChatGPT Might Reproduce

While ChatGPT is designed not to reproduce copyrighted text verbatim, it can sometimes output passages that are close to training data. For published creative or journalistic work, run outputs through a plagiarism checker before submitting.

Academic Integrity

If you are a student, this section is critical.

Most Indian universities and coaching institutes are either prohibiting AI-generated submissions or requiring disclosure. The exact rules vary by institution, but the ethical principle is universal: submitting AI-generated work as your own without disclosure is academic dishonesty, equivalent to plagiarism.

What is generally acceptable (varies by institution):

- Using ChatGPT to understand a concept you are struggling with
- Asking it to explain a topic in simpler terms
- Using it to check grammar on an essay you wrote yourself
- Using it to brainstorm ideas, then writing the response yourself

What is generally not acceptable:

- Pasting the assignment question → copying the output → submitting as your answer
- Having ChatGPT write your thesis chapter
- Generating lab reports or case studies with AI without disclosure
- Using it in closed-book exams (even if technically possible on a phone)

AI Detection Tools

Institutions increasingly use AI detection tools such as Turnitin's AI detection feature. These are imperfect — they produce false positives and can be fooled by paraphrasing — but their use is growing. The more important reason to be honest is integrity, not just detection risk.

How to Verify ChatGPT's Answers

The verification principle: Use ChatGPT to find the answer; use authoritative sources to confirm it.

For Indian Legal and Regulatory Questions

Topic	Authoritative Source
Income Tax	`incometax.gov.in`
GST	`gst.gov.in`
SEBI regulations	`sebi.gov.in`
RBI policy, repo rate	`rbi.org.in`
Company law (MCA)	`mca.gov.in`
Labour law	`labour.gov.in`
Consumer protection	`consumeraffairs.nic.in`

For Financial Data

BSE/NSE for stock prices: bseindia.com, nseindia.com
Mutual fund NAVs: amfiindia.com (AMFI official)
Gold prices: mcxindia.com or goodreturns.in
Loan rates: the bank's own website directly

For Medical and Health Information

Always consult a qualified doctor. Online sources like www.nhp.gov.in (National Health Portal of India) are more reliable than AI summaries for health topics.

The Citation Test

When ChatGPT cites a study, paper, or news article, search for it yourself. If you cannot find the exact citation on Google Scholar or the publisher's website within 2 minutes, assume it may be fabricated.
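
If you have several citations to check, a small helper can generate the search links for you. This is a convenience sketch, not a verification tool: the function name is hypothetical, the citation titles are invented placeholders, and it only builds an exact-phrase Google Scholar search URL that you still have to open and read yourself.

```python
from urllib.parse import quote_plus

def scholar_search_url(citation: str) -> str:
    """Build a Google Scholar URL that searches for the citation as an exact phrase."""
    return "https://scholar.google.com/scholar?q=" + quote_plus(f'"{citation}"')

# Placeholder titles standing in for whatever ChatGPT cited to you.
claimed_citations = [
    "Example Study Title One",
    "Example Study Title Two",
]

for title in claimed_citations:
    print(scholar_search_url(title))
```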

Verification prompt:

"Verify the claim you just made. What specific sources would I check
 to confirm this? Provide the exact URL or publication name."

This pushes the model to either name checkable sources or acknowledge uncertainty. The sources it names can themselves be fabricated, so look each one up before relying on it.

Common Pitfalls

1. Treating ChatGPT as a search engine: Search engines index real web pages. ChatGPT generates plausible text. These are fundamentally different: one retrieves, the other predicts. Treat every factual claim as unverified until you check it.

2. Using it for medical or legal decisions: Questions like "Can I take ibuprofen with my current prescription?" or "Is my employer's notice period clause legal?" require a qualified professional. ChatGPT can explain concepts, but the stakes of a wrong answer are too high to rely on an AI.

3. Not opting out of data training: By default, OpenAI may use your conversations for training. Go to Settings → Data Controls → toggle off "Improve the model for everyone" if you handle any sensitive topics.

4. Assuming that a long, detailed answer is a correct answer: Length and detail increase perceived credibility. A 500-word confident hallucination is more dangerous than a 10-word wrong answer because it is harder to notice.

5. Copy-pasting outputs without reading: This is the most basic version of the hallucination problem. Even if you intend to verify later, reading the output before you submit it catches most obvious errors.

6. Ignoring the knowledge cutoff for time-sensitive queries: Set a reminder for yourself. Whenever your question contains the word "current", "latest", "now", or "today", or a specific recent year, verify with a live source rather than trusting the training data.
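
The reminder in pitfall 6 can be written as a simple check. This is a rough sketch: the function name and the `cutoff_year` parameter are hypothetical, and treating any year at or after the cutoff year as "recent" is an assumption made for the example.

```python
import re

TIME_WORDS = ("current", "latest", "now", "today")

def needs_live_check(question: str, cutoff_year: int) -> bool:
    """Return True if the question looks time-sensitive.

    cutoff_year is supplied by the caller (the assumed year of the model's
    knowledge cutoff). Years at or after it are treated as "recent".
    """
    lowered = question.lower()
    # Whole-word match, so "how" or "known" do not trigger on "now".
    if any(re.search(rf"\b{word}\b", lowered) for word in TIME_WORDS):
        return True
    years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", question)]
    return any(year >= cutoff_year for year in years)

questions = [
    "What is the current interest rate on SBI home loans?",
    "What are the TDS rates under Section 194C for FY 2025-26?",
    "Explain how compound interest works",
]
for q in questions:
    print(needs_live_check(q, cutoff_year=2024), q)
```

A keyword check like this will miss time-sensitive questions that use none of these words, such as "Who is the CEO of Infosys?", so treat it as a prompt to think, not a guarantee.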

Practice Exercises

1. Ask ChatGPT to cite three academic studies supporting a claim of your choice (any topic). Then search for each citation on Google Scholar. Report what you find: do all three exist?
2. Ask ChatGPT "What is the current TDS rate under Section 194C for individual contractors?" Then verify the answer on incometax.gov.in. Note any discrepancy.
3. Look at your ChatGPT data settings (Settings → Data Controls). Is "Improve the model for everyone" currently on or off? Change it to your preferred setting and note what changed.
4. Find an example of a public case or news story where someone was misled by AI-generated information (a quick web search for "AI hallucination news" will surface several). Write a 150-word summary of what happened and what could have prevented it.
5. Write a short paragraph on a topic you know well (your field of study or work). Then ask ChatGPT to write a paragraph on the same topic. Compare the two for accuracy. Identify at least one thing ChatGPT got subtly wrong or oversimplified.

Summary

- Hallucinations occur because ChatGPT predicts plausible text, not verified facts; it has no internal fact-checking mechanism and will state incorrect information with full confidence
- The knowledge cutoff means anything that changed after early 2024 is unknown to the model, so always verify current prices, rates, laws, and leadership information
- Confidently wrong outputs are the most dangerous category: detailed, structured, fluent hallucinations are harder to spot than simple errors
- Training data biases are real: geographic, gender, and cultural biases affect ChatGPT's outputs, particularly for under-represented topics and non-Western contexts
- Never paste personal identifying information, proprietary data, confidential code, or financial data that is not yours to share into the public ChatGPT interface
- Plagiarism and copyright: AI-generated text is generally not copyrightable, and submitting it as your own without disclosure violates academic integrity norms
- Verification is non-negotiable for high-stakes domains: use Indian government websites, official regulators, and peer-reviewed sources to confirm any factual claim ChatGPT makes